See assignment attachment. Use R data frame attachments. Problem 1. The data frame Dryers found in...

Question

See assignment attachment. Use R data frame attachments.

Problem 1. The data frame Dryers found in Dryers.RData has 36 rows (observations) and 3 columns (variables). The data come from an experiment in which various types of clothing were dried in various types of dryers. The variables are the following:

    Clothing   One of three types of clothing: Towels, Jeans, or Thermal Clothing
    Dryer      One of four types of dryers: Electric, BD (bi-directional) Electric, Town Gas, or LPG
    kWh.kg     Energy effectiveness measured in kilowatt hours per kilogram of clothing dried

For each of the 12 settings of Clothing and Dryer there are 3 replications, giving a total sample size of 36.

(a) Fit a two-way ANOVA model where kWh.kg is the response variable and Clothing and Dryer are factors. Conduct the sequence of hypothesis tests that examines the main effects and their interactions using a test level of α = .05.
(b) State the assumptions of the classical ANOVA model. Provide and interpret a set of diagnostic plots that address their validity.
(c) Assuming that the model you fit in (a) is valid, conduct multiple comparisons of dryers to identify a single dryer or group of dryers that is the most energy efficient.
(d) Find the “best” Box-Cox transformation of kWh.kg and redo (a) with this transformation. Does the transformed model have better diagnostics than the original model?

Problem 2. The data set Mutual.RData contains two objects, mu and V, that are the mean vector and covariance matrix of annual rates of return of five investment funds. Here is mu:

    SP500  HighTech  SmallCap  USTreas  CorpBond
     0.06      0.10      0.08     0.02      0.04

This says, for example, that the annual mean return on investment for the US Treasuries fund is 2 percent. Assume that the rates of return are jointly distributed as multivariate normal. Suppose that Tom and Ellen each invest $1000 at the beginning of the year. Tom puts $500 in the High Tech fund and $500 in the Small Cap fund. Ellen puts $200 in each of the five funds. What is the probability that Ellen will have made more money than Tom at the end of the year?

Problem 3. The data frame BCMort88 found in the file BCMort88.RData gives breast cancer mortality rates for 217 counties in nine states (six New England states plus NY, NJ, and PA) for the year 1988. The rates are adjusted to account for differing demographic characteristics across the counties. Here is a data description:

    Pop        Population of county
    AdjRate    Adjusted mortality rate (per 100,000)
    SE         Estimated standard error (per 100,000)

We want to identify counties that have mortality rates that are significantly more than 18 per 100,000 population, in order to devote resources to them. For this problem assume that the adjusted rates are unbiased and normally distributed, although the latter clearly is not the case for smaller counties. Limit your analysis to those counties that have a population of at least 20,000. Use a multiple testing method to identify those counties that should receive resources according to the criterion stated above and explain why you chose it over alternative methods.

Problem 4. The data frame MTVR found in the file MTVR.RData gives information on n = 112 USMC Medium Terrain Vehicle Replacements (MTVRs) at Camp Lejeune, NC. The data are taken from internal Caterpillar engine diagnostic readings. Variables are as follows:

    PTO          Percent of time in Power Takeoff (PTO) mode
    Idle         Percent of time in idle mode
    Miles        Number of miles driven
    Load.factor  Percent of max. available power used by the engine
    MPG          Fuel efficiency (miles per gallon)

Source: Penn State Applied Research Laboratory, 2013

(a) Use the pairs() command to identify the most obvious outlier: describe what it is and whether you believe it would be justified to delete this observation.
(b) Fit a least-squares regression model to predict MPG from the other variables both with and without the outlier included. Does the outlier have a large effect on the fitted model?
(c) Produce a 95% lower confidence bound (LCB) for the true regression coefficient on Load.factor (i) before removing the outlier and (ii) after removing the outlier.
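
For Problem 2, the difference between Ellen's and Tom's year-end gains is a linear combination of jointly normal returns, so it is itself normal and the probability reduces to a single pnorm() call. The following is a minimal sketch of that calculation. The mean vector is the one quoted in the problem; the covariance matrix V below is HYPOTHETICAL (the real one is stored in Mutual.RData and is not shown on this page), so the resulting probability is illustrative only.

```r
# Mean vector from the problem statement
mu <- c(SP500 = 0.06, HighTech = 0.10, SmallCap = 0.08,
        USTreas = 0.02, CorpBond = 0.04)

# HYPOTHETICAL covariance matrix: made-up standard deviations and a
# common correlation of 0.3. Replace with V from Mutual.RData.
sds <- c(0.15, 0.30, 0.25, 0.03, 0.06)
R <- matrix(0.3, nrow = 5, ncol = 5)
diag(R) <- 1
V <- diag(sds) %*% R %*% diag(sds)

# Dollar holdings in each fund
tom   <- c(0, 500, 500, 0, 0)
ellen <- rep(200, 5)

# D = a'X with a = ellen - tom is normal with mean a'mu and variance a'Va
a      <- ellen - tom
mean_D <- sum(a * mu)
sd_D   <- sqrt(drop(t(a) %*% V %*% a))

# P(Ellen makes more money than Tom) = P(D > 0)
p <- pnorm(0, mean = mean_D, sd = sd_D, lower.tail = FALSE)
p
```

Whatever V turns out to be, the expected difference is $60 - $90 = -$30, so the probability that Ellen comes out ahead is below one half.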

assignment-nbcw12sx.docx bcmort88-xift0qpw.rdata mtvr-rpz4nhjh.rdata mutual-3mybsycg.rdata

Mohd · Accepted Answer

11/20/2021
library(readr)
library(magrittr)
library(dplyr)
library(ggplot2)
library(rmarkdown)
library(MASS)
library(skimr)
library(ggeffects)
(a) Fit a two-way ANOVA model where kWh.kg is the response variable and Clothing and Dryer are factors. Conduct the sequence of hypothesis tests that examines the main effects and their interactions using a test level of α = .05.
load("~/data/dryers.rdata")
head(Dryers)
##   Clothing       Dryer kWh.kg
## 1   Towels    Electric  1.157
## 2   Towels    Electric  1.189
## 3   Towels    Electric  1.190
## 4   Towels BD Electric  1.236
## 5   Towels BD Electric  1.244
## 6   Towels BD Electric  1.264
skim(Dryers)
Data summary
Name                     Dryers
Number of rows           36
Number of columns        3
_______________________
Column type frequency:
  factor                 2
  numeric                1
________________________
Group variables          None

Variable type: factor
skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
Clothing               0              1  FALSE           3  Tow: 12, Jea: 12, The: 12
Dryer                  0              1  FALSE           4  Ele: 9, BD : 9, Tow: 9, LPG: 9

Variable type: numeric
skim_variable  n_missing  complete_rate  mean   sd    p0   p25   p50   p75  p100  hist
kWh.kg                 0              1  1.62  0.4  1.16  1.34  1.49  1.85   2.5  ▇▅▁▂▂
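
A minimal sketch of the two-way ANOVA that part (a) asks for, with main effects and the interaction. It runs on simulated data that only mimics the 3 x 4 x 3 design summarised above; the response values, the object names sim and fit, and the effect sizes are all made up for illustration, so the numbers will not match the real Dryers data.

```r
# Simulated stand-in for Dryers: 3 clothing types x 4 dryers x 3 replicates.
# Response values are invented; only the design matches the assignment.
set.seed(1)
sim <- expand.grid(
  rep      = 1:3,
  Dryer    = c("Electric", "BD Electric", "Town Gas", "LPG"),
  Clothing = c("Towels", "Jeans", "Thermal Clothing")
)
sim$Dryer    <- factor(sim$Dryer)
sim$Clothing <- factor(sim$Clothing)
sim$kWh.kg   <- 1.5 + 0.3 * (sim$Dryer == "Town Gas") +
  rnorm(nrow(sim), sd = 0.05)

# Two-way ANOVA with both main effects and their interaction
fit <- aov(kWh.kg ~ Dryer * Clothing, data = sim)
tab <- summary(fit)[[1]]
print(tab)

# Usual testing sequence: look at the interaction first, and read the
# main-effect tests at face value only if the interaction is not significant.
p_int <- tab[["Pr(>F)"]][3]
p_int < 0.05
```

With 36 observations and 12 cells, the table should show 3, 2, 6 and 24 degrees of freedom for Dryer, Clothing, the interaction and the residuals.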
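mod_1a <- ...
##                Df Sum Sq Mean Sq F value   Pr(>F)    
## Dryer           3  5.230  1.7432 263.397 ...
... %>%
  filter(Dryer==c("LPG","Town Gas"))
mod_1c1 <- ... %>%
  filter(Dryer!=c("LPG","Town Gas"))
mod_1c2 <- ...
##                Df Sum Sq Mean Sq F value   Pr(>F)    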
## Dryer           3  3.830  1.2766 167.501 2.45e-13 ***
## Clothing        2  0.151  0.0755   9.905  0.00126 ** 
## Dryer:Clothing  6  0.037  0.0061   0.804  0.57983    
## Residuals      18  0.137  0.0076                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
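
One caution about the filter() calls above: comparing a column with a length-2 vector using == or != recycles the vector along the rows instead of testing membership. The table above has 3 + 2 + 6 + 18 = 29 total degrees of freedom, i.e. 30 rows with all four dryers still present, which is what a recycled != would leave from 36 rows if they are ordered by Clothing and then Dryer as head(Dryers) suggests. The sketch below shows the difference on a made-up toy data frame (toy, gas and the y values are hypothetical) in base R; the same logical expressions behave identically inside dplyr::filter().

```r
# Toy data: 4 dryer levels, 3 rows each (y values are placeholders)
toy <- data.frame(
  Dryer = rep(c("Electric", "BD Electric", "Town Gas", "LPG"), each = 3),
  y     = 1:12
)
gas <- c("LPG", "Town Gas")

# `==` recycles `gas` down the column: row 1 is compared with "LPG",
# row 2 with "Town Gas", row 3 with "LPG", and so on.
n_equal <- sum(toy$Dryer == gas)

# `%in%` tests each row for membership in `gas`
n_in <- sum(toy$Dryer %in% gas)

c(recycled = n_equal, membership = n_in)

# Membership-based subsets: gas dryers and everything else
gas_rows     <- toy[toy$Dryer %in% gas, ]
non_gas_rows <- toy[!(toy$Dryer %in% gas), ]
```

Here the recycled comparison picks up only 2 of the 6 gas rows, while %in% finds all 6.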
(d) Find the “best” Box-Cox transformation of kWh.kg and redo (a) with this transformation. Does the transformed model have better diagnostics than the original model?
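
A minimal sketch of how a "best" Box-Cox power can be chosen with MASS::boxcox() and the model refitted on the transformed response. The data are simulated on the same balanced layout, and the names sim, bc_transform and fit_bc are hypothetical; this is one common way to do it, not necessarily the exact steps used in this answer.

```r
library(MASS)  # provides boxcox()

set.seed(2)
# Simulated positive, right-skewed response on a balanced 4 x 3 design
sim <- expand.grid(rep = 1:3, Dryer = factor(1:4), Clothing = factor(1:3))
sim$y <- exp(0.2 * as.integer(sim$Dryer) + rnorm(nrow(sim), sd = 0.1))

fit <- lm(y ~ Dryer * Clothing, data = sim)

# Profile log-likelihood over a grid of lambda; take the maximiser
bc <- boxcox(fit, lambda = seq(-3, 3, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# Box-Cox transform (log when lambda is 0), then refit the same model
bc_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
sim$y_bc <- bc_transform(sim$y, lambda_hat)
fit_bc <- lm(y_bc ~ Dryer * Clothing, data = sim)
anova(fit_bc)
```

In practice lambda is often rounded to a nearby interpretable value (for example -1, 0 or 0.5) when that value lies inside the likelihood-based confidence interval, and the diagnostic plots from plot(fit) and plot(fit_bc) are then compared side by side.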
#model ...
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                0.14865    0.01360  10.929 8.44e-11
## DryerBD Electric                           0.04515    0.01924   2.347 0.027491
## DryerTown Gas                              0.35216    0.01924  18.308 1.32e-15
## DryerLPG                                   0.19194    0.01924   9.979 5.13e-10
## ClothingJeans                              0.14347    0.01924   7.459 1.07e-07
## ClothingThermal Clothing                   0.06935    0.01924   3.605 0.001419
## DryerBD Electric:ClothingJeans            -0.04476    0.02720  -1.645 0.112935
## DryerTown Gas:ClothingJeans               -0.12513    0.02720  -4.600 0.000115
## DryerLPG:ClothingJeans                    -0.10666    0.02720  -3.921 0.000643
## DryerBD Electric:ClothingThermal Clothing -0.01455    0.02720  -0.535 0.597755
## DryerTown Gas:ClothingThermal Clothing    -0.05246    0.02720  -1.929 0.065682
## DryerLPG:ClothingThermal Clothing         -0.07739    0.02720  -2.845 0.008941
##                                              
## (Intercept)                               ***
## DryerBD Electric                          *  
## DryerTown Gas                             ***
## DryerLPG                                  ***
## ClothingJeans                             ***
## ClothingThermal Clothing                  ** 
## DryerBD Electric:ClothingJeans               
## DryerTown Gas:ClothingJeans               ***
## DryerLPG:ClothingJeans                    ***
## DryerBD Electric:ClothingThermal Clothing    
## DryerTown Gas:ClothingThermal Clothing    .  
## DryerLPG:ClothingThermal Clothing         ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02356 on 24 degrees of freedom
## Multiple R-squared:  0.9754, Adjusted R-squared:  0.9641 
## F-statistic: 86.46 on 11 and 24 DF,  p-value: ...
...
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.518e+00  3.820e-01   6.593 1.68e-09 ***
## PTO         -8.382e-03  4.125e-02  -0.203    0.839    
## Idle         4.724e-03  5.855e-03   0.807    0.422    
## Miles        5.452e-05  6.853e-06   7.956 1.98e-12 ***
## Load.factor  6.752e-04  5.225e-03   0.129    0.897    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3839 on 107 degrees of freedom
## Multiple R-squared:  0.4594, Adjusted R-squared:  0.4392 
## F-statistic: 22.73 on 4 and 107 DF,  p-value: 1.311e-13
Adjusted R-squared increased markedly after the outlier was removed. Adjusted R-squared estimates the proportion of the variability in MPG that the model explains, penalized for the number of predictors.
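
For part (c), a one-sided 95% lower confidence bound is the estimate minus the 0.95 quantile of the t distribution times the standard error. The sketch below applies this to the Load.factor row of the coefficient table above, whose 107 residual degrees of freedom correspond to a fit on all 112 vehicles with five coefficients. The variable names est, se, df and lcb are just for illustration.

```r
# Values copied from the Load.factor row of the coefficient table
est <- 6.752e-04   # estimate
se  <- 5.225e-03   # standard error
df  <- 107         # residual degrees of freedom (112 - 5)

# One-sided 95% lower confidence bound: estimate - t_{0.95, df} * SE
lcb <- est - qt(0.95, df) * se
lcb

# With a fitted lm object `fit`, the same bound is the lower limit of a
# two-sided 90% interval: confint(fit, "Load.factor", level = 0.90)[1]
```

The bound comes out at roughly -0.008, so for this fit the data do not rule out a zero or slightly negative Load.factor coefficient. The same two lines apply to the model refitted without the outlier, using that model's estimate, standard error and residual degrees of freedom.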
summary(MTVR)
##       PTO              Idle           Miles        Load.factor    
##  Min.   :0.8669   Min.   :13.69   Min.   :   16   Min.   : 19.00  
##  1st Qu.:1.6089   1st Qu.:41.57   1st Qu.:10474   1st Qu.: 25.
