See assignment attachment. Use R data frame attachments.


Problem 1. The data frame Dryers found in Dryers.RData has 36 rows (observations) and 3 columns (variables). The data come from an experiment in which various types of clothing were dried in various types of dryers. The variables are the following:

Clothing: One of three types of clothing: Towels, Jeans, or Thermal Clothing
Dryer: One of four types of dryers: Electric, BD (bi-directional) Electric, Town Gas, or LPG
kWh.kg: Energy effectiveness measured in kilowatt hours per kilogram of clothing dried

For each of the 12 settings of Clothing and Dryer there are 3 replications, giving a total sample size of 36.

(a) Fit a two-way ANOVA model where kWh.kg is the response variable and Clothing and Dryer are factors. Conduct the sequence of hypothesis tests that examines the main effects and their interactions using a test level of α = .05.
(b) State the assumptions of the classical ANOVA model. Provide and interpret a set of diagnostic plots that address their validity.
(c) Assuming that the model you fit in (a) is valid, conduct multiple comparisons of dryers to identify a single dryer or group of dryers that is the most energy efficient.
(d) Find the “best” Box-Cox transformation of kWh.kg and redo (a) with this transformation. Does the transformed model have better diagnostics than the original model?

Problem 2. The data set Mutual.RData contains two objects, mu and V, that are the mean vector and covariance matrix of annual rates of return of five investment funds. Here is mu:

SP500 HighTech SmallCap USTreas CorpBond
 0.06     0.10     0.08    0.02     0.04

This says, for example, that the annual mean return on investment for the US Treasuries fund is 2 percent. Assume that the rates of return are jointly distributed as multivariate normal. Suppose that Tom and Ellen each invest $1000 at the beginning of the year. Tom puts $500 in the High Tech fund and $500 in the Small Cap fund. Ellen puts $200 in each of the five funds. What is the probability that Ellen will have made more money than Tom at the end of the year?

Problem 3. The data frame BCMort88 found in the file BCMort88.RData gives breast cancer mortality rates for 217 counties in nine states (six New England states plus NY, NJ, and PA) for the year 1988. The rates are adjusted to account for differing demographic characteristics across the counties. Here is a data description:

Pop: Population of county
AdjRate: Adjusted mortality rate (per 100,000)
SE: Estimated standard error (per 100,000)

We want to identify counties that have mortality rates that are significantly more than 18 per 100,000 population, in order to devote resources to them. For this problem assume that the adjusted rates are unbiased and normally distributed, although the latter clearly is not the case for smaller counties. Limit your analysis to those counties that have a population of at least 20,000. Use a multiple testing method to identify those counties that should receive resources according to the criterion stated above, and explain why you chose it over alternative methods.

Problem 4. The data frame MTVR found in the file MTVR.RData gives information on n = 112 USMC Medium Terrain Vehicle Replacements (MTVRs) at Camp Lejeune, NC. The data are taken from internal Caterpillar engine diagnostic readings. Variables are as follows:

PTO: Percent of time in Power Takeoff (PTO) mode
Idle: Percent of time in idle mode
Miles: Number of miles driven
Load.factor: Percent of max. available power used by the engine
MPG: Fuel efficiency (miles per gallon)

Source: Penn State Applied Research Laboratory, 2013

(a) Use the pairs() command to identify the most obvious outlier: describe what it is and whether you believe it would be justified to delete this observation.
(b) Fit a least-squares regression model to predict MPG from the other variables both with and without the outlier included. Does the outlier have a large effect on the fitted model?
(c) Produce a 95% lower confidence bound (LCB) for the true regression coefficient on Load.factor (i) before removing the outlier and (ii) after removing the outlier.
Answered 1 day after Nov 19, 2021


Mohd answered on Nov 20 2021
11/20/2021
library(readr)
library(magrittr)
library(dplyr)
library(ggplot2)
library(rmarkdown)
library(MASS)
library(skimr)
library(ggeffects)
(a) Fit a two-way ANOVA model where kWh.kg is the response variable and Clothing and Dryer are factors. Conduct the sequence of hypothesis tests that examines the main effects and their interactions using a test level of α = .05.
load("~/data/dryers.rdata")
head(Dryers)
##   Clothing       Dryer kWh.kg
## 1   Towels    Electric  1.157
## 2   Towels    Electric  1.189
## 3   Towels    Electric  1.190
## 4   Towels BD Electric  1.236
## 5   Towels BD Electric  1.244
## 6   Towels BD Electric  1.264
skim(Dryers)
Data summary
    Name                     Dryers
    Number of rows           36
    Number of columns        3
    Column type frequency:
      factor                 2
      numeric                1
    Group variables          None

Variable type: factor
    skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
    Clothing       0          1              FALSE    3         Tow: 12, Jea: 12, The: 12
    Dryer          0          1              FALSE    4         Ele: 9, BD : 9, Tow: 9, LPG: 9

Variable type: numeric
    skim_variable  n_missing  complete_rate  mean  sd   p0    p25   p50   p75   p100  hist
    kWh.kg         0          1              1.62  0.4  1.16  1.34  1.49  1.85  2.5   ▇▅▁▂▂
mod_1a<-aov(kWh.kg~Dryer*Clothing,data=Dryers)
summary(mod_1a)
##                Df Sum Sq Mean Sq F value   Pr(>F)
## Dryer           3  5.230  1.7432 263.397  < 2e-16 ***
## Clothing        2  0.176  0.0882  13.332 0.000128 ***
## Dryer:Clothing  6  0.036  0.0060   0.912 0.502992
## Residuals      24  0.159  0.0066
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
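The usual sequence for a table like this is to test the interaction first and to interpret the main effects on their own only if the interaction is not significant at α = .05. A minimal sketch of that sequence follows. It runs on synthetic stand-in data, because Dryers.RData is not reproduced here; the data frame `sim`, its effect sizes, and the noise level are all assumptions made for illustration, not the experimental values.

```r
# Synthetic stand-in with the same 3 x 4 x 3 layout as Dryers (hypothetical values).
set.seed(1)
sim <- expand.grid(
  rep      = 1:3,
  Dryer    = c("Electric", "BD Electric", "Town Gas", "LPG"),
  Clothing = c("Towels", "Jeans", "Thermal Clothing")
)
# Assumed dryer effects, chosen only so the example has a clear signal.
sim$kWh.kg <- 1.2 + c(0, 0.05, 0.9, 0.4)[as.integer(sim$Dryer)] +
  rnorm(nrow(sim), sd = 0.08)

fit <- aov(kWh.kg ~ Dryer * Clothing, data = sim)
tab <- summary(fit)[[1]]                       # ANOVA table as a data frame
p   <- setNames(tab[["Pr(>F)"]], trimws(rownames(tab)))

alpha <- 0.05
# Step 1: test the Dryer:Clothing interaction.
interaction_sig <- p[["Dryer:Clothing"]] < alpha
# Step 2: read the main effects directly only when the interaction is not significant.
if (!interaction_sig) {
  print(p[c("Dryer", "Clothing")] < alpha)
} else {
  message("Interaction is significant: compare dryers within each clothing type.")
}
```

Because the design is balanced (3 replicates in every cell), the sequential sums of squares reported by `summary()` do not depend on the order in which `Dryer` and `Clothing` enter the formula.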
(b) State the assumptions of the classical ANOVA model. Provide and interpret a set of diagnostic plots that address their validity.
Independence of the errors
Normality of residuals
Constant error variance across the Clothing-by-Dryer cells
#define model residuals
resid <- mod_1a$residuals
#create histogram of residuals
hist(resid, main = "Histogram of Residuals", xlab = "Residuals", col = "steelblue")
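A histogram speaks only to normality. A residuals-versus-fitted plot and a normal Q-Q plot are the more common pair for checking constant variance and normality together, and formal tests can supplement them. The sketch below is one way to produce these; it uses synthetic stand-in data (the data frame `sim` and its effect sizes are assumptions), and the choice of Shapiro-Wilk and Bartlett tests is an illustration, not something the original answer used.

```r
# Synthetic stand-in with the same layout as Dryers (hypothetical values).
set.seed(2)
sim <- expand.grid(
  rep      = 1:3,
  Dryer    = c("Electric", "BD Electric", "Town Gas", "LPG"),
  Clothing = c("Towels", "Jeans", "Thermal Clothing")
)
sim$kWh.kg <- 1.2 + c(0, 0.05, 0.9, 0.4)[as.integer(sim$Dryer)] +
  rnorm(nrow(sim), sd = 0.08)

fit <- aov(kWh.kg ~ Dryer * Clothing, data = sim)
res <- residuals(fit)

op <- par(mfrow = c(1, 2))
# Constant variance: the vertical spread should look similar across fitted values.
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)
# Normality: points should fall close to the reference line.
qqnorm(res)
qqline(res)
par(op)

# Optional formal checks (with 3 replicates per cell these have little power).
sw <- shapiro.test(res)
bt <- bartlett.test(kWh.kg ~ interaction(Dryer, Clothing), data = sim)
print(c(shapiro_p = sw$p.value, bartlett_p = bt$p.value))
```

Independence cannot be read from these plots; it rests on how the runs were randomized in the experiment.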
Analyzing Treatment Differences
plot(TukeyHSD(mod_1a,conf.level = 0.95),las=2)
(c) Assuming that the model you fit in (a) is valid, conduct multiple comparisons of dryers to identify a single dryer or group of dryers that is the most energy efficient.
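# Keep only the two gas dryers. %in% tests membership row by row, whereas
# == would recycle c("LPG", "Town Gas") along the column and drop matching rows.
Dryers_1 <- Dryers %>%
  filter(Dryer %in% c("LPG", "Town Gas"))
mod_1c1 <- aov(kWh.kg ~ Dryer * Clothing, data = Dryers_1)
summary(mod_1c1)
## Output as posted, from filter(Dryer == c("LPG", "Town Gas")), which kept only 6 rows and so left no residual degrees of freedom:
##                Df Sum Sq Mean Sq
## Dryer           1 0.6501  0.6501
## Clothing        2 0.0259  0.0130
## Dryer:Clothing  2 0.0148  0.0074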
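# Keep only the two electric dryers. Negate %in% rather than using !=,
# which would recycle the comparison vector in the same way as ==.
Dryers_2 <- Dryers %>%
  filter(!Dryer %in% c("LPG", "Town Gas"))
mod_1c2 <- aov(kWh.kg ~ Dryer * Clothing, data = Dryers_2)
summary(mod_1c2)
## Output as posted, from filter(Dryer != c("LPG", "Town Gas")), which kept 30 rows covering all four dryers:
##                Df Sum Sq Mean Sq F value   Pr(>F)
## Dryer           3  3.830  1.2766 167.501 2.45e-13 ***
## Clothing        2  0.151  0.0755   9.905  0.00126 **
## Dryer:Clothing  6  0.037  0.0061   0.804  0.57983
## Residuals      18  0.137  0.0076
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1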
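An alternative to refitting on subsets is to run Tukey's HSD on the Dryer factor of the full model and read off which dryers are not significantly different from the one with the lowest mean kWh.kg. The sketch below illustrates that idea on synthetic stand-in data; `sim`, the effect sizes, and the helper names are assumptions, it treats lower kWh.kg as more efficient, and it assumes no dryer name contains a hyphen, since TukeyHSD joins the two level names with "-".

```r
# Synthetic stand-in with the same layout as Dryers (hypothetical values).
set.seed(3)
sim <- expand.grid(
  rep      = 1:3,
  Dryer    = c("Electric", "BD Electric", "Town Gas", "LPG"),
  Clothing = c("Towels", "Jeans", "Thermal Clothing")
)
sim$kWh.kg <- 1.2 + c(0, 0.05, 0.9, 0.4)[as.integer(sim$Dryer)] +
  rnorm(nrow(sim), sd = 0.08)

fit <- aov(kWh.kg ~ Dryer * Clothing, data = sim)

# All pairwise dryer comparisons with a 95% family-wise confidence level.
tk <- TukeyHSD(fit, which = "Dryer", conf.level = 0.95)$Dryer
print(round(tk, 4))

# Dryer with the lowest mean energy use per kg.
dryer_means <- sort(tapply(sim$kWh.kg, sim$Dryer, mean))
best <- names(dryer_means)[1]

# Dryers whose comparison with `best` is NOT significant share its group.
pairs_list    <- strsplit(rownames(tk), "-", fixed = TRUE)
involves_best <- vapply(pairs_list, function(pr) best %in% pr, logical(1))
tied          <- involves_best & tk[, "p adj"] >= 0.05
best_group    <- unique(c(best, unlist(pairs_list[tied])))
print(best_group)
```

Restricting `which` to "Dryer" keeps the family of comparisons to the six dryer pairs, which is what part (c) asks about.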
(d) Find the “best” Box-Cox transformation of kWh.kg and redo (a) with this transformation. Does the transformed model have better diagnostics than the original model?
#model <- lm(kWh.kg~Dryer*Clothing,data=Dryers)
#find optimal lambda for Box-Cox transformation
bc <- boxcox(kWh.kg~Dryer*Clothing,data=Dryers)
(lambda <- bc$x[which.max(bc$y)])
## [1] -1.232323
#fit new linear regression model using the Box-Cox transformation
new_model <- lm(((kWh.kg^lambda-1)/lambda) ~ Dryer*Clothing,data=Dryers)
summary(new_model)
##
## Call:
## lm(formula = ((kWh.kg^lambda - 1)/lambda) ~ Dryer * Clothing,
##     data = Dryers)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.048733 -0.008632  0.001447  0.009516  0.037945
##
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                0.14865    0.01360  10.929 8.44e-11 ***
## DryerBD Electric                           0.04515    0.01924   2.347 0.027491 *
## DryerTown Gas                              0.35216    0.01924  18.308 1.32e-15 ***
## DryerLPG                                   0.19194    0.01924   9.979 5.13e-10 ***
## ClothingJeans                              0.14347    0.01924   7.459 1.07e-07 ***
## ClothingThermal Clothing                   0.06935    0.01924   3.605 0.001419 **
## DryerBD Electric:ClothingJeans            -0.04476    0.02720  -1.645 0.112935
## DryerTown Gas:ClothingJeans               -0.12513    0.02720  -4.600 0.000115 ***
## DryerLPG:ClothingJeans                    -0.10666    0.02720  -3.921 0.000643 ***
## DryerBD Electric:ClothingThermal Clothing -0.01455    0.02720  -0.535 0.597755
## DryerTown Gas:ClothingThermal Clothing    -0.05246    0.02720  -1.929 0.065682 .
## DryerLPG:ClothingThermal Clothing         -0.07739    0.02720  -2.845 0.008941 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02356 on 24 degrees of freedom
## Multiple R-squared: 0.9754, Adjusted R-squared: 0.9641
## F-statistic: 86.46 on 11 and 24 DF, p-value: < 2.2e-16
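To redo (a) on the transformed scale and compare diagnostics, one option is to fit the transformed response with aov(), print its ANOVA table, and draw the same two diagnostic plots for both models side by side. The sketch below shows this on synthetic stand-in data; `sim`, its multiplicative noise, the lambda grid, and the helper `bc_transform` are assumptions for illustration and do not reproduce the λ = -1.232323 found above.

```r
library(MASS)  # provides boxcox()

# Synthetic stand-in with the same layout as Dryers (hypothetical, strictly positive values).
set.seed(4)
sim <- expand.grid(
  rep      = 1:3,
  Dryer    = c("Electric", "BD Electric", "Town Gas", "LPG"),
  Clothing = c("Towels", "Jeans", "Thermal Clothing")
)
cell_mean  <- 1.2 + c(0, 0.05, 0.9, 0.4)[as.integer(sim$Dryer)]
sim$kWh.kg <- cell_mean * exp(rnorm(nrow(sim), sd = 0.05))

# Profile log-likelihood over a grid of lambda values; take the maximizer.
bc <- boxcox(kWh.kg ~ Dryer * Clothing, data = sim,
             lambda = seq(-3, 3, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Box-Cox transform, with the log limit when lambda is (numerically) zero.
bc_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
sim$y_bc <- bc_transform(sim$kWh.kg, lambda_hat)

fit_raw <- aov(kWh.kg ~ Dryer * Clothing, data = sim)
fit_bc  <- aov(y_bc   ~ Dryer * Clothing, data = sim)
print(summary(fit_bc))            # the ANOVA table from (a), on the transformed scale

# Top row: original model. Bottom row: transformed model.
op <- par(mfrow = c(2, 2))
plot(fit_raw, which = 1:2)
plot(fit_bc,  which = 1:2)
par(op)
```

If the transformed model's residual spread is more even across fitted values and its Q-Q plot is straighter, that supports the transformation; if the two rows look alike, the simpler untransformed model is easier to interpret.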
Problem 4. (a) Use the pairs() command to identify the most obvious outlier: describe what it is and whether you believe it would be justified to delete this observation.
(b) Fit a least-squares regression model to predict MPG from the other variables both with and without the outlier included. Does the...