1. The data set Xdata.RData contains a data frame, Xdata, that has n = 100 observations (rows) and...

Question


1. The data set Xdata.RData contains a data frame, Xdata, that has n = 100 observations (rows) and five variables named X1, X2, X3, X4, and Y. This is a simulated data set generated from the following model:

The error term ε is a normal random variable with mean 0 and standard deviation σ. In the simulation the following parameter values were used: β₀ = 0, β₁ = 1, β₂ = 1, β₃ = 1, β₄ = 1, and σ = 0.5.

How successfully can the iterative process described above identify the model that generated the data? To answer this question, ask what you should expect to see for the Box-Cox parameter λ, and transformations of the predictor variables. Do you get something similar from the data? Can you identify the model coefficients reasonably well?
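One way to calibrate expectations is to repeat the exercise on data whose generating model is known. The sketch below is illustrative only. The model equation is not reproduced above, so the code assumes a hypothetical log-response model, log(Y) = β₀ + β₁X1 + β₂X2 + β₃X3 + β₄X4 + ε, with the stated parameter values. The predictor distribution (uniform on (0, 2)), the seed, and the names `sim`, `raw_fit` and `log_fit` are also assumptions. Under that assumed model the Box-Cox profile likelihood should peak near λ = 0, and the log-scale fit should recover slopes near 1.

```r
library(MASS)  # provides boxcox()

set.seed(1)
n <- 100
# Hypothetical stand-in for Xdata: predictors uniform on (0, 2)
sim <- data.frame(X1 = runif(n, 0, 2), X2 = runif(n, 0, 2),
                  X3 = runif(n, 0, 2), X4 = runif(n, 0, 2))
# Assumed generating model: log(Y) is linear in the predictors, sigma = 0.5
sim$Y <- exp(0 + sim$X1 + sim$X2 + sim$X3 + sim$X4 + rnorm(n, sd = 0.5))

# Fit on the raw scale, then profile the Box-Cox likelihood over a lambda grid
raw_fit <- lm(Y ~ X1 + X2 + X3 + X4, data = sim, y = TRUE, qr = TRUE)
bc <- boxcox(raw_fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Refit on the log scale and compare estimates with the simulation values
log_fit <- lm(log(Y) ~ X1 + X2 + X3 + X4, data = sim)
print(lambda_hat)
print(round(coef(log_fit), 2))
print(round(summary(log_fit)$sigma, 2))
```

With the real Xdata, the same three steps (raw fit, Box-Cox profile, refit on the suggested scale) would be applied to the loaded data frame instead of `sim`.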

2. Use the Boston Housing R data (BHD0.RData) to give a 95% prediction interval for the median home value of a census tract that has the following characteristics: NOX = .65, RM = 5.5, AGE = 80, and LSTAT = .16. Use a logarithm transformation with MEDV, and assume that the predictors are correctly treated without using transformations.
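A minimal sketch of the interval calculation follows. BHD0.RData is not available here, so the code uses the `Boston` data frame from the MASS package as a stand-in. That is an assumption: its variable names are lowercase, and its `lstat` is recorded in percent, so LSTAT = .16 is entered as 16 on the further assumption that BHD0 stores it as a proportion. The interval is computed on the log scale and the endpoints are then exponentiated to return to the MEDV scale.

```r
library(MASS)  # provides the Boston data frame used as a stand-in

# Log-transformed response, untransformed predictors
fit <- lm(log(medv) ~ nox + rm + age + lstat, data = Boston)

# Census tract of interest (lstat = 16 assumes percent units, see lead-in)
new_tract <- data.frame(nox = 0.65, rm = 5.5, age = 80, lstat = 16)

# 95% prediction interval on the log scale
pi_log <- predict(fit, newdata = new_tract,
                  interval = "prediction", level = 0.95)

# Back-transform the fit and both endpoints to the MEDV scale
pi_medv <- exp(pi_log)
print(pi_medv)
```

Because exp() is monotone, exponentiating the endpoints preserves the 95% coverage. The back-transformed interval is not symmetric about the back-transformed fit.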

3. For the prostate R data, fit a model with lpsa as the response and the other variables as predictors. Answer the following questions:

(a) Check for outliers.

(b) Check for influential points.

(c) Check the structure of the relationship between the predictors and the response.
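The three checks can be sketched with base R diagnostics. The example below runs on simulated stand-in data with one planted outlier, so the variable names `x1`, `x2` and `y`, the seed, and the planted shift are all assumptions for illustration. With the real data the fit would be something like `lm(lpsa ~ ., data = prostate)`.

```r
set.seed(2)
n <- 60
# Hypothetical stand-in data with a planted outlier at case 10
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 - 0.3 * d$x2 + rnorm(n, sd = 0.5)
d$y[10] <- d$y[10] + 6
fit <- lm(y ~ x1 + x2, data = d)

p <- length(coef(fit))

# (a) Outliers: externally studentized residuals vs. a Bonferroni cutoff
rs <- rstudent(fit)
cutoff <- qt(1 - 0.05 / (2 * n), df = n - p - 1)
outliers <- which(abs(rs) > cutoff)

# (b) Influence: Cook's distances and leverages (2p/n is a rule of thumb)
cd <- cooks.distance(fit)
high_leverage <- which(hatvalues(fit) > 2 * p / n)
most_influential <- order(cd, decreasing = TRUE)[1:3]

# (c) Structure: partial residuals, one column per predictor; each column
# plotted against its predictor should look roughly linear
pr <- residuals(fit, type = "partial")

print(outliers)
print(most_influential)
```

The Bonferroni cutoff guards against declaring outliers simply because many residuals are tested at once. Cook's distance combines residual size and leverage, so a case can be an outlier without being influential.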

4. Use the fat R data, fitting the model described in Section 4.2.

> data(fat,package="faraway")

> lmod

(a) Compute the condition numbers and variance inflation factors. Comment on the degree of collinearity observed in the data.

(b) Cases 39 and 42 are unusual. Refit the model without these two cases and recompute the collinearity diagnostics. Comment on the differences observed from the full data fit.

(c) Fit a model with brozek as the response and just age, weight and height as predictors. Compute the collinearity diagnostics and compare to the full data fit.

(d) Compute a 95% prediction interval for brozek for the median values of age, weight and height.

(e) Compute a 95% prediction interval for brozek for age=40, weight=200 and height=73. How does the interval compare to the previous prediction?

(f) Compute a 95% prediction interval for brozek for age=40, weight=130 and height=73. Are the values of predictors unusual? Comment on how the interval compares to the previous two answers.
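The collinearity diagnostics in parts (a) to (c) can be computed with base R alone. The sketch below uses simulated stand-in measurements rather than the fat data, so every number and the generating relationship between `weight` and `height` are assumptions. Condition numbers are taken here from the eigenvalues of X'X with the intercept column dropped and no rescaling, which is one common convention.

```r
set.seed(3)
n <- 100
# Hypothetical stand-in measurements; weight is built to track height so
# that some collinearity is present
height <- rnorm(n, mean = 70, sd = 3)
weight <- 5 * height - 170 + rnorm(n, sd = 10)
age <- round(runif(n, 20, 70))
y <- 0.1 * age + 0.15 * weight - 0.3 * height + rnorm(n, sd = 4)
fit <- lm(y ~ age + weight + height)

x <- model.matrix(fit)[, -1]   # predictor columns, intercept dropped

# Condition numbers: sqrt(largest eigenvalue / each eigenvalue) of X'X
ev <- eigen(crossprod(x), symmetric = TRUE, only.values = TRUE)$values
cond_numbers <- sqrt(ev[1] / ev)

# Variance inflation factors: diagonal of the inverse correlation matrix,
# which equals 1 / (1 - R_j^2) for each predictor j
vifs <- diag(solve(cor(x)))
names(vifs) <- colnames(x)

# Cross-check one VIF against its definition via an auxiliary regression
r2_weight <- summary(lm(weight ~ age + height))$r.squared
vif_weight <- 1 / (1 - r2_weight)

print(round(cond_numbers, 1))
print(round(vifs, 2))
```

For parts (d) to (f), `predict()` with `interval = "prediction"` on a one-row data frame gives the interval. Comparing the new predictor values with the observed ranges shows whether the prediction is an extrapolation.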

5. Ankylosing spondylitis is a chronic form of arthritis. A study was conducted to determine whether daily stretching of the hip tissues would improve mobility. The R data are found in hips. The flexion angle of the hip before the study is a predictor and the flexion angle after the study is the response.

(a) Plot the data using different plotting symbols for the treatment and the control status.

(b) Fit a model to determine whether there is a treatment effect.

(c) Compute the difference between the flexion before and after and test whether this difference varies between treatment and control. Contrast this approach to your previous model.
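The contrast asked for in parts (b) and (c) can be illustrated on simulated stand-in data. The variable names `before`, `after` and `grp`, the group sizes, and the generating coefficients are assumptions for illustration and do not come from the hips data. The key point is that the change-score analysis is the adjusted model with the slope on the baseline angle fixed at 1.

```r
set.seed(4)
n <- 40
# Hypothetical stand-in: flexion before/after and a two-level group factor
grp <- factor(rep(c("control", "treat"), each = n / 2))
before <- rnorm(n, mean = 115, sd = 8)
after <- 30 + 0.75 * before + 5 * (grp == "treat") + rnorm(n, sd = 4)
d <- data.frame(before, after, grp)

# (b) Adjusted model: group effect after accounting for the baseline angle
m_ancova <- lm(after ~ before + grp, data = d)

# (c) Change score: after - before, compared between the two groups
d$change <- d$after - d$before
m_change <- lm(change ~ grp, data = d)
tt <- t.test(change ~ grp, data = d, var.equal = TRUE)

# The change-score model is the adjusted model with slope on 'before' set to 1
m_offset <- lm(after ~ offset(before) + grp, data = d)

print(coef(summary(m_ancova))["grptreat", ])
print(coef(summary(m_change))["grptreat", ])
```

For part (a), a call such as `plot(after ~ before, data = d, pch = as.integer(d$grp))` draws the two groups with different symbols. When the true slope on the baseline differs from 1, the adjusted model usually estimates the treatment effect more precisely than the change-score analysis.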


Answered 1 day after Dec 07, 2021

Subhanbasha · Accepted Answer

Report
Question 1:
Ans:
The data were simulated in R by fixing the coefficient values of the predictors and generating the sample from the model.
The model that generated the data is
Y = sqrt(X1^2) + sqrt(X2^2) + sqrt(X3^2) + sqrt(X4^2)
The estimated coefficients equal the simulation values when rounded to a single digit, so the fitted model approximately reproduces the simulated data and the results are nearly the same. Several other models were tried before arriving at this one.
Question 2:
Ans: The data are stored in a file, so the load function was used to read them into R. The code is as follows:
# Reading data
bhd
...
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.669337   1.296387   0.516  0.60693    
lcavol       0.587022   0.087920   6.677 2.11e-09 ***
lweight      0.454467   0.170012   2.673  0.00896 ** 
age         -0.019637   0.011173  -1.758  0.08229 .  
lbph         0.107054   0.058449   1.832  0.07040 .  
svi          0.766157   0.244309   3.136  0.00233 ** 
lcp         -0.105474   0.091013  -1.
