
please see the file for more description.


Microsoft Word - midterm-spring2021.docx

Problem 1
For the following 15 questions, answer True or False and explain your answer:
a) Suppose we have a new value of X, call it Xnew, and we want to predict the corresponding Ynew using a regression model. Prediction uncertainty for Ynew does not depend on the value of Xnew.
b) If assumptions are met, least squares residuals are correlated with the fitted values.
c) A confidence interval for β1 is centered at β1.
d) It is possible to reject a null hypothesis when the null hypothesis is true.
e) Least squares estimates of the regression coefficients b0 and b1 are chosen to maximize R².
f) Uncertainty about the regression coefficients depends upon the variance of the residuals.
g) The R-sq for a regression of Y onto X is the same as the R-sq for the regression of X onto Y.
h) Coefficients b0 and b1 from a simple linear regression model are independent.
i) If the least squares estimates are large, we can be certain that the t-ratio printed out by R will be large as well.
j) A leverage point may have a small residual.
k) SSE increases as you add additional explanatory variables to the model.
l) A leverage point may have a small residual.
m) In simple linear regression, the slope of the regression line is related to the correlation between X and Y.
n) The correlation between Y and Ŷ is one.
o) Our linear regression model implies an error variance that is the same for all values of the explanatory variable.

Problem 2
Assume the assumptions of linear regression hold and we fit a least squares line to our data as shown below. Suppose we then add one additional observation to our data set at the location (x̄, ȳ). For each of b0, b1, sb0 and sb1, discuss whether the value will go up, go down, stay the same, or can't be determined once this new data point is added.

Problem 3
The following data were collected on the height (inches) and weight (pounds) of women swimmers.

Height: 68  64  62  65  66
Weight: 132 108 102 115 128

a. Develop a scatter diagram for these data with height as the independent variable. What does the scatter diagram indicate about the relationship between the two variables?
b. Develop the estimated regression equation and comment on the results.
c. Are all assumptions for simple linear regression satisfied?
d. If a swimmer's height is 63 inches, what would you estimate her weight to be? What are the associated 99% confidence and prediction intervals?

Problem 4
In an attempt to predict adult heights, researchers randomly selected men and collected their heights at age 2 and their adult heights, and then computed a least squares regression equation: adult height = 2.1 * height at age 2 + 3.5. A 90% confidence interval for the slope was given as (1.9, 2.3). Give an easy-to-understand interpretation of this confidence interval.

Problem 5
Jensen Tire & Auto is in the process of deciding whether to purchase a maintenance contract for its new computer wheel alignment and balancing machine. Managers feel that maintenance expense should be related to usage, and they collected the following information on weekly usage (hours) and annual maintenance expense (in hundreds of dollars).

Weekly Usage (hours)   Annual Maintenance Expense
13                     17.0
10                     22.0
20                     30.0
28                     37.0
32                     47.0
17                     30.5
24                     32.5
31                     39.0
40                     51.5
38                     40.0

a. Develop the estimated regression equation that relates annual maintenance expense to weekly usage.
b. Test the significance of the relationship in part (a) at a 0.05 level of significance.
c. Jensen expects to use the new machine 30 hours per week. Develop a 95% prediction interval for the company's annual maintenance expense.
d. If the maintenance contract costs $3000 per year, would you recommend purchasing it? Why or why not?

Problem 6
One of the most common questions of prospective house buyers pertains to the cost of heating in dollars (Y). To provide its customers with information on that matter, a large real estate firm used the daily minimum outside temperature in degrees Fahrenheit (X1) to predict heating costs. Given below is the output of a regression model. Interpret the slope on the Temperature variable.

Problem 7
Suppose you estimate a simple linear regression model and obtain a t-value for the slope coefficient of -3.1. Based on this information, explain whether each of the following statements is correct or incorrect:
a) A 95% confidence interval for the true slope would exclude 0.
b) It is possible that the point estimate for the slope is b1 = 4.
c) At the 10% level of significance, you fail to reject the null hypothesis that the true slope is equal to 0.
d) The probability that the true slope is negative is greater than the probability that the true slope is positive.

Problem 8
For each of the following plots, separately, describe (in one sentence) what you think is problematic with each (potential) linear regression.

Problem 9
Calculate volumes (volume) and page areas (area) for the books on which information is given in the data frame oddbooks (DAAG).
(a) Plot log(weight) against log(volume), fit a regression line and provide a summary of the regression analysis.
(b) Plot log(weight) against log(area), again fit a regression line and provide a summary of the regression analysis.
(c) Which of the lines (a) and (b) gives the better fit?
(d) Repeat (a) and (b), now with log(density) in place of log(weight) as the dependent variable. Comment on how results from these regressions may help explain the results obtained in (a) and (b).

Problem 10
In this problem we will investigate the t-statistic for the null hypothesis H0 : β = 0 in simple linear regression without an intercept.
To begin, we generate a predictor x and a response y using the R code given below:

set.seed(5)
x <- rnorm(100)
y <- 2*x + rnorm(100)

(a) Using the command lm(y ~ x + 0), perform a simple linear regression of y onto x without an intercept. Report the coefficient estimate b1, the standard error of b1, the t-statistic and its associated p-value. Assuming a null hypothesis of H0 : β = 0, what is the alternative hypothesis and what conclusions can you draw?
(b) Now perform a simple linear regression of x onto y without an intercept, and report the coefficient estimate, its standard error, and the corresponding t-statistic and p-value associated with the null hypothesis H0 : β = 0. Comment on these results.
(c) What is the relationship between the results obtained in (a) and (b)?
(d) Now perform a simple linear regression with an intercept, first with y onto x, and then again with x onto y, and compare the t-statistic associated with H0 : β1 = 0 for both models.

Problem 11
In this problem you will create some simulated data and fit simple linear regression models to it. Make sure to use set.seed(1) prior to starting part (a) to ensure consistent results.
(a) Using the rnorm() function, create a vector, x, containing 100 observations drawn from a N(0, 1) distribution. This represents a feature, X. Using the rnorm() function, create a vector, eps, containing 100 observations drawn from a N(0, 0.25) distribution, i.e. a normal distribution with mean zero and variance 0.25.
(b) Using x and eps, generate a vector y according to the model y = −1 + 0.5x + e. What is the length of the vector y? What are the values of β0 and β1 in this linear model?
(c) Create a scatterplot displaying the relationship between x and y. Comment on what you observe.
(d) Fit a least squares linear model to predict y using x. Comment on the model obtained. How do b0 and b1 compare to β0 and β1? State the null and alternative hypotheses and report your conclusions.
(e) Now fit a polynomial regression model that predicts y using x and x². Is there evidence that the quadratic term improves the model fit? Explain your answer.
(f) Repeat (a)–(e) after modifying the data generation process in such a way that there is less noise in the data. The model in (b) should remain the same. You can do this by decreasing the variance of the normal distribution used to generate the error term e. Describe your results.
(g) Repeat (a)–(e) after modifying the data generation process in such a way that there is more noise in the data. The model in (b) should remain the same. You can do this by increasing the variance of the normal distribution used to generate the error term e. Describe your results.
(h) What are the confidence intervals for β0 and β1 based on the original data set, the noisier data set, and the less noisy data set? Comment on your results.
Answered Same Day, Mar 05, 2021

Answer To: Microsoft Word - midterm-spring2021.docx

Naveen answered on Mar 06 2021
130 Votes
# Problem 1
#a). Ans: False
# Explanation : Prediction uncertainty for Ynew depends on how far Xnew is from the sample mean x-bar;
# the prediction interval widens as Xnew moves away from x-bar.
#b). Ans: False
# Explanation : By construction, least squares residuals are uncorrelated with the fitted values;
# if a residuals-versus-fitted plot shows a pattern, that signals an assumption violation, not correlation by design.
#c). Ans: False
# Explanation : The confidence interval is centered at the estimate b1, not at the unknown true value beta1.
#d). Ans: True
# Explanation : Rejecting a true null hypothesis is a Type I error, which occurs with probability alpha.
#e). Ans: True
# Explanation : Minimizing SSE is equivalent to maximizing R-squared = 1 - SSE/SST, since SST does not depend on the coefficients.
#f). Ans: True
# Explanation : The standard errors of b0 and b1 are proportional to the residual standard deviation s,
# so a larger residual variance means more uncertainty about the coefficients.
#g). Ans: True
# Explanation : In simple linear regression both R-squared values equal the squared correlation r^2 between X and Y.
#h). Ans: False
# Explanation : In general Cov(b0, b1) = -x-bar * sigma^2 / Sxx, which is nonzero unless x-bar = 0,
# so the two estimates are correlated, not independent.
#i). Ans: False
# Explanation : The t-ratio is the estimate divided by its standard error; a large estimate with a large
# standard error can still yield a small t-ratio, so we cannot be certain it will be large.
#j). Ans: True
# Explanation : A high-leverage point can pull the fitted line toward itself, so its own residual can be
# small even though it strongly influences the fit.
#k). Ans: False
# Explanation : Adding explanatory variables can never increase SSE; SSE decreases (or stays the same) and R-squared increases.
#l). Ans: True
# Explanation : Same as (j): a high-leverage point can pull the fitted line toward itself, so its residual can be small.
#m). Ans: True
# Explanation : b1 = r * (sy/sx), so the slope has the same sign as the correlation and is zero exactly when r = 0.
#n). Ans: False
# Explanation : The correlation between Y and Y-hat equals |r| (more generally, R), which is one only for a perfect fit.
#o). Ans: True
# Explanation : The model assumes homoscedasticity: the error variance sigma^2 is the same for all values of the explanatory variable.
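A quick simulation can sanity-check parts (b) and (n). The data here are hypothetical (not from the exam); the point is that residuals are uncorrelated with fitted values by construction, and cor(Y, Y-hat) equals |r|, not one.

```r
# Sanity check for parts (b) and (n) on simulated data (illustrative only)
set.seed(42)
x <- rnorm(50)
y <- 3 + 2 * x + rnorm(50)
m <- lm(y ~ x)
cor(resid(m), fitted(m))   # essentially 0: residuals are uncorrelated with fitted values
cor(y, fitted(m))          # equals abs(cor(x, y)), which is less than 1
abs(cor(x, y))
```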
# Problem 2
# Ans: The least squares line always passes through (x-bar, y-bar), and adding a point at (x-bar, y-bar)
# leaves x-bar, y-bar, Sxx and Sxy unchanged, so b0 and b1 stay the same.
# The new point has a residual of exactly 0, so SSE is unchanged while the degrees of freedom increase
# from n - 2 to n - 1; the residual standard error s therefore decreases, and both sb0 and sb1 go down.
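This can be checked numerically with a small simulation (hypothetical data; the exam itself only shows a plot):

```r
# Effect of adding one observation at (x-bar, y-bar) -- illustrative simulation
set.seed(1)
x <- rnorm(30)
y <- 5 + 2 * x + rnorm(30)
m1 <- lm(y ~ x)
x2 <- c(x, mean(x))                        # append the point (x-bar, y-bar)
y2 <- c(y, mean(y))
m2 <- lm(y2 ~ x2)
coef(m1); coef(m2)                         # identical: b0 and b1 are unchanged
summary(m1)$coefficients[, "Std. Error"]   # both standard errors shrink...
summary(m2)$coefficients[, "Std. Error"]   # ...because SSE is unchanged but df grows
```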
# Problem 3:
#a).
# Creating the data
Height <- c(68,64,62,65,66)
Weight <- c(132,108,102,115,128)
# Making scatter plot
plot(Height,Weight)
# The scatter plot shows a positive, roughly linear relationship: taller swimmers tend to weigh more.
#b).
# Regression model
reg <- lm(Weight ~ Height)
# printing the model
reg
# From the above output, the estimated model is Weight = -240.5 + 5.5(Height),
# which means each additional inch of Height is associated with an estimated 5.5-pound increase in Weight...
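The posted solution is cut off at this point, but parts (c) and (d) follow the same pattern. A sketch using R's predict() is below; the Height value of 63 and the 99% level come from the exam prompt, and the data are restated so the snippet stands alone.

```r
# Part d: point estimate and 99% intervals at Height = 63 (continuing the model above)
Height <- c(68, 64, 62, 65, 66)
Weight <- c(132, 108, 102, 115, 128)
reg <- lm(Weight ~ Height)
new <- data.frame(Height = 63)
predict(reg, new)                                         # point estimate: -240.5 + 5.5*63 = 106 pounds
predict(reg, new, interval = "confidence", level = 0.99)  # 99% CI for the mean weight at 63 inches
predict(reg, new, interval = "prediction", level = 0.99)  # 99% PI for an individual swimmer (wider)
# Part c would check assumptions with residual diagnostics, e.g. plot(reg),
# though with only n = 5 observations such checks carry little power.
```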