Linear Regression Models, Statistics GR5205/GU4205 — Fall 2020, Homework 1

Please see the attached assignment file; its contents are reproduced below.


Linear Regression Models
Statistics GR5205/GU4205 — Fall 2020
Homework 1

The following problems are due on Monday, September 21, 11:59pm.

1. (Problems 1.20 and 1.24 in KNN) The Tri-City Office Equipment Corporation sells an imported copier on a franchise basis and performs preventive maintenance and repair service on this copier. The data in copier_maintainenance.txt were collected from 45 recent calls on users to perform routine preventive maintenance service; for the ith call let x_i denote the number of copiers serviced and y_i the total number of minutes spent by the service person, for i = 1, 2, ..., n = 45.

(a) Plot the data and overlay a lowess smoother. Does it seem that the simple linear regression model y_i = \beta_0 + \beta_1 x_i + \varepsilon_i is appropriate? Explain.
(b) Obtain the least squares estimated linear regression function, and overlay it on a scatterplot of the data. How well does the estimated regression function fit the data?
(c) Interpret b_1 in your estimated regression function.
(d) Interpret b_0 in your estimated regression function. Does b_0 provide any relevant information here? Explain.
(e) Obtain a point estimate of the mean service time for calls on which x = 5 copiers are serviced.
(f) Obtain a point prediction for the service time of a single call on which x = 5 copiers are to be serviced.
(g) Obtain the residuals e_i = y_i - (b_0 + b_1 x_i) and confirm that they sum to zero. Explain the relation between the sum of squared residuals and the quantity

    Q = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.

(h) Obtain point estimates of \sigma^2 = var(\varepsilon_i) and \sigma. In what units is \sigma expressed?

2. (Problem 1.45 in KNN) The primary objective of the Study on the Efficacy of Nosocomial Infection Control (SENIC Project) was to determine whether infection surveillance and control programs have reduced the rates of nosocomial (hospital-acquired) infection in United States hospitals. The data set in SENIC.txt consists of a random sample of 113 hospitals selected from the original 338 hospitals surveyed. Each line of the data set has an identification number ID and provides information on 11 other variables for a single hospital. The average length of a stay in a hospital (Stay) is anticipated to be related to infection risk Risk, available facilities and services AFS, and routine chest X-ray ratio Xray. (See Appendix C.1 for details on these and the other variables included in the data set.)

(a) Obtain scatterplots of average length of stay against each of the three predictor variables, and overlay lowess smoothers. Does a linear mean function seem plausible in each case? Explain.
(b) Obtain the least squares estimate for the linear regression of average length of stay on each of the three predictor variables, and overlay the least squares lines on your scatterplots. Does the simple linear regression model seem plausible in each case? Explain.
(c) Calculate MSE for each of the three linear regression fits. Which predictor variable leads to the smallest variability around the fitted regression line? Was this result apparent from your plots in parts (a) and (b)? Explain.
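For the computational parts of Problems 1 and 2, a minimal Python sketch is given below (the plots and lowess overlays of parts 1(a) and 2(a) are omitted). It assumes copier_maintainenance.txt and SENIC.txt are plain whitespace-separated files whose column headers match the listings at the end of this assignment (minutes, copiers and id, stay, ..., afs respectively); adjust the reading step if the files are formatted differently. The b_0 and b_1 formulas are the ones stated in Problem 3.

```python
import numpy as np
import pandas as pd

def slr_fit(x, y):
    """Simple linear regression via the explicit least squares formulas."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    e = y - (b0 + b1 * x)                  # residuals e_i
    mse = np.sum(e ** 2) / (len(y) - 2)    # point estimate of sigma^2
    return b0, b1, e, mse

# Problem 1: copier maintenance data (assumed columns: minutes, copiers).
copier = pd.read_csv("copier_maintainenance.txt", sep=r"\s+")
b0, b1, e, mse = slr_fit(copier["copiers"].to_numpy(float),
                         copier["minutes"].to_numpy(float))
print("1(b)      minutes =", b0, "+", b1, "* copiers")
print("1(e)/(f)  estimate at x = 5 copiers:", b0 + b1 * 5)
print("1(g)      sum of residuals:", e.sum())        # should be 0 up to rounding
print("1(h)      MSE =", mse, " sigma-hat (minutes) =", np.sqrt(mse))

# Problem 2: SENIC data (assumed columns include stay, risk, afs, xray).
senic = pd.read_csv("SENIC.txt", sep=r"\s+")
for pred in ["risk", "afs", "xray"]:
    b0, b1, _, mse = slr_fit(senic[pred].to_numpy(float),
                             senic["stay"].to_numpy(float))
    print(f"2(b)/(c)  stay on {pred}: b0 = {b0:.3f}, b1 = {b1:.3f}, MSE = {mse:.3f}")
```

The same coefficients and MSEs can be cross-checked against any packaged OLS routine; the explicit formulas are used here only to keep the sketch self-contained.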
3. Properties of Least Squares Estimation. Prove the following properties of the least squares estimated regression function \hat{y} = b_0 + b_1 x, where

    b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}   and   b_0 = \bar{y} - b_1 \bar{x}.

(a) The sum of the residuals e_i = y_i - (b_0 + b_1 x_i) is zero: \sum_{i=1}^{n} e_i = 0.
(b) The sum of the observed values y_i equals the sum of the fitted values \hat{y}_i = b_0 + b_1 x_i: \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i.
(c) The sum of the weighted residuals, weighted by the values of the predictor variable, is zero: \sum_{i=1}^{n} x_i e_i = 0.
(d) The sum of the weighted residuals, weighted by the fitted values, is zero: \sum_{i=1}^{n} \hat{y}_i e_i = 0.
(e) The least squares regression line always passes through the point (\bar{x}, \bar{y}).

4. Conditional Expectation as Minimum Mean Squared Error Estimator (from lecture notes of STAT 901 at the University of Waterloo, by Prof. Don McLeish). Let (X, Y) \sim p(x, y) and suppose E[X^2 + Y^2] < \infty.

(a) What constant is considered to be the best fit to a random variable in the sense of smallest mean squared error? In other words, what is the value of c solving

    \min_c \, E[(Y - c)^2]?

(b) Show that for any function g,

    E[(Y - E[Y \mid X])^2] \le E[(Y - g(X))^2].

Hint: consider Var[Y \mid X = x] = E[(Y - E[Y \mid X])^2 \mid X = x].
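For Problem 4(a), a standard decomposition does most of the work; the same identity, applied conditionally on X = x as the hint suggests and then averaged over X, gives part (b). A sketch:

    E[(Y - c)^2] = E[(Y - E[Y] + E[Y] - c)^2]
                 = E[(Y - E[Y])^2] + 2(E[Y] - c)\,E[Y - E[Y]] + (E[Y] - c)^2
                 = Var(Y) + (E[Y] - c)^2,

so the minimizing constant is c = E[Y], with minimum value Var(Y).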
9.0="" 39.6="" 279="" 2="" 4="" 207="" 241="" 60.0="" 2="" 8.82="" 58.2="" 1.6="" 3.8="" 51.7="" 80="" 2="" 2="" 51="" 52="" 40.0="" 3="" 8.34="" 56.9="" 2.7="" 8.1="" 74.0="" 107="" 2="" 3="" 82="" 54="" 20.0="" 4="" 8.95="" 53.7="" 5.6="" 18.9="" 122.8="" 147="" 2="" 4="" 53="" 148="" 40.0="" 5="" 11.20="" 56.5="" 5.7="" 34.5="" 88.9="" 180="" 2="" 1="" 134="" 151="" 40.0="" 6="" 9.76="" 50.9="" 5.1="" 21.9="" 97.0="" 150="" 2="" 2="" 147="" 106="" 40.0="" 7="" 9.68="" 57.8="" 4.6="" 16.7="" 79.0="" 186="" 2="" 3="" 151="" 129="" 40.0="" 8="" 11.18="" 45.7="" 5.4="" 60.5="" 85.8="" 640="" 1="" 2="" 399="" 360="" 60.0="" 9="" 8.67="" 48.2="" 4.3="" 24.4="" 90.8="" 182="" 2="" 3="" 130="" 118="" 40.0="" 10="" 8.84="" 56.3="" 6.3="" 29.6="" 82.6="" 85="" 2="" 1="" 59="" 66="" 40.0="" 11="" 11.07="" 53.2="" 4.9="" 28.5="" 122.0="" 768="" 1="" 1="" 591="" 656="" 80.0="" 12="" 8.30="" 57.2="" 4.3="" 6.8="" 83.8="" 167="" 2="" 3="" 105="" 59="" 40.0="" 13="" 12.78="" 56.8="" 7.7="" 46.0="" 116.9="" 322="" 1="" 1="" 252="" 349="" 57.1="" 14="" 7.58="" 56.7="" 3.7="" 20.8="" 88.0="" 97="" 2="" 2="" 59="" 79="" 37.1="" 15="" 9.00="" 56.3="" 4.2="" 14.6="" 76.4="" 72="" 2="" 3="" 61="" 38="" 17.1="" 16="" 11.08="" 50.2="" 5.5="" 18.6="" 63.6="" 387="" 2="" 3="" 326="" 405="" 57.1="" 17="" 8.28="" 48.1="" 4.5="" 26.0="" 101.8="" 108="" 2="" 4="" 84="" 73="" 37.1="" 18="" 11.62="" 53.9="" 6.4="" 25.5="" 99.2="" 133="" 2="" 1="" 113="" 101="" 37.1="" 19="" 9.06="" 52.8="" 4.2="" 6.9="" 75.9="" 134="" 2="" 2="" 103="" 125="" 37.1="" 20="" 9.35="" 53.8="" 4.1="" 15.9="" 80.9="" 833="" 2="" 3="" 547="" 519="" 77.1="" 21="" 7.53="" 42.0="" 4.2="" 23.1="" 98.9="" 95="" 2="" 4="" 47="" 49="" 17.1="" 22="" 10.24="" 49.0="" 4.8="" 36.3="" 112.6="" 195="" 2="" 2="" 163="">

Answer To: Linear Regression Models, Statistics GR5205/GU4205 — Fall 2020, Homework 1

Rajeswari answered on Sep 22 2021
Least squares assignment
a) To prove that the sum of the residuals ei = yi − (b0 + b1xi) is zero. Hence proved.
Note: I used the symbols a0 and a1 instead of b0 and b1, because a is the notation I normally use.
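For reference, the part (a) calculation can be written out as follows, using b0 and b1 rather than the a0, a1 of the note above:

    \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)
                       = n\bar{y} - n b_0 - n b_1 \bar{x}
                       = n\bar{y} - n(\bar{y} - b_1 \bar{x}) - n b_1 \bar{x} = 0,

where the last step substitutes b_0 = \bar{y} - b_1 \bar{x}.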
b) To prove that the sum of the observed values yi equals the sum of the fitted values ŷi = b0 + b1xi.
We know that (x̄, ȳ) satisfies the regression equation without error; in other words, the ordered pair of means always lies on the fitted regression line, so ȳ = b0 + b1x̄ and the error e at that point is 0.
Using this we can prove the given statement. Writing each yi as ŷi + ei and summing, the last term, the sum of the residuals, is 0 as per the proof of part (a).
So we now have that the sum of the yi equals the sum of the ŷi; thus the right side equals the left side, because (x̄, ȳ) lies on the regression line and the residuals sum to zero.
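One way to write out the steps sketched above, using b0 and b1:

    \sum_{i=1}^{n} \hat{y}_i = \sum_{i=1}^{n} (b_0 + b_1 x_i) = n b_0 + n b_1 \bar{x}
                             = n(\bar{y} - b_1 \bar{x}) + n b_1 \bar{x} = n\bar{y} = \sum_{i=1}^{n} y_i.

Equivalently, \sum_{i=1}^{n} y_i = \sum_{i=1}^{n} (\hat{y}_i + e_i) = \sum_{i=1}^{n} \hat{y}_i, since the residuals sum to zero by part (a).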