Multiple linear regression Question 1 - 30 marks The dataset stored in the file swiss.gsh consists of a standardized fertility measurement (fertilty) for 47 French-speaking provinces of Switzerland in...

This assignment needs to be done in GenStat. Does anyone have experience with GenStat?


Multiple linear regression

Question 1 - 30 marks

The dataset stored in the file swiss.gsh consists of a standardized fertility measurement (fertilty) for 47 French-speaking provinces of Switzerland in about 1888, together with the following socio-economic indicators for the same provinces:

agricltr   percentage of the population involved in agriculture as an occupation;
examintn   percentage of drafted soldiers receiving the highest mark in the army examination;
educatin   percentage of the population educated beyond primary school;
catholic   percentage of the population who were Catholic by religion;
mortalty   percentage of live births who lived less than one year (that is, infant mortality).

[Source: Mosteller, F. and Tukey, J.W. (1977) Data Analysis and Regression: A Second Course in Statistics, Reading, MA, Addison-Wesley, pp. 549-51.]

The aim is to determine whether, and how, fertility is related to these five socio-economic variables.

(a) Using GenStat, produce a scatterplot matrix including all the variables, and also produce their correlation matrix. On the basis of these, carry out a preliminary examination of the relationships between fertilty and the explanatory variables, and of the relationships between the explanatory variables. Regarding fertilty as the response, which of the other variables would you expect to see as explanatory variables in a good linear regression model? [7]

(b) (i) Treating fertilty as the response variable, fit a regression model that includes all the other five variables as explanatory variables. Based on the Estimates of parameters given by GenStat, which variables seem to be important in this model? Include a copy of the Estimates of parameters table in your answer. [2]

(ii) Check the appropriateness of the model by producing a composite residual plot. Judging from the composite residual plot, do the assumptions that underlie the model seem to be reasonable? [3]

(iii) Your GenStat output should contain the following message.

Message: the following units have large standardized residuals.
Unit   Response   Residual
37     92.2        2.31
47     42.8       -2.27

What does this message mean? Why might this be important? Is this message a cause for concern in this particular case? [3]

(iv) Your GenStat output should also contain the following message.

Message: the following units have high leverage.
Unit   Response   Residual
19     54.3       0.35
45     35.0       0.46

You are not expected to know what this message means yet; for now, accept that it just means that there is something unusual about provinces 19 and 45. On the scatterplot matrix that you produced in part (a), mark (by hand) where the point corresponding to province 45 is. What is unusual about province 45? Why might this be important? In your opinion, is province 45 a rural province, or is it a province containing a large town or city? [5]

(c) (i) Perform a stepwise regression starting from the full regression model, using 16 as the maximum number of steps, and 4 as the test criterion. Which explanatory variables are selected by this procedure? Does this make sense in the light of your initial data examination in parts (a) and (b)(i)? [4]

(ii) Does the stepwise regression starting from the null model lead to the same selected model? [2]

(iii) Summarise your findings from the model fitted in part (c)(i), and describe in simple terms how the standardised fertility measurement depends on the other five variables. Include the fitted model in your answer. [4]

Question 2 - 30 marks

Investigators interested in finding the optimal settings for a propeller to turn efficiently studied propellers made from recycled plastic cups. In the study, they evaluated 16 propellers. For each propeller, they used a different combination of the following settings:

• The number of blades (blades): 1 '4 blades' or 2 '8 blades'.
• The angle of the blades (angle): 1 '10 degrees' or 2 '30 degrees'.
• The length of the blades (length): 1 'half length' or 2 'full length'.
• The air speed into which the propellers faced (air speed): 1 'low setting' or 2 'high setting'.

One of the response variables that they measured, which is the one that we will consider in this question, was the rotation rate of the propeller (rotation rate), measured in hundreds of revolutions per minute. The data are given in the file propeller.gsh.

[Source: Izraelevitz, A.M., Anderson-Cook, C.M. and Hamada, M.S. (2011) 'Illustrating the use of statistical experimental design and analysis for multiresponse prediction and optimization', Quality Engineering, 23, 265-77.]

(a) Is this study an observational study or a controlled experiment? Justify your answer. [2]

(b) First consider the possible effect of just the length of the propeller blades. Use GenStat to produce an appropriate graphical display to illustrate the rotation rate in response to different lengths. On the basis of your diagram, comment also on whether these data appear to satisfy the assumptions for analysis of variance with length as the only factor. [4]

(c) Now consider the possible effects of all four factors (blades, angle, length and air speed). What characteristic of the design would lead one to want to fit a reduced model to these data that omitted one or more of the possible interactions between the factors? Assuming that all of the factors given were of equal interest to the investigators, which interactions, if any, would you consider leaving out of the model initially? [3]

(d) Fit the model with all main effects and all second-order interactions only. Using the analysis of variance table and the tables of means provided by GenStat for this model, describe the conclusions that you reach. [7]

(e) Produce a means plot (with means joined by lines) in which the effect of angle is along the horizontal axis and the effect of length determines which line is which. Comment on how this plot reflects the p value associated with the angle.length term in the model fitted in part (d). [2]

(f) Produce appropriate residual plots to check the appropriateness of the analysis of variance model fitted in part (d). Comment, in the light of the plots, on the adequacy of the model. [5]

(g) The investigators then duplicated their study for the same 16 combinations of settings as they had before, using a further 16 propellers. The combined results for both runs of this study are given in the file propeller2.gsh. Using this datafile, fit the model with all main effects and all interactions (not just second-order ones). Include a copy of the resulting Analysis of variance table in your answer. Compare the results thus obtained from the duplicated study with those obtained from the initial study in part (d), in terms of effects deemed significant. Also, comment on what the duplicated study tells you about the appropriateness or otherwise of restricting the model to main effects and second-order interactions in part (d). [7]
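Not GenStat, but for anyone trying to see why Question 2 contrasts the single run of 16 propellers with the duplicated run of 32, here is a small Python sketch of the degrees-of-freedom bookkeeping. It assumes what the question describes: four factors with two levels each, so every main effect and every interaction uses one degree of freedom. It also reads "second-order interactions" as two-factor interactions. The function name `residual_df` is made up for illustration and has nothing to do with GenStat's own output.

```python
from math import comb


def residual_df(n_runs, n_factors, max_order):
    """Residual degrees of freedom after fitting all terms up to max_order.

    Assumption: every factor has exactly two levels, so each main effect
    and each interaction accounts for one degree of freedom.
    """
    # comb(n_factors, k) counts the terms involving exactly k factors:
    # k = 1 gives main effects, k = 2 gives two-factor interactions, etc.
    model_df = sum(comb(n_factors, k) for k in range(1, max_order + 1))
    total_df = n_runs - 1  # one degree of freedom is used by the grand mean
    return total_df - model_df


# Single run, all interactions: nothing left to estimate residual variance.
print(residual_df(16, 4, 4))  # 0
# Single run, main effects plus two-factor interactions only.
print(residual_df(16, 4, 2))  # 5
# Duplicated run, all interactions.
print(residual_df(32, 4, 4))  # 16
```

This is only the counting argument. The actual analysis of variance tables still have to come from GenStat using propeller.gsh and propeller2.gsh.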
Feb 10, 2021