STAT8111/STAT7111 Generalized Linear Models
Assignment 2, 2020
Due date: Friday 2 October, 12 noon

Instructions

• There is a limit of 15 pages for your submission. There will be a 10% penalty if your submission is 16 pages long and another 10% penalty if your submission is 17 pages long. After that, there will be a 1% penalty for each CHARACTER (spaces included) from page 18 onwards.
• All R related questions must be word processed (preferably using R markdown) and written in Times New Roman (or equivalent) at 12pt font size. Pages should have margins of at least 1cm on all sides. This is to discourage you from squeezing as much content as possible into the page limit without thinking about what should be included in your submission.
• All pages should be numbered.
• If possible, answer questions in the same order as they appear in the assignment sheet. If this is not possible, clearly highlight (e.g., in bold text) which question you are answering so that the assessor(s) can identify it.
• When building a model, do not present interim models. Explain your model-building strategy and give a summary of your results in a table. While you should present only your final model in detail, make sure to provide enough information for the assessor(s) to evaluate the quality of all models.
• NEW! You should include all important R output in your write-up. Any R output provided in the appendix will not be marked.
• Give your R code in an appendix. This is not counted in the page limit. However, keep in mind that the aim is to keep the assignment as short as it can be.
• You should submit the assignment using the Turnitin tool on iLearn.

Question 1

The file couple.csv contains a dataset based on a study of the impact of education level and level of anxious attachment on unwanted pursuit behaviours in the context of couple separation. The following are the variables.

Variable     Description
upb          number of unwanted pursuit behaviour perpetrations
education    1 if at least bachelor's degree; 0 otherwise
anxiety      continuous measure of anxious attachment

a) Develop an appropriate statistical model for the number of unwanted pursuit behaviour perpetrations. There is no need to consider any interaction between predictors. You should investigate the data graphically and/or numerically before fitting a model. Use a model selection criterion to select the best model if needed. For your final model:
   i) Write down the fitted model equation.
   ii) Interpret the model parameter(s).
   iii) Perform diagnostic checking to confirm your final model is appropriate.

b) Inspired by https://youtu.be/s3JldKoA0zw, we are adding two more cases to the dataset:

   upb    education    anxiety
   2      1             0.7425
   1      0            -1.0977

   Add these 2 cases to your existing dataset and reproduce the whole analysis you did in part a). Marks will be awarded in this part based on the correctness of the analysis as well as how similar and consistent your analysis looks when compared to what you had in part a).

c) Suppose you are now asked to investigate the potential interaction effect between the two covariates. Investigate the data graphically and/or numerically. Comment on your output and on whether you would need to adjust your earlier analysis. There is no need to (fit and) report any extra model(s) with an interaction effect.

Question 2

a) For an epidemic disease, it is quite common to consider that, in the early stages, the rate at which new cases occur increases exponentially through time. Consider $\mu_i$, the expected number of new cases on day $t_i$, and the following proposed model

   $$\mu_i = \gamma e^{\eta t_i},$$

   where $\gamma$ and $\eta$ are the unknown model parameters.
   i) Show that this model can be expressed as a GLM by specifying the link function, the response variable and the new model parameters.
   ii) Provide a sketch of some R code (i.e. it does not have to be able to run) to help the ecologist to fit his model and obtain the original model parameter estimates. You can include the code as comments or use the chunk option eval = FALSE.

b) An ecologist has recorded the consumption rate of Grouse by Hen Harriers (per day), and the corresponding Grouse density (per km^2). Ecological theory suggests that the expected consumption rate (denoted $c$) of grouse per day should be related to grouse density (denoted $d$) by the model

   $$E(c_i) = \frac{a\, d_i^m}{1 + a\, t\, d_i^m},$$

   where $a$, $t$ and $m$ are unknown parameters. It is also expected that the variance in consumption rate is proportional to the mean consumption rate.
   i) Show that, for fixed $m$, this model can be expressed as a GLM, specifying the link function, the response variable and the new model parameters.
   ii) For $m = 1$, provide a sketch of some R code to help the ecologist to fit his model and obtain the original model parameter estimates.

Question 3

An ecologist is interested in the presence and absence of a target species, based on some stratified sampling, along a precipitation gradient starting from a coast with high precipitation towards rather dry conditions in the interior. He denoted the presence of the species as (pres=1) and the absence as (pres=0), and he recorded the precipitation value each time (prcp). The first records of the data are presented below:

    load("data_ecology.RData")
    head(data.ecology)
    ##   pres     prcp
    ## 1    0 1.154712
    ## 2    0 1.363412
    ## 3    0 4.152266
    ## 4    0 6.810878
    ## 5    0 7.447050
    ## 6    0 8.232014

a) The scientist decides to run a Gaussian linear model to predict the presence of the species. Explain why the Gaussian linear model is not appropriate to answer his research question.
b) Propose a more appropriate model to link the presence of the species to the precipitation. Write down your statistical model and its assumptions.
c) Estimate and interpret the parameter(s) of your model.
d) Write down the equation for predicting the probability of the presence of species given the precipitation.

We will revisit this dataset in Assignment 3.
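
As a hedged sketch for Question 2 (one possible route, not an official solution), both mean models become linear in new parameters once the mean is transformed. The symbols $\beta_0$, $\beta_1$ and $x_i$ are introduced here for illustration only and do not appear in the assignment sheet. The second derivation assumes the model $E(c_i) = a\, d_i^m / (1 + a\, t\, d_i^m)$ with $m$ held fixed.

```latex
% Question 2 a): taking logs of mu_i = gamma * exp(eta * t_i) gives a log link
\log \mu_i = \log \gamma + \eta\, t_i = \beta_0 + \beta_1 t_i,
\qquad \gamma = e^{\beta_0}, \quad \eta = \beta_1 .

% Question 2 b): taking reciprocals gives a reciprocal (inverse) link
\frac{1}{E(c_i)} = \frac{1 + a\, t\, d_i^m}{a\, d_i^m}
                 = t + \frac{1}{a}\, d_i^{-m}
                 = \beta_0 + \beta_1 x_i,
\qquad x_i = d_i^{-m}, \quad t = \beta_0, \quad a = \frac{1}{\beta_1} .
```

Under these assumptions, the original parameters are recovered by back-transforming the fitted coefficients. The choice of response distribution (for example a count distribution in part a, or a variance proportional to the mean in part b) is a separate modelling decision that the derivation above does not settle.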
Sep 20, 2021