Can I please get a quote for this assignment to be done?
Linear Models (LMR)
Please read these instructions carefully. Please proofread your submission to ensure it is as polished and professionally presented as possible. You can include any relevant computer code used to generate results for your answers. Allowing some tolerance, the maximum length for this assignment is 14 pages.
Instructions
Answer the questions in an essay-style approach when appropriate. Make sure to include all relevant computer output (and exclude irrelevant output), presented neatly and integrated into the discussion and interpretation. Do not include an appendix. Note that there are not necessarily unique correct answers for these questions. Where equations or formulae are to be presented, please attempt to do this electronically using a word processor rather than including images of scanned handwritten work. When this is not possible, scanned written work must be extremely neat and legible. Writing mathematics electronically is an important skill to learn and practice. However, concise presentation and discussion within the page limit is encouraged.
Part A
Dataset: vo2.dta
Background: The background and data for this assignment are taken from the following article (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0199509). It is not necessary to read this for the assignment or course, and the link is provided purely for your own interest. This article uses a form of linear regression to create a prediction model for maximum oxygen uptake, a measure of cardiorespiratory fitness. There are several predictors used in this article, as well as up to two measurements per participant. The data are freely available to download. This assignment question uses the article's data but restricts the predictors to sex and body mass index alone, and uses only the average measurement for each participant (so as not to violate the independence of residuals assumption).
These data have been modified to a small extent for the purposes of this assignment. The data for this assignment contain the following columns:
id: Participant id number
sex: Sex of the participant (0 = male, 1 = female)
bmi: Body mass index (kg/m²)
vo2: Relative maximum oxygen uptake (ml/kg/min)
The purpose of this assignment question is to investigate the relationship between maximum oxygen uptake and sex, and to see whether body mass index confounds or modifies this relationship. Investigate this by answering the questions below.
Note: You do not need to investigate the assumptions of regression analysis for Questions 1 to 3, as this will be done in Question 4.
Question 1: Ignoring body mass index for now, examine the evidence for an average difference in maximum oxygen uptake between males and females. Interpret the key results of this analysis.
Question 2: Describe in words the general conditions under which we would expect a variable C to confound the relationship between an endpoint Y and a covariate X. Does body mass index confound the relationship between maximum oxygen uptake and sex? Discuss whether these conditions have been met in these data. Now, using regression methodology, examine how taking account of body mass index changes the relationship between maximum oxygen uptake and sex. Do you consider body mass index to be acting as a confounder here?
Question 3: Use regression to test for interaction (effect modification) between sex and body mass index on the outcome of maximum oxygen uptake.
Question 4: Investigate the assumptions of the regression model you believe to be most informative to report on. This should include a clear indication of whether you believe the assumptions have been met, and justifications for these conclusions with reference to appropriate diagnostic figures.
This should also include some discussion of any outliers, leverage points or influential points that might affect the conclusions, and your recommendations for handling any such data points. A minimum of 3 plots should be presented.
Question 5: Write a conclusion suitable for a non-statistician researcher that summarises your findings and analysis. This researcher would be familiar with introductory statistics concepts such as p-values and 95% confidence intervals but would be unfamiliar with the technical details of a regression analysis.
Part B
Dataset: virus.dta
Background: The size of genomes varies enormously across organisms. For example, the human genome is approximately 3 billion nucleotides (nt) long (a nucleotide is one of the 4 letters that make up the genetic language), the genome of the Japanese flower Paris japonica is approximately 150 billion nucleotides long, and the genome of a nematode is about 100 million nucleotides long. In bacteria and viruses, the genomes are much smaller, but still vary considerably in size. For example, Influenza (the flu) is only 2400 nt, whereas Pandoravirus is approximately 2.4 million nucleotides long. What drives this variation in genome sizes is an active area of research. The motivation for this assignment question comes from the article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4093846/), which does not need to be read for this assignment and is included for your interest only. A study investigating this topic sampled the genome size of 82 independent viruses. For each of these viruses, the volume of the virus (in terms of the physical space it occupies) was measured.
The data have the following columns:
glength: The genome length of the virus in nucleotides (nt)
vvolume: The volume of the virus (nm³)
(Note: These data have been modified from the original research study.)
The study would like to investigate whether there is a relationship between genome length and volume of a virus.
Question 1: Examine the evidence for a relationship between genome length and virus volume where genome length (untransformed) is the outcome and virus volume (untransformed) is the predictor variable. Investigate the validity of the assumptions of this regression analysis. Now examine the evidence for this relationship, and the assumptions of this analysis, when both variables are log transformed. Which analysis (log transformed or untransformed) do you prefer and why?
Question 2: Provide a suitable interpretation of your preferred analysis in terms that you could explain to a non-statistician who is familiar with introductory statistics concepts such as p-values and 95% confidence intervals but would be unfamiliar with the technical details of a regression analysis. If you choose the transformed model, provide an interpretation on the log scale first and then on the original scale.
Part C
Dataset: dosevd.dta
Background: This second part of the assignment requires some more theoretical work based on fitting a linear regression model to investigate the effect of three dosage levels on an outcome. Suppose a clinical investigator is interested in examining the effect of increasing doses of a Vitamin D supplement given to individuals who are Vitamin D deficient. She performs a randomised trial in which she allocates (at random) volunteers to three groups, 1000 IU (International Units), 2000 IU and 3000 IU of supplement per day for a period of three months, after which the serum levels of a key metabolite of Vitamin D called 25(OH)D are measured in each participant. (N.B.
This is a hypothetical scenario based on a real question that is current in epidemiology at the moment.)
Question 1: One possible analysis of the data described is to estimate the linear effect of dose, i.e. to assume a linear relationship between the expected outcome (labelled Y, as usual) and dose level, which for simplicity we will represent as X = 1, 2, 3 for doses of 1000 IU, 2000 IU and 3000 IU respectively. To estimate the average rate of change in Y with dose we would fit the simple linear regression model with the standard assumptions for the error term:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
The objective is to show (algebraically) that if the sample size allocation between group 1, group 2 and group 3 is 1:1:4 (i.e. $n_1 = n$, $n_2 = n$, $n_3 = 4n$), then the least squares estimate of $\beta_1$ is
$\hat{\beta}_1 = \frac{4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2}{7}$
where $\bar{Y}_i$ is the mean of the outcome in dose group i. For simplicity we will assume that $X_i = 1$ for $i = 1, \dots, n$, $X_i = 2$ for $i = n + 1, \dots, 2n$ and $X_i = 3$ for $i = 2n + 1, \dots, 6n$.
i) Show that the overall mean of X is $\bar{X} = \frac{5}{2}$.
ii) Show that $\sum (X_i - \bar{X})^2 = \frac{7n}{2}$.
iii) Use these results to prove the formula above for $\hat{\beta}_1$.
Question 2: The dataset provided contains some simulated data that might have arisen from the study just described, with 15 participants in each of groups 1 and 2, and 60 participants in dose group 3. Fit the regression model discussed above and demonstrate that the result obtained for $\hat{\beta}_1$ in Question 1 is true in this sample.
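For anyone sizing up Part C, the identity it asks for can be checked numerically before doing the algebra. The following is only a minimal sketch in Python: the data are simulated here for illustration (they are not dosevd.dta), and the function names ols_slope and group_mean_slope, the seed and the simulation parameters are all assumptions made for this example. It compares the usual least squares slope with the group-mean expression (4 * ybar3 - 3 * ybar1 - ybar2) / 7 under a 1:1:4 allocation with n = 15.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Least squares slope: sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def group_mean_slope(x, y):
    """Group-mean expression for the slope; only valid for a 1:1:4 allocation."""
    ybar = {g: mean([yi for xi, yi in zip(x, y) if xi == g]) for g in (1, 2, 3)}
    return (4 * ybar[3] - 3 * ybar[1] - ybar[2]) / 7


# Hypothetical simulated data (NOT dosevd.dta): group sizes n, n and 4n.
random.seed(1)
n = 15
x = [1] * n + [2] * n + [3] * (4 * n)
# Arbitrary illustrative intercept, slope and noise level.
y = [20 + 8 * xi + random.gauss(0, 5) for xi in x]

x_bar = mean(x)                            # part (i): should equal 5/2
sxx = sum((xi - x_bar) ** 2 for xi in x)   # part (ii): should equal 7n/2

print("mean of X:", x_bar)
print("sum of squares of X:", sxx, "vs 7n/2 =", 7 * n / 2)
print("OLS slope:", ols_slope(x, y))
print("group-mean slope:", group_mean_slope(x, y))
```

The two slopes should agree up to floating-point rounding whatever values are simulated for y, because the identity depends only on the group sizes. With the real dataset, the same comparison would be made between the fitted regression coefficient and the three group means.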