1 Linear Models (LMR) Please read these carefully. Please proof-read your submission to ensure it is as polished and professionally presented as possible. You can include any relevant computer code...

Can I please get a quote for this assignment to be done?



Linear Models (LMR)

Please read these carefully. Please proof-read your submission to ensure it is as polished and professionally presented as possible. You can include any relevant computer code used to generate results for your answers. Allowing some tolerance, the maximum number of pages for this assignment is set to 14 pages.

Instructions

• Answer the questions in an essay-style approach when appropriate. Make sure to include all relevant computer output (and exclude irrelevant output), presented neatly and integrated through the discussion and interpretation. Do not include an appendix.
• Note that there are not necessarily unique correct answers for these questions.
• Where equations or formulae are to be presented, please attempt to do this electronically using a word processor rather than including images of scanned handwritten work. When this is not possible, scanned written work must be extremely neat and legible. Writing mathematics electronically is an important skill to learn and practice.
• However, concise presentation and discussion within the limit of the maximum number of pages is encouraged.

Part A

Dataset: vo2.dta

Background: The background and data for this assignment are taken from the following article (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0199509). It is not necessary to read this for the assignment or course, and the link is provided purely for your own interest. This article uses a form of linear regression to create a prediction model for maximum oxygen uptake, a measure of cardiorespiratory fitness. There are several predictors used in this article, as well as up to two measurements per participant. The data are freely available to download. This assignment question uses the article's data but restricts the predictors to sex and body mass index alone, and only uses the average measurement for each participant (so as not to violate the independence of residuals assumption).
These data have been modified to a small extent for the purposes of this assignment. The data for this assignment contain the following columns:

• id: Participant id number
• sex: Sex of the participant (0 = male, 1 = female)
• bmi: Body mass index (kg/m²)
• vo2: Relative maximum oxygen uptake (ml/kg/min)

The purpose of this assignment question is to investigate the relationship between maximum oxygen uptake and sex, and see if body mass index confounds or modifies this relationship. Investigate this by answering the questions below.

Note: You do not need to investigate the assumptions of regression analysis for questions 1 – 3, as this will be done in question 4.

Question 1: Ignoring body mass index for now, examine the evidence for an average difference in maximum oxygen uptake between males and females. Interpret the key results of this analysis.

Question 2: Describe in words the general conditions under which we would expect a variable C to confound the relationship between an endpoint Y and a covariate X. Does body mass index confound the relationship between maximum oxygen uptake and sex? Discuss whether these conditions have been met in this data. Now using regression methodology, examine how taking account of body mass index changes the relationship between maximum oxygen uptake and sex. Do you consider body mass index to be acting as a confounder here?

Question 3: Use regression to test for interaction (effect modification) between sex and body mass index on the outcome of maximum oxygen uptake.

Question 4: Investigate the assumptions of the regression model you believe to be most informative to report on. This should include a clear indication of whether you believe the assumptions have been met, and justifications for these conclusions with reference to appropriate diagnostic figures.
This should also include some discussion of any outliers, leverage points or influential points that might affect the conclusions, and your recommendations for handling any such data points. A minimum of 3 plots should be presented.

Question 5: Write a conclusion suitable for a non-statistician researcher that summarises your findings and analysis. This researcher would be familiar with introductory statistics concepts such as p-values and 95% confidence intervals but would be unfamiliar with the technical details of a regression analysis.

Part B

Dataset: virus.dta

Background: The size of genomes across organisms varies enormously. For example, the human genome is approximately 3 billion nucleotides (nt) long (a nucleotide is one of the 4 letters that make up the genetic language), the genome of the Japanese flower Paris Japonica is approximately 150 billion nucleotides long, and the genome size of a nematode is about 100 million nucleotides long. In bacteria and viruses, the genomes are much smaller, but still vary in size considerably. For example, Influenza (the flu) is only 2400 nt, whereas Pandoravirus is approximately 2.4 million nucleotides long. What drives this variation in genome sizes is an active area of research. The motivation for this assignment question comes from the article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4093846/), which does not need to be read for this assignment and is included for your interest only. A study investigating this topic sampled the genome size of 82 independent viruses. For each of these viruses, the volume of the virus (in terms of the physical space it occupies) was measured.
The data have the following columns:

• glength: The genome length of the virus in nucleotides (nt)
• vvolume: The volume of the virus (nm³)

(Note: These data have been modified from the original research study)

The study would like to investigate whether there is a relationship between genome length and volume of a virus.

Question 1: Examine the evidence for a relationship between genome length and virus volume where genome length (untransformed) is the outcome and virus volume (untransformed) is the predictor variable. Investigate the validity of the assumptions of this regression analysis. Now examine the evidence for this relationship, and the assumptions of this analysis, when both variables are log transformed. Which analysis (log transformed or untransformed) do you prefer and why?

Question 2: Provide a suitable interpretation of your preferred analysis in terms that you could explain to a non-statistician who is familiar with introductory statistics concepts such as p-values and 95% confidence intervals but would be unfamiliar with the technical details of a regression analysis. If you choose the transformed model, provide an interpretation on the log scale first and then on the original scale.

Part C

Dataset: dosevd.dta

Background: This second part of the assignment requires some more theoretical work based on fitting a linear regression model to investigate the effect of three dosage levels on an outcome. Suppose a clinical investigator is interested in examining the effect of increasing doses of a Vitamin D supplement given to individuals who are Vitamin-D deficient. She performs a randomised trial in which she allocates (at random) volunteers to three groups, 1000 IU (International Units), 2000 IU and 3000 IU of supplement per day for a period of three months, after which the serum levels of a key metabolite of Vitamin D called 25(OH)D are measured in each participant. (N.B. This is a hypothetical scenario based on a real question that is current in epidemiology at the moment.)

Question 1

One possible analysis of the data described is to estimate the linear effect of dose, i.e. to assume a linear relationship of expected outcome (labelled Y, as usual) to dose level, which for simplicity we will represent as X = 1, 2, 3, representing doses of 1000 IU, 2000 IU and 3000 IU respectively. To estimate the average rate of change in Y with dose we would fit the simple linear regression model with the standard assumptions for the error term:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

The objective is to show (algebraically) that if the sample size allocation between group 1, group 2 and group 3 is 1:1:4 (i.e. $n_1 = n$, $n_2 = n$, $n_3 = 4n$), then the least squares estimate of $\beta_1$ is

$$\hat{\beta}_1 = \frac{4\bar{Y}_3 - 3\bar{Y}_1 - \bar{Y}_2}{7}$$

where $\bar{Y}_i$ is the mean of the outcome in dose group $i$. For simplicity we will assume that $X_i = 1$ for $i = 1, \dots, n$, $X_i = 2$ for $i = n + 1, \dots, 2n$ and $X_i = 3$ for $i = 2n + 1, \dots, 6n$.

i) Show that the overall mean of X is $\bar{X} = \frac{5}{2}$
ii) Show that $\sum_i (X_i - \bar{X})^2 = \frac{7n}{2}$
iii) Use these results to prove the formula above for $\hat{\beta}_1$

Question 2

The dataset provided contains some simulated data that might have arisen from the study just described, with 15 participants in each of groups 1 and 2, and 60 participants in dose group 3. Fit the regression model discussed above and demonstrate that the result obtained for $\hat{\beta}_1$ in question 1 is true in this sample.
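The identity in Part C, Question 1 can be checked numerically before attempting the algebra. The sketch below is only an illustration: it uses made-up outcome values (not dosevd.dta) with 15, 15 and 60 participants in the three dose groups, and compares the usual least squares slope with (4·Ȳ3 − 3·Ȳ1 − Ȳ2)/7. The intercept 20, slope 10 and noise level 5 used to simulate the outcome are arbitrary assumptions.

```python
import random
from statistics import mean

random.seed(1)

n = 15  # group sizes n, n and 4n (the 1:1:4 allocation)
x = [1] * n + [2] * n + [3] * (4 * n)

# Hypothetical outcome values; NOT the assignment data.
# Any numbers would do, because the identity is purely algebraic.
y = [20 + 10 * xi + random.gauss(0, 5) for xi in x]

# Ordinary least squares slope: Sxy / Sxx
x_bar = mean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
beta1_ols = sxy / sxx

# Closed-form expression in terms of the three group means
y1 = mean(y[:n])
y2 = mean(y[n:2 * n])
y3 = mean(y[2 * n:])
beta1_formula = (4 * y3 - 3 * y1 - y2) / 7

print(f"x_bar = {x_bar}, Sxx = {sxx} (7n/2 = {7 * n / 2})")
print(f"OLS slope     = {beta1_ols:.6f}")
print(f"formula value = {beta1_formula:.6f}")
```

The two slopes agree up to floating-point rounding, and the printed mean of X and sum of squares match parts (i) and (ii).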
Answered Same Day, Apr 07, 2021


Subhanbasha answered on Apr 11 2021
Part – A
#-------------- Q1 --------------------------------------
Interpretation: From the above result, average maximum oxygen uptake is higher in males than in females.
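As a minimal sketch of the comparison behind Q1, the Python below computes the female-minus-male difference in mean oxygen uptake and its pooled-variance t statistic. This is the same quantity a regression of vo2 on sex alone reports for the sex coefficient. The eight vo2 values are invented for illustration and are not taken from vo2.dta.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical vo2 values (ml/kg/min); NOT the assignment data
vo2_male = [49.0, 48.0, 47.0, 46.0]    # sex = 0
vo2_female = [45.0, 44.0, 43.0, 42.0]  # sex = 1

n0, n1 = len(vo2_male), len(vo2_female)

# Slope of vo2 on the 0/1 sex indicator = difference in group means
diff = mean(vo2_female) - mean(vo2_male)

# Pooled variance, matching the equal-variance assumption of least squares
pooled_var = ((n0 - 1) * variance(vo2_male)
              + (n1 - 1) * variance(vo2_female)) / (n0 + n1 - 2)
se = sqrt(pooled_var * (1 / n0 + 1 / n1))

# Compare with a t distribution on n0 + n1 - 2 degrees of freedom
t_stat = diff / se

print(f"difference (female - male): {diff:.2f}")
print(f"standard error: {se:.3f}, t = {t_stat:.2f}, df = {n0 + n1 - 2}")
```

A negative difference means lower average uptake in females. The size of the t statistic, together with a confidence interval, indicates how strong the evidence is.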
#-------------- Q2 --------------------------------------
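Interpretation: In general, bmi would be expected to confound the relationship between oxygen uptake and sex, because a confounder is a variable associated with both the covariate (sex) and the outcome (oxygen uptake) that does not lie on the causal pathway between them.
From the above regression analysis, a one-unit increase in bmi is associated with a decrease of about 0.6 in oxygen uptake, and the coefficient for sex is about -0.03. Both associations are negative. On this basis, bmi is not considered to be a confounding variable here.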
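One way to examine confounding with regression is to compare the sex coefficient before and after adjusting for bmi. The sketch below does this on a small invented dataset (not vo2.dta) in which females have lower bmi on average. The helper `ols` is a hypothetical stand-in for whatever regression routine is actually used, and the 10% change-in-estimate threshold mentioned in the comments is a common rule of thumb rather than a formal test.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data; NOT the assignment data
sex = [0, 0, 0, 0, 1, 1, 1, 1]
bmi = [22.0, 24.0, 26.0, 28.0, 20.0, 22.0, 24.0, 26.0]
vo2 = [60 - 5 * s - 0.5 * b for s, b in zip(sex, bmi)]

# Crude model: vo2 ~ sex
crude = ols([[1, s] for s in sex], vo2)[1]
# Adjusted model: vo2 ~ sex + bmi
adjusted = ols([[1, s, b] for s, b in zip(sex, bmi)], vo2)[1]

# Relative change in the sex coefficient; > 10% is often read as confounding
change = abs(crude - adjusted) / abs(adjusted)

print(f"crude sex coefficient:    {crude:.2f}")
print(f"adjusted sex coefficient: {adjusted:.2f}")
print(f"relative change:          {change:.0%}")
```

In this toy example the sex coefficient moves from about -4 to about -5 once bmi is included, because bmi is related to both sex and vo2. That is the pattern to look for in the real output.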
#-------------- Q3 --------------------------------------

Interpretation: From the above result, there is an interaction effect between sex and bmi on the outcome of maximum oxygen uptake. For males (sex = 0), maximum oxygen uptake decreases by 0.8808 for each one-unit increase in bmi; for females (sex = 1), the reported decrease is approximately 9.
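How an interaction model yields a separate bmi slope for each sex can be sketched as follows. The data are invented (not vo2.dta) and built so that the male slope is -0.9 and the female slope is -0.6. The helper `ols` is a hypothetical stand-in for the regression routine actually used. With coding 0 = male and 1 = female, the bmi coefficient is the male slope, and the female slope is the bmi coefficient plus the interaction coefficient.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data; NOT the assignment data
sex = [0, 0, 0, 0, 1, 1, 1, 1]
bmi = [22.0, 24.0, 26.0, 28.0, 20.0, 22.0, 24.0, 26.0]
vo2 = [(62 - 0.6 * b) if s == 1 else (70 - 0.9 * b) for s, b in zip(sex, bmi)]

# Model: vo2 ~ sex + bmi + sex*bmi
X = [[1, s, b, s * b] for s, b in zip(sex, bmi)]
b0, b_sex, b_bmi, b_int = ols(X, vo2)

slope_male = b_bmi            # sex = 0: interaction term drops out
slope_female = b_bmi + b_int  # sex = 1: add the interaction coefficient

print(f"interaction coefficient: {b_int:.3f}")
print(f"bmi slope, males:   {slope_male:.3f}")
print(f"bmi slope, females: {slope_female:.3f}")
```

The test for effect modification is the test of the interaction coefficient. In a model with an interaction, the sex coefficient on its own is the female-minus-male difference at bmi = 0, which is an extrapolation.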
#-------------- Q4 --------------------------------------
Interpretation:
From the above graphs, the data are approximately normally distributed and the relationship between the variables is approximately linear, so regression methodology is appropriate. The box plots show two outliers in bmi, which may affect the model. The plot of residuals against fitted values shows no marked pattern.
Regression analysis can therefore be used.
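Leverage and influence can be quantified as well as read from plots. The sketch below is an illustration on invented data (not vo2.dta) for a simple regression of vo2 on bmi, with one bmi value placed far from the rest. It computes each point's leverage and Cook's distance. The cut-offs 2p/n and 4/n are common rules of thumb, used here as assumptions rather than fixed standards.

```python
from statistics import mean

# Hypothetical data; NOT the assignment data. The last bmi is far from the rest.
bmi = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 40.0]
vo2 = [50.0, 49.0, 49.0, 48.0, 47.0, 47.0, 46.0, 45.0]
n, p = len(bmi), 2  # p = number of coefficients (intercept + slope)

# Fit vo2 = intercept + slope * bmi by least squares
x_bar, y_bar = mean(bmi), mean(vo2)
sxx = sum((x - x_bar) ** 2 for x in bmi)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(bmi, vo2)) / sxx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * x for x in bmi]
resid = [y - f for y, f in zip(vo2, fitted)]
s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance estimate

# Leverage in simple linear regression: 1/n + (x - x_bar)^2 / Sxx
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in bmi]

# Cook's distance: combines residual size and leverage
cooks = [e ** 2 / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]

high_leverage = [i for i, h in enumerate(leverage) if h > 2 * p / n]
influential = [i for i, d in enumerate(cooks) if d > 4 / n]

for i, (h, d) in enumerate(zip(leverage, cooks)):
    print(f"obs {i}: leverage = {h:.3f}, Cook's D = {d:.3f}")
print("high leverage:", high_leverage, "influential:", influential)
```

A flagged point is usually handled by checking it for data errors and then refitting the model with and without it to see whether the conclusions change. Deleting it automatically is not recommended.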
#-------------- Q5 --------------------------------------
A one-unit increase in bmi is associated with an estimated decrease of 0.880 in maximum oxygen uptake, and being female is also associated with lower maximum oxygen uptake.
The 95% confidence interval for the bmi coefficient is -1.73117 to -0.0304973: we are 95% confident that a one-unit increase in bmi lowers maximum oxygen uptake by between about 0.03 and 1.73 ml/kg/min.
The 95% confidence interval for the sex coefficient is -12.60329 to -3.70957: we are 95% confident that maximum oxygen uptake is lower for females (sex = 1) than for males by between about 3.71 and 12.60 ml/kg/min.
The 95% confidence interval for the constant is 50.71601 to 90.25928: this is the estimated maximum oxygen uptake when bmi and sex are both zero.
Part – B
#-------------- Q1 --------------------------------------
From the above result, the correlation between glength and vvolume is approximately 0.95, indicating a strong positive linear association between genome length and virus volume.
# --------------------- After log Transformation ---------------------------------
Interpretation: After log transformation, the data are approximately normally distributed, so regression analysis is appropriate for the transformed data.
Before transformation, the data do not satisfy the regression assumptions, so the log-transformed analysis is preferred.
#-------------- Q2 --------------------------------------
From the above analysis, a one-unit increase in log genome length is associated with an estimated increase of 1.375941 in log virus volume. On the original scale, this corresponds to an increase of roughly 1.38% in virus volume for each 1% increase in genome length.
The 95% confidence interval for the genome length coefficient is 1.209881 to 1.542002, meaning we are 95% confident that the true coefficient lies within this range.
The constant is negative; it is the predicted log volume when log genome length is zero, and does not by itself describe any change in virus volume.
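A log-log slope can be moved from the log scale to the original scale as in the sketch below. The numbers are invented and follow an exact power law (they are not virus.dta), and `x` and `y` are generic names for the predictor and the outcome. On the log scale the slope is the change in log(y) per unit change in log(x). On the original scale, multiplying x by a factor k multiplies y by k raised to the power of the slope.

```python
from math import log
from statistics import mean

# Hypothetical values following y = 3 * x**0.7; NOT the assignment data
x = [1e4, 1e5, 1e6, 1e7, 1e8]
y = [3 * xi ** 0.7 for xi in x]

log_x = [log(v) for v in x]
log_y = [log(v) for v in y]

# Simple least squares on the log scale
lx_bar, ly_bar = mean(log_x), mean(log_y)
slope = (sum((a - lx_bar) * (b - ly_bar) for a, b in zip(log_x, log_y))
         / sum((a - lx_bar) ** 2 for a in log_x))

# Log scale: +1 in log(x) goes with +slope in log(y)
# Original scale: multiplying x by k multiplies y by k**slope
ratio_if_x_doubles = 2 ** slope
pct_change_per_1pct = (1.01 ** slope - 1) * 100

print(f"log-log slope: {slope:.3f}")
print(f"doubling x multiplies y by {ratio_if_x_doubles:.3f}")
print(f"a 1% increase in x goes with a {pct_change_per_1pct:.2f}% increase in y")
```

The same back-transformation applies to the ends of a confidence interval for the slope, which gives an interval for the multiplicative effect.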
Part – C
#-------------- Q1 --------------------------------------
#-------------- Q2...