This needs to be done in Stata, please.
Linear Models (LMR) 2020: Assignment 2
Due: Monday 1st June 2020, 11:59pm

There are 3 parts in this assignment. Indications for the structure of responses are given in the 3 parts. Please read these carefully. Please proof-read your submission to ensure it is as polished and professionally presented as possible. You can include any relevant computer code used to generate results for your answers. We expect that a reasonable effort on the assignment should amount to not much more than 12 pages including both text and graphics. Allowing some tolerance, the maximum number of pages for this assignment is set to 16 pages. Please also indicate the part and number your answers in the same way the parts are numbered.

Instructions
• Answer the questions in an essay-style approach when appropriate. Make sure to include all relevant computer output (and exclude irrelevant output), presented neatly and integrated into the discussion and interpretation.
• Place the code in an appendix (if the syntax you want to present is long).
• Do not repeat the assignment questions.
• Do not include an assignment cover page.
• Note that there are not necessarily unique correct answers for these questions, and marks will be awarded for appropriate analysis using regression models, with corresponding justifications and explanations. Marks may be subtracted for unfocussed or disorganised presentation of material.
• Where equations or formulae are to be presented, please attempt to do this electronically using a word processor rather than including images of scanned handwritten work. When this is not possible, scanned written work must be extremely neat and legible. Writing mathematics electronically is an important skill to learn and practice.

Part A
The dataset “vitD.dta” contains data on measurements made on a group of newborn babies in a study on the possible associations between maternal vitamin D levels and fetal growth, as measured by size at birth.
The study was motivated by earlier studies on animals and some conflicting epidemiological evidence that low levels of vitamin D may be associated with reduced growth of the fetus. The particular measure of size at birth that we will focus on here was a measure of the baby’s knee-to-heel length (performed by a very accurate device called a knemometer!), recorded in the dataset with the variable name “kneeheel”. Vitamin D level was measured as the concentration of 25-hydroxyvitamin D (25-OHD), in nmol/L, at a first trimester antenatal visit. The dataset also records a number of other variables potentially associated with birth size, in particular the sex of the baby, the mother’s height, whether or not she smoked during pregnancy, whether this was her first baby, and the gestational age at birth.

[Note that the usual caveats apply: these data have been sampled and modified from an original study (conducted by Dr Ruth Morley at the Royal Children’s Hospital, Melbourne) and no substantive conclusions should be drawn from these analyses.]

Question 1
In this question we ask you to examine the association between birth size (kneeheel) and maternal vitamin D levels during pregnancy (vitd) using two different regression models. For this part of the analysis you should ignore all other variables in the dataset. The literature in the area suggests the following two possible relationships (if any):
a) There may be a smoothly increasing birth size with vitamin D level across the whole range of vitamin D levels. This will be referred to as model (1).
b) There might be a threshold effect, whereby growth is adversely affected only below a certain minimum “normal range” of vitamin D levels. In other words, there may be a smooth association between the two variables that is particularly strong below the “normal range” of vitamin D levels, and less strong once within the normal range. This model will be referred to as model (2).
It is often called a “hockey stick” or “bent stick” or “changepoint” model. The threshold value of interest is taken from the relevant research literature, and is 28 nmol/L of 25-OHD. (Note: Although it would be possible to explore other threshold values, we require that you take this value as given in your analysis.)

Your task is to investigate the evidence for each of the two hypothesised relationships in the above order and to determine which model results you would present to the clinical investigator. Provide your response in the form of a technical report documenting your analyses for each of the models, including Stata (or other software) output where appropriate, in a form that would allow another person to check over the work that you’ve done. It should include an appropriate explanation of steps taken, including a clear statement of the models, and some checking of the underlying assumptions for the analyses performed. At the end of your report, you should include a separate and stand-alone paragraph where you interpret the important results of your analysis in a form suitable for a clinical investigator.

[Hint: For the second model, you will need to define a second vitamin D covariate in order to estimate separately the association between birth size and vitamin D level for mothers with vitamin D levels above and below 28 nmol/L.
This is done most easily by creating a variable that takes the value zero if vitd < 28, and then measures the “excess” vitamin D beyond the threshold level, i.e. by typing the following commands in Stata:

gen vitd_beyond = vitd - 28
replace vitd_beyond = 0 if vitd < 28
]

Question 2
Now consider the additional variables: maternal height, smoking and pregnancy history, the baby’s sex and gestational age (length in weeks) at birth.
(i) The clinical investigator asks you to comment on the possible confounding effects of gestational age and requests you adjust for gestational age in your chosen regression model in Question 1. Without performing any analyses, explain to the clinician how the estimated vitamin D effect would be interpreted in such a model, and whether you have any reservations about this proposed regression model to address the confounding question.
(ii) Would you have similar reservations about adjusting for any of the other four variables (maternal height, smoking, pregnancy history and sex of child)?
Note: No analyses are required for Question 2.

Question 3
The hint for Question 1 regarding model (2) states the required form of the covariate that needs to be added to model (1) to form the “threshold” effect model. Your task here is to provide an algebraic justification for the form of this covariate. You might wish to start by writing out separate regression models for the relationship with vitamin D before and after the threshold, and then including the constraint that the regression line must be continuous, that is, the lines before and after the threshold join together at the threshold. Or you may choose another approach.

Part B
A sexual health researcher has asked you for some statistical help in interpreting the results of their study. In this study, the researcher randomised 96 people into 4 different education interventions, and measured their knowledge on sexually transmitted infections (STIs) one month later. The knowledge score is measured on a scale from 0 to 25 and the education groups are as follows:
Group A: An email containing links to web resources
Group B: A one-on-one discussion with a nurse about STIs
Group C: A fact sheet brochure
Group D: An interactive group presentation

The data are provided in the dataset “knowledge.dta”. The researcher has previously completed an introductory statistics course, and analysed the scores across groups using the Stata code below, where variables B, C, and D represent indicator variables for education groups B, C and D respectively, and ‘score’ represents the knowledge score.

. regr score B C D

      Source |       SS       df       MS              Number of obs =      96
-------------+----------------------------------       F(3, 92)      =    2.19
       Model |  178.166667     3  59.3888889           Prob > F      =  0.0941
    Residual |  2491.16667    92  27.0778986           R-squared     =  0.0667
-------------+----------------------------------       Adj R-squared =  0.0363
       Total |  2669.33333    95  28.0982456           Root MSE      =  5.2036

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   14.79167   1.062189    13.93   0.000     12.68207    16.90127
           B |   2.083333   1.502162     1.39   0.169    -.9000906    5.066757
           C |       2.25   1.502162     1.50   0.138     -.733424    5.233424
           D |   3.833333   1.502162     2.55   0.012     .8499094    6.816757
------------------------------------------------------------------------------

The researcher interprets the results as telling him that the group presentation (D) is definitely the best education intervention because it is the only one with a “significant” p-value. Following this conclusion, the researcher decided to leave the “non-significant” indicator variables out of the regression model and obtained the following results:

. regr score D

      Source |       SS       df       MS              Number of obs =      96
-------------+----------------------------------       F(1, 94)      =    3.76
       Model |  102.722222     1  102.722222           Prob > F      =  0.0554
    Residual |  2566.61111    94  27.3043735           R-squared     =  0.0385
-------------+----------------------------------       Adj R-squared =  0.0283
       Total |  2669.33333    95  28.0982456           Root MSE      =  5.2254

------------------------------------------------------------------------------
       score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   16.23611   .6158144    26.37   0.000      15.0134    17.45883
           D |   2.388889   1.231629     1.94   0.055    -.0565391    4.834317
------------------------------------------------------------------------------

He suggests that this provides the simplest summary result and would like to report this in his research paper.
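As a rough sketch, the two regressions shown above could be reproduced along the following lines. This assumes, as the text describes, that knowledge.dta contains score and 0/1 indicators named B, C and D; the file location is an assumption. The display lines are only arithmetic on the printed coefficients, and they assume the four groups are of equal size (the standard errors in the output are consistent with 24 people per group).

```stata
* Sketch only; assumes knowledge.dta is in the working directory and
* contains score plus 0/1 indicators B, C and D, as described in the text.
use knowledge.dta, clear

* First model: group A is the reference category.
regress score B C D

* Joint test of B, C and D (equivalent to the overall F-test of that model).
test B C D

* Second model: the comparison group is now A, B and C combined.
regress score D

* Arithmetic on the printed coefficients, assuming equal group sizes.
* Group means implied by the first model, in the order A, B, C, D.
display 14.79167
display 14.79167 + 2.083333
display 14.79167 + 2.25
display 14.79167 + 3.833333

* Mean of A, B and C pooled; compare with _cons in the second model.
display (3*14.79167 + 2.083333 + 2.25) / 3

* Mean of D minus the pooled mean; compare with the D coefficient.
display (14.79167 + 3.833333) - (3*14.79167 + 2.083333 + 2.25) / 3
```

The last two display lines give roughly 16.236 and 2.389, which line up with the _cons and D estimates in the second output. The commands are intended to be run from a do-file.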
However, he is confused as to why the estimated regression coefficient for group D has reduced compared with the previous model, with the P-value now being greater than 0.05.

Question 1
Provide some advice to the researcher on their interpretation of the data analysis. Do you agree that the second regression results with the indicator for group D alone should be