STAT*2040
Winter 2022
Data Analysis Assignment #2
This assignment has an official deadline of Wednesday March 30 at 11:59 pm, but I will happily accept
submissions (with no penalty) until Sunday April 3 at 11:59 pm. You must submit one pdf document
for each part of this assignment (4 pdfs in total). Submissions must be made to Crowdmark, using
the personalized link that will be sent to your email address.
You may complete this assignment individually, or in groups of 2 or 3. No groups of 4 or more, under
any circumstances. This is to encourage discussion, with the idea of helping you learn the material.
It is expected that you work on all of the parts together, and not split it up into parts for each
person. I very much view these assignments as a learning tool as much as an assessment tool, and
you’re depriving yourself of that if you ignore certain parts. With that in mind, it is entirely up to
you whether or not you choose to work in groups or individually, and entirely up to you to make sure
your group members are contributing. If someone is not carrying their weight, you are welcome to
do the assignment individually. And, once again, it is expected that you work together on all parts.
There are 4 parts to this assignment:
1. Data analysis and write-up of conclusions for a one-sample problem. (25 marks)
2. Data analysis and write-up of conclusions for a two-sample procedure. (30 marks)
3. Reading parts of a journal article, and interpreting some values in the article. (15 marks)
4. Plotting a scatterplot with a regression line, and interpreting the results of the test on the
slope. (10 marks + 3 bonus marks)
Part of this assignment involves creating various plots in R (or R Studio), and using R to carry out
some calculations. Any plot or output that is done in software other than R (e.g. Excel) will receive
a grade of 0. This is an R assignment.
On at least one of the parts, you will need to look up a journal article. The journal articles are freely
available from the University of Guelph library website. It’s usually quickest to search for the article
title in Omni (on the library site), then follow the “available online” link. You can also search for the
journal title through Omni, but the article title often takes you straight there. If you are off-campus,
then you will be prompted to use the off-campus sign on before proceeding to the journal article.
This assignment is worth 12.5% of your final grade. You will be marked on: 1) Getting the proper
R output and plots, 2) Validity of your statistical conclusions and interpretations, 3) Writing style
(grammar and clear concise language count!), 4) Presentation. (I don’t have a specific presentation
style in mind, but make it clean and easy to read. Sloppy work won’t earn full marks.) Note that
you must use R to complete this assignment. My “Intro to R” document is available on the Courselink
site.
You will have to do some thinking in this assignment. I am not going to tell you exactly what to do,
and I would be negligent in my duties as a professor if I were to do so. You are most welcome to
ask me questions, and post questions or comments on the discussion board (but refrain from posting
specific answers or code that could simply be copied). If you’re holding up your end of the bargain,
and giving these questions an honest go, then I’m very willing to help when you have questions or
concerns. I am not always looking for one specific method of analysis – for some of these questions,
there is more than one path to perfect marks.
(You can do this assignment in either base R or R Studio. I’ll phrase everything in terms of base R.)
1 Post-mortem body weight compared to pre-mortem body weight
(25 marks total)
McCormack et al. XXXXXXXXXX investigated differences in body weight before and after death, and the
possible impact of these differences on the pathologic assessment of the heart. As part of the
study, the authors investigated the percentage change in body weight at autopsy relative to the
pre-mortem weight assessment in 120 autopsy cases at Massachusetts General Hospital. The file
2040_W22_autopsyweight.csv contains the percentage increase for the 120 autopsy cases. (A value
of 5, for example, indicates that the weight at autopsy was 5% greater than the pre-mortem weight.
There were actually 132 observations in the data set, but we are looking at a subset of 120 here. Act
as if there were 120 cases in the sample.) You must import this data set into R to carry out the
analysis.
For your write-up to be complete, you must:
a) Plot an appropriate boxplot and an appropriate normal quantile-quantile plot. Include the plots
in your submission.
b) Comment on whether the normality assumption of the one-sample t procedures is reasonable
in this setting. You should make reference to the appropriate plot(s). If you feel there is a
violation of the normality assumption, do you think it is still reasonable to use the t procedure
here? Justify your position.
For the remainder of this section, assume it is reasonable to use the t procedures.
c) Suppose we decide to use the t procedures to analyze the data. Use t.test in R to calculate a
95% confidence interval for the population mean percentage. Include the output from R in your
submission.
d) Give an appropriate interpretation of the 95% confidence interval given by R, in the context of
the problem.
e) Carry out the more meaningful test: H0: µ = 0 or H0: µ = 10. Justify your choice
of hypothesis. (Recall that choice of hypothesis has nothing to do with the data in the current
sample, or default output from software, but is based on the nature of the problem at hand.)
Give the hypotheses in words and symbols, get the appropriate t statistic and p-value from the R
output, and give an appropriate conclusion to this test.
Your submission must include the boxplot, the normal QQ plot, and the R output, in addition to
your comments and interpretation. Your submission for this part should only be two pages, but can
be three pages if you feel that is necessary.
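
The general shape of the R workflow for a one-sample problem like this can be sketched on simulated data. This is only a rough illustration and not the required analysis: the vector pct_change and its values are made up, the column name in the actual csv file is not assumed here, and the null value passed to mu is a placeholder that is deliberately not one of the candidates in part e).

```r
# Simulated percentage changes (hypothetical values, NOT the assignment data).
set.seed(1)
pct_change <- rnorm(120, mean = 3, sd = 8)

# With the real file, read.csv() would be used instead, for example:
# autopsy <- read.csv("2040_W22_autopsyweight.csv")  # column name not assumed here

# Boxplot and normal quantile-quantile plot for checking the normality assumption.
boxplot(pct_change, ylab = "Percentage change in body weight",
        main = "Simulated data")
qqnorm(pct_change)
qqline(pct_change)

# One-sample t procedure; mu is the hypothesized mean under H0
# (5 is a placeholder for illustration only).
result <- t.test(pct_change, mu = 5, conf.level = 0.95)
result$conf.int    # 95% confidence interval for the population mean
result$statistic   # t statistic for H0: mu = 5
result$p.value     # two-sided p-value
```

Printing result on its own gives the full t.test output in one block. Which null value is meaningful, and what the plots say about normality, is still up to you to argue.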
2 Subjective and objective sleep time mismatch in insomniacs and
normal sleepers (30 marks total)
Consider again this information from Assignment #1.
Manconi et al. XXXXXXXXXX investigated the objective-subjective mismatch in sleep perception. In one part
of the study, the total sleep time mismatch (TSTobjective − TSTsubjective) was measured for a sample
of 159 self-diagnosed insomniacs and a sample of 288 normal sleepers (controls). (TSTsubjective is the
total time (minutes) that the individual estimated they slept (upon wakening), and TSTobjective is the
total time they slept as measured by technology. The difference is the mismatch, and these mismatch
times are given in the data set. Positive mismatch times indicate the individual thought they slept
for less time than they did.) The data is contained in the file 2040_W22_sleep_insomnia_both.csv.
Is there evidence of a difference in mismatch times for normal sleepers and self-diagnosed insomniacs?
You must import this data set into R to carry out the analysis. You will need to use an appropriate
t.test command to carry out the calculations.
For your write-up to be complete, you must:
a) Plot side-by-side box plots of the data (in one plot). Label the plot appropriately. Plot normal
quantile-quantile plots for the two groups separately. Include the plots in your submission.
b) Comment on whether the normality assumption of the two-sample t procedures is reasonable
in this setting. You should make reference to the plots. If you feel there is a violation of the
normality assumption, do you think it is still reasonable to use the t procedure here? Justify your
position.
c) Choose to analyze the data with either the pooled-variance t or Welch’s t procedure. Justify your
choice.
d) Give the R output for your choice of procedure.
e) Interpret the results, including commenting on the results of the test of the null hypothesis that
the true mean mismatch time is the same for both groups, and an appropriate interpretation of a
relevant confidence interval. Interpretations must relate to the problem at hand.
f) If we wish to use information in this study to draw conclusions about self-diagnosed insomniacs
vs normal sleepers in general, what biases might be present? (You’ll need to look at relevant bits
of the journal article to answer this.)
Your submission must include the boxplots, normal QQ plots, and the R output, in addition to your
comments and interpretation. Your submission for this part should only be two pages, but can be
three pages if you feel that is necessary.
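
For a two-sample problem, the plotting and t.test calls might look something like the sketch below. It runs on simulated data: the data frame sleep, its columns mismatch and group, and all of the values are hypothetical, and the layout of the actual csv file may well differ. Both procedures are shown only to illustrate the syntax; the choice between them is yours to justify.

```r
# Simulated mismatch times in minutes (hypothetical values, NOT the assignment data).
set.seed(2)
sleep <- data.frame(
  mismatch = c(rnorm(159, mean = 40, sd = 60), rnorm(288, mean = 10, sd = 40)),
  group    = rep(c("insomniac", "control"), times = c(159, 288))
)

# Side-by-side boxplots in one plot.
boxplot(mismatch ~ group, data = sleep,
        xlab = "Group", ylab = "TST mismatch (minutes)")

# Separate normal QQ plots for each group.
for (g in unique(sleep$group)) {
  x <- sleep$mismatch[sleep$group == g]
  qqnorm(x, main = paste("Normal QQ plot:", g))
  qqline(x)
}

# Welch's t procedure (the t.test default, var.equal = FALSE).
welch <- t.test(mismatch ~ group, data = sleep)

# Pooled-variance t procedure.
pooled <- t.test(mismatch ~ group, data = sleep, var.equal = TRUE)

welch$parameter    # Welch's approximate degrees of freedom
pooled$parameter   # n1 + n2 - 2 degrees of freedom
```

Note that with the formula interface, R takes the difference in the alphabetical order of the group labels, so check which group comes first before interpreting the sign of the confidence interval.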
3 Interpreting some values from a journal article (15 marks total)
Answer the following questions clearly and concisely. Your submission for this part should be a single
page.
a) In many journals, when the authors report a result such as 16.8±1.4, the 1.4 is the standard error
of the statistic, or the raw standard deviation of the data values, and not the margin of error.
(Sometimes, especially for small sample sizes, it’s not trivial to figure out whether the authors
are reporting the standard error of the statistic, or the standard deviation of the data values, but
it should be clear in this example.) Consider Aydin and Yalcin XXXXXXXXXX In Table 1, the authors
report a value of 14.9± 4.7. What do these two values mean in the context of this study?
b) Consider again Aydin and Yalcin XXXXXXXXXX In Table 2, there is a line “Biceps skin-fold thickness
(mm).” In that line, they report a p-value of less than XXXXXXXXXX Give the null hypothesis of the test
corresponding to that p-value, in words and symbols, and give an appropriate conclusion that
relates to the problem at hand.
c) Consider again Aydin and Yalcin XXXXXXXXXX In Table 2, the authors report a value of 11.1, with an
associated confidence interval of [10.0, 12.2]. Give an interpretation of the confidence interval, in
the context of the problem at hand.
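
To see how differently the standard deviation, the standard error of the mean, and the margin of error behave, here is a small sketch on simulated data. The vector x and its values are made up and have nothing to do with the article; the point is only that the three quantities after a “±” can differ a great deal in size.

```r
# Simulated measurements (hypothetical values, NOT taken from the article).
set.seed(3)
x <- rnorm(50, mean = 15, sd = 5)
n <- length(x)

s      <- sd(x)                        # standard deviation of the data values
se     <- s / sqrt(n)                  # standard error of the sample mean
margin <- qt(0.975, df = n - 1) * se   # margin of error of a 95% t interval

c(mean = mean(x), sd = s, se = se, margin = margin)

# The 95% interval from t.test is the sample mean plus or minus the margin of error.
ci <- t.test(x)$conf.int
ci
```

The standard deviation describes the spread of individual values, while the standard error and margin of error describe the uncertainty in the sample mean and shrink as n grows.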
4 Scatterplot with a least-squares regression line (10 marks + 3
bonus marks)
Pourhassan et al. XXXXXXXXXX investigated various aspects of oxygen uptake and resting energy expenditure
in overweight subjects. One part of the study explored the relationship between resting energy
expenditure (kJ/min) and total organ weight (kg, excluding the heart and estimated via an MRI).
The data for this sample of 53 females is contained in the file 2040_W22_REE_organ.csv.
Plot a scatterplot with resting energy expenditure on the vertical axis and total organ weight on the
horizontal axis