Data Analysis Assignment


STAT*2040 Winter 2022 Data Analysis Assignment #2

This assignment has an official deadline of Wednesday March 30 at 11:59 pm, but I will happily accept submissions (with no penalty) until Sunday April 3 at 11:59 pm. You must submit one pdf document for each part of this assignment (4 pdfs in total). Submissions must be made to Crowdmark, using the personalized link that will be sent to your email address.

You may complete this assignment individually, or in groups of 2 or 3. No groups of 4 or more, under any circumstances. This is to encourage discussion, with the idea of helping you learn the material. It is expected that you work on all of the parts together, and not split it up into parts for each person. I very much view these assignments as a learning tool as much as an assessment tool, and you’re depriving yourself of that if you ignore certain parts. With that in mind, it is entirely up to you whether or not you choose to work in groups or individually, and entirely up to you to make sure your group members are contributing. If someone is not carrying their weight, you are welcome to do the assignment individually. And, once again, it is expected that you work together on all parts.

There are 4 parts to this assignment:
1. Data analysis and write-up of conclusions for a one-sample problem. (25 marks)
2. Data analysis and write-up of conclusions for a two-sample procedure. (30 marks)
3. Reading parts of a journal article, and interpreting some values in the article. (15 marks)
4. Plotting a scatterplot with a regression line, and interpreting the results of the test on the slope. (10 marks + 3 bonus marks)

Part of this assignment involves creating various plots in R (or R Studio), and using R to carry out some calculations. Any plot or output that is done in software other than R (e.g. Excel) will receive a grade of 0. This is an R assignment. On at least one of the parts, you will need to look up a journal article.
The journal articles are freely available from the University of Guelph library website. It’s usually quickest to search for the article title in Omni (on the library site), then follow the “available online” link. You can also search for the journal title through Omni, but the article title often takes you straight there. If you are off-campus, then you will be prompted to use the off-campus sign on before proceeding to the journal article.

This assignment is worth 12.5% of your final grade. You will be marked on:
1) Getting the proper R output and plots,
2) Validity of your statistical conclusions and interpretations,
3) Writing style (grammar and clear concise language count!),
4) Presentation. (I don’t have a specific presentation style in mind, but make it clean and easy to read. Sloppy work won’t earn full marks.)

Note that you must use R to complete this assignment. My “Intro to R” document is available on the Courselink site.

You will have to do some thinking in this assignment. I am not going to tell you exactly what to do, and I would be negligent in my duties as a professor if I were to do so. You are most welcome to ask me questions, and post questions or comments on the discussion board (but refrain from posting specific answers or code that could simply be copied). If you’re holding up your end of the bargain, and giving these questions an honest go, then I’m very willing to help when you have questions or concerns. I am not always looking for one specific method of analysis; for some of these questions, there is more than one path to perfect marks. (You can do this assignment in either base R or R Studio. I’ll phrase everything in terms of base R.)

1 Post-mortem body weight compared to pre-mortem body weight (25 marks total)

McCormack et al. (2016) investigated differences in body weight before and after death, and the possible impact of these differences on the pathologic assessment of the heart.
As part of the study, the authors investigated the percentage change in body weight at autopsy relative to the pre-mortem weight assessment in 120 autopsy cases at Massachusetts General Hospital. The file 2040_W22_autopsyweight.csv contains the percentage increase for the 120 autopsy cases. (A value of 5, for example, indicates that the weight at autopsy was 5% greater than the pre-mortem weight. There were actually 132 observations in the data set, but we are looking at a subset of 120 here. Act as if there were 120 cases in the sample.) You must import this data set into R to carry out the analysis.

For your write-up to be complete, you must:
a) Plot an appropriate boxplot and an appropriate normal quantile-quantile plot. Include the plots in your submission.
b) Comment on whether the normality assumption of the one-sample t procedures is reasonable in this setting. You should make reference to the appropriate plot(s). If you feel there is a violation of the normality assumption, do you think it is still reasonable to use the t procedure here? Justify your position. For the remainder of this section, assume it is reasonable to use the t procedures.
c) Suppose we decide to use the t procedures to analyze the data. Use t.test in R to calculate a 95% confidence interval for the population mean percentage. Include the output from R in your submission.
d) Give an appropriate interpretation of the 95% confidence interval given by R, in the context of the problem.
e) Carry out the more meaningful of the two tests: H0: µ = 0 or H0: µ = 10. Justify your choice of hypothesis. (Recall that choice of hypothesis has nothing to do with the data in the current sample, or default output from software, but is based on the nature of the problem at hand.) Give the hypotheses in words and symbols, get the appropriate t statistic and p-value from the R output, and give an appropriate conclusion to this test.
Your submission must include the boxplot, the normal QQ plot, and the R output, in addition to your comments and interpretation. Your submission for this part should only be two pages, but can be three pages if you feel that is necessary.

2 Subjective and objective sleep time mismatch in insomniacs and normal sleepers (30 marks total)

Consider again this information from Assignment #1. Manconi et al. (2010) investigated the objective-subjective mismatch in sleep perception. In one part of the study, the total sleep time mismatch (TSTobjective − TSTsubjective) was measured for a sample of 159 self-diagnosed insomniacs and a sample of 288 normal sleepers (controls). (TSTsubjective is the total time (minutes) that the individual estimated they slept (upon wakening), and TSTobjective is the total time they slept as measured by technology. The difference is the mismatch, and these mismatch times are given in the data set. Positive mismatch times indicate the individual thought they slept for less time than they did.) The data is contained in the file 2040_W22_sleep_insomnia_both.csv.

Is there evidence of a difference in mismatch times for normal sleepers and self-diagnosed insomniacs? You must import this data set into R to carry out the analysis. You will need to use an appropriate t.test command to carry out the calculations.

For your write-up to be complete, you must:
a) Plot side-by-side box plots of the data (in one plot). Label the plot appropriately. Plot normal quantile-quantile plots for the two groups separately. Include the plots in your submission.
b) Comment on whether the normality assumption of the two-sample t procedures is reasonable in this setting. You should make reference to the plots. If you feel there is a violation of the normality assumption, do you think it is still reasonable to use the t procedure here? Justify your position.
c) Choose to analyze the data with either the pooled-variance t or Welch’s t procedure. Justify your choice.
d) Give the R output for your choice of procedure.
e) Interpret the results, including commenting on the results of the test of the null hypothesis that the true mean mismatch time is the same for both groups, and an appropriate interpretation of a relevant confidence interval. Interpretations must relate to the problem at hand.
f) If we wish to use information in this study to draw conclusions about self-diagnosed insomniacs vs normal sleepers in general, what biases might be present? (You’ll need to look at relevant bits of the journal article to answer this.)

Your submission must include the boxplots, normal QQ plots, and the R output, in addition to your comments and interpretation. Your submission for this part should only be two pages, but can be three pages if you feel that is necessary.

3 Interpreting some values from a journal article (15 marks total)

Answer the following questions clearly and concisely. Your submission for this part should be a single page.
a) In many journals, when the authors report a result such as 16.8 ± 1.4, the 1.4 is the standard error of the statistic, or the raw standard deviation of the data values, and not the margin of error. (Sometimes, especially for small sample sizes, it’s not trivial to figure out whether the authors are reporting the standard error of the statistic, or the standard deviation of the data values, but it should be clear in this example.) Consider Aydin and Yalcin (2022). In Table 1, the authors report a value of 14.9 ± 4.7. What do these two values mean in the context of this study?
b) Consider again Aydin and Yalcin (2022). In Table 2, there is a line “Biceps skin-fold thickness (mm).” In that line, they report a p-value of less than 0.001. Give the null hypothesis of the test corresponding to that p-value, in words and symbols, and give an appropriate conclusion that relates to the problem at hand.
c) Consider again Aydin and Yalcin (2022).
In Table 2, the authors report a value of 11.1, with an associated confidence interval of [10.0, 12.2]. Give an interpretation of the confidence interval, in the context of the problem at hand.

4 Scatterplot with a least-squares regression line (10 marks + 3 bonus)

Pourhassan et al. (2015) investigated various aspects of oxygen uptake and resting energy expenditure in overweight subjects. One part of the study explored the relationship between resting energy expenditure (kJ/min) and total organ weight (kg, excluding the heart and estimated via an MRI). The data for this sample of 53 females is contained in the file 2040_W22_REE_organ.csv.

Plot a scatterplot with resting energy expenditure on the vertical axis and total organ weight on the horizontal axis
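As a rough illustration of what Part 4 involves, the sketch below fits a least-squares line and pulls out the test on the slope. It runs on simulated data, because the contents of 2040_W22_REE_organ.csv are not reproduced here. The column names organ_weight and ree, the value ranges, and the simulated relationship are all assumptions made for the example and say nothing about the real results.

```r
# Simulated stand-in for the 53 observations; every number here is made up.
set.seed(2040)
n <- 53
organ_weight <- runif(n, min = 2.5, max = 4.5)        # kg (hypothetical range)
ree <- 1.5 + 0.9 * organ_weight + rnorm(n, sd = 0.3)  # kJ/min (hypothetical relationship)
ree_data <- data.frame(organ_weight, ree)

# With the real file, the import would look something like this
# (actual column names may differ):
# ree_data <- read.csv("2040_W22_REE_organ.csv")

# Least-squares regression of REE on total organ weight
fit <- lm(ree ~ organ_weight, data = ree_data)

# Scatterplot with the response on the vertical axis, plus the fitted line
plot(ree ~ organ_weight, data = ree_data,
     xlab = "Total organ weight (kg)",
     ylab = "Resting energy expenditure (kJ/min)",
     main = "REE versus total organ weight (simulated data)")
abline(fit)

# Slope row of the coefficient table: estimate, standard error,
# t statistic, and two-sided p-value for H0: slope = 0
slope_row <- coef(summary(fit))["organ_weight", ]
print(slope_row)

# 95% confidence interval for the slope
print(confint(fit, "organ_weight", level = 0.95))
```

The t statistic in the slope row is the estimate divided by its standard error, and the p-value is for the two-sided test of a zero slope, which is the test that summary() reports by default for lm fits.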
Mohd answered on Apr 04 2022
4/4/2022
Post-mortem body weight compared to pre-mortem body weight (25 marks total)
McCormack et al. (2016) investigated differences in body weight before and after death, and the possible impact of these differences on the pathologic assessment of the heart. As part of the study, the authors investigated the percentage change in body weight at autopsy relative to the pre-mortem weight assessment in 120 autopsy cases at Massachusetts General Hospital. The file 2040_W22_autopsyweight.csv contains the percentage increase for the 120 autopsy cases. (A value of 5, for example, indicates that the weight at autopsy was 5% greater than the pre-mortem weight.)
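# readr provides read_csv()
library(readr)
# The path is relative to the working directory; adjust it to where the CSV is saved
X2040w22autopsyweight <- read_csv("New folder (2)/2040w22autopsyweight.csv")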
## Rows: 120 Columns: 1
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (1): change
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
1. Plot an appropriate boxplot and an appropriate normal quantile-quantile plot. Include the plots in your submission.
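# Extract the percentage change column as a numeric vector
change <- X2040w22autopsyweight$change
boxplot(change, main = "Change in autopsy weight", ylab = "Percentage change in body weight")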
2. Comment on whether the normality assumption of the one-sample t procedures is reasonable in this setting. You should make reference to the appropriate plot(s). If you feel there is a violation of the normality assumption, do you think it is still reasonable to use the t procedure here? Justify your position. For the remainder of this section, assume it is reasonable to use the t procedures.
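No. On the normal QQ plot the data points do not lie along the straight line, which suggests the variable does not follow a normal distribution, so the normality assumption is not met. With a sample of 120 cases, however, the t procedures are fairly robust to departures from normality, so they can still be reasonable here as long as the boxplot does not show extreme outliers.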
qqnorm(change)
qqline(change)
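A small simulation is one way to see why a sample of 120 can keep the t procedures workable even when a QQ plot looks non-normal. The sketch below is an illustration only. It draws repeated samples from a made-up right-skewed (exponential) population, not from the autopsy data, and records how often the 95% t interval covers the true mean. The choice of population, its mean of 10, and the number of repetitions are all assumptions.

```r
set.seed(1)
n <- 120        # sample size used in the assignment
reps <- 2000    # number of simulated samples (arbitrary choice)

# Hypothetical skewed population: exponential with mean 10
true_mean <- 10

# For each simulated sample, check whether the 95% t interval covers the true mean
covered <- replicate(reps, {
  x <- rexp(n, rate = 1 / true_mean)
  ci <- t.test(x)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals that contain the true mean
coverage <- mean(covered)
print(coverage)
```

If the printed coverage is close to 0.95, the t interval is behaving roughly as advertised for that population at n = 120. Heavier skew or extreme outliers would need a separate check.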
t.test(change)
##
## One Sample t-test
##
## data: change
## t = 7.7964, df = 119, p-value = 2.687e-12
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 8.075709 13.574291
## sample estimates:
## mean of x
## 10.825
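The p-value (2.687e-12) is far below 0.05, so the null hypothesis H0: µ = 0 is rejected: there is very strong evidence that the true mean percentage change in body weight between the pre-mortem assessment and autopsy differs from 0. The sample mean change is 10.825%, and we can be 95% confident that the true mean percentage change lies between about 8.08% and 13.57%.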
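The assignment also asks about H0: µ = 10. Since the 95% interval in the output above contains 10, a two-sided test of that hypothesis at the 5% level would not reject it. The sketch below redoes the arithmetic using only the summary values printed by t.test (sample mean, t statistic, degrees of freedom). The variable names are my own, and the standard error is recovered from rounded output, so the results are approximate.

```r
# Summary values copied from the t.test output above
xbar  <- 10.825    # sample mean
t_mu0 <- 7.7964    # t statistic for H0: mu = 0
df    <- 119       # degrees of freedom (n - 1)

# Because t = (xbar - 0) / se, the standard error can be recovered as xbar / t
se <- xbar / t_mu0

# 95% confidence interval: xbar +/- t* x se; should be close to (8.0757, 13.5743)
margin <- qt(0.975, df) * se
ci <- c(xbar - margin, xbar + margin)
print(round(ci, 3))

# Same arithmetic for H0: mu = 10 (two-sided)
t_mu10 <- (xbar - 10) / se
p_mu10 <- 2 * pt(-abs(t_mu10), df)
print(c(t = t_mu10, p = p_mu10))

# With the data loaded, the direct call would be: t.test(change, mu = 10)
```

Reproducing the reported interval this way is a quick consistency check on the output before interpreting it.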