MAST90044 Thinking and Reasoning with Data Semester 1 2020 Assignment 2 Due: 9 am, Monday 18 May • Assignments are to be submitted (uploaded) via Canvas. • Please label your assignment with the...

Questions attached. To be done in R. 1 week so expect high quality descriptive work


MAST90044 Thinking and Reasoning with Data
Semester 1 2020
Assignment 2
Due: 9 am, Monday 18 May

• Assignments are to be submitted (uploaded) via Canvas.
• Please label your assignment with the following information:
  – your name;
  – your student number;
  – your lab class;
  – your tutor’s name.
• You do not need to hand in another plagiarism declaration.
• Late assignments will only be accepted under exceptional circumstances and must be discussed with Dr Julia Polak. If it is a medical issue, a medical certificate will be required. A late penalty may be imposed.
• Your assignment should show all relevant and brief working and reasoning, as marks will be given for method as well as for correct answers. Please spell check your document.
• Paste any relevant and brief R code and output into the appropriate places so that it can be seen easily along with your other work. Graphics from R can be resized within your document; make them smaller as necessary.
• Assignments count for 50% of the assessment in this subject. This one is worth 15%, and covers the work done in chapters 4 to 6.
• The number of marks given for each question may be fine-tuned. The total number of marks for this assignment is 45.
• Tutors will not help you directly with assignment questions. However, they may give some help with R.
• Solutions to the assignment questions will be made available later.
• When constructing a panel of graphs with multiple plots, it is good to use the R command par(mfrow = c(nrows,ncols)) where nrows is the number of rows and ncols the number of columns in the panel. The default is (1,1).

Q1 The following table of frequencies shows age at first pregnancy by incidence of cervical cancer diagnosed in women aged 50–59. Reference: Graham S and Shotz W (1979), Epidemiology of cancer of the cervix in Buffalo, J National Cancer Inst 63(1):23–27.

                          Control   Cervical cancer
Age at first      ≤ 25      203           42
pregnancy         > 25      114            7

(a) Enter the data into R and perform a chi-squared test of the association between age at first pregnancy and incidence of cervical cancer. Is this test justified here? Briefly explain.
(b) Perform a test of the association using Fisher’s exact test and compare your conclusion here to that from part (a). Explain briefly when the Fisher test would be preferred to the chi-squared test of association.
[6 + 6 = 12 marks]

Q2 Ophthalmologists from Victoria and Western Australia have surveyed children in the Western Desert in Western Australia to assess the prevalence and severity of trachoma. The data below come from two years of a longitudinal survey. There are six stages of trachoma, of increasing severity. In this study, children were observed to have trachoma up to the fourth stage. The data below show the stages of trachoma including an additional level: those with no signs of trachoma.

Stage                              1993   2003
None                                124    264
Stage 1: Follicular                  88     46
Stage 2: Intense inflammatory         7      3
Stage 3: Trachomatous scarring        0      2
Stage 4: Trichiasis                   2      0

(a) Perform a suitable test to examine the association between severity of trachoma and year of survey. What is your conclusion?
(b) Assess the validity of a politician’s claim that the prevalence (widespread presence) in 2003 was 20%.
[4 + 4 = 8 marks]

Q3 An investigator wished to determine whether epinephrine has the effect of elevating plasma cholesterol levels in humans. Twelve adult males were selected and given both a placebo and the drug. Blood samples were taken following injection of the placebo and again after injection of epinephrine. Analysis of the blood samples resulted in the following data:

Cholesterol Levels (mg/100mL)
subject   placebo   epinephrine
   1        178         184
   2        240         243
   3        210         210
   4        184         189
   5        190         200
   6        181         191
   7        156         150
   8        220         226
   9        210         220
  10        165         163
  11        188         192
  12        214         216

These data are also available in TRD=asst03data.csv on LMS.
(a) Formulate an appropriate statistical model, defining all the terms. State the null and two-sided alternative hypotheses which reflect the research question of interest.
(b) Enter the data into R, and calculate the means for placebo and epinephrine. Find a 95% confidence interval for the mean difference in cholesterol levels between the placebo and epinephrine. Use the confidence interval to test your null hypothesis.
(c) Would a 99% confidence interval contain zero? Briefly explain.
[4 + 4 + 2 = 10 marks]

Q4 Transient hypothyroxinemia is a common finding in premature infants. It is not thought to have long-term consequences, or to require treatment. A study was performed to investigate whether it might have long-term effects, and to this end, blood thyroxine values were obtained on routine screening in the first week of life for a sample of infants who weighed 2000g or less at birth and were born at 33 weeks gestation or earlier. These results will later be related to motor and cognitive development. Our aim here is to develop a model to estimate the thyroxin level for a specified gestational age. The data are available in (TRD=asstQ4data.csv) on LMS:

g.age   thyroxine
 30       8.1
 28       7.2
 31       9.2
 ...      ...

(a) Read the data into R and produce an appropriate graphical summary (with meaningful labels) of the relationship between thyroxin level and gestational age.
(b) Write down an appropriate statistical model for examining the relationship, and fit the model in R.
(c) i. Give a non-statistical interpretation of the coefficient of g.age.
    ii. Find a 95% confidence interval for this coefficient.
    iii. Is thyroxine level related to gestational age? Explain.
    iv. What percentage of the total variation in thyroxine level is explained by gestational age?
(d) A record of a new baby became available. Find an interval within which the thyroxine level of this premature baby of gestational age 31 weeks is likely to lie. Use 95% confidence.
(e) Examine appropriate diagnostic plots and comment on anything that is noteworthy or that may challenge the assumption of the model.
[2 + 4 + 4 + 2 + 3 = 15 marks]

Total marks = 45
Answered Same Day, Apr 30, 2021


Sourav answered on May 04 2021
Statistical Assignment:
Question: 1
Given information: the following frequency table (cross-tabulation):

                                   Cervical cancer diagnosed
                                   Control   Cervical cancer
Age at first pregnancy   <= 25       203           42
                         > 25        114            7
Solution [a]: Chi-Square Test
Null hypothesis, H0: There is no association between age at first pregnancy and incidence of cervical cancer.
Alternative hypothesis, H1: There is an association between age at first pregnancy and incidence of cervical cancer.
> x = c(203,42,114,7)
> mat = matrix(x,2,2,byrow = TRUE)
> colnames(mat) = c("Control","Cervical cancer")
> rownames(mat) = c("<= 25","> 25")
> chiq = chisq.test(mat,correct = FALSE)
> chiq$observed
      Control Cervical cancer
<= 25     203              42
> 25      114               7
> chiq
    Pearson's Chi-squared test
data: mat
X-squared = 9.0107, df = 1, p-value = 0.002684
Conclusion: The chi-square test on the cross-tabulation gives a p-value of 0.002684, which is less than 0.05. We therefore reject the null hypothesis of no association between age at first pregnancy and incidence of cervical cancer at the 5% level of significance, and conclude that the two attributes are not independent, i.e. there is a statistically significant association between them.
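As a cross-check on the output above, the Pearson statistic can be reproduced by hand from the observed counts. This is a minimal sketch only; the names obs, expected, stat, df and p_value are illustrative and do not appear in the original answer.

```r
# Observed 2 x 2 table from Question 1 (rows: age at first pregnancy)
obs <- matrix(c(203, 42, 114, 7), nrow = 2, byrow = TRUE,
              dimnames = list(c("<= 25", "> 25"),
                              c("Control", "Cervical cancer")))

# Expected counts under independence:
# (row total x column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-squared statistic without continuity correction
stat <- sum((obs - expected)^2 / expected)

# Degrees of freedom and upper-tail p-value
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

round(c(statistic = stat, df = df, p.value = p_value), 4)
```

The rounded values should agree with the X-squared, df and p-value reported by chisq.test(mat, correct = FALSE) above.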
Solution [b]: Fisher Exact Test
Null hypothesis, H0: There is no association between age at first pregnancy and incidence of cervical cancer.
Alternative hypothesis, H1: There is an association between age at first pregnancy and incidence of cervical cancer.
> fisher.test(mat)
    Fisher's Exact Test for Count Data
data: mat
p-value = 0.002937
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.1091870 0.6977165
sample estimates:
odds ratio
0.297605
Conclusion: The p-value of the test is 0.002937, which is less than 0.05, so we reject the null hypothesis of independence at the 5% level of significance and conclude that age at first pregnancy and cervical cancer are associated. This is the same conclusion as in part (a), and the two p-values are very close.
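The odds ratio in the Fisher output can be read alongside the simple cross-product odds ratio from the same table. The sketch below is illustrative only: the names obs, or_sample and ft are not from the original answer. Note that fisher.test reports a conditional maximum likelihood estimate, which is generally close to, but not identical to, the cross-product value.

```r
# Observed 2 x 2 table from Question 1
obs <- matrix(c(203, 42, 114, 7), nrow = 2, byrow = TRUE,
              dimnames = list(c("<= 25", "> 25"),
                              c("Control", "Cervical cancer")))

# Cross-product (sample) odds ratio: (a * d) / (b * c)
or_sample <- (obs[1, 1] * obs[2, 2]) / (obs[1, 2] * obs[2, 1])

# Conditional MLE of the odds ratio reported by Fisher's exact test
ft <- fisher.test(obs)

c(sample = or_sample, conditional_mle = unname(ft$estimate))

# Reciprocal: odds of cervical cancer for "<= 25" relative to "> 25"
1 / or_sample
```

With the table laid out this way, an odds ratio below 1 means the odds of cervical cancer are lower in the "> 25" group than in the "<= 25" group; the reciprocal expresses the same comparison the other way round.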
Comparison of Fisher's exact test and the chi-square test of independence: Both are used to determine whether there is an association between two categorical variables. The chi-square test is used when the sample size is large enough; its p-value is an approximation that becomes exact only as the sample size tends to infinity. Fisher's exact test, on the other hand, is used when the sample size is small; its p-value is exact rather than an approximation.
    The chi-square approximation is not reliable when the expected count in any cell of the contingency table is less than 5, and in that case Fisher's exact test is preferred. In our case,
> chiq$expected
       Control Cervical cancer
<= 25 212.1995        32.80055
> 25  104.8005        16.19945
none of the expected cell counts in the contingency table above is less than 5, so the chi-square test of independence is justified here and Fisher's exact test is not required.
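The rule of thumb used above can also be checked programmatically rather than by eye. The following is a small sketch under that rule of thumb (all expected counts at least 5); the names obs and expected are illustrative.

```r
# Observed 2 x 2 table from Question 1
obs <- matrix(c(203, 42, 114, 7), nrow = 2, byrow = TRUE,
              dimnames = list(c("<= 25", "> 25"),
                              c("Control", "Cervical cancer")))

# Expected counts as computed by chisq.test itself
expected <- chisq.test(obs, correct = FALSE)$expected

# Smallest expected count in the table
min(expected)

# TRUE when every expected count is at least 5
all(expected >= 5)
```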
Question: 2
Given information: a two-way table (cross-tabulation):

Stage                              1993   2003
None                                124    264
Stage 1: Follicular                  88     46
Stage 2: Intense inflammatory         7      3
Stage 3: Trachomatous scarring        0      2
Stage 4: Trichiasis                   2      0
Solution [a]:
Null hypothesis, H0: There is no association between the severity of trachoma and year of survey.
Alternative hypothesis, H1: There is an association between the severity of trachoma and year of survey.
> x = c(124,264,88,46,7,3,0,2,2,0)
> mat = matrix(x,5,2,byrow = TRUE)
> colnames(mat) = c("1993","2003")
> rownames(mat) = c("None","Stage 1: Follicular","Stage 2: Intense Inflammatory",
+ "Stage 3: Trachomatous scarring","Stage 4: Trichiasis")
> chiq = chisq.test(mat,correct = FALSE)
Warning message:
In chisq.test(mat, correct = FALSE) :
  Chi-squared approximation may be incorrect
> chiq$expected
                                      1993       2003
None                           159.9776119 228.022388
Stage 1: Follicular             55.2500000  78.750000
Stage 2: Intense Inflammatory    4.1231343   5.876866
Stage 3: Trachomatous scarring   0.8246269   1.175373
Stage 4: Trichiasis              0.8246269   1.175373
> fisher.test(mat)
    Fisher's Exact Test for Count Data
data: mat
p-value = 2.656e-12
alternative hypothesis: two.sided
Conclusion: Several expected cell counts are less than 5 (and R warns that the chi-squared approximation may be incorrect), so Fisher's exact test is the better choice for examining the association between the variables. Its p-value is 2.656e-12, which is far below 0.05, so we reject the null hypothesis of no association and conclude that the severity of trachoma and the year of survey are dependent.
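Another option when expected counts are small is a chi-squared test with a Monte Carlo p-value, which chisq.test supports through its simulate.p.value argument. The sketch below is an alternative to, not a replacement for, the Fisher test used in the answer; the name tab, the seed and the number of replicates B are arbitrary illustrative choices.

```r
# Trachoma stage by survey year (Question 2)
tab <- matrix(c(124, 264, 88, 46, 7, 3, 0, 2, 2, 0),
              nrow = 5, byrow = TRUE,
              dimnames = list(c("None", "Stage 1", "Stage 2",
                                "Stage 3", "Stage 4"),
                              c("1993", "2003")))

# Fix the seed so the simulated p-value is reproducible
set.seed(1)

# Chi-squared test with a p-value simulated from B random tables
# that share the observed row and column totals
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 9999)

sim$statistic
sim$p.value
```

The simulated p-value cannot be smaller than 1 / (B + 1), so it will not reproduce the very small Fisher p-value exactly, but it should lead to the same decision at the 5% level.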
Solution [b]: Prevalence is the proportion of the population with a given disease or condition at a specified time or over a specified period. Period prevalence is defined as the number of cases that existed in a given period divided by the number of people in the population during that period.
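H0: The prevalence in 2003 is 20%, i.e. p = 0.20.
H1: The prevalence in 2003 is not 20%, i.e. p ≠ 0.20.
Here x = 46 + 3 + 2 + 0 = 51 is the number of children with any stage of trachoma in 2003, and n = 264 + 51 = 315 is the total number of children surveyed in 2003.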
> x = 51
> n = 315
> prop.test(x,n,p=0.2,correct = FALSE)
    1-sample proportions test without continuity correction
data: x out of n, null...
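The counts passed to prop.test can be derived directly from the 2003 column of the table, and an exact binomial test can serve as a cross-check on the large-sample test. This is a hedged sketch: the names counts_2003, p_hat and exact are illustrative, and binom.test is offered as an alternative rather than as the method used in the original answer.

```r
# 2003 column of the trachoma table (Question 2)
counts_2003 <- c(None = 264, Stage1 = 46, Stage2 = 3, Stage3 = 2, Stage4 = 0)

# Children surveyed in 2003, and those with any stage of trachoma
n <- sum(counts_2003)
x <- n - counts_2003[["None"]]

# Observed prevalence in 2003
p_hat <- x / n
p_hat

# Exact binomial test of H0: p = 0.20 (two-sided)
exact <- binom.test(x, n, p = 0.2)
exact$p.value
exact$conf.int
```

If 0.20 lies inside the 95% confidence interval, the claimed prevalence of 20% is not contradicted by the data at the 5% level; if it lies outside, the claim is rejected.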