Answer To: MAST90044 Thinking and Reasoning with Data Semester 1 2020 Assignment 2 Due: 9 am, Monday 18 May •...
Sourav answered on May 04 2021
Statistical Assignment:
Question: 1
Given information: the following frequency table (cross-tabulation):

                          Cervical cancer diagnosed
Age at 1st pregnancy      Control    Cervical cancer
<= 25                       203            42
> 25                        114             7
Solution [a]: Chi-Square Test
Null hypothesis, H0: There is no association between age at first pregnancy and the incidence of cervical cancer.
Alternative hypothesis, H1: There is an association between age at first pregnancy and the incidence of cervical cancer.
> x = c(203,42,114,7)
> mat = matrix(x,2,2,byrow = TRUE)
> colnames(mat) = c("Control","Cervical cancer")
> rownames(mat) = c("<= 25","> 25")
> chiq = chisq.test(mat,correct = FALSE)
> chiq$observed
      Control Cervical cancer
<= 25     203              42
> 25      114               7
> chiq
Pearson's Chi-squared test
data: mat
X-squared = 9.0107, df = 1, p-value = 0.002684
Conclusion: The chi-square test on the cross-tabulation gives a p-value of 0.002684, which is less than 0.05. We therefore reject the null hypothesis of no association between age at first pregnancy and the incidence of cervical cancer at the 5% level of significance, and conclude that the two attributes are not independent, i.e. there is a statistically significant association between them.
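The statistic reported by chisq.test can be reproduced by hand, which makes the calculation behind the conclusion explicit. The following is a minimal sketch, not part of the original solution; the variable names (obs, expected, x2, p_value) are chosen here for illustration only.

```r
# Observed counts from the table (rows: age at first pregnancy; columns: diagnosis)
obs <- matrix(c(203, 42, 114, 7), nrow = 2, byrow = TRUE,
              dimnames = list(c("<= 25", "> 25"),
                              c("Control", "Cervical cancer")))

# Expected count under independence: (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected
x2 <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# p-value: upper-tail probability of the chi-square distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

round(c(X.squared = x2, df = df, p.value = p_value), 6)
```

The values should agree with the X-squared, df and p-value printed by chisq.test(mat, correct = FALSE) above.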
Solution [b]: Fisher Exact Test
Null hypothesis, H0: There is no association between age at first pregnancy and the incidence of cervical cancer.
Alternative hypothesis, H1: There is an association between age at first pregnancy and the incidence of cervical cancer.
> fisher.test(mat)
Fisher's Exact Test for Count Data
data: mat
p-value = 0.002937
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.1091870 0.6977165
sample estimates:
odds ratio
0.297605
Conclusion: Since the p-value of the test is 0.002937, which is less than 0.05, we reject the null hypothesis of independence at the 5% level of significance and conclude that age at first pregnancy and cervical cancer are associated.
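To show where the exact p-value comes from, the sketch below rebuilds it from the hypergeometric distribution: with all margins fixed, the top-left cell is hypergeometric under H0, and the two-sided p-value adds up the probabilities of every table that is no more likely than the observed one. This is an illustration under that convention, not the author's code; the names (support, probs, p_two_sided, or_sample) are introduced here.

```r
obs <- matrix(c(203, 42, 114, 7), nrow = 2, byrow = TRUE)

m <- sum(obs[, 1])    # column 1 total (controls)
n <- sum(obs[, 2])    # column 2 total (cervical cancer)
k <- sum(obs[1, ])    # row 1 total (age <= 25)
x_obs <- obs[1, 1]    # observed top-left cell

# All values the top-left cell can take once the margins are fixed
support <- max(0, k - n):min(k, m)

# Hypergeometric probability of each possible table under H0
probs <- dhyper(support, m, n, k)
p_obs <- dhyper(x_obs, m, n, k)

# Two-sided p-value: tables no more probable than the observed one
# (a small relative tolerance guards against floating-point ties)
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

# Sample (cross-product) odds ratio, for comparison with the test output
or_sample <- (obs[1, 1] * obs[2, 2]) / (obs[1, 2] * obs[2, 1])

c(p.value = p_two_sided, sample.odds.ratio = or_sample)
```

Note that fisher.test reports a conditional maximum likelihood estimate of the odds ratio, so its value can differ slightly from the cross-product ratio computed here.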
Comparison of Fisher’s exact test and the chi-square test for independence: Both are used to determine whether there is an association between two categorical variables. The chi-square test is used when the sample size is large enough; its p-value is an approximation that becomes exact only as the sample size tends to infinity. Fisher’s exact test, on the other hand, is used when the sample size is small; its p-value is exact rather than an approximation.
The chi-square approximation is not good enough when the expected count in any cell of the contingency table is less than 5; in that case Fisher’s exact test is preferred. In our case,
> chiq$expected
       Control Cervical cancer
<= 25 212.1995        32.80055
> 25  104.8005        16.19945
none of the expected cell counts in the contingency table above is less than 5, so we prefer the chi-square test for independence.
Question: 2
Given information: a two-way table (cross-tabulation),

                                   1993    2003
None                                124     264
Stage 1: Follicular                  88      46
Stage 2: Intense Inflammatory         7       3
Stage 3: Trachomatous scarring        0       2
Stage 4: Trichiasis                   2       0
Solution [a]:
Null hypothesis, H0: There is no association between the severity of trachoma and the year of survey.
Alternative hypothesis, H1: There is an association between the severity of trachoma and the year of survey.
> x = c(124,264,88,46,7,3,0,2,2,0)
> mat = matrix(x,5,2,byrow = TRUE)
> colnames(mat) = c("1993","2003")
> rownames(mat) = c("None","Stage 1: Follicular","Stage 2: Intense Inflammatory",
+ "Stage 3: Trachomatous scarring","Stage 4: Trichiasis")
> chiq = chisq.test(mat,correct = FALSE)
Warning message:
In chisq.test(mat, correct = FALSE) :
  Chi-squared approximation may be incorrect
> chiq$expected
                                      1993       2003
None                           159.9776119 228.022388
Stage 1: Follicular             55.2500000  78.750000
Stage 2: Intense Inflammatory    4.1231343   5.876866
Stage 3: Trachomatous scarring   0.8246269   1.175373
Stage 4: Trichiasis              0.8246269   1.175373
> fisher.test(mat)
Fisher's Exact Test for Count Data
data: mat
p-value = 2.656e-12
alternative hypothesis: two.sided
Conclusion: Since some of the expected cell counts are less than 5, it is better to use Fisher’s exact test to assess the association between the variables. The p-value is 2.656e-12, which is less than 0.05, so we reject the null hypothesis of no association and conclude that the severity of trachoma and the year of survey are dependent.
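The choice between the two tests rests on the expected counts, so it can help to summarise them directly. The sketch below uses a hypothetical helper, expected_summary, which is not part of the original solution; it reports the smallest expected count and how many cells fall below 5.

```r
# Hypothetical helper (an assumption for illustration, not from the original):
# summarise the expected counts of a contingency table under independence
expected_summary <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  list(min_expected = min(expected),      # smallest expected count
       n_below_5    = sum(expected < 5),  # number of cells with expected count < 5
       n_cells      = length(expected))   # total number of cells
}

# Trachoma severity (rows) by survey year (columns: 1993, 2003)
trachoma <- matrix(c(124, 264, 88, 46, 7, 3, 0, 2, 2, 0),
                   nrow = 5, byrow = TRUE)

res <- expected_summary(trachoma)
res
```

Applied to the trachoma table, the summary flags the same small expected counts that trigger the warning from chisq.test.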
Solution [b]: Prevalence is the proportion of the population with a given disease or condition over a specific period of time. Period prevalence is defined as the number of cases that existed in a given period divided by the number of people in the population during that period.
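The definition can be applied to the survey table as a quick check. This sketch assumes that every stage other than "None" counts as a case, which is consistent with the x = 51 and n = 315 used for 2003 in this solution; the shortened row labels and variable names are illustrative only.

```r
# Counts by severity (rows) and survey year (columns)
counts <- matrix(c(124, 264, 88, 46, 7, 3, 0, 2, 2, 0),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(c("None", "Stage 1", "Stage 2",
                                   "Stage 3", "Stage 4"),
                                 c("1993", "2003")))

# Assumption: a "case" is anyone not in the "None" row
cases    <- colSums(counts[-1, ])   # number of cases per year
examined <- colSums(counts)         # number of people examined per year

# Prevalence = cases / people examined
prevalence <- cases / examined
round(prevalence, 4)
```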
H0: The prevalence rate in 2003 is 20%, i.e. P0 = Pe, or P0 = 20%.
H1: The prevalence rate in 2003 is not 20%, i.e. P0 ≠ Pe, or P0 ≠ 20%.
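The statistic for these hypotheses can be sketched by hand as a one-sample z-test for a proportion. This assumes 51 cases among 315 people examined in 2003 (46 + 3 + 2 + 0 out of the 2003 column total) and is an illustration, not the author's code; the names p_hat, se0 and z are introduced here.

```r
x  <- 51     # cases in 2003 (all stages other than "None"; an assumption)
n  <- 315    # people examined in 2003
p0 <- 0.20   # hypothesised prevalence under H0

p_hat <- x / n                     # observed prevalence
se0   <- sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
z     <- (p_hat - p0) / se0        # z statistic

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

c(p_hat = p_hat, z = z, z_squared = z^2, p_value = p_value)
```

The squared z statistic equals the X-squared value that prop.test reports when correct = FALSE, so the two approaches give the same p-value.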
> x = 51
> n = 315
> prop.test(x,n,p=0.2,correct = FALSE)
1-sample proportions test without continuity correction
data: x out of n, null...