Section A
1.
Does the overlap of the confidence intervals mean that the evidence should have made Alison’s
suspicion about the link between the ink and cancer less strong? Explain why, or why not.
(Set aside, for the moment, the fact that the confidence intervals are badly made.) An overlap of
the confidence intervals is not a reason, of itself, to think that data do not support a difference between
groups. Each confidence interval relates to the mean of the population from which the sample is
drawn, not to the mean of another group or to the extremes of another confidence interval. When there
are two 95% confidence intervals from two independent sets of observations, the intervals will
typically overlap by about a quarter of their length when the P-value for a test of the difference
between the independent means is about 0.05; when the two 95% confidence intervals just touch, the
corresponding P-value is about 0.01. Notice, however, that these confidence intervals overlap by
considerably more than a quarter, so the corresponding P-value would not be very small.
The data are consistent with Alison’s suspicion about a link between ink and cancer and so
Alison’s suspicion should not be weakened by the data. A failure to find strong evidence for a
difference is not the same thing as finding evidence for no difference. It would be a mistake to say
that the data argue against a link between ink-exposure and cancer even if they don’t argue forcefully
in favour of the link.
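As an illustration only, here is a minimal Python sketch (using made-up summary statistics, not
Alison's data) of the point that overlapping 95% confidence intervals do not imply a non-significant
difference:

    # Made-up summary statistics for two independent groups.
    from scipy import stats

    mean1, sd1, n1 = 10.0, 5.0, 20
    mean2, sd2, n2 = 14.0, 5.0, 20

    # 95% confidence interval for each mean separately
    for m, s, n in [(mean1, sd1, n1), (mean2, sd2, n2)]:
        half = stats.t.ppf(0.975, n - 1) * s / n ** 0.5
        print(f"mean {m}: 95% CI ({m - half:.2f}, {m + half:.2f})")

    # Two-sample t-test from the same summaries
    t, p = stats.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2)
    print(f"two-sample t-test: t = {t:.2f}, P = {p:.3f}")
    # The two intervals overlap (roughly 11.7 to 12.3) yet P is about 0.016.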
2.
There are many ways to calculate confidence intervals for binomial proportions. What method did
Alison use? Explain why that method might or might not be considered appropriate in this case?
Alison used the normal approximation method that is often called the Wald method. That
method was proposed because it was relatively difficult to calculate other, more appropriate, intervals.
Nowadays the better intervals are trivially easy to calculate and so the Wald method should be
discarded.
The Wald method works OK (i.e. has reasonably good coverage of the true value) when (i) the
sample size is large (statements regarding how large vary wildly!) and (ii) the observed proportion is
not near 1 or 0. In this case the sample size is not large and one of the proportions is very close to 0,
so the method is a very bad choice. The Wald intervals can be expected to be too narrow (particularly
for the 1/20 proportion) and to have unreliable coverage properties, which is to say that the true value
of the parameter will fall outside the interval more often than the nominal 5% of the time.
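For illustration, a minimal sketch of the Wald calculation (the 1/20 proportion is stated above; a 5/20
proportion is assumed here purely to match the table in the next answer):

    from math import sqrt

    def wald_ci(successes, n, z=1.96):
        """Wald 95% interval: p-hat +/- z*sqrt(p-hat*(1 - p-hat)/n)."""
        p = successes / n
        half = z * sqrt(p * (1 - p) / n)
        return p - half, p + half

    for x in (1, 5):
        lo, hi = wald_ci(x, 20)
        print(f"{x}/20: Wald 95% CI ({lo:.3f}, {hi:.3f})")
    # 1/20 gives roughly (-0.046, 0.146): far too narrow on the upper side,
    # and it strays below zero, which is impossible for a proportion.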
3.
Calculate 95% confidence intervals using a different method. Name the method and explain any
advantages it offers over the method used by Alison. (Cite any necessary references for this
question.)
Here are the most common better alternatives for 95% confidence intervals of proportions:

    Method                           1/20 proportion    5/20 proportion
    Wilson’s interval (‘score’)      0.009, 0.236       0.112, 0.469
    Wilson (continuity correction)   0.003, 0.269       0.096, 0.494
    Clopper-Pearson (‘exact’)        0.001, 0.249       0.087, 0.491
    Agresti-Coull                    -0.009, 0.254      0.108, 0.473

I would consider all to be good enough, and there are other good enough methods besides. Notice
that all give much higher upper limits for the 1/20 proportion than the Wald method.
The advantages are largely the absence of the disadvantages of the Wald method, but there are a
couple of other minor specifics. The Wilson methods offer almost exact coverage rates on average,
and the Clopper-Pearson method offers coverage that is never less than the nominal 95%. The Agresti-
Coull method is easier to calculate by hand as it is a correction of the Wald method.
Cited papers could be any reliable-sounding sources, but notice that there is a paper on this
topic on the LMS: the famous (!) Ludbrook & Lew (2009).
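Intervals like those in the table can be obtained with standard software; for example, a sketch using
statsmodels (which does not offer the continuity-corrected Wilson interval through this particular
function; counts of 1/20 and 5/20 are assumed, matching the table above):

    from statsmodels.stats.proportion import proportion_confint

    for count in (1, 5):
        # 'beta' is the Clopper-Pearson ('exact') interval
        for method in ("wilson", "beta", "agresti_coull"):
            lo, hi = proportion_confint(count, 20, alpha=0.05, method=method)
            print(f"{count}/20 {method:>14}: ({lo:.3f}, {hi:.3f})")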
Section B
4.
What potential problems do you see in James’s data, and how would you investigate them if you had
access to James and his notebook?
The problem in James’s data is an apparently aberrant value in technical rep 3 of jar 4 before
the intervention. It is clearly an ‘outlier’ and is responsible for both the high mean and the inflated
standard deviation of jar 4 before the intervention.
(There is also an inconsequential rounding error in one of the standard deviations.)
If one had access to James’s lab notebook one would look for evidence of a mistake in the value,
such as a transcription error or a decimal-place error. If the value were correct then one might expect a
note in the book saying “OMG” or the like. The fact that the stem mentions hand-written values and
transcription into Excel serves as both a clue and a reason to suspect such an error.
5.
Assuming that you have identified one or more problems in the data, how would you address it or
them? Provide a justification for your response.
The outlier can be addressed either by omission or by correction. If a correct value can be
ascertained from the lab notebook then that would be best. Otherwise, omission is probably justifiable
because it is only a technical replicate and its omission should not substantially bias the overall result.
Correction to 1.11 is reasonable because an out-by-ten error is easily made and that value fits well
within the overall dataset. Similarly, omission of that one technical replicate has little impact on the
overall dataset.
Given the apparently cheap and easy nature of the experiment, a second type of response to the
problem would be to simply repeat the whole experiment.
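A hypothetical illustration (the actual replicate values are not reproduced here) of how a single
decimal-place error inflates a jar's mean and standard deviation, and of what correction or omission
does to the summary:

    import statistics as st

    reps_recorded = [1.09, 1.12, 11.1]    # hypothetical: third rep entered as 11.1
    reps_corrected = [1.09, 1.12, 1.11]   # assuming the intended value was 1.11
    reps_omitted = [1.09, 1.12]           # outlier simply dropped

    for label, reps in [("as recorded", reps_recorded),
                        ("corrected", reps_corrected),
                        ("omitted", reps_omitted)]:
        print(f"{label:>12}: mean = {st.mean(reps):.2f}, SD = {st.stdev(reps):.2f}")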
6.
There are several variants of Student’s t-test. Which do you think would be most appropriate for these
data? Why?
The experimental design is paired in that there are before and after measurements for each jar.
The variability between jars can be expected to be present in both the before and after measurements,
so a paired analysis uses that within-jar information to reduce the variability of the test statistic and
should be more powerful than an unpaired analysis. Thus a paired (dependent samples) Student’s
t-test would be most appropriate.
If it were decided not to use a paired test then Welch’s test (Welch-Satterthwaite) is probably the
best choice, as the variability between jars is greater in the after group than in the before group and an
increase in variability is a predictable consequence of the experimental procedure.
(The fact that an ANOVA may be appropriate is not relevant to this question.)
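A minimal sketch with invented before/after viscosity values (not James's data) showing the paired
test alongside the unpaired Welch alternative:

    from scipy import stats

    before = [1.10, 1.05, 1.12, 1.08, 1.11, 1.07]   # hypothetical jar viscosities
    after  = [1.35, 1.28, 1.42, 1.30, 1.38, 1.25]   # hypothetical, after the thickener

    t_paired, p_paired = stats.ttest_rel(after, before)
    t_welch, p_welch = stats.ttest_ind(after, before, equal_var=False)

    print(f"paired t-test:  t = {t_paired:.2f}, P = {p_paired:.2g}")
    print(f"Welch's t-test: t = {t_welch:.2f}, P = {p_welch:.2g}")
    # Because the jar-to-jar variability is shared between the two columns,
    # the paired test gives the smaller P-value here.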
7.
Reanalyse the data (after dealing with any data problems that you identified) and provide a full
interpretation of the results.
The re-analysis should deal with the outlier by omission or by correction to a sensible value of
1.1, 1.11, or the average of the other two technical replicates.
The analysis indicated in question 6 will be implemented in most cases, but a permutation test or
a (paired) Wilcoxon signed-rank test would be fine, and an ANOVA may alternatively be used as long
as the pairing in the data is respected. It would also be possible to use a confidence interval of the
difference between the means in place of a significance or hypothesis test, but confidence intervals of
the before and after values would not be very helpful (see question 1).
Most of the value of this question lies in the interpretation of the results. A bald statement of
significance or non-significance, or a P-value, would be inadequate. The results should be interpreted
in terms of the effect of the treatment on the viscosity of the mayonnaise, and a dichotomisation into
effective and not effective is not very useful.
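A sketch of the kind of re-analysis and interpretation intended, again with invented values standing
in for James's corrected data:

    import numpy as np
    from scipy import stats

    before = np.array([1.10, 1.05, 1.12, 1.08, 1.11, 1.07])   # hypothetical
    after  = np.array([1.35, 1.28, 1.42, 1.30, 1.38, 1.25])   # hypothetical

    diffs = after - before
    t, p = stats.ttest_rel(after, before)
    half = stats.t.ppf(0.975, len(diffs) - 1) * stats.sem(diffs)
    print(f"mean increase in viscosity: {diffs.mean():.3f} "
          f"(95% CI {diffs.mean() - half:.3f} to {diffs.mean() + half:.3f}), P = {p:.2g}")

    # A (paired) Wilcoxon signed-rank test is an acceptable alternative.
    w, p_w = stats.wilcoxon(after, before)
    print(f"Wilcoxon signed-rank: P = {p_w:.2g}")
    # The interpretation should focus on the size of the increase and its
    # uncertainty, not on a bare declaration of 'significant'.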
8.
Assuming you were asked to re-test the effect of the thickener on the viscosity of mayonnaise, what
changes would you make to the protocol and or analyses? Provide a full justification for each of
your suggestions.
Controls: The protocol does not include a control for the stirring, so a very important change
would be to add a control group that includes stirring without the thickener (a ‘vehicle’ addition is
ideal). The most straightforward analysis would then be to compare the post-stirring groups, but an
ANOVA would allow for a comparison of the effects of stirring with and without the thickener.
Many students are likely to suggest increasing the sample size, but judging from the data in hand
the effect is fairly large and so it doesn’t need a larger sample. (Large samples are not uniformly
‘better’ than small.)
Because the agent is allegedly a thickener it would be reasonable to use a one-sided test. The
down-side of that would be that some readers might be skeptical about whether the decision to use a
one-tailed test was made before the experiment, as should be the case to avoid inflation of type I error
rates.
The practical importance of the results would be enhanced by testing a range of doses of the
thickener.
Alterations to the buying strategy are not very important.
Inclusion of other brands of mayonnaise might extend the scope of inference, but may be
irrelevant to the aim of the experiment.
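For what it is worth, a sketch of one way the redesigned comparison could be analysed, with an
invented stirring-only (vehicle) control group and invented change-in-viscosity values; comparing
the changes between the groups isolates the effect of the thickener from any effect of the stirring
itself, and corresponds to the interaction in the ANOVA mentioned above:

    from scipy import stats

    change_vehicle   = [0.02, -0.01, 0.03, 0.00, 0.01, 0.02]   # after - before, stirring only
    change_thickener = [0.25, 0.23, 0.30, 0.22, 0.27, 0.18]    # after - before, with thickener

    t, p = stats.ttest_ind(change_thickener, change_vehicle, equal_var=False)
    print(f"thickener vs vehicle control (Welch): t = {t:.2f}, P = {p:.2g}")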
Section C
9.
The standard deviations of the treated and untreated groups are very different. What
consequences might that hold for statistical analysis of these data? What approaches might
be considered to deal with it?
The consequence of a difference in variance between the groups is that statistical test methods that
assume equal variances will not perform ‘as advertised’. In most cases what happens is that the power
of the test to detect, or provide evidence of, an effect of the intervention is lower than expected.
(When sample size varies between the groups it is possible for the false positive error rate for a
procedure to increase when there is unequal variance in the groups. That is not a concern here.)
Two categories of approach to deal with the problem are worth considering. First, use a method
that is not adversely affected by the disparity in variance and, second, transform the data to ameliorate
the difference in variance.
In the first category, use of a Mann-Whitney U-test in place of a Student’s t-test would
eliminate the influence of the disparity in variance. That suggestion will be somewhat controversial,
as it is often said that the Mann-Whitney test assumes equal population shapes which would imply
equal variance. However, that assumption is only necessary where it is desired to interpret the
outcomes in terms of equality of the population medians. If an interpretation of ‘stochastic
dominance’ will suffice (as it will for us here) then the Mann-Whitney is unaffected by unequal
variance. Other methods that utilise the ranks of data in place of the values will also be unaffected by
the variance difference, but that is not true of all ‘non-parametric’ methods. It is the use of ranks that
eliminates the problem, not the non-parametric-ness.
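A sketch with invented treated/untreated values of unequal spread, showing one option from each
of the categories mentioned above: a rank-based test, and a variance-stabilising (log) transformation
before a Welch t-test:

    import numpy as np
    from scipy import stats

    untreated = np.array([1.1, 1.3, 0.9, 1.2, 1.0, 1.1, 1.2, 0.8])   # hypothetical, small SD
    treated   = np.array([1.5, 3.2, 0.9, 4.1, 2.0, 5.6, 1.2, 2.8])   # hypothetical, large SD

    # Rank-based: Mann-Whitney U, interpreted as stochastic dominance
    u, p_u = stats.mannwhitneyu(treated, untreated, alternative="two-sided")
    print(f"Mann-Whitney U: P = {p_u:.3f}")

    # Transformation: a log transform often tames the larger spread
    t, p_t = stats.ttest_ind(np.log(treated), np.log(untreated), equal_var=False)
    print(f"Welch t-test on log-transformed data: P = {p_t:.3f}")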




