PRINCIPLES OF STATISTICAL INFERENCE (PSI) Biostatistics Collaboration of Australia 5.1 PRINCIPLES OF STATISTICAL INFERENCE (PSI) Module 5 (Chapter 7) Practical Exercises The first 4 of these exercises...

It is exercise 5 only in the module 5 pdf that I need answered.


PRINCIPLES OF STATISTICAL INFERENCE (PSI)
Biostatistics Collaboration of Australia
Module 5 (Chapter 7) Practical Exercises

The first 4 of these exercises parallel the development in Chapter 7, while the last exercise is based on the Extended Example. You will find that these exercises have a different flavour from many of the previous modules, reflecting the fact that Bayesian inference is about model specification and calculation of posterior probabilities. It is not concerned with the creation of estimators and test statistics whose properties are then examined in repeated sampling. (Which perhaps begs the question as to whether the process of conducting analyses as a Bayesian might have poor frequentist properties, another of those bigger issues that we don’t have space for in this course!)

For many of the probability calculations (mostly relating to the normal distribution) it is recommended that you use a computer program. Excel is probably the simplest option but any software of your choice is fine.

The required exercise for submission is Exercise 5.

Exercise 1: The simplest example of Bayes’ Rule is the calculation of probabilities relating to just two alternative values of a “parameter” based on a single binary observation. This calculation has a classical application in clinical medicine in obtaining the predictive value of a test result for a patient. This is not a statistical application, in the sense that the “data” consist of just one observation, the patient’s test result. Nevertheless, the calculation and thinking behind it are very important and connect directly to the statistical use of Bayes’ Rule. Suppose the patient either has a disease (D+) or does not (D-), and a diagnostic test can produce a positive (T+) or negative (T-) result.

(a) After the test has been done, what is the probability that the patient has the disease? Write a formula for the two probabilities, corresponding to the two possible results, using Bayes’ Rule to express each probability in terms of the properties of the test, sensitivity (sn) = Pr(T+|D+) and specificity (sp) = Pr(T-|D-), and the pre-test probability that the patient has the disease, Pr(D+). Often the latter quantity is referred to as the prevalence of the disease: why?

(b) Now assume that the test has sensitivity of 90% and specificity of 70%, and suppose that the patient has a positive test result. Create a table and/or a graph showing the posterior probability of disease (“positive predictive value”) for the following range of pre-test probabilities: Pr(D+) = 0.001, 0.01, 0.1, 0.2, 0.5, 0.8. For which of these values is the positive test result most useful? In what sense? Can the test result (the “data”) be meaningfully interpreted for any individual patient without using some assumption about the pre-test probability of disease?

Exercise 2: The beta distribution arises naturally in Bayesian estimation of proportions, as explained in the notes.

(a) Show that the uniform distribution is a beta distribution with α = β = 1. If this distribution is used as a prior distribution for estimating a prevalence from data represented as an observed proportion p and a sample size n, what are the parameters of the resulting beta posterior distribution?

(b) Using the facts about the beta distribution given in the textbook Appendix, what are the mean, mode, and variance of the posterior distribution?

(c) What is the expected value of a new observation from the population, i.e. what is the probability that a randomly selected individual will be positive on the outcome, given the data already collected? In particular, what is the expected value for the extreme cases where either all or none of the existing sample has been positive?

(d) Suppose we take a sample of 100 children and 20 of them have a history of asthma. What is the posterior probability that the prevalence of asthma history in the population is greater than 15%? Find an answer under each of the following beta prior distributions:

(i) Uniform (α = β = 1)
(ii) α = β = 0.5
(iii) α = β = 0 (is this a proper prior distribution?)
(iv) α = 1, β = 4
(v) α = 1, β = 9

Interpret the results in each case; you should find it helpful to sketch the prior and posterior distributions (remember, to do this you don’t need to worry about the constant term or normalising constant), and to use the properties of the beta distribution. How much influence do these prior distributions have? How strong are the prior distributions, in terms of the prior beliefs they embody?

N.B. For the actual calculation, you will need to evaluate the incomplete beta integral, which may sound formidable but it is no more formidable than calculating normal curve probabilities if you have an appropriate table or computer program. In this case the most convenient option will probably be to use MS Excel (see the “BETA.DIST” function).

(e) Confirm, algebraically (in general) or numerically (for each case), that the posterior mean for the prevalence lies between the prior mean and the empirical proportion, p. This is a general feature of Bayesian inference: the posterior distribution is centred at a location that is a compromise between the location of the prior distribution and the information in the data. How is this compromise controlled? Is the posterior mean closer to the observed proportion or to the prior mean?

Exercise 3: The estimation of a normal mean using the conjugate normal prior distribution…

(a) Fill in the missing algebraic steps for the expressions for P(θ|y) in section 7.5.1 of the textbook. This will involve the ancient art of “completing the square” which you may have to dig out from your high-school maths!

(b) Show that the posterior mean µ1 in (a) can be re-arranged into the following two alternative forms:

$$\mu_1 = \mu_0 + (y - \mu_0)\,\frac{\tau_0^2}{\sigma^2 + \tau_0^2}$$

and

$$\mu_1 = y + (\mu_0 - y)\,\frac{\sigma^2}{\sigma^2 + \tau_0^2}.$$

The first expression shows the posterior mean as the prior mean plus an adjustment towards the observed y, while the second shows the posterior mean as the observed y with an adjustment towards the prior mean. Both again indicate the essential feature of Bayesian estimation, that of providing “shrinkage estimates” or compromise between sources of information.

Exercise 4: Refer back to the fertility study example discussed in previous modules. Recall that the data consist of n couples who take a total of y attempts to achieve a pregnancy (in this example none fail to become pregnant).

(a) Show that the beta distribution is a conjugate prior for estimating the parameter p (here we only consider the case of a single group). Is there any difference between this problem and the prevalence estimation problem? This is an example where two problems have different sampling models (in the one case, binomial, in the other, geometric), but the likelihood functions from the two models are actually identical (ignoring the normalizing constants). It follows therefore that inferences about the “success rate” p are the same, from a likelihood or Bayesian point of view, whether the data arise from a binomial model (sampling n cases and recording x events) or from a geometric model (sampling n events and recording y trials).

(b) With reference to Example 4.2 and Figure 4.2 in the textbook, how does a Bayesian analysis get around the fact that the MLE is at the boundary of the parameter space, i.e. at p = 1? You might experiment with some alternative prior distributions, but ultimately the main answer should be that the Bayesian analysis does not seek to produce “an estimate” but a posterior distribution (which may be summarized in a variety of ways).
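The two shrinkage forms in Exercise 3(b) can be checked numerically before attempting the algebra. The sketch below is only an illustration: it assumes the usual conjugate result for a single observation y with known variance σ² and a normal prior with mean µ0 and variance τ0² (the textbook's section 7.5.1 expression is not reproduced here), and the function name and input values are hypothetical.

```python
def posterior_mean_forms(y, sigma2, mu0, tau0_2):
    """Return the posterior mean computed three algebraically equivalent ways.

    Assumes one observation y ~ N(theta, sigma2) and prior theta ~ N(mu0, tau0_2).
    """
    # Precision-weighted average of the prior mean and the observation.
    weighted = (mu0 / tau0_2 + y / sigma2) / (1 / tau0_2 + 1 / sigma2)
    # Prior mean plus an adjustment towards the observed y.
    from_prior = mu0 + (y - mu0) * tau0_2 / (sigma2 + tau0_2)
    # Observed y plus an adjustment (shrinkage) towards the prior mean.
    from_data = y + (mu0 - y) * sigma2 / (sigma2 + tau0_2)
    return weighted, from_prior, from_data


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the exercises.
    print(posterior_mean_forms(y=1.0, sigma2=4.0, mu0=0.0, tau0_2=1.0))
```

All three values agree up to floating-point rounding. With a prior variance smaller than the data variance, as in these made-up inputs, the result sits closer to the prior mean than to y.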
Exercise 5: This exercise is based on the Extended Example. Before attempting the questions below, make sure you have worked through the details of the Extended Example, including the checking of calculations where indicated. In your solution show any formula that you have used.

Now suppose that the study described were continued until the sample size had doubled, with the resulting data shown in the following Table:

Table. Observed data for (continued) randomised trial of HIV therapies

                Monotherapy    Combination therapy    Total
Response        x0 = 90        x1 = 114               204
No response     108            88                     196
Total           n0 = 198       n1 = 202               n = 400

(a) Work out the 3 (approximate) posterior distributions for the difference in response rates, as in the Extended Example notes, based on the new data, and obtain the posterior probability of the difference being greater than 0, 0.05 and 0.1, as in Table 2 of the notes. How do the interpretations change, from the perspective of the “sceptic” and the “enthusiast”?

(b) Suppose you wish to adopt what you regard as a “realistic” prior distribution, which is normal in shape, but allows a priori for a 20% chance that the difference in rates is negative (i.e. favours the control monotherapy) and a 20% chance that it is greater than 0.1. Figure out the appropriate parameters for this normal prior distribution, and work these through to obtain the corresponding posterior distribution and posterior probabilities as before. How different are the conclusions from this fourth prior distribution to those obtained under each of the previous three prior distributions?

HINT: For this exercise, it will be convenient to set up a spreadsheet in Microsoft Excel or similar or code in a software package to implement the formulas required to perform the various calculations.
A final comment: you should have observed that with the larger sample size of these new data, the differences between the results under the range of prior distributions are less pronounced than they were for the smaller sample size. This illustrates a very important general fact: the more data that accumulate, the less important the prior distribution. All approaches to statistics agree that large sample sizes are the best strategy for accumulating reliable evidence!
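For Exercise 5(b), the calculation the hint suggests setting up in a spreadsheet could also be coded along the following lines. This is a sketch under stated assumptions, not the course's solution: it assumes the likelihood for the difference in response rates is approximated by a normal distribution centred at p1 − p0 with variance p0(1 − p0)/n0 + p1(1 − p1)/n1, and that the prior is combined with it by the standard normal-normal update. The function names are hypothetical, and the priors of the “sceptic” and the “enthusiast” are not included because their parameters are given only in the Extended Example notes.

```python
from statistics import NormalDist


def normal_prior_from_tails(lower, upper, tail_prob):
    """Normal prior with Pr(theta < lower) = Pr(theta > upper) = tail_prob."""
    mean = (lower + upper) / 2  # equal tails make the prior symmetric about the midpoint
    z = NormalDist().inv_cdf(1 - tail_prob)  # standard normal quantile
    sd = (upper - mean) / z
    return mean, sd


def normal_update(prior_mean, prior_sd, estimate, estimate_var):
    """Combine a normal prior with a normal approximate likelihood."""
    prior_prec = 1 / prior_sd**2
    data_prec = 1 / estimate_var
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, post_var**0.5


def prob_greater(mean, sd, cutoff):
    """Pr(theta > cutoff) under a normal distribution."""
    return 1 - NormalDist(mean, sd).cdf(cutoff)


if __name__ == "__main__":
    # Data from the Table in Exercise 5.
    x0, n0, x1, n1 = 90, 198, 114, 202
    p0, p1 = x0 / n0, x1 / n1
    theta_hat = p1 - p0
    theta_var = p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1

    # "Realistic" prior: 20% chance below 0 and 20% chance above 0.1.
    prior_mean, prior_sd = normal_prior_from_tails(0.0, 0.1, 0.20)
    post_mean, post_sd = normal_update(prior_mean, prior_sd, theta_hat, theta_var)

    print("prior:", prior_mean, prior_sd)
    print("posterior:", post_mean, post_sd)
    for cutoff in (0.0, 0.05, 0.1):
        print("Pr(theta >", cutoff, ") =", prob_greater(post_mean, post_sd, cutoff))
```

Because the two 20% tails are symmetric, the prior mean falls at the midpoint, 0.05, and the prior standard deviation comes out at about 0.059. The same `normal_update` and `prob_greater` helpers could be reused for the other three priors once their parameters are taken from the notes.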
Answered Same Day, Jun 13, 2021

Answer To: PRINCIPLES OF STATISTICAL INFERENCE (PSI) Biostatistics Collaboration of Australia 5.1 PRINCIPLES OF...

Rajeswari answered on Jun 14 2021
60408 Assignment
Exercise 5

                Monotherapy (n0, p0)    Combination (n1, p1)
    x           90                      114
    n           198                     202
    p-hat       0.454545455             0.564356436
    Var         0.001252191             0.00121712

    Theta_hat   0.109810981
    Var(Theta)  0.002469311
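The summary figures above (p-hat, Var, Theta_hat and Var(Theta)) can be reproduced with a few lines of code. This is a minimal sketch, assuming each Var entry is the binomial variance p(1 − p)/n of the sample proportion and that Var(Theta) is the sum of the two, which matches the numbers shown; the function name is hypothetical.

```python
def difference_summary(x0, n0, x1, n1):
    """Sample proportions, their variances, and the difference in response rates."""
    p0, p1 = x0 / n0, x1 / n1
    var0 = p0 * (1 - p0) / n0  # variance of the monotherapy proportion
    var1 = p1 * (1 - p1) / n1  # variance of the combination proportion
    return {
        "p0": p0,
        "p1": p1,
        "var0": var0,
        "var1": var1,
        "theta_hat": p1 - p0,          # estimated difference in response rates
        "var_theta": var0 + var1,      # variance of the difference (independent groups)
    }


if __name__ == "__main__":
    for name, value in difference_summary(90, 198, 114, 202).items():
        print(f"{name}: {value:.9f}")
```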
     
a)
Prior distribution of theta:
Right of 0.1.
Since we assumed both are equally effective we must have hypothetically
And variance = std dev = unknown..
(But we find std deviation and variance assuming this follows a std normal distribution)
Using this we get P(:
P(
P(
These can be obtained in Excel...