
Simulation part


ST 502 R project 3

For this project you will work in groups of 2. The project involves analyzing a data set using the chi-square test for homogeneity, deriving this LRT, and conducting a Monte Carlo simulation to determine properties of a similar test. You and your partner will create a final report to turn in.

Please make sure that your R file follows the guidelines on moodle. If these guidelines are not met, you will lose credit. You can submit a .R or a .Rmd file for the code portion. You should submit an HTML or PDF file for the report portion.

Data Example

Consider three different hospitals. Each hospital has patients that end up with infections. Suppose we have the following data:

| Hospital | Surgical Site Infections | Pneumonia Infections | Bloodstream Infections | Total |
|---|---|---|---|---|
| A | 41 | 27 | 51 | 119 |
| B | 36 | 3 | 40 | 79 |
| C | 169 | 106 | 109 | 384 |

Here we might consider that for a given hospital, we have a random sample with a fixed number of trials. This implies that we have three separate multinomial distributions, one for each hospital. We might have an interest in whether or not the multinomials are homogeneous across the hospitals.

Use R to conduct a chi-square test for homogeneity using this data. You’ll need to manually create the table (probably easiest to just use the matrix() function - leave off the total column). I’d like you to manually calculate the LRT statistic, the Pearson chi-square statistic, the critical value, and find approximate p-values for the hypotheses using both test statistics (they will be very small).

Derivation

Ok, we’ve used the likelihood ratio test. Let’s derive it! Consider the generic case of comparing J independent multinomial distributions, each with I categories.

| Multinomial | Cat 1 | Cat 2 | ... | Cat I | Sample size |
|---|---|---|---|---|---|
| 1 | $\pi_{11}$ | $\pi_{21}$ | ... | $\pi_{I1}$ | $n_1$ |
| 2 | $\pi_{12}$ | $\pi_{22}$ | ... | $\pi_{I2}$ | $n_2$ |
| ... | ... | ... | ... | ... | ... |
| J | $\pi_{1J}$ | $\pi_{2J}$ | ... | $\pi_{IJ}$ | $n_J$ |

We want to test

$$H_0: \pi_{11} = \pi_{12} = \cdots = \pi_{1J},\quad \pi_{21} = \pi_{22} = \cdots = \pi_{2J},\quad \ldots,\quad \text{and}\quad \pi_{I1} = \pi_{I2} = \cdots = \pi_{IJ}$$

vs

$$H_A: \text{At least some probabilities differ}$$

The likelihood here (in the general case) is just the product of the J multinomials. Under the null hypothesis, $\pi_{11} = \pi_{12} = \cdots = \pi_{1J}$, so we can just replace this with a common $\pi_1$. Similarly, we can just consider having $\pi_1, \ldots, \pi_I$.

Derive the likelihood ratio test for the homogeneity test. Remember it should come out to be

$$\mathrm{LRT} = 2 \sum_{j=1}^{J} \sum_{i=1}^{I} \mathrm{Obs}_{ij} \ln\left(\frac{\mathrm{Obs}_{ij}}{\mathrm{Exp}_{ij}}\right)$$

with approximate large sample distribution given by a $\chi^2_{(I-1)(J-1)}$ and expected cell counts given by $n_{\bullet j} n_{i \bullet} / n$, where $n$ is the total sample size.

Note: We did some of the details (like looking at the null max in the notes). You need to derive the test form first and then reproduce other relevant parts.

Simulation

The Pearson statistic can be derived as a Taylor series approximation to the LRT. For the last part of the project, we’ll investigate the α control of the Pearson chi-square test and its power (so we don’t have to worry about the ln(0) that can sometimes pop up for the LRT).

Goal of simulation study:

• Determine how well the asymptotic rejection region performs at controlling α
• Determine the power of the asymptotic test when comparing certain alternative situations

Setup:

• Two multinomial case only, where each multinomial has three categories
• All combinations of four sample sizes for each multinomial (16 total cases)
  n1 = 20, 30, 50, 100
  n2 = 20, 30, 50, 100
• Three different probabilities that may generate a particular multinomial:
  p1 = 1/3, 1/3, 1/3 (equal)
  p2 = 1/10, 3/10, 6/10 (mixed 1)
  p3 = 1/10, 1/10, 8/10 (mixed 2)
• Use 50000 randomly created tables (but start with a much smaller number until you get your code working)
• Add 0.5 to any expected counts that end up being 0 so as to avoid the divide by 0 case

To determine α control, you should generate data where both multinomials come from the same p vector.
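As a rough illustration of one such α estimate, the sketch below handles a single sample-size combination with the equal probability vector. The nominal level of 0.05, the seed, and the variable names (`reject`, `alpha_hat`, `crit`) are assumptions made for illustration and are not given in the project; the number of tables is kept small here, whereas the project asks for 50000.

```r
# Sketch only: estimated type I error rate of the Pearson chi-square test
# for ONE sample-size combination, both multinomials generated from the same p.
set.seed(502)             # arbitrary seed, used only for reproducibility

n1 <- 20                  # sample size of the first multinomial
n2 <- 30                  # sample size of the second multinomial
p  <- c(1/3, 1/3, 1/3)    # the "equal" probability vector
N  <- 1000                # small while testing; the project asks for 50000
alpha <- 0.05             # ASSUMED nominal level (not stated in the project)

# Critical value: chi-square with (I - 1)(J - 1) = (3 - 1)(2 - 1) = 2 df
crit <- qchisq(1 - alpha, df = (length(p) - 1) * (2 - 1))

reject <- replicate(N, {
  # Two multinomial samples, combined and transposed into one 2 x 3 table
  tab <- t(cbind(rmultinom(1, n1, p), rmultinom(1, n2, p)))
  # Expected counts: (row total * column total) / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  expected[expected == 0] <- 0.5   # avoid dividing by zero
  pearson <- sum((tab - expected)^2 / expected)
  pearson > crit                   # TRUE when the test rejects
})

alpha_hat <- mean(reject)  # proportion of simulated tables that were rejected
alpha_hat
```

Using a different probability vector for each `rmultinom()` call would turn the same proportion into a power estimate.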
This should be done for each of the sample size combinations (16 total situations where both multinomials are generated from p1, for instance). In total, you’ll have 48 simulated α values. You should create a plot similar to the one below to summarize.

To inspect the power, we’ll use the same sample sizes and probabilities, but we’ll vary the probabilities used rather than using the same one for each multinomial.

• Compare Equal vs Mixed 1
• Compare Equal vs Mixed 2
• Compare Mixed 1 vs Mixed 2

You should summarize your results into something similar to that below.

Some coding hints:

• rmultinom(1, size, prob) can be used to generate one multinomial sample
• Two calls of this (using appropriate size and prob) would create a single ‘table’ to be analyzed
• You can combine the two samples using cbind() and then transpose it using t() to get it in a similar form to how you analyzed the hospital data
• Calculate the Pearson chi-square value and compare it to the appropriate theoretical cut-off (returns a TRUE/FALSE)
• If you wrap all of the above into a replicate(N, { code to do above }), you can then just take the mean of the result to find the approximated alpha or power value
• You can then copy and paste this a bunch of times or wrap that process in a function that allows you to change n1, n2, prob1, prob2 (corresponding to the two multinomials)

Report

You should then write up all of the above into a coherent report with the following pieces:

• Introduce the general idea of testing for homogeneity in an introduction
• Analyze the dataset given and briefly discuss what the results are
• Derive the homogeneity test (you should use math type, latex, markdown, etc. to typeset your math symbols)
• Describe the simulation you will do.

Your report should include the R code in the text or in an appendix. The plots should be within the text with a brief discussion of the results.

That’s it!
You’ve then finished ST 502 - woot
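For the Data Example section, the manual calculation it asks for could be sketched roughly as follows. The object names (`obs`, `expected`, `lrt`, `pearson`) and the 0.05 level used for the critical value are assumptions for illustration; the project does not name a significance level.

```r
# Sketch only: manual LRT and Pearson chi-square tests for the hospital data.
# Rows are hospitals, columns are infection types (total column left off).
obs <- matrix(c( 41,  27,  51,
                 36,   3,  40,
                169, 106, 109),
              nrow = 3, byrow = TRUE,
              dimnames = list(Hospital  = c("A", "B", "C"),
                              Infection = c("SurgicalSite", "Pneumonia",
                                            "Bloodstream")))

# Expected counts under homogeneity: (row total * column total) / n
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Test statistics, following the formulas in the Derivation section
lrt     <- 2 * sum(obs * log(obs / expected))
pearson <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (number of hospitals - 1) * (number of categories - 1)
df    <- (nrow(obs) - 1) * (ncol(obs) - 1)
alpha <- 0.05                      # ASSUMED level (not stated in the project)
crit  <- qchisq(1 - alpha, df)     # upper-tail critical value

# Approximate p-values from the chi-square reference distribution
p_lrt     <- pchisq(lrt, df, lower.tail = FALSE)
p_pearson <- pchisq(pearson, df, lower.tail = FALSE)

c(LRT = lrt, Pearson = pearson, critical = crit)
c(p_LRT = p_lrt, p_Pearson = p_pearson)
```

The Pearson value can be cross-checked against the built-in `chisq.test(obs)`, which applies no continuity correction for tables larger than 2 x 2.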
May 06, 2021