Can you do the questions in this problem set?
Stat 310/Econ 307 Exam 3 - Page 2 of 6 Thursday, April 18

1. (10 points) Test Statistics: Suppose you have a random sample of size $n$ from a $N(\mu,\sigma^2)$ PDF. The Neyman-Pearson Lemma gives the following 6 statistics (or slight variations thereof) as the best use of the data under various assumptions:
(a) $\frac{\bar X - \mu_0}{\sigma/\sqrt{n}}$
(b) $\frac{\bar X - \mu_0}{S/\sqrt{n}}$
(c) $\frac{(n-1)S^2}{\sigma_0^2}$
(d) $\frac{\bar X - \mu}{\sigma/\sqrt{n}}$
(e) $\frac{\bar X - \mu}{S/\sqrt{n}}$
(f) $\frac{(n-1)S^2}{\sigma^2}$
Explain when and how each of these statistics should be used for hypothesis testing and for reporting confidence intervals. Hint: State the purpose of each and the assumptions about the normal parameters.

2. (10 points) Power-Like Calculation: The Texas Department of Agriculture is responsible for inspecting gas station pumps to ensure their performance is within acceptable tolerances. Section 13.024 of the Texas statute defines the standard for liquid capacity to be (a) a gallon; (b) a barrel (31.5 gallons); or (c) a hogshead (2 barrels). NIST (National Institute of Standards and Technology) defines the basic tolerance for a gas pump that has been in service for more than 30 days to be one cubic inch, plus one cubic inch for each gallon indicated. Thus if we draw a 5-gallon test draft, the tolerance will be ±6 cubic inches, which converts to ±0.026 gallons (1 gallon = 231 cubic inches). The test error on a single trial must not exceed this amount for the pump to be certified. (Dispensing either too little or too much is grounds for refusing certification.)

Consider a simple design of a gas pump that uses an impeller mechanism, dispensing a “quantum” amount of gas described by a Normal random variable $X$ with mean $\mu_X = 0.01$ gallon and standard deviation $\sigma_X = 0.0003$ gallon. Each amount dispensed is independent of the previous amounts. Then the “actual” amount pumped when the gas pump reads 5 gallons can be modeled, after 500 “clicks”, as
$$Y = \sum_{i=1}^{500} X_i .$$
(a) (4 points) Justify this formula for a 5-gallon test. What is the PDF of $Y$? Hint: Your answer should include $\mu_Y$ and $\sigma_Y$ as well as the form of the PDF.
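A minimal R sketch of the model in part (a), assuming exactly 500 independent clicks (5 gallons at a nominal 0.01 gallon per click). Because a sum of independent Normals is Normal, with the means adding and the variances adding, $Y \sim N(500\mu_X,\ 500\sigma_X^2)$. The Monte Carlo check, its seed, and the replication count are arbitrary illustrative choices, not part of the exam.

```r
# Parameters from the problem statement
n_clicks <- 500      # 5 gallons / 0.01 gallon per click
mu_X     <- 0.01     # mean gallons dispensed per click
sigma_X  <- 0.0003   # SD of gallons dispensed per click

# Sum of independent Normals: means add and variances add
mu_Y    <- n_clicks * mu_X
sigma_Y <- sqrt(n_clicks) * sigma_X

# PDF of Y as an R function
dY <- function(y) dnorm(y, mean = mu_Y, sd = sigma_Y)

# Monte Carlo check of the mean and SD (seed and replication count are arbitrary)
set.seed(1)
Y_sim <- replicate(10000, sum(rnorm(n_clicks, mean = mu_X, sd = sigma_X)))

print(c(mu_Y = mu_Y, sigma_Y = sigma_Y,
        sim_mean = mean(Y_sim), sim_sd = sd(Y_sim)))
```

Here $\sigma_Y = 0.0003\sqrt{500} \approx 0.0067$ gallon, so the ±0.026-gallon tolerance is a bit under four standard deviations wide on each side when $\mu_X$ is exactly 0.01.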
(b) (4 points) There is no adjustment possible for $\sigma_X$ on the pump; it is fixed for an impeller pump. However, $\mu_X$ can be adjusted. What is the formula for the probability that the pump is certified as $\mu_X$ is adjusted from 0.0099 to 0.0101? Graph it.
(c) (2 points) If you wish to ensure that your pump is certified with probability at least 95%, what interval should $\mu_X$ fall in? Mark these points on your graph. Hint: An approximate answer from your graph is sufficient.

Luis Perales

3. (10 points) Goodness-of-Fit Test: How good is the Uniform random number generator provided with R? To really stress it, let us subject it to a rigorous Pearson goodness-of-fit test. Generate a large sample $\{U_1, U_2, \ldots, U_n\} \sim \mathrm{Unif}(0,1)$ of length $n = 10^5$, which you will count into 1000 bins, using the following commands:

    set.seed(8736); n=1e5; nb=1000     # sample size and number of bins
    tk = seq(0,1, , nb+1)              # bin boundaries; notice the extra comma
    nk = hist( runif(n), tk )$counts   # nk contains the 1000 bin counts

You can check your bin counts: 105 113 97 ... 104 102 111.
(a) (5 points) Perform a goodness-of-fit test at the 5% level of the hypotheses $H_0: U_i \sim \mathrm{Unif}(0,1)$ versus $H_1: U_i \not\sim \mathrm{Unif}(0,1)$. Hint: Give the critical region $C$, the test statistic, and your decision.
(b) (5 points) What is the p-value for your experiment? Sketch it.

4. (10 points) Pooled Variance: Consider the two-sample t-test setup: $\{X_1, X_2, \ldots, X_{n_x}\}$ and $\{Y_1, Y_2, \ldots, Y_{n_y}\}$ are 2 random samples from $N(\mu_x,\sigma^2)$ and $N(\mu_y,\sigma^2)$, with all parameters unknown, but with $\sigma_x^2 = \sigma_y^2 = \sigma^2$. We wish to test $H_0: \mu_x = \mu_y$ versus $H_1: \mu_x \neq \mu_y$.
(a) (3 points) Show that
$$S_X^2 = \frac{1}{n_x-1}\sum_{i=1}^{n_x}(X_i-\bar X)^2 \quad\text{and}\quad S_Y^2 = \frac{1}{n_y-1}\sum_{i=1}^{n_y}(Y_i-\bar Y)^2$$
are both unbiased for $\sigma^2$. Hint: Just cite the appropriate result in our book.
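The unbiasedness claim in part (a) can also be sanity-checked by simulation. A minimal R sketch; the sample sizes, means, and $\sigma = 2$ below are hypothetical values chosen only for illustration. R's `var()` already uses the $n - 1$ divisor, so its long-run average should be close to $\sigma^2 = 4$ for both samples, regardless of the means.

```r
set.seed(307)              # arbitrary seed
n_x <- 8;  n_y <- 12       # hypothetical sample sizes
mu_x <- 0; mu_y <- 5       # hypothetical means; S^2 does not depend on them
sigma <- 2                 # common SD, so sigma^2 = 4
reps  <- 50000             # number of simulated samples

# var() computes sum((x - mean(x))^2) / (length(x) - 1)
S2_X <- replicate(reps, var(rnorm(n_x, mean = mu_x, sd = sigma)))
S2_Y <- replicate(reps, var(rnorm(n_y, mean = mu_y, sd = sigma)))

# Both averages should be close to sigma^2 = 4
print(c(mean_S2_X = mean(S2_X), mean_S2_Y = mean(S2_Y), sigma2 = sigma^2))
```

Dividing by $n$ instead of $n - 1$ would make the averages drift toward $(n-1)\sigma^2/n$, which is exactly the bias the $n - 1$ divisor removes.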
(b) (3 points) Conclude that the pooled variance estimator
$$S_P^2 = \frac{(n_x-1)S_X^2 + (n_y-1)S_Y^2}{n_x+n_y-2}$$
is also unbiased for $\sigma^2$. Hint: Consider using the MGF technique.
(c) (4 points) Compare the variances of $S_X^2$, $S_Y^2$, and $S_P^2$. Which is best? Hint: Recall that the mean and variance of a $\chi^2_p$ PDF are $p$ and $2p$.

5. (10 points) Test of Independence: One often hears stories of elderly individuals dying shortly after significant events, such as birthdays, anniversaries, or holidays. A study was conducted in California for the years 1960–1984 examining the mortality patterns of elderly Chinese women who died of natural causes immediately before and after the Chinese Harvest Moon Festival, for which the senior woman of the household plays a central ceremonial role. Their data were compared with those for elderly Jewish women and are given in the following table:

    Time of Death               Chinese   Jewish   Total
    2nd week before festival         55      141     196
    1st week before festival         33      145     178
    1st week after festival          70      139     209
    2nd week after festival          49      161     210
    Total                           207      586     793

(a) (4 points) Set up an appropriate Pearson $\chi^2$ hypothesis test of the independence of the row and column characteristics, giving the null and alternative hypotheses. What is the critical region at the $\alpha = 5\%$ level?
(b) (3 points) What is the test statistic, and what is your decision? Write down the matrix of expectations.
(c) (2 points) What is the p-value of your test?
(d) (1 point) Explain how the pattern in the table relates to your decision.

6. (15 points) Best Critical Region: Suppose you have a random sample $\mathbf{X} = \{X_1, X_2, \ldots, X_n\}$ from the geometric PMF
$$p_X(x) = p\,(1-p)^{x-1}, \quad x = 1, 2, \ldots, \qquad \mu_X = \frac{1}{p} \quad\text{and}\quad \sigma_X^2 = \frac{1-p}{p^2}.$$
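The moments quoted for the geometric PMF above ($\mu_X = 1/p$ and $\sigma_X^2 = (1-p)/p^2$) can be checked numerically by truncating the series far into the tail. A minimal R sketch using $p = 1/4$, the value of $p_0$ in part (b), purely as an example. Note that R's built-in `dgeom()` counts failures before the first success, so it is evaluated at $x - 1$ to match this PMF.

```r
p   <- 1/4                     # example value (p0 in part (b))
x   <- 1:2000                  # truncate the support; the remaining tail mass is negligible
pmf <- p * (1 - p)^(x - 1)     # p_X(x) = p (1 - p)^(x - 1), x = 1, 2, ...

# dgeom() is parameterized by failures before the first success, hence x - 1
stopifnot(isTRUE(all.equal(pmf, dgeom(x - 1, prob = p))))

total  <- sum(pmf)
mean_X <- sum(x * pmf)
var_X  <- sum((x - mean_X)^2 * pmf)

print(c(total = total, mean_X = mean_X, one_over_p = 1 / p,
        var_X = var_X, formula_var = (1 - p) / p^2))
```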
We wish to test $H_0: p = p_0$ versus $H_1: p = p_1 < p_0$. (Thus we are wondering whether the time to the first success is longer than thought.)
(a) (5 points) Find the log-likelihood $\ell(p \mid \mathbf{x})$. Then use it in the Neyman-Pearson Lemma to find the generic form of the best critical region, $C$. Hint: You may assume $\bar x > 1$. Why?
(b) (5 points) Suppose $n = 12$, $p_0 = \frac{1}{4}$, and $p_1 = \frac{1}{5}$. Find the best critical region at the level $\alpha = 5\%$. Hint: Use the CLT.
(c) (5 points) What is the power of your test at the alternative hypothesis?

7. (10 points) Equivalence of Tests and Confidence Intervals: Suppose you have a random sample of size $n$ from a Normal PDF with unknown mean and variance. You wish to test the null hypothesis $H_0: \mu = \mu_0$ versus a two-sided alternative. Show that you reject the null hypothesis if and only if the corresponding confidence interval for $\mu$ does not contain $\mu_0$. Hint: Your answer should be both algebraic and graphical.

8. (10 points) ANOVA: A small brewery decided to experiment with label designs on its summer ale. Twenty stores of similar size were selected, five per design. The number of cases sold over a two-month period was recorded:

    Design   Cases sold at each store   Total
    1        11   17   16   14   15        73
    2        12   10   15   19   11        67
    3        23   20   18   17   21        99
    4        27   33   22   26   28       136

(a) (8 points) Test the null hypothesis that the mean number of sales at a store is the same for all four label designs, versus the alternative that some are different. Use $\alpha = 1\%$.
(b) (2 points) What is the p-value?

9. (10 points) Comparing Correlations: In addition to the Father (F)–Son (S) height data discussed in Chapter 1, Pearson and Lee also collected heights of Mothers (M) and Daughters (D). The sample correlations (which can be denoted by either $\hat\rho$ or $R$) are
$$\hat\rho_{FD} = 0.511,\quad n_{FD} = 1376 \qquad\qquad \hat\rho_{MS} = 0.494,\quad n_{MS} = 1057.$$
The subscripts indicate the Father-Daughter and Mother-Son correlations and the sample sizes upon which they are based. You may assume these correlation estimates are independent, since they are based upon different families. (Recall that the Father-Son correlation was also about 0.5.)
(a) (7 points) Devise a level $\alpha = 5\%$ test for the hypotheses $H_0: \rho_{FD} = \rho_{MS}$ versus $H_1: \rho_{FD} \neq \rho_{MS}$. What is the value of your test statistic? Your decision? Hint: This requires putting several pieces (facts) together in a manner similar to some of the tests we have studied; it is not a formal Neyman-Pearson procedure. Show your work.
(b) (3 points) What is the p-value for these data? Explain how your answer is consistent with part (a). Hint: Highlight the area representing the p-value.

10. (15 points) Linear Regression: Using the data $\{(x_i, y_i),\ i = 1, \ldots, 24\}$ from the 1974–1997 Boston Marathon winning times (in minutes) for the women runners ($n = 24$), we summarize
$$\sum_{i=1}^{24} x_i = 47{,}652 \qquad \sum_{i=1}^{24}(x_i-\bar x)^2 = 1150 \qquad \sum_{i=1}^{24} y_i = 3620.867 \qquad \sum_{i=1}^{24}(x_i-\bar x)\,y_i = -1145.45$$
(a) (5 points) Find the best-fitting least-squares straight line and add it to the scatter diagram provided below. Give the constants $\hat a$ and $\hat b$, using the form given in Equation (8.26).
(b) (5 points) Some of the winning times for years since 1997 are given in the table:

    Year          2000    2003    2006    2009    2012    2015    2018
    Time (min)   146.2   145.3   143.6   152.3   151.8   144.9   159.9

We wish to see how well our linear equation predicted the “future.” Compute $\hat\sigma^2_\epsilon$ and sketch 95% population intervals for the years 1980–2020. Add the data in the table to the scatter diagram. How far into the future do the predictions seem to hold? Comments?
(c) (5 points) What is your prediction for 2019 (run on April 15)? How many standard deviations was the actual winning time of 143.48 minutes from your prediction? Hint: Give the value of $T_{n-1}$.
Hints: In part (b), give the form of the intervals you have plotted.
Do not worry about adjusting the $\alpha$-level.
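The least-squares arithmetic for Question 10(a) and the point prediction in (c) need only the four sums given above. A minimal R sketch, assuming the centered parameterization $\hat y = \hat a + \hat b\,(x - \bar x)$, in which $\hat a = \bar y$; whether this is the form of Equation (8.26) is an assumption, so the intercept for the uncentered form $\hat y = \hat a + \hat b x$ is also computed. The residual variance $\hat\sigma^2_\epsilon$ in (b) is not computed here, since it also needs $\sum (y_i - \bar y)^2$ from the raw data.

```r
# Summary statistics from the problem (x = year 1974-1997, y = winning time in minutes)
n     <- 24
sum_x <- 47652
Sxx   <- 1150          # sum of (x_i - xbar)^2
sum_y <- 3620.867
Sxy   <- -1145.45      # sum of (x_i - xbar) * y_i; equals sum of (x_i - xbar)(y_i - ybar)

x_bar <- sum_x / n
y_bar <- sum_y / n

b_hat     <- Sxy / Sxx                # least-squares slope, minutes per year
a_hat     <- y_bar                    # intercept in the centered form (assumed parameterization)
a_hat_std <- y_bar - b_hat * x_bar    # intercept if the form is a + b * x instead

# Fitted line evaluated at arbitrary years (centered form)
predict_time <- function(year) a_hat + b_hat * (year - x_bar)

print(c(x_bar = x_bar, y_bar = y_bar, a_hat = a_hat,
        a_hat_std = a_hat_std, b_hat = b_hat))
print(predict_time(c(2000, 2019)))
```

The fitted slope is roughly -1 minute per year, so extrapolating the line two decades past 1997 falls well below the recent winning times listed in (b); measuring that gap is what parts (b) and (c) ask for.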