
Can you do the questions in this problem set?


Stat 310/Econ 307 Exam 3 - Page 2 of 6 Thursday, April 18

1. (10 points) Test Statistics: Suppose you have a random sample of size n from a $N(\mu, \sigma^2)$ PDF. The Neyman-Pearson Lemma gives the following 6 statistics (or slight variations thereof) as the best use of the data under various assumptions:
(a) $\frac{\bar X - \mu_0}{\sigma/\sqrt{n}}$ (b) $\frac{\bar X - \mu_0}{S/\sqrt{n}}$ (c) $\frac{(n-1)S^2}{\sigma_0^2}$ (d) $\frac{\bar X - \mu}{\sigma/\sqrt{n}}$ (e) $\frac{\bar X - \mu}{S/\sqrt{n}}$ (f) $\frac{(n-1)S^2}{\sigma^2}$.
Explain when and how you should use these statistics appropriately for testing and reporting confidence intervals. Hint: State the purpose and assumptions about the normal parameters.

2. (10 points) Power-Like Calculation: The Texas Department of Agriculture is responsible for inspecting gas station pumps to ensure their performance is within acceptable tolerances. Section 13.024 of the Texas statute defines the standard for liquid capacity to be (a) a gallon; (b) a barrel (31.5 gallons); or (c) a hogshead (2 barrels). NIST (National Institute of Standards and Technology) defines the basic tolerance for a gas pump that has been in service for more than 30 days to be one cubic inch, plus one cubic inch for each gallon indicated. Thus if we draw a 5-gallon test draft, the tolerance will be ±6 cubic inches, which converts to ±0.026 gallons. The test error on a single trial must not exceed this amount for the pump to be certified. (Dispensing either too little or too much is considered a reason not to certify.)
Consider a simple design of a gas pump that uses an impeller mechanism, dispensing a "quantum" amount of gas described by a Normal random variable X with mean $\mu_X = 0.01$ gallon and standard deviation $\sigma_X = 0.0003$ gallon. Each such amount of gas dispensed is independent of the previous amount. Then the "actual" amount pumped when the gas pump reads 5 gallons can be modeled after 500 "clicks" as $Y = \sum_{i=1}^{500} X_i$.
(a) (4 points) Justify this formula for a 5-gallon test. What is the PDF of Y? Hint: Your answer should include $\mu_Y$ and $\sigma_Y$ as well as the form of the PDF.
(b) (4 points) There is no adjustment possible for $\sigma_X$ on the pump. It is fixed for an impeller pump. However, there is an adjustment for $\mu_X$. What is the formula for the probability that the pump is certified, as $\mu_X$ is adjusted from 0.0099 to 0.0101? Graph it.
(c) (2 points) If you wish to ensure that your pump is certified with probability at least 95%, what interval should $\mu_X$ fall in? Mark these points on your graph. Hint: An approximate answer from your graph is sufficient.

3. (10 points) Goodness-of-Fit Test: How good is the Uniform random generator provided with R? To really stress it, let us subject it to a rigorous Pearson goodness-of-fit test. Generate a large sample $\{U_1, U_2, \ldots, U_n\} \sim \text{Unif}(0, 1)$ of length $n = 10^5$, which you will count into 1000 bins, using the following commands:
> set.seed(8736); n=1e5; nb=1000   # sample size and number of bins
> tk = seq(0,1, , nb+1)   # bin boundaries; notice the extra comma
> nk = hist( runif(n), tk )$counts   # nk contains the 1000 bin counts
You can check your bin counts: 105 113 97 ... 104 102 111.
(a) (5 points) Perform a goodness-of-fit test at the 5% level of the hypotheses $H_0: U_i \sim \text{Unif}(0, 1)$ versus $H_1: U_i \not\sim \text{Unif}(0, 1)$. Hint: Give the critical region C, the test statistic, and your decision.
(b) (5 points) What is the p-value for your experiment? Sketch it.

4. (10 points) Pooled Variance: Consider the two-sample t-test setup: $\{X_1, X_2, \ldots, X_{n_x}\}$ and $\{Y_1, Y_2, \ldots, Y_{n_y}\}$ are 2 random samples from $N(\mu_x, \sigma^2)$ and $N(\mu_y, \sigma^2)$, with all parameters unknown, but with $\sigma_x^2 = \sigma_y^2 = \sigma^2$. We wish to test $H_0: \mu_x = \mu_y$ versus $H_1: \mu_x \ne \mu_y$.
(a) (3 points) Show that $S_X^2 = \frac{1}{n_x - 1}\sum_{i=1}^{n_x}(X_i - \bar X)^2$ and $S_Y^2 = \frac{1}{n_y - 1}\sum_{i=1}^{n_y}(Y_i - \bar Y)^2$ are both unbiased for $\sigma^2$. Hint: Just cite the appropriate result in our book.
(b) (3 points) Conclude that the pooled variance estimator $S_P^2 = \frac{(n_x - 1)S_X^2 + (n_y - 1)S_Y^2}{n_x + n_y - 2}$ is also unbiased for $\sigma^2$. Hint: Consider using the MGF technique.
(c) (4 points) Compare the variances of $S_X^2$, $S_Y^2$, and $S_P^2$. Which is best? Hint: Recall the first two moments of a $\chi^2_p$ PDF are p and 2p.

5. (10 points) Test of Independence: One often hears stories of elderly individuals dying shortly after significant events, such as birthdays, anniversaries, or holidays. A study was conducted in California for the years 1960-1984 examining the mortality patterns of elderly Chinese women who died of natural causes immediately before and after the Chinese Harvest Moon Festival, for which the senior woman of the household plays a central ceremonial role. Their data were compared to elderly Jewish women, and are given in the following table:

Time of Death               Chinese   Jewish   Total
2nd week before festival    55        141      196
1st week before festival    33        145      178
1st week after festival     70        139      209
2nd week after festival     49        161      210
Total                       207       586      793

(a) (4 points) Set up an appropriate Pearson $\chi^2$ hypothesis test of the independence of the row and column characteristics, giving the null and alternative hypotheses. What is the critical region at the $\alpha = 5\%$ level?
(b) (3 points) What is the test statistic and your decision? Write down the matrix of expectations.
(c) (2 points) What is the p-value of your test?
(d) (1 point) Explain how the pattern in the table relates to your decision.

6. (15 points) Best Critical Region: Suppose you have a random sample $X = \{X_1, X_2, \ldots, X_n\}$ from the geometric PMF $p_X(x) = p\,(1 - p)^{x-1}$, $x = 1, 2, \ldots$, with $\mu_X = \frac{1}{p}$ and $\sigma_X^2 = \frac{1 - p}{p^2}$.
We wish to test $H_0: p = p_0$ versus $H_1: p = p_1 < p_0$. (Thus we are wondering if the time to the first success is longer than thought.)
(a) (5 points) Find the log-likelihood $\ell(p \mid x)$. Then use it in the Neyman-Pearson Lemma to find the generic form of the best critical region, C. Hint: You may assume $\bar x > 1$. Why?
(b) (5 points) Suppose n = 12 and $p_0 = \frac{1}{4}$ and $p_1 = \frac{1}{5}$. Find the best critical region at the level $\alpha = 5\%$. Hint: Use the CLT.
(c) (5 points) What is the power of your test at the alternative hypothesis?

7. (10 points) Equivalence of Tests and Confidence Intervals: Suppose you have a random sample of size n from a Normal PDF with unknown mean and variance. You wish to test the null hypothesis $H_0: \mu = \mu_0$ versus a two-sided alternative. Show that you reject the null hypothesis if-and-only-if the corresponding confidence interval for µ does not contain $\mu_0$. Hint: Your answer should be both algebraic and graphical.

8. (10 points) ANOVA: A small brewery decided to experiment with label designs on its summer ale. Twenty stores of similar size were selected. The number of cases sold over a two-month period was recorded:

Design   Cases sold              Total
1        11  17  16  14  15      73
2        12  10  15  19  11      67
3        23  20  18  17  21      99
4        27  33  22  26  28      136

(a) (8 points) Test the null hypothesis that the mean number of sales at a store is the same for all four label designs, versus the alternative that some are different. Use $\alpha = 1\%$.
(b) (2 points) What is the p-value?

9. (10 points) Comparing Correlations: In addition to the Father (F) – Son (S) height data discussed in Chapter 1, Pearson and Lee also collected heights of Mothers (M) and Daughters (D). The sample correlations (which can be denoted by either $\hat\rho$ or R) are
$\hat\rho_{FD} = 0.511$, $n_{FD} = 1376$; $\hat\rho_{MS} = 0.494$, $n_{MS} = 1057$.
The subscripts indicate the Father-Daughter and Mother-Son correlations, and the sample sizes upon which they are based. You may assume these correlation estimates are independent since they are based upon different families. (Recall the Father-Son correlation was also about 0.5.)
(a) (7 points) Devise a level $\alpha = 5\%$ test for the hypotheses $H_0: \rho_{FD} = \rho_{MS}$ versus $H_1: \rho_{FD} \ne \rho_{MS}$. What is the value of your test statistic? Your decision? Hint: This requires you to put several pieces (facts) together in a manner similar to some of the tests we have studied. Not a formal Neyman-Pearson procedure. Show your work.
(b) (3 points) What is the p-value for these data? Explain how your answer is consistent with part (a). Hint: Highlight the area representing the p-value.

10. (15 points) Linear Regression: Using the data $\{(x_i, y_i), i = 1, \ldots, 24\}$ from the 1974–1997 Boston Marathon winning times (in minutes) for the women runners (n = 24), we summarize
$\sum_{i=1}^{24} x_i = 47{,}652$, $\sum_{i=1}^{24} (x_i - \bar x)^2 = 1150$, $\sum_{i=1}^{24} y_i = 3620.867$, $\sum_{i=1}^{24} (x_i - \bar x)\, y_i = -1145.45$.
(a) (5 points) Find the best fitting least-squares straight line and add it to the scatter diagram provided below. Give the constants $\hat a$ and $\hat b$, using the form given in Equation (8.26).
(b) (5 points) Some of the winning times for years since 1997 are given in the table:

Year   2000    2003    2006    2009    2012    2015    2018
Time   146.2   145.3   143.6   152.3   151.8   144.9   159.9

We wish to see how well our linear equation predicted the "future." Compute $\hat\sigma_\epsilon^2$ and sketch 95% population intervals for the years 1980–2020. Add the data in the table to the scatter diagram. How far into the future do the predictions seem to hold? Comments?
(c) (5 points) What is your prediction for 2019 (run on April 15)? How many standard deviations was the actual winning time of 143.48 minutes from your prediction? Hint: Give the value of $T_{n-1}$.
Hints: In part (b), give the form of the intervals you have plotted.
Do not worry about adjusting the $\alpha$-level.
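Question 8 is a standard one-way ANOVA. A minimal sketch, assuming SciPy is available: `scipy.stats.f_oneway` returns the F statistic and its p-value, and `scipy.stats.f.ppf` gives the 1% critical value on (k − 1, N − k) degrees of freedom. The dictionary name `sales` is just an illustrative choice.

```python
from scipy.stats import f, f_oneway

# Cases sold at each of 5 stores, by label design (from the table in question 8)
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17, 21],
    4: [27, 33, 22, 26, 28],
}

k = len(sales)                                  # number of designs (groups)
n_total = sum(len(v) for v in sales.values())   # total number of stores

result = f_oneway(*sales.values())              # one-way ANOVA F test
critical = f.ppf(0.99, k - 1, n_total - k)      # reject H0 if F > critical (alpha = 1%)

print(f"F = {result.statistic:.3f}, critical F(3, 16) = {critical:.3f}, p = {result.pvalue:.2e}")
print("reject H0" if result.statistic > critical else "do not reject H0")
```

With these data the between-design mean square is 197.25 and the within-design mean square is 10.0, so F = 19.725 on (3, 16) degrees of freedom, well above the 1% critical value of about 5.29.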
Answered Same Day, Apr 30, 2021


Rajeswari answered on May 01 2021
39833 Stats assignment
Q.No.1
All six statistics assume a random sample $X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$. Because the data are normal, their distributions are exact for any n; a large sample is not required. Below, $z_{\alpha/2}$ and $t_{\alpha/2,\,n-1}$ are upper $\alpha/2$ critical values.
a) Used to test $H_0: \mu = \mu_0$ when the population standard deviation σ is known (n is the sample size). Under $H_0$ it is N(0, 1).
b) Used to test $H_0: \mu = \mu_0$ when σ is not known, so the sample standard deviation S is used in its place. Under $H_0$ it has a t distribution with n − 1 degrees of freedom.
c) Used to test $H_0: \sigma^2 = \sigma_0^2$, where $S^2$ is the sample variance and $\sigma_0^2$ is the value of the variance claimed under the null hypothesis. Under $H_0$ it is $\chi^2_{n-1}$.
d) A pivot for the unknown mean µ when σ is known: it is N(0, 1) whatever the true µ is, which gives the confidence interval $\bar X \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.
e) A pivot for µ when σ is unknown: it is $t_{n-1}$, which gives the confidence interval $\bar X \pm t_{\alpha/2,\,n-1}\,S/\sqrt{n}$.
f) A pivot for the unknown variance $\sigma^2$: it is $\chi^2_{n-1}$, which gives the confidence interval $\left[\frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}, \frac{(n-1)S^2}{\chi^2_{\alpha/2}}\right]$, where $\chi^2_q$ denotes the q-quantile of $\chi^2_{n-1}$.
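The six cases can be illustrated with a short Python sketch, assuming NumPy and SciPy are available. The function name `normal_sample_procedures` and the sample values are made up for illustration; statistics (a) to (c) are computed as test statistics and (d) to (f) are used as pivots for confidence intervals.

```python
import numpy as np
from scipy import stats


def normal_sample_procedures(x, mu0, sigma0_sq, sigma=None, alpha=0.05):
    """Statistics (a)-(c) for tests and intervals from pivots (d)-(f), for N(mu, sigma^2) data."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    s2 = x.var(ddof=1)  # sample variance S^2 (divisor n - 1)
    s = np.sqrt(s2)
    out = {}
    if sigma is not None:
        # (a) Z statistic for H0: mu = mu0 with sigma known; N(0, 1) under H0
        out["z"] = (xbar - mu0) / (sigma / np.sqrt(n))
        # (d) pivot with sigma known -> interval xbar +/- z_{alpha/2} * sigma / sqrt(n)
        zc = stats.norm.ppf(1 - alpha / 2)
        half = zc * sigma / np.sqrt(n)
        out["ci_mu_sigma_known"] = (xbar - half, xbar + half)
    # (b) t statistic for H0: mu = mu0 with sigma unknown; t_{n-1} under H0
    out["t"] = (xbar - mu0) / (s / np.sqrt(n))
    # (e) pivot with sigma unknown -> interval xbar +/- t_{alpha/2, n-1} * S / sqrt(n)
    tc = stats.t.ppf(1 - alpha / 2, df=n - 1)
    half = tc * s / np.sqrt(n)
    out["ci_mu"] = (xbar - half, xbar + half)
    # (c) chi-square statistic for H0: sigma^2 = sigma0^2; chi2_{n-1} under H0
    out["chi2"] = (n - 1) * s2 / sigma0_sq
    # (f) pivot for sigma^2 -> [(n-1)S^2 / upper quantile, (n-1)S^2 / lower quantile]
    q_lo, q_hi = stats.chi2.ppf([alpha / 2, 1 - alpha / 2], df=n - 1)
    out["ci_var"] = ((n - 1) * s2 / q_hi, (n - 1) * s2 / q_lo)
    return out


x = [4.1, 5.3, 4.8, 5.9, 5.1, 4.4]  # made-up illustrative data
res = normal_sample_procedures(x, mu0=5.0, sigma0_sq=1.0, sigma=0.6)
print(res)
```

Rejecting $H_0: \mu = \mu_0$ with (b) at level α happens exactly when $\mu_0$ lies outside the interval from (e) at the same level, which is the equivalence asked for in question 7.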
Q.No.2
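a) Each click dispenses an independent $N(0.01, 0.0003^2)$ amount, and 500 clicks × 0.01 gallon = 5 gallons, so a 5-gallon reading corresponds to the sum of 500 clicks. Hence
$E(Y) = E(X_1) + E(X_2) + \dots + E(X_{500}) = 500(0.01) = 5$
$\mathrm{Var}(Y) = \mathrm{Var}(X_1) + \dots + \mathrm{Var}(X_{500}) = 500(0.0003)^2 = 4.5 \times 10^{-5}$ (variances add because the $X_i$ are independent)
Hence Y is normal with $\mu_Y = 5$ gallons and $\sigma_Y = \sqrt{4.5 \times 10^{-5}} \approx 0.00671$ gallon, i.e. $Y \sim N(5,\ 4.5 \times 10^{-5})$.
This is exactly normal, not a large-sample approximation: a sum of independent normal random variables is itself normal.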
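b) When $\mu_X$ is adjusted, $Y \sim N(500\mu_X,\ 500\sigma_X^2)$ with $\sigma_Y = \sqrt{500}\,(0.0003) \approx 0.00671$, and the pump is certified when $|Y - 5| \le 0.026$, so we have
$P(\text{certified}) = \Phi\left(\frac{5.026 - 500\mu_X}{\sigma_Y}\right) - \Phi\left(\frac{4.974 - 500\mu_X}{\sigma_Y}\right)$, for $0.0099 \le \mu_X \le 0.0101$.
The curve is symmetric about $\mu_X = 0.01$, where it peaks at about 0.9999, and it falls to nearly 0 at 0.0099 and 0.0101.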
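c) We need $P(\text{certified}) \ge 0.95$. Once $\mu_X$ is off-center, essentially all of the failure probability lies in one tail, so for $\mu_X > 0.01$ we need
$\frac{5.026 - 500\mu_X}{\sigma_Y} \ge 1.645$, i.e. $500\mu_X \le 5.026 - 1.645(0.00671) = 5.0150$, so $\mu_X \le 0.010030$.
By symmetry the lower limit is $\mu_X \ge 0.009970$. Hence $\mu_X$ should fall in
$(0.00997,\ 0.01003)$.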
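Parts (b) and (c) can be checked numerically. A minimal sketch, assuming NumPy and SciPy are available: `p_certified` evaluates the normal-CDF formula for the probability of certification, and `certified_interval` uses `scipy.optimize.brentq` to find where that probability equals 95% on each side of 0.01. Both function names are hypothetical helpers, not from the original answer.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

N_CLICKS = 500      # clicks for a 5-gallon reading
SIGMA_X = 0.0003    # fixed standard deviation of one click (gallons)
TARGET = 5.0        # indicated volume (gallons)
TOL = 0.026         # NIST tolerance for a 5-gallon draft (gallons)
SIGMA_Y = np.sqrt(N_CLICKS) * SIGMA_X  # sd of Y, the sum of 500 independent clicks


def p_certified(mu_x):
    """P(|Y - 5| <= 0.026) when Y ~ N(500 * mu_x, 500 * SIGMA_X**2)."""
    mu_y = N_CLICKS * np.asarray(mu_x, dtype=float)
    upper = norm.cdf((TARGET + TOL - mu_y) / SIGMA_Y)
    lower = norm.cdf((TARGET - TOL - mu_y) / SIGMA_Y)
    return upper - lower


def certified_interval(level=0.95):
    """Values of mu_x where p_certified(mu_x) == level, one on each side of 0.01."""
    g = lambda m: float(p_certified(m)) - level
    return brentq(g, 0.0099, 0.01), brentq(g, 0.01, 0.0101)


mu_grid = np.linspace(0.0099, 0.0101, 201)  # x-axis for the graph in part (b)
probs = p_certified(mu_grid)                # y-axis for the graph
lo, hi = certified_interval()
print(f"mu_X interval for P(certified) >= 0.95: ({lo:.6f}, {hi:.6f})")
```

The interval comes out close to (0.00997, 0.01003); `mu_grid` and `probs` can be passed to a plotting library such as matplotlib to draw the graph and mark the two endpoints.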
Q.No.3
For problem 3, if you give me the full data set (the 1000 bin counts), I can do it. You gave some R commands which I am unable to comprehend.
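The Pearson goodness-of-fit procedure itself can be sketched in Python, assuming NumPy and SciPy are available. This is an assumption-laden stand-in: NumPy's generator does not produce the same random stream as R's `set.seed(8736)`, so the bin counts will not match the ones quoted in the question, but the steps (1000 equal bins, expected count n/nb = 100 per bin, statistic on nb − 1 = 999 degrees of freedom) are the same. The function name `uniform_gof` is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def uniform_gof(n=100_000, nb=1000, alpha=0.05, seed=8736):
    """Pearson chi-square goodness-of-fit test of Unif(0, 1) with nb equal bins."""
    rng = np.random.default_rng(seed)  # NumPy stream, NOT the same as R's set.seed(8736)
    u = rng.uniform(0.0, 1.0, size=n)
    edges = np.linspace(0.0, 1.0, nb + 1)  # same boundaries as seq(0, 1, , nb + 1)
    counts, _ = np.histogram(u, bins=edges)
    expected = n / nb  # 100 per bin under H0
    stat = np.sum((counts - expected) ** 2 / expected)
    df = nb - 1
    critical = chi2.ppf(1 - alpha, df)  # reject H0 if stat > critical
    p_value = chi2.sf(stat, df)  # right-tail area beyond the observed statistic
    return counts, stat, critical, p_value


counts, stat, critical, p_value = uniform_gof()
print(f"chi2 = {stat:.1f}, critical = {critical:.1f}, p = {p_value:.3f}")
```

With 999 degrees of freedom the 5% critical value is roughly 1073.6, so the critical region is $C = \{\chi^2 > 1073.6\}$. To reproduce the exact counts in the question, the R commands given there are needed.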
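Q.No.4
a) To prove that $S_X^2 = \frac{1}{n_x - 1}\sum_{i=1}^{n_x}(X_i - \bar X)^2$ is an unbiased estimator of the population variance $\sigma^2$.
Let us prove this for X first (write $n = n_x$, $E(X_i) = \mu_x$, $\mathrm{Var}(X_i) = \sigma^2$). Then on the same lines the result holds for Y.
We have $\sum_{i=1}^{n}(X_i - \bar X)^2 = \sum_{i=1}^{n} X_i^2 - 2\bar X \sum_{i=1}^{n} X_i + n\bar X^2$
(since 2 and $\bar X$ are constants they come outside the sum, and the last term is a constant repeated n times).
Since $\sum_{i=1}^{n} X_i = n\bar X$, we get $\sum_{i=1}^{n}(X_i - \bar X)^2 = \sum_{i=1}^{n} X_i^2 - n\bar X^2$.
We have $E(X_1 + X_2 + \dots + X_n) = n\mu_x$ and hence $E(\bar X) = \mu_x$, and by independence $\mathrm{Var}(X_1 + X_2 + \dots + X_n) = n\sigma^2$, so $\mathrm{Var}(\bar X) = \sigma^2/n$
(the expectation of a sum is the sum of the expectations, $E(cX) = cE(X)$, and $\mathrm{Var}(cX) = c^2\,\mathrm{Var}(X)$).
Using $E(W^2) = \mathrm{Var}(W) + [E(W)]^2$, this gives $E(X_i^2) = \sigma^2 + \mu_x^2$ and $E(\bar X^2) = \sigma^2/n + \mu_x^2$. Therefore
$E\left[\sum_{i=1}^{n}(X_i - \bar X)^2\right] = n(\sigma^2 + \mu_x^2) - n\left(\frac{\sigma^2}{n} + \mu_x^2\right) = (n - 1)\sigma^2$,
i.e. $E(S_X^2) = \frac{(n - 1)\sigma^2}{n - 1} = \sigma^2$.
Hence proved. Along the same lines, $E(S_Y^2) = \sigma^2$.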
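b) Combining X and Y in the same way, the pooled estimator has $n_x + n_y - 2$ degrees of freedom: $S_P^2 = \frac{(n_x - 1)S_X^2 + (n_y - 1)S_Y^2}{n_x + n_y - 2}$. By linearity of expectation and part (a), $E(S_P^2) = \frac{(n_x - 1)\sigma^2 + (n_y - 1)\sigma^2}{n_x + n_y - 2} = \sigma^2$, so $S_P^2$ is also unbiased.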
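c) Since $(n - 1)S^2/\sigma^2 \sim \chi^2_{n-1}$, whose variance is $2(n - 1)$, we get $\mathrm{Var}(S_X^2) = \frac{2\sigma^4}{n_x - 1}$ and $\mathrm{Var}(S_Y^2) = \frac{2\sigma^4}{n_y - 1}$. The two samples are independent, so $(n_x + n_y - 2)S_P^2/\sigma^2 = (n_x - 1)S_X^2/\sigma^2 + (n_y - 1)S_Y^2/\sigma^2 \sim \chi^2_{n_x + n_y - 2}$ and $\mathrm{Var}(S_P^2) = \frac{2\sigma^4}{n_x + n_y - 2}$. The pooled variance is the best of the three: all three are unbiased, and $S_P^2$ has the smallest variance because it uses both samples, with $n_x + n_y - 2$ degrees of freedom.
When a paired test is done with equal sample sizes, sometimes one of the other two is used if the variances are assumed equal.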
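A quick Monte Carlo check of parts (a) to (c), assuming NumPy is available. The settings $n_x = 5$, $n_y = 8$, $\sigma = 2$ and the two means are arbitrary hypothetical choices; the sketch draws many pairs of samples, computes $S_X^2$, $S_Y^2$ and $S_P^2$ for each, and compares their averages with $\sigma^2$ and their variances with $2\sigma^4/\text{df}$.

```python
import numpy as np


def simulate_variance_estimators(nx=5, ny=8, mu_x=0.0, mu_y=3.0, sigma=2.0,
                                 reps=100_000, seed=1):
    """Monte Carlo draws of S_X^2, S_Y^2 and the pooled S_P^2 under equal variances."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_x, sigma, size=(reps, nx))  # reps independent X samples
    y = rng.normal(mu_y, sigma, size=(reps, ny))  # reps independent Y samples
    s2x = x.var(axis=1, ddof=1)
    s2y = y.var(axis=1, ddof=1)
    s2p = ((nx - 1) * s2x + (ny - 1) * s2y) / (nx + ny - 2)
    return {"S2X": s2x, "S2Y": s2y, "S2P": s2p}


nx, ny, sigma = 5, 8, 2.0
draws = simulate_variance_estimators(nx=nx, ny=ny, sigma=sigma)
dfs = {"S2X": nx - 1, "S2Y": ny - 1, "S2P": nx + ny - 2}
for name, df in dfs.items():
    theory = 2 * sigma**4 / df  # Var(S^2) = 2 sigma^4 / df
    print(f"{name}: mean {draws[name].mean():.3f} (sigma^2 = {sigma**2}), "
          f"var {draws[name].var():.3f} (theory {theory:.3f})")
```

All three averages should sit near $\sigma^2 = 4$, and the simulated variances near the theoretical 8, 4.57 and 2.91, with the pooled estimator smallest.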
Q.No.5
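Chi-square test of independence
Set up hypotheses as:
H0: Time of death (week relative to the festival) is independent of group (Chinese or Jewish)
Ha: Time of death and group are associated
(Right-tailed chi-square test at the 5% significance level with (4 − 1)(2 − 1) = 3 degrees of freedom, so the critical region is $\chi^2 > 7.815$)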
Observed (Expected) (Chi-square contribution) table

Time of death     Chinese               Jewish
II week before    55 (51.16) (0.29)     141 (144.84) (0.10)
I week before     33 (46.46) (3.90)     145 (131.54) (1.38)
I week after      70 (54.56) (4.37)     139 (154.44) (1.54)
II week...
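The expected counts, statistic and p-value for this table can be computed with a short sketch, assuming NumPy and SciPy are available. `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom and the matrix of expected counts; it applies Yates' continuity correction only when there is 1 degree of freedom, so none is applied to this 4 × 2 table.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Rows: 2nd week before, 1st week before, 1st week after, 2nd week after
# Columns: Chinese, Jewish
observed = np.array([[55, 141],
                     [33, 145],
                     [70, 139],
                     [49, 161]])

stat, p_value, dof, expected = chi2_contingency(observed)
critical = chi2.ppf(0.95, dof)  # right-tail critical value at alpha = 5%

print(np.round(expected, 2))    # matrix of expectations
print(f"chi2 = {stat:.2f}, dof = {dof}, critical = {critical:.3f}, p = {p_value:.4f}")
print("reject H0" if stat > critical else "do not reject H0")
```

This reproduces the expected counts in the table and gives a statistic of about 12.42 on 3 degrees of freedom, above 7.815, with a p-value of about 0.006.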