STAT 431 Exam 1
For each task you should:
Summarize your data in verbal, tabular or graphic form. Perform EDA on the data.
Explain what the testing problem is about.
You should clearly state the null hypothesis and what its rejection means.
XXXXXXXXXXpoints) Returns on the major U.S. stock exchanges (New York Stock Exchange (NYSE), American Stock Exchange (AMEX) and NASDAQ) for the period 12/31/1926 through 12/31/2018 are in the file exam2.indexReturns.csv in this exam's data directory, which is on Canvas in Canvas/Files/Data/_Exam 2 Data. NOTE: For your convenience, we also provide the long-form dataset that R requires; its filename is exam2.indexReturns.long.csv.
Universe refers to the major U.S. stock exchanges (NYSE, AMEX and NASDAQ). Each of these exchanges quotes and trades thousands of securities. The return data are the annual percent returns on an index (NYSE, AMEX, NASDAQ) for the year-end date indicated.
There are four ways to calculate an index return; these make up the "type" factor. Each level corresponds to a different constituent stock weighting scheme used in producing the index level, from which the annual returns (in percent) are computed. These levels are:
· vwretd: Market capitalization-weighted return with dividend
· vwretx: Market capitalization-weighted return without dividend
· ewretd: Equal-weighted return with dividend
· ewretx: Equal-weighted return without dividend
The questions we seek to answer include:
· Is there a difference between the universes considered?
· Is there a difference between types of market returns, i.e., EW and MW, with and without dividends?
· Are there interactions between the universe and the return type, and what is their meaning?
In addition, please answer the following questions (10 points each):
a. Obtain the mean, median, and geometric mean (CAGR) annual returns for each universe and return type. Discuss these results.
b. Describe the data: time range, frequency, summary statistics, etc. Note that R's Anova and other functions require long-form data, which has been provided to you.
c. Check all major parametric Anova assumptions. You are familiar with normality and HOV diagnostics; independence can be checked with a runs test. Be sure to order the levels of your universe factor as NYSE, AMEX and NASDAQ; otherwise R orders them alphabetically, and we want to make comparisons relative to the NYSE.
d. Assuming you reject the omnibus hypothesis (be sure and state it), perform post-hoc testing to determine which regions of the market are significantly different.
e. Using R's {pwr} package, an online power program, or GPower (freely available for Windows or Mac OS X; see the tutorial provided in the exam data directory), perform a power analysis for the problem. For this data, what power did we achieve? Determine the sample size required to detect a ±4% difference in mean returns with a probability (power) of 80%.
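Parts a, c and e each rest on a short calculation. The sketch below is in Python rather than R, uses only the standard library, and is illustrative only. The function names are hypothetical, the runs test is the Wald-Wolfowitz version about the median, and the sample-size function is a two-group normal approximation, not the full Anova power calculation that {pwr} or GPower would perform.

```python
import math
from statistics import NormalDist, median


def cagr_percent(returns_pct):
    """Geometric mean (CAGR) of annual percent returns, returned in percent."""
    growth = [1.0 + r / 100.0 for r in returns_pct]
    if any(g <= 0.0 for g in growth):
        raise ValueError("a return of -100% or worse has no geometric mean")
    log_mean = sum(math.log(g) for g in growth) / len(growth)
    return (math.exp(log_mean) - 1.0) * 100.0


def runs_test_z(values):
    """Wald-Wolfowitz runs test: z statistic for runs above/below the median."""
    med = median(values)
    signs = [v > med for v in values if v != med]  # ties with the median are dropped
    n1 = sum(signs)            # observations above the median
    n2 = len(signs) - n1       # observations below the median
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("need observations on both sides of the median")
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    mu = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if var <= 0.0:
        raise ValueError("too few observations for a runs test")
    return (runs - mu) / math.sqrt(var)


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference `delta`
    between two groups with common standard deviation `sd`
    (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) * sd / delta) ** 2)
```

A call such as n_per_group(delta=4.0, sd=20.0) evaluates the approximation for a 4-point difference at 80% power; the value 20 is an assumed placeholder for the standard deviation, not a figure from the exam data, so the real answer depends on the variability estimated from the returns.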
2. (40 points) An observational study obtained decibel sound pressure level (SPL) data for an exponential horn (type of loudspeaker), as a function of distance from the audio source. The data is available in Canvas/Files/Data/_Exam 2 Data/spl.txt.
a. Prepare a comparative analysis of different regression models for this data. You should consider the linear model and local (curvilinear) models such as polynomial, spline and LOESS regression. Evaluate the model fits and state which models best capture the data-generating process. Be sure to include MAE and RMS error among your quantitative criteria, and give your recommendation for the final model. Devise a tabular comparison to facilitate review of your models.
b. Unfortunately, the SPL sensor is subject to detecting random bursts of energy, resulting in occasional abnormally high readings. The data including these bad readings are in the file spl.contaminated.txt. Based on visual inspection, repeat your previous models and also include robust regression, such as quantile regression (at least try the median) and Kendall-Theil regression. Be sure to provide a pseudo R² (one can use the nagelkerke function in the {rcompanion} package). Perform a complete analysis and comparison, then state your conclusions and your recommendation for the best model to use for this system.
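To see why a robust fit matters for the contaminated readings, a minimal Python sketch (standard library only, not R) can compare an ordinary least-squares line with a Kendall-Theil line and score both with MAE and RMS error. All function names here are hypothetical, and the intercept rule used for the Kendall-Theil line (median of y minus slope times x) is one common convention, assumed here and not taken from the exam.

```python
import math
from statistics import mean, median


def ols_line(x, y):
    """Ordinary least-squares slope and intercept for a straight line."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


def theil_sen_line(x, y):
    """Kendall-Theil (Theil-Sen) line: the slope is the median of all pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]
    ]
    slope = median(slopes)
    # Assumed convention: intercept is the median of y_i - slope * x_i.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


def predict(line, x):
    """Fitted values for a (slope, intercept) pair."""
    slope, intercept = line
    return [slope * xi + intercept for xi in x]


def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rmse(y, yhat):
    """Root-mean-square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))
```

On a made-up series x = 1..7 with y = 100 - 2x and a single burst reading of 120 at x = 6, theil_sen_line recovers slope -2 and intercept 100, while ols_line is pulled to a positive slope of about 0.29. These numbers are a toy illustration, not SPL data.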
3. This question has to do with expected mean squares. One way to state it is in terms of expected mean squares of treatment and errors. Although you will rarely need to know the expected value of MST or MSE, it is important to see that both expected values are the same when the null hypothesis is true and that the expected value of MST is larger when the null hypothesis is false.
Under $H_0$, $E(MS_{tmt}) = E(MS_{err}) = \sigma^2$, and under $H_a$, $E(MS_{tmt}) > \sigma^2$, so that the resulting F ratio can increase. You may recall that $SS_{tmt} = \sum_i n_i (\bar{x}_i - \bar{x})^2 = \sum_i n_i \bar{x}_i^2 - n\bar{x}^2$.
a. For a single factor Anova, find/derive E(MSerr).
b. Find/derive E(MStmt) both when H0 is true and when it is false.
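Before attempting the derivations, the sum-of-squares identity recalled in question 3 can be checked numerically. The Python sketch below (standard library only, hypothetical function name, toy data) computes the treatment sum of squares both ways, along with the two mean squares for a single-factor layout. It verifies arithmetic only and does not derive the expected values asked for in parts a and b.

```python
from statistics import mean


def anova_mean_squares(groups):
    """Single-factor Anova pieces for a list of groups (lists of numbers)."""
    n = sum(len(g) for g in groups)          # total sample size
    k = len(groups)                          # number of treatments
    grand = sum(sum(g) for g in groups) / n  # grand mean
    means = [mean(g) for g in groups]        # treatment means

    # SS_tmt written as sum of n_i * (xbar_i - xbar)^2
    ss_tmt = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Same quantity written as sum of n_i * xbar_i^2 minus n * xbar^2
    ss_tmt_alt = sum(len(g) * m ** 2 for g, m in zip(groups, means)) - n * grand ** 2
    # Within-treatment (error) sum of squares
    ss_err = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    return {
        "ss_tmt": ss_tmt,
        "ss_tmt_alt": ss_tmt_alt,
        "ms_tmt": ss_tmt / (k - 1),
        "ms_err": ss_err / (n - k),
    }
```

The ratio ms_tmt / ms_err from the returned dictionary is the usual F statistic, so the same helper can be reused in a small simulation to watch how the two mean squares behave when the treatment means are equal and when they are not.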
4. Polynomial regression with interactions. This problem uses the pollution dataset exam2.pollute.txt in the exam data directory.
Find a parsimonious model for this data. Be sure to write the estimating equation for your final model. You should evaluate your stopping point with R² and AIC. Do not mindlessly use stepwise search. You should consider including polynomial predictors. Be sure to consider interactions.
Fully interpret your resulting model. You should be able to obtain an R² of at least 0.76 with an AIC of 326 or better.
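When weighing candidate models against the R² and AIC targets, it helps to see how both quantities follow from the residuals. The Python sketch below (standard library only, hypothetical function names) assumes Gaussian errors and counts the error variance as one extra estimated parameter. That is understood to be the convention behind R's AIC() for lm fits, but treat the correspondence as an assumption and verify it against your own R output.

```python
import math


def r_squared(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def gaussian_aic(y, yhat, n_coef):
    """AIC for a least-squares fit under Gaussian errors.

    n_coef is the number of estimated regression coefficients,
    including the intercept; the error variance adds one more parameter.
    """
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, yhat))
    log_lik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)
    return -2.0 * log_lik + 2.0 * (n_coef + 1)
```

Because each added coefficient costs 2 AIC units, a polynomial or interaction term is only worth keeping if it lowers the residual sum of squares enough to offset that penalty.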
Answered Same Day May 10, 2021


Saravana Priyan answered on May 11 2021