STA 9708, Baruch College. Part C added; see pages 4-9. Prof. L. Tatum. Draft: XXXXXXXXXX. Final Project Assignment. Project Due Date: 10:00 PM, Thursday, XXXXXXXXXX. General Instructions for the Project: ·...

The assignment is fully outlined in the attached document (Actual Assignment); additional support is in the Lecture Note 10 document.




STA 9708, Baruch College
Part C added; see pages 4-9
Prof. L. Tatum
Draft: 12-6-20

Final Project Assignment
Project Due Date: 10:00 PM, Thursday, 12-17-2020

General Instructions for the Project:
· Write in the first person.
· Share some of your thought-processes, miscues, work-arounds, and insights.
· Ground your work in the concepts and approaches taken in the Lecture Notes.
· Use Microsoft Word for the project; insert Excel tables, graphs, and summaries into the Word file. Document your data sources in the Word document.
· Use an Excel file to hold scratch work and technical procedures.
· See LN10.A for examples of
  (i) writing in the first person;
  (ii) expressing thought-processes, miscues, work-arounds, and insights;
  (iii) presenting the project in Microsoft Word, including tables, graphics, and summaries of Microsoft Excel work.
· Sharing your Project work with anyone else is cheating: Don’t do it.
· Do not plagiarize.

****************************

Point Allocation
Part A: 45 points
Part B: 40 points
Part C: 15 points

Part A. One-sided, Two-Sample z-Test of Population Proportions. (See LN10.)
In this section, get your data from the U.S. Census.
(1) Develop a research hypothesis to use in a one-sided two-sample test for population proportions.
(2) Gather data from U.S. Census data on individual people. Show details of the data sourcing. Keep sample sizes between 15 and 35. See Section 1 of LN10 for an example.
(3) Explain where (a) population proportions will be used in the test and where (b) sample proportions will be used in the test, in the context of your setting.
(4) Walk the reader through the hypothesis test in the context of your data. Explain how the mechanics of the test provides the answer to the question, “How far is far?” in the context of your setting.
(5) Graph the test, labeling all relevant portions.
(6) State the probabilistic meaning of your p-value, referencing (a) the numeric value of your p-value, (b) the value of test-statistic found, and (c) the relevant area in the graph of the test.
(7) State the conclusion of the test and the grounds. Summarize the reasoning behind the conclusion. Be specific to your setting.
(8) To check your work, go to MathCracker website and run the appropriate program. Show the results; do they agree? https://mathcracker.com/z-test-for-two-proportions

Part B. One-sided, Two-Sample t-Test of Population Averages (see LN9)
In this section, get your data from Ellis Island immigration records.
(1) Develop a research hypothesis to use in a one-sided two-sample test of population averages.
(2) Create your own data set consisting of two different samples drawn from Ellis Island immigration data. Keep sample sizes between 10 and 30. Show details of the sourcing and context. An example of accessing Ellis Island data is given in Section 4 of LN10.
(3) State the null hypotheses in words that apply to the particular topic you are addressing. Define the populations you are referencing, their approximate sizes, and how the population averages would (in principle) be computed.
(4) Perform a one-sided, two-sample t-test. Explain what you are doing as you go along.
(5) Show the graph of the test with all features shown and labeled.
(6) State the probabilistic meaning of your p-value, referencing (a) the numeric value of your p-value, (b) the value of test-statistic found, and (c) the relevant area in the graph of the test.
(7) State the conclusion of the test and the grounds. Summarize the reasoning behind the conclusion.

Part C. CAPM Regression Study

Introduction
LN12 included an introduction to CAPM, the Capital Assets Pricing Model. The central interest of CAPM is the value of the sample slope coefficient found from regressing the percentage return on a stock against the percentage return on the S&P500.
In casual usage in finance, that sample slope is called Beta.

I recently conducted an experiment to see if I could match the value of Beta reported by Yahoo Finance for Lions Gate Entertainment Corp (LGF-A). Yahoo reported the Beta as 1.95, as seen in the screenshot to the right. I was able to do reasonably well because I finally took the time to search carefully and read carefully about how Yahoo does the computation. Authoritative Yahoo sources state that Beta is computed using monthly returns for a period of 5 years. This idea is supported, perhaps, by the cryptic 5Y Monthly modifier-notation used by Yahoo.

So, I downloaded five years of monthly data for both LGF and S&P500 (^GSPC), computed the monthly returns for both, and regressed the LGF returns on the S&P500 returns. Lo and behold, I got a sample slope of 1.903. I was surprised and gratified to get a value so close to the one reported by Yahoo.

You are welcome to explore the issue, but I have not worried at this point about the difference between 1.95 and 1.903. If you decide to investigate, one problem is that I downloaded five years (or 60 months) of adjusted closing price data. Thus, when computing monthly returns, I am left with only 59 months of monthly returns. So, my sample size was likely one short of what Yahoo used.

I employed the 5Y (five year) button in Yahoo and found it helpful. In the screenshot I highlight the key features of using that button, and the selection of monthly returns. After clicking Done and Apply but before Download, always double check the highlighted features of the top line, shown below, to ensure all intended changes have been made.

It will occur to you to wonder why this is done with monthly returns and for five years. Would a different underlying value of the population slope be involved when daily returns were used for five years instead of monthly returns? Or, monthly returns for ten years instead of five?
I am posting a nicely written article by Aswath Damodaran, a Finance professor at NYU, in which those topics are explored. This paper is an informal review, not an academic-journal publication. That is good because it makes it readable, but it is a bit informal and hence, for example, does not include page numbers (that I could find).

A Note on Notation
In the work-a-day world of applied finance, that sample slope is called “Beta,” but when writing a serious paper for publication, professors of Finance will always employ the statistical nuances you have encountered in the course, where sample slope and population slope are distinguished. In casual discussion in Finance, the term Beta covers both types of slope: a sample slope or the true slope, and the reader is left to work out which is meant by reflecting on the context.

Here is a reliable rule: if Beta is to be computed from sample data, then it is a sample slope and it is not the population slope. A sample slope estimates the population slope. Many different samples from the same population can be drawn; each sample will yield its own sample slope, yet there is still only one true slope. Thinking back to the start of the semester, though, we should note that there is one true slope only if the underlying system is stable across time with respect to that parameter. If not, then the true slope can vary with time; that possibility is of grave concern in Finance.

In a statistics course, our primary subject is to distinguish between the measurements we can take and the true but unobservable value of the thing being measured. Hence, even in an introductory statistics class, we are careful to distinguish between a sample average and the population average. And, in regression, to distinguish between a slope computed from a sample and the unique slope that could only be computed from knowledge of the values found from the entire population.
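The distinction between one true slope and many sample slopes can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true slope of 1.9, the 4% market volatility, the 10% noise level, and the seed are all made-up values chosen for illustration, not estimates from the LGF data.

```python
import random

def sample_slope(x, y):
    """Least-squares slope of y on x (sum of cross-deviations over sum of squared x-deviations)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def draw_sample(rng, n, true_slope, market_sd, noise_sd):
    """Draw n (market return, stock return) pairs from a hypothetical stable population."""
    x = [rng.gauss(0.0, market_sd) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
    return x, y

TRUE_SLOPE = 1.9           # assumed population slope (illustrative only)
rng = random.Random(9708)  # fixed seed so the run is repeatable

# 200 independent samples of 59 monthly returns each, all from the same population.
slopes = [
    sample_slope(*draw_sample(rng, 59, TRUE_SLOPE, market_sd=4.0, noise_sd=10.0))
    for _ in range(200)
]

mean_of_slopes = sum(slopes) / len(slopes)
print(f"smallest sample slope: {min(slopes):.3f}")
print(f"largest sample slope:  {max(slopes):.3f}")
print(f"average sample slope:  {mean_of_slopes:.3f} (true slope is {TRUE_SLOPE})")
```

Every sample comes from the same population, yet each run of 59 months produces a different sample slope; only their long-run average settles near the single true value.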
In Statistics texts and classes, a sample slope is typically denoted either as b1 or as β̂1. And, the true slope (or population slope) is denoted by the subscripted Greek letter beta, β1.

Here is a notational issue you may see: the regression model is sometimes written as Y = α + βx + ε. That notation is common in discussions of CAPM. It is fine, but it creates confusion because we also use α (the Greek “alpha”) to denote the (assigned) probability of rejecting the null hypothesis when the null is true. So, be on your guard!

One last technical note. If you read around, you will find in finance that the sample slope can be computed by dividing the sample covariance of x and y by the sample variance of x. It is surprising that it can be done in that way, but it works. For purposes of this course, I recommend you do not pursue lines of thought that arise from adopting that direction.

Daily versus Monthly Returns
I wanted to examine the impact on the sample slope of using daily returns instead of monthly. Therefore, I downloaded daily adjusted closing prices for LGF and S&P for five years, then regressed the daily returns of LGF on those of S&P. Here is a table of comparisons.

Summary of Daily and Monthly Returns of LGF Regressed on S&P500 Returns

           Slope Coef   S.E. Slope   t Stat   P-value   Low 95%   Up 95%   Avg Volume    n
Daily      1.016        0.0619       16.41    0.0000    0.895     1.138    1,278,144     1258
Monthly    1.903        0.3194       5.96     0.0000    1.263     2.542    26,416,843    59

Using daily returns cut the sample slope almost in half! One wonders if this is an ordinary occurrence. Note that the standard error of slope fell about five-fold when switching from monthly to daily returns. To some extent, this is due to the much larger sample size of daily returns (sample size n=1258 versus n=59), but I do not know if this result is common. We also see that the 95% confidence intervals for the true beta do not overlap for the daily and monthly returns.
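To make the columns of the comparison table concrete, here is a minimal sketch of how a slope, its standard error, and its t statistic could be computed from two price series. The prices are made-up numbers, the function names are hypothetical, and simple percentage returns are assumed; it is not the spreadsheet procedure used for the LGF study.

```python
from math import sqrt

def pct_returns(prices):
    """Simple percentage returns; n prices yield n - 1 returns."""
    return [100.0 * (later - earlier) / earlier
            for earlier, later in zip(prices, prices[1:])]

def slope_summary(x, y):
    """Return (slope, standard error of slope, t statistic, n) for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the usual n - 2 degrees of freedom.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(sse / (n - 2) / sxx)
    return slope, se_slope, slope / se_slope, n

# Made-up adjusted closing prices (illustrative only).
market_prices = [100.0, 102.0, 101.0, 104.0, 103.0, 106.0, 108.0]
stock_prices = [50.0, 52.0, 51.5, 54.0, 52.0, 56.0, 58.5]

market_returns = pct_returns(market_prices)
stock_returns = pct_returns(stock_prices)

slope, se_slope, t_stat, n = slope_summary(market_returns, stock_returns)
print(f"n = {n}, slope = {slope:.3f}, S.E. = {se_slope:.3f}, t = {t_stat:.2f}")
```

The same relationship holds in the table: the t Stat column is the Slope Coef divided by the S.E. Slope (1.016 / 0.0619 ≈ 16.41 and 1.903 / 0.3194 ≈ 5.96). Seven prices give six returns here, just as 60 monthly prices give 59 monthly returns.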
Answered Same Day, Dec 16, 2021


Suraj answered on Dec 17 2021
Part A:
(1)
Research hypothesis: The data come from U.S. Census records on individual people. The claim under test is that, among people born in California, the proportion of males is higher than the proportion of females. Because the claim concerns proportions, a z-test is appropriate. Hence, this is a one-tailed, two-sample test of population proportions.
Here, p1 denotes the population proportion of males and p2 denotes the population proportion of females. The hypotheses for this test are as follows:
Null hypothesis (H0): The population proportion of males is not higher than the population proportion of females, that is, p1 ≤ p2.
Alternative hypothesis (H1): The population proportion of males is higher than the population proportion of females, that is, p1 > p2.
(2)
The data are collected from U.S. Census records and are listed below:
    S. no.   SEX   BIRTHPLACE
    1        M     California
    2        F     California
    3        M     California
    4        M     California
    5        M     California
    6        F     California
    7        F     California
    8        M     California
    9        M     California
    10       M     California
    11       F     California
    12       M     California
    13       F     California
    14       M     California
    15       F     California
    16       M     California
    17       F     California
    18       F     California
    19       M     California
    20       F     California
    21       M     California
    22       M     California
    23       F     California
    24       M     California
The data set has two columns, sex and birthplace; the test is restricted to people whose birthplace is California.
(3)
The sample size for the test is n = 24.
Let p̂1 and p̂2 denote the sample proportions of males and females used in the test, and let p1 and p2 denote the corresponding population proportions.
The main assumptions for conducting the z-test for two proportions are as follows:
1. Each sample is drawn at random from its population.
2. Each sample is large enough for its sample proportion to be approximately normally distributed.
Let x1 denote the total number of males in the sample and x2 the total number of females in the sample.
Here, n = n1 = n2 = 24.
Counting the SEX column gives x1 = 14 males and x2 = 10 females.
Thus, the values of the sample proportions are:
p̂1 = x1/n1 = 14/24 ≈ 0.583
p̂2 = x2/n2 = 10/24 ≈ 0.417
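The tally and the sample proportions can be reproduced directly from the SEX column of the data listed in (2). This is a small sketch; the variable names are hypothetical, and the string below is simply the 24 M/F entries typed in row order.

```python
from collections import Counter

# SEX column for rows 1-24, grouped five rows at a time for easy checking.
sex_column = list("MFMMM" + "FFMMM" + "FMFMF" + "MFFMF" + "MMFM")

counts = Counter(sex_column)
n = len(sex_column)

x1 = counts["M"]  # number of males in the sample
x2 = counts["F"]  # number of females in the sample

p1_hat = x1 / n   # sample proportion of males
p2_hat = x2 / n   # sample proportion of females

print(f"n = {n}, males = {x1}, females = {x2}")
print(f"sample proportions: {p1_hat:.3f} and {p2_hat:.3f}")
```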
(4)
The test statistic for the test is: [formula missing in the source]
Substituting the sample values into the test statistic gives: [computation missing in the source]
The value of the test statistic is 1.106.
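For comparison, here is a minimal sketch of the common textbook form of the one-sided two-proportion z-test, using a pooled standard error. The answer's own formula is not visible in this copy, so it is an assumption that a pooled form applies; the function name is hypothetical, and the inputs are the counts stated above (14 and 10, each treated as coming from a sample of 24).

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Upper-tailed z-test of H0: p1 <= p2 against H1: p1 > p2 with a pooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion under H0
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / std_error
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z under the standard normal
    return z, p_value

z, p_value = two_proportion_z(14, 24, 10, 24)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.3f}")
```

With these inputs the pooled formula gives a z of about 1.155 and a one-sided p-value of about 0.12, which is close to but not the same as the 1.106 reported above. The source does not show enough of the computation to explain the difference.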
The level of...