
need help completing



STAT 1021 – Introduction to Statistics
Spring 2020
Assignment #3

In 1978, William J. Wilson (then at the University of Chicago’s sociology department; now at Harvard, where good professors go to die) published his first major book, The Declining Significance of Race, which posits that class-based factors have begun to supplant factors related to race per se as the primary causes of unequal social outcomes for Black Americans. In short, Wilson argues that Black folks were formerly disadvantaged simply because they were Black; today, by contrast, Black Americans’ unequal access to education, jobs, quality neighborhoods, and other resources is responsible for persistent disparities in social outcomes (rather than race as such).

We will test Wilson’s hypothesis using data from the oldest (1972) and newest (2018) GSS waves. If Wilson is correct, we expect that in the older data, black respondents will show poorer outcomes than white respondents even after controlling for education and other factors; in the newer data, while blacks may still have poorer outcomes in general, these disparities will shrink or even vanish after controlling for education and other factors.

Tables analyzing the data from the 1972 and 2018 GSS waves are presented below. Because Wilson was primarily interested in black-white differences, analyses have been restricted to black and white respondents only (other racial groups provided relatively few observations anyway). The SEI variable was not added to the GSS until 1988, so we will use occupational prestige as our primary outcome. (Assume for the purpose of the assignment that the 1970 and 2010 scales for occupational prestige are directly comparable.)

1. From the 1972 GSS data:

   A. [10 points] Table 1A shows the results of a naïve linear regression model for the effect of being black on occupational prestige. In a short paragraph, define and interpret the main terms, including r². In doing so, answer the questions: How much difference is there between blacks and whites? How significant is it? How much of the variation in occupational prestige is explained by race alone?

      Table 1A: 1972 GSS linear regression model results

   B. [15 points] Table 1B shows the results of a more complex model, taking into account years of education (recentered on 12) and age (recentered on 18), as well as interaction effects for race and education and for race and age. In a paragraph, define and interpret all the terms in the model, including r². In doing so, answer the questions: How much difference is there between blacks and whites, either at baseline (main effect) or in response to changes in age or education (interaction effects)? How significant are the differences? How much of the variation in occupational prestige is explained by this model?

      Table 1B: 1972 GSS multiple regression model results

   C. [10 points] Check the adequacy of the model in (B). Assess the assumptions of:

      i. linear relationship between (continuous) predictors and outcome, from the figures below:
         Figure 1.C.i.a: prestige vs. educ
         Figure 1.C.i.b: prestige vs. age
      ii. no uncontrolled confounders
      iii. normal distribution of error terms, centered on 0, from the figure below:
         Figure 1.C.iii: distribution of error terms
      iv. homoscedasticity of error terms, from the figure below:
         Figure 1.C.iv: error terms vs. predicted values
      v. independence of error terms.

2. Using the 2018 GSS data:

   A. [10 points] Table 2A shows the results of a naïve linear regression model for the effect of being black on occupational prestige. In a short paragraph, define and interpret the main terms, including r². In doing so, answer the questions: How much difference is there between blacks and whites? How significant is it? How much of the variation in occupational prestige is explained by race alone? How do the size and significance of the difference, and the amount of variance explained in this model, compare to the results from 1972?

      Table 2A: 2018 GSS linear regression model results

   B. [15 points] Table 2B shows the results of a more complex model, taking into account years of education (recentered on 12) and age (recentered on 18), as well as interaction effects for race and education and for race and age. In a paragraph, define and interpret all the terms in the model, including r². In doing so, answer the questions: How much difference is there between blacks and whites, either at baseline (main effect) or in response to changes in age or education (interaction effects)? How significant are the differences? How much of the variation in occupational prestige is explained by this model? How do the sizes and significances of effects, and the amount of variance explained in this model, compare to the results from 1972?

      Table 2B: 2018 GSS multiple regression model results

   C. [10 points] Check the adequacy of your model in (B). Assess the assumptions of:

      i. linear relationship between (continuous) predictors and outcome, from the figures below:
         Figure 2.C.i.a: prestige vs. educ
         Figure 2.C.i.b: prestige vs. age
      ii. no uncontrolled confounders
      iii. normal distribution of error terms, centered on 0, from the figure below:
         Figure 2.C.iii: distribution of error terms
      iv. homoscedasticity of error terms, from the figure below:
         Figure 2.C.iv: error terms vs. predicted values
      v. independence of error terms.

3. [30 points] Using what you have discovered, write a ~1 page summary in which you compare the results from your models and reflect on your findings. State whatever conclusions you are willing to draw regarding Wilson’s hypothesis. Is the significance of race declining? How do we explain persistent disparities in outcomes that we observe even in 2018? Are there any additional steps you’d like to take, with the GSS data, to explore these questions further?
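Parts 1B and 2B ask for the black-white difference both at baseline and as age or education changes. The sketch below shows one way to read a model with recentered predictors and race interactions. Every coefficient value in it is a hypothetical placeholder chosen for easy arithmetic, not a value from Table 1B or Table 2B, and the names `Coefs`, `predict`, and `gap` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Coefs:
    """Coefficients of: prestige ~ black + educ_c + age_c + black*educ_c + black*age_c."""
    cons: float        # expected prestige for a white respondent with 12 years of school, age 18
    black: float       # black-white gap at that baseline (main effect)
    educ_c: float      # change per extra year of education, white respondents
    age_c: float       # change per extra year of age, white respondents
    black_educ: float  # how much the education slope differs for black respondents
    black_age: float   # how much the age slope differs for black respondents


def predict(c: Coefs, black: int, educ: float, age: float) -> float:
    """Predicted prestige; education is recentered on 12 and age on 18."""
    educ_c = educ - 12
    age_c = age - 18
    return (c.cons + c.black * black + c.educ_c * educ_c + c.age_c * age_c
            + c.black_educ * black * educ_c + c.black_age * black * age_c)


def gap(c: Coefs, educ: float, age: float) -> float:
    """Predicted black-minus-white difference at a given education and age."""
    return predict(c, 1, educ, age) - predict(c, 0, educ, age)


# HYPOTHETICAL values, for illustration only (not taken from any GSS table).
example = Coefs(cons=40.0, black=-3.0, educ_c=2.0, age_c=0.1,
                black_educ=0.5, black_age=-0.05)

print(gap(example, educ=12, age=18))  # gap at baseline: the main effect alone
print(gap(example, educ=16, age=18))  # main effect plus 4 years of the educ interaction
print(gap(example, educ=12, age=38))  # main effect plus 20 years of the age interaction
```

Because of the recentering, the main effect for race is the gap for an 18-year-old with exactly 12 years of education, and each interaction coefficient says how that gap widens or narrows per additional year.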
[Figure: residuals vs. fitted values]

Stata output: linear regression of prestg10 on black

      Source |       SS         df       MS        Number of obs  =    2001
-------------+--------------------------------     F(1, 1999)     =   37.18
       Model |  6697.02814       1  6697.02814     Prob > F       =  0.0000
    Residual |  360074.864    1999  180.127496     R-squared      =  0.0183
-------------+--------------------------------     Adj R-squared  =  0.0178
       Total |  366771.892    2000  183.385946     Root MSE       =  13.421

------------------------------------------------------------------------------
    prestg10 |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -4.757613   .7802576    -6.10    0.000   -6.287816   -3.227409
       _cons |   45.79085   .3314119   138.17    0.000     45.1409     46.4408
------------------------------------------------------------------------------
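The summary statistics in the Stata output above follow from its sums of squares and coefficient estimates. As a minimal sketch, the script below recomputes them with the Python standard library, using only numbers copied from that output; the variable names are illustrative, and the critical value is backed out of the reported confidence interval rather than looked up in a t table.

```python
import math

# Values copied from the regression output (prestg10 on black).
n_obs = 2001
ss_model, df_model = 6697.02814, 1
ss_resid, df_resid = 360074.864, 1999
ss_total = 366771.892
coef_black, se_black = -4.757613, 0.7802576
ci_low, ci_high = -6.287816, -3.227409

# Share of the variation in prestige explained by race alone.
r_squared = ss_model / ss_total

# Adjusted R-squared penalizes for the number of predictors (one here).
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

# F compares explained variance per model df to residual variance per residual df.
ms_resid = ss_resid / df_resid
f_stat = (ss_model / df_model) / ms_resid

# Root MSE is the typical size of a residual, in prestige points.
root_mse = math.sqrt(ms_resid)

# t statistic for the race coefficient; with one predictor, t squared equals F.
t_black = coef_black / se_black

# Critical t value implied by the reported 95% interval (half-width / std. err.).
implied_crit = (ci_high - ci_low) / (2 * se_black)

print(f"R-squared      = {r_squared:.4f}")
print(f"Adj R-squared  = {adj_r_squared:.4f}")
print(f"F              = {f_stat:.2f}")
print(f"Root MSE       = {root_mse:.3f}")
print(f"t (black)      = {t_black:.2f}")
print(f"implied t crit = {implied_crit:.3f}")
```

Read this way, the r² line answers the question of how much variation race alone explains (under 2 percent), while the coefficient and its t statistic answer how large and how significant the black-white difference is.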
Apr 07, 2021