
Stats R programming project


S371: Statistics for Sociology
Final Project
Due Thursday, May 7, 12:30pm via Canvas
100 points possible

This project provides an opportunity for you to review and apply the skills you learned throughout the course. You will analyze data from the General Social Survey (GSS) and summarize and interpret your results in a report.

Data

This wave of the GSS was collected in 2018 from a probability sample of 2,348 American adults aged 18 and above. The data, “project.RData,” and the accompanying codebook are available on Canvas. Additional information on the questionnaires and data can be found at gss.norc.org.

The dataset includes 56 variables and has been minimally preprocessed to simplify analysis. However, most variables have some missing cases and so are not available for all 2,348 individuals. Missing values, refused questions, and inapplicable questions have all been coded as NA.

Analysis

You will choose and analyze data for four variables from the dataset. You must choose one of each of the following variable types:
1) Quantitative dependent (response) variable
2) Categorical dependent (response) variable
3) Quantitative independent (explanatory) variable
4) Categorical independent (explanatory) variable

You will examine univariate distributions and bivariate relationships. The bivariate relationships you examine will be between the dependent and independent variables, so you will have four variable pairs:

                                  Independent variable (explanatory)
Dependent variable (response)     Quantitative      Categorical
Quantitative                      Pair 1            Pair 2
Categorical                       Pair 3            Pair 4

There are few true quantitative variables available in the dataset. You may treat ordinal variables with five or more categories as quantitative (e.g., polviews). If you choose, you can create new secondary variable(s) based on one or more of the variables in the dataset. For example, I recommend limiting categorical variables to between two and five categories.
You should also collapse categories with a small number of individuals (i.e., fewer than 5% of your sample).

You will complete the analysis in R and report and summarize your results in a Word document. Results must be presented in tables and graphs in the body of the report as instructed below. Graphs should be made in R. Raw R output should not be included in the body of the report; instead, the final R script of your analysis will be pasted into an appendix.

Sections

1) Background (10 pts)
a. Data: Briefly describe the original source of your data, the 2018 General Social Survey. Identify your sample, describing who is in the sample and the sample size. (This may simply be all adults 18+ without missing data, or you may further limit the sample in some way to match your target population of interest.)
b. Variables: Identify your four variables, describing first your dependent variables and then your independent variables. Say what concept each measures (e.g., socio-economic status), how it measures that concept (describe the response categories or units), and whether it is quantitative or categorical.
c. Hypotheses: Briefly explain why and how you expect each independent variable to be related to each dependent variable for each of your four pairs. You do not have to include formal hypotheses (although you can), but you should comprehensively describe the relationships you expect to find for each pair of independent and dependent variables. (You should draft this before you do the analysis so that you are not influenced by the results.)

2) Univariate distributions (30 pts)
For each of your four variables, describe and interpret the univariate distribution. Each variable’s univariate distribution should also be presented in a table and a graph as described below:
· Two quantitative variables - create a summary table that includes:
   a. measures of center (mean and median)
   b. measures of variation (standard deviation and five-number summary)
   c. lower and upper bounds of 95% confidence intervals for the means
· Two categorical variables - create a summary table that includes:
   a. categories for each categorical variable
   b. percentages of the sample in each category
   c. lower and upper bounds of 95% confidence intervals for the percentage in each category
· All four variables – create univariate graphs (one for each variable). Provide an appropriate graph showing the distribution of each variable. These graphs may be histograms, boxplots, or bar graphs and should be made in R.

3) Bivariate relationships (30 pts)
For each of the four pairs of dependent and independent variables, report and interpret your analysis of the relationship between the two variables. Base this text on the analyses below, which differ depending on the variable types:
· Pair 1 – both dependent and independent variables are quantitative
   · scatterplot (include this graph)
   · correlation (can be reported in text only)
   · regression (can report slope and R² in text only)
· Pair 4 – both dependent and independent variables are categorical
   · two-way table with column percentages (include this as a table)
   · chi-square test (if at least one variable has three or more categories) or two-sample z test of proportions (if both variables have two categories)
· Pairs 2 and 3 – one variable is categorical and the other is quantitative
   · if the categorical variable has two categories: two-sample t-test of means (include the means and test results in a table)
   · if the categorical variable has three or more categories: choose one category as a “reference” category and perform a two-sample t-test of means comparing each other category with this reference category

4) Conclusion (10 pts) – Provide a summary of, and reflections on, your findings. Return to your hypotheses from the Background section and describe whether your results met those expectations.
For areas in which your results did not meet your expectations, speculate on why you may have gotten the unexpected results. Also, if you thought the variables would be associated because of a causal relationship, state whether or not your results provide definitive evidence of such a causal relationship, and explain why.

5) R appendix (20 pts) – Paste the R script from your final analyses. In the course of your analysis, you will perform many analyses that are not part of the final report, for example mistakes and additional exploration; the appendix script should contain only the final analyses. Do not include the output from the R script.

Format

Text
· Main text
   · double-spaced
   · 12 pt font size
   · one-inch margins
· R script appendix
   · single-spaced
   · Courier New font
   · size 9
   · one-inch margins
· There are no length requirements; take the space you need.

Tables and figures
· Must have numbered titles that include the sample size
   · Example: Table 1. Descriptive statistics for Variable 1 and Variable 2 (n = 1,256).
· Should be placed near the text that they go with
· Figures (graphs) should be made directly in R
· Tables should be “pretty,” i.e., report-ready. Don’t copy raw R output as your tables.

Notes and tips:
1. Use the word “significant” as appropriate. Always use a significance level (α) of 0.05 with two-tailed tests.
2. Look back at the tables used to summarize R output in the problem sets for ideas on how to format tables.
3. As described on the GSS website, not all individuals had the same probability of being selected into the sample. Thus, to make the sample truly representative of the population, sample weights should be used in the analyses. We did not cover the use of sample weights in our course, though. So, act as if each individual had the same probability of being selected into the sample and the sample is representative of the broader population (without the use of weights).
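The recoding advice in the Analysis section (two to five categories per categorical variable, with any category below 5% of the sample collapsed) might be handled in base R roughly as below. This is a minimal sketch, not course-provided code: `collapse_small()` is a hypothetical helper, and pooling every small category into one "Other" level is an assumption. For an ordinal variable, merging a small category into its neighbor is often the more meaningful choice.

```r
# Hypothetical helper (not course-provided): pool every category holding
# fewer than `min_share` of the non-missing cases into a single level.
# Missing values (NA) are left as NA.
collapse_small <- function(x, min_share = 0.05, other = "Other") {
  x <- as.character(x)
  shares <- table(x) / sum(!is.na(x))   # table() ignores NA by default
  small <- names(shares)[shares < min_share]
  x[x %in% small] <- other              # NA %in% small is FALSE, so NAs stay NA
  factor(x)
}
```

With the real data, a call such as `gss$new_var <- collapse_small(gss$some_var)` would apply it; `gss`, `new_var`, and `some_var` are placeholders for the data frame created by `load("project.RData")` (which invisibly returns the names of the objects it loads) and for your chosen variables.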
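For the quantitative summary table in section 2 (Univariate distributions), one row per variable could be computed as sketched below. The helper name `summarize_quant()` is hypothetical, and two choices are assumptions to check against the course materials: the confidence interval uses the t distribution (the same interval `t.test(x)$conf.int` reports), and the five-number summary uses `fivenum()`, whose Tukey hinges can differ slightly from the quartiles shown by `summary()` or `quantile()`.

```r
# Hypothetical helper (not course-provided): one row of the quantitative
# summary table for a numeric vector x.
summarize_quant <- function(x) {
  x <- x[!is.na(x)]                  # drop missing, refused, inapplicable (NA)
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  five <- fivenum(x)                 # min, lower hinge, median, upper hinge, max
  margin <- qt(0.975, df = n - 1) * s / sqrt(n)  # t-based 95% margin of error
  data.frame(n = n, mean = m, median = median(x), sd = s,
             min = five[1], q1 = five[2], q3 = five[4], max = five[5],
             ci_lower = m - margin, ci_upper = m + margin)
}
```

If an ordinal variable such as polviews is stored as a factor, `as.numeric()` on it returns the level positions, which is only appropriate when the levels are in their substantive order.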
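The categorical summary table in section 2 (categories, percentages, and 95% confidence bounds for each percentage) might be built along these lines. This sketch assumes the large-sample (Wald) interval p ± 1.96·√(p(1−p)/n); `prop.test()` reports a slightly different (Wilson-type) interval, so use whichever form the course taught. `summarize_cat()` is a hypothetical name.

```r
# Hypothetical helper (not course-provided): percentages and large-sample
# (Wald) 95% confidence intervals for each category of x.
summarize_cat <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  counts <- table(x)
  p <- as.numeric(counts) / n
  margin <- qnorm(0.975) * sqrt(p * (1 - p) / n)  # z-based margin of error
  data.frame(category = names(counts), n = as.integer(counts),
             percent = 100 * p,
             ci_lower = 100 * (p - margin), ci_upper = 100 * (p + margin),
             row.names = NULL)
}
```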
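The univariate graphs required in section 2 could come from a small base-R helper like the one below: a histogram for a quantitative variable and a bar graph of percentages for a categorical one. The helper name and titles are hypothetical; `boxplot(x)` is an equally valid choice for the quantitative variables, and the numbered “Figure” titles may be easier to add in Word.

```r
# Hypothetical helper (not course-provided): one univariate graph per
# variable, a histogram if x is numeric, otherwise a bar graph of percentages.
# Returns the number of non-missing cases (invisibly).
plot_univariate <- function(x, label) {
  x <- x[!is.na(x)]
  main_title <- paste0("Distribution of ", label, " (n = ", length(x), ")")
  if (is.numeric(x)) {
    hist(x, main = main_title, xlab = label, ylab = "Number of respondents")
  } else {
    pct <- 100 * prop.table(table(x))
    barplot(pct, main = main_title, xlab = label, ylab = "Percent of respondents")
  }
  invisible(length(x))
}
```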
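For Pair 1 in section 3 (both variables quantitative), the scatterplot, correlation, regression slope, and R² can all come from base R, as in this sketch. The helper `analyze_pair1()` is hypothetical; it keeps only respondents with values on both variables and uses `cor.test()` for Pearson’s r and `lm()` for the least-squares line.

```r
# Hypothetical helper (not course-provided) for Pair 1: quantitative
# dependent variable y and quantitative independent variable x.
analyze_pair1 <- function(y, x, y_label, x_label) {
  keep <- complete.cases(y, x)           # respondents with both values
  y <- y[keep]
  x <- x[keep]
  fit <- lm(y ~ x)                       # least-squares regression of y on x
  plot(x, y, xlab = x_label, ylab = y_label,
       main = paste0(y_label, " by ", x_label, " (n = ", length(y), ")"))
  abline(fit)                            # add the fitted regression line
  ct <- cor.test(x, y)                   # Pearson r, two-sided test
  list(n = length(y),
       r = unname(ct$estimate),
       r_p_value = ct$p.value,
       slope = unname(coef(fit)[2]),
       slope_p_value = summary(fit)$coefficients[2, 4],
       r_squared = summary(fit)$r.squared)
}
```

In simple regression, R² equals the square of Pearson’s r, and the slope’s p-value matches the correlation test’s p-value, which is a useful consistency check on the numbers reported in the text.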
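For Pair 4 in section 3 (both variables categorical), the two-way table with column percentages and the choice between a chi-square test and a two-sample z test of proportions could be sketched as follows. Assumptions: `analyze_pair4()` is a hypothetical name, the dependent variable forms the rows and the independent variable the columns (so column percentages compare groups of the explanatory variable), and the 2 × 2 case uses the pooled z test, which gives the same p-value as `prop.test(..., correct = FALSE)`.

```r
# Hypothetical helper (not course-provided) for Pair 4: categorical
# dependent variable dv and categorical independent variable iv.
analyze_pair4 <- function(dv, iv) {
  keep <- complete.cases(dv, iv)
  tab <- table(dv[keep], iv[keep])              # rows = DV, columns = IV
  col_pct <- 100 * prop.table(tab, margin = 2)  # each column sums to 100
  if (all(dim(tab) == 2)) {
    # 2 x 2 case: pooled two-sample z test comparing the share of each IV
    # group that falls in the first DV category.
    x1 <- tab[1, ]
    totals <- colSums(tab)
    p <- x1 / totals
    p_pool <- sum(x1) / sum(totals)
    se <- sqrt(p_pool * (1 - p_pool) * (1 / totals[1] + 1 / totals[2]))
    z <- unname((p[1] - p[2]) / se)
    result <- list(test = "two-sample z test of proportions",
                   statistic = z, p_value = 2 * pnorm(-abs(z)))
  } else {
    ct <- chisq.test(tab)                       # chi-square test of independence
    result <- list(test = "chi-square test", statistic = unname(ct$statistic),
                   df = unname(ct$parameter), p_value = ct$p.value)
  }
  list(n = sum(tab), table = tab, col_pct = col_pct, result = result)
}
```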
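For Pairs 2 and 3 in section 3 (one quantitative and one categorical variable), the comparisons against a reference category might be run as below. The helper `t_tests_vs_reference()` is hypothetical, and it relies on the default of `t.test()`, which is the Welch (unequal-variance) test; if the course used the pooled version, add `var.equal = TRUE`. With a two-category variable it returns a single row, which covers the two-category case as well. For Pair 3 the quantitative variable is the explanatory one, but the test still compares its means across the categories.

```r
# Hypothetical helper (not course-provided): two-sample t-tests comparing the
# mean of y in each category of `group` with the mean in the reference category.
t_tests_vs_reference <- function(y, group, reference) {
  keep <- complete.cases(y, group)
  y <- y[keep]
  group <- as.character(group[keep])
  ref_y <- y[group == reference]
  others <- setdiff(sort(unique(group)), reference)
  rows <- lapply(others, function(g) {
    g_y <- y[group == g]
    tt <- t.test(g_y, ref_y)           # Welch test (var.equal = FALSE), two-sided
    data.frame(category = g, n = length(g_y), mean = mean(g_y),
               reference_mean = mean(ref_y),
               difference = mean(g_y) - mean(ref_y),
               t = unname(tt$statistic), df = unname(tt$parameter),
               p_value = tt$p.value)
  })
  do.call(rbind, rows)
}
```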