2. Write script to install and load the packages “ggm”, “polycor”, “ggplot2”, “boot”, and “Hmisc” using the “install.packages("")” and “library()” functions.

3. Write two lines of script to set your working directory, using the “setwd()” function. The first line should set your working directory to a folder on your home computer. The second line should set your working directory to your OSU RStudio Server folder. Place a “#” at the start of the first line (for use when you don’t have access to RStudio Server).

Import Data & Clean Variables

4. Import your village data set, using the “read.csv()” function. Be sure to use the “<-” assignment operator to assign the data object a name of your choice.

5. List the names of all the variables in your data object, using the “names()” function. Then determine the dimensions of your data object, using the “dim()” function.

NOTE: This data set only contains observations for the year XXXXXXXXXX

6. Select a sub-sample of your data using the “data.frame[,]” subsetting notation, creating a new data object that includes all observations (rows) and the following variables (columns): "yearY234567", "vidY234567", "hhidY23456", "idssnY234567", "hhheadY234567", "idage_TAPS_Y234567", "ihceduY234567", "iskrhelptimeY567", "iskrespectY5", "iwtotalwealthY234567", "ihcmc_cccoY25", "ibed", "isick", "wealthcat", "gender", "heightad", "weightad". Use “dim()” to check the dimensions of this object.

7. Select a sub-sample from the new data object created in Task 6. Using the “data.frame[,]” subsetting notation, create a new data object that includes all observations from household heads (i.e. hhheadY234567==1) and all variables. Use “dim()” to check the dimensions of this object. How many observations (rows) and variables (columns) does your data set contain? Write these answers in your script, preceded by a “#”.

Assumptions of Pearson’s r

8. For this Lab, we are going to focus on the relationship between education ("ihceduY234567") and wealth ("iwtotalwealthY234567").
Both of these variables currently have funky names, so let’s use the “colnames()” function to rename them. Choose new names for each variable that make sense to you. If you’re comfortable with the existing variable name, no need to change it. Then use the “names()” function to check that you renamed the right columns.

Lab Assignment #4 | ANTH 593 | Spring 2020

9. Now use the functions in the “ggplot2” package to generate a histogram of education, including a layer that uses “stat_function()” to add a normal curve (HINT: don’t forget to set an appropriate bin width). Be sure to save this combined graph as a new graph object. Based on your visual inspection of the histogram, would you say education is normally distributed in your village sample? Add your short answer with a brief explanation as an annotation.

10. Take the next step and perform a Shapiro-Wilk test to determine if education is normally distributed. Include an annotation that uses the output of this test to: a) determine if the sample is normally distributed and b) state your opinion about whether or not the sampling distribution of education is normally distributed.

Calculating Pearson’s r

11. After consulting the table on p. 216 of our text, choose a function to calculate the Pearson’s r correlation between education and wealth, as well as the statistical significance (p-value) of this correlation. Using the output of this test, answer the following questions in annotations to your script:
   a. What is the direction of the correlation? Is it positive, negative, or neutral?
   b. What is the size of the correlation? Is it large, medium, small, or zero?
   c. What is the statistical significance of the correlation?

12. Now use the “Hmisc::rcorr()” function to calculate pairwise correlations for multiple variables in your village data set.
Specifically:
   a. Use the “as.matrix()” function to convert a sub-sample of your data into a new matrix object that includes all observations and variables 6:13 & 15:17
   b. Run the “Hmisc::rcorr()” function with this new matrix, and save the results as a new object called “multipearson1”
   c. Using the object “multipearson1”, generate a new object that includes the Pearson’s r between education and the other variables in your sub-sample. HINT: display the object “multipearson1” to determine the column number for education, then use the “[]” to extract that column number.
   d. Using the object “multipearson1”, generate a new object that includes the p-values for the Pearson’s r between education and the other variables in your sub-sample. HINT: see hint for 12c.
   e. Using the object you created in 12c and the arithmetic operator “^2”, generate a new object that is equal to the R-square of each pairwise correlation between education and the other variables in your sub-sample.
   f. Use the “cbind()” function to combine these three new objects from 12c, 12d, and 12e into a new object called “pearsoneduc”
   g. Write a line of script to display the new object from 12f

13. Using the output from Task 12g, answer the following questions using annotations in your script:
   a. Which variables are positively correlated with education in your village? Write the name of each variable, followed by parentheses that include the correlation coefficient. Ignore the p-values for now.
   b. Which variables are negatively correlated with education in your village? Write the name of each variable, followed by parentheses that include the correlation coefficient. Ignore the p-values for now.
   c. Of the variables that are positively or negatively correlated with education in your village, which correlations are statistically significant?
      Write the name of each variable, followed by parentheses that include the correlation coefficient and p-values.
   d. Looking at the R-square values for all statistically significant pairwise correlations, which variable explains the most variance in education? List this variable, followed by parentheses that include the correlation coefficient, p-value, and R-square for this variable.
   e. Using the variable you chose for 13d, generate a scatter plot that illustrates the association with education. Use the geom_point() function of ggplot2 and add a line of best fit with the geom_smooth() function and “lm” for the “method=” option. Be sure to add axis labels and a title to your graph, save the graph as an object in the R workspace, and save this graph as a .png file named: “yourlastname-lab4-graph2”

Non-Parametric Correlations

14. Depending on the distribution of education and the size of your sample from your village, you may or may not have been comfortable assuming normality of the sampling distribution, which is required for Pearson’s r. Using an appropriate function from the table on p. 216, calculate the Spearman’s correlation coefficient between education and wealth. In an annotation to your script, explain how the results of Spearman’s compare to the results of Pearson’s you calculated for Task 11. Be sure to discuss the direction (positive/negative), size (large/small), and statistical significance of Spearman’s.

15. Now calculate Kendall’s Tau. Using an appropriate function from the table on p. 216, calculate Kendall’s Tau between education and wealth. In an annotation to your script, explain how the results of Kendall’s Tau compare to the results of Pearson’s and Spearman’s you calculated previously.
Be sure to discuss the direction (positive/negative), size (large/small), and statistical significance of Kendall’s in relation to the other tests.

16. Based on your conclusions from Task 14 and Task 15, does it matter whether we use parametric (Pearson’s) or non-parametric (Spearman’s, Kendall’s) correlations to understand the relationship between education and wealth in your village? Place your answer to this question in an annotation to your script.
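
The parametric versus non-parametric comparison in Tasks 11 and 14 to 16 can be sketched with base R's cor.test(), which computes all three coefficients through its “method” argument. This is only an illustration under stated assumptions: the data frame “demo” and its columns “educ” and “wealth” are simulated stand-ins rather than the village data, and the table on p. 216 of the course text may point to a different function.

```r
# Simulated stand-in data (hypothetical; NOT the village data set)
set.seed(593)
n <- 60
educ <- rpois(n, lambda = 4)                         # count-like years of schooling
wealth <- exp(8 + 0.4 * educ + rnorm(n, sd = 0.8))   # right-skewed wealth values
demo <- data.frame(educ = educ, wealth = wealth)

# Shapiro-Wilk test of normality for education (compare Task 10)
sw <- shapiro.test(demo$educ)

# Run cor.test() once per method and keep the estimate and p-value
methods <- c("pearson", "spearman", "kendall")
results <- t(sapply(methods, function(m) {
  # exact = FALSE requests the approximate p-value, which avoids
  # warnings about ties in the rank-based tests
  ct <- cor.test(demo$educ, demo$wealth, method = m, exact = FALSE)
  c(estimate = unname(ct$estimate), p.value = ct$p.value)
}))

print(sw)
print(round(results, 3))
```

Each row of “results” holds one coefficient and its p-value, so direction, size, and significance can be read side by side. Squaring the Pearson estimate, results["pearson", "estimate"]^2, gives the R-square described in Task 12e.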
May 13, 2021
