Data Analysis assignment 2 Due: August 30 at 11:59pm See correction to 1.2 below The second data analysis assignment contains three data sets. You are required to select an 80% subset for each data...

I have no idea how to start the assignment, since I am not familiar with R programming.


Data Analysis assignment 2
Due: August 30 at 11:59pm
See correction to 1.2 below

The second data analysis assignment contains three data sets. You are required to select an 80% subset for each data set. This will ensure that every student has a different data set and should get different answers. Identical answers or almost identical answers will be considered cheating.

There are three problems with subsections. Please answer the questions in context using statistical measures covered in class. Every problem is to be answered using statistical measures and techniques covered in class prior to the due date. Using other methods will not earn points. Your assignment can be handwritten with plots copied and pasted, or typed, or based on an R Markdown file that is annotated. It must then be uploaded to Canvas before the due date.

Data problem 1: For the data sets given below, state the null and alternative hypotheses in each case explicitly in terms of the parameters the hypotheses are about and carry out the hypothesis test. In each case provide the value of the test statistic, the p-value and the degrees of freedom. Use α=.05. Check if assumptions for χ² tests for your problem hold.

1. The data set cavalry.csv contains the number of deaths by horse kicks per year in the Prussian army over a twenty-year period in 10 different regiments. Thus, there are a total of 200 records. You will randomly select 160 of these and carry out a test to see if it is reasonable to assume the number of deaths by horse kicks follows a Poisson distribution. Carry out the test and state your conclusions in context.

2. The data set numboys.csv contains data for families with 2 children. The probability at conception of a boy is p=0.5 and each time a baby is conceived, the probability it is a boy is p=0.50, regardless of the gender of any other children. Determine the probability of having 2 boys, 1 boy, 1 girl and 2 girls. Does the data agree with what we would expect? Carry out a goodness-of-fit test and state your results in context.

3. The data set dayofbirth.csv provides the day of birth of children. We would like to test whether a baby is as likely to be born on a weekday as on a weekend. Find a 95% confidence interval for the proportion of babies born on the weekend. Is the hypothesized value in that interval? State your conclusions in context.

Data problem 2: In problem 2 we are concerned with associations between two qualitative variables. In each case, clearly state a null and alternative hypothesis in context, carry out the hypothesis test, provide the value of the test statistic and degrees of freedom, the p-value and state a conclusion at α=.01. Check assumptions.

1. For the data set fry.csv test if survival is the same in wild as in hatchery trout. State your conclusions in context. Find 95% confidence intervals for the proportion of wild and the proportion of hatchery trout surviving.

2. Using the data set wormgetsbird.csv we would like to know if there is an association between infection (uninfected, lightly and highly infected) and getting eaten. State your conclusions based on your test statistic and p-value. Which worms are most likely to get eaten? Calculate the proportion of worms getting eaten.

Data problem 3: In this problem we will analyze continuous data. We will use the data set leadinsoil.csv. It contains data on the lead content in soil before and after Hurricane Katrina and the difference before and after. We would like to analyze this data as follows.

1. Get estimates (point and interval estimates) of the lead content before Katrina and similarly after Katrina.

2. Get estimates of the difference between before and after using two approaches:
• Use the data on the difference to test if the before and after content are the same. Clearly state null and alternative hypothesis and identify the parameter you are using.
• Use the before and the after data to repeat the analysis for the difference.
Do you reach the same conclusion?

3. Using the student survey data set, compare the overall SAT for students who scored higher on the Math to students who scored higher on the Verbal. Carry out the test that mean scores are the same vs the alternative that they are different. Who scores higher overall? Could you do a test by calculating a difference for pairs of data?
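
For Data problem 3.2, the two approaches (testing the difference column versus testing the before and after columns as pairs) can be sketched as follows. This is only an illustration on simulated numbers: the values, the sample size of 30 and the variable names before, after and difference are assumptions, not the contents of leadinsoil.csv, and the sign convention of the real difference column is not known from the prompt.

```r
# Hypothetical lead measurements (NOT the leadinsoil.csv values)
set.seed(42)
before <- rnorm(30, mean = 300, sd = 80)
after  <- before - rnorm(30, mean = 40, sd = 30)
difference <- after - before

# Approach 1: one-sample t-test on the differences, H0: mean difference = 0
from_diff <- t.test(difference, mu = 0)

# Approach 2: paired t-test on the two columns, same H0
from_pairs <- t.test(after, before, paired = TRUE)

# Compare test statistic, degrees of freedom and p-value of the two approaches
c(from_diff$statistic, from_pairs$statistic)
c(from_diff$parameter, from_pairs$parameter)
c(from_diff$p.value, from_pairs$p.value)
```

A paired t-test is computed from the within-pair differences, so both calls should report the same result. Treating the two columns as independent samples (t.test without paired = TRUE) is a different test and would generally give a different p-value.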
Answered Same Day, Aug 26, 2021

Answer To: Data Analysis assignment 2 Due: August 30 at 11:59pm See correction to 1.2 below The second data...

Bezawada Arun answered on Aug 28 2021
#---------- Data problem 1------------------
# loading the necessary packages
library(magrittr)
library(ggplot2)
library(dplyr)
#-------------------1---------------------
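# importing the cavalry dataset
cavalry <- read.csv("cavalry.csv")
# displaying the top six records of cavalry
head(cavalry)
# Selecting 160 random records (80% of the 200 rows)
samples <- sample(1:nrow(cavalry), nrow(cavalry)*0.8)
# keep the sampled rows of the data, not the row indices themselves
samp <- cavalry[samples, , drop = FALSE]
# summary statistics of the sampled records
summary(samp)
# hypothesis setup
# Null hypo: The number of deaths by horse kicks follows a Poisson distribution
# Alt hypo: The number of deaths by horse kicks does not follow a Poisson distribution
# Expected counts under the Poisson model, with lambda estimated from the sample
x <- 0:4
len <- nrow(samp)
lambda <- mean(samp$numberOfDeaths)
# the last category is treated as "4 or more" so that the probabilities sum to 1
probs <- c(dpois(0:3, lambda), ppois(3, lambda, lower.tail = FALSE))
expected <- len*probs
# chi-square assumption: expected counts should be at least 5; pool sparse categories if not
expected
observed <- table(factor(pmin(samp$numberOfDeaths, 4), levels = x))
# Performing the chi-square goodness-of-fit test
test <- chisq.test(observed, p = probs)
test$statistic
# lambda was estimated from the data, so df = number of categories - 1 - 1
df <- length(x) - 2
pchisq(test$statistic, df = df, lower.tail = FALSE)
# If this p-value is below 0.05 we reject the null hypothesis; otherwise we fail to reject it
# plotting the deaths in the sample
deaths <- ggplot(data.frame(deaths=samp$numberOfDeaths),aes(x=deaths))+
geom_histogram(binwidth = 0.5, color="green")
deaths
# The plot is only a visual check; the conclusion about the Poisson fit comes from the test above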
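
The goodness-of-fit steps for a Poisson model can also be tried without the CSV file. The sketch below uses simulated counts, so the rate of 0.7, the seed and every variable name are assumptions chosen for illustration and say nothing about the real cavalry data. Counts of 3 or more are pooled here to keep the expected counts from becoming too small.

```r
# Hypothetical data: 160 simulated yearly death counts (NOT the cavalry.csv values)
set.seed(1)
deaths <- rpois(160, lambda = 0.7)

# Estimate lambda from the sample and pool counts of 3 or more into one category
lambda_hat <- mean(deaths)
categories <- 0:3
observed <- as.vector(table(factor(pmin(deaths, 3), levels = categories)))
probs <- c(dpois(0:2, lambda_hat), ppois(2, lambda_hat, lower.tail = FALSE))
expected <- length(deaths) * probs

# Chi-square statistic; the estimated lambda costs one extra degree of freedom
statistic <- sum((observed - expected)^2 / expected)
df <- length(categories) - 1 - 1
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

# Assumption check: every expected count should be at least 5
all(expected >= 5)
c(statistic = statistic, df = df, p_value = p_value)
```

If the assumption check returns FALSE, pooling further (for example "2 or more") is one common remedy; the degrees of freedom then drop accordingly.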
#-------------------2-----------------------
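# importing the numboys dataset
numboys <- read.csv("numboys.csv")
# displaying the top six records of numboys
head(numboys)
# selecting a random 80% subset, as required by the assignment
numboys <- numboys[sample(1:nrow(numboys), round(nrow(numboys)*0.8)), , drop = FALSE]
# The number of boys in a family with 2 children is Binomial(n = 2, p = 0.5)
p <- 0.50
n <- 2
# Probability of having 2 boys
dbinom(2, size = n, prob = p)
# Probability of having 1 boy and 1 girl
dbinom(1, size = n, prob = p)
# Probability of having 2 girls (0 boys)
dbinom(0, size = n, prob = p)
# Null hypo: the number of boys follows Binomial(2, 0.5)
# Alt hypo: the number of boys does not follow Binomial(2, 0.5)
# Goodness-of-fit test of the observed counts against these probabilities
# (assumes numberOfBoys holds one value of 0, 1 or 2 per family)
observed <- table(factor(numboys$numberOfBoys, levels = 0:2))
chisq.test(observed, p = dbinom(0:2, size = n, prob = p))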
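
To see what the goodness-of-fit output for the two-child families looks like, here is a small self-contained sketch. The counts 48, 102 and 50 are invented for illustration and are not taken from numboys.csv; the expected proportions 0.25, 0.50 and 0.25 follow from the Binomial(2, 0.5) model stated in the problem.

```r
# Hypothetical counts of families with 0, 1 and 2 boys (NOT taken from numboys.csv)
observed <- c("0 boys" = 48, "1 boy" = 102, "2 boys" = 50)

# Expected proportions under Binomial(n = 2, p = 0.5): 0.25, 0.50, 0.25
probs <- dbinom(0:2, size = 2, prob = 0.5)

# Goodness-of-fit test; no parameter is estimated, so df = 3 - 1 = 2
fit <- chisq.test(observed, p = probs)
fit$expected   # assumption check: each expected count should be at least 5
fit$statistic  # chi-square test statistic
fit$parameter  # degrees of freedom
fit$p.value
```

With these invented counts the expected values are 50, 100 and 50, so the statistic is 4/50 + 4/100 + 0 = 0.12 on 2 degrees of freedom, and the large p-value gives no evidence against the binomial model at α=.05.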
#--------------------3----------------------
# importing the day of birth...