Problem 1. The file SpeedTrap.RData is an R data set that contains a data frame called SpeedTrap....

Question

Problem 1. The file SpeedTrap.RData is an R data set that contains a data frame called SpeedTrap. This data frame consists of 184 observations (rows) and 7 variables (columns). Each row corresponds to a town in the Chicago area. The variables are as follows:

In each community we want to compare the rate of ticketing outsiders who are stopped for a traffic violation to the rate of ticketing residents who are stopped. To do this we will use the odds ratio, which is defined as follows:

\[ \text{OR} = \frac{\pi_{out} / (1 - \pi_{out})}{\pi_{res} / (1 - \pi_{res})} \]

where πout and πres are the probabilities of being ticketed for outsiders and residents, respectively. The odds ratio is used to compare probabilities between two populations. It is often preferred to the straight difference πout - πres in statistical modeling. An odds ratio of 1.0 implies that the two probabilities are equal. An odds ratio greater than 1.0 implies that πout is greater than πres. The odds ratio is estimated from the counts of successes and failures in each community by replacing πout and πres with sample estimates.

(a) Begin by calculating the estimated odds ratio for each community. Append this variable to the data frame (call it OddsRatio). The first three values should match the following:

    > SpeedTrap[1:3,"OddsRatio"]
    [1] 1.146857 1.201661 1.264754

(b) Fit a regression model using OddsRatio as the outcome variable and Pop, PPSQMI, PPHU, and PCI as predictor variables. Using diagnostic plots, describe how well the regression conforms to the assumptions of the normal, linear regression model.

(c) Identify those communities for which the leverage exceeds three times the average value. Re-run the regression with these communities removed from the data set. Describe how their removal affects the fitted model.

(d) Re-run the regression in (a), replacing each of the predictors by its logarithm. How does this change affect the presence of observations with high leverage?

(e) Using log-transformed predictors, find a Box-Cox transformation of the outcome variable that maximizes the likelihood. Re-fit the model with the transformed outcome variable. Does it better conform to the assumptions of the normal, linear regression model than the model that you fit originally? In what respects are the diagnostics still troublesome?

(f) Produce and interpret a set of partial residual plots for the model that you fit in (e). Do the predictor variables appear to be treated appropriately in the model?

(g) Assuming that all necessary assumptions are met with the model that you fit in (e):
    i. Test the null hypothesis that the coefficients on log(PPHU) and log(PPSQMI) are both zero.
    ii. Give a 95% confidence interval for the coefficient on log(PCI).
    iii. Give a 95% prediction interval for the estimated odds ratio in a community that has a population of 25,000; 4000 persons per square mile; 2.8 persons per housing unit; and a per capita income of $26,000.

(h) Conduct an outlier analysis on residuals from the regression in (e). Use a family-wide Type I error probability of α = .01. Which communities should be considered for removal from the regression?

Problem 2. The file Ozone.RData contains a vector named ozone which has length n = 111. This vector was obtained from a regression of air quality measurements (ozone) taken on 111 consecutive days in New York City in 1973. Each entry of ozone is either –1 if the residual is negative or +1 if the residual is positive. We are interested in testing the null hypothesis that the residuals are not serially correlated versus the alternative hypothesis that the residuals are serially correlated. Using the Runs Test, report a p-value and state your conclusion at the .05 test level.

Problem 3. For this problem you will use the prostate data that is available in the faraway package. The outcome variable is lcavol; all other variables are predictors. We want to determine if a regression model behaves differently for younger (under age 65) subjects than for older (age 65 and over) subjects.

(a) To do this, introduce a new variable called Young to the data set as a factor that distinguishes younger from older men. Introduce it in a way that separate intercepts and slopes are applied to the two groups of men. Show a summary of your regression. Note: we will accept the validity of all regression assumptions in this exercise.

(b) Using the model in (a), conduct an F-test to see if you reject the null hypothesis that coefficients associated with Young are all equal to zero. Explain in practical terms what your results mean.

Subhanbasha · Accepted Answer

Problem 1:
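For part (a), the odds ratio can be estimated by plugging sample proportions into its definition. The following is only a minimal sketch on toy data: the data frame `toy` and its column names (OutTick, OutStop, ResTick, ResStop) are hypothetical, since the actual variable list for SpeedTrap is not shown here.

```r
# Toy stand-in for SpeedTrap. All column names and counts are hypothetical.
toy <- data.frame(
  Town    = c("A", "B", "C"),
  OutTick = c(60, 45, 80),    # outsiders stopped and ticketed
  OutStop = c(100, 90, 100),  # outsiders stopped
  ResTick = c(50, 30, 50),    # residents stopped and ticketed
  ResStop = c(100, 90, 100)   # residents stopped
)

# Sample proportions stand in for the unknown ticketing probabilities
p_out <- toy$OutTick / toy$OutStop
p_res <- toy$ResTick / toy$ResStop

# Odds ratio = (odds of a ticket for outsiders) / (odds of a ticket for residents)
toy$OddsRatio <- (p_out / (1 - p_out)) / (p_res / (1 - p_res))

print(toy[, c("Town", "OddsRatio")])
```

With the real data, the same two-step calculation would be applied to whichever columns of SpeedTrap hold the ticketed and stopped counts.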
b).
Ans: 
From the diagnostic plots, the residuals do not appear to be normally distributed.
There are some outliers in the data that may need to be removed.
The residuals show some relationship with the fitted values.
Taken together, the plots indicate that the data do not satisfy the regression assumptions.
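The kind of diagnostic check described in (b) could be produced roughly as follows. This is a sketch under stated assumptions: SpeedTrap.RData is not available here, so `sim` is simulated stand-in data that reuses the predictor names from the problem, and the skewed outcome is an arbitrary choice for illustration.

```r
set.seed(1)
n <- 184

# Simulated stand-in for SpeedTrap (an assumption, not the real data)
sim <- data.frame(
  Pop    = exp(rnorm(n, 9.5, 1.0)),
  PPSQMI = exp(rnorm(n, 8.0, 0.7)),
  PPHU   = runif(n, 2.0, 3.5),
  PCI    = exp(rnorm(n, 10.2, 0.4))
)
sim$OddsRatio <- exp(0.2 + rnorm(n, 0, 0.4))  # right-skewed outcome

fit <- lm(OddsRatio ~ Pop + PPSQMI + PPHU + PCI, data = sim)

# Standard lm diagnostics: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# A formal test of residual normality to accompany the Q-Q plot
sw <- shapiro.test(residuals(fit))
print(sw)
```

Curvature in the residuals-vs-fitted panel points to a problem with linearity, a bent Q-Q plot to non-normal residuals, and points far to the right in the last panel to high leverage.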
c).
Ans: The outlier communities are WAYNE, HILLSIDE and ITASCA. After removing these three, the model's performance improves somewhat, and the regression assumptions are somewhat better satisfied than in the model above.
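The leverage screen in (c) can be sketched as below, again on simulated stand-in data (`sim` and the `Town` labels are assumptions, not the real SpeedTrap contents). The cutoff uses the fact that the hat values of a full-rank linear model average to p/n, where p is the number of coefficients.

```r
set.seed(2)
n <- 184

# Simulated stand-in for SpeedTrap (an assumption, not the real data)
sim <- data.frame(
  Town   = paste0("Town", seq_len(n)),
  Pop    = exp(rnorm(n, 9.5, 1.0)),
  PPSQMI = exp(rnorm(n, 8.0, 0.7)),
  PPHU   = runif(n, 2.0, 3.5),
  PCI    = exp(rnorm(n, 10.2, 0.4))
)
sim$OddsRatio <- exp(0.2 + rnorm(n, 0, 0.4))

fit <- lm(OddsRatio ~ Pop + PPSQMI + PPHU + PCI, data = sim)

h      <- hatvalues(fit)   # leverage of each observation
cutoff <- 3 * mean(h)      # three times the average leverage
keep   <- h <= cutoff      # logical index, safe even if nothing is flagged

print(sim$Town[!keep])     # communities flagged as high leverage

# Refit without the flagged communities and compare coefficients
fit_reduced <- lm(OddsRatio ~ Pop + PPSQMI + PPHU + PCI, data = sim[keep, ])
print(cbind(all = coef(fit), reduced = coef(fit_reduced)))
```

A logical index is used instead of `sim[-which(...), ]` because negative indexing with an empty vector would drop every row when no observation exceeds the cutoff.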
d).
Ans:
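For part (d), one way the comparison might be set up is sketched below. It runs on simulated stand-in data (`sim` is an assumption, not the real SpeedTrap), and the helper `count_high` is a hypothetical name introduced only for this illustration.

```r
set.seed(3)
n <- 184

# Simulated stand-in for SpeedTrap (an assumption, not the real data)
sim <- data.frame(
  Pop    = exp(rnorm(n, 9.5, 1.0)),
  PPSQMI = exp(rnorm(n, 8.0, 0.7)),
  PPHU   = runif(n, 2.0, 3.5),
  PCI    = exp(rnorm(n, 10.2, 0.4))
)
sim$OddsRatio <- exp(0.2 + rnorm(n, 0, 0.4))

fit_raw <- lm(OddsRatio ~ Pop + PPSQMI + PPHU + PCI, data = sim)
fit_log <- lm(OddsRatio ~ log(Pop) + log(PPSQMI) + log(PPHU) + log(PCI),
              data = sim)

# Hypothetical helper: number of observations whose leverage
# exceeds three times the average leverage
count_high <- function(fit) {
  h <- hatvalues(fit)
  sum(h > 3 * mean(h))
}

counts <- c(raw = count_high(fit_raw), logged = count_high(fit_log))
print(counts)
```

Taking logarithms pulls in the long right tails of predictors such as population and density, so fewer observations would typically sit far from the center of the predictor space; whether that holds for SpeedTrap has to be checked on the actual data.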
