
OpenIntro


Here is OpenIntro’s RLab for Chapter 8:



http://htmlpreview.github.io/?https://github.com/andrewpbray/oiLabs-base-R/blob/master/simple_regression/simple_regression.html




Record and turn in your answers for the On Your Own section.




  • In questions that ask for a number, give the numerical answer along with what that number describes. Be specific.




  • In questions that ask for your code input, give the code and also describe what R returns.




  • In questions that ask for explanation or analysis, provide a reasonably detailed response with evidence supporting your claims.




Supplemental Questions


The idea of these supplemental questions is for you to run through the mechanics of the above lab with the data sets you chose for WH08. Repeat these steps for each data set.




To import your data set into R, save your data sets as .csv files on your desktop or documents folder. Use the file browser in the bottom right corner to locate your file and import it. If you are in RStudio Cloud, you can upload the csv file. Once it is imported you can give a short-hand name to your data set if it has a long name.
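As an alternative to the file browser, the import can be done from the R console. The following is only a sketch: the file name my_data.csv and the columns x and y are hypothetical, and a tiny made-up file is written to a temporary folder first so the example runs on its own.

```r
# Hypothetical example: write a tiny CSV to a temporary folder so the sketch
# runs anywhere. With your own data, skip this step and point to your file.
csv_path <- file.path(tempdir(), "my_data.csv")
write.csv(data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8)),
          csv_path, row.names = FALSE)

# Import the file and give the data set a short name.
data <- read.csv(csv_path, header = TRUE)

# Check that the import worked: column types and the first few rows.
str(data)
head(data)
```

With a real file, replace csv_path with the path to your own .csv, for example a file in your documents folder.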






  1. Create a scatterplot of your two variables with plot(y~x).




  2. Generate the line of best fit with model <- lm(y ~ x).




  3. Add this line to your scatterplot with abline(model).




  4. Create a new data set that measures the residuals, for example with residuals <- model$residuals.




  5. Investigate the linearity of your data set:





    1. Create a scatterplot of your residuals with plot(model$residuals ~ data$x).




    2. Create a histogram of your residuals with hist(model$residuals).




    3. Create a qqplot of your residuals with qqnorm(model$residuals).




    4. Comment on what these graphics tell you about the linearity of your data.





  6. Compute the correlation of your data with cor(data$y, data$x).




  7. How do these computations relate to your estimates from WH08?
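Steps 1 through 6 can be sketched end to end as follows. This is a minimal illustration, not the lab's own code: the data frame is simulated, the names data, x, y and res are placeholders, and the qqline() and abline(h = 0) calls are optional additions that draw reference lines.

```r
# Made-up data for illustration only; replace with your imported data set.
set.seed(1)
data <- data.frame(x = 1:30)
data$y <- 5 + 2 * data$x + rnorm(30, sd = 3)

# Steps 1-3: scatterplot, least-squares line, and the line drawn on the plot.
plot(y ~ x, data = data)
model <- lm(y ~ x, data = data)
abline(model)

# Step 4: store the residuals (same values as model$residuals).
res <- residuals(model)

# Step 5: residual diagnostics.
plot(res ~ data$x)        # residuals against the explanatory variable
abline(h = 0, lty = 3)    # dotted reference line at zero
hist(res)                 # shape of the residual distribution
qqnorm(res)               # normal Q-Q plot
qqline(res)               # reference line for the Q-Q plot

# Step 6: correlation. In simple linear regression its square equals
# the R-squared reported by summary(model).
r <- cor(data$y, data$x)
r^2
summary(model)$r.squared
```

A residual plot with no visible pattern around zero, a roughly symmetric histogram, and Q-Q points close to the reference line are the usual signs that a linear model is reasonable.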



Answered Same Day Jun 08, 2021


Sudharsan.J answered on Jun 11 2021
On Your Own:
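Question 1:
# Simple linear regression of runs on home runs (mlb11 must already be loaded, as in the lab)
fit <- lm(runs ~ homeruns, data = mlb11)
summary(fit)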
Output:

Call:
lm(formula = runs ~ homeruns, data = mlb11)

Residuals:
    Min      1Q  Median      3Q     Max
-91.615 -33.410   3.231  24.292 104.631

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 415.2389    41.6779   9.963 1.04e-10 ***
homeruns      1.8345     0.2677   6.854 1.90e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 51.29 on 28 degrees of freedom
Multiple R-squared: 0.6266,    Adjusted R-squared: 0.6132
F-statistic: 46.98 on 1 and 28 DF, p-value: 1.9e-07
Observation:
The output shows a statistically significant, positive linear relationship between runs and homeruns: each additional home run is associated with about 1.83 more runs on average.
Judged by the significance test and the adjusted R-squared (0.6132), this model fits better than the runs ~ at_bats model, whose adjusted R-squared is 0.3505 (output under Question 2).
For this model, the F-statistic is 46.98 on 1 and 28 degrees of freedom, with a p-value of 1.9e-07 (< 0.0001).
Since the p-value is less than 0.05, homeruns is a statistically significant predictor of runs. R-squared tells us how well the model represents the data: here it is 0.6266 (adjusted 0.6132), so home runs explain about 63% of the variation in runs.
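The figures quoted in the observation can be read from the fitted object instead of being copied from the printed table. The sketch below shows one way to do that. It uses a simulated data frame named toy (an assumption for illustration, not the mlb11 data), so its numbers will not match the output above.

```r
# Simulated stand-in for mlb11; the values are made up for illustration.
set.seed(42)
toy <- data.frame(homeruns = round(runif(30, 90, 230)))
toy$runs <- 415 + 1.8 * toy$homeruns + rnorm(30, sd = 50)

fit_toy <- lm(runs ~ homeruns, data = toy)
s <- summary(fit_toy)

# R-squared and adjusted R-squared are separate components.
s$r.squared
s$adj.r.squared

# Coefficient table: estimate, standard error, t value, p-value.
coef(s)
p_slope <- coef(s)["homeruns", "Pr(>|t|)"]

# p-value of the overall F-test, computed from the stored F-statistic.
f <- s$fstatistic
p_f <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# With a single predictor the two p-values coincide.
p_slope
p_f
```

Reporting both s$r.squared and s$adj.r.squared by name avoids mixing up the two values when writing the interpretation.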
Scatterplot of actual vs. predicted runs:
library(ggplot2)
mlb11$pred <- predict(fit)  # fitted values from the model above; pred must exist before plotting
ggplot(mlb11, aes(x = runs, y = pred)) + geom_point(size = 1.2, col = "blue", shape = "circle") +
  geom_smooth(method = "lm", se = TRUE, col = "red")
From the scatterplot we can conclude that actual and predicted runs have a clear positive linear relationship.
Question 2:
# Regression of runs on home runs and hits
fit <- lm(runs ~ homeruns + hits, data = mlb11)
summary(fit)
Call:
lm(formula = runs ~ homeruns + hits, data = mlb11)

Residuals:
    Min      1Q  Median      3Q     Max
-47.134 -24.852   0.975  19.706  64.234

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -228.29781   97.99600  -2.330   0.0275 *
homeruns       1.23374    0.18748   6.581 4.65e-07 ***
hits           0.52147    0.07662   6.806 2.61e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 31.7 on 27 degrees of freedom
Multiple R-squared: 0.8625,    Adjusted R-squared: 0.8523
F-statistic: 84.68 on 2 and 27 DF, p-value: 2.33e-12
Runs vs. at_bats output:
fit1 <- lm(runs ~ at_bats, data = mlb11)
summary(fit1)
Call:
lm(formula = runs ~ at_bats, data = mlb11)

Residuals:
    Min      1Q  Median      3Q     Max
-125.58  -47.05  -16.59   54.40  176.87

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2789.2429   853.6957  -3.267 0.002871 **
at_bats         0.6305     0.1545   4.080 0.000339 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 66.47 on 28 degrees of freedom
Multiple R-squared: 0.3729,    Adjusted R-squared: 0.3505
F-statistic: 16.65 on 1 and 28 DF, p-value: 0.0003388
Of the two models above, runs ~ homeruns + hits (fit) is the better one.
Based on the significance tests and the adjusted R-squared, it is the “best model”: its adjusted R-squared is 0.8523, compared with 0.3505 for runs ~ at_bats (fit1).
For the best model, the F-statistic is 84.68 on 2 and 27 degrees of freedom, with a p-value of 2.33e-12 (< 0.0001).
Since the p-value is less than 0.05, the model is statistically significant, and both homeruns and hits are significant predictors. R-squared tells us how well the model represents the data: here it is 0.8625 (adjusted 0.8523), which says the fitted model is best...
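A comparison like the one above can also be made in code by collecting the adjusted R-squared of each candidate model. This is a hedged sketch on simulated data: the data frame toy and the list models are hypothetical names, and the values will differ from the mlb11 results.

```r
# Simulated stand-in for mlb11; the values are made up for illustration.
set.seed(7)
toy <- data.frame(homeruns = round(runif(30, 90, 230)),
                  hits = round(runif(30, 1300, 1600)))
toy$runs <- -230 + 1.2 * toy$homeruns + 0.5 * toy$hits + rnorm(30, sd = 30)

# Candidate models, stored in a named list.
models <- list(
  homeruns      = lm(runs ~ homeruns, data = toy),
  homeruns_hits = lm(runs ~ homeruns + hits, data = toy)
)

# Adjusted R-squared penalizes extra predictors, so it is the fairer
# yardstick when the models have different numbers of terms.
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
adj_r2

# Name of the model with the largest adjusted R-squared.
names(which.max(adj_r2))
```

Plain R-squared never decreases when a predictor is added to a model, which is why the adjusted version is used for the ranking here.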