Question 1 Use the MLB data for this problem. Chapter 13 Let attendance be the dependent variable and total team salary be the independent variable. Determine the regression equation and answer the...




Question 1




Use the MLB data for this problem.



Chapter 13


Let attendance be the dependent variable and total team salary be the independent variable. Determine the regression equation and answer the following questions.


a. Draw a scatter diagram. From the diagram, does there seem to be a direct relationship between the two variables?


b. What is the expected attendance for a team with a salary of $100.0 million?


c. If the owners pay an additional $30 million, how many more people could they expect to attend?


d. At the .05 significance level, can we conclude that the slope of the regression line is positive? Conduct the appropriate hypothesis test.


e. What percentage of the variation in attendance is accounted for by salary?


f. Calculate the correlation between attendance and team batting average then between attendance and team ERA. Which is stronger?




Chapter 14


Let the number of games won be the dependent variable and the following variables be the independent variables: team batting average, team Earned Run Average (ERA), number of home runs, and whether the team plays in the National or American League.


a. Develop a correlation matrix. Which independent variables have strong or weak correlations with the dependent variable? Do you see any problems with multicollinearity? Are you surprised that the correlation coefficient for ERA is negative?


b. Use Excel to calculate the multiple regression equation. How did you select the variables to include in the equation? Write out the regression equation. Interpret the R-square. Is the number of wins affected by whether the team plays in the National or American League?






Question 2







Use the Lincolnville School Bus (Buena) data set for this problem.



Chapter 13


Develop a regression equation that expresses the relationship between age of the bus and maintenance cost. The age of the bus is the independent variable.


a. Draw a scatter diagram. What does this diagram suggest as to the relationship between the two variables? Is it direct or indirect? Does it appear to be strong or weak?


b. Develop a regression equation. How much does an additional year add to the maintenance cost? What is the estimated maintenance cost for a 10-year-old bus?


c. Conduct a test of hypothesis to determine whether the slope of the regression line is greater than zero. Use the .05 significance level. Interpret your findings.



Chapter 14


Add a variable to change the type of engine (diesel or gasoline) to a qualitative variable. If the engine type is diesel, then set the variable to 0. If the engine type is gasoline, then set the qualitative variable to 1. Develop a regression equation using Excel with maintenance cost as the dependent variable and age, odometer miles, miles since last maintenance, and engine type as the independent variables.


a. Develop a correlation matrix. Which independent variables have strong or weak correlations with the dependent variable? Do you see any problems with multicollinearity?


b. Use Excel to determine the multiple regression equation. How did you select the variables to include in the equation? How did you use the information in the correlation analysis? Show that your regression equation shows a significant relationship. Write out your equation and interpret the practical implications.


Answered Same Day Oct 09, 2021


Pritam answered on Oct 12 2021
MLB data Problems:
Chapter 13:
a) [Scatter diagram: attendance versus team salary]
The diagram suggests a direct, positive relationship between salary and attendance: attendance tends to increase as team salary increases.
b) The regression equation, with team salary as the predictor and attendance as the response variable, is given below along with the regression output. From this equation, one can estimate the attendance of a team with a salary of $100 million.
Attendance = 1208968.43 + 10152.69 * Team Salary
Hence the expected attendance of a team with a $100 million salary is given by
Attendance = 1208968.43 + 10152.69 * 100 = 2224238 (approx.)
c) Each additional $1 million of team salary is associated with about 10153 more people attending, so an increase of $30 million in salary is expected to raise attendance by about 304581 (30 * 10152.69).
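The arithmetic in parts b) and c) can be reproduced with a short sketch. It assumes salary is measured in millions of dollars, as the substitution of 100 above implies; the function names are illustrative only. Because the coefficients are rounded to two decimals, the prediction may differ from the figure quoted above by about one person.

```python
# Coefficients copied from the fitted equation above.
# ASSUMPTION: team salary is expressed in millions of dollars.
INTERCEPT = 1208968.43
SLOPE = 10152.69  # extra attendance per additional $1 million of salary


def predicted_attendance(salary_millions: float) -> float:
    """Expected attendance for a given team salary (part b)."""
    return INTERCEPT + SLOPE * salary_millions


def attendance_change(extra_salary_millions: float) -> float:
    """Expected change in attendance for a change in salary (part c).

    The intercept cancels out, so only the slope matters.
    """
    return SLOPE * extra_salary_millions


print(round(predicted_attendance(100.0)))  # part b
print(round(attendance_change(30.0)))      # part c
```

Note that part c) does not depend on the starting salary: in a linear model the same $30 million increase adds the same expected attendance at any salary level.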
d) A t-test on the slope can be used. The null hypothesis is that the slope is less than or equal to zero; the alternative hypothesis is that the slope is greater than zero, tested at the 5% significance level. The one-sided p-value is 4.76*10^(-5), far below 0.05, so the null hypothesis is rejected and we conclude that the slope is positive.
e) About 42.49% of the variation in attendance (the R-square of the regression) is explained by team salary.
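As a rough cross-check of part d), the t statistic for the slope in a simple regression can be recovered from the R-square in part e), since t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below assumes n = 30 observations (one per MLB team), which the answer does not state, and uses the standard table value 1.701 for a one-tailed test at the .05 level with 28 degrees of freedom.

```python
import math

R_SQUARED = 0.4249   # from part e)
N_TEAMS = 30         # ASSUMPTION: one observation per MLB team
T_CRITICAL = 1.701   # one-tailed t table value, alpha = .05, df = 28


def slope_t_statistic(r_squared: float, n: int) -> float:
    """t statistic for the slope in a simple linear regression.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2); the positive root is taken
    because the fitted slope is positive.
    """
    df = n - 2
    return math.sqrt(r_squared * df / (1.0 - r_squared))


t_stat = slope_t_statistic(R_SQUARED, N_TEAMS)
reject_null = t_stat > T_CRITICAL
print(f"t = {t_stat:.3f}, reject H0 (slope <= 0): {reject_null}")
```

Under these assumptions the statistic is about 4.55, well above the critical value, which agrees with the very small p-value reported in part d).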
f) The correlation between attendance and batting average is 0.15, and that between attendance and ERA is -0.47. The ERA correlation is the stronger of the two because its absolute value is larger (0.47 versus 0.15).
Chapter 14:
a) The correlation matrix is given below.

              Wins       ERA        BA         HR         League_dummy
Wins          1
ERA           -0.79649   1
BA            0.34089    0.044491   1
HR            0.371229   -0.13992   0.014965   1
League_dummy  -0.09851   -0.04167   -0.16239   -0.31423   1

From the correlation matrix, ERA has the strongest correlation with Wins (-0.80), while HR (0.37) and BA (0.34) are mildly correlated with Wins. League_dummy is only weakly correlated with Wins (-0.10). The negative sign for ERA is not surprising, since a lower ERA means fewer earned runs allowed. There is no serious problem of multicollinearity: the correlations among the predictors are small, the largest in magnitude being -0.31 between HR and League_dummy.
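The reading of the matrix can be checked mechanically. The sketch below ranks the predictors by the absolute value of their correlation with Wins and flags any predictor pair whose correlation exceeds a cutoff. The 0.70 cutoff is an assumed rule of thumb, not something stated in the answer.

```python
# Correlations with Wins, copied from the matrix above.
corr_with_wins = {
    "ERA": -0.79649,
    "BA": 0.34089,
    "HR": 0.371229,
    "League_dummy": -0.09851,
}

# Pairwise correlations among the predictors, also from the matrix.
predictor_pairs = {
    ("ERA", "BA"): 0.044491,
    ("ERA", "HR"): -0.13992,
    ("BA", "HR"): 0.014965,
    ("ERA", "League_dummy"): -0.04167,
    ("BA", "League_dummy"): -0.16239,
    ("HR", "League_dummy"): -0.31423,
}

THRESHOLD = 0.70  # ASSUMPTION: rule-of-thumb cutoff for multicollinearity

# Strength is judged by magnitude, so sort on the absolute value.
ranked = sorted(corr_with_wins, key=lambda name: abs(corr_with_wins[name]), reverse=True)

# Any predictor pair above the cutoff would be a multicollinearity concern.
flagged = [pair for pair, r in predictor_pairs.items() if abs(r) > THRESHOLD]

print("Predictors by strength:", ranked)
print("Pairs above cutoff:", flagged)
```

With these values ERA ranks first and no predictor pair exceeds the cutoff.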
b) The regression model was built by first checking for multicollinearity. Since no serious multicollinearity is present, the variable with the least significant (largest) p-value was removed, leaving a final model in which all remaining variables are significant. League_dummy does not appear in the final equation. The regression equation is given below.
Wins = 37.85 – 18.20 * ERA + 404.03 * BA + 0.09 * HR
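To see how the equation is used, the sketch below evaluates it for one team. The input values (ERA 4.00, batting average .260, 150 home runs) are hypothetical and chosen only for illustration; they are not taken from the MLB data set.

```python
# Coefficients copied from the regression equation above.
INTERCEPT = 37.85
COEF_ERA = -18.20   # each extra earned run per game lowers expected wins
COEF_BA = 404.03    # per 1.000 of batting average, so +0.010 BA is about +4 wins
COEF_HR = 0.09      # per additional home run


def predicted_wins(era: float, batting_avg: float, home_runs: float) -> float:
    """Expected wins from the fitted multiple regression equation."""
    return INTERCEPT + COEF_ERA * era + COEF_BA * batting_avg + COEF_HR * home_runs


# HYPOTHETICAL inputs, for illustration only.
example = predicted_wins(era=4.00, batting_avg=0.260, home_runs=150)
print(f"Predicted wins: {example:.1f}")
```

The coefficient on BA looks large only because batting average is a small decimal; a realistic change of 0.010 corresponds to about four wins.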
The...