Please make sure the directions are followed exactly. Each problem requires the following:
- Hypothesized model: state your hypothesized model as ŷ = β0 + β1x with numerical values for β0 and β1.
- A scatterplot showing your hypothesized model overlaying the original data.
- Parameter estimates and confidence intervals where requested (unless otherwise noted, use the t-distribution for tests/intervals).
- A test for validity of your model (either slope or coefficient of correlation); remember all portions of a hypothesis test:
  – Null and alternative hypotheses
  – Test statistic
  – Either a p-value or a critical value to compare
  – Correctly stated conclusion
- Predictions and estimates (where requested) for your model.
- Validation of your assumptions for regression.
The other questions asked in each problem need to be answered as well.
I am requesting a Word document that contains all of the visuals (scatterplots, etc.).
DASC 512, Homework 6

Instructions: Each problem will require you to perform a simple regression analysis. A regression analysis includes the following information/steps:
- Hypothesized model: state your hypothesized model as ŷ = β0 + β1x with numerical values for β0 and β1.
- A scatterplot showing your hypothesized model overlaying the original data.
- Parameter estimates and confidence intervals where requested (unless otherwise noted, use the t-distribution for tests/intervals).
- A test for validity of your model (either slope or coefficient of correlation); remember all portions of a hypothesis test:
  – Null and alternative hypotheses
  – Test statistic
  – Either a p-value or a critical value to compare
  – Correctly stated conclusion
- Predictions and estimates (where requested) for your model.
- Validation of your assumptions for regression.

1. In baseball, it is hypothesized that we can use the run differential to predict the number of wins a team will have by the end of the season. Use the file ‘TeamData.csv’ to test this concept. Note: The only thing different from HW5 is the graphical analysis of your assumptions.
(a) Create a column of data for Run Differential (R − RA) and a column for Win Percentage (W/(W + L)). Use these values to determine whether the Run Differential can be used to predict the percentage of wins a team will end up with, and graphically validate your regression model (residual analysis).
(b) Bill James, the godfather of sabermetrics, empirically derived a non-linear formula to estimate winning percentage called the Pythagorean Expectation:
Wpct = R² / (R² + RA²)
Create a new variable representing R² / (R² + RA²), the Pythagorean model. Now use this new column to replace the Run Differential and re-run your analysis; perform a graphical analysis on the residuals to validate this model.

2. Using the data file ‘UScrime.csv’, fit a model that can be used to predict the rate of offenses per 100,000 population in 1960 (achieve an adjusted R² of R²a ≥ 0.7).
Run a residual analysis (graphically) to determine if your model is accurate. The data set aggregates data from 47 US states in 1960 and has the following columns:

Variable  Description
M         percentage of males aged 14–24 in total state population
So        indicator variable for a southern state
Ed        mean years of schooling of the population aged 25 years or over
Po1       per capita expenditure on police protection in 1960
Po2       per capita expenditure on police protection in 1959
LF        labour force participation rate of civilian urban males in the age-group 14–24
M.F       number of males per 100 females
Pop       state population in 1960 in hundred thousands
NW        percentage of nonwhites in the population
U1        unemployment rate of urban males 14–24
U2        unemployment rate of urban males 35–39
Wealth    wealth: median value of transferable assets or family income
Ineq      income inequality: percentage of families earning below half the median income
Prob      probability of imprisonment: ratio of number of commitments to number of offenses
Time      average time in months served by offenders in state prisons before their first release
Crime     crime rate: number of offenses per 100,000 population in 1960

Helpful hint: Only one of Po1 and Po2, and only one of U1 and U2, remain in the final regression, because of high collinearity.

Once your model is complete, and noting that this does not imply a causal relationship, comment on the level of association (slope) between each variable in your model and the crime rate. What does a positive/negative slope mean?
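For orientation, the quantities both problems rely on can be sketched in plain Python. This is only a sketch of the mechanics, not a solution: the function names are made up for illustration, the two team records and the six Po1/Po2/Crime values are copied from the first rows of the data posted in this thread, and a real analysis would use all rows (most likely with a statistics package) plus the scatterplots and residual plots the assignment asks for.

```python
import math

# Hypothetical helper names; the formulas are the ones stated in the assignment.
def run_differential(runs, runs_allowed):
    """Problem 1(a) predictor: R - RA."""
    return runs - runs_allowed

def win_pct(wins, losses):
    """Problem 1 response: W / (W + L)."""
    return wins / (wins + losses)

def pythagorean_expectation(runs, runs_allowed):
    """Problem 1(b) predictor: R^2 / (R^2 + RA^2)."""
    return runs**2 / (runs**2 + runs_allowed**2)

def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x.

    Returns (b0, b1, t statistic for H0: slope = 0, residuals).
    The t statistic has n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e**2 for e in residuals) / (n - 2))  # residual std. error
    t_slope = b1 / (s / math.sqrt(sxx))
    return b0, b1, t_slope, residuals

def pearson_r(x, y):
    """Sample correlation, used here as a quick collinearity check."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# First two TeamData rows from this thread: (teamID, W, L, R, RA).
teams = [("ANA", 75, 87, 691, 730), ("ARI", 92, 70, 818, 677)]
for team, w, l, r, ra in teams:
    print(team, run_differential(r, ra), round(win_pct(w, l), 4),
          round(pythagorean_expectation(r, ra), 4))

# First six UScrime rows from this thread (mechanics only, far too few rows).
po1 = [5.8, 10.3, 4.5, 14.9, 10.9, 11.8]
po2 = [5.6, 9.5, 4.4, 14.1, 10.1, 11.5]
crime = [791, 1635, 578, 1969, 1234, 682]

print("corr(Po1, Po2):", round(pearson_r(po1, po2), 4))
b0, b1, t_slope, residuals = simple_ols(po1, crime)
print("intercept, slope, t:", round(b0, 2), round(b1, 2), round(t_slope, 2))
```

The residuals returned by `simple_ols` are what would be plotted against the fitted values (and in a normal probability plot) for the graphical check of assumptions. A correlation close to 1 between Po1 and Po2 is consistent with the hint that only one of the pair should stay in the model, and `adjusted_r2` is the quantity the 0.7 target in Problem 2 refers to.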
UScrime.csv:
"","M","So","Ed","Po1","Po2","LF","M.F","Pop","NW","U1","U2","Wealth","Ineq","Prob","Time","Crime"
"1",15.1,1,9.1,5.8,5.6,0.51,95,33,30.1,0.108,4.1,3940,26.1,0.084602,26.2011,791
"2",14.3,0,11.3,10.3,9.5,0.583,101.2,13,10.2,0.096,3.6,5570,19.4,0.029599,25.2999,1635
"3",14.2,1,8.9,4.5,4.4,0.533,96.9,18,21.9,0.094,3.3,3180,25,0.083401,24.3006,578
"4",13.6,0,12.1,14.9,14.1,0.577,99.4,157,8,0.102,3.9,6730,16.7,0.015801,29.9012,1969
"5",14.1,0,12.1,10.9,10.1,0.591,98.5,18,3,0.091,2,5780,17.4,0.041399,21.2998,1234
"6",12.1,0,11,11.8,11.5,0.547,96.4,25,4.4,0.084,2.9,6890,12.6,0.034201,20.9995,682
"7",12.7,1,11.1,8.2,7.9,0.519,98.2,4,13.9,0.097,3.8,6200,16.8,0.0421,20.6993,963
"8",13.1,1,10.9,11.5,10.9,0.542,96.9,50,17.9,0.079,3.5,4720,20.6,0.040099,24.5988,1555
"9",15.7,1,9,6.5,6.2,0.553,95.5,39,28.6,0.081,2.8,4210,23.9,0.071697,29.4001,856
"10",14,0,11.8,7.1,6.8,0.632,102.9,7,1.5,0.1,2.4,5260,17.4,0.044498,19.5994,705
"11",12.4,0,10.5,12.1,11.6,0.58,96.6,101,10.6,0.077,3.5,6570,17,0.016201,41.6,1674
"12",13.4,0,10.8,7.5,7.1,0.595,97.2,47,5.9,0.083,3.1,5800,17.2,0.031201,34.2984,849
"13",12.8,0,11.3,6.7,6,0.624,97.2,28,1,0.077,2.5,5070,20.6,0.045302,36.2993,511
"14",13.5,0,11.7,6.2,6.1,0.595,98.6,22,4.6,0.077,2.7,5290,19,0.0532,21.501,664
"15",15.2,1,8.7,5.7,5.3,0.53,98.6,30,7.2,0.092,4.3,4050,26.4,0.0691,22.7008,798
"16",14.2,1,8.8,8.1,7.7,0.497,95.6,33,32.1,0.116,4.7,4270,24.7,0.052099,26.0991,946
"17",14.3,0,11,6.6,6.3,0.537,97.7,10,0.6,0.114,3.5,4870,16.6,0.076299,19.1002,539
"18",13.5,1,10.4,12.3,11.5,0.537,97.8,31,17,0.089,3.4,6310,16.5,0.119804,18.1996,929
"19",13,0,11.6,12.8,12.8,0.536,93.4,51,2.4,0.078,3.4,6270,13.5,0.019099,24.9008,750
"20",12.5,0,10.8,11.3,10.5,0.567,98.5,78,9.4,0.13,5.8,6260,16.6,0.034801,26.401,1225
"21",12.6,0,10.8,7.4,6.7,0.602,98.4,34,1.2,0.102,3.3,5570,19.5,0.0228,37.5998,742
"22",15.7,1,8.9,4.7,4.4,0.512,96.2,22,42.3,0.097,3.4,2880,27.6,0.089502,37.0994,439
"23",13.2,0,9.6,8.7,8.3,0.564,95.3,43,9.2,0.083,3.2,5130,22.7,0.0307,25.1989,1216
"24",13.1,0,11.6,7.8,7.3,0.574,103.8,7,3.6,0.142,4.2,5400,17.6,0.041598,17.6,968
"25",13,0,11.6,6.3,5.7,0.641,98.4,14,2.6,0.07,2.1,4860,19.6,0.069197,21.9003,523
"26",13.1,0,12.1,16,14.3,0.631,107.1,3,7.7,0.102,4.1,6740,15.2,0.041698,22.1005,1993
"27",13.5,0,10.9,6.9,7.1,0.54,96.5,6,0.4,0.08,2.2,5640,13.9,0.036099,28.4999,342
"28",15.2,0,11.2,8.2,7.6,0.571,101.8,10,7.9,0.103,2.8,5370,21.5,0.038201,25.8006,1216
"29",11.9,0,10.7,16.6,15.7,0.521,93.8,168,8.9,0.092,3.6,6370,15.4,0.0234,36.7009,1043
"30",16.6,1,8.9,5.8,5.4,0.521,97.3,46,25.4,0.072,2.6,3960,23.7,0.075298,28.3011,696
"31",14,0,9.3,5.5,5.4,0.535,104.5,6,2,0.135,4,4530,20,0.041999,21.7998,373
"32",12.5,0,10.9,9,8.1,0.586,96.4,97,8.2,0.105,4.3,6170,16.3,0.042698,30.9014,754
"33",14.7,1,10.4,6.3,6.4,0.56,97.2,23,9.5,0.076,2.4,4620,23.3,0.049499,25.5005,1072
"34",12.6,0,11.8,9.7,9.7,0.542,99,18,2.1,0.102,3.5,5890,16.6,0.040799,21.6997,923
"35",12.3,0,10.2,9.7,8.7,0.526,94.8,113,7.6,0.124,5,5720,15.8,0.0207,37.4011,653
"36",15,0,10,10.9,9.8,0.531,96.4,9,2.4,0.087,3.8,5590,15.3,0.0069,44.0004,1272
"37",17.7,1,8.7,5.8,5.6,0.638,97.4,24,34.9,0.076,2.8,3820,25.4,0.045198,31.6995,831
"38",13.3,0,10.4,5.1,4.7,0.599,102.4,7,4,0.099,2.7,4250,22.5,0.053998,16.6999,566
"39",14.9,1,8.8,6.1,5.4,0.515,95.3,36,16.5,0.086,3.5,3950,25.1,0.047099,27.3004,826
"40",14.5,1,10.4,8.2,7.4,0.56,98.1,96,12.6,0.088,3.1,4880,22.8,0.038801,29.3004,1151
"41",14.8,0,12.2,7.2,6.6,0.601,99.8,9,1.9,0.084,2,5900,14.4,0.0251,30.0001,880
"42",14.1,0,10.9,5.6,5.4,0.523,96.8,4,0.2,0.107,3.7,4890,17,0.088904,12.1996,542
"43",16.2,1,9.9,7.5,7,0.522,99.6,40,20.8,0.073,2.7,4960,22.4,0.054902,31.9989,823
"44",13.6,0,12.1,9.5,9.6,0.574,101.2,29,3.6,0.111,3.7,6220,16.2,0.0281,30.0001,1030
"45",13.9,1,8.8,4.6,4.1,0.48,96.8,19,4.9,0.135,5.3,4570,24.9,0.056202,32.5996,455
"46",12.6,0,10.4,10.6,9.7,0.599,98.9,40,2.4,0.078,2.5,5930,17.1,0.046598,16.6999,508
"47",13,0,12.1,9,9.1,0.623,104.9,3,2.2,0.113,4,5880,16,0.052802,16.0997,849

TeamData.csv:
"","teamID","yearID","lgID","G","W","L","R","RA"
"1","ANA",2001,"AL",162,75,87,691,730
"2","ARI",2001,"NL",162,92,70,818,677
"3"