This is a regression project, files are attached.


Data

Variables:
OBR   Percentage of the population that is obese
POP   Population in millions
ED    Education: percentage of the population who have obtained a bachelor's degree or higher
IN    Median household income, in thousands of dollars
AGE   Percentage of the population under the age of 18
SP    Spending on obesity, in thousands of dollars

State  OBR   POP   ED    IN    AGE   SP
AL     33    4.8   22.6  43.2  23    2505
AK     25.7  0.7   27.5  70.7  25.6  459
AZ     26    6.7   26.9  49.7  24.4  2113
AR     34.5  2.9   20.1  40.7  24    1254
CA     25    38.8  30.7  61    23.9  15223
CO     20.5  5.3   37    58.4  23.5  1637
CT     25.6  3.5   36.5  69.4  21.8  1719
DE     26.9  0.9   28.9  59.8  22    592
FL     25.2  19.8  26.4  46.9  20.6  8079
GA     29.1  10    28    49.1  24.9  4226
HI     23.6  1.4   30.1  67.4  21.9  470
ID     26.8  1.6   25.1  46.7  26.5  550
IL     28.1  12.8  31.4  56.7  23.5  6368
IN     31.4  6.5   23.2  48.2  24.1  3520
IA     30.4  3.1   25.7  51.8  23.4  1435
KS     29.9  2.9   30.3  51.3  25    1327
KY     31.3  4.3   21.5  43    23.1  2372
LA     34.7  4.6   21.8  44.8  24.1  2383
ME     28.4  1.3   27.9  48.4  19.7  767
MD     27.6  5.9   36.8  73.5  22.7  3032
MA     22.9  6.7   39.4  66.8  20.8  3511
MI     31.1  9.9   25.9  48.4  22.7  5349
MN     25.7  5.4   32.6  59.8  23.6  2800
MS     34.6  2.9   20.1  39    24.7  1586
MO     29.6  6     26.2  47.3  23.1  3196
MT     24.3  1     28.7  46.2  22.1  379
NE     28.6  1.8   28.5  51.6  24.9  1002
NV     26.2  2.8   22.4  52.8  23.7  1048
NH     27.3  1.3   33.7  64.9  20.5  594
NJ     24.6  8.9   35.8  71.6  22.7  4447
NM     27.1  2     25.8  44.9  24.3  663
NY     23.6  19.7  33.2  58    21.6  11114
NC     29.6  9.9   27.3  46.3  23.2  4599
ND     29.7  0.7   27.2  53.7  22.5  371
OH     30.1  11.5  25.2  48.3  22.9  6896
OK     32.2  3.8   23.5  45.3  24.6  1721
OR     27.3  3.9   29.7  50.2  21.8  1678
PA     29.1  12.7  27.5  52.2  21.3  6997
RI     25.7  1     31.3  56.3  20.4  566
SC     31.6  4.8   25.1  44.7  22.6  2291
SD     28.1  0.8   26.2  49.4  24.6  409
TN     31.1  6.5   23.8  44.2  23    3656
TX     29.2  26.9  26.7  51.9  26.6  10262
UT     24.3  2.9   30.3  58.8  30.9  953
VT     23.7  0.6   34.8  54.2  19.6  291
VA     27.4  8.3   35.2  63.9  22.6  3387
WA     26.8  7     25.7  59.4  22.9  2977
WV     33.8  1.8   18.3  41    20.6  1171
WI     29.7  5.7   26.8  52.4  22.8  3078
WY     24.6  0.5   24.7  57.4  23.6  203

Sources:
http://www.ncsl.org/research/health/obesity-statistics-in-the-united-states.aspx
http://quickfacts.census.gov/qfd/index.html

Stat 2070 – Final Regression Project (5% of your grade) – Spring 2019
DUE: By the 1st of May by 10pm (53 pts)
Name:_________________

Expectations for this project: I expect it to be written up and to look nice, as if you were presenting it to a group of advisers. This final regression project incorporates all parts of this class into one project that would be similar to a real-world application of statistics in a future job or research position. Think of this as a great opportunity to build an example you can use in an interview for a job or internship, where you have to demonstrate skills you've learned in a semester that have a real application to the world. You will need to make it as an electronic document and submit it to me via email as a PDF or a Word document. The due date is May 1st; however, you can submit it any time before the due date. The sooner the better.

The Scenario: A select group of parents have decided that obesity is caused by the consumption of Ice Cream and nothing else. They have now started campaigning for the U.S. Government to stop the production and sale of Ice Cream at the national level. This is all in the hopes of preventing childhood obesity and what they believe to be the main contributor to adulthood obesity. Some states have resisted. You think this is a little extreme and have decided to do some digging into what the facts are and what the ramifications might be before you jump to any conclusions. As you know, correlation doesn't imply causation, and data speaks volumes.

On a side tangent, you find the following financial statistics: in order for the U.S. Government to combat Ice Cream sales, it needs to spend $51 Billion to pay for the war on Ice Cream, but by enacting this law against Ice Cream the federal government also loses out on $46.7 Billion in annual Ice Cream tax revenue. The economist inside of you says this is more like a $97.7 Billion financial burden on U.S. taxpayers, which you guess is less than the TARP (Troubled Asset Relief Program) bailout, which authorized $700 Billion for purchasing illiquid mortgage-backed securities (although only $426 Billion was used, that is still quite a tidy sum of money). You still think that this seems like a huge financial fumble for the U.S. Government, to enact such a law with no real justification, so you are inspired to gather some data about the real results of Obesity and Children, and other factors. So you regress back to the problem you want to address and set out to provide a detailed report to the advisers/policy makers who will decide the fate of Ice Cream.

In the process of gathering data you find the following variables:
· OBR (Obesity rate as a percentage). Hint: you may need to make this a proportion if using a proportion test, along with any other percentage!
· POP (Population in millions)
· EDU (Education, percentage of population with a bachelor's degree or higher)
· MHI (Median Household Income in thousands of dollars)
· AGE (Percentage of population under the age of 18)
· SP (Spending on obesity in thousands)

1) (8pts) Do the variables a) OBR, b) POP, c) EDU, and d) MHI look normally distributed? (Use histograms and normal quantile plots to support your claim.) Remember the purpose of this document: you are presenting to a group of advisers.

2) (5pts) Find the summary statistics for the variables EDU, MHI and OBR (summary statistics are mean, max, min, median, and standard deviation). Present them in a table. Don't just give the numbers; elaborate on them. You're presenting to a group of lawmakers who know nothing... enlighten them.

3) (6pts) In the attempt to find answers to obesity, you stumble across articles that say the educated and wealthy tend to be fit and healthy. This prompts you to be interested in the relationships between:
a. Obesity and Education (OBR & EDU),
b. Obesity and Income (OBR & MHI).
Find the following for each of the above:
i. The linear correlation coefficient, and elaborate on it. What does the correlation coefficient mean?
ii. Plot a scatterplot for each of the two relationships. Does it agree with your linear correlation coefficient above? (Hint: if it doesn't, maybe you need to redo it.)
c. Do you think MHI and EDU are positively or negatively correlated? How can you find out? What is the correlation r value between them? (Show it, do it! Remember you're presenting this, not just blowing by the answer and saying "no" or "yes".)

4) (4pts) Suppose the same group who wants to outlaw Ice Cream read an article that says the law, if enacted, can lower the obesity rate to below 17%. Identify the null and alternative hypotheses and test the claim using (hint: you need a critical value, a test statistic, and some of the summary statistics you got from above; hint: probably find the average for the U.S., this is your sample proportion). By testing this claim you are seeing whether this is statistically different from the current obesity rate, and whether the law should be enacted. In other words, if it's not significant, do you think we should enact it?

5) (2pts) Based on the following confidence interval you saw while reading an article about the obesity rate in the United States, (0.1164, 0.4436), would you reject or fail to reject the null hypothesis from question #4 above? Why?

6) (10pts) Now suppose you wanted to find the best single variable regression equation (). Suppose you were to run a single variable regression on the following two models (remember which is your independent variable and which is your dependent variable):
a. Obesity and Education (OBR & EDU),
b. Obesity and Income (OBR & MHI).
For each of the models:
a) Report the regression equation (can be simply the equation, with what each value represents).
b) Interpret the coefficient estimate and the model (now interpret: what does the coefficient estimate for each mean?).
c) Indicate if the variable is significant.
d) Explain if the model is a good fit (how do you know if a model is a good fit?).
e) Which model is better for predicting obesity? What do you use from the above to show which is better?

7) (3pts) Now suppose you are comparing the U.S. obesity rate to that of Canada. You want to show that we don't have to ban Ice Cream to have a lower obesity rate. The U.S. statistics are , and Canada . Test the claim that the U.S. and Canada have the same proportion of the population that has obesity. (Remember what your data means: is it a percentage or a proportion, has it already been converted over, what kind of test is this?)

8) (10pts) Run a multiple regression for the following three models (OBR is your dependent variable):
a) OBR, EDU, MHI,
b) OBR, EDU, MHI, POP, AGE,
c) OBR, EDU, MHI, AGE, SP.
Together with the two single variable regression models from #6 above, decide which is the best of the five models. Explain. (Use the relevant statistics from each model and explain. This does not need to be as detailed as the above, but you do need to discuss which models are good and why, and which is best.)

9) (6pts) Suppose you are rethinking your data and want to come up with some more variables to try to generate more significant results. Explain your thinking/logic behind each:
a. Why could Median Household Income be replaced with Educational attainment? How could you check?
b. Weight is not highly correlated with Body Mass Index (BMI), and therefore would be good to use.
c. True or False: A better model would incorporate regions of the U.S., such as the South, Northeast, etc. (dummy variables).

Extra credit (2pts): It is not a coincidence that I have capitalized Ice Cream throughout the opening scenario. What am I referring to?
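
Questions 3 and 6 rest on the same quantities: the linear correlation coefficient r and the least-squares line for one predictor. As a rough illustration of the mechanics only, the sketch below computes both for OBR against EDU using six (EDU, OBR) pairs copied from the data above (AL, CO, CT, MA, MS, WV). The six-state subset, the variable names, and the helper `simple_regression` are assumptions made to keep the example short; the project itself calls for all 50 states and for whatever statistical software the course uses.

```python
from math import sqrt

# Six (EDU, OBR) pairs copied from the data: AL, CO, CT, MA, MS, WV.
# This subset is an assumption for illustration; use all 50 states in the project.
edu = [22.6, 37.0, 36.5, 39.4, 20.1, 18.3]  # % with a bachelor's degree or higher
obr = [33.0, 20.5, 25.6, 22.9, 34.6, 33.8]  # % of population that is obese


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def simple_regression(x, y):
    """Return (intercept, slope, r) for the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # spread of x around its mean
    syy = sum((yi - y_bar) ** 2 for yi in y)  # spread of y around its mean
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)  # Pearson linear correlation coefficient
    return intercept, slope, r


intercept, slope, r = simple_regression(edu, obr)
print(f"OBR = {intercept:.2f} + ({slope:.3f}) * EDU")
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

In a single-variable model, R^2 is simply r squared, which is one of the statistics that can be compared across the two models in question 6. Significance of the slope and the multiple regressions in question 8 need standard errors and are better left to a statistics package.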
Apr 08, 2021