Milestone 2: Exploratory Data Analysis and Regression

Hello, instructions are in the folder. Thank you.


Overview and Objectives

The primary objective of this project is to provide an in-depth analysis of data through the lens of regression analysis. Modern research looks at the interaction between a multitude of variables to provide conclusions based on experimental data. You will have the opportunity to explore a few realistic regression models and provide conclusions on them based on your own observations. You will also observe a few measures of central tendency of the data. You will gain an understanding of the distribution of data by identifying the standard deviation and quartiles of your data set. You will then use these measures to identify outliers in your data and determine whether they should be omitted from your regression. You'll also make inferences based on your regression data and determine which variables are significant in your model. This will help you identify the independent variables that adequately explain the variance in your model. The techniques you apply in this project will hopefully form the foundation of future research and exploration.

Part One: Brain Size

Excel Tasks: In the tab labelled brain size, please compute the following on your Excel sheet using the Head Size column.
· Five Number Summary
· Interquartile Range and Lower/Upper Limits for Outlier
· Identify whether any of the values are outliers

Next, follow these steps to create your scatterplot:
· Create a scatterplot that shows the relationship between head size and brain weight. Be sure to include your equation/r-squared value in your scatterplot.
· Use the Excel Regression tool to create a Residual vs. Fitted plot for your data and copy your plot on the original tab. *Don't forget to adjust your x-axis.

Analysis of Results: please respond in 2-4 complete sentences to each question.
1. What kind of relationship exists between head size and brain weight?
2. Are the outlier(s) of the data set reasonable? Should you omit them?
3. Do you think the other variables would be significant in predicting brain weight along with head size? Why?

Part Two: Infection Risk in Hospitals

Excel Tasks: Using the Infection Risk dataset, determine the following measures of central tendency and spread for the Age column.
· Mean
· Median
· Mode
· Standard deviation

Once you've found these measures, find a regression model that predicts the infection risk of patients using the other data recorded:
· Determine which columns might influence the chance that a patient is infected while they are in the hospital.
· Run a multiple regression that creates a model that predicts the infection risk of a patient using the columns you indicated.

Analysis of Results: please respond in 2-4 complete sentences to each question.
1. What is the typical age of a participant of this study? What is the range of patient ages that are within three standard deviations of the mean?
2. What variables will you use from the data to predict infection risk?
3. Is your regression model a good predictor of infection risk? Are all of the variables you selected statistically significant?

Part Three: Using Medical Expenses to Project Insurance Rates

Excel Tasks: Using the Medical Expenses dataset, please compute the following in Excel.
· Calculate mean, standard deviation, and the z-score for each individual value.
· Use this data to determine whether the values are outliers.
· Use the Countif function to find how many outliers there are in your data.

Next, create a regression model that will predict medical expenses by the other variables listed in each column.

Analysis of Results:
1. Are all the variables statistically significant in predicting the medical expenses of a patient?
2. What equation would you use to predict the medical expenses of a patient who is not part of this sample?
3.
Use the equation from (2) to predict the medical expenses of someone who is 34, female, 32 BMI, 2 children, and a smoker.
4. Do you think this model would accurately predict the medical costs of a patient given the following information? If not, what additional predictors could the model include to improve the prediction? Could the data be adjusted to be more meaningful?

Brain Size

Worksheet cells to fill in:
Five Number Summary for Head Size
Minimum:
1st quartile:
Median:
3rd quartile:
Maximum:
Outlier Parameters
IQR:
Lower Limit
Upper Limit

Data (the worksheet also has an empty "Outlier?" column to fill in):
Brain Weight  Gender  Head Size  Age Range
1530          1       4512       1
1297          1       3738       1
1335          1       4261       1
1282          1       3777       1
1590          1       4177       1
1300          1       3585       1
1400          1       3785       1
1255          1       3559       1
1355          1       3613       1
1375          1       3982       1
1340          1       3443       1
1380          1       3993       1
1355          1       3640       1
1522          1       4208       1
1208          1       3832       1
1405          1       3876       1
1358          1       3497       1
1292          1       3466       1
1340          1       3095       1
1400          1       4424       1
1357          1       3878       1
1287          1       4046       1
1275          1       3804       1
1270          1       3710       1
1635          1       4747       1
1505          1       4423       1
1490          1       4036       1
1485          1       4022       1
1310          1       3454       1
1420          1       4175       1
1318          1       3787       1
1432          1       3796       1
1364          1       4103       1
1405          1       4161       1
1432          1       4158       1
1207          1       3814       1
1375          1       3527       1
1350          1       3748       1
1236          1       3334       1
1250          1       3492       1
1350          1       3962       1
1320          1       3505       1
1525          1       4315       1
1570          1       3804       1
1340          1       3863       1
1422          1       4034       1
1506          1       4308       1
1215          1       3165       1
1311          1       3641       1
1300          1       3644       1
1224          1       3891       1
1350          1       3793       1
1335          1       4270       1
1390          1       4063       1
1400          1       4012       1
1225          1       3458       1
1310          1       3890       1
1560          1       4166       2
1330          1       3935       2
1222          1       3669       2
1415          1       3866       2
1175          1       3393       2
1330          1       4442       2
1485          1       4253       2
1470          1       3727       2
1135          1       3329       2
1310          1       3415       2
1154          1       3372       2
1510          1       4430       2
1415          1       4381       2
1468          1       4008       2
1390          1       3858       2
1380          1       4121       2
1432          1       4057       2
1240          1       3824       2
1195          1       3394       2
1225          1       3558       2
1188          1       3362       2
1252          1       3930       2
1315          1       3835       2
1245          1       3830       2
1430          1       3856       2
1279          1       3249       2
1245          1       3577       2
1309          1       3933       2
1412          1       3850       2
1120          1       3309       2
1220          1       3406       2
1280          1       3506       2
1440          1       3907       2
1370          1       4160       2
1192          1       3318       2
1230          1       3662       2
1346          1       3899       2
1290          1       3700       2
1165          1       3779       2
1240          1       3473       2
1132          1       3490       2
1242          1       3654       2
1270          1       3478       2
1218          1       3495       2
1430          1       3834       2
1588          1       3876       2
1320          1       3661       2
1290          1       3618       2
1260          1       3648       2
1425          1       4032       2
1226          1       3399       2
1360          1       3916       2
1620          1       4430       2
1310          1       3695       2
1250          1       3524       2
1295          1       3571       2
1290          1       3594       2
1290          1       3383       2
1275          1       3499       2
1250          1       3589       2
1270          1       3900       2
1362          1       4114       2
1300          1       3937       2
1173          1       3399       2
1256          1       4200       2
1440          1       4488       2
1180          1       3614       2
1306          1       4051       2
1350          1       3782       2
1125          1       3391       2
1165          1       3124       2
1312          1       4053       2
1300          1       3582       2
1270          1       3666       2
1335          1       3532       2
1450          1       4046       2
1310          1       3667       2
1027          2       2857       1
1235          2       3436       1
1260          2       3791       1
1165          2       3302       1
1080          2       3104       1
1127          2       3171       1
1270          2       3572       1
1252          2       3530       1
1200          2       3175       1
1290          2       3438       1
1334          2       3903       1
1380          2       3899       1
1140          2       3401       1
1243          2       3267       1
1340          2       3451       1
1168          2       3090       1
1322          2       3413       1
1249          2       3323       1
1321          2       3680       1
1192          2       3439       1
1373          2       3853       1
1170          2       3156       1
1265          2       3279       1
1235          2       3707       1
1302          2       4006       1
1241          2       3269       1
1078          2       3071       1
1520          2       3779       1
1460          2       3548       1
1075          2       3292       1
1280          2       3497       1
1180          2       3082       1
1250          2       3248       1
1190          2       3358       1
1374          2       3803       1
1306          2       3566       1
1202          2       3145       1
1240          2       3503       1
1316          2       3571       1
1280          2       3724       1
1350          2       3615       1
1180          2       3203       1
1210          2       3609       1
1127          2       3561       1
1324          2       3979       1
1210          2       3533       1
1290          2       3689       1
1100          2       3158       1
1280          2       4005       1
1175          2       3181       1
1160          2       3479       1
1205          2       3642       1
1163          2       3632       1
1022          2       3069       2
1243          2       3394       2
1350          2       3703       2
1237          2       3165       2
1204          2       3354       2
1090          2       3000       2
1355          2       3687       2
1250          2       3556       2
1076          2       2773       2
1120          2       3058       2
1220          2       3344       2
1240          2       3493       2
1220          2       3297       2
1095          2       3360       2
1235          2       3228       2
1105          2       3277       2
1405          2       3851       2
1150          2       3067       2
1305          2       3692       2
1220          2       3402       2
1296          2       3995       2
1175          2       3318       2
955           2       2720       2
1070          2       2937       2
1320          2       3580       2
1060          2       2939       2
1130          2       2989       2
1250          2       3586       2
1225          2       3156       2
1180          2       3246       2
1178          2       3170       2
1142          2       3268       2
1130          2       3389       2
1185          2       3381       2
1012          2       2864       2
1280          2       3740       2
1103          2       3479       2
1408          2       3647       2
1300          2       3716       2
1246          2       3284       2
1380          2       4204       2
1350          2       3735       2
1060          2       3218       2
1350          2       3685       2
1220          2       3704       2
1110          2       3214       2
1215          2       3394       2
1104          2       3233       2
1170          2       3352       2
1120          2       3391       2

Source: R.J. Gladstone (1905). "A Study of the Relations of the Brain to the Size of the Head", Biometrika, Vol. 4, pp. 105-123
Description: Brain weight (grams) and head size (cubic cm) for 237 adults classified by gender and age group.
Variables/Columns Gender 8 /* 1=Male, 2=Female */ Age Range 16 /* 1=20-46, 2=46+ */ Head size (cm^3) 21-24 Brain weight (grams) 29-32 Infection Risk IDStayAgeInfctRskCultureXrayBedsCensusMedSchoolRegionNursesFacilities 17.1355.74.1939.62792072424160 28.8258.21.63.851.78051225240 38.3456.92.78.17410782235420Calculate for Age Column 48.9553.75.618.9122.8147532414840mean: 511.256.55.734.588.91801342115140median: 69.7650.95.121.9971501472210640mode: 79.6857.84.616.7791861512312940standard deviation: 811.1845.75.460.585.86403991236060 98.6748.24.324.490.81821302311840 108.8456.36.329.682.68559216640 1111.0753.24.928.51227685911165680 128.357.24.36.883.8167105235940 1312.7856.87.746116.93222521134957.1 147.5856.73.720.8889759227937.1 15956.34.214.676.47261233817.1 1611.0850.25.518.663.63873262340557.1 178.2848.14.526101.810884247337.1 1811.6253.96.425.599.21331132110137.1 199.0652.84.26.975.91341032212537.1 209.3553.84.115.980.98335472351977.1 217.53424.223.198.99547244917.1 2210.24494.836.3112.61951632217037.1 239.7852.3517.695.92702401119857.1 249.8462.24.81282.36004682349757.1 259.252.2417.571.12982441423657.1 268.2849.53.912113.15464131243657.1 279.3147.24.530.2101.31701242117337.1 288.1952.13.210.859.2176156218837.1 2911.6554.54.418.696.12482172118937.1 309.8950.54.917.7103.61671132210637.1 3111.0349.9519.7102.13182702133557.1 329.84535.217.772.62102002223954.3 3311.7754.15.317.3561961642116534.3 3413.59546.124.2111.73122582116954.3 359.7454.46.311.476.12211702217254.3 3610.3355.8521.2104.32661812114954.3 379.9758.22.816.576.59069224234.3 387.8449.14.67.187.96050234534.3 3910.4753.24.15.769.11961682215354.3 408.1660.91.31.9587349232114.3 418.4851.13.712.192.81661452311834.3 4210.7253.84.723.294.1113902310734.3 4311.2453778.913095235634.3 4410.1251.75.614.979.13623131326454.3 458.3750.75.515.184.811596228834.3 4610.1654.24.68.451.58315811462974.3 4719.5659.96.517.2113.73062732117251.4 4810.957.25.510.671.95934462221151.4 497.6751.71.82.540.410693233511.4 
508.8851.54.210.186.93052382319751.4 5111.4857.65.620.3822522072125151.4 529.2351.64.311.642.66204132242071.4 5311.4161.17.616.697.95353302327351.4 5412.0743.77.852.4105.3157115227631.4 558.63543.18.456.27639214431.4 5611.1556.53.97.773.92812172119951.4 577.14593.72.675.87037243531.4 587.6547.14.316.465.73182652431451.4 5910.7350.63.919.31014453741234551.4 6011.4656.94.515.697.71911532313231.4 6110.42583.485911967216431.4 6211.18515.718.855.95955461239268.6 637.9364.15.47.598.16842244928.6 649.6652.14.49.998.38366229528.6 657.7845.5520.971.64893912332948.6 669.4250.64.324.862.85084212152848.6 6710.0249.54.48.3932651912220248.6 688.58553.77.495.93042482321848.6 699.6152.44.56.987.24874042322048.6 708.0354.23.524.387.39765215528.6 717.39514.214.688.47238226728.6 727.0852212.356.48752235728.6 739.5351.55.21565.72982412319348.6 7410.05524.536.787.51841441115168.6 758.4538.83.412.9852351432212448.6 766.748.64.51380.87651247928.6 778.949.72.912.786.95237213528.6 7810.2353.24.99.977.97525951244668.6 798.8855.84.414.176.82371652218248.6 8010.359.65.127.888.9175113227345.7 8110.7944.22.92.656.64613201219665.7 827.9449.53.56.292.31951392211645.7 837.6352.15.511.661.11971092411045.7 848.7754.54.75.24714385248725.7 858.0956.91.77.656.99261236145.7 869.0551.24.120.579.81951272311245.7 877.9152.82.911.979.54773492318865.7 8810.3954.64.31488.33532232220065.7 899.3654.14.818.390.61651272115845.7 9011.4150.45.823.8734243591333545.7 918.8651.32.99.587.510065235325.7 928.935626.272.59559235625.7 938.9253.91.32.279.5564022145.7 948.1554.95.312.379.89955247125.7 959.7750.25.315.789.71541232214825.7 968.5456.12.52782.59857217545.7 978.6652.83.86.869.52461782317745.7 9812.0152.84.810.896.92982372111545.7 997.9551.82.34.654.9163128239342.9 10010.1551.96.216.459.25684521337162.9 1019.7653.22.66.980.16447245522.9 1029.8945.24.311.8108.71901412111242.9 1037.1457.62.713.192.69240245022.9 10413.9565.96.615.6133.53563082118262.9 1059.4452.54.510.958.52972302326342.9 
10610.863.92.91.657.413069236222.9 1077.1451.71.44.145.711590231922.9 1088.02552.13.846.59144223222.9 10911.853.85.79.1116.95714411246962.9 1109.549.35.84270.99868234622.9 1117.756.94.412.267.9129852413662.9 11217.9456.25.926.491.88357911140762.9 1139.4159.53.120.691.72920232222.9 Attribution: Data source: Applied Regression Models, (4th edition), Kutner, Neter, and Nachtsheim Medical Expenses agesexbmichildrensmokerExpensesz-scoreoutlier? 19027.90116884.92Calculate for Expenses Column 18133.8101725.55mean: 28133304449.46standard deviation: 33122.70021984.47# of outliers: 32128.9003866.86 31025.7003756.62 46033.4108240.59 37027.7307281.51 37129.8206406.41 60025.80028923.14 25126.2002721.32 62026.30127808.73 23134.4001826.84 56039.80011090.72 27142.10139611.76 19124.6101837.24 52030.81010797.34 23123.8002395.17 56140.30010602.39 30135.30136837.47 600360013228.85 30032.4104149.74 18134.1001137.01 34031.91137701.88 37128206203.9 59027.73014001.13 63023.10014451.84 55032.82012268.63 23117.4102775.19 31136.32138711 22135.60135585.58 18026.3002198.19 19028.6504687.8 63128.30013770.1 28136.41151194.56 19120.4001625.43 620333015612.19 26120.8002302.3 35136.71139774.28 60139.90148173.36 24026.6003046.06 31036.6204949.76 41121.8106272.48 37030.8206313.76 38137.1106079.67 55137.30020630.28 18038.7203393.36 28034.8003556.92 60024.50012629.9 36135.21138709.18 18035.6002211.13 21033.6203579.83 481281123568.27 36134.40137742.58 40028.7308059.68 581372147496.49 58031.82013607.37 18131.72134303.17 53022
Answered 2 days after Oct 16, 2022


Subhanbasha answered on Oct 19 2022
47 Votes
Answers
Part 1
1.
Ans: There is a positive relationship between head size and brain weight: as head size increases, brain weight tends to increase as well.
2.
Ans: Yes, the outliers are reasonable, and there is no need to remove them. In the scatterplot, only one or two data points sit somewhat apart from the rest, and not by a considerable margin.
3.
Ans: Yes. Based on the regression output, the other variables also affect brain weight; the p-values of the corresponding coefficients are significant.
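The Part 1 Excel tasks (five number summary, IQR limits, and the scatterplot trendline) can also be checked outside Excel. The following is only a minimal Python sketch, not the method used in the answer: the function names are made up for illustration, the quartiles use the "inclusive" method (assumed here to correspond to Excel's QUARTILE.INC), and the outlier limits assume the usual 1.5 * IQR rule, which the assignment does not spell out. The sample values are the first ten rows of the brain size data, so the printed numbers will not match the full 237-row worksheet.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def iqr_outliers(values):
    """Return (lower_limit, upper_limit, outliers) under the 1.5 * IQR rule."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


def simple_regression(x, y):
    """Least-squares fit y = slope * x + intercept, plus r-squared."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Illustrative subset only: first ten rows of the brain size table.
head_size = [4512, 3738, 4261, 3777, 4177, 3585, 3785, 3559, 3613, 3982]
brain_weight = [1530, 1297, 1335, 1282, 1590, 1300, 1400, 1255, 1355, 1375]

print("Five number summary:", five_number_summary(head_size))
print("Limits and outliers:", iqr_outliers(head_size))
print("Slope, intercept, r-squared:", simple_regression(head_size, brain_weight))
```

A positive slope from `simple_regression` is what "positive relationship" means in answer 1, and any head size outside the two limits from `iqr_outliers` is a candidate outlier for answer 2.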
    
Part 2
1.
Ans: The typical age of the participants is 53.2 years, obtained by calculating the mean. The range of ages within three standard deviations of the mean is 39.84704 to 66.61668.
2.
Ans: The variables Stay, Culture, Xray, Beds, MedSchool, Region, and Facilities are used in the regression model because each has a meaningful relationship with the dependent variable, infection risk.
3.
Ans: Apart from Beds and MedSchool, all of the selected variables are significant in the model and are useful for prediction. The model's R-squared value is 55%, so it explains only about half of the variation in infection risk and is not a great predictor.
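The two calculations behind the Part 2 answers, the three-sigma age range and the R-squared of a multiple regression, can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reproduction of the Excel output: the function names are hypothetical, the standard deviation is the sample version (assumed to match Excel's STDEV.S), and the regression solves the normal equations directly, which fails if predictors are perfectly collinear. For reference, the limits quoted in answer 1 correspond to a mean of about 53.23 and a standard deviation of about 4.46. The ages below are only the first ten rows of the Infection Risk data.

```python
import statistics


def three_sigma_range(values):
    """Return (mean - 3*sd, mean + 3*sd) using the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - 3 * sd, mean + 3 * sd


def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return ([intercept, b1, b2, ...], r_squared) for rows of predictors."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    mean_y = statistics.fmean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1 - ss_res / ss_tot


# Illustrative subset only: Age values from the first ten hospitals.
ages = [55.7, 58.2, 56.9, 53.7, 56.5, 50.9, 57.8, 45.7, 48.2, 56.3]
print("Three-sigma range for the subset:", three_sigma_range(ages))
```

To mirror answer 3, `fit_multiple_regression` would be called with one row per hospital containing the chosen predictors and with the InfctRsk column as `y`; an R-squared near 0.55 means roughly 45% of the variation is left unexplained. Significance tests (p-values) are not computed by this sketch.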
Part 3
1.
Ans: Except for the Sex variable, all other variables are significant in the model. By...
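For the Part 3 Excel tasks (z-scores for the Expenses column and a COUNTIF tally of outliers), a rough Python equivalent might look like the sketch below. The assignment does not state the cutoff, so the threshold of |z| > 3 is an assumption, as are the function names and the use of the sample standard deviation. The expenses listed are only the first ten rows of the Medical Expenses data.

```python
import statistics


def z_scores(values):
    """Return (value - mean) / sd for each value, with the sample sd."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def count_outliers(values, threshold=3.0):
    """Count values whose |z-score| exceeds the threshold (assumed to be 3)."""
    return sum(1 for z in z_scores(values) if abs(z) > threshold)


# Illustrative subset only: Expenses from the first ten rows.
expenses = [16884.92, 1725.55, 4449.46, 21984.47, 3866.86,
            3756.62, 8240.59, 7281.51, 6406.41, 28923.14]

for value, z in zip(expenses, z_scores(expenses)):
    print(f"{value:>10.2f}  z = {z:+.3f}")
print("Outliers in subset:", count_outliers(expenses))
```

`count_outliers` plays the role of COUNTIF over the "outlier?" column; changing `threshold` to 2 would apply a stricter rule if that is what the course uses.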