The instructions are in the final.pdf. I need the instructions followed to a T. If, after reviewing them, you think the report will be more than 15 pages with graphs, please let me know and I will adjust the requirement.


DASC 512, Final

Instructions

This is an individual final, and you are expected to do your own work. Do not discuss this exam with anyone other than the course instructors. The primary task of this assignment is to write a report detailing how and why you arrived at a regression model relating median home price to predictor variables (if any). Write a coherent and concise report that flows well and clearly describes your analysis and conclusions. Remember that there is no single best answer, and I expect to receive many different answers. You will be graded on the predictive accuracy of your model, the interpretation and justification of your model, the description of the process used to develop your model, and the communication of your results and conclusions.

Data

The data in this problem were collected by two economists to be used in constructing a regression equation to serve as a price index for owner-occupied housing in a region containing a large U.S. city. Data were obtained for each of 506 census tracts in and around the city. (The U.S. Census Bureau has partitioned the entire country into geographical regions, called census tracts, that contain about the same number of people.) The values for some variables were reported on a census tract basis, while other variables were reported on a community basis. For example, the property tax rate is determined by each community; if a community consists of more than one census tract, the property tax rate will be the same for each census tract in that community. Note that census tracts 357 through 488, inclusive, are all part of the city. The other census tracts are in towns or suburbs in the surrounding metropolitan area, but not in the city. Census tracts in the city have the same values for the property tax, pupil-teacher ratio, zoning, and highway access variables.
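Because the city tracts share community-level values, it is worth confirming that structure in the data before modeling, since those columns carry no within-city variation. The sketch below is one way to check it, assuming the data are loaded into a pandas DataFrame with the column names used in the data file and assuming the "zoning" variables are X2 and X3; the helper names are hypothetical.

```python
import pandas as pd

# Columns the instructions say are shared by every city tract: highway access (X9),
# property tax (X10), pupil-teacher ratio (X11) and zoning. Treating X2 and X3 as
# "the zoning variables" is an assumption.
CITY_SHARED_VARS = ["X2", "X3", "X9", "X10", "X11"]


def city_tracts(df: pd.DataFrame) -> pd.DataFrame:
    """Rows for census tracts 357-488 inclusive, which lie inside the city."""
    return df[df["Census Tract"].between(357, 488)]


def city_constant_check(df: pd.DataFrame, cols=CITY_SHARED_VARS) -> dict:
    """Map each column to True if it takes exactly one value across city tracts."""
    city = city_tracts(df)
    return {col: city[col].nunique(dropna=False) == 1 for col in cols}
```

If the file matches the description, `city_constant_check(pd.read_csv("student data.csv"))` should report True for every listed column.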
The data for the 506 census tracts are in the attached data file ‘student data.csv’; note that the last 50 data points have missing Y values. There is one line of data for each census tract, and values for the variables appear in the order they are listed below. Use these variable names in formulas and tables presented in your report. With the exception of Census Tract, which is a three-digit identification, the variables are described below.

Y: The median value of owner-occupied homes in the census tract.
X1: Per capita crime rate in the community. Assuming crime rates are related to people’s perception of danger, areas with higher crime rates may have lower median housing values.
X2: Percentage of a community’s residential land zoned for lots greater than 25,000 square feet.
X3: Percentage of acres in the community zoned for non-retail business. This variable serves as a proxy for variables associated with industry, such as noise, heavy traffic, and ugly buildings, and could have a negative correlation with housing values.
X4: A dummy variable with value 1 if the tract borders a specific river and value 0 otherwise. Locations of homes along some sections of this river are very desirable.
X5: The average concentration (parts per 100 million) of nitrogen oxides in the air. This is a measure of the level of air pollution.
X6: The average number of rooms per owner-occupied home. This variable represents the average size of homes.
X7: Percentage of owner-occupied homes that are more than thirty years old.
X8: Natural logarithm of the weighted distances to five major employment centers in the metropolitan area. A larger value indicates that the census tract is farther away from the major employment centers. According to traditional theories, housing values should be higher near employment centers.
X9: Natural logarithm of an index of accessibility to radial highways, calculated on a community basis. Larger values represent better access to major highways.
X10: Property tax rate (in dollars) per $10,000 of property value. This measures costs paid by homeowners to maintain schools and public services in each community. Higher values may indicate better public services, such as police and fire protection, libraries, road quality, buses and other public transportation, or they may represent more expensive and less efficient delivery of public services.
X11: Pupil-teacher ratio in each school district. Lower values of this variable may represent higher quality of primary and secondary education.
X12: The percentage of the population in the census tract of lower socio-economic status (percentage of adults without a high school diploma or classified as laborers).

Task

Your task is to analyze the data for the 456 census tracts for which you have complete data and construct one or more good regression models for predicting Y, the median value of owner-occupied homes. You may include additional explanatory variables constructed from functions of the variables in the data file if you think they are worthwhile. Summarize your analysis in a report that includes the following discussions.

1. Provide a one- to two-paragraph “Executive Summary” of your major conclusions about the relationships between median housing prices and the explanatory variables, with some mention of the nitrogen oxide variable. This should not contain any formulas or mathematical symbols. It should be written so that it could be easily understood by a real estate investor with no formal training in statistics.
2. Provide a description of the steps taken to identify your best model (or models). Do not submit any Python output in this section, but graphical analysis and summary statistics are encouraged. Simply outline the issues you considered, your decisions, and the sequence of steps you took to develop a model.
Be detailed: tell me what you did, why you did it, and whether it worked.
(a) For the purposes of this course, consider only models with main effects, quadratic effects (X²), and the following interactions: X1*X5, X4*X5, X5*X6, X5*X8. This should be “reasonable” while still forcing you to explore the model-building process.
(b) Because of this model limitation, there may be higher-order effects and/or nonlinearity in the data that you cannot model, since I have limited the variables under consideration. Remember this when looking at residual plots and scatter plots, and point out if you think there may be deficiencies caused in this way. You may, and should, transform Y if you deem it necessary or useful.
3. Provide a formula for your best model (or models), standard errors for the coefficients, and the R² value. You can summarize Python results in tables of your own creation. Discuss and interpret any important features of your model. Pay some attention to the nitrogen oxide (air pollution) variable as a predictor of median housing values, although you may conclude that it is not important.
4. Provide convincing evidence that the model you selected is a good model for using some or all of the twelve explanatory variables to predict median housing values. Discussion of residual plots and other diagnostic checks would be appropriate. Statistical tests should be formulated correctly, with appropriate hypotheses and conclusions. You may attach graphs or tables, but lists of raw Python output should not be submitted and will be ignored.
5. You may submit one more paragraph outlining additional analyses that you would have done if you had more time. You will earn points for good suggestions and lose points for suggestions with little potential value. (This is optional.)
6. The last output you must provide is a set of predictions for the missing Y data points (the last 50 observations). Use your final, best model to predict Y and create a 95% prediction interval for each. Points for the “Predictive Ability” section will be based on the mean squared prediction error, $\mathrm{MSPR} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ (lower is better); the coverage of your intervals (with 95% intervals, about 2 to 3 of the 50 true values are expected to fall outside due to random error); and the width of your intervals (narrower is better, so long as coverage is adequate).

Deliverables

1. A ‘last first.pdf’ PDF document with your write-up; use MS Word’s “print to PDF” feature as necessary. It should contain no raw Python output, only graphical and tabular output.
2. A ‘last first.csv’ CSV file with your predictions, with the columns ‘Census Tract’, ‘Prediction’, ‘Lower Prediction CI’, ‘Upper Prediction CI’.
3. A ‘last first.py’ Python file with your complete analysis, including plot generation, statistical tests, and predictions.

Grading

This final is worth 80 points, broken up as follows:
- Writing: 10 points. Emphasis on precision, clarity, and efficiency; use paragraphs, transitions, etc.
- Executive Summary: 10 points. Clear and concise use of language to convey your model in a limited space.
- Model Building Process, Logic, Appropriate Conclusions: 40 points. Appropriate use of tools from this course, applied correctly and communicated effectively.
- Predictive Ability: 20 points. Coverage of the true values by your intervals and MSPR will both play into this score.
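Items 2(a), 3, and 6 together describe a constrained workflow: pick terms from a fixed menu, fit the model (possibly on a transformed Y), report coefficients, standard errors, and R², then produce back-transformed 95% prediction intervals that are scored by MSPR and coverage. The sketch below illustrates that workflow under stated assumptions: it uses the statsmodels formula interface, reads "quadratic effects (X²)" as squared terms, and uses a log transform of Y as one possible choice rather than a required one. All helper names and the example terms are hypothetical illustrations, not a recommended model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Term menu read from item 2(a): main effects, squared terms, four interactions.
# X4 is a 0/1 dummy, so its square equals X4 and is left out.
MAIN = [f"X{i}" for i in range(1, 13)]
QUADRATIC = [f"I(X{i}**2)" for i in range(1, 13) if i != 4]
INTERACTIONS = ["X1:X5", "X4:X5", "X5:X6", "X5:X8"]
ALLOWED_TERMS = set(MAIN + QUADRATIC + INTERACTIONS)


def build_rhs(terms):
    """Join chosen terms into a formula right-hand side, rejecting terms off the menu."""
    bad = [t for t in terms if t not in ALLOWED_TERMS]
    if bad:
        raise ValueError(f"terms not allowed by item 2(a): {bad}")
    return " + ".join(terms)


def fit_log_model(train: pd.DataFrame, rhs: str):
    """Ordinary least squares fit of log(Y) on the chosen terms."""
    return smf.ols(f"np.log(Y) ~ {rhs}", data=train).fit()


def coefficient_table(model) -> pd.DataFrame:
    """Estimates and standard errors for the report; R^2 is model.rsquared."""
    return pd.DataFrame({"Estimate": model.params, "Std. Error": model.bse})


def prediction_table(model, new: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Point predictions and (1 - alpha) prediction intervals on the original Y scale.

    exp() is monotone, so the interval endpoints back-transform exactly; the point
    prediction exp(mean of log Y) estimates a median of Y rather than its mean.
    """
    frame = model.get_prediction(new).summary_frame(alpha=alpha)
    return pd.DataFrame({
        "Census Tract": new["Census Tract"].to_numpy(),
        "Prediction": np.exp(frame["mean"].to_numpy()),
        "Lower Prediction CI": np.exp(frame["obs_ci_lower"].to_numpy()),
        "Upper Prediction CI": np.exp(frame["obs_ci_upper"].to_numpy()),
    })


def mspr(y, y_hat) -> float:
    """Mean squared prediction error: sum of (y - y_hat)^2 divided by n."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


def coverage(y, lower, upper) -> float:
    """Fraction of observed values that fall inside their intervals."""
    y = np.asarray(y, dtype=float)
    return float(np.mean((y >= np.asarray(lower)) & (y <= np.asarray(upper))))
```

A call such as `prediction_table(fit_log_model(train, build_rhs(["X5", "X6", "X12"])), to_predict).to_csv("last first.csv", index=False)` would write the four required columns; the terms there are placeholders only. For the nested-model tests in item 4, statsmodels' `RegressionResults.compare_f_test` returns the partial F statistic, its p-value, and the difference in degrees of freedom.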
Census Tract,Y,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12,X1*X5,X4*X5,X5*X6,X5*X8
1,158347,0.00632,18,2.31,0,5.38,6.575,65.2,1.40854,0,296,15.3,4.98,0.0340016,0,35.3735,7.5779452
2,174521,0.02731,0,7.07,0,4.69,6.421,78.9,1.60283,0.69315,242,17.8,9.14,0.1280839,0,30.11449,7.5172727
3,218497,0.0273,0,7.07,0,4.69,7.185,61.1,1.60283,0.69315,242,17.8,4.01,0.128037,0,33.69765,7.5172727
5,203425,0.06905,0,2.18,0,4.58,7.147,54.2,1.80207,1.09861,222,18.7,5.33,0.316249,0,32.73326,8.2534806
6,165284,0.02985,0,2.18,0,4.58,6.43,58.7,1.80207,1.09861,222,18.7,5.21,0.136713,0,29.4494,8.2534806
7,216533,0.08829,13,7.87,0,5.24,6.012,66.6,1.71569,1.60944,311,15.2,12.43,0.4626396,0,31.50288,8.9902156
8,121727,0.14455,13,7.87,0,5.24,6.172,96.1,1.78347,1.60944,311,15.2,19.15,0.757442,0,32.34128,9.3453828
9,141576,0.21124,13,7.87,0,5.24,5.631,100,1.80535,1.60944,311,15.2,29.93,1.1068976,0,29.50644,9.460034
10,136074,0.17004,13,7.87,0,5.24,6.004,85.9,1.88587,1.60944,311,15.2,17.1,0.8910096,0,31.46096,9.8819588
11,104191,0.22489,13,7.87,0,5.24,6.377,94.3,1.84793,1.60944,311,15.2,20.45,1.1784236,0,33.41548,9.6831532
12,160469,0.11747,13,7.87,0,5
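The preview suggests that the four precomputed interaction columns are simply products of their factors; for tract 1, 0.00632 × 5.38 = 0.0340016 for X1*X5. The sketch below, assuming pandas and NumPy and using hypothetical helper names and an arbitrary tolerance, loads the file, separates the complete rows from the rows whose Y must be predicted, and verifies those products before any interaction column is relied on.

```python
import numpy as np
import pandas as pd

# Precomputed interaction columns in the file and the factors each should equal.
INTERACTION_COLUMNS = {
    "X1*X5": ("X1", "X5"),
    "X4*X5": ("X4", "X5"),
    "X5*X6": ("X5", "X6"),
    "X5*X8": ("X5", "X8"),
}


def load_tracts(path_or_buffer):
    """Read the tract data; return (rows with Y, rows whose Y must be predicted)."""
    df = pd.read_csv(path_or_buffer)
    has_y = df["Y"].notna()
    return df[has_y].copy(), df[~has_y].copy()


def interactions_match(df: pd.DataFrame, rtol: float = 1e-6) -> dict:
    """Map each interaction column to True if it equals the product of its factors."""
    return {
        col: bool(np.allclose(df[col], df[a] * df[b], rtol=rtol, atol=1e-9))
        for col, (a, b) in INTERACTION_COLUMNS.items()
    }
```

If the file matches the description, `train, to_predict = load_tracts("student data.csv")` should yield 456 and 50 rows respectively.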
Answered 4 days after Aug 27, 2021

Answer to: DASC 512, Final Instructions

Suraj answered on Aug 31, 2021