The main goal of the final project is to apply the concepts that you have learned throughout the course to real-world data. In this project you first need to explore the data through descriptive statistics and graphical summaries, and then use multiple regression analysis to analyze the relationships between variables. You can select any of the data sets provided. The real estate data set includes information on homes on the market by North Valley Real Estate. The other data sets include data on Kickstarter projects, student debt, bike sharing in DC, and auto sales. For each of those I have included a video in which I describe the data set.

The initial report for the project should be a 2-4 page paper (this does not include computer output/tables/graphs) that describes the details of each variable (type of the variable, summary statistics: mean, median, standard deviation, etc.) and shows the variable’s distribution using histograms or frequency polygons (or other graphical summary methods discussed in week 1). The final report for the project should be a 6-10 page paper (this does not include computer output/tables/graphs) that describes the question of interest, how you used the data set to analyze the question with details on the steps you used in your analysis, your findings about the question of interest, and the limitations of your study. Specifically, your report should contain the following:


  1. Abstract: includes a one paragraph summary of what you set out to learn, and what you ended up finding. It should summarize the entire report.


  2. Introduction: includes a brief introduction about the data; a discussion of the question of interest (What properties of a home are related to its selling price on the market?); a brief overview of the methodology used to examine the research question; a summary of the results of your study; and an outline of the remaining organization of the paper.


  3. Data Set: includes details about the variables in the data set, summary statistics, and visual tools to show the data (e.g. box plots, histograms, scatter graphs). Note: You can include your initial report for this section.


  4. Methodology and Results: includes testing whether the data meets the assumptions of regression (such as uncorrelated independent variables, linearity, etc.), running the multiple regressions, using stepwise regression methodology to find the best model, providing inferences about the question of interest, and writing a detailed interpretation of the regression results (such as interpretation of the coefficients, ANOVA table, t tests, p-values, coefficient of determination, etc.) and discussion.


  5. Limitations of study and conclusion: includes describing any limitations of your study and how they might be overcome in future research, and providing brief conclusions about the results of your study.

Answered Same Day, Jul 21, 2021


Suraj answered on Jul 22 2021
Assignment
Abstract: This paper applies the concepts that we have learned so far. It begins with the most important step of any analysis: describing the data with descriptive statistics and supporting the claims with appropriate visualizations. We were provided with several data sets, from which we could choose any one and apply the statistical methods to produce a report on the selected data set. We first analyze the data set using descriptive statistics, reporting the mean, median, standard deviation, standard error, kurtosis and skewness of all the variables. We then make distribution plots to describe the distribution of each variable. After this initial analysis, the next step is to build an appropriate model for the selected data set and check whether its assumptions are satisfied. These are the basic assumptions of regression analysis: a linear relationship between the independent variables and the dependent variable, normality of the error term, and constant variance (homogeneity of variance). To check them we use different kinds of plots and statistical tests. We then apply the backward elimination method of regression to build a suitable model for predicting the dependent variable; a brief explanation of backward elimination is given later in the paper. After developing the model, we give a detailed conclusion covering the whole analysis and, finally, some limitations of the paper where the analysis falls short. This is the overview of the paper, which forms a complete report on the analysis of the data.
Introduction: We are provided with 5 different data sets and can choose any one of them. We select the car sales data set, which contains data on the sales of different car models. The data set consists of 14 variables. Of these, 3 are categorical variables and the rest are quantitative variables. The levels of measurement are nominal and ratio: the nominal scale applies to the categorical variables and the ratio scale to the quantitative variables.
We have some interesting questions about the data set, which we will try to answer throughout this paper. The questions are as follows:
Question 1: What is the distribution of each variable, and how can we check it?
Question 2: How many independent variables are statistically significant for the dependent variable, and is the regression model as a whole significant?
Question 3: How many independent variables satisfy the assumption of linearity, and how well does the model predict the sales of a particular car?
For the first question we use descriptive statistics, and for the remaining questions we use multiple regression analysis. At the end, a brief explanation of our findings is provided.
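The backward elimination approach named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the procedure actually run for this report: the data are synthetic, the column names are hypothetical labels, and the stopping rule uses a cutoff of |t| >= 2 as a large-sample stand-in for a 5% p-value threshold (a statistics package would normally report exact p-values).

```python
import numpy as np

def ols_t_stats(X, y):
    """Fit OLS (X must include an intercept column); return coefficients and t statistics."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (n - k)      # unbiased error variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance matrix of the coefficients
    return beta, beta / np.sqrt(np.diag(cov))

def backward_eliminate(X, y, names, t_cutoff=2.0):
    """Start with all predictors; repeatedly drop the one with the smallest |t|
    until every remaining predictor has |t| >= t_cutoff."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, t = ols_t_stats(design, y)
        abs_t = np.abs(t[1:])                     # skip the intercept
        weakest = int(np.argmin(abs_t))
        if abs_t[weakest] >= t_cutoff:
            break                                 # all remaining predictors pass
        keep.pop(weakest)
    return [names[i] for i in keep]

# Synthetic data (made up for illustration, NOT the car sales data set):
# only the first and third columns truly affect the response.
rng = np.random.default_rng(0)
names = ["price", "engine_size", "horsepower", "fuel_efficiency"]  # hypothetical labels
X = rng.normal(size=(200, 4))
y = 3.0 + 4.0 * X[:, 0] - 2.5 * X[:, 2] + rng.normal(0.0, 1.0, size=200)

selected = backward_eliminate(X, y, names)
print("Predictors kept:", selected)
```

A predictor with no real effect can still survive by chance, so the retained set should be read alongside the coefficient table rather than taken as final.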
Data set: In this section we give a brief explanation of the variables in the car sales data set:
We start with the dependent variable, Sales in thousands. This variable gives the sales of a particular car model, in thousands.
Manufacturer: The name of the manufacturer. There are various manufacturers in the data set.
Model: The model of the car.
4-year resale value: The resale value of the car after four years.
Vehicle type: The type of the vehicle, that is, whether it is a car or a passenger vehicle.
Price in thousands: The price of the particular car in thousands of dollars.
Engine size: The size of the vehicle's engine.
Horsepower: The power of the car's engine, measured in horsepower.
Wheelbase: The wheelbase of the car.
Width: The maximum width of the particular car.
Length: The maximum length of the particular car.
Curb weight: The curb weight of the particular car.
Fuel capacity: The fuel capacity of the car, that is, the maximum amount of fuel the car can store.
Fuel efficiency: The mileage of the car, that is, its efficiency on 1 liter of petrol or diesel.
The summary statistics of the variables are given in the following descriptive statistics table:
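As a minimal sketch of how the statistics named in this paper (mean, median, standard deviation, standard error, skewness, kurtosis) can be computed for one variable, the following uses only the Python standard library. The input values are made up for illustration and are not taken from the car sales data set; the helper name `describe` is hypothetical. Skewness and kurtosis here are the plain moment-based versions, so they may differ slightly from spreadsheet output that applies small-sample corrections.

```python
import math
import statistics

def describe(values):
    """Return summary statistics for one numeric variable."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                 # sample standard deviation (n - 1)
    # Central moments used for moment-based skewness and excess kurtosis
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "std_dev": sd,
        "std_error": sd / math.sqrt(n),           # standard error of the mean
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,    # 0 for a normal distribution
    }

# Made-up values for illustration, NOT taken from the car sales data set
sales_in_thousands = [16.9, 39.4, 14.1, 8.6, 20.4, 18.8, 91.6, 27.9, 63.7, 145.5]

summary = describe(sales_in_thousands)
for name, value in summary.items():
    print(f"{name:>16}: {value:.3f}")
```

For these made-up values the mean exceeds the median and the skewness is positive, which is the numerical signature of a long right tail.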
To describe the distribution of the variables, we make histograms of all the quantitative variables, because a histogram gives a clear picture of how a variable is distributed.
In this way we can make a histogram for every quantitative variable. The dependent variable, sales, has a positively skewed distribution because it has a longer tail towards the right side. Likewise, the next two histograms also show positively skewed distributions. All the remaining histograms are approximately symmetric, so those variables can be treated as approximately normally distributed.
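The reading of a histogram as positively skewed or roughly symmetric can be illustrated with a small sketch. It assumes NumPy is available and uses synthetic samples, not the car sales variables; `text_histogram` and `sample_skewness` are hypothetical helper names. The bars are printed as text so the example runs without a plotting library; in a report, a plotting call such as matplotlib's `hist` would draw the same bins.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, made-up samples: one right-skewed, one roughly symmetric
right_skewed = rng.lognormal(mean=3.0, sigma=0.8, size=500)
symmetric = rng.normal(loc=100.0, scale=10.0, size=500)

def text_histogram(values, bins=8, width=40):
    """Render histogram bin counts as rows of '#' characters."""
    counts, edges = np.histogram(values, bins=bins)
    lines = []
    for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
        bar = "#" * int(round(width * count / counts.max()))
        lines.append(f"{lo:8.1f} to {hi:8.1f} | {bar} ({count})")
    return "\n".join(lines)

def sample_skewness(values):
    """Moment-based skewness: positive means a longer right tail."""
    centered = np.asarray(values, dtype=float) - np.mean(values)
    return float(np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5)

for label, data in [("right-skewed", right_skewed), ("symmetric", symmetric)]:
    print(f"{label}: skewness = {sample_skewness(data):.2f}")
    print(text_histogram(data))
    print()
```

In the right-skewed sample most observations fall in the first few bins and the counts trail off to the right, which is the shape described for the sales variable.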
Methodology:
In this section we start our final analysis, which is building the multiple regression model. Before building the regression model, we discuss some concepts that will be used in the analysis. The most important part is checking that the assumptions are satisfied. The first assumption is that the independent variables have a linear relationship with the dependent variable. To check this assumption we use scatter plots, because scatter plots show clearly whether the relationship between two variables is linear.
The second assumption is that the error terms are normally distributed with mean zero and constant variance. This assumption is checked with residual plots. When we plot the residuals against the fitted values, a U-shaped or other systematic pattern indicates that the residuals are not centered on zero across the range of fitted values, so the assumed linear form of the model is violated. Likewise, a funnel-shaped pattern in the plot indicates that the homogeneity of variance assumption is violated. Normality itself is usually judged from a histogram or normal probability plot of the residuals.
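The residual check described above can be sketched numerically. This is an illustration under stated assumptions: NumPy is available, the data are synthetic (one response with constant error variance, one whose error grows with the predictor), and `residual_diagnostics` is a hypothetical helper. The "spread trend" it reports, the correlation between fitted values and absolute residuals, is only a crude numeric stand-in for spotting a funnel shape by eye, not a formal test.

```python
import numpy as np

def residual_diagnostics(x, y):
    """Fit y = b0 + b1*x by least squares; return fitted values, residuals,
    and the correlation between fitted values and |residuals|."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    residuals = y - fitted
    spread_trend = float(np.corrcoef(fitted, np.abs(residuals))[0, 1])
    return fitted, residuals, spread_trend

# Synthetic data, made up for illustration
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=200)
y_constant = 5.0 + 2.0 * x + rng.normal(0.0, 2.0, size=200)      # constant variance
y_funnel = 5.0 + 2.0 * x + rng.normal(0.0, 0.5 * x, size=200)    # variance grows with x

_, _, trend_constant = residual_diagnostics(x, y_constant)
_, residuals_funnel, trend_funnel = residual_diagnostics(x, y_funnel)

print(f"constant-variance data: spread trend = {trend_constant:.2f}")
print(f"funnel-shaped data:     spread trend = {trend_funnel:.2f}")
```

A spread trend near zero is consistent with constant variance, while a clearly positive value means the residuals fan out as the fitted values increase. Plotting `fitted` against `residuals` would show the same thing visually.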
Let’s make some scatter plots to check the linearity assumption.
Hence, from all the scatter plots above we see that there is no perfect linear relationship between the dependent variable and the independent variables. Thus, the assumption of linearity is violated. This violation of the assumption will have an impact on the...
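A numeric companion to a scatter plot is the Pearson correlation coefficient, which measures the strength of the linear association between two variables. The sketch below assumes NumPy and uses synthetic stand-ins for two predictors and the response; the variable names are borrowed from the data set description for readability, but the values are made up and `pearson_r` is a hypothetical helper.

```python
import numpy as np

# Synthetic stand-ins (made-up values, NOT the car sales data set)
rng = np.random.default_rng(2)
n = 150
price = rng.uniform(10.0, 60.0, size=n)
horsepower = rng.uniform(80.0, 300.0, size=n)
# In this toy example the response depends linearly on price only
sales = 120.0 - 1.5 * price + rng.normal(0.0, 20.0, size=n)

def pearson_r(x, y):
    """Pearson correlation: near +1 or -1 suggests a strong linear relationship."""
    return float(np.corrcoef(x, y)[0, 1])

correlations = {
    "price": pearson_r(price, sales),
    "horsepower": pearson_r(horsepower, sales),
}
for name, r in correlations.items():
    print(f"{name:>12} vs sales: r = {r:+.2f}")
```

A coefficient close to zero means there is little linear association, although a curved relationship can still be present, so the scatter plot itself remains the primary check.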