PMBA8051
Fundamentals of Statistical Analysis

Fall 2019, Computer Project



The computer project is to analyze the data stored in Retirement Funds, from the case study at the end of Chapter 15. The goal of the project is to use the data on 407 funds to build a multiple regression model that predicts the 3-year returns, and to prepare a written report presenting the results of your analysis. You can work alone or in groups of up to three people. Each group is to submit one report. The report is due on Tuesday, October 22, 2019, as an attached PDF file mailed to me: [email protected]. You are required to enter part of the results from Steps 3, 4, and 5 below in MyStatLab, in an assignment titled “Retirement Funds”, to ensure you are on the right track with the model analysis.



Your final report should be no more than eight pages in length. The report should begin with an executive summary of one to three paragraphs. This summary – which is the last item written – should identify the problem, indicate your approach to solving it, and concisely state your conclusion.



The body of your report should indicate how you developed your conclusion. Begin with a concise statement of the regression objective from the business perspective. Next, use your knowledge of the dependent variable and predictor variables to formulate a model. You may need to identify several possible models before finalizing a fitted model which best serves your objective. You may follow the procedure given below when you explore the data.



1. Download your data from StatCrunch in MyStatLab, from Retirement Funds in Chapter 15. Format your data to include the following variables:


Dependent:


Independent:









Things to be included in the report


i) Print a sample of your data set, say the first 20 funds.

ii) Give a brief description of each of the above variables.



2. Begin your study with a graphical investigation of the relationship between the dependent variable and each of the quantitative predictor variables. You can use scatterplots. Based on the graphs, comment on the possible form of the relationship (e.g., first-order or second-order) between the dependent variable and each quantitative predictor variable.

Things to be included in the report

i) Print the scatterplots between the dependent variable and each of the quantitative predictor variables.

ii) Give a visual assessment of any possible second-order relationship.
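As a rough illustration of Step 2, the R sketch below draws one scatterplot per quantitative predictor and overlays a lowess smoother to help judge curvature. The data are simulated, and the column names (`assets`, `turnover_ratio`, `std_dev`, `sharpe_ratio`, `expense_ratio`, `return_3yr`) are hypothetical stand-ins rather than the actual Retirement Funds columns.

```r
# Simulated stand-in for the Retirement Funds data (hypothetical column names)
set.seed(1)
n <- 407
funds <- data.frame(
  assets         = runif(n, 10, 5000),
  turnover_ratio = runif(n, 5, 150),
  std_dev        = runif(n, 8, 25),
  sharpe_ratio   = rnorm(n, mean = 1, sd = 0.3),
  expense_ratio  = runif(n, 0.2, 2)
)
# Assumed relationship, used only so that the plots show some structure
funds$return_3yr <- 5 + 0.1 * (funds$std_dev - 16)^2 +
  8 * funds$sharpe_ratio + rnorm(n, sd = 2)

quant <- c("assets", "turnover_ratio", "std_dev", "sharpe_ratio", "expense_ratio")

op <- par(mfrow = c(2, 3))          # a 2 x 3 grid holds the five plots
for (v in quant) {
  plot(funds[[v]], funds$return_3yr, xlab = v, ylab = "3-year return", main = v)
  # A smoother that bends away from a straight line hints at a second-order term
  lines(lowess(funds[[v]], funds$return_3yr), col = "red")
}
par(op)                             # restore the previous plot layout
```

With the real data, replace the simulated `funds` data frame with the downloaded data set and adjust the column names accordingly.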



3. The initial model to consider is a first-order model which includes all seven predictor variables. Use statistical techniques learned in this class to analyze the model.

Things to be included in the report

i) Print the output of the regression analysis.

ii) Write the regression equation and perform some basic analysis with respect to its usefulness.
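A minimal R sketch of Step 3 follows, assuming simulated data and hypothetical column names (five quantitative predictors plus two 0/1 dummies, `growth` and `high_risk`). It fits the first-order model with `lm()` and extracts the global F test as one basic check of usefulness.

```r
# Simulated stand-in for the Retirement Funds data (hypothetical column names)
set.seed(1)
n <- 407
funds <- data.frame(
  assets         = runif(n, 10, 5000),
  turnover_ratio = runif(n, 5, 150),
  std_dev        = runif(n, 8, 25),
  sharpe_ratio   = rnorm(n, mean = 1, sd = 0.3),
  expense_ratio  = runif(n, 0.2, 2),
  growth         = rbinom(n, 1, 0.5),   # 1 = growth fund, 0 = otherwise
  high_risk      = rbinom(n, 1, 0.3)    # 1 = high risk, 0 = otherwise
)
# Assumed data-generating relationship, for illustration only
funds$return_3yr <- 2 + 0.5 * funds$std_dev + 10 * funds$sharpe_ratio -
  1.5 * funds$expense_ratio + rnorm(n, sd = 2)

# First-order model with all seven predictors
fit1 <- lm(return_3yr ~ assets + turnover_ratio + std_dev + sharpe_ratio +
             expense_ratio + growth + high_risk, data = funds)
s1 <- summary(fit1)
print(s1)   # coefficient table with t tests, R-squared, and the F statistic

# Global F test of usefulness: H0 says all seven slopes are zero
f <- s1$fstatistic
p_global <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
cat("R-squared:", s1$r.squared, " adjusted:", s1$adj.r.squared,
    " global F p-value:", p_global, "\n")
```

The estimated coefficients in `coef(fit1)` give the regression equation; the individual t tests in the summary table indicate which predictors contribute once the others are in the model.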



4. Next, you may want to try models that include second-order terms of the quantitative predictor variables, without interaction. Include all seven independent variables and one second-order term at a time.

Things to be included in the report

i) Identify the quantitative predictor variables whose second-order term is significant.
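One way to organize Step 4 is to loop over the quantitative predictors, add a single squared term to the first-order model each time, and collect the p-value of that term. The R sketch below does this on simulated data with hypothetical column names; the curvature in `std_dev` is built into the simulation on purpose.

```r
# Simulated stand-in for the Retirement Funds data (hypothetical column names)
set.seed(1)
n <- 407
funds <- data.frame(
  assets         = runif(n, 10, 5000),
  turnover_ratio = runif(n, 5, 150),
  std_dev        = runif(n, 8, 25),
  sharpe_ratio   = rnorm(n, mean = 1, sd = 0.3),
  expense_ratio  = runif(n, 0.2, 2),
  growth         = rbinom(n, 1, 0.5),
  high_risk      = rbinom(n, 1, 0.3)
)
# Assumed relationship with a genuine quadratic effect of std_dev
funds$return_3yr <- 5 + 0.3 * (funds$std_dev - 16)^2 +
  8 * funds$sharpe_ratio + rnorm(n, sd = 2)

quant    <- c("assets", "turnover_ratio", "std_dev", "sharpe_ratio", "expense_ratio")
base_rhs <- paste(c(quant, "growth", "high_risk"), collapse = " + ")

# Fit one model per quantitative predictor: all seven predictors + one squared term
p_sq <- sapply(quant, function(v) {
  sq_term <- sprintf("I(%s^2)", v)
  fit <- lm(as.formula(paste("return_3yr ~", base_rhs, "+", sq_term)), data = funds)
  coef(summary(fit))[sq_term, "Pr(>|t|)"]   # t-test p-value of the squared term
})

print(round(p_sq, 4))
print(names(p_sq)[p_sq < 0.05])   # predictors whose second-order term is significant
```

The 0.05 cutoff is an assumed significance level; use whichever level the course specifies.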



5. To investigate the effect of interaction, add a two-way interaction term between two predictor variables to the model, one at a time. Try all possible two-way interaction terms among the seven predictor variables to see whether any two-way interaction is significant. (Note: The total number of interaction models you should run is the combination of 7 taken 2 at a time, that is, C(7, 2) = 21 models.)

Things to be included in the report

i) Identify a list of interaction terms that are significant and the reasons why.
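The 21 interaction models of Step 5 can be generated with `combn()` instead of typed by hand. The R sketch below is one possible layout, using simulated data and hypothetical column names; the `std_dev:sharpe_ratio` interaction is built into the simulation so that at least one term comes out significant.

```r
# Simulated stand-in for the Retirement Funds data (hypothetical column names)
set.seed(1)
n <- 407
funds <- data.frame(
  assets         = runif(n, 10, 5000),
  turnover_ratio = runif(n, 5, 150),
  std_dev        = runif(n, 8, 25),
  sharpe_ratio   = rnorm(n, mean = 1, sd = 0.3),
  expense_ratio  = runif(n, 0.2, 2),
  growth         = rbinom(n, 1, 0.5),
  high_risk      = rbinom(n, 1, 0.3)
)
# Assumed relationship with a genuine std_dev x sharpe_ratio interaction
funds$return_3yr <- 2 + 0.2 * funds$std_dev + 3 * funds$sharpe_ratio +
  0.8 * funds$std_dev * funds$sharpe_ratio + rnorm(n, sd = 2)

preds    <- c("assets", "turnover_ratio", "std_dev", "sharpe_ratio",
              "expense_ratio", "growth", "high_risk")
pair_mat <- combn(preds, 2)     # 2 x 21 matrix, one column per pair
stopifnot(ncol(pair_mat) == choose(7, 2))

# One model per pair: all seven main effects plus a single interaction term
p_int <- apply(pair_mat, 2, function(pr) {
  term <- paste(pr, collapse = ":")
  fit  <- lm(as.formula(paste("return_3yr ~", paste(preds, collapse = " + "),
                              "+", term)), data = funds)
  coef(summary(fit))[term, "Pr(>|t|)"]   # t-test p-value of the interaction
})
names(p_int) <- apply(pair_mat, 2, paste, collapse = ":")

print(sort(round(p_int, 4)))
print(names(p_int)[p_int < 0.05])   # interactions significant at the assumed 5% level
```

Because 21 tests are run, a few terms may appear significant by chance alone, which is worth keeping in mind when giving the reasons for each flagged interaction.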



6. Build your first multiple regression model by using all independent variables, all second-order terms identified in Step 4, and all interaction terms identified in Step 5.

Things to be included in the report

i) Print the output of the regression analysis.

ii) Write the regression equation and perform some basic analysis with respect to its usefulness.



7. Based on the analysis in Step 6, find the best regression model for predicting the 3-year return.

Things to be included in the report

i) Print the output of the regression analysis.

ii) Write the regression equation and perform some basic analysis with respect to its usefulness.
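The assignment does not prescribe a selection method for Step 7. One common option is backward elimination, sketched below with base R's `step()`, which drops terms by AIC rather than by individual p-values. The data are simulated, the column names are hypothetical, and the choice of `I(std_dev^2)` and `std_dev:sharpe_ratio` as the Step 6 extras is an assumption made only for this example.

```r
# Simulated stand-in for the Retirement Funds data (hypothetical column names)
set.seed(1)
n <- 407
funds <- data.frame(
  assets         = runif(n, 10, 5000),
  turnover_ratio = runif(n, 5, 150),
  std_dev        = runif(n, 8, 25),
  sharpe_ratio   = rnorm(n, mean = 1, sd = 0.3),
  expense_ratio  = runif(n, 0.2, 2),
  growth         = rbinom(n, 1, 0.5),
  high_risk      = rbinom(n, 1, 0.3)
)
# Assumed relationship, for illustration only
funds$return_3yr <- 2 + 0.2 * funds$std_dev + 3 * funds$sharpe_ratio +
  0.8 * funds$std_dev * funds$sharpe_ratio + rnorm(n, sd = 2)

# Step 6 style model: all predictors plus the (assumed) extra terms
full <- lm(return_3yr ~ assets + turnover_ratio + std_dev + sharpe_ratio +
             expense_ratio + growth + high_risk +
             I(std_dev^2) + std_dev:sharpe_ratio, data = funds)

# Backward elimination by AIC; main effects that appear in a retained
# interaction are never dropped before the interaction itself
best <- step(full, direction = "backward", trace = 0)

print(formula(best))
print(summary(best))
```

Whatever method is used, the final model should still be checked with the usual t tests, the global F test, and adjusted R-squared before it is reported as the best model.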



8. Perform a thorough residual analysis on the model from Step 7 to verify the four regression assumptions.

Things to be included in the report

i) Print all residual analysis plots.

ii) Comment on whether the model assumptions hold.
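A possible set of residual checks for Step 8 is sketched below in R, assuming the four assumptions are linearity, independence, normality, and equal variance of the errors. The data and the fitted model are simulated placeholders with hypothetical column names; the Durbin-Watson statistic is computed by hand from its definition so that no extra package is needed.

```r
# Simulated stand-in for the final model's data (hypothetical column names)
set.seed(1)
n <- 407
funds <- data.frame(
  std_dev      = runif(n, 8, 25),
  sharpe_ratio = rnorm(n, mean = 1, sd = 0.3)
)
funds$return_3yr <- 2 + 0.5 * funds$std_dev + 10 * funds$sharpe_ratio +
  rnorm(n, sd = 2)

fit <- lm(return_3yr ~ std_dev + sharpe_ratio, data = funds)  # placeholder model
e   <- residuals(fit)

op <- par(mfrow = c(2, 3))
plot(fit, which = 1)   # residuals vs fitted values: linearity
plot(fit, which = 2)   # normal Q-Q plot: normality
plot(fit, which = 3)   # scale-location plot: equal variance
plot(e, type = "b", xlab = "Observation order", ylab = "Residual",
     main = "Residuals in data order")   # independence
hist(e, main = "Histogram of residuals", xlab = "Residual")
par(op)

# Numerical companions to the plots
sw <- shapiro.test(e)              # H0: residuals are normally distributed
dw <- sum(diff(e)^2) / sum(e^2)    # Durbin-Watson statistic; values near 2
                                   # suggest no first-order autocorrelation
cat("Shapiro-Wilk p-value:", sw$p.value, " Durbin-Watson:", dw, "\n")
```

The residual-order plot and the Durbin-Watson statistic are only meaningful if the rows have a natural ordering; for cross-sectional fund data, independence is usually argued from how the data were collected.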



In your report, you do not need to include full details of all the regression work that you tried. But you should list things that were attempted. Your report should provide enough details to justify your final selection of the best model and to show the major steps that lead to your decision.

Pritam answered on Oct 21 2021
---
title: "Untitled"
author: "Untitled"
date: "21 October 2019"
output:
  word_document: default
  pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
# Executive summary:
The main objective of this project is to analyze 407 funds and build a multiple linear regression model for predicting three-year returns. The data set is rich in attributes that matter from a business analytics perspective, and it covers three market-capitalization sizes: large, medium, and small. For model building, however, we reduced the data to a few important attributes: assets, turnover ratio, Sharpe ratio, expense ratio, the type of growth of the fund, and the risk of the fund. In addition, because the regression model is used to predict three-year returns, a few more predictors were dropped to keep the model less complex and, as far as possible, accurate.
Backward stepwise regression was used at each step to obtain the next best model, so the analysis starts from a full model. Interaction terms were included to capture the joint effect of some attributes, and polynomial terms of some predictors were considered to capture their non-linear effect on the response variable. The resulting best regression model contains three independent variables and a few interaction terms. At each point, the assumptions were checked, together with different statistical methods, to ensure the accuracy of the best model. Model evaluation was done with particular care, since important decisions rely on the accuracy of this type of predictive model.
# A brief summary of the full data:
```{r, message=FALSE, warning=FALSE}
library(xlsx)
d1 = read.xlsx("fund.xlsx", sheetIndex = 1)
summary(d1)
```
Since we are not interested in using the full data to build the regression model, only a few selected attributes are included in the report. A brief summary of the reduced data set is shown again for a clearer picture.
```{r, message=FALSE, warning=FALSE}
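# Keep only the columns needed for the model (selected by position in d1)
d2 = d1[,c(3:8, 10, 13)]
# Dummy variables: 1 for growth funds / high-risk funds, 0 otherwise
d2$n1 = factor( with(d2, ifelse(( Type == "Growth" ), 1 , 0)))
d2$n2 = factor( with(d2, ifelse(( Risk == "High" ), 1 , 0)))
# Drop the first two columns; keep the remaining six plus the two dummies
d3 = d2[,c(3:10)]
# Note: "3_yr_return" is not a syntactic R name, so it needs backticks in formulas
colnames(d3) = c("assets","turnover_ratio","SD","sharpe_ratio","3_yr_return",
                 "expense_ratio","fund_dummy","risk_dummy")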
```
# Q1. i)
# A sample of the first 20 funds:
```{r}
samp = head(d3, 20)
samp
```
# ii)
# Brief description for each variable:
```{r, message=FALSE, warning=FALSE}
library(tidyverse)
summary(d3)
```
From the summary table, one can see that there are six numerical variables and two categorical variables. The dependent variable, the three-year return, is also numerical, so a linear regression model seems to be a valid choice for the prediction. One can see that there are...