University at Buffalo, Industrial and Systems Engineering IE322 Analytics and Computing for Industrial Engineers Lab #3 Fall 2022 Machine Learning Practices (This is an individual lab)...



University at Buffalo, Industrial and Systems Engineering
IE322 Analytics and Computing for Industrial Engineers
Lab #3, Fall 2022: Machine Learning Practices (This is an individual lab)
Due 23:59 November 13th, 2022

Description: The dataset for this lab is tuition.csv, and it is available on UBlearns. The dataset has information about school tuition. The description of each variable is displayed in Table 1.

Requirements: Draft a report to document your R code and results (or partial results if there are too many) in each step. Note that your report will be graded on both technical content (70%) and report quality (30%). Submit two files to UBLearns: 1) your report, and 2) your R script.

Table 1
VARIABLE         DESCRIPTION                                                            DATA TYPE
tuition          College tuition ("out-of-state" rate)                                  continuous
pcttop25         Percent of new students from the top 25% of high school class          continuous
sf_ratio         Student to faculty ratio                                               continuous
fac_comp         Average faculty compensation                                           continuous
accrate          Fraction of applicants accepted for admission                          continuous
graduat          Percent of students who graduate                                       continuous
pct_phd          Percent of faculty with Ph.D.'s                                        continuous
fulltime         Percent of undergraduates who are full time students                   continuous
alumni           Percent of alumni who donate                                           continuous
num_enrl         Number of new students enrolled                                        continuous
public.private   Is the college a public or private institution? public=0, private=1    discrete

1. Basic plotting (20 pts)
Read the tuition.csv data into the R console as D0. Use D0 for the following questions.
a) Change the data type of “public.private” into a factor.
b) Use ggplot to draw a scatter plot, where the x-axis is “num_enrl” and the y-axis is “fac_comp”, and each data point is distinguished by “public.private”.
c) Based on b), add linear regression lines for public institutions and private institutions. Copy and paste the final plot to your report.

2. Feature selection (30 pts)
Use D0 for the following questions.
a) Build a full linear regression model named full_model, where “tuition” is the dependent variable and the remaining variables are independent variables. Report the summary of this full model in your report.
b) Based on the full model, perform forward feature selection to select the top 3 key features. The selection is based on the p-value of inclusion (i.e., penter). Report the results in the report.
c) Based on the full model, perform backward feature selection to select the top 3 key features. The selection is based on the p-value of exclusion (i.e., prem). Report the results in the report.

3. KNN (50 pts)
Use D0 to create a subset named D1, where D1 only includes three features: “accrate”, “graduat”, “public.private”. Then delete all missing values from D1 and overwrite D1. Hint: D1 <- na.omit(D1). Among the three features in D1, the independent variables are “accrate” and “graduat”, and the target variable is “public.private”. Use D1 for the following questions.
a) Use min-max normalization to normalize the two independent variables “accrate” and “graduat”. This step eliminates the effect of different value ranges on the model.
b) Set the seed number to 123456. Hint: set.seed(123456). This step ensures that you get the same model results every time you run the code.
c) Split D1 into a training set with 70% of the data and a test set with the remaining 30% of the data.
d) Build a KNN model using the training set, and test the model performance using the test set. Report the confusion matrix in your report.
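The steps listed under part 3 can be sketched roughly as follows. Since tuition.csv is not reproduced on this page, the data frame below is a synthetic stand-in with hypothetical values and only the three needed columns. The choice k = 5 is also an assumption, because the assignment does not fix k. The knn() function comes from the class package.

```r
library(class)  # provides knn()

# Hypothetical stand-in for tuition.csv (values are made up for illustration)
set.seed(1)
n <- 200
D0 <- data.frame(
  accrate = runif(n, 0.2, 1),
  graduat = runif(n, 20, 100),
  public.private = factor(rbinom(n, 1, 0.6))
)
D0$accrate[c(3, 17)] <- NA  # a couple of missing values to exercise na.omit()

# Subset to the three features and drop rows with missing values
D1 <- D0[, c("accrate", "graduat", "public.private")]
D1 <- na.omit(D1)

# a) Min-max normalization: rescale each predictor to [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
D1$accrate <- min_max(D1$accrate)
D1$graduat <- min_max(D1$graduat)

# b) Fix the seed so the split is reproducible
set.seed(123456)

# c) 70% training / 30% test split
train_idx <- sample(nrow(D1), size = floor(0.7 * nrow(D1)))
train <- D1[train_idx, ]
test <- D1[-train_idx, ]

# d) KNN on the two normalized predictors (k = 5 is an assumption)
predictors <- c("accrate", "graduat")
pred <- knn(train = train[, predictors],
            test = test[, predictors],
            cl = train$public.private,
            k = 5)

# Confusion matrix: rows are predictions, columns are true labels
conf_mat <- table(Predicted = pred, Actual = test$public.private)
print(conf_mat)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
```

Normalizing before the split follows the order given in the assignment. With the real data, D0 would come from reading tuition.csv instead of the synthetic block at the top.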
Answered 2 days after Nov 09, 2022


Mukesh answered on Nov 12 2022
44 Votes
1. Basic plotting (20 pts)
a) Change the data type of “public.private” into a factor.
D0$public.private <- as.factor(D0$public.private)
b) Use ggplot to draw a scatter plot, where the x-axis is “num_enrl” and y-axis is “fac_comp”, each data point is distinguished by “public.private”.
c) Based on b), add linear regression lines for public institutions and private institutions. Copy and paste the final plot to your report.
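No code or plot for parts b) and c) appears in the answer as captured here. A minimal sketch of what they ask for, assuming ggplot2 is installed and using a synthetic stand-in for D0 (hypothetical values, only the three columns the plot needs):

```r
library(ggplot2)

# Hypothetical stand-in for tuition.csv; in the lab, D0 would be read from that file
set.seed(1)
n <- 100
D0 <- data.frame(
  num_enrl = round(runif(n, 100, 5000)),
  public.private = rbinom(n, 1, 0.5)
)
D0$fac_comp <- 40000 + 3 * D0$num_enrl + rnorm(n, sd = 5000)

# a) Convert the 0/1 indicator to a factor so ggplot treats it as a group
D0$public.private <- as.factor(D0$public.private)

# b) Scatter plot coloured by institution type
# c) One least-squares line per group; the colour aesthetic set in ggplot()
#    is inherited by geom_smooth(), which therefore fits each group separately
p <- ggplot(D0, aes(x = num_enrl, y = fac_comp, color = public.private)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "num_enrl", y = "fac_comp", color = "public.private")
print(p)
```

Setting se = FALSE hides the confidence bands; that is a presentation choice, not something the assignment requires.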
2. Feature selection (30 pts)
Use D0 for the following questions.
a) Build a full linear regression model named full_model, where “tuition” is the dependent variable and the remaining variables are independent variables. Report the summary of this full model in your report.
Call:
lm(formula = tuition ~ ., data = D0)

Residuals:
     Min       1Q   Median       3Q      Max
-11617.5  -1363.4    -21.1   1423.1  10327.9

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     -3.524e+03  9.103e+02  -3.872 0.000117 ***
pcttop25         1.821e+01  5.538e+00   3.289 0.001052 **
sf_ratio        -2.365e+01  1.278e+01  -1.850 0.064763 .
fac_comp         1.614e-01  9.480e-03  17.025  < 2e-16 ***
accrate         -1.441e+02  6.604e+02  -0.218 0.827349
graduat          3.916e+00  2.488e+00   1.574 0.115899
pct_phd          1.710e+00  1.020e+00   1.677 0.094018 .
fulltime         1.823e+00  3.408e+00   0.535 0.592963
alumni           6.408e+01  7.780e+00   8.236 7.47e-16 ***
num_enrl        -5.190e-01  1.194e-01  -4.346 1.57e-05 ***
public.private1  4.502e+03  2.453e+02  18.350  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2243 on 780 degrees of freedom
  (465 observations deleted due to missingness)
Multiple R-squared:  0.7082,    Adjusted R-squared:  0.7045
F-statistic: 189.3 on 10 and 780 DF,  p-value: < 2.2e-16
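The call recorded in the output, lm(formula = tuition ~ ., data = D0), could be reproduced along the following lines. This is only a sketch: the data frame is a synthetic stand-in with hypothetical values and fewer columns than the real tuition.csv, and a few missing values are injected to show where a line such as "observations deleted due to missingness" comes from.

```r
# Hypothetical stand-in for tuition.csv (made-up values, subset of the columns)
set.seed(1)
n <- 100
D0 <- data.frame(
  fac_comp = rnorm(n, mean = 50000, sd = 10000),
  alumni = runif(n, 0, 60),
  num_enrl = runif(n, 100, 5000),
  public.private = factor(rbinom(n, 1, 0.5))
)
D0$tuition <- 1000 + 0.15 * D0$fac_comp + 70 * D0$alumni +
  4500 * (D0$public.private == "1") + rnorm(n, sd = 2000)
D0$alumni[1:5] <- NA  # inject missing values

# "tuition ~ ." regresses tuition on every other column of D0.
# Rows with any NA are dropped by lm()'s default na.action (na.omit).
full_model <- lm(tuition ~ ., data = D0)
print(summary(full_model))
```

Because public.private is a factor with levels 0 and 1, lm() reports a single dummy coefficient named public.private1, matching the naming seen in the summary above.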
b) Based on the full model, perform forward feature selection to select the top 3 key features. The selection is based on the p-value of inclusion (i.e., penter). Report the results in the report.
Variables Entered: + public.private + fac_comp + alumni

Final Model Output
------------------

Model Summary
-------------------------------------------------------------------
R                  0.837      RMSE              2319.531
R-Squared          0.701      Coef. Var           23.131
Adj. R-Squared     0.700      MSE            5380223.248
Pred R-Squared     0.698      MAE               1723.183
-------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error

ANOVA
--------------------------------------------------------------------------------
                    Sum of
                   Squares     DF       Mean Square         F      Sig.
--------------------------------------------------------------------------------
Regression 11744107363.813      3    3914702454.604    727.61    0.0000
Residual    5019748290.349    933       5380223.248
Total      16763855654.162    936
--------------------------------------------------------------------------------

Parameter Estimates
------------------------------------------------------------------------------------------------------
model                 Beta    Std. Error    Std. Beta         t      Sig        lower        upper
------------------------------------------------------------------------------------------------------
(Intercept)      -4057.780       376.732                -10.771    0.000    -4797.120    -3318.441
public.private1   5289.465       186.703        0.594    28.331    0.000     4923.059     5655.872
fac_comp             0.168         0.007        0.486    25.756    0.000        0.155        0.180
alumni              77.027         6.958        0.231    11.071    0.000       63.372       90.681
------------------------------------------------------------------------------------------------------
Selection Summary ...
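The code behind this output is not shown, and the answer is cut off before part c). The layout resembles what the olsrr package prints for stepwise selection, but that is a guess. As an illustration of the idea only, the sketch below implements p-value-based forward and backward selection in base R with anova() and drop1(). It runs on a synthetic stand-in for D0 (hypothetical values), stops at three features as the assignment asks, and uses penter = 0.05 as an assumed entry threshold.

```r
# Hypothetical stand-in for tuition.csv (made-up values, subset of the columns)
set.seed(1)
n <- 300
D0 <- data.frame(
  fac_comp = rnorm(n, mean = 50000, sd = 10000),
  alumni = runif(n, 0, 60),
  num_enrl = runif(n, 100, 5000),
  sf_ratio = runif(n, 5, 25),
  public.private = factor(rbinom(n, 1, 0.5))
)
D0$tuition <- 1000 + 0.15 * D0$fac_comp + 70 * D0$alumni +
  4500 * (D0$public.private == "1") + rnorm(n, sd = 2000)

# Nested-model F tests need every fit to use the same rows, so drop NAs first
D0 <- na.omit(D0)
candidates <- setdiff(names(D0), "tuition")

# Forward selection: start from the intercept-only model and repeatedly add
# the candidate with the smallest p-value for entry
penter <- 0.05  # assumed threshold; the assignment does not state a value
selected <- character(0)
current <- lm(tuition ~ 1, data = D0)
while (length(selected) < 3) {
  remaining <- setdiff(candidates, selected)
  p_vals <- sapply(remaining, function(v) {
    bigger <- lm(reformulate(c(selected, v), response = "tuition"), data = D0)
    anova(current, bigger)[2, "Pr(>F)"]  # F test for adding v
  })
  best <- names(which.min(p_vals))
  if (p_vals[best] > penter) break  # nothing left that qualifies for entry
  selected <- c(selected, best)
  current <- lm(reformulate(selected, response = "tuition"), data = D0)
}
print(selected)

# Backward selection: start from the full model and repeatedly drop the term
# with the largest p-value until three remain (a prem threshold would instead
# stop once every remaining p-value is at or below prem)
kept <- candidates
while (length(kept) > 3) {
  fit <- lm(reformulate(kept, response = "tuition"), data = D0)
  tab <- drop1(fit, test = "F")  # first row is "<none>", then one row per term
  p_vals <- setNames(tab[-1, "Pr(>F)"], rownames(tab)[-1])
  kept <- setdiff(kept, names(which.max(p_vals)))
}
print(kept)
```

On this synthetic data both procedures end with public.private, fac_comp and alumni, because those are the only columns used to generate tuition. On the real data the two directions need not agree.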