# Using the wbca data in the faraway package. Fit a binary logistic regression model with Class as the response and the other variables as predictors. Find the best logistic regression model from it...

• Using the wbca data in the faraway package, fit a binary logistic regression model with Class as the response and the other variables as predictors. Find the best logistic regression model, taking all the logistic regression assumptions into account, and validate it.

• Comment on the model deviance and tests for the coefficients.

• Attempt model selection using the step function and comment on any reduction that takes place. (Hint: this is similar to using step with lm.) Briefly discuss each step of this part. Finally, identify the best model.

Answered 3 days after Jan 21, 2023

## Answer To: Using the wbca data in the faraway package. Fit a binary logistic regression model with Class as the...

Mukesh answered on Jan 21 2023
#install.packages("faraway")
library(faraway)
## Warning: package 'faraway' was built under R version 4.2.2
data("wbca")
#Data summary
summary(wbca)
## Class Adhes BNucl Chrom
## Min. :0.0000 Min. : 1.000 Min. : 1.000 Min. : 1.000
## 1st Qu.:0.0000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 2.000
## Median :1.0000 Median : 1.000 Median : 1.000 Median : 3.000
## Mean :0.6505 Mean : 2.816 Mean : 3.542 Mean : 3.433
## 3rd Qu.:1.0000 3rd Qu.: 4.000 3rd Qu.: 6.000 3rd Qu.: 5.000
## Max. :1.0000 Max. :10.000 Max. :10.000 Max. :10.000
## Epith Mitos NNucl Thick
## Min. : 1.000 Min. : 1.000 Min. : 1.000 Min. : 1.000
## 1st Qu.: 2.000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 2.000
## Median : 2.000 Median : 1.000 Median : 1.000 Median : 4.000
## Mean : 3.231 Mean : 1.604 Mean : 2.859 Mean : 4.436
## 3rd Qu.: 4.000 3rd Qu.: 1.000 3rd Qu.: 4.000 3rd Qu.: 6.000
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000
## UShap USize
## Min. : 1.000 Min. : 1.00
## 1st Qu.: 1.000 1st Qu.: 1.00
## Median : 1.000 Median : 1.00
## Mean : 3.204 Mean : 3.14
## 3rd Qu.: 5.000 3rd Qu.: 5.00
## Max. :10.000 Max. :10.00
colnames(wbca)
## [1] "Class" "Adhes" "BNucl" "Chrom" "Epith" "Mitos" "NNucl" "Thick" "UShap"
## [10] "USize"
#Converting class into factor
wbca$Class=as.factor(wbca$Class)
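It helps to check the factor coding before reading any coefficients. With a two-level factor response, `glm(..., family = "binomial")` treats the first level as "failure" and models the probability of the second level, so here the fitted probabilities refer to `Class == 1`. A minimal sketch using only base R and the `wbca` data; the name `class_factor` is introduced here for illustration and is not part of the original answer.

```r
library(faraway)
data("wbca")

# Work on a copy of the response so the data frame itself is not modified
class_factor <- as.factor(wbca$Class)

# The first level is the reference ("failure") level in a binomial glm
levels(class_factor)

# Counts per level; the summary output in this answer reports 238 zeros and 443 ones
table(class_factor)
```

According to the faraway documentation, 0 codes a malignant tumour and 1 a benign one, so positive coefficients would point towards benign.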
summary(wbca)
## Class Adhes BNucl Chrom Epith
## 0:238 Min. : 1.000 Min. : 1.000 Min. : 1.000 Min. : 1.000
## 1:443 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 2.000 1st Qu.: 2.000
## Median : 1.000 Median : 1.000 Median : 3.000 Median : 2.000
## Mean : 2.816 Mean : 3.542 Mean : 3.433 Mean : 3.231
## 3rd Qu.: 4.000 3rd Qu.: 6.000 3rd Qu.: 5.000 3rd Qu.: 4.000
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000
## Mitos NNucl Thick UShap
## Min. : 1.000 Min. : 1.000 Min. : 1.000 Min. : 1.000
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 2.000 1st Qu.: 1.000
## Median : 1.000 Median : 1.000 Median : 4.000 Median : 1.000
## Mean : 1.604 Mean : 2.859 Mean : 4.436 Mean : 3.204
## 3rd Qu.: 1.000 3rd Qu.: 4.000 3rd Qu.: 6.000 3rd Qu.: 5.000
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000
## USize
## Min. : 1.00
## 1st Qu.: 1.00
## Median : 1.00
## Mean : 3.14
## 3rd Qu.: 5.00
## Max. :10.00
View(wbca)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.2.2
library(GGally)
## Warning: package 'GGally' was built under R version 4.2.2
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
##
## Attaching package: 'GGally'
## The following object is masked from 'package:faraway':
##
## happy
ggpairs(wbca, aes(colour = Class, alpha = 0.4))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#install.packages("scatterPlotMatrix")
library(scatterPlotMatrix)
## Warning: package 'scatterPlotMatrix' was built under R version 4.2.2
scatterPlotMatrix(wbca, zAxisDim = "Class")
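The pairwise plots are mainly useful for spotting strongly related predictors, which matters because collinearity inflates the standard errors of logistic regression coefficients. A numerical complement, sketched here with base R only, is the correlation matrix of the nine predictors. The object names (`predictors`, `cor_mat`, `cor_pairs`) are illustrative, not taken from the original answer.

```r
library(faraway)
data("wbca")

# Predictors only (drop the response column)
predictors <- wbca[, setdiff(names(wbca), "Class")]

# Pairwise Pearson correlations, rounded for readability
cor_mat <- round(cor(predictors), 2)
cor_mat

# Collect each unordered pair once (upper triangle of the matrix)
pairs_idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
cor_pairs <- data.frame(
  var1 = rownames(cor_mat)[pairs_idx[, "row"]],
  var2 = colnames(cor_mat)[pairs_idx[, "col"]],
  r    = cor_mat[pairs_idx]
)

# Five most strongly correlated pairs, by absolute correlation
cor_pairs[order(-abs(cor_pairs$r)), ][1:5, ]
```

Pairs with a high absolute correlation are the natural candidates to watch when `step` later drops variables.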
#---------------------------------------------------------------------------
#Checking distribution of dependent variable
ggplot(wbca, aes(factor(Class),
fill = factor(Class))) + geom_bar()
[Figures: scatterplot matrices of the wbca variables, with panels labelled Class, Adhes, BNucl, Chrom, Epith, Mitos, NNucl and Thick and axis ticks running from 2 to 10; bar chart of Class counts]
summary(wbca$Class)
## 0 1
## 238 443
Step 2: Train and test split
set.seed(1)
#Assign each row to the training set with probability 0.7 (roughly 70% train, 30% test)
in_train <- sample(c(TRUE, FALSE), nrow(wbca), replace=TRUE, prob=c(0.7,0.3))
train <- wbca[in_train, ]
test <- wbca[!in_train, ]
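Because each row is assigned independently with probability 0.7, the realised training share is only approximately 70%, and the class balance can differ slightly between the two parts. The following sketch repeats the split with the same seed and reports both; the logical index is called `in_train` in this sketch.

```r
library(faraway)
data("wbca")

set.seed(1)
# Each row goes to the training set independently with probability 0.7
in_train <- sample(c(TRUE, FALSE), nrow(wbca), replace = TRUE, prob = c(0.7, 0.3))
train <- wbca[in_train, ]
test  <- wbca[!in_train, ]

# Realised split sizes and training share
c(train = nrow(train), test = nrow(test), share = mean(in_train))

# Class proportions in each part; large differences would suggest a stratified split
prop.table(table(train$Class))
prop.table(table(test$Class))
```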
Step 3: Fit the logistic regression model
full_model = glm(Class ~ ., data = train, family = "binomial")
summary(full_model)
##
## Call:
## glm(formula = Class ~ ., family = "binomial", data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.57842 -0.01493 0.06474 0.11186 2.67502
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 10.41091 1.52178 6.841 7.85e-12 ***
## Adhes -0.12851 0.18290 -0.703 0.48230
## BNucl ...
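The output above is cut off, so what follows is only a sketch of how the remaining parts of the question (deviance, step-based selection, validation) might be approached. It is not the original author's continuation.

The z values in the summary are Wald tests for single coefficients. The overall fit is usually judged with a likelihood ratio test that compares the null deviance with the residual deviance. For ungrouped binary data the residual deviance alone is not a reliable goodness-of-fit statistic, so only differences in deviance are used here. The object names `lr_stat`, `reduced_model`, `pred_prob` and `pred_class` are illustrative, and the 0.5 cut-off is an assumption.

```r
library(faraway)
data("wbca")

# Same split as in Step 2
set.seed(1)
in_train <- sample(c(TRUE, FALSE), nrow(wbca), replace = TRUE, prob = c(0.7, 0.3))
train <- wbca[in_train, ]
test  <- wbca[!in_train, ]

full_model <- glm(Class ~ ., data = train, family = "binomial")

# Likelihood ratio test of the full model against the intercept-only model
lr_stat <- full_model$null.deviance - full_model$deviance
lr_df   <- full_model$df.null - full_model$df.residual
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p_value = lr_p)

# Backward elimination by AIC, starting from the full model
# (set trace = 1 to print every elimination step for discussion)
reduced_model <- step(full_model, direction = "backward", trace = 0)
formula(reduced_model)

# Does dropping the removed terms cost a significant amount of deviance?
anova(reduced_model, full_model, test = "Chisq")

# Hold-out validation: confusion table and accuracy at a 0.5 cut-off
pred_prob  <- predict(reduced_model, newdata = test, type = "response")
pred_class <- as.integer(pred_prob > 0.5)
observed   <- as.integer(as.character(test$Class))
table(observed = observed, predicted = pred_class)
mean(pred_class == observed)
```

With `trace = 1`, each printed table shows the AIC after removing each remaining term. `step` drops the term whose removal lowers AIC the most and stops when no removal improves it.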