
The data files Balance_Scale.csv and abalone.csv are provided at D2L > Labs > Lab 3.


Lab 3_Naive Bayes1.docx

MIS 545 Lab 3: Naive Bayes Classifier
Predicting Mushroom Types

1 Overview

In this lab, we will apply Naive Bayes to a Mushroom dataset. You can find the dataset, called Mushroom.csv, under D2L > Labs > Lab 3. Save it in your working directory.

The Mushroom dataset contains 8123 observations belonging to 23 species of gilled mushrooms in the Agaricus and Lepiota family. Each mushroom has one of two edibility types: classes=e means the mushroom is edible, and classes=p means it is poisonous. Our goal is to tell edible mushrooms apart from poisonous ones by looking at their characteristics.

Note: The original dataset comes from the University of California Irvine (UCI) Machine Learning Repository. For more details, see https://archive.ics.uci.edu/ml/datasets/Mushroom.

2 Data Packages

We need the e1071 package for this lab. It is a well-established public package on CRAN.

# install package "e1071"
install.packages("e1071")
# to use the package, load it into the R session with library()
library(e1071)

3 Preprocessing

Save Mushroom.csv in your working directory. Unlike a clean dataset, Mushroom.csv marks null (missing) values with a question mark. We therefore pass na.strings to read.csv().

We also set stringsAsFactors = TRUE. Since R 4.0.0, read.csv() keeps text columns as character vectors by default, but the steps below (for example levels() and summary() on the class column) expect factors.

# read in mushroom.csv; the question mark represents a null value
mushroom <- read.csv('mushroom.csv', na.strings = '?', stringsAsFactors = TRUE)

Call summary() to see what the data look like in general. Look at the column stalk_root: it is the only column that contains NAs.

summary(mushroom)
# count the incomplete observations
nrow(mushroom[!complete.cases(mushroom),])
## [1] 2480

Recall that Naive Bayes is a probabilistic algorithm. It predicts the probability of each class for an observation by combining two ingredients:
- the prior probability of each class;
- the conditional probability of each predictor value given that class.

Missing values make these probabilities harder to estimate reliably. Here we simply keep the complete observations only.

# keep only observations that contain no NA (null) values
mushroom <- mushroom[complete.cases(mushroom),]

4 Training and Testing Sets

Next, we create training and test sets. We fit the model on the training set and use the test set to evaluate it. We use a 70/30 split (70% of the observations go to training).

# 70% of the observations will be used for training
sample_size <- floor(0.7 * nrow(mushroom))
# randomly select the row indices of the training observations
training_index <- sample(nrow(mushroom), size = sample_size, replace = FALSE)
train <- mushroom[training_index,]
test <- mushroom[-training_index,]

5 Fitting and Model Performance

The e1071 package provides a naive Bayes classifier, naiveBayes(). We already loaded it into the current session with library(e1071). Fit the model to the training data.

# the period after the tilde means all other variables in the dataset are used as predictors
mushroom.model <- naiveBayes(classes ~ ., data = train)
# print the model object to explore the prior and conditional probabilities for each variable
mushroom.model

After fitting, run the test data through the model to get the predicted class for each observation.

# predict() returns a factor holding the predicted type (e or p) of each test mushroom
mushroom.predict <- predict(mushroom.model, test, type = 'class')

Show the performance metrics of the model:

# put predicted and actual values together in a data frame called results
results <- data.frame(predicted = mushroom.predict, actual = test[,'classes'])
# table() builds a confusion matrix to evaluate the performance of our prediction
table(results)

In the confusion matrix, rows give the predicted type and columns give the actual type. In the run shown below, we correctly predicted 1067 mushrooms as edible and 580 as poisonous. However, we mistook 46 poisonous mushrooms for edible ones and 1 edible mushroom for a poisonous one. Your numbers will differ, because the split is random.

         actual
predicted    e    p
        e 1067   46
        p    1  580

Lab 3 Source.R

# set working directory; please change the code below
# according to your own situation
setwd("input your working directory")
# load e1071 and caret, installing them first if needed
if(!require(e1071)){
  install.packages("e1071")
  library(e1071)
}
if(!require(caret)){
  install.packages("caret")
  library(caret)
}
# read in dataset; stringsAsFactors = TRUE keeps text columns as factors in R >= 4.0.0
mushroom <- read.csv('./data/mushroom.csv', na.strings = '?', stringsAsFactors = TRUE)
# total number of mushrooms
nrow(mushroom)
### 8123
# number of mushrooms with NA values
nrow(mushroom[!complete.cases(mushroom),])
### 2480
### we can delete observations with missing values
mushroom <- mushroom[complete.cases(mushroom),]
# data should be clean now
summary(mushroom)

########################
### mushroom type    ###
########################
# types of mushrooms
levels(mushroom$classes)
# distribution of types
summary(mushroom$classes)

### take 70% as training set
sample_size <- floor(0.7 * nrow(mushroom))
### randomly decide which ones are training data
training_index <- sample(nrow(mushroom), size = sample_size, replace = FALSE)
train <- mushroom[training_index,]
test <- mushroom[-training_index,]

# take all explanatory variables to predict
mushroom.model <- naiveBayes(classes ~ ., data = train)
# the details of the model show the conditional probabilities
mushroom.model

# run the test data through the model;
# predict() returns a factor of predicted types for the test set
mushroom.predict <- predict(mushroom.model, test, type = 'class')

# put predicted and actual values together in a data frame called results
results <- data.frame(predicted = mushroom.predict, actual = test[,'classes'])
# table() builds a confusion matrix (rows = predicted, columns = actual)
# to evaluate the performance of our prediction
table(results)

mushroom.csv
classes,cap_shape,cap_surface,cap_color,if_bruises,odor,gill_attachment,gill_spacing,gill_size,gill_color,stalk_shape,stalk_root,stalk_surface_above_ring,stalk_surface_below_ring,stalk_color_above_ring,stalk_color_below_ring,veil_type,veil_color,ring_number,ring_type,spore_print_color,population,habitat
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
e,x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,p,w,o,p,k,n,g
e,b,s,w,t,a,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,n,m
e,b,y,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,s,m
p,x,y,w,t,p,f,c,n,p,e,e,s,s,w,w,p,w,o,p,k,v,g
e,b,s,y,t,a,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,s,m
e,x,y,y,t,l,f,c,b,g,e,c,s,s,w,w,p,w,o,p,n,n,g
e,x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,p,w,o,p,k,s,m
e,b,s,y,t,a,f,c,b,w,e,c,s,s,w,w,p,w,o,p,n,s,g
p,x,y,w,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,n,v,u
e,x,f,n,f,n,f,w,b,n,t,e,s,f,w,w,p,w,o,e,k,a,g
e,s,f,g,f,n,f,c,n,k,e,e,s,s,w,w,p,w,o,p,n,y,u
e,f,f,w,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
p,x,s,n,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,g
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,n,s,u
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,n,s,u
e,b,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,s,m
p,x,y,n,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,n,v,g
e,b,y,y,t,l,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,s,m
e,b,y,w,t,a,f,c,b,w,e,c,s,s,w,w,p,w,o,p,n,n,m
e,b,s,w,t,l,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,s,m
p,f,s,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,n,v,g
e,x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
e,x,y,w,t,l,f,c
...
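Section 3 of the lab describes naive Bayes as combining class priors with per-variable conditional probabilities. The base-R sketch below makes that computation explicit for categorical predictors. The posterior for each class is proportional to P(class) multiplied by P(x_j | class) for every predictor, and the scores are then normalised to sum to 1.

The helpers nb_fit() and nb_posterior() and the six-row toy data frame are hypothetical illustrations. The column names mimic mushroom.csv, but the values are made up. This is not the e1071 implementation; the optional smoothing term only mirrors the idea behind the laplace argument of naiveBayes().

```r
# Minimal categorical naive Bayes in base R (illustrative sketch, not e1071's code).
# Hypothetical helpers: nb_fit() estimates the probabilities, nb_posterior() applies them.

nb_fit <- function(data, response, laplace = 0) {
  y <- factor(data[[response]])
  predictors <- setdiff(names(data), response)
  tables <- lapply(predictors, function(v) {
    x <- factor(data[[v]])
    counts <- table(y, x)  # rows = classes, columns = predictor values
    # P(x = value | class), with optional Laplace smoothing
    (counts + laplace) / (rowSums(counts) + laplace * nlevels(x))
  })
  names(tables) <- predictors
  list(prior = prop.table(table(y)), tables = tables)
}

nb_posterior <- function(model, newrow) {
  # start from the class priors P(class)
  score <- as.numeric(model$prior)
  names(score) <- names(model$prior)
  for (v in names(model$tables)) {
    # multiply by P(x_v | class) for the observed value of predictor v;
    # a value never seen in training raises a "subscript out of bounds" error
    score <- score * model$tables[[v]][, as.character(newrow[[v]])]
  }
  score / sum(score)  # normalise so the posteriors sum to 1
}

# Hypothetical toy data using two mushroom.csv column names
toy <- data.frame(
  classes    = c("e", "e", "e", "e", "p", "p"),
  if_bruises = c("t", "t", "f", "t", "f", "f"),
  odor       = c("a", "n", "n", "l", "n", "p")
)
model <- nb_fit(toy, "classes")
post  <- nb_posterior(model, list(if_bruises = "f", odor = "n"))
post
# by hand: e = (4/6)(1/4)(2/4) = 1/12, p = (2/6)(2/2)(1/2) = 1/6
# expected posteriors: e = 1/3, p = 2/3
```

With laplace = 1, the same query returns 0.5 for each class. Smoothing pulls estimates based on very small counts toward uniform, and it keeps unseen value and class combinations from forcing a probability of exactly zero.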
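Section 5 of the lab reads the confusion matrix by eye. The sketch below turns the counts reported there into a few summary numbers. It uses the layout that table() prints when predicted is the first column: rows = predicted, columns = actual.

The helper name cm_summary() is hypothetical. Treating p (poisonous) as the "positive" class is an assumption, reflecting that calling a poisonous mushroom edible is the costly mistake. If caret is loaded, as in Lab 3 Source.R, caret::confusionMatrix() reports similar statistics directly from the predicted and actual factors.

```r
# Confusion matrix counts reported in the lab (rows = predicted, columns = actual)
cm <- matrix(c(1067, 1, 46, 580), nrow = 2,
             dimnames = list(predicted = c("e", "p"), actual = c("e", "p")))

# Hypothetical helper: summary numbers for a 2x2 confusion matrix whose rows and
# columns share the same class labels in the same order
cm_summary <- function(cm, positive = "p") {
  other <- setdiff(colnames(cm), positive)
  list(
    # share of all test mushrooms classified correctly
    accuracy = sum(diag(cm)) / sum(cm),
    # share of truly poisonous mushrooms that the model flags as poisonous
    recall_positive = cm[positive, positive] / sum(cm[, positive]),
    # share of "edible" predictions that really are edible
    precision_other = cm[other, other] / sum(cm[other, ]),
    # poisonous mushrooms predicted edible: the dangerous error here
    missed_positive = cm[other, positive]
  )
}

str(cm_summary(cm))
```

For the lab's counts, this gives:
- accuracy: 1647/1694, about 0.972;
- recall for poisonous mushrooms: 580/626, about 0.927.

The 46 poisonous mushrooms predicted as edible are exactly what a single accuracy figure hides.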
Jul 25, 2021 (answered same day)

Answer To: MIS 545 Lab 3: Naive Bayes Classifier Predicting Mushroom Types

Subhanbasha answered on Jul 26 2021
# Installing the required package
install.packages("e1071")
# Calling the package
library(e1071)

# Question 1
# Reading data into R
Balance_Scale <- read.csv('Balance_Scale.csv', na.strings = '?', stringsAsFactors = TRUE)  # keep text columns as factors (R >= 4.0.0 defaults to FALSE)
# Summary of the data
summary(Balance_Scale)
# Checking completion
nrow(Balance_Scale[!complete.cases(Balance_Scale),])
# We are taking 70% of original data as training data set
sample_size <- floor(0.7 * nrow(Balance_Scale))
#Randomly select index of observations for training
training_index <- sample(nrow(Balance_Scale), size = sample_size, replace = FALSE)
train <- Balance_Scale[training_index,]
test <-...
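Both the lab and this answer draw the 70/30 split with sample(). The training and test sets, and therefore the confusion matrix, change on every run. A minimal sketch of a reproducible split follows, under these assumptions:
- split_train_test() is a hypothetical helper;
- the seed 545 is arbitrary;
- mtcars (a data set shipped with R) stands in for Balance_Scale, so the example runs without the course files.

Calling set.seed() inside the function resets the global random-number state each time it runs.

```r
# Hypothetical helper: reproducible train/test split of any data frame
split_train_test <- function(df, train_frac = 0.7, seed = 545) {
  set.seed(seed)  # fix the random draw so the split can be repeated exactly
  n_train <- floor(train_frac * nrow(df))
  idx <- sample(nrow(df), size = n_train, replace = FALSE)
  list(train = df[idx, , drop = FALSE],   # sampled rows
       test  = df[-idx, , drop = FALSE])  # all remaining rows
}

# mtcars stands in for Balance_Scale here; it has 32 rows
parts <- split_train_test(mtcars)
nrow(parts$train)  # 22 = floor(0.7 * 32)
nrow(parts$test)   # 10
```

Calling the helper twice with the same seed returns identical splits within one R installation. Results can still differ across R versions, because R 3.6.0 changed the default sampling method used by sample().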