Assessment 1: Naive Bayes classifier and Discriminant Analysis

Please see attached PDF outlining how to complete assignment. Excel (csv) files contain required data. Please use Markdown.


Assessment 1: Naive Bayes classifier and Discriminant Analysis
Issued: Sunday of Week 1
Due: 11:59 PM AEST Sunday of Week 3
Weight: 30 %
Maximum score: 50 Marks
Overview
During this assessment you will insert R code and written discussions with justifications to this template file. This assessment implements and explores techniques mainly covered in Week 1 and Week 2. The assessment is segmented into three tasks: (1) Comparison of classifiers; (2) Application of a classifier; and (3) Implementation of classifiers.
The purpose of the assignment is to enable you to:
1. Code and comment R scripts
2. Implement sub-setting, Bayes classifiers and Discriminant Analysis in RStudio
3. Compare classification algorithms
4. Visually present predictions of classifiers in RStudio
Learning outcomes
Related subject learning outcomes:
1. Evaluate, synthesise and apply classic supervised data mining methods for pattern classification.
2. Effectively integrate, execute and apply the studied concepts, algorithms, and techniques to real datasets using the computer language R and the software environment RStudio.
3. Communicate data concepts and methodologies of data science.
Background
Real-world application of classifiers may require that the predictors used for classification be physically measured and, hence, the inclusion of unnecessary predictors may incur additional costs associated with sensors, instruments and computing. It should be noted that some variables may even require human intervention and/or expensive laboratory analyses in order to be measured.
It is important that analysts try to use as few predictors as possible, that is, the smallest set of predictors that are relevant for the classification task in hand and yet sufficient to provide satisfactory classification performance.
Selecting predictors is an important task called feature selection in data mining.
Assessment submission:
Your submission should include:
· A PDF/html file that clearly shows the assignment question, the associated answers, any relevant R outputs, analyses and discussions.
· The assignment should not exceed 8 A4 pages. Appendices do not form part of the page limit. The assignment must be presented in 12-point font on A4 pages using single line spacing.
· The task cover sheet.
Upload all submission files in one go. You can upload the assessment up to 3 times; however, only the last submission is graded.
A word on plagiarism
Plagiarism is the act of using another's words, works or ideas from any source as one's own. Plagiarism has no place in a University. Student work containing plagiarised material will be subject to formal university processes.
Glenn Fulford
30 marks
Assessment Task 1: Comparison of classifiers
In this task compare the performance of the supervised learning algorithms Linear Discriminant Analysis, Quadratic Discriminant Analysis and the Naïve Bayes Classifier using a publicly available Blood Pressure Data. The data to be used for this task is provided in the HBblood.csv file in the Assessment 1 folder.
The HBblood.csv dataset contains values of the percent HbA1c (a measure of the amount of glucose and haemoglobin joined together in blood) and systolic blood pressure (SBP) (in mmHg) for 1,200 clinically healthy female patients within the ages 60 to 70 years. Additionally, the ethnicity, Ethno, for each patient was recorded and categorized into three groups, A, B or C, for analysis.
1. Discuss and justify which of the supervised learning algorithms (i.e. Linear Discriminant Analysis, Quadratic Discriminant Analysis and the Naïve Bayes Classifier) you would choose for predicting the response Ethno using HbA1c and SBP as the feature variables. Provide any plots/images needed to support your discussion.
Hint: Base your answer on the empirical properties of the data.
Assessment Task 2: Application of a classifier
Randomly split the dataset into a training subset and a test subset containing 80% and 20% of the data. Provide your R-code.
a classifi to classify into Implement classifier Question 2 the training data subset Question 1.
Interpret and discuss the relationships between the predictors and response variables.
https://archive.ics.uci.edu/ml/datasets/Mushroom
Marking Criteria and Rubric: MA5810 Assessment 1
Criterion: R code (20%)
High Distinction: Code submitted. Code works correctly, meets the specifications, produces the correct results and displays them correctly. Code is exceptionally well organised and very easy to follow. Code is always very well commented, so the purpose of each block of code and the question part it corresponds to are readily understood. Variable names give the purpose of the variable.
Distinction: Code submitted. Code works correctly, meets the specifications, and produces correct results but may not display all of them correctly. Code is clean, understandable and well-organised, with just some minor errors. Code is well commented so that there is very little ambiguity about the purpose of the code. One or two places could benefit from comments, or the code is overly commented. Variable names clearly describe the purpose of the variable.
Credit: Code submitted. Code mostly works correctly, but functions incorrectly on some inputs. Minor details of the specification are violated. Code is fairly easy to read, although it contains at least one major issue that detracts from clarity. The comments leave the purpose of some code blocks ambiguous. One or two places could benefit from comments, or the code is overly commented. Variable names do not describe the purpose of the variable.
Pass: Code only provided in answer document but looks correct. Code often exhibits incorrect behaviour. Significant details of specification are violated. Code contains more than one major issue that makes it difficult to read. The code is readable only by someone who already knows what it is supposed to be doing. Comments not sufficient to see what the code is doing. Significant lack of comments makes it difficult to understand code.
Fail: Code not submitted. Code not provided in answer document. Code produces incorrect results, does not compile, or significant errors occur. Code is poorly organised and very difficult to read. Code has no comments.
Criterion: Methodology (40%)
High Distinction: The methodology implemented is expertly documented and justified. The methodology implemented reflects a sophisticated and nuanced understanding of relevant concepts. All assumptions validated and communicated concisely.
Distinction: The methodology implemented is clearly documented and justified. The methodology reflects a highly developed understanding of relevant concepts. Most assumptions validated and communicated clearly.
Credit: The methodology implemented is described, and may contain minor errors, or lack clearly stated justification. The methodology is mostly appropriate, but some elements could be improved. Some assumptions validated.
Pass: The methodology implemented is stated, but too general, and/or not justified. Some elements are satisfactory, but most elements need improving. Some model assumptions stated.
Fail: The methodology implemented is not clearly stated or justified. Very few elements of the methodology are appropriate.
Criterion: Interpretation (40%)
High Distinction: Interpretation is comprehensive and persuasive. Interpretation is accurate, comprehensive, and highly detailed. Few inferences or unjustified positions presented.
Distinction: Interpretation is accurate, and for the most part persuasive. Some inferences or unjustified positions presented.
Credit: Interpretation is adequate in most places, and/or more detail is required in places.
Pass: Interpretation is satisfactory in places but lacks sufficient and accurate interpretation. Many inferences or unjustified positions presented.
Fail: Interpretation is lacking in multiple components. Major points may be stated but are often unsubstantiated.
Answered 14 days after Nov 06, 2022

Answer To: Assessment 1: Naive Bayes classifier and Discriminant Analysis

Mukesh answered on Nov 14 2022
Assessment 1: Naive Bayes classifier and Discriminant Analysis
Issued: Sunday of Week 1
(30 marks)
Due: 11:59 PM AEST Sunday of Week 3
Weight: 30 %
Maximum score: 50 Marks
Overview
During this assessment you will insert R code and written discussions with justifications to this template file. This assessment implements and explores techniques mainly covered in Week 1 and Week 2. The assessment is segmented into three tasks: (1) Comparison of classifiers; (2) Application of a classifier; and (3) Implementation of classifiers.
The purpose of the assignment is to enable you to:
1. Code and comment R scripts
2. Implement sub-setting, Bayes classifiers and Discriminant Analysis in RStudio
3. Compare classification algorithms
4. Visually present predictions of classifiers in RStudio
Learning outcomes
Related subject learning outcomes:
1. Evaluate, synthesise and apply classic supervised data mining methods for pattern classification.
2. Effectively integrate, execute and apply the studied concepts, algorithms, and techniques to real datasets using the computer language R and the software environment RStudio.
3. Communicate data concepts and methodologies of data science.
Background
Real-world application of classifiers may require that the predictors used for classification be physically measured and, hence, the inclusion of unnecessary predictors may incur additional costs associated with sensors, instruments and computing. It should be noted that some variables may even require human intervention and/or expensive laboratory analyses in order to be measured.
It is important that analysts try to use as few predictors as possible, that is, the smallest set of predictors that are relevant for the classification task at hand and yet sufficient to provide satisfactory classification performance. Selecting predictors is an important task in data mining, called feature selection.
Assessment submission:
Your submission should include:
· A PDF/html output file that clearly shows the assignment question, the associated answers, any relevant R outputs, analyses and discussions.
· The R-script (code) file as evidence.
· The assignment should not exceed 8 A4 pages. Appendices do not form part of the page limit. The assignment must be presented in 12-point font on A4 pages using single line spacing.
· The task cover sheet.
· Note that RMarkdown is not required for this assessment but is highly recommended.
Upload all submission files in one go. You can upload the assessment up to 3 times; however, only the last submission is graded.
A word on plagiarism
Plagiarism is the act of using another's words, works or ideas from any source as one's own. Plagiarism has no place in a University. Student work containing plagiarised material will be subject to formal university processes in line with the procedure described in the subject outline.
Assessment Task 1: Comparison of classifiers
Marks − 10
In this task, compare the performance of the supervised learning algorithms Linear Discriminant Analysis, Quadratic Discriminant Analysis and the Naïve Bayes Classifier using a publicly available blood pressure dataset. The data to be used for this task is provided in the HBblood.csv file in the Assessment 1 folder.
The HBblood.csv dataset contains values of the percent HbA1c (a measure of the amount of glucose and haemoglobin joined together in blood) and systolic blood pressure (SBP) (in mmHg) for 1,200 clinically healthy female patients aged 60 to 70 years. Additionally, the ethnicity, Ethno, of each patient was recorded and categorized into three groups, A, B or C, for analysis.
1. Discuss and justify which of the supervised learning algorithms (i.e. Linear Discriminant Analysis, Quadratic Discriminant Analysis and the Naïve Bayes Classifier) you would choose for predicting the response Ethno using HbA1c and SBP as the feature variables. Provide any plots/images needed to support your discussion.
Hint: Base your answer on the empirical statistical properties of the data in relation to model assumptions.
Solution:
This task compares the performance of the supervised learning algorithms Linear Discriminant Analysis, Quadratic Discriminant Analysis and the Naïve Bayes Classifier on the blood pressure data in HBblood.csv.
Data pre-processing
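# Read the data (select HBblood.csv when prompted) and store the response as a factor
data <- read.csv(file.choose(), header = TRUE)
data$Ethno <- as.factor(data$Ethno)
str(data)
# Loading packages
library(e1071)    # naiveBayes()
library(caTools)  # sample.split()
library(caret)    # confusionMatrix()
# Splitting data into train and test data.
# sample.split() expects the label vector, not the whole data frame;
# passing data$Ethno gives one TRUE/FALSE per row, stratified by class.
set.seed(123)  # arbitrary seed so the split is reproducible
split <- sample.split(data$Ethno, SplitRatio = 0.7)
train_cl <- subset(data, split == TRUE)
test_cl <- subset(data, split == FALSE)
# Feature scaling: the test set reuses the training means and standard deviations
train_scale <- scale(train_cl[, 2:3])
test_scale <- scale(test_cl[, 2:3],
                    center = attr(train_scale, "scaled:center"),
                    scale = attr(train_scale, "scaled:scale"))
train_y <- train_cl$Ethno
test_y <- test_cl$Ethno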
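Before fitting anything, the task's hint can be followed by checking the data against each model's assumptions: LDA assumes a common covariance matrix across classes, QDA allows one covariance matrix per class, and Naïve Bayes assumes the predictors are independent within each class. The sketch below is one possible way to do this, not part of the original answer. It runs on simulated stand-in data (the means and covariances are made up for illustration and are not taken from HBblood.csv), it assumes the columns are named HbA1c, SBP and Ethno as in the task description, and check_assumptions is a hypothetical helper.

```r
library(MASS)  # mvrnorm(), used only to simulate stand-in data

# Simulated stand-in for HBblood.csv: every number below is made up.
set.seed(1)
simulate_group <- function(n, mu, sigma, label) {
  x <- mvrnorm(n, mu = mu, Sigma = sigma)
  data.frame(HbA1c = x[, 1], SBP = x[, 2], Ethno = label)
}
sim <- rbind(
  simulate_group(400, c(5.5, 120), matrix(c(0.20, 0.0, 0.0, 100), 2), "A"),
  simulate_group(400, c(6.0, 130), matrix(c(0.20, 1.5, 1.5, 100), 2), "B"),
  simulate_group(400, c(6.5, 140), matrix(c(0.60, 0.0, 0.0, 30), 2), "C")
)
sim$Ethno <- factor(sim$Ethno)

# Hypothetical helper: per-class summaries that relate to the model assumptions.
check_assumptions <- function(df, features, response) {
  groups <- split(df[, features], df[[response]])
  list(
    # LDA assumes these matrices are roughly equal; QDA does not.
    covariances = lapply(groups, cov),
    # Naive Bayes assumes within-class correlation close to zero.
    correlations = sapply(groups, function(g) cor(g[[1]], g[[2]])),
    # Shapiro-Wilk p-values per feature and class (Gaussian assumption).
    shapiro_p = sapply(groups, function(g) {
      sapply(g, function(v) shapiro.test(v)$p.value)
    })
  )
}

checks <- check_assumptions(sim, c("HbA1c", "SBP"), "Ethno")
print(checks$covariances)
print(round(checks$correlations, 3))
print(round(checks$shapiro_p, 3))
```

On the real data, the same helper could be called on the training subset. Clearly different covariance matrices would point towards QDA, similar ones towards LDA, and near-zero within-class correlations would make Naïve Bayes a reasonable candidate.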
First, we will perform linear discriminant analysis and check model performance.
#--------------------------------------------------------------------
#Linear discriminant analysis - LDA
#--------------------------------------------------------------------
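library(MASS)
# Fit LDA with Ethno as the response and all remaining columns as predictors
model <- lda(Ethno ~ ., data = train_cl)
# Print the prior probabilities, group means and discriminant coefficients
model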
#--------------------------------------------------------------------
Call:
lda(Ethno ~ ., data = train_cl)
Prior probabilities of groups:
...
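The comparison the task asks for (LDA against QDA and Naïve Bayes on held-out data) can be sketched as follows. This is a minimal illustration rather than the answer's actual code: it uses simulated stand-in data with made-up means and covariances instead of HBblood.csv, assumes the column names HbA1c, SBP and Ethno from the task description, and predict_class is a hypothetical helper introduced only to give the three models a common prediction interface.

```r
library(MASS)   # lda(), qda(), mvrnorm()
library(e1071)  # naiveBayes()

# Simulated stand-in for HBblood.csv: every number below is made up.
set.seed(2)
make_group <- function(n, mu, sigma, label) {
  x <- mvrnorm(n, mu = mu, Sigma = sigma)
  data.frame(HbA1c = x[, 1], SBP = x[, 2], Ethno = label)
}
sim <- rbind(
  make_group(400, c(5.5, 120), matrix(c(0.20, 0.0, 0.0, 100), 2), "A"),
  make_group(400, c(6.0, 130), matrix(c(0.20, 1.5, 1.5, 100), 2), "B"),
  make_group(400, c(6.5, 140), matrix(c(0.60, 0.0, 0.0, 30), 2), "C")
)
sim$Ethno <- factor(sim$Ethno)

# 70/30 split, matching the SplitRatio used above
train_idx <- sample(nrow(sim), size = round(0.7 * nrow(sim)))
train <- sim[train_idx, ]
test <- sim[-train_idx, ]

# Fit the three classifiers on the same training data
fits <- list(
  LDA = lda(Ethno ~ HbA1c + SBP, data = train),
  QDA = qda(Ethno ~ HbA1c + SBP, data = train),
  NB = naiveBayes(Ethno ~ HbA1c + SBP, data = train)
)

# predict() returns a list with $class for lda/qda and a factor for naiveBayes
predict_class <- function(fit, newdata) {
  pred <- predict(fit, newdata)
  if (is.list(pred)) pred$class else pred
}

# Proportion of correctly classified test rows for each model
accuracy <- sapply(fits, function(fit) {
  mean(predict_class(fit, test) == test$Ethno)
})
print(round(accuracy, 3))
```

With the real data, train and test would be replaced by train_cl and test_cl. A per-class breakdown could be obtained with caret's confusionMatrix() on the predicted and true labels.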