Assignment 4: Prediction

I need help to do the assignment.


Assignment 4: Prediction
Due Tuesday by 11:59pm | Points 80 | Submitting a file upload

I. Do the following experiments:

1. Run Weka's Naive Bayes on the original loan dataset (Loan_original.arff) and on the 2-bin and 3-bin discretized data, using the "Use training set" test option. Examine the error rate and identify the wrongly classified instances. To do the latter, right-click on the current line in the result list window and select "Visualize classifier errors". The wrongly classified instances show up in the plot as small squares; click on an instance to get its information. Answer the following questions:
a) How is the model represented? What do the counts mean and why are they incremented by 1 (i.e., actual value count + 1)?
b) What actually happens when you test the classifier on the training set?
c) How do errors occur? What could be the reasons?
d) How does the error rate change over the three different data sets (original, 2-bin, 3-bin)? Any guess why?

2. Run Weka's IBk algorithm on the original loan data (no discretization) with the "Use training set" test option and examine the evaluation results (correctly/incorrectly classified instances). Vary KNN (the number of neighbors) with and without distance weighting. Try, for example, KNN = 1, 3, 5, 20 without distance weighting and with weight = 1/distance. Compare the results and find explanations. Answer the following questions by looking at what the algorithm does for each instance from the test set (which in this case is also the training set). Find conceptual-level explanations; no need to go into computing distances:
e) What actually happens when you test the classifier on the training set?
f) How do errors occur? What could be the reasons?
g) How does the error rate change with the KNN parameter in IBk?

3. Decide on the application of a new customer. Prepare a test set using the information of the new customer described in Assignment 3: a customer applying for a 30-month loan with 80,000 yen monthly pay to buy a car, with the following data: male, employed, 22 years old, not married, does not live in a problematic area, has worked 1 year for his last employer and has 500,000 yen in a bank. For more information use Handout Week 7 (https://wssu.instructure.com/courses/19153/files/2875098/download?download_frd=1), "Using Weka 3 for classification and prediction". Run Naive Bayes and IBk with different parameters for KNN and distance weighting, all with the "Supplied test set" test option. Compare the prediction results obtained with the different algorithms. Decide on the loan application of the new customer by using the outputs from the prediction algorithms.

II. Write a report on the prediction experiments described above. Include the following information (DO NOT include data sets or classifier outputs):
The original 7 questions (4 about Bayes and 3 about IBk) with short answers to EACH ONE.
ONE Naive Bayes model (any version of the loan data set) with explanations of its parameters (the answer to #1(a) may be included here).
Results from predicting the new customer's classification (Experiments with Prediction, #3) with a short comment.
Handout Week 7 — slides for Chapter 4, "Algorithms: the basic methods", from Data Mining: Practical Machine Learning Tools and Techniques by I. H. Witten, E. Frank, M. A. Hall and C. J. Pal:

Algorithms: the basic methods
• Simple probabilistic modeling
• Linear models
• Instance-based learning

Can combine probabilities using Bayes's rule
• Famous rule from probability theory due to Thomas Bayes (born 1702 in London, England; died 1761 in Tunbridge Wells, Kent, England)
• Probability of an event H given observed evidence E: P(H | E) = P(E | H) P(H) / P(E)
• A priori probability of H, P(H): probability of the event before evidence is seen
• A posteriori probability of H, P(H | E): probability of the event after evidence is seen

Naïve Bayes for classification
• Classification learning: what is the probability of the class given an instance?
• Evidence E = the instance's non-class attribute values; event H = the class value of the instance
• Naïve assumption: the evidence splits into parts (i.e., attributes) that are conditionally independent
• This means that, given n attributes, we can write Bayes's rule using a product of per-attribute probabilities:
  P(H | E) = P(E1 | H) P(E2 | H) … P(En | H) P(H) / P(E)

Weather data example
• Evidence E: Outlook = Sunny, Temperature = Cool, Humidity = High, Windy = True, Play = ?
• Probability of class "yes":
  P(yes | E) = P(Outlook = Sunny | yes) × P(Temperature = Cool | yes) × P(Humidity = High | yes) × P(Windy = True | yes) × P(yes) / P(E)
             = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / P(E)

The "zero-frequency problem"
• What if an attribute value does not occur with every class value (e.g., "Humidity = High" for class "yes")?
• The conditional probability will be zero: P(Humidity = High | yes) = 0
• The a posteriori probability will also be zero: P(yes | E) = 0 (regardless of how likely the other values are!)
• Remedy: add 1 to the count for every attribute value–class combination (Laplace estimator)
• Result: probabilities will never be zero
• Additional advantage: stabilizes probability estimates computed from small samples of data

Modified probability estimates
• In some cases adding a constant different from 1 might be more appropriate
• Example: attribute Outlook for class yes, adding a constant μ split over the three values:
  Sunny: (2 + μ/3) / (9 + μ)   Overcast: (4 + μ/3) / (9 + μ)   Rainy: (3 + μ/3) / (9 + μ)
• The weights don't need to be equal: they can be μp1, μp2, μp3, but p1 + p2 + p3 must sum to 1

Missing values
• Training: the instance is not included in the frequency count for that attribute value–class combination
• Classification: the attribute is omitted from the calculation
• Example: Outlook = ?, Temperature = Cool, Humidity = High, Windy = True, Play = ?
  Likelihood of "yes" = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
  Likelihood of "no"  = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
  P("yes") = 0.0238 / (0.0238 + 0.0343) = 41%
  P("no")  = 0.0343 / (0.0238 + 0.0343) = 59%

Numeric attributes
• Usual assumption: attributes have a normal or Gaussian probability distribution (given the class)
• The probability density function for the normal distribution is defined by two parameters: the sample mean μ and the standard deviation σ
• The density function is f(x) = 1 / (√(2π) σ) × e^(−(x − μ)² / (2σ²))

Statistics for weather data
  Outlook:     Sunny 2/9 (yes), 3/5 (no); Overcast 4/9, 0/5; Rainy 3/9, 2/5
  Temperature: yes: 64, 68, 69, 70, 72, … (mean 73, std. dev. 6.2); no: 65, 71, 72, 80, 85, … (mean 75, std. dev. 7.9)
  Humidity:    yes: 65, 70, 70, 75, 80, … (mean 79, std. dev. 10.2); no: 70, 85, 90, 91, 95, … (mean 86, std. dev. 9.7)
  Windy:       False 6/9 (yes), 2/5 (no); True 3/9, 3/5
  Play:        yes 9/14, no 5/14
• Example density value: f(Temperature = 66 | yes) = 1 / (√(2π) × 6.2) × e^(−(66 − 73)² / (2 × 6.2²)) = 0.0340

Classifying a new day
• A new day: Outlook = Sunny, Temperature = 66, Humidity = 90, Windy = true, Play = ?
  Likelihood of "yes" = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
  Likelihood of "no"  = 3/5 × 0.0221 × 0.0381 × 3/5 × 5/14 = 0.000108
  P("yes") = 0.000036 / (0.000036 + 0.000108) = 25%
  P("no")  = 0.000108 / (0.000036 + 0.000108) = 75%
• Missing values during training are not included in the calculation of the mean and standard deviation

Probability densities
• Probability densities f(x) can be greater than 1; hence, they are not probabilities
• However, they must integrate to 1: the area under the probability density curve must be 1
• The approximate relationship between probability and probability density, assuming ε is sufficiently small, is P(x − ε/2 ≤ X ≤ x + ε/2) ≈ ε f(x)
• When computing likelihoods, we can treat densities just like probabilities

Multinomial naïve Bayes I
• Version of naïve Bayes used for document classification with the bag-of-words model
• n1, n2, …, nk: number of times word i occurs in the document
• P1, P2, …, Pk: probability of obtaining word i when sampling from documents in class H
• Probability of observing a particular document E given class H (based on the multinomial distribution), with N = n1 + n2 + … + nk:
  P(E | H) = N! × (P1^n1 / n1!) × (P2^n2 / n2!) × … × (Pk^nk / nk!)
• Note that this expression ignores the probability of generating a document of the right length
• This probability is assumed to be constant for all classes

Multinomial naïve Bayes II
• Suppose the dictionary has two words, yellow and blue
• Suppose P(yellow | H) = 75% and P(blue | H) = 25%
• Suppose E is the document "blue yellow blue"
• Probability of observing the document:
  P({blue yellow blue} | H) = 3! × (0.75¹ / 1!) × (0.25² / 2!) = 9/64 ≈ 0.14
• Suppose there is another class H' with P(yellow | H') = 10% and P(blue | H') = 90%:
  P({blue yellow blue} | H') = 3! × (0.1¹ / 1!) × (0.9² / 2!) = 243/1000
• Need to take the prior probability of the class into account to make the final classification using Bayes's rule
• Factorials do not actually need to be computed: they drop out
• Underflows can be prevented by using logarithms

Naïve Bayes: discussion
• Naïve Bayes works surprisingly well even if the independence assumption is clearly violated
• Why? Because classification does not require accurate probability estimates as long as the maximum probability is assigned to the correct class
• However, adding too many redundant attributes will cause problems (e.g., identical attributes)
• Note also: many numeric attributes are not normally distributed (kernel density estimators can be used instead)

Classification by regression
• Any regression technique can be used for classification
• Training: perform a regression for each class, setting the output to 1 for training instances that belong to the class and 0 for those that don't
• Prediction: predict the class corresponding to the model with the largest output value (membership value)
• For linear regression this method is also known as multi-response linear regression
• Problem: membership values are not in the [0,1] range, so they cannot be considered proper probability estimates
• In practice, they are often simply clipped into the [0,1] range and normalized to sum to 1

Linear models: logistic regression
• Can we do better than using linear regression for classification? Yes, by applying logistic regression
• Logistic regression builds a linear model for a transformed target variable
• Assume we have two classes; logistic regression replaces the original target P(1 | a1, …, ak) by the logit log(P(1 | a1, …, ak) / (1 − P(1 | a1, …, ak)))
• This logit transformation maps [0,1] to (−∞, +∞), i.e., the new target values are no longer restricted to the unit interval
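To make the logit transformation concrete, here is a small Python sketch (the weights below are invented for illustration and not fitted to any data):

import numpy as np

def logit(p):
    # maps a probability in (0, 1) onto the whole real line
    return np.log(p / (1.0 - p))

def sigmoid(z):
    # inverse of the logit: maps any real value back into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

w0, w1 = -4.0, 0.8                  # hypothetical weights of a linear model in logit space
x = np.array([2.0, 5.0, 8.0])
print(sigmoid(w0 + w1 * x))         # predicted class probabilities, strictly between 0 and 1
print(logit(sigmoid(w0 + w1 * x)))  # recovers the linear scores w0 + w1*x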
Answered 2 days after Mar 06, 2022

Answer To: Assignment 4: Prediction

Mohd answered on Mar 08 2022
102 Votes
Answer the following questions:
a) How is the model represented? What do the counts mean and why are they incremented by 1 (i.e., actual value count + 1)?
In naïve Bayes classification the model consists of the prior probability of each class value and, for every attribute, a table of conditional probabilities: counts of each attribute value per class for nominal attributes, and a mean and standard deviation per class for numeric attributes. A new instance is assigned the class label with the highest posterior probability computed from these tables. The counts record how many training instances of a given class have a given attribute value. Each count is incremented by 1 (the Laplace estimator, i.e., actual count + 1) to address the zero-frequency problem: an attribute value that never occurs with a class would otherwise give a zero conditional probability and force the whole posterior for that class to zero. Adding 1 also stabilizes probability estimates computed from small samples.
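To make the "count + 1" idea concrete, here is a minimal Python sketch of the Laplace estimator (illustrative only, not Weka's implementation); the counts mirror the Employed attribute of the model output shown further below:

from collections import Counter

def smoothed_probs(values, domain):
    # Laplace estimator: add 1 to every count so no probability is ever zero
    counts = Counter(values)
    total = len(values) + len(domain)   # one extra count per possible value
    return {v: (counts[v] + 1) / total for v in domain}

# among the 13 approved ("Yes") instances, 13 are Employed=Yes and 0 are Employed=No
employed_given_yes = ["Yes"] * 13
print(smoothed_probs(employed_given_yes, ["Yes", "No"]))
# prints roughly {'Yes': 0.933, 'No': 0.067}, i.e. 14/15 and 1/15,
# matching the Employed counts (14.0, 1.0, total 15.0) reported by Weka below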
b) What actually happens when you test the classifier on the training set?
When the classifier is tested on the training set, every test instance has already been used to estimate the model's probabilities, so each instance is effectively scored partly against itself. The resulting error estimate is therefore overly optimistic (the error rate appears very low) and is not a reliable indication of performance on new, unseen data.
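A quick way to see this optimism (a sketch on a standard scikit-learn dataset, not the loan data) is to compare training-set accuracy with cross-validated accuracy:

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)
print("accuracy on the training set:", nb.score(X, y))
print("10-fold cross-validated accuracy:", cross_val_score(GaussianNB(), X, y, cv=10).mean())
# the training-set figure is normally the higher (more optimistic) of the two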
c) How do errors occur? What could be the reasons?
On the original loan dataset, 100 percent of the instances are correctly classified, i.e. the error rate is 0 percent. On the 2-bin discretized data, 85 percent of the instances are correctly classified, so 15 percent are misclassified; on the 3-bin data, 95 percent are correctly classified, so 5 percent are misclassified. Increasing the number of bins from 2 to 3 therefore raised the accuracy by 10 percentage points, which suggests that keeping more levels in a discretized attribute can lead to higher accuracy. Errors occur when the combined evidence from an instance's attribute values points to the wrong class: the conditional-independence and normality assumptions of naïve Bayes do not hold exactly, and discretization can place instances of different classes into the same bin, making them indistinguishable to the classifier. Some overlap between the class distributions (the Bayes error) remains no matter which classifier is used; estimating that error is itself difficult, and existing approaches include analytic bounds that depend on hard-to-estimate distribution parameters, methods based on the class densities, and methods that combine and compare several classifiers.
d) How does the error rate change over the three different data sets (original, 2-bin, 3-bin)? Any guess why?
The error rate is 0 percent on the original loan dataset.
The error rate is 15 percent on the 2-bin dataset.
The error rate is 5 percent on the 3-bin dataset.
The original loan dataset contains both numeric and nominal attributes, so the numeric attributes keep their full range of values and are modelled with a Gaussian per class. Discretizing into 2 bins collapses every numeric attribute into just two broad categories, which loses information and pushes the error rate up to 15 percent. Moving from 2 to 3 bins restores some of that resolution, and the error rate drops again by 10 percentage points.
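For illustration, the effect of 2-bin versus 3-bin discretization can be mimicked with equal-width binning in Python (the values below are made up, not the loan data; Weka's unsupervised Discretize filter with -B 2 / -B 3 behaves analogously):

import pandas as pd

age = pd.Series([22, 25, 31, 35, 40, 48, 52, 60])  # hypothetical ages
print(pd.cut(age, bins=2))  # two wide intervals: coarse, more information lost
print(pd.cut(age, bins=3))  # three intervals: finer resolution is kept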
e) What actually happens when you test the classifier on the training set?
When IBk is tested on the training set, every test instance is also one of the stored training instances, so its nearest neighbour, at distance 0, is the instance itself. With KNN = 1 the classifier simply returns the instance's own class label, giving a perfect and therefore overly optimistic result. With larger KNN, other neighbours also vote, so errors can appear even on the training data; either way, the evaluation says little about performance on unseen instances.
f) How do errors occur? What could be the reasons?
Without distance weighting, the error rate increased as the number of neighbours (KNN) was raised, with the exception of KNN = 5. Errors appear because, with more than one neighbour, nearby instances of the other class get an equal vote and can outvote the instance's own class. With weighting by 1/distance, the error rate did not change as KNN was varied, because the test instance itself sits in the training set at distance 0 and its weight dominates the vote; only the root mean squared error grew slightly as KNN was increased.
In k-nearest-neighbour classification the error rate is typically lowest at some intermediate value of k: a very small k is sensitive to noisy instances, while a very large k smooths over the class boundaries. The optimal number of neighbours should therefore be found, for example by cross-validation, before the final model is chosen.
g) How does the error rate change with the KNN parameter in IBk?
Without distance weighting, the error rate generally increased as the KNN parameter was raised from 1 towards 20, apart from a dip at KNN = 5. With weighting by 1/distance, the error rate stayed the same for every value of KNN, although the root mean squared error increased slightly with larger KNN.
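The pattern can be reproduced conceptually with scikit-learn's k-nearest-neighbour classifier (a sketch on a stand-in dataset, not Weka's IBk or the loan data):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)   # stand-in data
for k in (1, 3, 5, 20):
    for w in ("uniform", "distance"):
        acc = KNeighborsClassifier(n_neighbors=k, weights=w).fit(X, y).score(X, y)
        print(f"k={k:2d}  weights={w:<8}  training-set accuracy={acc:.3f}")
# with weights="distance" each test instance is its own neighbour at distance 0,
# so its vote dominates and the training-set accuracy stays at 1.0 for every k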
Model output:
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: Loan
Instances: 20
Attributes: 11
Employed
LoanPurpose
Gender
Married
ProblematicArea
Age
MoneyInBank
Salary
LoanMonths
YearsEmployed
Approved
Test mode: evaluate on training data
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute Yes No
(0.64) (0.36)
==================================
Employed
Yes 14.0 5.0
No 1.0 4.0
[total] 15.0 9.0
LoanPurpose
Computer 9.0 3.0
Car 6.0 6.0
[total] 15.0 9.0
Gender
Male 8.0 4.0
Female 7.0 5.0
[total] 15.0 9.0
Married
Yes 7.0 5.0
No 8.0 4.0
[total] 15.0 9.0
ProblematicArea
Yes 2.0 2.0
No 13.0 7.0
[total] 15.0 9.0
Age
mean 31.453 29.4603
std. dev. 11.2303 13.128
weight sum 13 7
precision 3.5556 3.5556
MoneyInBank
mean 50.1923 34.5238
std. dev. 58.0167 21.8348
weight sum 13 7
precision 24.1667 24.1667
Salary
mean 5.8571 6.6327
std. dev. 3.7742 2.1878
weight sum 13 7
precision 1.8571 1.8571
LoanMonths
mean 16.9231 20.7429
std. dev. 5.6837 5.6221
weight sum 13 7
precision 4.4 4.4
YearsEmployed
mean 7.9327 2.2321
std. dev. 7.4169 2.187
weight sum 13 7
precision 3.125 3.125
Time taken to build model: 0 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0 seconds
=== Summary ===
Correctly Classified Instances 20 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.114
Root mean squared error 0.1722
Relative absolute error 24.84 %
Root relative squared error 36.0861 %
Total Number of Instances 20
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Yes
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 No
Weighted Avg. 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000
=== Confusion Matrix ===
  a  b   <-- classified as
13 0 | a = Yes
0 7 | b = No
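As a rough illustration of how these parameters combine for the new customer (male, employed, 22 years old, not married, not in a problematic area, buying a car), the sketch below multiplies the smoothed nominal probabilities and the Gaussian density for Age taken from the output above. It is only a partial, hand-rolled calculation: the remaining numeric attributes are left out because their units in the ARFF file are not shown here, and Weka additionally adjusts densities using the listed precision values, so this is not a reproduction of Weka's prediction.

import math

def gauss(x, mean, std):
    # Gaussian density used by naive Bayes for numeric attributes
    return math.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

# factors: prior, Employed=Yes, LoanPurpose=Car, Gender=Male, Married=No, ProblematicArea=No, Age=22
like_yes = 0.64 * (14/15) * (6/15) * (8/15) * (8/15) * (13/15) * gauss(22, 31.453, 11.2303)
like_no  = 0.36 * (5/9) * (6/9) * (4/9) * (4/9) * (7/9) * gauss(22, 29.4603, 13.128)

print("P(Approved = Yes | partial evidence) =", like_yes / (like_yes + like_no))
print("P(Approved = No  | partial evidence) =", like_no / (like_yes + like_no))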
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: Loan-weka.filters.unsupervised.attribute.Discretize-B2-M-1.0-Rfirst-last-precision6
Instances: 20
Attributes: 11
Test mode: evaluate on training data
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute Yes No
(0.64) (0.36)
================================
Employed
Yes 14.0 5.0
No 1.0 4.0
[total] 15.0 9.0
LoanPurpose
Computer 9.0 3.0
Car 6.0 6.0
[total] 15.0 9.0
Gender
Male 8.0 4.0
Female 7.0 5.0
[total] 15.0 9.0
Married
Yes 7.0 5.0
No 8.0 4.0
[total] 15.0 9.0
ProblematicArea
Yes 2.0 2.0
No 13.0 7.0
[total] 15.0 9.0
Age
'(-inf-34]' 8.0 6.0
'(34-inf)' 7.0 3.0
[total] 15.0 9.0
MoneyInBank
'(-inf-77.5]' 10.0 8.0
'(77.5-inf)' 5.0 1.0
[total] 15.0 9.0
Salary
'(-inf-8.5]' 10.0 6.0
'(8.5-inf)' 5.0 3.0
[total] 15.0 9.0
LoanMonths
'(-inf-19]' 7.0 3.0
'(19-inf)' 8.0 6.0
[total] 15.0 9.0
YearsEmployed
'(-inf-12.5]' ...