Assignment 4: Prediction

I need help to do the assignment.


Assignment 4: Prediction
Due Tuesday by 11:59pm | Points 80 | Submitting a file upload

I. Do the following experiments:

1. Run Weka's Naive Bayes on the original loan dataset (Loan_original.arff) and on the 2-bin and 3-bin discretized data, using the "Use training set" test option. Examine the error rate and identify the wrongly classified instances. To do the latter, right-click on the current line in the result list window and select "Visualize classifier errors". The wrongly classified instances show up in the plot as small squares; click on an instance to get its information. Answer the following questions:
a) How is the model represented? What do the counts mean and why are they incremented by 1 (i.e., actual value count + 1)?
b) What actually happens when you test the classifier on the training set?
c) How do errors occur? What could be the reasons?
d) How does the error rate change over the three different data sets (original, 2-bin, 3-bin)? Any guess why?

2. Run Weka's IBk algorithm on the original loan data (no discretization) with the "Use training set" test option and examine the evaluation results (correctly/incorrectly classified instances). Vary KNN (the number of neighbors) with and without distance weighting. Try, for example, KNN = 1, 3, 5, 20 without distance weighting and with weight = 1/distance. Compare the results and find explanations. Answer the following questions by looking at what the algorithm does for each instance from the test set (which in this case is also the training set). Find conceptual-level explanations; no need to go into computing distances:
e) What actually happens when you test the classifier on the training set?
f) How do errors occur? What could be the reasons?
g) How does the error rate change with the KNN parameter in IBk?

3. Decide on the application of a new customer. Prepare a test set using the information of the new customer described in Assignment 3: a customer applying for a 30-month loan with 80,000 yen monthly pay to buy a car, with the following data: male, employed, 22 years old, not married, does not live in a problematic area, has worked 1 year for his last employer and has 500,000 yen in a bank. For more information use Handout Week 7 (https://wssu.instructure.com/courses/19153/files/2875098/download?download_frd=1), "Using Weka 3 for classification and prediction". Run Naive Bayes and IBk with different parameters for KNN and distance weighting, all with the "Supplied test set" test option. Compare the prediction results obtained with the different algorithms. Decide on the loan application of the new customer by using the outputs from the prediction algorithms.

II. Write a report on the prediction experiments described above. Include the following information (DO NOT include data sets or classifier outputs):
The original 7 questions (4 about Bayes and 3 about IBk) with short answers to EACH ONE.
ONE Naive Bayes model (any version of the loan data set) with explanations of its parameters (the answer to #1(a) may be included here).
Results from predicting the new customer's classification (Experiments with Prediction, #3) with a short comment.
Handout Week 7 — slides for Chapter 4, "Algorithms: the basic methods", from Data Mining: Practical Machine Learning Tools and Techniques by I. H. Witten, E. Frank, M. A. Hall and C. J. Pal:

Algorithms: the basic methods
• Simple probabilistic modeling
• Linear models
• Instance-based learning

Can combine probabilities using Bayes's rule
• Famous rule from probability theory due to Thomas Bayes (born 1702 in London, England; died 1761 in Tunbridge Wells, Kent, England)
• Probability of an event H given observed evidence E: P(H | E) = P(E | H) P(H) / P(E)
• A priori probability of H, P(H): probability of the event before evidence is seen
• A posteriori probability of H, P(H | E): probability of the event after evidence is seen

Naïve Bayes for classification
• Classification learning: what is the probability of the class given an instance?
• Evidence E = the instance's non-class attribute values; event H = the class value of the instance
• Naïve assumption: the evidence splits into parts (i.e., attributes) that are conditionally independent
• This means that, given n attributes, we can write Bayes's rule using a product of per-attribute probabilities:
  P(H | E) = P(E1 | H) P(E2 | H) … P(En | H) P(H) / P(E)

Weather data example
• Evidence E: Outlook = Sunny, Temperature = Cool, Humidity = High, Windy = True, Play = ?
• Probability of class "yes":
  P(yes | E) = P(Outlook = Sunny | yes) × P(Temperature = Cool | yes) × P(Humidity = High | yes) × P(Windy = True | yes) × P(yes) / P(E)
             = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / P(E)

The "zero-frequency problem"
• What if an attribute value does not occur with every class value (e.g., "Humidity = High" for class "yes")?
• The conditional probability will be zero: P(Humidity = High | yes) = 0
• The a posteriori probability will also be zero: P(yes | E) = 0 (regardless of how likely the other values are!)
• Remedy: add 1 to the count for every attribute value–class combination (Laplace estimator)
• Result: probabilities will never be zero
• Additional advantage: stabilizes probability estimates computed from small samples of data

Modified probability estimates
• In some cases adding a constant different from 1 might be more appropriate
• Example: attribute Outlook for class yes, adding a constant μ split over the three values:
  Sunny: (2 + μ/3) / (9 + μ)   Overcast: (4 + μ/3) / (9 + μ)   Rainy: (3 + μ/3) / (9 + μ)
• The weights don't need to be equal: they can be μp1, μp2, μp3, but p1 + p2 + p3 must sum to 1

Missing values
• Training: the instance is not included in the frequency count for that attribute value–class combination
• Classification: the attribute is omitted from the calculation
• Example: Outlook = ?, Temperature = Cool, Humidity = High, Windy = True, Play = ?
  Likelihood of "yes" = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
  Likelihood of "no"  = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
  P("yes") = 0.0238 / (0.0238 + 0.0343) = 41%
  P("no")  = 0.0343 / (0.0238 + 0.0343) = 59%

Numeric attributes
• Usual assumption: attributes have a normal or Gaussian probability distribution (given the class)
• The probability density function for the normal distribution is defined by two parameters: the sample mean μ and the standard deviation σ
• The density function is f(x) = 1 / (√(2π) σ) × e^(−(x − μ)² / (2σ²))

Statistics for weather data
  Outlook:     Sunny 2/9 (yes), 3/5 (no); Overcast 4/9, 0/5; Rainy 3/9, 2/5
  Temperature: yes: 64, 68, 69, 70, 72, … (mean 73, std. dev. 6.2); no: 65, 71, 72, 80, 85, … (mean 75, std. dev. 7.9)
  Humidity:    yes: 65, 70, 70, 75, 80, … (mean 79, std. dev. 10.2); no: 70, 85, 90, 91, 95, … (mean 86, std. dev. 9.7)
  Windy:       False 6/9 (yes), 2/5 (no); True 3/9, 3/5
  Play:        yes 9/14, no 5/14
• Example density value: f(Temperature = 66 | yes) = 1 / (√(2π) × 6.2) × e^(−(66 − 73)² / (2 × 6.2²)) = 0.0340

Classifying a new day
• A new day: Outlook = Sunny, Temperature = 66, Humidity = 90, Windy = true, Play = ?
  Likelihood of "yes" = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
  Likelihood of "no"  = 3/5 × 0.0221 × 0.0381 × 3/5 × 5/14 = 0.000108
  P("yes") = 0.000036 / (0.000036 + 0.000108) = 25%
  P("no")  = 0.000108 / (0.000036 + 0.000108) = 75%
• Missing values during training are not included in the calculation of the mean and standard deviation

Probability densities
• Probability densities f(x) can be greater than 1; hence, they are not probabilities
• However, they must integrate to 1: the area under the probability density curve must be 1
• The approximate relationship between probability and probability density, assuming ε is sufficiently small, is P(x − ε/2 ≤ X ≤ x + ε/2) ≈ ε f(x)
• When computing likelihoods, we can treat densities just like probabilities

Multinomial naïve Bayes I
• Version of naïve Bayes used for document classification with the bag-of-words model
• n1, n2, …, nk: number of times word i occurs in the document
• P1, P2, …, Pk: probability of obtaining word i when sampling from documents in class H
• Probability of observing a particular document E given class H (based on the multinomial distribution), with N = n1 + n2 + … + nk:
  P(E | H) = N! × (P1^n1 / n1!) × (P2^n2 / n2!) × … × (Pk^nk / nk!)
• Note that this expression ignores the probability of generating a document of the right length
• This probability is assumed to be constant for all classes

Multinomial naïve Bayes II
• Suppose the dictionary has two words, yellow and blue
• Suppose P(yellow | H) = 75% and P(blue | H) = 25%
• Suppose E is the document "blue yellow blue"
• Probability of observing the document:
  P({blue yellow blue} | H) = 3! × (0.75¹ / 1!) × (0.25² / 2!) = 9/64 ≈ 0.14
• Suppose there is another class H' with P(yellow | H') = 10% and P(blue | H') = 90%:
  P({blue yellow blue} | H') = 3! × (0.1¹ / 1!) × (0.9² / 2!) = 243/1000
• Need to take the prior probability of the class into account to make the final classification using Bayes's rule
• Factorials do not actually need to be computed: they drop out
• Underflows can be prevented by using logarithms

Naïve Bayes: discussion
• Naïve Bayes works surprisingly well even if the independence assumption is clearly violated
• Why? Because classification does not require accurate probability estimates as long as the maximum probability is assigned to the correct class
• However, adding too many redundant attributes will cause problems (e.g., identical attributes)
• Note also: many numeric attributes are not normally distributed (kernel density estimators can be used instead)

Classification by regression
• Any regression technique can be used for classification
• Training: perform a regression for each class, setting the output to 1 for training instances that belong to the class and 0 for those that don't
• Prediction: predict the class corresponding to the model with the largest output value (membership value)
• For linear regression this method is also known as multi-response linear regression
• Problem: membership values are not in the [0,1] range, so they cannot be considered proper probability estimates
• In practice, they are often simply clipped into the [0,1] range and normalized to sum to 1

Linear models: logistic regression
• Can we do better than using linear regression for classification? Yes, by applying logistic regression
• Logistic regression builds a linear model for a transformed target variable
• Assume we have two classes; logistic regression replaces the original target P(1 | a1, …, ak) by the logit log(P(1 | a1, …, ak) / (1 − P(1 | a1, …, ak)))
• This logit transformation maps [0,1] to (−∞, +∞), i.e., the new target values are no longer restricted to the unit interval
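To make the logit transformation concrete, here is a small Python sketch (the weights below are invented for illustration and not fitted to any data):

import numpy as np

def logit(p):
    # maps a probability in (0, 1) onto the whole real line
    return np.log(p / (1.0 - p))

def sigmoid(z):
    # inverse of the logit: maps any real value back into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

w0, w1 = -4.0, 0.8                  # hypothetical weights of a linear model in logit space
x = np.array([2.0, 5.0, 8.0])
print(sigmoid(w0 + w1 * x))         # predicted class probabilities, strictly between 0 and 1
print(logit(sigmoid(w0 + w1 * x)))  # recovers the linear scores w0 + w1*x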
Answered 2 days after Mar 06, 2022

Answer To: Assignment 4: Prediction

Mohd answered on Mar 08 2022
102 Votes
Answer the following questions:
a) How is the model represented? What do the counts mean and why are they incremented by 1 (i.e., actual value count + 1)?
In naïve Bayes classification the model consists of the prior probability of each class value and, for every attribute, a table of conditional probabilities: counts of each attribute value per class for nominal attributes, and a mean and standard deviation per class for numeric attributes. A new instance is assigned the class label with the highest posterior probability computed from these tables. The counts record how many training instances of a given class have a given attribute value. Each count is incremented by 1 (the Laplace estimator, i.e., actual count + 1) to address the zero-frequency problem: an attribute value that never occurs with a class would otherwise give a zero conditional probability and force the whole posterior for that class to zero. Adding 1 also stabilizes probability estimates computed from small samples.
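To make the "count + 1" idea concrete, here is a minimal Python sketch of the Laplace estimator (illustrative only, not Weka's implementation); the counts mirror the Employed attribute of the model output shown further below:

from collections import Counter

def smoothed_probs(values, domain):
    # Laplace estimator: add 1 to every count so no probability is ever zero
    counts = Counter(values)
    total = len(values) + len(domain)   # one extra count per possible value
    return {v: (counts[v] + 1) / total for v in domain}

# among the 13 approved ("Yes") instances, 13 are Employed=Yes and 0 are Employed=No
employed_given_yes = ["Yes"] * 13
print(smoothed_probs(employed_given_yes, ["Yes", "No"]))
# prints roughly {'Yes': 0.933, 'No': 0.067}, i.e. 14/15 and 1/15,
# matching the Employed counts (14.0, 1.0, total 15.0) reported by Weka below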
b) What actually happens when you test the classifier on the training set?
When the classifier is tested on the training set, every test instance has already been used to estimate the model's probabilities, so each instance is effectively scored partly against itself. The resulting error estimate is therefore overly optimistic (the error rate appears very low) and is not a reliable indication of performance on new, unseen data.
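A quick way to see this optimism (a sketch on a standard scikit-learn dataset, not the loan data) is to compare training-set accuracy with cross-validated accuracy:

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)
print("accuracy on the training set:", nb.score(X, y))
print("10-fold cross-validated accuracy:", cross_val_score(GaussianNB(), X, y, cv=10).mean())
# the training-set figure is normally the higher (more optimistic) of the two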
c) How do errors occur? What could be the reasons?
On the original loan dataset, 100 percent of the instances are correctly classified, i.e. the error rate is 0 percent. On the 2-bin discretized data, 85 percent of the instances are correctly classified, so 15 percent are misclassified; on the 3-bin data, 95 percent are correctly classified, so 5 percent are misclassified. Increasing the number of bins from 2 to 3 therefore raised the accuracy by 10 percentage points, which suggests that keeping more levels in a discretized attribute can lead to higher accuracy. Errors occur when the combined evidence from an instance's attribute values points to the wrong class: the conditional-independence and normality assumptions of naïve Bayes do not hold exactly, and discretization can place instances of different classes into the same bin, making them indistinguishable to the classifier. Some overlap between the class distributions (the Bayes error) remains no matter which classifier is used; estimating that error is itself difficult, and existing approaches include analytic bounds that depend on hard-to-estimate distribution parameters, methods based on the class densities, and methods that combine and compare several classifiers.
d) How does the error rate change over the three different data sets (original, 2-bin, 3-bin)? Any guess why?
The error rate is 0 percent on the original loan dataset.
The error rate is 15 percent on the 2-bin dataset.
The error rate is 5 percent on the 3-bin dataset.
The original loan dataset contains both numeric and nominal attributes, so the numeric attributes keep their full range of values and are modelled with a Gaussian per class. Discretizing into 2 bins collapses every numeric attribute into just two broad categories, which loses information and pushes the error rate up to 15 percent. Moving from 2 to 3 bins restores some of that resolution, and the error rate drops again by 10 percentage points.
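For illustration, the effect of 2-bin versus 3-bin discretization can be mimicked with equal-width binning in Python (the values below are made up, not the loan data; Weka's unsupervised Discretize filter with -B 2 / -B 3 behaves analogously):

import pandas as pd

age = pd.Series([22, 25, 31, 35, 40, 48, 52, 60])  # hypothetical ages
print(pd.cut(age, bins=2))  # two wide intervals: coarse, more information lost
print(pd.cut(age, bins=3))  # three intervals: finer resolution is kept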
e) What actually happens when you test the classifier on the training set?
When IBk is tested on the training set, every test instance is also one of the stored training instances, so its nearest neighbour, at distance 0, is the instance itself. With KNN = 1 the classifier simply returns the instance's own class label, giving a perfect and therefore overly optimistic result. With larger KNN, other neighbours also vote, so errors can appear even on the training data; either way, the evaluation says little about performance on unseen instances.
f) How do errors occur? What could be the reasons?
Without distance weighting, the error rate increased as the number of neighbours (KNN) was raised, with the exception of KNN = 5. Errors appear because, with more than one neighbour, nearby instances of the other class get an equal vote and can outvote the instance's own class. With weighting by 1/distance, the error rate did not change as KNN was varied, because the test instance itself sits in the training set at distance 0 and its weight dominates the vote; only the root mean squared error grew slightly as KNN was increased.
In k-nearest-neighbour classification the error rate is typically lowest at some intermediate value of k: a very small k is sensitive to noisy instances, while a very large k smooths over the class boundaries. The optimal number of neighbours should therefore be found, for example by cross-validation, before the final model is chosen.
g) How does the error rate change with the KNN parameter in IBk?
Without distance weighting, the error rate generally increased as the KNN parameter was raised from 1 towards 20, apart from a dip at KNN = 5. With weighting by 1/distance, the error rate stayed the same for every value of KNN, although the root mean squared error increased slightly with larger KNN.
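The pattern can be reproduced conceptually with scikit-learn's k-nearest-neighbour classifier (a sketch on a stand-in dataset, not Weka's IBk or the loan data):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)   # stand-in data
for k in (1, 3, 5, 20):
    for w in ("uniform", "distance"):
        acc = KNeighborsClassifier(n_neighbors=k, weights=w).fit(X, y).score(X, y)
        print(f"k={k:2d}  weights={w:<8}  training-set accuracy={acc:.3f}")
# with weights="distance" each test instance is its own neighbour at distance 0,
# so its vote dominates and the training-set accuracy stays at 1.0 for every k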
Model output:
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: Loan
Instances: 20
Attributes: 11
Employed
LoanPurpose
Gender
Married
ProblematicArea
Age
MoneyInBank
Salary
LoanMonths
YearsEmployed
Approved
Test mode: evaluate on training data
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute Yes No
(0.64) (0.36)
==================================
Employed
Yes 14.0 5.0
No 1.0 4.0
[total] 15.0 9.0
LoanPurpose
Computer 9.0 3.0
Car 6.0 6.0
[total] 15.0 9.0
Gender
Male 8.0 4.0
Female 7.0 5.0
[total] 15.0 9.0
Married
Yes 7.0 5.0
No 8.0 4.0
[total] 15.0 9.0
ProblematicArea
Yes 2.0 2.0
No 13.0 7.0
[total] 15.0 9.0
Age
mean 31.453 29.4603
std. dev. 11.2303 13.128
weight sum 13 7
precision 3.5556 3.5556
MoneyInBank
mean 50.1923 34.5238
std. dev. 58.0167 21.8348
weight sum 13 7
precision 24.1667 24.1667
Salary
mean 5.8571 6.6327
std. dev. 3.7742 2.1878
weight sum 13 7
precision 1.8571 1.8571
LoanMonths
mean 16.9231 20.7429
std. dev. 5.6837 5.6221
weight sum 13 7
precision 4.4 4.4
YearsEmployed
mean 7.9327 2.2321
std. dev. 7.4169 2.187
weight sum 13 7
precision 3.125 3.125
Time taken to build model: 0 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0 seconds
=== Summary ===
Correctly Classified Instances 20 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.114
Root mean squared error 0.1722
Relative absolute error 24.84 %
Root relative squared error 36.0861 %
Total Number of Instances 20
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Yes
1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 No
Weighted Avg. 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000
=== Confusion Matrix ===
  a  b   <-- classified as
13 0 | a = Yes
0 7 | b = No
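As a rough illustration of how these parameters combine for the new customer (male, employed, 22 years old, not married, not in a problematic area, buying a car), the sketch below multiplies the smoothed nominal probabilities and the Gaussian density for Age taken from the output above. It is only a partial, hand-rolled calculation: the remaining numeric attributes are left out because their units in the ARFF file are not shown here, and Weka additionally adjusts densities using the listed precision values, so this is not a reproduction of Weka's prediction.

import math

def gauss(x, mean, std):
    # Gaussian density used by naive Bayes for numeric attributes
    return math.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

# factors: prior, Employed=Yes, LoanPurpose=Car, Gender=Male, Married=No, ProblematicArea=No, Age=22
like_yes = 0.64 * (14/15) * (6/15) * (8/15) * (8/15) * (13/15) * gauss(22, 31.453, 11.2303)
like_no  = 0.36 * (5/9) * (6/9) * (4/9) * (4/9) * (7/9) * gauss(22, 29.4603, 13.128)

print("P(Approved = Yes | partial evidence) =", like_yes / (like_yes + like_no))
print("P(Approved = No  | partial evidence) =", like_no / (like_yes + like_no))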
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: Loan-weka.filters.unsupervised.attribute.Discretize-B2-M-1.0-Rfirst-last-precision6
Instances: 20
Attributes: 11
Test mode: evaluate on training data
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute Yes No
(0.64) (0.36)
================================
Employed
Yes 14.0 5.0
No 1.0 4.0
[total] 15.0 9.0
LoanPurpose
Computer 9.0 3.0
Car 6.0 6.0
[total] 15.0 9.0
Gender
Male 8.0 4.0
Female 7.0 5.0
[total] 15.0 9.0
Married
Yes 7.0 5.0
No 8.0 4.0
[total] 15.0 9.0
ProblematicArea
Yes 2.0 2.0
No 13.0 7.0
[total] 15.0 9.0
Age
'(-inf-34]' 8.0 6.0
'(34-inf)' 7.0 3.0
[total] 15.0 9.0
MoneyInBank
'(-inf-77.5]' 10.0 8.0
'(77.5-inf)' 5.0 1.0
[total] 15.0 9.0
Salary
'(-inf-8.5]' 10.0 6.0
'(8.5-inf)' 5.0 3.0
[total] 15.0 9.0
LoanMonths
'(-inf-19]' 7.0 3.0
'(19-inf)' 8.0 6.0
[total] 15.0 9.0
YearsEmployed
'(-inf-12.5]' ...