Department of Computer Science
176 Thorvaldson Building, 110 Science Place, Saskatoon, SK, S7N 5C9, Canada
Telephone: (306) 966-4886, Facsimile: (306) 966-4884

CMPT 423/820 Winter 2021 Machine Learning
Assignment 2: Simple Classifiers
Date Due: October 19, 2021, 5pm
Total Marks: 60

Version History
• 5/10/2021: released to students

General Instructions
• This assignment is individual work. You may discuss questions and problems with anyone, but the work you hand in for this assignment must be your own work.
• Each question indicates what to hand in.
• Assignments must be submitted to Canvas.
• Assignments will be accepted until 11:59pm without penalty.

Software
This assignment primarily exercises the use of Scikit-Learn, an Open Source collection of tools for Machine Learning. This collection is quite extensive, and could be intimidating. Rest assured that we will practice our understanding of concepts with this tool, but you won't be tested on how well you know the software. You'll use Jupyter Notebooks to complete these assignment questions, and you will be allowed to make use of all the software we introduced in A1.

As you complete this assignment, you may wonder if you can use this or that package that you might have heard about. The answer will depend on how that package contributes to your work. If the software you are considering accomplishes the learning objectives in a way that means you don't have to think about them, I will probably disallow it. If the software does not impact your attention to the learning objectives, I will probably allow it. It's always good to ask.

As in A1, you'll be asked to submit a PDF version along with your Jupyter Notebooks. This will assist the markers. Marks will be deducted if PDFs are not submitted. I have found that the most reliable way to produce a PDF of a Jupyter Notebook is to Export to HTML, open the HTML in a browser, and Print As PDF from your browser.
Question 1 (15 points):
Learning Objectives:
• Practical experience building Naive Bayes Classifiers.
• Practice critically evaluating the performance of classifiers.
Competency Level: Basic

The IRIS dataset (used in A1 and also in lecture) has 4 continuous features/attributes, and the class label. We saw in class that we can get pretty good accuracy using one Gaussian Naive Bayes Classifier (GNBC) when all 4 features/attributes are used. In this question, we will build 4 different GNBCs, each classifier using only one of the features/attributes to fit the model. In other words, the first classifier will use feature/attribute/column 1, the second classifier will use feature/attribute/column 2, etc.

1. Build the four 1-feature classifiers, and calculate the accuracy of each.
2. Build the 4-feature classifier (as we saw in class), and calculate the accuracy.
3. Reproduce the density plots from A1Q6 Task 5 that show the class density for each feature, and compare the density plots to the accuracy scores you obtained. In a few sentences, discuss how the density plots relate to the accuracy scores.
4. Compare the best 1-feature classifier to the 4-feature classifier, in terms of accuracy. Discuss your results briefly.

Errata
1. None so far!

What to Hand In
A PDF document exported from Jupyter Notebook, containing the tasks and discussions above, with your name and student number at the top of the document, as in Assignment 1.
• Make sure that your document is well-structured, using headings and providing discussion in Markdown cells.
• Make sure that the markers can read your document and grade it easily.

Evaluation
• You constructed the four 1-feature classifiers, and calculated their accuracies.
– 3 marks. Your Python scripting was neat and presentable.
You made good use of Python comments, and Markdown cells to explain your method to a reader.
– 2 marks. You calculated the accuracies correctly, and presented them neatly.
• You discussed the relation between the accuracy of each 1-feature classifier, and the graphical visualization provided by the class density for each feature.
– 4 marks. Your discussion highlighted the visual clues that might indicate differences in accuracy.
– 2 marks. Your discussion was not too long! Seriously, keep it to the point.
• You compared the best 1-feature classifier to the 4-feature classifier in terms of accuracy.
– 2 marks. Your discussion was relevant.
– 2 marks. Your discussion was not too long.

Question 2 (10 points):
Learning Objectives:
• Critically assess a dataset based on visualization.
Competency Level: Basic

This question is preparation for Question 3. It's a separate question to prevent your answer for Q3 from being too cluttered.

On the Assignment Moodle page, you'll find a dataset named A2Q2.csv. This dataset has 14 columns. The first column is the class label, using the integers 1-3 as labels. The remaining columns are continuous features.

Plot the class densities for all 13 features, similar to A1Q6 Task 5. Comment on each feature, relating the visualization to its potential utility in a classifier (based on your experience from Q1). Answer the following questions:
• Which, if any, of the 13 features would you pick as the single feature in a 1-feature classifier? Briefly explain your answer.
• Prior to building a classifier, do you think a classifier based on this data will have high accuracy? Briefly explain your answer.

Errata
1. None so far!
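The per-feature class-density plots asked for above can be sketched roughly as follows. This is a minimal sketch only: the real A2Q2 dataset is not reproduced here, so it uses synthetic stand-in data with 4 made-up feature columns (the real file has 13), and the choice of scipy's `gaussian_kde` for the density estimate is one reasonable option, not something the assignment prescribes.

```python
# Sketch: class-density plot per feature, using synthetic stand-in data.
# Assumes numpy, pandas, matplotlib, and scipy are installed.
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic stand-in: 3 classes (labels 1-3), 4 hypothetical features.
frames = []
for label in (1, 2, 3):
    frames.append(pd.DataFrame({
        "class": label,
        "f1": rng.normal(label, 1.0, 100),       # class means separate
        "f2": rng.normal(0.0, 1.0, 100),         # uninformative feature
        "f3": rng.normal(2.0 * label, 0.5, 100), # well-separated classes
        "f4": rng.normal(-label, 2.0, 100),      # heavily overlapping
    }))
df = pd.concat(frames, ignore_index=True)

# One subplot per feature; one KDE curve per class within each subplot.
features = [c for c in df.columns if c != "class"]
fig, axes = plt.subplots(1, len(features), figsize=(4 * len(features), 3))
for ax, feat in zip(axes, features):
    xs = np.linspace(df[feat].min(), df[feat].max(), 200)
    for label, group in df.groupby("class"):
        ax.plot(xs, gaussian_kde(group[feat])(xs), label=f"class {label}")
    ax.set_title(feat)
    ax.legend()
fig.tight_layout()
```

In a notebook, the figure renders inline; in a script, finish with `fig.savefig(...)`. Features whose class curves barely overlap (like `f3` here) are the ones you'd expect to work well in a 1-feature classifier.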
What to Hand In
A PDF document exported from Jupyter Notebook, containing 13 density plots and brief discussion, with your name and student number at the top of the document, as in Assignment 1.
• Make sure that your document is well-structured, using headings and providing discussion in Markdown cells.
• Make sure the answers to the questions are easy to find!
• Make sure that the markers can read your document and grade it easily.

Evaluation
• 4 marks: Your density plots are correct, and neatly presented.
• 6 marks: Your answers to the questions demonstrate you've assessed the features critically.

Question 3 (15 points):
Learning Objectives:
• To critically compare different models on the same dataset.
Competency Level: Basic

On the Assignment Moodle page, you'll find a dataset named a2q3.csv. This dataset has 14 columns. The first column is the class label, using the integers 1-3 as labels. The remaining columns are continuous features. Use this dataset to compare three classifiers:
1. K-Nearest Neighbours Classifier. Remember that you'll have to choose K.
2. Naive Bayes Classifier.
3. Decision Tree Classifier.

To keep things interesting, use f1 as the metric for comparison. Discuss the performance of the three classifiers. Which, if any, would you choose as the best model for the data? Explain your answer. To complete this question, you'll have to research the Scikit-Learn User Manual to use KNN and Decision Trees.

Errata
1. None so far!

What to Hand In
A PDF document exported from Jupyter Notebook, with your name and student number at the top of the document, as in Assignment 1.
• Make sure that your document is well-structured, using headings.
• Make sure that you've used the Scikit-Learn models correctly.
• Document any decisions about parameter choices, etc., in Markdown cells close to your scripts.
• Address the discussion comparing classifiers in Markdown cells at the end of your document.
• Make sure the different parts of your solution to the question are easy to find!
• Make sure that the markers can read your document and grade it easily.

Evaluation
• 3 marks: You fitted the KNN classifier appropriately by choosing k, and other parameters to the model.
• 3 marks: You fitted the Decision Tree classifier appropriately by choosing appropriate parameters to the model.
• 1 mark: You fitted the Naive Bayes classifier appropriately.
• 8 marks: Your discussion of the performance of the classifiers arrived at a well-reasoned conclusion.

Question 4 (5 points):
Purpose: To exercise the derivation of formulae involving probability.
Degree of Difficulty: Moderate. The actual derivation is easier than the explanation.
References: Lecture Notes 04, Math with LaTeX in Jupyter Notebook

There is a notion in Bayesian statistics that yesterday's posterior probabilities are today's prior probabilities. It's an idea that suggests that learning should naturally combine data collected over time. It also reassures us that choosing a prior can be based on data seen previously. To understand this notion we need to do some math.

Task
Suppose we collected data X1 yesterday, and used the data to calculate P(y|X1). Yesterday, the prior that we assumed was P(µ) = Beta(µ|a, b). Today, we collected data X2, and we wish to calculate P(y|X1, X2). Derive an expression for P(µ|X1, X2) in terms of yesterday's posterior P(µ|X1). This expression shows how yesterday's posterior can be used as if it were a prior.

Elaboration
In the following, we'll start with a review of the lecture material.
Then we'll think about what happens with data X1 collected yesterday, and then more data X2 today. We could just throw away the model based on X1, and start over with all the data. But we can be cleverer than that here.

In class we derived the following equation using Bayes' Rule:

P(µ|X) = P(X|µ) P(µ) / P(X)

This was one of the steps in determining P(y|X) for a binary event Y. In this expression, P(µ) is the prior distribution for µ, and P(µ|X) is the posterior. We assumed that P(µ) = Beta(µ|a, b), and we learned that E[µ] = a/(a + b). We also saw (skipping some of the mathematical details) that:

E[µ|X] = (m + a) / (N + a + b)

where m and N come from the data X. The posterior P(µ|X) turns out also to be a Beta distribution over µ, but it's Beta(a1, b1), where a1 = m + a and b1 = N − m + b are new hyper-parameters. In effect, Beta(a1, b1) summarizes everything we learned about µ from X.

Suppose we collected data X1 yesterday, and used the data to calculate P(y|X1). Yesterday, the prior that we assumed was P(µ) = Beta(µ|a, b). Today, we collected data X2, and we wish to calculate P(y|X1, X2).
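To make the single-day update above concrete, here is one worked instance with numbers chosen purely for illustration (they are not part of the assignment): take the prior Beta(µ|2, 2), and suppose the data X contains N = 10 observations with m = 7 successes.

```latex
% Worked instance of the Beta posterior update (illustrative numbers only).
\[
P(\mu) = \mathrm{Beta}(\mu \mid 2, 2), \qquad N = 10, \quad m = 7
\]
\[
P(\mu \mid X) = \mathrm{Beta}(\mu \mid a_1, b_1)
             = \mathrm{Beta}(\mu \mid 7 + 2,\; 10 - 7 + 2)
             = \mathrm{Beta}(\mu \mid 9, 5)
\]
\[
\mathbb{E}[\mu \mid X] = \frac{m + a}{N + a + b} = \frac{7 + 2}{10 + 2 + 2} = \frac{9}{14} \approx 0.643
\]
```

The key observation for the Task is that the output of this update is itself a Beta distribution, i.e. it has exactly the same form as the prior we started with.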