Matlab code assignment


BENG 420/520 - Homework #1
Due on 2/21/2021, 11:59pm

K-nearest neighbor (K-NN) classifier:

a) Create a new Matlab file called h1_plot.m and program in this file only.

b) Load the data (h1_data.mat) in Matlab. The variable “features” contains the values of two features for many data points. Each row in “features” represents one data point. The variable “classlabels” contains the corresponding class labels of all the data points. Each sample belongs to one of two classes, labeled 1 or 2.

c) (2 pts) Generate a 2-dimensional plot of the feature values of all data points (Figure 1). The x-axis shows the value of the first feature (column #1 in variable “features”) and the y-axis shows the value of the second feature (column #2 in variable “features”). Plot the two classes of data points as indicated by “classlabels”. Use two different symbols, differing in shape and/or color, for the two classes.
   (2 pts) Describe your observations of the data (e.g. distributions, patterns, etc.).

d) Implement the KNN classifier. Use the given data to test the KNN classification algorithm. Use Euclidean distance as the distance metric. Do not use any existing Matlab functions such as “knnsearch” in your implementation.

e) Use the first 500 data points as the training dataset and the data points in rows 4001 to 4500 as the validation dataset (i.e. to fine-tune the parameter “K”).
   · Iterate over all K values between 1 and 80 and predict (classify) class labels for the validation dataset.
   · (4 pts) Plot the classification error percentage of the validation dataset as a function of K (Figure 2). The classification error is defined as the number of misclassified data points divided by the total number of data points evaluated.
   · (3 pts) Explain your observations on the relationship between K and the classification error. Is there an optimal value of K? Was the relationship expected? Discuss the influence of K on classification error, as shown by this example.
   · (1 pt) Determine the optimal K value from the above classification error of the validation dataset.
   · (1 pt) Test the performance of your KNN classifier with this optimal K using a separate testing dataset (data points in rows 5001 to 5500) and report the testing error you observe.
   · (2 pts) Compare the classification error percentages of the optimal K on the testing dataset and the validation dataset and discuss.

f) (6 pts) Repeat all steps in e) using the first 2500 data points as the training dataset and answer all the questions. The plot generated will be Figure 3.

g) (2 pts) Discuss: what are the strength(s) and weakness(es) of the KNN classifier demonstrated by your results? How does the size of the training dataset affect the classification error?

h) Submit your answers in a document and your Matlab code in a single file, h1_plot.m. Make sure it produces the three figures noted above. Please comment your code; comments help the grader award partial credit if needed.
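The core of steps d) and e) can be sketched as below. This is a minimal illustration, not a reference solution: it runs on synthetic toy data rather than h1_data.mat, and the names knn_predict, trainX, trainY, valX, valY, Ks, and errPct are hypothetical. It assumes MATLAB R2016b or later (local functions in a script, implicit expansion), and it breaks voting ties in favor of class 1, which is an arbitrary choice the assignment does not specify. Squared Euclidean distance is used for ranking because it orders neighbors identically to Euclidean distance.

```matlab
% Minimal K-NN sketch on synthetic toy data (NOT h1_data.mat).
% All variable and function names here are illustrative assumptions.
rng(0);                                    % make the toy data reproducible
trainX = [randn(50, 2); randn(50, 2) + 3]; % two Gaussian clusters, 2 features
trainY = [ones(50, 1); 2 * ones(50, 1)];   % class labels 1 and 2
valX   = [randn(20, 2); randn(20, 2) + 3]; % held-out validation points
valY   = [ones(20, 1); 2 * ones(20, 1)];

Ks = 1:15;                                 % candidate K values to sweep
errPct = zeros(size(Ks));
for i = 1:numel(Ks)
    pred = knn_predict(trainX, trainY, valX, Ks(i));
    errPct(i) = 100 * mean(pred ~= valY);  % misclassified / total, in percent
end

[~, best] = min(errPct);                   % first K with the lowest error
fprintf('Lowest validation error at K = %d (%.1f%%)\n', Ks(best), errPct(best));

figure;
plot(Ks, errPct, '-o');
xlabel('K');
ylabel('Classification error (%)');

function pred = knn_predict(trainX, trainY, testX, K)
    % Classify each row of testX by majority vote among its K nearest
    % training points (labels assumed to be 1 or 2).
    nTest = size(testX, 1);
    pred = zeros(nTest, 1);
    for n = 1:nTest
        diffs = trainX - testX(n, :);      % implicit expansion (R2016b+)
        d2 = sum(diffs .^ 2, 2);           % squared Euclidean distances
        [~, order] = sort(d2);             % nearest training points first
        nearest = trainY(order(1:K));      % labels of the K nearest points
        if sum(nearest == 1) >= sum(nearest == 2)
            pred(n) = 1;                   % ties go to class 1 (assumption)
        else
            pred(n) = 2;
        end
    end
end
```

For the assignment itself, the toy arrays would be replaced by the corresponding row ranges of “features” and “classlabels”, and Ks would run from 1 to 80. Restricting the sweep to odd K values avoids ties altogether in a two-class problem, though the assignment asks for all K values.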
Feb 23, 2021