Option #1 k-Nearest Neighbor Suppose you work for a wine company as a data mining expert. You are to build a k-nearest neighbor algorithm to predict the quality of wine from different factors....


Option #1: k-Nearest Neighbor


Suppose you work for a wine company as a data mining expert. You are to build a k-nearest neighbor algorithm to predict the quality of wine from different factors. Download the dataset called winequality-red.csv from Module 4 and then perform the following tasks. The data description is provided in the file called description-winequality-red.docx.



  1. Provide the summary statistics for all the variables from the dataset. Explain some of the key aspects of the dataset.

  2. Examine the SAS code from the mod4knn.sas file and, for each SAS statement, provide an explanation of the code as SAS comments. Put this part in the Appendix of your report.

  3. Perform the k-NN using k = 1, 2, and 3. For each case, explain the SAS output and give interpretation(s).

  4. Which case (k = 1, 2, or 3) provides the best model? Explain why using the output from part 3.


Take the screenshots of SAS output and paste them into a Word document. Include all relevant calculations and your responses to all items (1, 2, 3, and 4) and submit the document to Canvas for grading. Clearly label all elements in your submission. In addition, provide a short description of any challenge(s) you faced during this assignment.


Your paper should be at least three to four pages in length (excluding title, reference, and appendix pages) and conform to the CSU Global Writing Center. Remember that your paper must be in report form using APA format. An APA paper template can be found on the website: APA Paper Templates. Review the grading rubric to see how you will be graded for this assignment.

Answered 1 day after Jan 10, 2021


Swapnil answered on Jan 11 2021
Wine Quality Data Analysis
The objective of this report is to analyse red wine samples, build a kNN classification algorithm, and evaluate the model on the dataset. The inputs are objective tests (e.g. pH values) and the output is based on sensory data (the median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). The following attributes are available in the dataset:
Wine Quality Dataset
Independent Variables
● Fixed acidity
● Volatile acidity
● Citric acid
● Residual sugar
● Chlorides
● Free sulfur dioxide
● Total sulfur dioxide
● Density
● pH
● Sulphates
● Alcohol
Dependent Variable
● Quality
Data Pre-processing
We replace the Quality variable with a binary class variable for the classification task according to the following rule:
Class: High when Quality > 5
Class: Low when Quality <= 5
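The rule above can be written as a one-line mapping. This is a minimal sketch in Python rather than the SAS used for the assignment; the function name quality_to_class and the sample scores are hypothetical and are not taken from the dataset.

```python
def quality_to_class(quality: int) -> str:
    """Map a 0-10 quality score to the binary class used in this report."""
    # High when Quality > 5, Low when Quality <= 5.
    return "High" if quality > 5 else "Low"


# Hypothetical quality scores, for illustration only.
sample_scores = [3, 5, 6, 8]
sample_classes = [quality_to_class(q) for q in sample_scores]
print(sample_classes)
```

Note that a score of exactly 5 falls in the Low class, so the boundary case is worth checking whenever the rule is re-implemented.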
From the summary statistics of all the variables in the dataset, we notice that a few of the features are on a different scale from the rest. Therefore, in the model evaluation section, we test our kNN model on data that has been normalized, which makes the model less sensitive to the scale of individual features.
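As a rough illustration of how a scale mismatch shows up in summary statistics, the sketch below computes the count, mean, standard deviation, minimum, and maximum per variable using only the Python standard library. The helper name summarize and the values are hypothetical; they are not read from winequality-red.csv and do not reproduce the SAS output.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical values for two variables on very different scales.
toy = {
    "pH": [3.2, 3.3, 3.4],
    "total sulfur dioxide": [20.0, 60.0, 100.0],
}
summary = {name: summarize(column) for name, column in toy.items()}
for name, stats in summary.items():
    print(name, stats)
```

With values like these, the spread of the second variable is hundreds of times larger than that of the first, so an unscaled Euclidean distance would be driven almost entirely by the second variable.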
The following table shows the highly correlated pairs.
Data Transformations:
Outlier Analysis: We check for outliers in the dataset and remove every record in which at least one attribute value is an outlier. Because some outliers convey valuable information to the model, we use a conservative threshold: a value counts as an outlier only if it lies more than 3 standard deviations from the mean of its attribute.
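The 3-standard-deviation rule described above could be sketched as follows. This is an illustrative Python version, not the code used in the report; the function name remove_outlier_rows, the use of the sample standard deviation, and the toy rows are all assumptions.

```python
import statistics


def remove_outlier_rows(rows, threshold=3.0):
    """Drop every row in which at least one attribute lies more than
    `threshold` sample standard deviations from that attribute's mean."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    kept = []
    for row in rows:
        is_outlier = any(
            s > 0 and abs(v - m) > threshold * s
            for v, m, s in zip(row, means, stds)
        )
        if not is_outlier:
            kept.append(row)
    return kept


# Hypothetical two-attribute rows: 19 ordinary records plus one extreme record.
rows = [[10.0 + (i % 3), 3.0 + 0.1 * (i % 2)] for i in range(19)]
rows.append([100.0, 3.0])
cleaned = remove_outlier_rows(rows)
print(len(rows), "->", len(cleaned))
```

The means and standard deviations are computed once on the full data, including the extreme values, so a single large outlier inflates the standard deviation and can mask milder ones.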
Data Normalization: We notice that a few of the features are on a different scale compared to the rest. Therefore, we need to normalize the data to make the model...
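The source text is cut off before it names the normalization method, so the sketch below assumes z-score standardization purely for illustration, followed by a plain kNN majority vote for k = 1, 2, and 3. All function names, feature choices, and data values are hypothetical, and the code is a Python stand-in for the SAS procedure used in the assignment.

```python
import math
import statistics
from collections import Counter


def zscore_fit(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid dividing by 0
    return means, stds


def zscore_apply(rows, means, stds):
    """Standardize rows using previously fitted means and deviations."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance).
    A tied vote goes to the class seen first, i.e. that of the nearest neighbor."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: [alcohol, total sulfur dioxide] and class labels.
train_x = [[9.0, 100.0], [9.5, 120.0], [9.2, 90.0],
           [12.0, 30.0], [11.5, 40.0], [12.5, 35.0]]
train_y = ["Low", "Low", "Low", "High", "High", "High"]

means, stds = zscore_fit(train_x)
train_z = zscore_apply(train_x, means, stds)
query_z = zscore_apply([[11.8, 38.0]], means, stds)[0]

predictions = {k: knn_predict(train_z, train_y, query_z, k) for k in (1, 2, 3)}
print(predictions)
```

The scaling parameters are fitted on the training rows only and then reused for the query, which keeps information about unseen records out of the model. Comparing k = 1, 2, and 3 on a real held-out set would use the same predict function with a misclassification count per k.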