Read Chapter 8. Complete Exercise questions 1, 2, 4, 5, 6, 7, 8, and 10 using Excel & Word. Please number the questions.

Answered 1 day after Aug 13, 2021


Saravana answered on Aug 14 2021
Chapter 8 Exercise questions
1. A predictive model’s performance is better quantified by the misclassification rate on the test dataset than by the misclassification rate on the training dataset, for the following reasons:
· An algorithm typically performs better on the training dataset used to build the model than on a new, randomly drawn test dataset. As a result, the misclassification rate on the training dataset is usually somewhat lower than the misclassification rate on the test dataset.
· The misclassification error on the training dataset can be misleading for certain classification algorithms, such as the 1-Nearest Neighbor classifier. When applied to the same data it was trained on, the 1-Nearest Neighbor classifier predicts every case correctly, because the nearest neighbor of each point is always the point itself. The classifier therefore obtains a misclassification error of 0%, which is misleading (Mierswa, 2017).
Mierswa, Ingo. (2017, January 10). Why You Should Ignore the Training Error. RapidMiner. https://rapidminer.com/blog/validate-models-ignore-training-errors/
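The 1-Nearest Neighbor point above can be checked with a small experiment. The following is a minimal sketch in plain Python, not the textbook's Excel workflow: the data set (two overlapping Gaussian classes), the helper names, and the sample sizes are all assumptions made for illustration.

```python
import random

def nearest_neighbor_predict(train_X, train_y, point):
    """Return the label of the training point closest to `point` (1-NN)."""
    best_index = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], point)),
    )
    return train_y[best_index]

def misclassification_rate(train_X, train_y, X, y):
    """Fraction of points in (X, y) that the 1-NN classifier labels wrongly."""
    wrong = sum(
        nearest_neighbor_predict(train_X, train_y, p) != label
        for p, label in zip(X, y)
    )
    return wrong / len(X)

def make_data(n, rng):
    """Hypothetical noisy data: two overlapping Gaussian classes in 2-D."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append((rng.gauss(label, 1.0), rng.gauss(label, 1.0)))
        y.append(label)
    return X, y

rng = random.Random(0)
train_X, train_y = make_data(200, rng)
test_X, test_y = make_data(200, rng)

# Scoring on the training data: every point is its own nearest neighbor.
train_error = misclassification_rate(train_X, train_y, train_X, train_y)
# Scoring on unseen data drawn from the same process.
test_error = misclassification_rate(train_X, train_y, test_X, test_y)

print(f"training misclassification rate: {train_error:.2%}")
print(f"test misclassification rate:     {test_error:.2%}")
```

Because the two classes overlap, the test error is well above zero even though the training error is exactly zero, which is why the training figure says little about how the model will perform on new data.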
2. The following are the risks of using the entire dataset instead of sampling the data into training and test datasets:
· A model trained on the entire dataset could carry its inherent bias forward into its measured performance.
· Some features in the data might favor one model over the others, making that model appear to perform better.
· Models with a tendency to overfit have a higher chance of overfitting when the entire dataset is used. Because of overfitting, these models show a high goodness of fit on the data they were built on but predict future data poorly, with much lower accuracy.
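The alternative to using the entire dataset is a hold-out split. Below is a minimal sketch, assuming a 70/30 split and a fixed random seed; the function name `train_test_split`, the fraction, and the stand-in records are hypothetical choices, not values taken from the exercise.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of `rows` and return (training rows, test rows)."""
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for 100 data records (hypothetical).
records = list(range(100))
train, test = train_test_split(records)

print(len(train), "training records,", len(test), "test records")
```

The model is then built only on `train` and scored only on `test`, so the reported misclassification rate reflects records the model has not seen.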
4. The 45-degree line is the reference line that represents the naïve model. A naïve model represents the number of expected “successes” we would predict if we selected only the most common category. In this example, the naïve model represents the number of correct classification successes. The other curve represents the cumulative gains, i.e. the cumulative number of true positives across the 10 deciles of the validation partition. The individuals most likely to take out loans are placed at the top of the data partition and this...
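To make the two curves concrete, here is a minimal sketch of how cumulative gains and the straight reference line could be computed. The 20 records, their scores, and the function names are hypothetical; the reference line is modeled here as positives accumulating at a constant rate, which is an assumption about how the chart's baseline is drawn.

```python
def cumulative_gains(actual, scores, n_bins=10):
    """Cumulative true positives after each bin, with records sorted by score (highest first)."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = round(n * b / n_bins)         # number of records in the top b bins
        gains.append(sum(ranked[:cutoff]))
    return gains

def baseline_gains(actual, n_bins=10):
    """Reference line: positives found at a constant rate, as with no ranking at all."""
    total = sum(actual)
    return [total * b / n_bins for b in range(1, n_bins + 1)]

# Hypothetical validation records: 1 = took out a loan, 0 = did not.
actual = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical model scores, already in descending order for readability.
scores = [(20 - i) / 20 for i in range(20)]

model_curve = cumulative_gains(actual, scores)
reference_line = baseline_gains(actual)

for decile, (gain, base) in enumerate(zip(model_curve, reference_line), start=1):
    print(f"decile {decile:2d}: model = {gain}, reference = {base:.1f}")
```

The further the model curve sits above the reference line in the early deciles, the better the model is at ranking likely positives first.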