Machine Learning, 2022S: HW6 (CS 5033 - Machine Learning: Homework 6, Spring 2022). Due: Sunday, May 1, 2022 (end of day).

Please, I need a Microsoft Word document just to highlight the answers (short answers, as reported by the code). So I need two things: the well-labelled code for each question, and a Microsoft Word document addressing the short answers for each question.


Machine Learning, 2022S: HW6
CS 5033 - Machine Learning: Homework 6, Spring 2022
Due: Sunday, May 1, 2022 (end of day)

We are coming back to the dataset that we used in homework 5. Namely, we will be using a UCI simulated electrical grid stability data set that is available here: https://archive.ics.uci.edu/ml/datasets/Electrical+Grid+Stability+Simulated+Data+. This dataset has 10,000 examples with 11 real-valued attributes and a binary target (stable vs. unstable). The target value that you are predicting is the last column in the dataset.

Remark 1 (Cross-Entropy). For the cross-entropy values that you report in the questions below, please use the following formula (empirical risk using the cross-entropy loss):

\hat{R}_S(h, c) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i) \right],

where m is the size of the sample S on which we evaluate our hypothesis h, y_i ∈ {0, 1} is the true label c(x_i) of the instance x_i, and p_i is the probability that our hypothesis assigns the positive label to instance x_i.

Exercise 1 – Preprocessing (10 points). You have already done this part in homework 4. However, since you may need to refresh your memory on what you did, this part is worth a few points.
(a) Remove columns 5 and 13 (labeled p1 and stab); p1 is non-predictive, and stab is a target column that is exactly correlated with the binary target you are trying to predict (if this column is negative, the system is stable).
(b) Change the target variable to a number. If the value is stable, change it to 1, and if the value is unstable, change it to 0.
(c) Remove 20% of the examples and keep them for testing. You may assume that all examples are independent, so it does not matter which 20% you remove. However, the testing data should not be used until after a model has been selected.
(d) Split the remaining examples into training (75%) and validation (25%). Thus, you will train with 60% of the full dataset (75% of 80%) and validate with 20% of the full dataset (25% of 80%).

Exercise 2 – Artificial Neural Network (20 points). You may use sklearn.neural_network.MLPClassifier.
(a) Fit an artificial neural network to the training data using 1 hidden layer of 20 units, as well as another neural network that has 2 hidden layers of 10 units each.
(b) For each model made in (a), make a probabilistic prediction for each validation example. Report the cross-entropies between the predictions and the true labels in your writeup.
(c) Which neural network performs the best on the validation data? Report this in your writeup. Train a new neural network, using the architecture that performed better among the two, on the training and validation data. Make a probabilistic prediction for each testing example using this model and save the predictions for later.

Exercise 3 – Decision Trees (20 points). For this problem you can use the scikit-learn method sklearn.tree.DecisionTreeClassifier.
(a) Fit a decision tree to the training data using the Gini impurity index and a max tree depth of 5.
(b) Using the model created in part (a), make a probabilistic prediction for each validation example. What is the cross-entropy between these predictions and the true labels? Put this value in your writeup.
(c) Fit a decision tree to the training data using information gain and a max tree depth of 5.
(d) Using the model created in part (c), make a probabilistic prediction for each validation example. What is the cross-entropy between these predictions and the true labels? Put this value in your writeup.
(e) Which model performed better on the validation data? Report this in your writeup. Train a new decision tree on the training and validation data using whichever measure created the best model in (a)-(d), with a max tree depth of 5. Make a probabilistic prediction for each testing example and save the predictions for later. (A code sketch covering Exercises 1-3 follows below.)
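One possible scaffold for Exercises 1-3, written in Python with scikit-learn, is sketched below. The file name Data_for_UCI_named.csv and the column names p1, stab, and stabf are assumptions taken from the UCI download page; adjust them to match your local copy of the data, and treat the whole sketch as a starting point rather than the required solution.

# Exercises 1-2 sketch; file and column names are assumptions from the UCI page.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def cross_entropy(y_true, p):
    # Empirical risk with the cross-entropy loss from Remark 1.
    p = np.clip(p, 1e-12, 1 - 1e-12)               # guard against log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Exercise 1: preprocessing
df = pd.read_csv("Data_for_UCI_named.csv")         # assumed file name
df = df.drop(columns=["p1", "stab"])               # (a) drop the non-predictive / leaking columns
y = (df["stabf"] == "stable").astype(int)          # (b) stable -> 1, unstable -> 0
X = df.drop(columns=["stabf"])

# (c) hold out 20% for testing; (d) split the rest 75/25 into training/validation
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Exercise 2: the two required architectures
architectures = {"1 hidden layer of 20": (20,), "2 hidden layers of 10": (10, 10)}
val_ce = {}
for name, hidden in architectures.items():
    mlp = MLPClassifier(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    mlp.fit(X_train, y_train)
    p_val = mlp.predict_proba(X_val)[:, 1]         # probability of the positive (stable) class
    val_ce[name] = cross_entropy(y_val.to_numpy(), p_val)
    print(name, "validation cross-entropy:", val_ce[name])

# Refit the better architecture on training + validation, then predict test probabilities
best = min(val_ce, key=val_ce.get)
final_mlp = MLPClassifier(hidden_layer_sizes=architectures[best], max_iter=2000, random_state=0)
final_mlp.fit(pd.concat([X_train, X_val]), pd.concat([y_train, y_val]))
mlp_test_probs = final_mlp.predict_proba(X_test)[:, 1]   # saved for Exercise 5

Exercise 3 follows the same fit / validate / refit pattern: swap the MLP for DecisionTreeClassifier(criterion="gini", max_depth=5) and DecisionTreeClassifier(criterion="entropy", max_depth=5) from sklearn.tree for parts (a) and (c), respectively.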
Exercise 4 – Boosting (20 points). For this problem you may use sklearn.ensemble.AdaBoostClassifier.
(a) Fit boosted decision stumps (max tree depth of 1) to the training data, allowing at most 20, 40, and 80 decision stumps (base estimators) in each model.
(b) For each model trained in (a), make a probabilistic prediction for each validation example. Report the cross-entropies between the predictions and the true labels in your writeup.
(c) Which upper bound on the number of allowed base classifiers generates the best-performing model? Report this in your writeup. Train a new AdaBoost classifier using this bound on the maximum number of allowed base classifiers, on the training and validation data. Make a probabilistic prediction for each testing example using this model and save the predictions for later.

Exercise 5 – ROC Curve (30 points). For this exercise you must write your own code; no scikit-learn, except perhaps to compute the AUC. For each model produced in Exercises 2-4, do the following (a code sketch follows this exercise):
(a) Determinize the testing predictions made above, using 1001 different probability thresholds (0.000, 0.001, 0.002, ..., 0.999, 1.000). "Determinization" means converting the probability to a deterministic class label (0 or 1). Use equation (1) below for determinization, where p^* is the critical threshold, p_i is the predicted probability for example i, and P_i is the resulting deterministic prediction:

P_i = \begin{cases} 1, & \text{if } p_i \ge p^* \\ 0, & \text{otherwise} \end{cases}    (1)

(b) At each of the 1001 probability thresholds, compute the true positive rate (TPR) and false positive rate (FPR). Recall that these values are easily computed from the confusion matrix. (You will have to re-calculate the confusion matrix for each one of these thresholds, for each model.)
(c) Plot the ROC (receiver operating characteristic) curve using the 1001 points created in part (b). If you have forgotten what a ROC curve looks like, see our notes on model evaluation. The ROC curve must contain a point at the bottom left (0, 0) and one at the top right (1, 1). It must also contain the dashed grey line indicating the performance of a random predictor. Include the ROC curve for each model in your write-up.
(d) Find the probability threshold yielding the highest Youden index (TPR - FPR). Report the Youden index and the corresponding probability threshold for each model.
(e) Compute the AUC (area under the curve) for each model. You may use the function sklearn.metrics.roc_auc_score for this part.
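Exercise 4 can reuse the same pattern as Exercise 2, fitting sklearn.ensemble.AdaBoostClassifier(n_estimators=k) for k in {20, 40, 80}; its default base estimator is already a depth-1 decision tree, i.e. a stump. For Exercise 5, a minimal from-scratch sketch is given below. It assumes y_test and one vector of test-set probabilities (for instance the hypothetical mlp_test_probs from the sketch above) are already in scope; repeat it for each of the three final models. Only the final AUC line uses scikit-learn, which the exercise allows.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score

def roc_points(y_true, probs):
    # TPR and FPR at each of the 1001 thresholds 0.000, 0.001, ..., 1.000, using equation (1).
    y_true = np.asarray(y_true)
    probs = np.asarray(probs)
    thresholds = np.linspace(0.0, 1.0, 1001)
    tpr, fpr = [], []
    for p_star in thresholds:
        pred = (probs >= p_star).astype(int)          # determinized labels P_i
        tp = np.sum((pred == 1) & (y_true == 1))
        fn = np.sum((pred == 0) & (y_true == 1))
        fp = np.sum((pred == 1) & (y_true == 0))
        tn = np.sum((pred == 0) & (y_true == 0))
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
    return thresholds, np.array(tpr), np.array(fpr)

thresholds, tpr, fpr = roc_points(y_test, mlp_test_probs)

# (c) ROC curve, forced through (0, 0) and (1, 1), with the random-predictor diagonal
plt.plot(np.r_[1.0, fpr, 0.0], np.r_[1.0, tpr, 0.0], label="model")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey", label="random predictor")
plt.xlabel("False positive rate (FPR)")
plt.ylabel("True positive rate (TPR)")
plt.legend()
plt.show()

# (d) Youden index and (e) AUC
youden = tpr - fpr
best_idx = int(np.argmax(youden))
print("best threshold:", thresholds[best_idx], "Youden index:", youden[best_idx])
print("AUC:", roc_auc_score(y_test, mlp_test_probs))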
Uhanya answered on Apr 26 2022
AUC-ROC Curve – The Star Performer!
You’ve built your machine learning model – so what’s next? You need to evaluate it and validate how good (or bad) it is, so you can then decide on whether to implement it. That’s where the AUC-ROC curve comes in.
The name might be a mouthful, but it is just saying that we are calculating the “Area Under the Curve” (AUC) of the “Receiver Operating Characteristic” (ROC). Confused? I feel you! I have been in your shoes. But don’t worry, we will see what these terms mean in detail and everything will be a piece of cake!
For now, just know that the AUC-ROC curve helps us visualize how well our machine learning classifier is performing. Although it works for only binary classification problems, we will see towards the end how we can extend it to evaluate multi-class classification problems too.
We’ll cover topics like sensitivity and specificity as well since these are key topics behind the AUC-ROC curve.
I suggest going through the article on Confusion Matrix as it will introduce some important terms which we will be using in this article.
Table of Contents
· What are Sensitivity and Specificity?
· Probability of Predictions
· What is the AUC-ROC Curve?
· How Does the AUC-ROC Curve Work?
· AUC-ROC in Python
· AUC-ROC for Multi-Class Classification
 
What are Sensitivity and Specificity?
This is what a confusion matrix looks like (rows are the actual class, columns the predicted class):

                      Predicted positive      Predicted negative
Actual positive       True Positive (TP)      False Negative (FN)
Actual negative       False Positive (FP)     True Negative (TN)
From the confusion matrix, we can derive some important metrics that were not discussed in the previous article. Let’s talk about them here.
 
Sensitivity / True Positive Rate / Recall
Sensitivity tells us what proportion of the positive class got correctly classified; in terms of the confusion-matrix counts, Sensitivity = TP / (TP + FN).
A simple example would be to determine what proportion of the actual sick people were correctly detected by the model.
 
False Negative Rate
False Negative Rate (FNR) tells us what proportion of the positive class got incorrectly classified by the classifier: FNR = FN / (TP + FN).
A higher TPR and a lower FNR are desirable, since we want to correctly classify the positive class; note that FNR = 1 - TPR, so the two always trade off exactly.
 
Specificity / True Negative Rate
Specificity tells us what proportion of the negative class got correctly classified: Specificity = TN / (TN + FP).
Taking the same example as in Sensitivity, Specificity would mean determining the proportion of healthy people (the negative class) who were correctly identified as healthy by the model.
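To make the arithmetic behind these three quantities concrete, here is a small Python sketch with made-up labels and predictions (1 = sick/positive, 0 = healthy/negative); it just reads the four confusion-matrix counts off the predictions and forms the ratios described above.

import numpy as np

# Made-up ground truth and hard predictions (1 = sick/positive, 0 = healthy/negative)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))   # sick people correctly flagged
fn = np.sum((y_pred == 0) & (y_true == 1))   # sick people missed
fp = np.sum((y_pred == 1) & (y_true == 0))   # healthy people falsely flagged
tn = np.sum((y_pred == 0) & (y_true == 0))   # healthy people correctly cleared

sensitivity = tp / (tp + fn)    # true positive rate / recall: 3 / 4 = 0.75
fnr = fn / (tp + fn)            # false negative rate: 1 / 4 = 0.25 (equals 1 - sensitivity)
specificity = tn / (tn + fp)    # true negative rate: 5 / 6 ≈ 0.83

print(sensitivity, fnr, specificity)

The ROC curve that the rest of this article builds towards simply plots these per-threshold quantities against each other: as the probability threshold is swept from high to low, each threshold yields one (1 - specificity, sensitivity) point, i.e. one (FPR, TPR) point on the curve.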