Hello, I have another ML Python assignment that I will not have time to fully complete. I have...

Question

Hello, I have another ML Python assignment that I will not have time to fully complete. I have attached the Word doc with all of the questions that need to be answered, along with the assignment file that it needs to be completed in (assignment.ipynb). The data can be downloaded from https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

Regards, Ed

Assignment: Decision Trees

Learning outcomes
· Understand how to use decision trees on a dataset to make a prediction
· Learn hyper-parameter tuning for decision trees by using RandomGrid
· Learn the effectiveness of ensemble algorithms (Random Forest, AdaBoost, Extra Trees Classifier, Gradient Boosted Trees)

In the first part of this assignment, you will use Classification Trees for predicting if a user has a default payment option active or not. You can find the necessary data for performing this assignment here.

This dataset is aimed at the case of customer default payments in Taiwan. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel Sorting Smoothing Method to estimate the real probability of default.

Required imports for this project are given below. Make sure you have all libraries required for this project installed. You may use conda or pip based on your setup.

NOTE: Since the data is in Excel format, you need to install xlrd in order to read the Excel file into your pandas dataframe. You can run pip install xlrd to install it.

Questions (15 points total)

Question 1 (2 pts)
Build a classifier by using a decision tree and calculate the confusion matrix. Try different hyper-parameters (at least two) and discuss the result.

Question 2 (4 pts)
Try to build the decision tree which you built for the previous question, but this time by RandomGrid search over hyper-parameters. Compare the results.

Question 3 (6 pts)
Try to build the same classifier by using the following ensemble models. For each of these models calculate accuracy, and for at least two in the list below, plot the learning curves.
· Random Forest
· AdaBoost
· Extra Trees Classifier
· Gradient Boosted Trees

Question 4 (3 pts)
Discuss and compare the results for all of the past three questions.
· How does changing hyperparameters affect model performance?
· Why do you think certain models performed better/worse?
· How does this performance line up with known strengths/weaknesses of these models?
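
For Questions 1 and 2, a minimal sketch of the workflow might look like the following. It is not the assignment's reference solution: it assumes scikit-learn and SciPy are installed, uses a synthetic dataset from make_classification as a stand-in for the credit card data, and reads "RandomGrid" as scikit-learn's RandomizedSearchCV, which is an assumption. All variable names and parameter ranges are illustrative.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit card data (assumption): 23 features
# and an imbalanced binary target.
X, y = make_classification(n_samples=2000, n_features=23, n_informative=8,
                           weights=[0.78, 0.22], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Question 1: fit one tree and inspect its confusion matrix.
baseline = DecisionTreeClassifier(max_depth=4, random_state=0)
baseline.fit(X_train, y_train)
baseline_cm = confusion_matrix(y_test, baseline.predict(X_test))
print(baseline_cm)

# Question 2: sample 20 hyper-parameter combinations at random instead of
# trying every combination; randint(2, 9) draws integers from 2 to 8.
param_distributions = {
    "criterion": ["gini", "entropy"],
    "max_depth": randint(2, 9),
    "min_samples_leaf": randint(2, 9),
}
search = RandomizedSearchCV(DecisionTreeClassifier(random_state=0),
                            param_distributions=param_distributions,
                            n_iter=20, cv=3, random_state=0)
search.fit(X_train, y_train)

# best_estimator_ is already refit on the full training split.
tuned_pred = search.best_estimator_.predict(X_test)
tuned_accuracy = accuracy_score(y_test, tuned_pred)
print(search.best_params_)
print(confusion_matrix(y_test, tuned_pred))
print(tuned_accuracy)
```

Comparing the two confusion matrices row by row shows whether tuning changes how many defaulters (the minority class) are caught, which accuracy alone can hide on imbalanced data.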

assignment-o3vkr3cn.docx assignment-55f3g4mw.ipynb
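
For the ensemble comparison and learning curves asked for in Question 3, one possible sketch is shown below. It assumes scikit-learn is installed and again uses synthetic data from make_classification rather than the real dataset; the n_estimators value, the training-size grid, and all variable names are illustrative assumptions, not values taken from the assignment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import learning_curve, train_test_split

# Synthetic stand-in for the credit card data (assumption).
X, y = make_classification(n_samples=1000, n_features=23, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50,
                                                    random_state=0),
}

# Test-set accuracy for each ensemble; ravel() keeps y one-dimensional.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, np.ravel(y_train))
    accuracies[name] = model.score(X_test, y_test)
    print(f"{name}: {accuracies[name]:.3f}")

# Learning curve for one model: cross-validated scores at growing
# training-set sizes. Rows are sizes, columns are CV folds.
train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X_train, y_train, train_sizes=np.linspace(0.2, 1.0, 5),
    cv=3, scoring="accuracy")
for n, tr, va in zip(train_sizes, train_scores.mean(axis=1),
                     val_scores.mean(axis=1)):
    print(f"n={n}: train={tr:.3f} validation={va:.3f}")
```

The per-size means can be passed to matplotlib's plt.plot(train_sizes, ...) to draw the curves; a large, persistent gap between the training and validation lines suggests overfitting.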

Suraj · Accepted Answer

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "9OBvBOCkPrga"
   },
   "source": [
    "## Assignment 4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "bEmSTWZSPrgb"
   },
   "source": [
    "This assignment is based on content discussed in module 8 and uses Decision Trees and Ensemble Models in classification and regression problems."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "1cUoTzQLPrgc"
   },
   "source": [
    "## Learning outcomes "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Q1ygYVo_Prgc"
   },
   "source": [
    "- Understand how to use decision trees on a Dataset to make a prediction\n",
    "- Learning hyper-parameters tuning for decision trees by using RandomGrid \n",
    "- Learning the effectiveness of ensemble algorithms (Random Forest, Adaboost, Extra trees classifier, Gradient Boosted Tree)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "9hjVbQlVPrgd"
   },
   "source": [
    "In the first part of this assignment, you will use Classification Trees for predicting if a user has a default payment option active or not. You can find the necessary data for performing this assignment [here](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients) \n",
    "\n",
    "This dataset is aimed at the case of customer default payments in Taiwan. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel Sorting Smoothing Method to estimate the real probability of default.\n",
    "\n",
    "Required imports for this project are given below. Make sure you have all libraries required for this project installed. You may use conda or pip based on your setup.\n",
    "\n",
    "__NOTE:__ Since the data is in Excel format, you need to install `xlrd` in order to read the Excel file into your pandas dataframe. You can run `pip install xlrd` to install it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "R376ZBnBPrge"
   },
   "outputs": [],
   "source": [
    "#required imports\n",
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ddF9R5pdPrgi"
   },
   "source": [
    "After installing the necessary libraries, proceed to download the data. Since reading the excel file won't create headers by default, we added two more operations to substitute the columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "CtNCjjr7Prgj"
   },
   "outputs": [],
   "source": [
    "#loading the data\n",
    "dataset = pd.read_excel(\"https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls\")\n",
    "#dataset.columns = dataset.iloc[0]\n",
    "#dataset.drop(['ID'], inplace=True)\n",
    "# drop the unnamed first column that read_excel creates\n",
    "dataset.drop(dataset.columns[dataset.columns.str.contains('unnamed',case = False)],axis = 1, inplace = True)\n",
    "# drop row 0 so the frame starts at index 1\n",
    "# (drop(..., inplace=True) returns None, so there is nothing useful to print)\n",
    "dataset.drop(0,inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "cMh-sEIdPrgl"
   },
   "source": [
    "In the following, you can take a look into the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "E0lAPOXQPrgl",
    "outputId": "ea66ba57-f32c-4b39-c60a-e52402acbca1"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "        X1 X2 X3 X4  X5  X6  X7  X8  X9 X10  ...     X15     X16     X17  \\\n",
       "1    20000  2  2  1  24   2   2  -1  -1  -2  ...       0       0       0   \n",
       "2   120000  2  2  2  26  -1   2   0   0   0  ...    3272    3455    3261   \n",
       "3    90000  2  2  2  34   0   0   0   0   0  ...   14331   14948   15549   \n",
       "4    50000  2  2  1  37   0   0   0   0   0  ...   28314   28959   29547   \n",
       "5    50000  1  2  1  57  -1   0  -1   0   0  ...   20940   19146   19131   \n",
       "6    50000  1  1  2  37   0   0   0   0   0  ...   19394   19619   20024   \n",
       "7   500000  1  1  2  29   0   0   0   0   0  ...  542653  483003  473944   \n",
       "8   100000  2  2  2  23   0  -1  -1   0   0  ...     221    -159     567   \n",
       "9   140000  2  3  1  28   0   0   2   0   0  ...   12211   11793    3719   \n",
       "10   20000  1  3  2  35  -2  -2  -2  -2  -1  ...       0   13007   13912   \n",
       "\n",
       "      X18    X19    X20    X21    X22    X23  Y  \n",
       "1       0    689      0      0      0      0  1  \n",
       "2       0   1000   1000   1000      0   2000  1  \n",
       "3    1518   1500   1000   1000   1000   5000  0  \n",
       "4    2000   2019   1200   1100   1069   1000  0  \n",
       "5    2000  36681  10000   9000    689    679  0  \n",
       "6    2500   1815    657   1000   1000    800  0  \n",
       "7   55000  40000  38000  20239  13750  13770  0  \n",
       "8     380    601      0    581   1687   1542  0  \n",
       "9    3329      0    432   1000   1000   1000  0  \n",
       "10      0      0      0  13007   1122      0  0  \n",
       "\n",
       "[10 rows x 24 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "r4jchSRoPrgr"
   },
   "source": [
    "## Questions (15 points total)\n",
    "\n",
    "#### Question 1 (2 pts)\n",
    "Build a classifier by using decision tree and calculate the confusion matrix. Try different hyper-parameters (at least two) and discuss the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "1Qr1SPGlPrgr"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 30000 entries, 1 to 30000\n",
      "Data columns (total 24 columns):\n",
      "X1     30000 non-null object\n",
      "X2     30000 non-null object\n",
      "X3     30000 non-null object\n",
      "X4     30000 non-null object\n",
      "X5     30000 non-null object\n",
      "X6     30000 non-null object\n",
      "X7     30000 non-null object\n",
      "X8     30000 non-null object\n",
      "X9     30000 non-null object\n",
      "X10    30000 non-null object\n",
      "X11    30000 non-null object\n",
      "X12    30000 non-null object\n",
      "X13    30000 non-null object\n",
      "X14    30000 non-null object\n",
      "X15    30000 non-null object\n",
      "X16    30000 non-null object\n",
      "X17    30000 non-null object\n",
      "X18    30000 non-null object\n",
      "X19    30000 non-null object\n",
      "X20    30000 non-null object\n",
      "X21    30000 non-null object\n",
      "X22    30000 non-null object\n",
      "X23    30000 non-null object\n",
      "Y      30000 non-null object\n",
      "dtypes: object(24)\n",
      "memory usage: 5.7+ MB\n",
      "[[14306  3261]\n",
      " [ 2883  2050]]\n",
      "<class 'numpy.ndarray'>\n",
      "<class 'numpy.ndarray'>\n",
      "[[16868   699]\n",
      " [ 3316  1617]]\n",
      "[[16669   898]\n",
      " [ 3122  1811]]\n"
     ]
    }
   ],
   "source": [
    "# YOUR CODE HERE\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import accuracy_score,confusion_matrix\n",
    "dataset.info()\n",
    "dataset.describe()\n",
    "# dividing data into independent (X1..X23) and dependent (Y) variables\n",
    "ind=dataset.iloc[:,0:23].values\n",
    "dep=dataset.iloc[:,23:24].values\n",
    "dep=dep.astype('int')\n",
    "# splitting data into train and test sets\n",
    "# note: test_size=0.75 trains on 25% of the rows and tests on the remaining 75%\n",
    "x_train,x_test,y_train,y_test=train_test_split(ind,dep,test_size=0.75,random_state=0)\n",
    "# building a baseline model with default hyper-parameters\n",
    "tree=DecisionTreeClassifier()\n",
    "tree.fit(x_train,y_train)\n",
    "pred=tree.predict(x_test)\n",
    "print(confusion_matrix(y_test,pred))\n",
    "# first alternative: entropy criterion with a shallow tree (max_depth=2)\n",
    "tree=DecisionTreeClassifier(criterion='entropy',max_depth=2,min_samples_leaf=1,min_samples_split=2)\n",
    "tree.fit(x_train,y_train)\n",
    "pred=tree.predict(x_test)\n",
    "print(type(pred))\n",
    "print(type(y_test))\n",
    "print(confusion_matrix(y_test,pred))\n",
    "# second alternative: gini criterion with a deeper tree and larger leaves\n",
    "tree=DecisionTreeClassifier(criterion='gini',max_depth=4,min_samples_leaf=2,min_samples_split=3)\n",
    "tree.fit(x_train,y_train)\n",
    "pred=tree.predict(x_test)\n",
    "print(confusion_matrix(y_test,pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "QwcecRukPrgw"
   },
   "source": [
    "#### Question 2 (4 pts)\n",
    "\n",
    "Try to build the decision tree which you built for the previous question, but this time by RandomGrid search over hyper-parameters. Compare the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "4XHRmsWOPrgx"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'numpy.ndarray'>\n",
      "<class 'numpy.ndarray'>\n",
      "[[16654   913]\n",
      " [ 3112  1821]]\n"
     ]
    }
   ],
   "source": [
    "# YOUR CODE HERE\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "# exhaustive grid over criterion, depth and leaf size, scored with 3-fold cross-validation\n",
    "parameters = {'criterion':('gini','entropy'),'max_depth':(2,3,4,5,6,7,8),'min_samples_leaf':(2,3,4,5,6,7,8)}\n",
    "grid=GridSearchCV(DecisionTreeClassifier(),param_grid=parameters,cv=3)\n",
    "grid_model=grid.fit(x_train,y_train)\n",
    "grid_model.best_estimator_\n",
    "# refit a tree with the selected settings; the other arguments of the original call were defaults,\n",
    "# and min_impurity_split and presort are no longer accepted by recent scikit-learn releases\n",
    "tree=DecisionTreeClassifier(criterion='gini', max_depth=3,\n",
    "            min_samples_leaf=4, min_samples_split=2)\n",
    "tree.fit(x_train,y_train)\n",
    "pred=tree.predict(x_test)\n",
    "print(type(pred))\n",
    "print(type(y_test))\n",
    "print(confusion_matrix(y_test,pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "dEvsYwiXPrg3"
   },
   "source": [
    "#### Question 3 (6 pts)\n",
    "\n",
    "Try to build the same classifier by using following ensemble models. For each of these models calculate accuracy and at least for two in the list below, plot the learning curves.\n",
    "\n",
    "* Random Forest \n",
    "* AdaBoost\n",
    "* Extra Trees Classifier \n",
    "* Gradient Boosted Trees \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "J8S4UaKdPrg3"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "H:\\Anaconda\\lib\\site-packages\\sklearn\\ensemble\\forest.py:246: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.\n",
      "  \"10 in version 0.20 to 100 in 0.22.\", FutureWarning)\n",
      "H:\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:5: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "  \"\"\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.8035111111111111\n"
     ]
    },
    {
     "data": {
      "image/png":
