CISC 3440 Fall 2021 Lab 4 Description Overview: Prepare a jupyter notebook showing the end to end process for analyzing a dataset with ML algorithms. Dataset selection: You may go at your own speed...

https://github.com/ageron/handson-ml2/blob/master/05_support_vector_machines.ipynb
https://github.com/ageron/handson-ml2/blob/master/06_decision_trees.ipynb
https://colab.research.google.com/drive/1M7X18ZPq5N6Xo6JctdY80jlSh-dNugFb?usp=sharing







CISC 3440 Fall 2021 Lab 4 Description

Overview: Prepare a jupyter notebook showing the end to end process for analyzing a dataset with ML algorithms.

Dataset selection: You may go at your own speed with either easy or challenging.
· Easy [0]: Run the code examples from the textbook using the IRIS dataset and Logistic Regression, Support Vector Machines, and Decision Trees algorithms. Please include citations to the source you are copying from.
· Challenging: Pick a dataset you have not used before.
  · Scikit Learn Datasets https://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
  · TensorFlow catalog of data https://www.tensorflow.org/datasets/catalog

[0: See note below.]

Algorithms: The suggested theme is classifiers, since we worked on regression for the last lab. The choice of algorithm depends on the problem, so consider this after you’ve selected a dataset.

Minimal Requirements [1] in your submission:

[1: One factor in determining grades is the suitability of the project completed. It is possible to complete this lab satisfactorily and receive a grade below ‘A’ level (90-100%), if the work shown is a simple application of elementary techniques.]

Use a jupyter notebook (i.e., local version saved to GitHub or Google colab). Within the notebook, include the following elements:
· Introduction paragraph to the dataset and overarching data analysis question, such as:
  · Where did you get the data? And how? What are features available with the data?
  · What is the question you want to answer through analysis?
  · What preparation of data did you do, if any? And how?
· Use three different algorithms from Scikit-Learn
· Paragraph briefly explaining algorithm and another paragraph on results of each model
· Plot the data and results
· Annotate the chart to help the reader understand what you’re showing
· Comments explaining each line of code
· A concluding paragraph that describes your ranking of results from good to worse
· Final reflective paragraph(s) describing the experience and what you’ve learned.
· Citations

Due Date: Monday, October 18, 2021 before class
Submit your final notebook URL to this form https://forms.gle/WdJhpQT3EtjjchMk9
A gallery of interesting Jupyter Notebooks
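The requirement to use three Scikit-Learn algorithms and rank their results could be approached roughly as in the sketch below. It is only an illustration under stated assumptions: the 70/30 stratified split, the hyperparameters, and the use of test-set accuracy as the ranking criterion are choices made here for the example, not part of the assignment text.

```python
# Minimal sketch: fit three classifiers on Iris and rank them by held-out accuracy.
# The split ratio, seeds, and hyperparameters are assumptions for illustration.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Hold out 30% of the samples so each model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                   # train on the training split
    scores[name] = model.score(X_test, y_test)    # mean accuracy on the test split

# Sort from best to worst accuracy for the concluding ranking paragraph.
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for name, acc in ranking:
    print(f"{name}: {acc:.3f}")
```

Accuracy on a single split is a coarse criterion for a dataset this small; cross-validation would give a steadier ranking.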
Answered 1 day after Oct 16, 2021


Sathishkumar answered on Oct 18 2021
solutions/code/Decision_Tree.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "371c6b02",
"metadata": {},
"outputs": [],
"source": [
"#Decision Tree Classifier for Iris Dataset"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "72457dc4",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_iris\n",
"from sklearn import tree\n",
"iris = load_iris()\n",
"X, y = iris.data, iris.target\n",
"clf = tree.DecisionTreeClassifier()\n",
"clf = clf.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ee3302ae",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"[Text(167.4, 199.32, 'X[3] <= 0.8\\ngini = 0.667\\nsamples = 150\\nvalue = [50, 50, 50]'),\n",
" Text(141.64615384615385, 163.07999999999998, 'gini = 0.0\\nsamples = 50\\nvalue = [50, 0, 0]'),\n",
" Text(193.15384615384616, 163.07999999999998, 'X[3] <= 1.75\\ngini = 0.5\\nsamples = 100\\nvalue = [0, 50, 50]'),\n",
" Text(103.01538461538462, 126.83999999999999, 'X[2] <= 4.95\\ngini = 0.168\\nsamples = 54\\nvalue = [0, 49, 5]'),\n",
" Text(51.50769230769231, 90.6, 'X[3] <= 1.65\\ngini = 0.041\\nsamples = 48\\nvalue = [0, 47, 1]'),\n",
" Text(25.753846153846155, 54.359999999999985, 'gini = 0.0\\nsamples = 47\\nvalue = [0, 47, 0]'),\n",
" Text(77.26153846153846, 54.359999999999985, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 0, 1]'),\n",
" Text(154.52307692307693, 90.6, 'X[3] <= 1.55\\ngini = 0.444\\nsamples = 6\\nvalue = [0, 2, 4]'),\n",
" Text(128.76923076923077, 54.359999999999985, 'gini = 0.0\\nsamples = 3\\nvalue = [0, 0, 3]'),\n",
" Text(180.27692307692308, 54.359999999999985, 'X[0] <= 6.95\\ngini = 0.444\\nsamples = 3\\nvalue = [0, 2, 1]'),\n",
" Text(154.52307692307693, 18.119999999999976, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 2, 0]'),\n",
" Text(206.03076923076924, 18.119999999999976, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 0, 1]'),\n",
" Text(283.2923076923077, 126.83999999999999, 'X[2] <= 4.85\\ngini = 0.043\\nsamples = 46\\nvalue = [0, 1, 45]'),\n",
" Text(257.53846153846155, 90.6, 'X[1] <= 3.1\\ngini = 0.444\\nsamples = 3\\nvalue = [0, 1, 2]'),\n",
" Text(231.7846153846154, 54.359999999999985, 'gini = 0.0\\nsamples = 2\\nvalue = [0, 0, 2]'),\n",
" Text(283.2923076923077, 54.359999999999985, 'gini = 0.0\\nsamples = 1\\nvalue = [0, 1, 0]'),\n",
" Text(309.04615384615386, 90.6, 'gini = 0.0\\nsamples = 43\\nvalue = [0, 0, 43]')]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAADnCAYAAAC9roUQAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAA4+0lEQVR4nO3de3xU1b3w/8+CBAeacktCIgQIhQoUOQgECRQqNyMKPQ9IpD3Kw6v04ZHCCT8jIOIlFKQqPBkgNOA5HEODoEA0PT3mYJWAl4KlAbkZUJF6IIYQMkLIhUqGmYT1+2MyQwJJyGVmzyXf9+s1L2Y2+7K+WWvWrL322msrrTVCCCGM0cbbCRBCiNZEKl0hhDCQVLpCCGEgqXSFEMJAUukKIYSBpNIVQggDSaUrhBAGkkpXCCEMJJWuEEIYSCpdIYQwUJC3EyACT/v27YusVmuEt9PhDiaTyVJRURHp7XSIwKFk7gXhbkopHSjlSimF1lp5Ox0icEj3ghBCGEgqXSGEMJD06QqvSk9PZ/z48bzxxhv07t2byMhI7r77bj766CO6devG8OHD2bVrFytWrKi1XVVVFW3btq1zn5mZmXzzzTeMHj2an/3sZ2itSUpKorS0lJUrVxIaGmpAZELUTVq6wqtmz57N008/zWOPPQZATEwMQ4YMoaSkhBs3bjBgwABCQkJc6xcUFLBhwwZSUlIASElJISUlhY0bN7rW+fbbb1m2bBmfffaZa5nFYqG8vJyOHTsaE5gQ9ZBKV3hVeXk5ISEhXLlypdbyl156ifLy8tvWX7x4MT169CAxMbHRx7h27RqTJk3i8ccf54svvmhpkoVoEal0hVelpaWxefNmPv74Y5wjHj744ANWrlyJyWS6bf2MjAzuvfdeUlNTAUhMTCQxMZGEhATXOj179mT16tXExMSQmZlJUFAQ+/bt47333iM6OtqQuISojwwZE27X3CFjH330EQATJkxwLTt9+jRHjx7liSeecFv6mkKGjAl3kwtpwmdUVlYSFxfn+lxcXMz333/fYIW7bt06iouLWbRoEaGhoVy7do3169fTr18/fv7zn7veDx06lD//+c8cPnyYHTt2GBGOEHWSSld4VVpaGna7nYKCArp06UK7du3Yv38/FRUVzJs3j3PnzjF8+HAKCgrIzMwEIDo6mmnTpgGgtebJJ5/kk08+YcaMGezduxer1Yrdbq/1/p577iEiIoKgICnywrukT1d4lcViYf78+bRpc7MoTp48udnDuux2O+PHjycvL6/We4CsrCx+/vOfuyPZQjSb/OwLr+rWrRuvvfYaVVVVrmU1K2CnqKioOkcsaK3ZvHkzixYtIjMzk3HjxvHyyy8TEhJS6z04hpL17t3bY7EI0RhyIU24XVMupJ08eZLs7Gz69+/P1KlTPZyyppMLacLdpNIVbicT3ghRP+nTFX7BbDY3a7ulS5e6LsBNnz6dN998E7vdzrJly1i1apU7kyhEo0ifrjBcWloa169fJz4+noyMDGw2G2FhYRQUFGCxWOjcuTNxcXFs376dUaNG0aNHDwBOnDhBVlYWnTp1IiIigpKSEmbOnEloaCg5OTnk5OQAMGbMGGJiYgBYsGABR44cASA0NJR//OMffP755zz88MPk5eVRXFwsczEIQ0lLVxiuV69eXL16FavVilLKNbpg7ty5dO/eneXLl5Obm0tkZCRz5sxxVZr79u0jKioKm81Gnz59KCsr48aNG40+blpaGmVlZVRUVKCU9BgI75CWrjBcWVkZNpuN/Px8goODsdlsAAQFBREcHOzsR8VisbBu3TqGDRvGqVOnmDhxIrt372bgwIGUlJQQFBSExWIhPDyc2NhYYmNjbzvWO++8w5dffsmoUaPYvn07RUVFjBw5kqSkJDp06CCtXGE4uZAm3M5dF9LMZjNLlixxQ4qaTy6kCXeTSle4nYxeEKJ+0qcrDNfckQiJiYkUFRWxcOFCUlJSsFqtrF
u3jhdeeIHi4uJa67ZkHbPZ7OpHFsLdpNIVHmM2m6msrGTTpk3s3LmThQsX8o9//MP1f85/MzIyWLNmDVlZWYBjjl3n5OTbtm1z7S8qKorIyEjCw8MpKytz9f06516oqSXrOEc+COEJUukKj4mIiCAjI4Nx48ZRXl6OyWTi7NmztdapqqoiJyeHiIgIysrKGrXf5cuX89BDD7mmgnSy2+1uWUcIT5JKV3jMlClT2LJlC4MGDaKoqAittWuIV9euXV1DuEaOHElpaSl9+/YFoGPHjq7JyWfPnn3bflNTU9m+fTtDhgxxzb3wwAMPuCY2b+k6QniSXEgTbuepC2kbN24kPj6eyMjIOv//8uXLhIWFNbiPxqyzdetWxo4dS9++feVCmnA7qXSF28noBSHqJzdHCLczmUwWpVSEt9PhDiaTyeLtNIjAIi1d4ROU477c/wK+1lovdeN+fwGsBIZrrb93136FaC6pdIVPUErNA+YBo7TW19287zeACq31b9y5XyGaQypd4XVKqQHAAeBnWuuvPLD/jsBxYJHW+l1371+IppBKV3iVUqod8Dfgda31v3vwOKOB/wQKgQla61JPHUuIhsg4XeFtLwEXgM0ePk44UA78pPolhFdIpSu8QinVTik1HvjfwP8xYIzZn4ENOMr8BA8fS4h6SfeCMFz1SIVCwA48qbX+wMBjBwFVATOQWPgdaekKb7gb6AYo4J+MPLDWulIqXOFNcnOE8IYBwHXgd8AWL6dFCENJ94Lwa+3bty+yWq1+ffebyWSyVFRU1D2hhAg4UukKvxYI8zzI/A6ti/TpCiGEgaTSbaXat29fpJTS/vZq3759UVNjTU9PJy8vj5UrV7J161Y++OADPv/8c9avX89bb73F6dOnWbFixW3bVVVV1bvPgwcPMmvWLNfna9eukZKSwpQpU7h69SrTp0/nzTffbGpSRSsgF9JaKavVGuGPp+XNmb1s9uzZxMfH8/LLL3P48GFiYmIICwvjj3/8I2FhYQwYMICQkBDX+gUFBfzxj3+ksrKSxYsXk5KSAjgeEZ+QkADA6NGjOXjwoGubDh06kJiYyNWrV/nhD39IaGio69FEQtQkLV3RJNnZ2bU+FxcXc/To0Qa3ufXBkF999RUrVqwgPT3dY+msqby8nJCQEK5cuVJr+UsvvUR5eflt6y9evJgePXqQmJjYpOPk5+cTHR0N4HoqRkVFRXOTLQKUtHTFHaWlpWG32ykoKKBLly60a9eO/fv3U1FRwbx58zh37hzDhw+noKCAzMxMAKKjo5k2bRpArYc+zpgxgz179pCUlORqQRqR/s2bN7N27VqioqIA+OCDDzh06JDrc00ZGRmcPn2a1NRU12ODbvXFF19w4MABhg0bxpUrV4iPjycrK4tZs2ZRXFzM66+/TlFREe3bt/d0eMLPSKUr7shisfDCCy+QlJTkWjZ58mT279/vxVQ13jPPPANAUlISH330Ebm5uUyePJnJkycDcPr0ae6+++5a2wwYMIABAwbUu89Bgwbx7ru1Jyxzdj0ALFu2zF3JFwFGKl1xR926deO1116rdWGpTZvbe6aioqLqbBU6H/q4aNEiMjMziYuLY9WqVfTs2dOTya7ThAm1p13Izs4mLi7OVcEWFxeTl5fH8OHD693HunXrKC4uZtGiRYSGhvLJJ5/w3nvvMWXKFMaNG+fJ5IsAION0W6mmjG89efIk2dnZ9O/fn6lTp3o4ZQ27dUxrc8bp3tpdEhMTU6u75MiRI8THx9fbXbJ27Vri4+M5cuQIM2bM4NChQ7z77ruMGzeOuLi4FsckAptcSBN3NHjwYBYvXuz1CtddLBYL8+fPr9Vanzx5MqGhoc3a38iRI3nllVf461//6q4kigAm3QvCLcxmM0uWLGnydtOnT
\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"tree.plot_tree(clf)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e4c0bec",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ea394979",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"|--- petal width (cm) <= 0.80\n",
"| |--- class: 0\n",
"|--- petal width (cm) > 0.80\n",
"| |--- petal width (cm) <= 1.75\n",
"| | |--- class: 1\n",
"| |--- petal width (cm) > 1.75\n",
"| | |--- class: 2\n",
"\n"
]
}
],
"source": [
"from sklearn.datasets import load_iris\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.tree import export_text\n",
"iris = load_iris()\n",
"decision_tree = DecisionTreeClassifier(random_state=0, max_depth=2)\n",
"decision_tree = decision_tree.fit(iris.data, iris.target)\n",
"r = export_text(decision_tree, feature_names=iris['feature_names'])\n",
"print(r)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "972d396e",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
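The `gini` values in the `plot_tree` output of the Decision_Tree notebook can be checked by hand from the `value` counts at each node, using the Gini impurity 1 - sum(p_k^2) over the class proportions p_k. The helper below is a small sketch written for this check; the function name `gini` is illustrative and does not come from the notebook or from Scikit-Learn.

```python
# Sketch: recompute the Gini impurity shown at tree nodes from their class counts.
# `gini` is a hypothetical helper name used only for this illustration.
def gini(counts):
    """Return 1 - sum(p_k^2), where p_k is the share of class k in the node."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Class counts copied from the `value = [...]` fields in the plot_tree output.
nodes = {
    "root": [50, 50, 50],
    "pure setosa leaf": [50, 0, 0],
    "petal width > 0.8": [0, 50, 50],
    "petal width <= 1.75": [0, 49, 5],
}

for label, counts in nodes.items():
    print(f"{label}: gini = {gini(counts):.3f}")
```

A node with a single class has impurity 0, which is why every leaf in the fully grown tree reports `gini = 0.0`.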
solutions/code/Logistic_Regression.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "fc4ac432",
"metadata": {},
"outputs": [],
"source": [
"#Logistic Regression for Iris Dataset"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "da03addd",
"metadata": {},
"outputs": [],
"source": [
"# import necessary packages\n",
"\n",
"import numpy as np # linear algebra\n",
"import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn import datasets\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "35371856",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename']\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
":8: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.\n",
"Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations\n",
" y = (iris[\"target\"]==2).astype(np.int)\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"iris = datasets.load_iris()\n",
"\n",
"print(list(iris.keys()))\n",
"\n",
"\n",
"\n",
"X = iris[\"data\"][:,3:] # petal width\n",
"y = (iris[\"target\"]==2).astype(np.int)\n",
"log_reg = LogisticRegression(penalty=\"l2\")\n",
"log_reg.fit(X,y)\n",
"\n",
"X_new = np.linspace(0,3,1000).reshape(-1,1)\n",
"y_proba = log_reg.predict_proba(X_new)\n",
"\n",
"plt.plot(X,y,\"b.\")\n",
"plt.plot(X_new,y_proba[:,1],\"g-\",label=\"Iris-Virginica\")\n",
"plt.plot(X_new,y_proba[:,0],\"b--\",label=\"Not Iris-Virginca\")\n",
"plt.xlabel(\"Petal width\", fontsize=14)\n",
"plt.ylabel(\"Probability\", fontsize=14)\n",
"plt.legend(loc=\"upper left\", fontsize=14)\n",
"plt.show()\n",
"\n",
"xx=log_reg.predict([[1.7],[1.5]])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "bceae04e",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"iris = datasets.load_iris()\n",
"\n",
"X = iris[\"data\"][:,(2,3)] # petal length, petal width\n",
"y = iris[\"target\"]\n",
"\n",
"softmax_reg = LogisticRegression(multi_class=\"multinomial\", solver=\"lbfgs\", C=5)\n",
"mdl=softmax_reg.fit(X,y)\n",
"print(mdl)\n",
"X_new = np.linspace(0,3,1000).reshape(-1,2)\n",
"\n",
"plt.plot(X[:, 0][y==1], X[:, 1][y==1], \"y.\", label=\"Iris-Versicolor\")\n",
"plt.plot(X[:, 0][y==0], X[:, 1][y==0], \"b.\", label=\"Iris-Setosa\")\n",
"\n",
"plt.legend(loc=\"upper left\", fontsize=14)\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "af18768b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
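The first plot in the Logistic_Regression notebook shows two probability curves that cross where the model is undecided. For a one-feature logistic regression that crossing point is where the linear score is zero, i.e. petal width = -intercept / coefficient. The sketch below recomputes it under the assumption of default `LogisticRegression` settings (the notebook's `penalty="l2"` is the default) and uses the builtin `int` in place of the deprecated `np.int` alias.

```python
# Sketch: locate the 50% decision boundary of the petal-width classifier.
# Assumes default LogisticRegression settings, as in the notebook cell.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, 3:]                    # petal width (cm), shape (150, 1)
y = (iris.target == 2).astype(int)      # 1 if Iris-Virginica, else 0

log_reg = LogisticRegression()
log_reg.fit(X, y)

# P(virginica) = 0.5 exactly where coef * x + intercept = 0.
boundary = -log_reg.intercept_[0] / log_reg.coef_[0, 0]
print(f"decision boundary at petal width = {boundary:.2f} cm")

# The same two query points the notebook passes to predict().
predictions = log_reg.predict([[1.7], [1.5]])
print(predictions)
```

Flowers with a petal width above the boundary are classified as Iris-Virginica, those below it as not Virginica.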
solutions/code/SVM.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "4b2a3b51",
"metadata": {},
"outputs": [],
"source": [
"#Importing Iris dataset from Scikit-Learn"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "631e90b9",
"metadata": {},
"outputs": [],
"source": [
"# Required Packages\n",
"from sklearn import datasets\t\t# To Get iris dataset\n",
"from sklearn import svm \t\t\t# To fit the svm classifier\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt # To visuvalizing the data"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "d79979f9",
"metadata": {},
"outputs": [],
"source": [
"# import iris data to model Svm classifier\n",
"iris_dataset = datasets.load_iris()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "59b42d2b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Iris data set Description :: .. _iris_dataset:\n",
"\n",
"Iris plants dataset\n",
"--------------------\n",
"\n",
"**Data Set Characteristics:**\n",
"\n",
" :Number of Instances: 150 (50 in each of three classes)\n",
" :Number of Attributes: 4 numeric, predictive attributes and the class\n",
" :Attribute Information:\n",
" - sepal length in cm\n",
" - sepal width in cm\n",
" - petal length in cm\n",
" - petal width in cm\n",
" - class:\n",
" - Iris-Setosa\n",
" - Iris-Versicolour\n",
" - Iris-Virginica\n",
" \n",
" :Summary Statistics:\n",
"\n",
" ============== ==== ==== ======= ===== ====================\n",
" Min Max Mean SD Class Correlation\n",
" ============== ==== ==== ======= ===== ====================\n",
" sepal length: 4.3 7.9 5.84 0.83 0.7826\n",
" sepal width: 2.0 4.4 3.05 0.43 -0.4194\n",
" petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)\n",
" petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)\n",
" ============== ==== ==== ======= ===== ====================\n",
"\n",
" :Missing Attribute Values: None\n",
" :Class Distribution: 33.3% for each of 3 classes.\n",
...