
The work needs to be clean, tidy, and understandable.

## Question 1

This question is inspired by Exercise 3 in Chapter 5 of the textbook.

On Canvas, you will see a CSV file named "THA_diamonds.csv". This file is a small subset of a real dataset on diamond prices from a [Kaggle competition](https://www.kaggle.com/shivam2503/diamonds). You will use this dataset for this question and the next question.

**Some Background Information:** In our version of the dataset, the `price` feature has been discretized as `low`, `medium`, `high`, and `premium`. If you are interested, these levels correspond to the following price ranges in the actual diamonds dataset:
- `low` price: between $1000 and $2000
- `medium` price: between $2000 and $3000
- `high` price: between $3000 and $3500
- `premium` price: between $3500 and $4000

**Question Overview:** For this question, you will use the (unweighted) KNN algorithm with the **Euclidean distance** metric and different numbers of neighbors to predict the `carat` (numerical) target feature for the following single observation:
- `cut` = good
- `color` = D
- `depth` = 60
- `price` = premium
- (`carat` = 0.71, but you will pretend that you do not have this information)

In practice, you would use cross-validation or a train-test split to determine optimal values of the KNN hyperparameters. **However, as far as this assessment is concerned, you are to use the entire dataset for training.**

### Part A (15 points)
Prepare your dataset for KNN modeling. Specifically,
1. Perform one-hot encoding of the categorical descriptive features in the input dataset.
2. Scale your descriptive features to be between 0 and 1.
3. Display the **last** 10 rows after one-hot encoding and scaling.

**IMPORTANT NOTE: If your data preparation steps are incorrect, you will not get full credit for a correct follow-through.**

**NOTE:** For Parts (B), (C), and (D) below, you are **not** allowed to use `KNeighborsRegressor()` from the Scikit-Learn module; instead, use manual calculations (via either Python or Excel). That is, you will need to show and explain all your solution steps **without** using Scikit-Learn. The reason for this restriction is so that you get to learn how some things work behind the scenes.

### Part B (5 points)
What is the prediction of the 1-KNN algorithm (i.e., k=1 in KNN) for the `carat` target feature using your manual calculations (with the Euclidean distance metric) for the single observation given above?

### Part C (5 points)
What is the prediction of the 5-KNN algorithm?

### Part D (5 points)
What is the prediction of the 10-KNN algorithm?

### Part E (15 points)

This part (E) is an exception to the solution mode instructions for this question. In particular, you will need to use `KNeighborsRegressor()` in Scikit-Learn to perform the same predictions as in Parts (B) to (D). That is,
- What is the prediction of the 1-KNN algorithm using `KNeighborsRegressor()`?
- What is the prediction of the 5-KNN algorithm using `KNeighborsRegressor()`?
- What is the prediction of the 10-KNN algorithm using `KNeighborsRegressor()`?

Are you able to get the same results as in your manual calculations? Please explain.

### Part F: Wrap-up (5 points)

**IMPORTANT NOTE: This Wrap-up section is mandatory. That is, for Parts (B) to (E) (inclusive), you will not get any points for solutions not presented in the table format explained below.**
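To illustrate the kind of preparation and manual calculation Parts (A) to (D) describe, here is a minimal sketch on a tiny made-up dataset (the feature values, the query's scaled coordinates, and the resulting predictions are all invented for illustration; this is not the THA_diamonds data, and it uses k = 1 and k = 3 rather than the ks the assignment asks for):

```python
import numpy as np
import pandas as pd

# Tiny made-up dataset (NOT the THA_diamonds file) with one categorical
# and one numerical descriptive feature, plus the numerical target.
df = pd.DataFrame({
    "cut":   ["good", "fair", "good", "ideal", "fair"],
    "depth": [60.0, 62.0, 58.0, 61.0, 57.0],
    "carat": [0.70, 0.90, 0.55, 0.80, 0.60],
})

# Part A, step 1: one-hot encode the categorical descriptive feature.
X = pd.get_dummies(df.drop(columns="carat"), dtype=float)

# Part A, step 2: min-max scale every descriptive feature to [0, 1].
X = (X - X.min()) / (X.max() - X.min())

# The query observation, prepared the same way (cut = good, depth = 60;
# depth scales as (60 - 57) / (62 - 57) = 0.6 in this toy data).
query = pd.Series(0.0, index=X.columns)
query["cut_good"] = 1.0
query["depth"] = 0.6

# Parts (B)-(D) style calculation: Euclidean distance to every training
# row, then the unweighted mean of the target over the k nearest rows.
dist = np.sqrt(((X - query) ** 2).sum(axis=1))
preds = {k: round(df.loc[dist.nsmallest(k).index, "carat"].mean(), 3)
         for k in (1, 3)}
print(preds)  # {1: 0.7, 3: 0.683}
```

On the real data, Part (E)'s `KNeighborsRegressor(n_neighbors=k)` fitted on the same scaled matrix should reproduce these manual averages; discrepancies usually trace back to a mismatch in the encoding or scaling step rather than to the KNN averaging itself.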

Add and display two tables called **"df_summary_manual"** and **"df_summary_sklearn"** respectively:
- In the table **"df_summary_manual"**, report your results for Parts (B) to (D) using your manual calculations.
- In the table **"df_summary_sklearn"**, report your results for the 3 predictions in Part (E) using `KNeighborsRegressor()`.

Each of these tables needs to have the following 3 columns:
- method
- prediction for the given observation (rounded to 3 decimal places)
- is_best (True or False; only the best prediction's is_best flag should be True and all the others should be False)

Each table needs to have 3 rows (one for each method) summarizing your results. These tables should look like the one below:

| method | prediction | is_best |
|---|---|---|
| 1-KNN | ? | ? |
| 5-KNN | ? | ? |
| 10-KNN | ? | ? |

In the case of a Pandas data frame, you can populate the data frame line by line by referring to Cell #6 in our [Pandas tutorial](https://www.featureranking.com/tutorials/python-tutorials/pandas/).

## Question 2

This question is inspired by Exercise 3 in Chapter 4 of the textbook.

You will use the same CSV file as in Question 1, "THA_diamonds.csv". You will build a simple decision tree with **depth 1** using this dataset to predict the `price` (categorical) target feature using the **Entropy** split criterion.

To clarify: for Question 1 your target feature is `carat`, whereas for this Question 2 your target feature is `price`.

### Part A (10 points)

The dataset for this question has 2 numerical descriptive features, `carat` and `depth`.
1. Discretize each of these 2 features separately into three levels named "category_1", "category_2", and "category_3" using the *equal-frequency binning* technique.
2. Display the first 10 rows after discretization of these two features.

After this discretization, all features in your dataset will be categorical (which we will assume to be **"nominal categorical"**).

For this question, please do **NOT** perform any one-hot encoding of the categorical descriptive features or any scaling. Also, please do **NOT** perform any train-test splits.

**IMPORTANT NOTE: If your discretizations are incorrect, you will not get full credit for a correct follow-through.**

### Part B (5 points)

Compute the impurity of the `price` target feature.

### Part C (20 points)

**IMPORTANT NOTE: For Parts C and D below, you will not get any points for solutions not presented in the required table format.**
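The two techniques named so far, equal-frequency binning and entropy impurity, can be sketched as follows on made-up numbers (the carat values and the target level counts are hypothetical, not the real THA_diamonds data):

```python
import numpy as np
import pandas as pd

# --- Equal-frequency binning (as in Part A), on made-up values ---
# pd.qcut splits a numerical feature into bins holding (roughly) equal
# numbers of observations; labels= names the resulting categories.
carat = pd.Series([0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1])
carat_binned = pd.qcut(
    carat, q=3, labels=["category_1", "category_2", "category_3"])
bin_sizes = carat_binned.value_counts().tolist()
print(bin_sizes)  # three bins of 3 rows each: [3, 3, 3]

# --- Entropy impurity (as in Part B), on hypothetical level counts ---
# entropy = -sum over levels of p * log2(p), where p is the proportion
# of rows in each target level.
price = pd.Series(["low"] * 4 + ["medium"] * 2 + ["high"] * 1 + ["premium"] * 1)
p = price.value_counts(normalize=True)
entropy = float(-(p * np.log2(p)).sum())
print(round(entropy, 4))  # 1.75 bits for counts 4/2/1/1
```

With proportions 0.5, 0.25, 0.125, 0.125 the entropy works out to 0.5·1 + 0.25·2 + 0.125·3 + 0.125·3 = 1.75 bits, which is one way to sanity-check a manual calculation.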

In this part, you will determine the root node for your decision tree.

Your answer to this part needs to be a table called **"df_splits"** with the following 4 columns:
- split
- remainder
- info_gain
- is_optimal (True or False; only the optimal split's is_optimal flag should be True and the others should be False)

In your **"df_splits"** table, you should have **one row for each descriptive feature in the dataset**. As an example for your **"df_splits"** table, consider the `spam prediction` example in Table 4.2 in the textbook (**FIRST** edition) on page 121, which was also covered in lectorials. The `df_splits` table would look something like the table below.

| split | remainder | info_gain | is_optimal |
|---|---|---|---|
| suspicious words | ? | ? | True |
| unknown sender | ? | ? | False |
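As a sketch of how one row of such a table could be derived, the snippet below computes the remainder and information gain for a single candidate split on a toy spam-style dataset (all values invented for illustration); repeating this for every descriptive feature yields one `df_splits` row each, with `is_optimal` set to True for the largest `info_gain`:

```python
import numpy as np
import pandas as pd

def entropy(s):
    """Shannon entropy (in bits) of a categorical pandas Series."""
    p = s.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

# Toy data (made up, not the diamonds file): one candidate split
# feature and a binary target.
df = pd.DataFrame({
    "suspicious_words": ["yes", "yes", "yes", "no", "no", "no"],
    "spam":             ["spam", "spam", "ham", "ham", "ham", "spam"],
})

h_target = entropy(df["spam"])  # impurity before splitting

# remainder = weighted average of the partitions' entropies,
# where each partition is one level of the split feature;
# info_gain = target entropy - remainder.
remainder = sum(
    len(part) / len(df) * entropy(part["spam"])
    for _, part in df.groupby("suspicious_words")
)
info_gain = h_target - remainder
print(round(remainder, 4), round(info_gain, 4))  # 0.9183 0.0817
```

Here the target has 3 spam and 3 ham rows (entropy exactly 1 bit), and each partition is a 2-to-1 mix (entropy ≈ 0.9183 bits), so the split gains about 0.0817 bits.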


Answered 4 days after May 06, 2021

