Here is the midterm assignment. I also have two other documents to send you; you will need to download them for the assignment.


{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# STOR 120: Take Home Midterm 1\n", "\n", "**Due:** Friday, September 17, 9:05 am on Gradescope\n", " \n", "**Directions:** The exam is open book, notes, course materials, internet, and all things that are not direct communication with others. Just as with all course assignments, you will submit exams to Gradescope as Jupyter Notebooks with the ipynb file extension. To receive full credit, you should show all of your code used to answer each question." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Data:** The dataset used on this exam contains information on the number of students that majored in different topics of study at universities in the United States in 2019 and is broken down by age group, sex, and state. The original source of the data is the US Census Bureau, but this dataset was found on [Kaggle.com](https://www.kaggle.com/tjkyner/bachelor-degree-majors-by-age-sex-and-state)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Run the cell below to import the dataset.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "import numpy as np\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plots\n", "plots.style.use('fivethirtyeight')\n", "\n", "import warnings\n", "warnings.simplefilter('ignore', FutureWarning)\n", "\n", "edu = Table.read_table('Bachelor_Degree_Majors.csv')\n", "edu.take(np.arange(5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1 (6 Points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before analyzing the data, we will first need to clean the data. 
Start by creating a new table named **edu_clean** with the following changes:\n", "- Rename the variables\n", " - \"Bachelor's Degree Holders\" should become \"Total\"\n", " - \"Science and Engineering\" should become \"ScEn\"\n", " - \"Science and Engineering Related Fields\" should become \"ScEn Rel\"\n", " - \"Arts, Humanities and Others\" should become \"Other\"\n", "- Remove all observations where *Sex* is equal to *Total*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "edu_clean = ...\n", "\n", "\n", "edu_clean #Do Not Change this Line" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2 (10 Points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Currently, there is a problem. We have several variables in our dataset that are numeric, but the presence of commas will prevent us from answering key questions about the data. Notice the following Python code, which converts the string \"1,030,452\" to an integer." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "int(str.replace(\"1,030,452\",\",\",\"\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next two code blocks, create a function named **str_to_int** and then use the method **apply** on the last 6 variables of the data in **edu_clean**. Create a new table named **edu_clean_int** that is similar to **edu_clean**, except that the last six variables are *int* arrays and not *str* arrays. You will not get full credit if you do not create a function or do not use the **apply** method." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Put your function here\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Apply your function here\n", "edu_clean_int = ...\n", "\n", "edu_clean_int #Do Not Change this Line" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If you cannot figure out how to do this, run this code.\n", "# This code will load the cleaned version of the file.\n", "# You will need this to work if you want to complete the exam.\n", "# The rest of the exam depends on getting this part correct.\n", "# Only do this if you cannot figure this part out.\n", "# Uncomment the next two lines (remove # sign) and run code.\n", "\n", "\n", "#edu_clean_int = Table.read_table('Bachelors_Clean.csv')\n", "#edu_clean_int" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3 (14 Points Total)\n", "\n", "#### 3.1 (6 Points)\n", "Create a table called **Bachelors_by_Sex** based off **edu_clean_int** which has only two variables labeled \"Male\" and \"Female\". Each row in this table should contain the total number of bachelor's degrees based on people 25 and older for males and females for each state. In other words, every row is for a different state, but the name of the state should not be in **Bachelors_by_Sex**. Sort this table by \"Female\" in descending order." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Bachelors_by_Sex = ...\n", "\n", "Bachelors_by_Sex #Do Not Change this Line" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2 (4 Points)\n", "Create a scatter plot that shows the relationship between the two numeric variables in **Bachelors_by_Sex**. Add a title to the plot called \"Total Bachelor's Degrees\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "..." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.3 (4 Points)\n", "In the scatterplot, you will notice that there is one state that has an unusually large number of people with bachelor's degrees. Use the table methods you learned in class on the table **edu_clean_int** to identify this state and print out a table below that only shows this one state and the total number of bachelor's degrees for males and females based on people 25 and older (Female and Male). This table should contain one state and two rows." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 4 (12 Points)\n", "\n", "#### 4.1 (6 Points)\n", "In order to make comparisons across states, it may be helpful to convert our counts to proportions. We want to compare states based on the proportions of bachelor's degrees that are \"ScEn\" and \"ScEn Rel\" for people in the \"25 to 39\" age group. Create a table named **young_prop** based on **edu_clean_int** by subsetting the data to the age group mentioned, removing the variables named \"Age Group\", \"Business\", \"Education\", and \"Other\", creating a new variable called \"STEM Proportion\" which calculates the proportion of total bachelor's degrees that are either \"ScEn\" or \"ScEn Rel\" for Males and Females, and finally removing the variables \"Total\", \"ScEn\" and \"ScEn Rel\". Your final table named **young_prop** should contain three variables: \"State\", \"Sex\", and \"STEM Proportion\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "young_prop = ...\n", "\n", "young_prop #Do Not
Sep 24, 2021
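
The comma-stripping conversion described in Question 2 can be sketched in plain Python. This is only an illustration under stated assumptions, not the graded solution: the function name str_to_int comes from the assignment, while the sample list sample_counts and its values other than "1,030,452" are made up for the example. In the notebook itself the same function would be passed to the table's apply method for each of the six count columns rather than used in a list comprehension.

```python
def str_to_int(text):
    """Convert a count written with thousands separators, e.g. "1,030,452", to an int."""
    # Drop every comma, then parse the remaining digits.
    return int(text.replace(",", ""))


# Hypothetical sample values standing in for one string column of the table.
sample_counts = ["1,030,452", "12", "7,001"]

# Apply the function element by element, which is what a column-wise apply does.
converted = [str_to_int(value) for value in sample_counts]
print(converted)
```

Writing the conversion as a named function (instead of repeating the int(str.replace(...)) expression) is what lets it be reused unchanged on each column.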