{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# STOR 120: Take Home Midterm 1\n", "\n", "**Due:** Friday, September 17, 9:05 am on Gradescope\n", " \n", "**Directions:** The...

I need this assignment completed for my Foundations of Statistics and Data Science class. It's coding, so you will need JupyterLab or Jupyter Notebook (which use Python I believe). Also, I will attach two files that you need to download in order to complete the required datasets for the assignment. Thank you!


{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# STOR 120: Take Home Midterm 1\n", "\n", "**Due:** Friday, September 17, 9:05 am on Gradescope\n", " \n", "**Directions:** The exam is open book, notes, course materials, internet, and all things that are not direct communication with others. Just as with all course assignments, you will submit exams to Gradescope as Jupyter Notebooks with the ipynb file extension. To receive full credit, you should show all of your code used to answer each question." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Data:** The dataset used on this exam contains information on the number of students that majored in different topics of study at universities in the United States in 2019 and is broken down by age group, sex, and state. The original source of the data is the US Census Bureau, but this dataset was found on [Kaggle.com](https://www.kaggle.com/tjkyner/bachelor-degree-majors-by-age-sex-and-state)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Run the cell below to import the dataset.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "import numpy as np\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plots\n", "plots.style.use('fivethirtyeight')\n", "\n", "import warnings\n", "warnings.simplefilter('ignore', FutureWarning)\n", "\n", "edu = Table.read_table('Bachelor_Degree_Majors.csv')\n", "edu.take(np.arange(5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1 (6 Points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before analayzing the data, we will first need to clean the data. 
Start by creating a new table named **edu_clean** with the following changes:\n", "- Rename the variables\n", " - \"Bachelor's Degree Holders\" should become \"Total\"\n", " - \"Science and Engineering\" should become \"ScEn\"\n", " - \"Science and Engineering Related Fields\" should become \"ScEn Rel\"\n", " - \"Arts, Humanities and Others\" should become \"Other\"\n", "- Remove all observations where *Sex* is equal to *Total*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "edu_clean = ...\n", "\n", "\n", "edu_clean #Do Not Change this Line" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2 (10 Points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Currently, there is a problem. We have several variables in our dataset that are numeric, but the presence of commas will prevent us from answering key questions about the data. Notice the following python code which can convert the string \"1,030,452\" to an integer." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "int(str.replace(\"1,030,452\",\",\",\"\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next two code blocks, create a function named **str_to_int** and then use the method **apply** on the last 6 variables of the data in **edu_clean**. Create a new table named **edu_clean_int** that is similar to **edu_clean**, except that the last six variables are *int* arrays and not *str* arrays. You will not get full credit if you do not create a function or do not use the **apply** method." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Put your function here\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Apply your function here\n", "edu_clean_int = ...\n", "\n", "edu_clean_int #Do Not Change this Line" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If you cannot figure out how to do this, run this code.\n", "# This code will load the cleaned version of the file.\n", "# You will need this to work if you want to complete the exam.\n", "# The rest of the exam depends on getting this part correct.\n", "# Only do this if you cannot figure this part out.\n", "# Uncomment the next two lines (remove # sign) and run code.\n", "\n", "\n", "#edu_clean_int = Table.read_table('Bachelors_Clean.csv')\n", "#edu_clean_int" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 3 (14 Points Total)\n", "\n", "#### 3.1 (6 Points)\n", "Create a table called **Bachelors_by_Sex** based off **edu_clean_int** which has only two variables labeled \"Male\" and \"Female\". Each row in this table should contain the total number of bachelor's degrees based on people 25 and older for males and females for each state. In other words, every row is for a different state, but the name of the state should not be in **Bachelors_by_Sex**. Sort this table by \"Female\" in descending order." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Bachelors_by_Sex = ...\n", "\n", "Bachelors_by_Sex #Do Not Change this Line" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2 (4 Points)\n", "Create a scatter plot that shows the relationship between the two numeric variables in **Bachelors_by_Sex**. Add a title to the plot called \"Total Bachelor's Degrees\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "..." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.3 (4 Points)\n", "In the scatterplot, you will notice that there is one state that has an unusually large amount of people with bachelor's degrees. Use the table methods you learned in class on the table **edu_clean_int** to identify this state and print out a table below that only shows this one state and the total number of bachelor's degrees for male and female based on people 25 and older (Female and Male). This table should contain one state and two rows." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 4 (12 Points)\n", "\n", "#### 4.1 (6 Points)\n", "In order to make comparisons across states, it may be helpful to convert our counts to proportions. We want to compare states based on the proportions of bachelor's degrees that are \"ScEn\" and \"ScEn Rel\" for people in the \"25 to 39\" age group. Create a table named **young_prop** based on **edu_clean_int** by subsetting the data based off the mentioned age group, removing the variables named \"Age Group\", \"Business\", \"Education\", and \"Other\", creating a new variable called \"STEM Proportion\" which calculates the proportion of total bachelor's degrees that are either \"ScEn\" or \"ScEn Rel\" for Males and Females and finally, removing the variables \"Total\", \"ScEn\" and \"ScEn Rel\". Your final table named **young_prop** should contain three variables: \"State\", \"Sex\", and \"STEM Proportion\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "young_prop = ...\n", "\n", "young_prop #Do Not
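Question 2 above asks for a `str_to_int` function that is then applied to the last six columns. A minimal sketch of that step, assuming plain Python lists as a stand-in for the `datascience` Table (the exam itself expects the Table `apply` method, which is not used here so the example runs without the course library). The `rows` and `rows_int` names are hypothetical; the two rows are copied from the data preview in this post.

```python
# Hypothetical stand-in for Question 2: plain Python lists instead of a
# datascience Table, so the sketch runs without the course library.

def str_to_int(s):
    """Convert a comma-formatted count such as "1,030,452" to an int."""
    return int(s.replace(",", ""))

# Two rows copied from the preview of Bachelor_Degree_Majors.csv shown in this post.
rows = [
    ["Alabama", "Male", "25 and older",
     "405,618", "159,366", "26,004", "113,909", "29,490", "76,849"],
    ["Alabama", "Total", "25 to 39",
     "268,924", "90,736", "32,378", "58,515", "29,342", "57,953"],
]

# Keep the first three (text) columns as-is; convert the last six to int.
rows_int = [row[:3] + [str_to_int(v) for v in row[3:]] for row in rows]

print(str_to_int("1,030,452"))
print(rows_int[0][3:])
```

Once the counts are integers, a quick sanity check is possible: in each row the five field counts should add up to the total in the fourth column.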
Answered Same Day, Sep 25, 2021


Karthi answered on Sep 26 2021
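Question 4.1 asks for a "STEM Proportion" per state and sex in the "25 to 39" age group. A minimal sketch of that arithmetic, assuming plain Python dictionaries rather than a `datascience` Table. The counts are the two Alabama "25 to 39" rows from the final table of the notebook below, where "STEM" is "ScEn" and "ScEn Rel" combined; the `records` and `stem_proportion` names are hypothetical.

```python
# Counts copied from the answer notebook's final table (Alabama, age 25 to 39).
# "STEM" is the sum of "ScEn" and "ScEn Rel"; "Total" is all bachelor's degrees.
records = [
    {"State": "Alabama", "Sex": "Male", "Total": 117794, "STEM": 57900},
    {"State": "Alabama", "Sex": "Female", "Total": 151130, "STEM": 65214},
]

def stem_proportion(record):
    """Share of bachelor's degrees that are ScEn or ScEn Rel."""
    return record["STEM"] / record["Total"]

# Keep only the three variables the question asks for.
young_prop = [
    {"State": r["State"], "Sex": r["Sex"], "STEM Proportion": stem_proportion(r)}
    for r in records
]

for row in young_prop:
    print(row["State"], row["Sex"], round(row["STEM Proportion"], 3))
```

Dividing by each row's own total is what makes states of very different sizes comparable.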
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# STOR 120: Take Home Midterm 1\n",
"\n",
"**Due:** Friday, September 17, 9:05 am on Gradescope\n",
" \n",
"**Directions:** The exam is open book, notes, course materials, internet, and all things that are not direct communication with others. Just as with all course assignments, you will submit exams to Gradescope as Jupyter Notebooks with the ipynb file extension. To receive full credit, you should show all of your code used to answer each question."
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"**Data:** The dataset used on this exam contains information on the number of students that majored in different topics of study at universities in the United States in 2019 and is broken down by age group, sex, and state. The original source of the data is the US Census Bureau, but this dataset was found on [Kaggle.com](https://www.kaggle.com/tjkyner/bachelor-degree-majors-by-age-sex-and-state)"
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"**Run the cell below to import the dataset.**"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 2,
"source": [
"from datascience import *\n",
"import numpy as np\n",
"\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plots\n",
"plots.style.use('fivethirtyeight')\n",
"\n",
"import warnings\n",
"warnings.simplefilter('ignore', FutureWarning)\n",
"\n",
"# Extra libraries for the pandas-based answer cells (numpy is already imported above)\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"edu = Table.read_table('Bachelor_Degree_Majors.csv')\n",
"edu.take(np.arange(5))"
],
"outputs": [
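{
"output_type": "execute_result",
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th>State</th><th>Sex</th><th>Age Group</th><th>Bachelor's Degree Holders</th><th>Science and Engineering</th><th>Science and Engineering Related Fields</th><th>Business</th><th>Education</th><th>Arts, Humanities and Others</th></tr></thead>\n",
"<tbody>\n",
"<tr><td>Alabama</td><td>Total</td><td>25 and older</td><td>885,357</td><td>263,555</td><td>98,445</td><td>210,147</td><td>141,071</td><td>172,139</td></tr>\n",
"<tr><td>Alabama</td><td>Total</td><td>25 to 39</td><td>268,924</td><td>90,736</td><td>32,378</td><td>58,515</td><td>29,342</td><td>57,953</td></tr>\n",
"<tr><td>Alabama</td><td>Total</td><td>40 to 64</td><td>418,480</td><td>115,762</td><td>46,724</td><td>112,271</td><td>63,875</td><td>79,848</td></tr>\n",
"<tr><td>Alabama</td><td>Total</td><td>65 and older</td><td>197,953</td><td>57,057</td><td>19,343</td><td>39,361</td><td>47,854</td><td>34,338</td></tr>\n",
"<tr><td>Alabama</td><td>Male</td><td>25 and older</td><td>405,618</td><td>159,366</td><td>26,004</td><td>113,909</td><td>29,490</td><td>76,849</td></tr>\n",
"</tbody>\n",
"</table>"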
],
"text/plain": [
"State | Sex | Age Group | Bachelor's Degree Holders | Science and Engineering | Science and Engineering Related Fields | Business | Education | Arts, Humanities and Others\n",
"Alabama | Total | 25 and older | 885,357 | 263,555 | 98,445 | 210,147 | 141,071 | 172,139\n",
"Alabama | Total | 25 to 39 | 268,924 | 90,736 | 32,378 | 58,515 | 29,342 | 57,953\n",
"Alabama | Total | 40 to 64 | 418,480 | 115,762 | 46,724 | 112,271 | 63,875 | 79,848\n",
"Alabama | Total | 65 and older | 197,953 | 57,057 | 19,343 | 39,361 | 47,854 | 34,338\n",
"Alabama | Male | 25 and older | 405,618 | 159,366 | 26,004 | 113,909 | 29,490 | 76,849"
]
},
"metadata": {},
"execution_count": 2
}
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## Question 1 (6 Points)"
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"Before analyzing the data, we will first need to clean the data. Start by creating a new table named **edu_clean** with the following changes:\n",
"- Rename the variables\n",
" - \"Bachelor's Degree Holders\" should become \"Total\"\n",
" - \"Science and Engineering\" should become \"ScEn\"\n",
" - \"Science and Engineering Related Fields\" should become \"ScEn Rel\"\n",
" - \"Arts, Humanities and Others\" should become \"Other\"\n",
"- Remove all observations where *Sex* is equal to *Total*"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"edu_clean = ...\n",
"\n",
"\n",
"edu_clean #Do Not Change this Line"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 3,
"source": [
"pd.set_option(\"display.max_rows\", 10)\n",
"sns.set() # Setting seaborn as default style\n",
"%matplotlib inline"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 4,
"source": [
"df = pd.read_csv('Bachelor_Degree_Majors.csv')\n",
"df"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<table>\n",
"<thead><tr><th></th><th>State</th><th>Sex</th><th>Age Group</th><th>Bachelor's Degree Holders</th><th>Science and Engineering</th><th>Science and Engineering Related Fields</th><th>Business</th><th>Education</th><th>Arts, Humanities and Others</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Alabama</td><td>Total</td><td>25 and older</td><td>885,357</td><td>263,555</td><td>98,445</td><td>210,147</td><td>141,071</td><td>172,139</td></tr>\n",
"<tr><th>1</th><td>Alabama</td><td>Total</td><td>25 to 39</td><td>268,924</td><td>90,736</td><td>32,378</td><td>58,515</td><td>29,342</td><td>57,953</td></tr>\n",
"<tr><th>2</th><td>Alabama</td><td>Total</td><td>40 to 64</td><td>418,480</td><td>115,762</td><td>46,724</td><td>112,271</td><td>63,875</td><td>79,848</td></tr>\n",
"<tr><th>3</th><td>Alabama</td><td>Total</td><td>65 and older</td><td>197,953</td><td>57,057</td><td>19,343</td><td>39,361</td><td>47,854</td><td>34,338</td></tr>\n",
"<tr><th>4</th><td>Alabama</td><td>Male</td><td>25 and older</td><td>405,618</td><td>159,366</td><td>26,004</td><td>113,909</td><td>29,490</td><td>76,849</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>607</th><td>Wyoming</td><td>Male</td><td>65 and older</td><td>16,482</td><td>9,375</td><td>1,145</td><td>2,011</td><td>2,378</td><td>1,573</td></tr>\n",
"<tr><th>608</th><td>Wyoming</td><td>Female</td><td>25 and older</td><td>59,074</td><td>15,570</td><td>8,470</td><td>6,856</td><td>16,638</td><td>11,540</td></tr>\n",
"<tr><th>609</th><td>Wyoming</td><td>Female</td><td>25 to 39</td><td>18,180</td><td>6,708</td><td>2,268</td><td>1,936</td><td>3,313</td><td>3,955</td></tr>\n",
"<tr><th>610</th><td>Wyoming</td><td>Female</td><td>40 to 64</td><td>26,537</td><td>5,110</td><td>4,194</td><td>3,827</td><td>8,007</td><td>5,399</td></tr>\n",
"<tr><th>611</th><td>Wyoming</td><td>Female</td><td>65 and older</td><td>14,357</td><td>3,752</td><td>2,008</td><td>1,093</td><td>5,318</td><td>2,186</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>612 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" State Sex Age Group Bachelor's Degree Holders \\\n",
"0 Alabama Total 25 and older 885,357 \n",
"1 Alabama Total 25 to 39 268,924 \n",
"2 Alabama Total 40 to 64 418,480 \n",
"3 Alabama Total 65 and older 197,953 \n",
"4 Alabama Male 25 and older 405,618 \n",
".. ... ... ... ... \n",
"607 Wyoming Male 65 and older 16,482 \n",
"608 Wyoming Female 25 and older 59,074 \n",
"609 Wyoming Female 25 to 39 18,180 \n",
"610 Wyoming Female 40 to 64 26,537 \n",
"611 Wyoming Female 65 and older 14,357 \n",
"\n",
" Science and Engineering Science and Engineering Related Fields Business \\\n",
"0 263,555 98,445 210,147 \n",
"1 90,736 32,378 58,515 \n",
"2 115,762 46,724 112,271 \n",
"3 57,057 19,343 39,361 \n",
"4 159,366 26,004 113,909 \n",
".. ... ... ... \n",
"607 9,375 1,145 2,011 \n",
"608 15,570 8,470 6,856 \n",
"609 6,708 2,268 1,936 \n",
"610 5,110 4,194 3,827 \n",
"611 3,752 2,008 1,093 \n",
"\n",
" Education Arts, Humanities and Others \n",
"0 141,071 172,139 \n",
"1 29,342 57,953 \n",
"2 63,875 79,848 \n",
"3 47,854 34,338 \n",
"4 29,490 76,849 \n",
".. ... ... \n",
"607 2,378 1,573 \n",
"608 16,638 11,540 \n",
"609 3,313 3,955 \n",
"610 8,007 5,399 \n",
"611 5,318 2,186 \n",
"\n",
"[612 rows x 9 columns]"
]
},
"metadata": {},
"execution_count": 4
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 5,
"source": [
"df.head()"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<table>\n",
"<thead><tr><th></th><th>State</th><th>Sex</th><th>Age Group</th><th>Bachelor's Degree Holders</th><th>Science and Engineering</th><th>Science and Engineering Related Fields</th><th>Business</th><th>Education</th><th>Arts, Humanities and Others</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Alabama</td><td>Total</td><td>25 and older</td><td>885,357</td><td>263,555</td><td>98,445</td><td>210,147</td><td>141,071</td><td>172,139</td></tr>\n",
"<tr><th>1</th><td>Alabama</td><td>Total</td><td>25 to 39</td><td>268,924</td><td>90,736</td><td>32,378</td><td>58,515</td><td>29,342</td><td>57,953</td></tr>\n",
"<tr><th>2</th><td>Alabama</td><td>Total</td><td>40 to 64</td><td>418,480</td><td>115,762</td><td>46,724</td><td>112,271</td><td>63,875</td><td>79,848</td></tr>\n",
"<tr><th>3</th><td>Alabama</td><td>Total</td><td>65 and older</td><td>197,953</td><td>57,057</td><td>19,343</td><td>39,361</td><td>47,854</td><td>34,338</td></tr>\n",
"<tr><th>4</th><td>Alabama</td><td>Male</td><td>25 and older</td><td>405,618</td><td>159,366</td><td>26,004</td><td>113,909</td><td>29,490</td><td>76,849</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" State Sex Age Group Bachelor's Degree Holders \\\n",
"0 Alabama Total 25 and older 885,357 \n",
"1 Alabama Total 25 to 39 268,924 \n",
"2 Alabama Total 40 to 64 418,480 \n",
"3 Alabama Total 65 and older 197,953 \n",
"4 Alabama Male 25 and older 405,618 \n",
"\n",
" Science and Engineering Science and Engineering Related Fields Business \\\n",
"0 263,555 98,445 210,147 \n",
"1 90,736 32,378 58,515 \n",
"2 115,762 46,724 112,271 \n",
"3 57,057 19,343 39,361 \n",
"4 159,366 26,004 113,909 \n",
"\n",
" Education Arts, Humanities and Others \n",
"0 141,071 172,139 \n",
"1 29,342 57,953 \n",
"2 63,875 79,848 \n",
"3 47,854 34,338 \n",
"4 29,490 76,849 "
]
},
"metadata": {},
"execution_count": 5
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 6,
"source": [
"df.tail()"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<table>\n",
"<thead><tr><th></th><th>State</th><th>Sex</th><th>Age Group</th><th>Bachelor's Degree Holders</th><th>Science and Engineering</th><th>Science and Engineering Related Fields</th><th>Business</th><th>Education</th><th>Arts, Humanities and Others</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>607</th><td>Wyoming</td><td>Male</td><td>65 and older</td><td>16,482</td><td>9,375</td><td>1,145</td><td>2,011</td><td>2,378</td><td>1,573</td></tr>\n",
"<tr><th>608</th><td>Wyoming</td><td>Female</td><td>25 and older</td><td>59,074</td><td>15,570</td><td>8,470</td><td>6,856</td><td>16,638</td><td>11,540</td></tr>\n",
"<tr><th>609</th><td>Wyoming</td><td>Female</td><td>25 to 39</td><td>18,180</td><td>6,708</td><td>2,268</td><td>1,936</td><td>3,313</td><td>3,955</td></tr>\n",
"<tr><th>610</th><td>Wyoming</td><td>Female</td><td>40 to 64</td><td>26,537</td><td>5,110</td><td>4,194</td><td>3,827</td><td>8,007</td><td>5,399</td></tr>\n",
"<tr><th>611</th><td>Wyoming</td><td>Female</td><td>65 and older</td><td>14,357</td><td>3,752</td><td>2,008</td><td>1,093</td><td>5,318</td><td>2,186</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" State Sex Age Group Bachelor's Degree Holders \\\n",
"607 Wyoming Male 65 and older 16,482 \n",
"608 Wyoming Female 25 and older 59,074 \n",
"609 Wyoming Female 25 to 39 18,180 \n",
"610 Wyoming Female 40 to 64 26,537 \n",
"611 Wyoming Female 65 and older 14,357 \n",
"\n",
" Science and Engineering Science and Engineering Related Fields Business \\\n",
"607 9,375 1,145 2,011 \n",
"608 15,570 8,470 6,856 \n",
"609 6,708 2,268 1,936 \n",
"610 5,110 4,194 3,827 \n",
"611 3,752 2,008 1,093 \n",
"\n",
" Education Arts, Humanities and Others \n",
"607 2,378 1,573 \n",
"608 16,638 11,540 \n",
"609 3,313 3,955 \n",
"610 8,007 5,399 \n",
"611 5,318 2,186 "
]
},
"metadata": {},
"execution_count": 6
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 7,
"source": [
"df.describe()"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<table>\n",
"<thead><tr><th></th><th>State</th><th>Sex</th><th>Age Group</th><th>Bachelor's Degree Holders</th><th>Science and Engineering</th><th>Science and Engineering Related Fields</th><th>Business</th><th>Education</th><th>Arts, Humanities and Others</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>count</th><td>612</td><td>612</td><td>612</td><td>612</td><td>612</td><td>612</td><td>612</td><td>612</td><td>612</td></tr>\n",
"<tr><th>unique</th><td>51</td><td>3</td><td>4</td><td>612</td><td>612</td><td>608</td><td>611</td><td>608</td><td>611</td></tr>\n",
"<tr><th>top</th><td>West Virginia</td><td>Total</td><td>40 to 64</td><td>1,260,695</td><td>208,953</td><td>16,603</td><td>25,810</td><td>46,314</td><td>33,222</td></tr>\n",
"<tr><th>freq</th><td>12</td><td>204</td><td>153</td><td>1</td><td>1</td><td>2</td><td>2</td><td>2</td><td>2</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" State Sex Age Group Bachelor's Degree Holders \\\n",
"count 612 612 612 612 \n",
"unique 51 3 4 612 \n",
"top West Virginia Total 40 to 64 1,260,695 \n",
"freq 12 204 153 1 \n",
"\n",
" Science and Engineering Science and Engineering Related Fields \\\n",
"count 612 612 \n",
"unique 612 608 \n",
"top 208,953 16,603 \n",
"freq 1 2 \n",
"\n",
" Business Education Arts, Humanities and Others \n",
"count 612 612 612 \n",
"unique 611 608 611 \n",
"top 25,810 46,314 33,222 \n",
"freq 2 2 2 "
]
},
"metadata": {},
"execution_count": 7
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 8,
"source": [
"df.info()"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 612 entries, 0 to 611\n",
"Data columns (total 9 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 State 612 non-null object\n",
" 1 Sex 612 non-null object\n",
" 2 Age Group 612 non-null object\n",
" 3 Bachelor's Degree Holders 612 non-null object\n",
" 4 Science and Engineering 612 non-null object\n",
" 5 Science and Engineering Related Fields 612 non-null object\n",
" 6 Business 612 non-null object\n",
" 7 Education 612 non-null object\n",
" 8 Arts, Humanities and Others 612 non-null object\n",
"dtypes: object(9)\n",
"memory usage: 43.2+ KB\n"
]
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 9,
"source": [
"# Select fully duplicated rows and sum them; all zeros means there are no duplicate rows\n",
"df[df.duplicated()].sum()"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"State 0.0\n",
"Sex 0.0\n",
"Age Group 0.0\n",
"Bachelor's Degree Holders 0.0\n",
"Science and Engineering 0.0\n",
"Science and Engineering Related Fields 0.0\n",
"Business 0.0\n",
"Education 0.0\n",
"Arts, Humanities and Others 0.0\n",
"dtype: float64"
]
},
"metadata": {},
"execution_count": 9
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 10,
"source": [
"# Remove rows whose 'Age Group' is '25 and older'; that group is the sum of the other three age groups\n",
"df = df[df[\"Age Group\"] != \"25 and older\"]"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 11,
"source": [
"# Remove rows where 'Sex' is 'Total'; .copy() avoids pandas' SettingWithCopyWarning on later column assignments\n",
"df = df[df[\"Sex\"] != \"Total\"].copy()"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 12,
"source": [
"# Convert the comma-formatted count columns from strings (dtype object) to int\n",
"def convert(string):\n",
"    return int(string.replace(',', ''))\n",
"\n",
"\n",
"# The first three columns (State, Sex, Age Group) are text; the rest are counts\n",
"for col in df.columns[3:]:\n",
"    df[col] = df[col].apply(convert)"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 13,
"source": [
"# Merging 'Science and Engineering' & 'Science and Engineering Related Fields' columns together into a new column called 'STEM'\n",
"df['STEM'] = df['Science and Engineering'] + df['Science and Engineering Related Fields']\n",
"df = df.drop(['Science and Engineering', 'Science and Engineering Related Fields'], axis = 1)"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 14,
"source": [
"# Some minor tidying\n",
"\n",
"# Reset the index to start from zero\n",
"df.reset_index(drop=True, inplace=True)\n",
"\n",
"# Rearrange the columns\n",
"df = df[['State', 'Sex', 'Age Group', \"Bachelor's Degree Holders\", 'STEM', 'Business', 'Education', 'Arts, Humanities and Others']]\n",
"\n",
"# Rename the column to remove the apostrophe\n",
"df = df.rename(columns={\"Bachelor's Degree Holders\": 'Bachelors Degree Holders'})"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 15,
"source": [
"# Final form\n",
"df"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<table>\n",
"<thead><tr><th></th><th>State</th><th>Sex</th><th>Age Group</th><th>Bachelors Degree Holders</th><th>STEM</th><th>Business</th><th>Education</th><th>Arts, Humanities and Others</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Alabama</td><td>Male</td><td>25 to 39</td><td>117794</td><td>57900</td><td>29859</td><td>6357</td><td>23678</td></tr>\n",
"<tr><th>1</th><td>Alabama</td><td>Male</td><td>40 to 64</td><td>184328</td><td>80308</td><td>54931</td><td>12820</td><td>36269</td></tr>\n",
"<tr><th>2</th><td>Alabama</td><td>Male</td><td>65 and older</td><td>103496</td><td>47162</td><td>29119</td><td>10313</td><td>16902</td></tr>\n",
"<tr><th>3</th><td>Alabama</td><td>Female</td><td>25 to 39</td><td>151130</td><td>65214</td><td>28656</td><td>22985</td><td>34275</td></tr>\n",
"<tr><th>4</th><td>Alabama</td><td>Female</td><td>40 to 64</td><td>234152</td><td>82178</td><td>57340</td><td>51055</td><td>43579</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>301</th><td>Wyoming</td><td>Male</td><td>40 to 64</td><td>24149</td><td>12375</td><td>5077</td><td>2700</td><td>3997</td></tr>\n",
"<tr><th>302</th><td>Wyoming</td><td>Male</td><td>65 and older</td><td>16482</td><td>10520</td><td>2011</td><td>2378</td><td>1573</td></tr>\n",
"<tr><th>303</th><td>Wyoming</td><td>Female</td><td>25 to 39</td><td>18180</td><td>8976</td><td>1936</td><td>3313</td><td>3955</td></tr>\n",
"<tr><th>304</th><td>Wyoming</td><td>Female</td><td>40 to 64</td><td>26537</td><td>9304</td><td>3827</td><td>8007</td><td>5399</td></tr>\n",
"<tr><th>305</th><td>Wyoming</td><td>Female</td><td>65 and older</td><td>14357</td><td>5760</td><td>1093</td><td>5318</td><td>2186</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>306 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" State Sex Age Group Bachelors Degree Holders STEM Business \\\n",
"0 Alabama Male 25 to 39 117794 57900 29859 \n",
"1 Alabama Male 40 to 64 184328 80308 54931 \n",
"2 Alabama Male 65 and older 103496 47162 29119 \n",
"3 Alabama Female 25 to 39 151130 65214 28656 \n",
"4 Alabama Female 40 to 64 234152 82178 57340 \n",
".. ... ... ... ... ... ... \n",
"301 Wyoming Male 40 to 64 24149 12375 5077 \n",
"302 Wyoming Male 65 and older 16482 10520 2011 \n",
"303 Wyoming Female 25 to 39 18180 8976 1936 \n",
"304 Wyoming Female 40 to 64 26537 9304 3827 \n",
"305 Wyoming Female 65 and older 14357 5760 1093 \n",
"\n",
" Education Arts, Humanities and Others \n",
"0 6357 23678 \n",
"1 12820 36269 \n",
"2 10313 16902 \n",
"3 22985 34275 \n",
"4 51055 43579 \n",
".. ... ... \n",
"301 2700 3997 \n",
"302 2378 1573 \n",
"303 3313 3955 \n",
"304 8007 5399 \n",
"305 5318 2186 \n",
"\n",
"[306 rows x 8 columns]"
]
},
"metadata": {},
"execution_count": 15
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 16,
"source": [
"# Group the dataframe by state; numeric_only=True keeps the text columns (Sex, Age Group) out of the sum\n",
"d1 = df.groupby(['State']).sum(numeric_only=True).reset_index()\n",
"\n",
"# Sort by the number of bachelor's degree holders in descending order\n",
"d1.sort_values(by='Bachelors Degree Holders', ascending=False, inplace=True)"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 17,
"source": [
"plt.figure(figsize=(18,14))\n",
"sns.barplot(x=\"Bachelors Degree Holders\", y=\"State\", data=d1)"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
""
]
},
"metadata": {},
"execution_count": 17
},
{
"output_type": "display_data",
"data": {
"image/png": "",
"text/plain": [
"[Figure: bar plot of Bachelors Degree Holders by State]"
]
},
"metadata": {
"needs_background": "light"
}
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 18,
"source": [
"# Another way to see the result\n",
"d1.style.background_gradient(cmap='Blues')"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th></th><th>State</th><th>Bachelors Degree Holders</th><th>STEM</th><th>Business</th><th>Education</th><th>Arts, Humanities and Others</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>4</th><td>California</td><td>9428484</td><td>4710957</td><td>1721094</td><td>580366</td><td>2416067</td></tr>\n",
"<tr><th>43</th><td>Texas</td><td>5776533</td><td>2587010</td><td>1340690</td><td>686845</td><td>1161988</td></tr>\n",
"<tr><th>32</th><td>New York</td><td>5166218</td><td>2268750</td><td>964887</td><td>545542</td><td>1387039</td></tr>\n",
"<tr><th>9</th><td>Florida</td><td>4753637</td><td>2030956</td><td>1155695</td><td>613501</td><td>953485</td></tr>\n",
"<tr><th>13</th><td>Illinois</td><td>3108972</td><td>1355271</td><td>662118</td><td>376087</td><td>715496</td></tr>\n",
"<tr><th>38</th><td>Pennsylvania</td><td>2917402</td><td>1297818</td><td>557288</td><td>402944</td><td>659352</td></tr>\n",
"<tr><th>30</th><td>New Jersey</td><td>2551765</td><td>1190861</td><td>551474</td><td>268203</td><td>541227</td></tr>\n",
"<tr><th>35</th><td>Ohio</td><td>2356585</td><td>1003960</td><td>503277</td><td>351202</td><td>498146</td></tr>\n",
"<tr><th>46</th><td>Virginia</td><td>2325070</td><td>1129802</td><td>427386</td><td>210501</td><td>557381</td></tr>\n",
"<tr><th>33</th><td>North Carolina</td><td>2321185</td><td>1034083</td><td>471697</td><td>280963</td><td>534442</td></tr>\n",
"<tr><th>10</th><td>Georgia</td><td>2301568</td><td>968962</td><td>545251</td><td>298007</td><td>489348</td></tr>\n",
"<tr><th>21</th><td>Massachusetts</td><td>2181743</td><td>1081678</td><td>387672</td><td>184168</td><td>528225</td></tr>\n",
"<tr><th>22</th><td>Michigan</td><td>2070795</td><td>925568</td><td>438527</td><td>274144</td><td>432556</td></tr>\n",
"<tr><th>47</th><td>Washington</td><td>1955632</td><td>990397</td><td>313212</td><td>188089</td><td>463934</td></tr>\n",
"<tr><th>20</th><td>Maryland</td><td>1710230</td><td>854447</td><td>317385</td><td>151937</td><td>386461</td></tr>\n",
"<tr><th>5</th><td>Colorado</td><td>1695602</td><td>806198</td><td>347333</td><td>148597</td><td>393474</td></tr>\n",
"<tr><th>2</th><td>Arizona</td><td>1492158</td><td>664603</td><td>312503</td><td>197335</td><td>317717</td></tr>\n",
"<tr><th>23</th><td>Minnesota</td><td>1433226</td><td>635231</td><td>283017</td><td>194355</td><td>320623</td></tr>\n",
"<tr><th>42</th><td>Tennessee</td><td>1348224</td><td>557053</td><td>302281</td><td>184641</td><td>304249</td></tr>\n",
"<tr><th>25</th><td>Missouri</td><td>1271281</td><td>521394</td><td>277875</td><td>195694</td><td>276318</td></tr>\n",
"<tr><th>49</th><td>Wisconsin</td><td>1258379</td><td>551449</td><td>250467</td><td>189348</td><td>267115</td></tr>\n",
"<tr><th>14</th><td>Indiana</td><td>1212826</td><td>505899</td><td>254878</td><td>190166</td><td>261883</td></tr>\n",
"<tr><th>40</th><td>South Carolina</td><td>1054559</td><td>434021</td><td>244608</td><td>158539</td><td>217391</td></tr>\n",
"<tr><th>37</th><td>Oregon</td><td>1032316</td><td>507050</td><td>149989</td><td>110849</td><td>264428</td></tr>\n",
"<tr><th>6</th><td>Connecticut</td><td>994548</td><td>449578</td><td>196631</td><td>101676</td><td>246663</td></tr>\n",
"<tr><th>0</th><td>Alabama</td><td>885357</td><td>362000</td><td>210147</td><td>141071</td><td>172139</td></tr>\n",
"<tr><th>18</th><td>Louisiana</td><td>784275</td><td>329028</td><td>151884</td><td>126033</td><td>177330</td></tr>\n",
"<tr><th>17</th><td>Kentucky</td><td>765923</td><td>306484</td><td>152929</td><td>123079</td><td>183431</td></tr>\n",
"<tr><th>36</th><td>Oklahoma</td><td>686509</td><td>271127</td><td>154362</td><td>124801</td><td>136219</td></tr>\n",
"<tr><th>44</th><td>Utah</td><td>664661</td><td>288371</td><td>128000</td><td>83000</td><td>165290</td></tr>\n",
"<tr><th>16</th><td>Kansas</td><td>652489</td><td>262713</td><td>136103</td><td>113425</td><td>140248</td></tr>\n",
"<tr><th>15</th><td>Iowa</td><td>622253</td><td>252932</td><td>124622</td><td>112723</td><td>131976</td></tr>\n",
"<tr><th>28</th><td>Nevada</td><td>548919</td><td>234897</td><td>125194</td><td>66172</td><td>122656</td></tr>\n",
"<tr><th>3</th><td>Arkansas</td><td>475367</td><td>193506</td><td>100020</td><td>86256</td><td>95585</td></tr>\n",
"<tr><th>24</th><td>Mississippi</td><td>441751</td><td>175276</td><td>90107</td><td>92391</td><td>83977</td></tr>\n",
"<tr><th>27</th><td>Nebraska</td><td>422587</td><td>162764</td><td>92464</td><td>76156</td><td>91203</td></tr>\n",
"<tr><th>31</th><td>New Mexico</td><td>394598</td><td>189477</td><td>54692</td><td>57259</td><td>93170</td></tr>\n",
"<tr><th>29</th><td>New Hampshire</td><td>368237</td><td>174264</td><td>69922</td><td>39679</td><td>84372</td></tr>\n",
"<tr><th>12</th><td>Idaho</td><td>336655</td><td>156916</td><td>57425</td><td>47578</td><td>74736</td></tr>\n",
"<tr><th>11</th><td>Hawaii</td><td>335209</td><td>158014</td><td>67549</td><td>38313</td><td>71333</td></tr>\n",
"<tr><th>19</th><td>Maine</td><td>328999</td><td>147567</td><td>43850</td><td>45789</td><td>91793</td></tr>\n",
"<tr><th>8</th><td>District of Columbia</td><td>301429</td><td>161821</td><td>39114</td><td>10251</td><td>90243</td></tr>\n",
"<tr><th>48</th><td>West Virginia</td><td>269706</td><td>114517</td><td>48368</td><td>55900</td><td>50921</td></tr>\n",
"<tr><th>39</th><td>Rhode Island</td><td>260275</td><td>115600</td><td>47251</td><td>29657</td><td>67767</td></tr>\n",
"<tr><th>26</th><td>Montana</td><td>249148</td><td>118160</td><td>39144</td><td>41101</td><td>50743</td></tr>\n",
"<tr><th>7</th><td>Delaware</td><td>228199</td><td>98630</td><td>46817</td><td>29708</td><td>53044</td></tr>\n",
"<tr><th>41</th><td>South Dakota</td><td>174784</td><td>75218</td><td>34722</td><td>34004</td><td>30840</td></tr>\n",
"<tr><th>45</th><td>Vermont</td><td>172272</td><td>80687</td><td>18402</td><td>23066</td><td>50117</td></tr>\n",
"<tr><th>34</th><td>North Dakota</td><td>153397</td><td>66026</td><td>31208</td><td>29033</td><td>27130</td></tr>\n",
"<tr><th>1</th><td>Alaska</td><td>146157</td><td>71601</td><td>22507</td><td>17283</td><td>34766</td></tr>\n",
"<tr><th>50</th><td>Wyoming</td><td>113557</td><td>53384</td><td>16670</td><td>22550</td><td>20953</td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
""
]
},
"metadata": {},
"execution_count": 18
}
],
"metadata": {}
},
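{
"cell_type": "code",
"execution_count": 19,
"source": [
"# Per-state totals again, this time left in alphabetical order; numeric_only=True skips the text columns\n",
"d2 = df.groupby(['State']).sum(numeric_only=True).reset_index()"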
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 20,
"source": [
"columns = d2.columns[2:]\n",
"\n",
"i=1\n",
"plt.figure(figsize=(15,22))\n",
"\n",
"for col in columns:\n",
"\tplt.subplot(2,2,i)\n",
"\tsns.barplot(x=col, y='State', data=d2.sort_values(by=col, ascending=False))\n",
"\ti+=1\n",
"\n",
"plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1, wspace=0.6, hspace=0.2)"
],
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png":...