Data management contains multiple steps, including data cleaning and exploratory analysis. In this project, you will showcase skill in data management using Pandas.

Data

You will use publicly available files. The first contains data on causes of death, while the second contains population data. Both files have state-level information for multiple years.

Requirements

  1. To demonstrate pandas skills and ability, answer these questions:
    1. Are Americans facing an increasing, decreasing, or steady likelihood of death?
    2. What are the four leading causes of death for Americans?
    3. Do individual states show the same four leading causes of death?
    4. Are there year-by-year changes in the four leading causes of death nationwide?
  2. Use appropriately constructed and formatted tables to show results. There is no need to use visualization in this project.
  3. Use population data appropriately to demonstrate your understanding of how variables are normalized/standardized.
  4. Show skill in constructing a formal report using Jupyter.
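Requirement 3 can be illustrated with a crude death rate. For each state and year, divide the death count by the matching population and scale the result to 100,000 people. The sketch below is minimal. It assumes the population file has been reshaped into hypothetical 'State', 'Year', and 'Population' columns, because the real layout of that file is not shown here. The helper name deaths_per_100k is illustrative, and the input rows are made-up toy values, not real figures.

```python
import pandas as pd


def deaths_per_100k(deaths: pd.DataFrame, population: pd.DataFrame) -> pd.DataFrame:
    """Attach population to each death count and add a crude rate per 100,000.

    Assumes (hypothetically) that both frames have 'State' and 'Year' key
    columns, that `deaths` has a 'Deaths' column, and that `population`
    has a 'Population' column.
    """
    # many_to_one: each (State, Year) may appear at most once in the population table
    merged = deaths.merge(population, on=["State", "Year"], how="inner", validate="many_to_one")
    # Crude rate: deaths per 100,000 residents (not age-adjusted)
    merged["Deaths per 100k"] = merged["Deaths"] * 100_000 / merged["Population"]
    return merged


# Made-up toy inputs for illustration only (not NCHS or Census figures)
deaths = pd.DataFrame({
    "State": ["State A", "State A", "State B"],
    "Year": [2015, 2016, 2016],
    "Deaths": [500, 550, 2_000],
})
population = pd.DataFrame({
    "State": ["State A", "State A", "State B"],
    "Year": [2015, 2016, 2016],
    "Population": [1_000_000, 1_000_000, 5_000_000],
})

rates = deaths_per_100k(deaths, population)
print(rates)  # Deaths per 100k: 50.0, 55.0, 40.0
```

In 2016, State B records more deaths than State A but has the lower rate. That contrast is the point of normalizing.

The validate="many_to_one" argument makes the merge raise an error if the population table contains duplicate state-year rows.

This produces a crude rate, not an age-adjusted one. The NCHS file's 'Age-adjusted Death Rate' column is already expressed per 100,000 population, so it should not be divided by population again.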






Your formal report should contain components such as:

  • An introduction that discusses the scope of the analysis
  • A description of data used in the analysis along with data cleaning procedures
  • Code that clearly shows how an algorithm is implemented
  • Results
  • Discussion of results and generation of insight when appropriate
  • Summary when appropriate
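For the data cleaning procedures listed above, here is a minimal sketch of common checks on a table shaped like the NCHS file. It assumes the column names used in the notebook ('Year', 'Cause Name', 'State', 'Deaths', 'Age-adjusted Death Rate'). The steps are generic choices, not the original author's procedure:

  • trim stray whitespace in text columns
  • coerce numeric columns
  • drop exact duplicates
  • drop rows missing key fields

The helper name clean_deaths is hypothetical, and the sample rows are made up.

```python
import pandas as pd


def clean_deaths(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a table shaped like the NCHS leading-causes file."""
    out = df.copy()
    # Trim stray whitespace so ' State A' and 'State A' are treated as the same value
    for col in ["Cause Name", "State"]:
        out[col] = out[col].astype("string").str.strip()
    # Coerce numeric columns; unparseable entries such as 'n/a' become NaN
    for col in ["Year", "Deaths", "Age-adjusted Death Rate"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Drop exact duplicate rows, then rows missing any field the analysis depends on
    out = out.drop_duplicates()
    out = out.dropna(subset=["Year", "Cause Name", "State", "Deaths"]).copy()
    # Years and counts are whole numbers once missing values are gone
    out["Year"] = out["Year"].astype(int)
    out["Deaths"] = out["Deaths"].astype(int)
    return out.reset_index(drop=True)


# Made-up sample rows for illustration only
raw = pd.DataFrame({
    "Year": [2016, 2016, "2016", None],
    "Cause Name": ["Suicide", "Suicide", "Stroke ", "Cancer"],
    "State": ["State A", "State A", " State A", "State A"],
    "Deaths": [100, 100, "n/a", 50],
    "Age-adjusted Death Rate": [10.0, 10.0, 5.0, 2.0],
})

clean = clean_deaths(raw)
print(clean)  # one row remains: the duplicate and the two incomplete rows are dropped
```

Deduplicating only after whitespace is trimmed lets near-identical rows collapse. Rows that cannot be repaired are dropped rather than guessed at.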



Answered Same Day, Feb 21, 2023


Baljit answered on Feb 22 2023
2/22/23, 5:58 PM Untitled3 - Jupyter Notebook
localhost:8888/notebooks/Untitled3.ipynb?kernel_name=python3# 1/7
DATA MANAGEMENT
Introduction
In the first part of this project, we will learn how to analyze data with pandas without visualization, reading the results from tables. We will explore various trends in the dataset of deaths in the United States and learn how to filter, group, and sort the data. In the second part, we will learn about data cleaning and normalization.
In [1]:
# Library imports
import pandas as pd
import matplotlib.pyplot as plt

Part 1: Analysis of NCHS - Leading Causes of Death in the United States

In [2]:
# Load the data into a DataFrame and display the first four rows
data = pd.read_csv('NCHS_-_Leading_Causes_Of_Death_United_States.csv')
data.head(4)

Out[2]:
   Year  113 Cause Name                                     Cause Name      State                 Deaths  Age-adjusted Death Rate
0  2012  Nephritis, nephrotic syndrome and nephrosis (N...  Kidney disease  Vermont                   21                      2.6
1  2016  Nephritis, nephrotic syndrome and nephrosis (N...  Kidney disease  Vermont                   30                      3.7
2  2013  Nephritis, nephrotic syndrome and nephrosis (N...  Kidney disease  Vermont                   30                      3.8
3  2000  Intentional self-harm (suicide) (*U03,X60-X84,...  Suicide         District of Columbia      23                      3.8

1. Analysis of Trends in Deaths in the United States

In [3]:
# To see the trend in deaths, keep only the rows where State is 'United States'
# and Cause Name is 'All causes', then sort them in ascending order by year
us_deaths = data[(data['State'] == 'United States') & (data['Cause Name'] == 'All causes')]
us_deaths = us_deaths.sort_values(by='Year')
display(us_deaths)

       Year  113 Cause Name  Cause Name  State           Deaths  Age-adjusted Death Rate
10074  1999  All Causes      All causes  United States  2391399                    875.6
10062  2000  All Causes      All causes  United States  2403351                    869.0
10043  2001  All Causes      All causes  United States  2416425                    858.8
10039  2002  All Causes      All causes  United States  2443387                    855.9
10015  2003  All Causes      All causes  United States  2448288                    843.5
9929   2004  All Causes      All causes  United States  2397615                    813.7
9932   2005  All Causes      All causes  United States  2448017                    815.0
9841   2006  All Causes      All causes  United States  2426264                    791.8
9779   2007  All Causes      All causes  United States  2423712                    775.3
9776   2008  All Causes      All causes  United States  2471984                    774.9
9681   2009  All Causes      All causes  United States  2437163                    749.6
9664   2010  All Causes      All causes  United States  2468435                    747.0
9641   2011  All Causes      All causes  United States  2515458                    741.3
9620   2012  All Causes      All causes  United States  2543279                    732.8
9616   2013  All Causes      All causes  United States  2596993                    731.9
9585   2014  All Causes      All causes  United States  2626418                    724.6
9622   2015  All Causes      All causes  United States  2712630                    733.1
9601   2016  All Causes      All causes  United States  2744248                    728.8

Explanation:
The number of deaths generally increased from 1999 to 2016, rising from 2,391,399 to 2,744,248. There were slight year-over-year decreases in 2004, 2006, 2007, and 2009.

The age-adjusted death rate moved the other way. It fell from 875.6 to 728.8 per 100,000 over the same period and rose only in 2005 and 2015.

The age-adjusted rate accounts for population size and age structure. The rising count therefore reflects changes in the size and age structure of the population rather than a higher individual risk. The likelihood of death for Americans decreased over this period.
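Question 1 asks about the likelihood of death. The 'Age-adjusted Death Rate' column in the table above captures that better than the raw 'Deaths' count, because the count also grows with the size and age of the population. The minimal sketch below compares the two year by year with Series.diff(). It assumes a frame with the 'Year', 'Deaths', and 'Age-adjusted Death Rate' columns of us_deaths. The helper name year_over_year is illustrative. To stay self-contained, the sketch copies the 2007 to 2010 rows from the table above instead of reading the CSV.

```python
import pandas as pd


def year_over_year(us_deaths: pd.DataFrame) -> pd.DataFrame:
    """Year-over-year change in the raw death count and in the age-adjusted rate."""
    out = us_deaths.sort_values("Year")[["Year", "Deaths", "Age-adjusted Death Rate"]].copy()
    out["Deaths change"] = out["Deaths"].diff()
    out["Rate change"] = out["Age-adjusted Death Rate"].diff().round(1)
    # True where exactly one of the two series rose; the first year has no
    # previous year, so its changes are NaN and its flag is False
    out["Directions differ"] = (out["Deaths change"] > 0) != (out["Rate change"] > 0)
    return out.reset_index(drop=True)


# 2007-2010 rows copied from the us_deaths table above
sample = pd.DataFrame({
    "Year": [2007, 2008, 2009, 2010],
    "Deaths": [2423712, 2471984, 2437163, 2468435],
    "Age-adjusted Death Rate": [775.3, 774.9, 749.6, 747.0],
})

print(year_over_year(sample))
```

On these four rows, the count rose in 2008 and 2010 while the rate fell. Applying year_over_year(us_deaths) to the full table shows the same split in most year-to-year steps. That is why the rate, not the count, should answer question 1.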
2. Leading Causes of Death in the United States

In [4]:
# Keep the rows where State is 'United States' and Cause Name is not 'All causes',
# group them by 'Cause Name' and sum the deaths for each cause (across all years),
# then sort in descending order to rank the causes by total deaths
causes = data[(data['State'] == 'United States') & (data['Cause Name'] != 'All causes')].groupby('Cause Name')['Deaths'].sum()
causes = causes.sort_values(ascending=False)
display(causes)

Cause Name
Heart disease              11575183
Cancer                     10244536
Stroke                      2580140
CLRD                        2434726
Unintentional injuries      2177884
Alzheimer's disease         1373412
Diabetes                    1316379
Influenza and pneumonia     1038969
Kidney disease               807980
Suicide                      649843
Name: Deaths, dtype: int64

Explanation
The table above shows that most deaths in the United States, summed over all years in the dataset, were caused by heart disease and cancer. Heart disease, cancer, stroke, and CLRD (chronic lower respiratory diseases) are the four leading causes of death in the United States.

3. Analysis of the Four Leading Causes of Death: United States vs. Individual States

In [5]:
# Create an empty DataFrame to hold the top four causes of death for each state
data1 = pd.DataFrame(columns=['State', 'First Cause', 'Second Cause', 'Third Cause', 'Fourth Cause'])

# Keep individual states only (exclude 'United States') and drop the 'All causes' rows
us_states_deaths_causes = data[(data['State'] != 'United States') & (data['Cause Name'] != 'All causes')]
# Iterate over all states to find each state's top causes of death
for state in us_states_deaths_causes['State'].unique():
    state_cause = us_states_deaths_causes[us_states_deaths_causes['State'] == state].groupby(['Cause Name'])['Deaths'].sum()
    state_cause.sort_values(ascending=False, inplace=True)
    cause = state_cause.keys()[:4]
    # Append a row to the DataFrame
    # (requires pandas < 2.0: DataFrame.append was removed in pandas 2.0)
    data1 = data1.append({'State': state, 'First Cause': cause[0], 'Second Cause': cause[1],
                          'Third...
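The loop above builds data1 with DataFrame.append, which was deprecated in pandas 1.4 and removed in pandas 2.0. Below is a minimal alternative sketch. It assumes the same 'State', 'Cause Name', and 'Deaths' columns and works in four steps:

  • total the deaths per state and cause with one groupby
  • keep each state's four largest totals
  • pivot them into First through Fourth Cause columns
  • flag whether each state's set of causes matches the national top four from the ranking table above (heart disease, cancer, stroke, CLRD)

The helper name top_four_by_state and the toy rows are hypothetical.

```python
import pandas as pd

RANK_LABELS = ["First Cause", "Second Cause", "Third Cause", "Fourth Cause"]
# National top four, taken from the United States ranking table above
NATIONAL_TOP_FOUR = {"Heart disease", "Cancer", "Stroke", "CLRD"}


def top_four_by_state(df: pd.DataFrame) -> pd.DataFrame:
    """One row per state with its four causes of most total deaths, plus a
    flag saying whether that set matches the national top four."""
    states = df[(df["State"] != "United States") & (df["Cause Name"] != "All causes")]
    # Total deaths per (state, cause) across all years
    totals = states.groupby(["State", "Cause Name"], as_index=False)["Deaths"].sum()
    # Largest totals first within each state, then keep each state's first four rows
    totals = totals.sort_values(["State", "Deaths"], ascending=[True, False])
    top = totals.groupby("State").head(4).copy()
    # Label positions 0..3 within each state as First..Fourth Cause
    top["Rank"] = top.groupby("State").cumcount().map(dict(enumerate(RANK_LABELS)))
    wide = top.pivot(index="State", columns="Rank", values="Cause Name")[RANK_LABELS].copy()
    # Order-insensitive comparison with the national top four
    wide["Same as national"] = wide.apply(lambda row: set(row) == NATIONAL_TOP_FOUR, axis=1)
    return wide.reset_index()


# Made-up toy rows for illustration only
toy = pd.DataFrame({
    "State": ["State A"] * 5 + ["State B"] * 5 + ["United States"],
    "Cause Name": ["Heart disease", "Cancer", "Stroke", "CLRD", "Suicide",
                   "Heart disease", "Cancer", "Unintentional injuries", "CLRD", "Stroke",
                   "All causes"],
    "Deaths": [400, 350, 90, 80, 20,
               300, 320, 150, 100, 60,
               9999],
})

print(top_four_by_state(toy))
```

Comparing sets rather than ordered lists treats a state as matching when it has the same four causes in a different order. If the ranking itself matters for question 3, compare the four columns directly instead.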