
information is on pdfs



Overview

You have already read the data set that you found online into a Pandas DataFrame and created a subset of it that is relevant to the questions you posed about your topic. You will now visually inspect your data set for common data errors using the Pandas methods covered in the lab and lecture this week. In fact, you could even use the same visually_inpsect function seen in the lab and modify it slightly to work with your data set. Once you have identified the data errors, you will fix them using the techniques you have learned about in lab and lecture.

For this assignment, you will place all answers and screenshots into the Google Site you made in a previous assignment. This Google Site should be named lastname_topic, where lastname is your last name and topic is the name of your topic (e.g., armenti_climatechange). The steps below will tell you what to put in the assignment document. You will also submit your Python script as an attachment in this assignment.

https://sites.google.com/new

Python Programming Environment

You will be using Anaconda as your Python programming environment to write and run Python scripts for your labs and assignments in this course. Before you can begin this assignment, you will need to install Anaconda onto your personal device. You can follow the instructions provided in the Week 5: Intro To Python & Programming With Data module. In your lastname_csc201 folder, you should create a new Python script in Anaconda called lastname_cleaning, where lastname should be replaced with your last name.

Part I: Cleaning Your DataFrame

You will write a modularized Python program that reads all the records from the data set you found online into a Pandas DataFrame, then visually inspects the DataFrame for common data errors and cleans it. Your program will be broken down into several functions to complete the required tasks of this program. Each line of code you write in this program should be within a function.

1. Reuse Your CSV File and Code From Previous Assignment.

You will reuse the CSV file containing the data set on your topic that you found online and the code you wrote in the previous assignment (your lastname_dataframes from Assignment 10: Programming With Data Using DataFrames).

➔ Reuse the CSV file that you downloaded and uploaded to Python Jupyter in the previous assignment (your lastname_topic.csv from Assignment 10: Programming With Data Using DataFrames). This should be the CSV file containing the data set you found online earlier in the semester.
    ● You do NOT need to reupload this CSV file to Python Jupyter again! You should have already done this in the previous assignment. You will just use it again for the program you write in this assignment.
➔ Reuse the code you wrote in the previous assignment (your lastname_dataframes from Assignment 10: Programming With Data Using DataFrames) by copying and pasting this code into your new Python notebook called lastname_cleaning.

2. Designing Your Program.

Once you have copied and pasted your code from the previous assignment, you can begin to design the program you want to write. Remember, this is the most important part: here you determine what the program is supposed to do and the steps needed to perform its tasks. You should use comments (#) in your program to design your program and include the following:

➔ Your name, course name and semester, title of this assignment, and date.
➔ The title and description of the program.
➔ Your general solution for this program.
➔ Pseudocode (i.e., the steps needed to perform tasks and write your informal code for the program).

Your program will be broken down into several functions to complete the required tasks of this program. Each line of code you write in this program should be within a function.
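As a rough illustration of the design comments described above, the top of a script might look something like the sketch below. The angle-bracket placeholders and the wording of the pseudocode steps are only assumptions about one reasonable layout, not a required template; fill them in with your own details.

```python
# ---------------------------------------------------------------
# Name:        <your name>
# Course:      <course name and semester>
# Assignment:  <title of this assignment>
# Date:        <date>
#
# Program:     <program title>
# Description: <one or two sentences on what the program does>
#
# General solution:
#   <a short plain-language summary of your approach>
#
# Pseudocode:
#   1. Read the CSV file into a DataFrame.
#   2. Visually inspect the DataFrame for common data errors.
#   3. Clean the DataFrame.
#   4. Create the filtered subset(s) and run calculations on them.
# ---------------------------------------------------------------


def main():
    """Set up the program and manage the calls to the other functions."""
    # The calls that carry out the pseudocode steps will go here.
    pass


# Nothing runs at import time; the program starts from main().
if __name__ == "__main__":
    main()
```

Writing the pseudocode first makes it easier to decide which function each step belongs in before any Pandas code is written.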
You can use the functions that you wrote in your previous assignment (your lastname_dataframes from Assignment 10: Programming With Data Using DataFrames) as your starting code for this program. Make sure to visually inspect and clean your DataFrame BEFORE creating your filtered subset(s), computing calculations on your filtered subset(s), and outputting your filtered subset(s) and the calculations on them.

⭐ You can refer to the Google Site on our climate change example meeting all of the above requirements at this link.
https://sites.google.com/uri.edu/a11-climatechange/python-scripts

The following functions should be defined and implemented within your program:

● main: This function sets up the program and manages calls to the functions defined for reading the CSV file into a Pandas DataFrame, visually inspecting the DataFrame, and cleaning it of common data errors.
    ○ Arguments - None
    ○ Return - None
● read_as_dataframe: This function converts data in a CSV file to a Pandas DataFrame.
    ○ Arguments - 1 string representing the name of the CSV file
    ○ Return - 1 Pandas DataFrame
● visually_inpsect: This function visually inspects the DataFrame to diagnose it for common data errors.
    ○ Arguments - 1 Pandas DataFrame
    ○ Return - None
  You must do the following within the visually_inpsect function:
    ○ Use the info() method to examine the columns and rows in the DataFrame, including the number of values and the data type of each column.
        ■ Recall that this helps you to identify missing values in columns and incorrect data types of columns.
    ○ Use the value_counts() method to examine the counts of unique values in columns with potentially missing and/or duplicate values.
        ■ You do not have to use this method on every column in your DataFrame, but you should use it on columns that you believe have data errors such as missing values and/or duplicate values.
    ○ Use the describe() method to identify columns with numeric types in the DataFrame and determine which other columns, if any, are not being recognized as an appropriate numerical type.
        ■ You should note which columns in your DataFrame should have a numerical type and which should not.
    ○ Use other Pandas methods to help you visually inspect your DataFrame for errors as necessary.
● clean_dataframe: After visually inspecting your data set, this function will resolve any data errors that you found in your inspection. You can use any of the techniques to clean data discussed in the lectures and lab, such as dropping or filling missing values, dropping duplicate rows, replacing erroneous data values, formatting values, and converting columns to their proper data type.
    ○ Arguments - 1 Pandas DataFrame that has NOT been cleaned (i.e., the original DataFrame containing data
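The inspection and cleaning functions described above could be sketched roughly as follows. This is only an illustration under several assumptions: the columns region and temp_c and the helper make_example_dataframe are hypothetical stand-ins for your own data, the function name keeps the handout's spelling visually_inpsect, and clean_dataframe is assumed to return the cleaned DataFrame because its description is cut off above. The cleaning steps you actually need depend on what your own inspection turns up.

```python
import pandas as pd


def make_example_dataframe():
    """Hypothetical stand-in for the DataFrame read from your CSV file."""
    return pd.DataFrame({
        "region": ["north", "North ", "south", None, "south"],
        "temp_c": ["14.2", "14.2", "n/a", "9.8", "16.0"],
    })


def visually_inpsect(df):  # name spelled as in the handout
    """Print summaries that help diagnose common data errors."""
    # Non-null counts and data types for every column.
    df.info()
    # Counts of each unique value, with missing values included.
    print(df["region"].value_counts(dropna=False))
    # Summary statistics; shows which columns pandas treats as numeric.
    print(df.describe())


def clean_dataframe(df):
    """Return a cleaned copy (assumed return value; see the note above)."""
    cleaned = df.copy()
    # Format values: trim stray spaces and use consistent capitalization.
    cleaned["region"] = cleaned["region"].str.strip().str.title()
    # Convert to a numeric type; unparseable entries become missing (NaN).
    cleaned["temp_c"] = pd.to_numeric(cleaned["temp_c"], errors="coerce")
    # Drop rows with no region, then drop exact duplicate rows.
    cleaned = cleaned.dropna(subset=["region"]).drop_duplicates()
    # Fill the remaining missing temperatures with the column mean.
    mean_temp = cleaned["temp_c"].mean()
    cleaned = cleaned.assign(temp_c=cleaned["temp_c"].fillna(mean_temp))
    return cleaned.reset_index(drop=True)


def main():
    raw = make_example_dataframe()
    visually_inpsect(raw)        # inspect before cleaning
    cleaned = clean_dataframe(raw)
    visually_inpsect(cleaned)    # inspect again to confirm the fixes


if __name__ == "__main__":
    main()
```

Running the inspection a second time on the cleaned DataFrame is a simple way to check that the data types and missing-value counts now look the way you expect.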
Dec 15, 2021