Please see attached: CP5805 Assignment 2 Main task DataFrame manipulation and visualisation Task ...

Question

Please see attached.

CP5805 Assignment 2

Main task

DataFrame manipulation and visualisation

Task

Design and implement a data analysis program in Python using pandas as detailed in the instructions below. 85% of your mark will be based on the correctness and quality of the basic program, and 15% is based on the functionality in the challenge section.

You will need to use the skills covered across weeks one to five for this main task. Some portions may require some further investigation of the pandas docs.

Important note about libraries

For this assessment, you are free to use any standard Python libraries, as well as the libraries we have covered in subject contents. In fact, you must use pandas appropriately to fulfill the requirements of this assessment. You may, if it allows you to write more efficient or effective code, use additional libraries, provided these libraries are included in the standard Anaconda installation. You may not use any libraries that need to be installed separately (e.g., via conda or pip).

Detailed instructions

Your program will allow users to load a DataFrame from a CSV file, clean the data in various ways, display statistics, and create visualisations.

When the program runs, the user will see an introductory message (you are welcome to determine this as you see fit, but make sure to include your name). For example:

    Welcome to The DataFrame Statistician!
    Programmed by Ada Lovelace

After the welcome message, the user will be presented with the following menu:

    Please choose from the following options:
      1 - Load data from a file
      2 - View data
      3 - Clean data
      4 - Analyse data
      5 - Visualise data
      6 - Save data to a file
      7 - Quit

Option 7 will exit the program; every other option will do some task and then display the menu again until the user chooses 7 from this menu.
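As a rough illustration of the menu behaviour, the loop could be sketched as below. This is only one possible shape, not the required design: `run_menu`, `handlers`, `read` and `write` are hypothetical names, and the sketch assumes the menu is shown again after an invalid choice, which the brief does not specify.

```python
MENU = """Please choose from the following options:
  1 - Load data from a file
  2 - View data
  3 - Clean data
  4 - Analyse data
  5 - Visualise data
  6 - Save data to a file
  7 - Quit"""


def run_menu(handlers, read=input, write=print):
    """Show the menu repeatedly until the user chooses 7.

    `handlers` maps the choices "1" to "6" to zero-argument callables.
    `read` and `write` default to input/print; they are parameters only
    so that the loop can be driven without a keyboard.
    """
    while True:
        write(MENU)
        choice = read(">>> ").strip()
        if choice == "7":
            break
        action = handlers.get(choice)
        if action is None:
            # Anything other than 1 to 7 (or an option with no handler yet).
            write("Invalid selection!")
        else:
            action()
```

Keeping the options in a dictionary means a new menu entry only needs a new key, and the same pattern could be reused for the cleaning submenu.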
If the user enters anything other than a value between 1 and 7, display an appropriate error message (e.g., Invalid selection!), then get the user to enter another choice.

Menu option 1 - Load data from a file

When the user chooses option 1, they will be asked for a filename to load, which is expected to be in the same directory as the program (no need for path information). Your program should use the exact filename as stated. Do not append .csv or any other extension: although the contents of the file will be expected to be CSV, a CSV file could be stored under any extension, or no extension.

Your program should be able to handle any file in a format like the following:

    day,min_temp,max_temp,rainfall,humidity
    1,11,23,3,55
    1,11,23,3,55
    2,13,25,0,60
    3,9,19,17,80
    4,9,18,36,85
    5,,,,50
    6,12,22,,60
    7,13,23,0,65

So, the first row should be the names of the columns, and the following rows should consist of the data. Your program should not be hard coded to deal with the example weather data above; it should work with any CSV file where all the column values are numeric and it can be loaded as a DataFrame. Your program should work for any number of rows or columns.

There are two problems your program may encounter here.

• the file does not exist or cannot be opened
• pandas cannot interpret the data as a DataFrame

In both of these cases your program should display an appropriate error message (e.g., "File not found", "Unable to load data") then return control to the main menu.

Your program only needs to handle one DataFrame in the system at a time. If a DataFrame was previously loaded, it should be replaced.

After the file loads successfully, the program should display the names of the columns, and ask the user if they want to set any of the columns as an index. Valid input in this case will consist of either one of the column names, or the blank string (user just presses `Enter`).
If the input is not valid, loop until the user enters a valid column name or blank.

The program should then set the DataFrame's index to the selected column or skip this if the user entered the blank string.

Menu option 2 - View data

This option simply prints the DataFrame to the screen. In the following example, day was set as the index when the DataFrame was loaded.

         min_temp  max_temp  rainfall  humidity
    day
    1        11.0      23.0       3.0        55
    1        11.0      23.0       3.0        55
    2        13.0      25.0       0.0        60
    3         9.0      19.0      17.0        80
    3         9.0      19.0      17.0        80
    4         9.0      18.0      36.0        85
    5         NaN       NaN       NaN        50
    6        12.0      22.0       NaN        60
    7        13.0      23.0       0.0        65

Menu option 3 - Clean data

This option will enter a submenu offering various cleaning operations.

    Cleaning data:
      1 - Drop rows with missing values
      2 - Fill missing values
      3 - Drop duplicate rows
      4 - Drop column
      5 - Rename column
      6 - Finish cleaning

Cleaning option 1 - Drop rows with missing values

This option will ask the user for a threshold value. This must be a non-negative integer. A row should be dropped if it has fewer non-null values than the threshold. For example, if there are 7 columns, and the threshold is 4, then a row will need at least 4 non-null values (or, equivalently, no more than 3 null values) to be kept.

Cleaning option 2 - Fill missing values

This option will ask the user to enter a value to fill in all the missing cells of the DataFrame. Accept any number for this value, and display an error message if the user enters a non-number.

Cleaning option 3 - Drop duplicate rows

This option will remove any (fully) duplicate rows from the DataFrame.

Cleaning option 4 - Drop column

Present the user with the list of columns in the data and ask them to enter a name.
If the entered column name exists in the DataFrame, drop this column from the DataFrame. If the entered column name does not exist, ask again.

Cleaning option 5 - Rename column

The user will choose a column to rename, then enter a new name. Make sure the new name is not the name of an existing column, and that it is not blank.

Cleaning option 6 - Finish cleaning

Return to the main menu.

Menu option 4 - Analyse data

For each of the columns in the DataFrame, produce a report like the one below. Make sure to use pandas functions as appropriate.

    humidity
    --------
    number of values (n): 7
                 minimum: 50.00
                 maximum: 85.00
                    mean: 65.00
                  median: 60.00
      standard deviation: 12.91
       std. err. of mean: 4.88

Display each statistic to two decimal places (except for number of values, which is always a whole number). After displaying the statistics reports, finish by displaying a table of correlations like the one below (hint: you don't have to write your own code to compute correlations, search the pandas docs).

              min_temp  max_temp  rainfall  humidity
    min_temp  1.000000  0.916131 -0.795016 -0.845247
    max_temp  0.916131  1.000000 -0.882108 -0.920701
    rainfall -0.795016 -0.882108  1.000000  0.882754
    humidity -0.845247 -0.920701  0.882754  1.000000

Menu option 5 - Visualise data

In this case, ask the user:

• If they want a bar graph, line graph, or boxplot (repeat until they give a valid selection)
• Whether they want to use subplots
• For a title (skip if they leave it blank)
• For an x-axis label (skip if they leave it blank)
• For a y-axis label (skip if they leave it blank)

Then display the plot.

Menu option 6 - Save data to a file

Ask the user for a filename, including file extension (e.g., data.csv). Use the exact filename given including the extension – if the user
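
For menu option 1, a minimal sketch of the load step and the index prompt might look like the following. The function names `load_dataframe` and `choose_index`, the prompt wording, and the `read`/`write` parameters are assumptions made for illustration; only `pd.read_csv`, `DataFrame.set_index` and the pandas exception classes are real library names.

```python
import pandas as pd


def load_dataframe(filename, write=print):
    """Return a DataFrame read from `filename`, or None after reporting a problem."""
    try:
        # The filename is used exactly as given; no extension is appended.
        return pd.read_csv(filename)
    except OSError:
        # Covers a missing file as well as one that cannot be opened.
        write("File not found")
    except (pd.errors.ParserError, pd.errors.EmptyDataError, ValueError):
        write("Unable to load data")
    return None


def choose_index(df, read=input, write=print):
    """Ask for an index column until the answer is a column name or blank."""
    write("Columns: " + ", ".join(map(str, df.columns)))
    while True:
        name = read("Index column (blank for none): ")
        if name == "":
            return df
        if name in df.columns:
            return df.set_index(name)
        write("Invalid column name")
```

Returning `None` on failure lets the caller keep any previously loaded DataFrame and simply fall back to the main menu.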
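
For menu option 4, each line of the report corresponds to a pandas `Series` method, and `DataFrame.corr` produces the correlation table. The following is a sketch under the assumption that every column is numeric, as the brief states; `column_report` and `analyse` are hypothetical names.

```python
import pandas as pd


def column_report(series):
    """Build the statistics report for one column as a single string."""
    stats = [
        ("number of values (n)", str(series.count())),
        ("minimum", f"{float(series.min()):.2f}"),
        ("maximum", f"{float(series.max()):.2f}"),
        ("mean", f"{float(series.mean()):.2f}"),
        ("median", f"{float(series.median()):.2f}"),
        ("standard deviation", f"{float(series.std()):.2f}"),
        ("std. err. of mean", f"{float(series.sem()):.2f}"),
    ]
    # Right-align the labels so that the colons line up.
    width = max(len(label) for label, _ in stats)
    title = str(series.name)
    lines = [title, "-" * len(title)]
    lines += [f"{label:>{width}}: {value}" for label, value in stats]
    return "\n".join(lines)


def analyse(df, write=print):
    """Write one report per column, then the correlation table."""
    for name in df.columns:
        write(column_report(df[name]))
        write("")
    write(df.corr().to_string())
```

Called on the seven distinct humidity values from the example data (55, 60, 80, 85, 50, 60, 65), `column_report` gives the same figures as the sample report in the brief.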

Sathishkumar · Accepted Answer

Answer Attached Below:
