
I need a Python program that meets the spec in the attached PDF. A text document with the initial code framework has been provided, along with the two CSV files that the program will process.


Week 6 Assessment: Code Task

The Iris data set is a comprehensive data set introduced by Ronald Fisher in 1936, detailing a number of measurements of three species of Iris flowers (https://en.wikipedia.org/wiki/Iris_flower_data_set). It has gained some popularity in the fields of Data Analytics and Machine Learning, as it provides a large number of measurements across a relatively small number of categories. The assessment task is to carry out some simple, computer-supported analysis of the Iris data set.

Task details

The task has been broken into several largely independent stages.

Stage 1: Reading and processing data

For this stage you need to complete the specification of the read_and_process(csv_filename) function. This function should do the following:
• Import the csv file named csv_filename as a Pandas DataFrame
• Drop any rows that do not contain entries in all columns
• Strip ' cm' and ' mm' from each data point, and convert them to floats
• Divide the second column ('sepal_width') by 10
• Return the resulting DataFrame

You may assume that csv_filename is a readable csv file with a similar format to iris.csv.

Stage 2: User menu

For this stage you need to implement the initial interactions. When your program is run:
• Prompt the user to enter a csv file with: Enter csv file:
• Read and process the user-entered file using the function from Stage 1
• Display the menu:
    1. Create textual analysis
    2. Create graphical analysis
    3. Exit
• Prompt the user to select an option with: Please select an option:
• Process the user's choice:
    ◦ If they select '1', proceed to Stage 3
    ◦ If they select '2', proceed to Stage 4
    ◦ If they select '3', exit the program with the exit() function

You may assume that only valid options are selected.

Stage 3: Text-based analysis

For this stage you will output some simple statistics based on the DataFrame loaded in Stage 2.
Upon entering this stage, the program should:
• Prompt the user for a species with: Select species (all, setosa, versicolor, virginica):
  To obtain full marks, the available species should be extracted from the DataFrame, and may be different from those listed above. They should be listed alphabetically, after 'all'.
• Display the following statistics: Mean, 25%-ile, Median, 75%-ile and Standard deviation for each of the characteristics (sepal_length, sepal_width, petal_length, petal_width) for the species selected by the user. If the user chose all, the resulting table should summarise all the data.
• The output should be the result of printing a DataFrame with index ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'] and column headings Mean, 25%, Median, 75%, Std (see Sample interactions).
• Return to the main menu (Stage 2)

The output resulting from pandas function calls is sufficient. You do not need to manually round any results.

Stage 4: Graphics-based analysis

For this stage you will output some simple graphical plots based on the DataFrame loaded in Stage 2. Upon entering this stage, your program should:
• Prompt the user for a characteristic for the x-axis with: Choose the x-axis characteristic (all, sepal_length, sepal_width, petal_length, petal_width):
  The available characteristics do not need to be extracted from the DataFrame.
• If the user does not select all:
    ◦ Prompt the user for a characteristic for the y-axis with: Choose the y-axis characteristic (sepal_length, sepal_width, petal_length, petal_width):
    ◦ Plot a scatter plot of the two chosen characteristics (it does not have to be displayed)
• If the user does select all:
    ◦ Using a scatter_matrix or pairplot, plot the relationships between all pairs of characteristics
• In both cases, the program should prompt the user to enter a file with: Enter save file: and then save the graphical plot to the entered file.
• Return to the main menu (Stage 2)

To obtain full marks, the outputs should differentiate the species by colouring the data points according to their species. In addition to the automarked test cases, the output of this stage will be inspected by your OL, and up to 5 marks awarded for output, based on the following criteria:
• Scatter plots of the correct characteristics (3 marks)
• Differentiation of species by colour (2 marks)

Stage 5: Conclusion

For this stage you are required to complete the provided function conclusion(). Your function should return a tuple containing the two (non-species) characteristics you believe answer the following question:

In iris.csv, which pair of characteristics is best for separating the species? In other words, which pair of characteristics has the most significant impact in determining what species the plant belongs to?

The two characteristics should be ordered alphabetically within the tuple, and should be two of 'sepal_length', 'sepal_width', 'petal_length' or 'petal_width'. The return value should be hard-coded into the function (i.e., no calculations are required), based on your own analysis of the data (using the program you just created, if appropriate). If you are failing the last (hidden) test case but passing the second-last test case, add a comment indicating the reason for your choice. Justification is not needed if you pass the last test case.

Subjective component

In addition to the above tasks, your code will be inspected by your OL and evaluated on its adherence to good coding practices. Particular attention will be paid to the following aspects of your code:
• Documentation: Appropriate use of comments
• Modularity: Appropriate use of functions (Note: if appropriate, you should define your own functions outside of those outlined above).
  All functions should "stand alone", that is, not be dependent on global variables
• Readability: Appropriate use of variable names
• Structure: Appropriate code layout so that the program flow is clear

https://www.python.org/dev/peps/pep-0008/

Sample interactions

Enter csv file: iris.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 1
Select species (all, setosa, versicolor, virginica): all
                  Mean  25%  Median  75%       Std
sepal_length  5.843333  5.1    5.80  6.4  0.828066
sepal_width   3.054000  2.8    3.00  3.3  0.433594
petal_length  3.758667  1.6    4.35  5.1  1.764420
petal_width   1.198667  0.3    1.30  1.8  0.763161
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3

Enter csv file: iris_test.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 1
Select species (all, versicolor, virginica): versicolor
                  Mean  25%  Median  75%       Std
sepal_length  5.955102  5.6     5.9  6.3  0.503348
sepal_width   2.785714  2.6     2.8  3.0  0.296507
petal_length  4.275510  4.0     4.4  4.6  0.461668
petal_width   1.332653  1.2     1.3  1.5  0.194066
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3

Enter csv file: iris.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 2
Choose the x-axis characteristic (all, sepal_length, sepal_width, petal_length, petal_width): all
Enter save file: iris_all.png
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3

After the above interaction, an example of iris_all.png would be either of the following:

[Figure: example iris_all.png]

Enter csv file: iris.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 2
Choose the x-axis characteristic (all, sepal_length, sepal_width, petal_length, petal_width): sepal_width
Choose the y-axis characteristic (sepal_length, sepal_width, petal_length, petal_width): sepal_width
Enter save file: sw_vs_sw.png
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 1
Select species (all, setosa, versicolor, virginica): all
                  Mean  25%  Median  75%       Std
sepal_length  5.843333  5.1    5.80  6.4  0.828066
sepal_width   3.054000  2.8    3.00  3.3  0.433594
petal_length  3.758667  1.6    4.35  5.1  1.764420
petal_width   1.198667  0.3    1.30  1.8  0.763161
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3

After the above interaction, an example of sw_vs_sw.png would be:

[Figure: example sw_vs_sw.png]

Note: Your plots do not have to have the same style options (e.g., colours, fonts) as the ones presented here. Your plots will be assessed on whether they plot the correct data with the correct chart type (i.e., a scatter plot).

Submission and feedback

You can click on the mark button, also used to submit your work, as many times as you like. We will assess your last submission only. You can see where your code differs from the expected output by examining the feedback from the non-hidden test cases. The hidden test cases will test your code more rigorously, but with suppressed input/output to limit dishonest attempts. You are encouraged to test your code yourself and not rely on the provided test cases.

Two files suitable for input have been provided as part of your scaffold.

import pandas as pd
# Stage 1: Read and process
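Stage 3 maps closely onto pandas' DataFrame.describe(), which already reports the mean, standard deviation and quartiles of each numeric column, so the required table can be built by transposing, selecting and renaming. The sketch below is one possible approach, not part of the original spec or scaffold. It assumes the cleaned DataFrame keeps its species labels in a column named 'species', and the helper names species_options() and summary_table() are hypothetical.

```python
import pandas as pd


def species_options(df):
    """Choices for the species prompt: 'all' first, then the species
    found in the data, sorted alphabetically.

    Assumption: species labels live in a column named 'species'.
    """
    return ["all"] + sorted(df["species"].unique())


def summary_table(df, species="all"):
    """Stage 3 statistics for one species, or for every row if species == 'all'."""
    measurements = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    subset = df if species == "all" else df[df["species"] == species]
    # describe() gives count/mean/std/min/25%/50%/75%/max for each column;
    # transposing makes the measurements the index.
    stats = subset[measurements].describe().T
    table = stats[["mean", "25%", "50%", "75%", "std"]]
    # Rename to the headings used in the sample interactions.
    return table.rename(columns={"mean": "Mean", "50%": "Median", "std": "Std"})


if __name__ == "__main__":
    # A few made-up rows, only to show the shape of the output.
    demo = pd.DataFrame({
        "sepal_length": [5.1, 4.9, 7.0, 6.3],
        "sepal_width": [3.5, 3.0, 3.2, 3.3],
        "petal_length": [1.4, 1.4, 4.7, 6.0],
        "petal_width": [0.2, 0.2, 1.4, 2.5],
        "species": ["setosa", "setosa", "versicolor", "virginica"],
    })
    print(f"Select species ({', '.join(species_options(demo))}):")
    print(summary_table(demo, "setosa"))
```

Because the options are read from the data, a file without setosa rows (such as iris_test.csv in the sample interactions) would produce the prompt "(all, versicolor, virginica)". describe() uses the sample standard deviation (ddof=1) and linearly interpolated quartiles, the same defaults as Series.std() and Series.quantile().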
Answered Same Day, Aug 01, 2021


Yogesh answered on Aug 02 2021
import pandas as pd
# import for graphical data plotting
import matplotlib.pyplot as plt
# for "pairplot" & "PairGrid"
import seaborn as sns


# Stage 1: Read and process the data
def read_and_process(filename):
    # read the csv file named by the user & load it as a DataFrame
    df = pd.read_csv(filename)

    # drop every row that is missing an entry in any column
    df = df.dropna()

    # strip " cm" / " mm" from each data point & convert it to float
    measurements = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    for column in measurements:
        df[column] = (df[column].astype(str)
                      .str.replace(r"\s*(cm|mm)$", "", regex=True)
                      .astype(float))

    # divide sepal_width by 10
    df["sepal_width"] /= 10

    return df


# Stage 5: Conclusion
def conclusion():
    # Return the two (non-species) characteristics that best identify the
    # species of iris, ordered alphabetically.
    # By observing the graphs, "petal_length" & "petal_width" are the two
    # characteristics that most effectively separate the species.
    result = ("petal_length", "petal_width")

    return result

# User Menu Function
def menu():
    print("1. Create...
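The code above stops partway through menu(), and the graphical analysis of Stage 4 is not shown. A minimal sketch of that stage follows; it is an illustration, not the answerer's implementation. It assumes the cleaned DataFrame has its species labels in a column named 'species' and no missing values, uses pandas' scatter_matrix (one of the two options the spec allows) for the "all" case, and the function names save_scatter() and save_scatter_matrix() are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen: the plots only need to be saved, not shown
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix


def save_scatter(df, x_char, y_char, filename):
    """Scatter plot of y_char against x_char, one colour per species.

    Assumption: species labels live in a column named 'species'.
    """
    fig, ax = plt.subplots()
    # Plotting each species separately gives it its own colour and legend entry.
    for name, group in df.groupby("species"):
        ax.scatter(group[x_char], group[y_char], label=name)
    ax.set_xlabel(x_char)
    ax.set_ylabel(y_char)
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)


def save_scatter_matrix(df, filename):
    """Pairwise scatter plots of the four measurements, coloured by species."""
    measurements = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    # Give each species (in alphabetical order) a colour from matplotlib's default cycle.
    palette = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    names = sorted(df["species"].unique())
    lookup = {name: palette[i % len(palette)] for i, name in enumerate(names)}
    # c= needs one colour per row, so the frame must contain no missing values.
    colours = df["species"].map(lookup).tolist()
    # Extra keyword arguments such as c= are forwarded to each off-diagonal scatter.
    axes = scatter_matrix(df[measurements], c=colours, figsize=(8, 8))
    fig = axes[0, 0].get_figure()
    fig.savefig(filename)
    plt.close(fig)
```

For example, save_scatter(df, "sepal_width", "sepal_width", "sw_vs_sw.png") would correspond to the second graphical sample interaction, and save_scatter_matrix(df, "iris_all.png") to the "all" case. Closing each figure after saving keeps repeated menu selections from accumulating open figures.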