202130_ITEC102_Assessment3 ASSESSMENT GUIDE Unit: ITEC102 Python fundamentals for data science, Semester 1, 2021 Assessment number (3) Assessment Artefact: Report and Python Code Weighting [40%] Why...

Please do this in Jupyter Notebook



202130_ITEC102_Assessment3

ASSESSMENT GUIDE
Unit: ITEC102 Python fundamentals for data science, Semester 1, 2021
Assessment number (3)
Assessment Artefact: Report and Python Code
Weighting [40%]

Why this assessment?
• The purpose is to assess students’ comprehensive Python data science skills and understanding from data processing to data visualisation on real-world datasets with consideration of data ethics.

What are the types of employability skills that I will acquire upon completion of this assessment?
Skill Type
Developed critical and analytical thinking ☒
Developed ability to solve complex problems ☒
Developed ability to work effectively with others ☐
Developed confidence to learn independently ☒
Developed written communication skills ☒
Developed spoken communication skills ☐
Developed knowledge in the field of study ☐
Developed work-related knowledge and skills ☒

Assessment Overview: Purpose, as written in the EUO
Due date: 12pm on Friday of Week 14, 11 June 2021
Weighting: 40%
Length and/or format: Individual. Runnable code, detailed comments and discussion in Jupyter notebook
Learning outcomes assessed: LO3, LO4
Graduate attributes assessed: GA3, GA4, GA5
How to submit: via LEO
Return of assignment: via LEO within 2 weeks of submission
Assessment criteria: Rubric: see end of document

Context
Data processing, analysis and visualisation assignment
In this assignment you will analyze the BRFSS weight vs height data (brfss.csv), which can be downloaded from the unit LEO website; use pandas to load it. The five columns in the data represent: age, current_weight (kg), weight_a_year_ago (kg), height (cm), and gender, where gender == 1 represents male and 2 represents female. In this assignment you will have the chance to do initial exploratory analysis and visualization of the data with the skills learned in this unit.
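The loading step described in the Context can be sketched as below. This is a minimal sketch, not the official solution: the raw column names (age, weight2, wtyrago, wtkg2, htm3, sex) are taken from the header printed in the answer's notebook further down, the mapping onto the guide's names is an assumption inferred from how that notebook uses them, and the inline five-row CSV is only a stand-in for the real brfss.csv.

```python
import io

import pandas as pd

# Stand-in for brfss.csv: the header and the five rows are the ones printed by
# dat.head() in the answer's notebook. With the real file, pass "brfss.csv"
# to pd.read_csv instead of the StringIO object.
SAMPLE_CSV = """,age,weight2,wtyrago,wtkg2,htm3,sex
0,39.0,88.636364,88.636364,88.64,180.0,1
1,64.0,75.000000,84.545455,75.00,155.0,2
2,87.0,61.818182,63.636364,61.82,,2
3,51.0,100.000000,100.000000,100.00,183.0,1
4,35.0,63.636364,61.363636,63.64,170.0,2
"""

# ASSUMPTION: this mapping from raw column names to the names used in the
# guide is inferred from the answer's notebook, not stated by the guide.
RENAME = {
    "weight2": "current_weight",
    "wtyrago": "weight_a_year_ago",
    "htm3": "height",
    "sex": "gender",
}

# The first, unnamed CSV column is a row counter, so use it as the index.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col=0)
df = df.rename(columns=RENAME)

# The guide states that gender == 1 is male and gender == 2 is female.
df["gender_label"] = df["gender"].map({1: "male", 2: "female"})

print(df[["age", "current_weight", "weight_a_year_ago", "height", "gender_label"]])
print("rows with a missing height:", int(df["height"].isna().sum()))
```

Renaming once at load time keeps the later task code readable against the guide's wording; the empty height in the third row shows up as NaN, which is why the answer's notebook later drops missing values.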
Instructions
Attempt the tasks below with the given dataset and, at the same time, reflect on the development and applications of data science while ensuring the respect of human rights and of the values shaping open, pluralistic and tolerant information societies.

Task 1 (15 marks): Produce a summary statistics graph on current_weight, weight_a_year_ago, and height. [Hint: similar to figure 1 below]

Figure 1: An example of summary statistics graph (y-axes: weight, height; x-axis categories: Current_weight, Weight_a_year_ago, Height)

Task 2 (15 marks): Calculate correlation
Define weight_change = (current_weight - weight_a_year_ago). Calculate the correlation between weight_change and the following variables, and determine which one is most correlated (regardless of sign of correlation) with weight_change. Use scatter plots to support your conclusion.
i. current_weight
ii. weight_a_year_ago
iii. age
[Hint: One scatter plot for each variable.]

Task 3 (10 marks): Use t-test to check significant difference
3.1 Use a t-test to test whether there is a significant difference between the weight_change of male and female.
3.2 Randomly split the subjects into two groups of roughly equal sizes, and use a t-test to test whether there is a significant difference between the weight_change of the two groups.
[Hint: use t-test here https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html ]

Task 4 (5 bonus marks): Propose and perform your own analysis that utilizes different skills taught in class (or reveals additional interesting insight from this dataset).

Structure
Prepare a Jupyter Notebook for this project. The structure of the Jupyter Notebook should alternate text and Python code and cover the topics listed in the specific tasks above. A template can be found in any week’s workshop resources in LEO.
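The two t-tests in Task 3 can be sketched as follows. This is a minimal sketch on synthetic data, not the assignment's results: the DataFrame, its values, the random seed and the 0.05 threshold are all assumptions made for illustration, and `equal_var=False` (Welch's variant of the test) is one possible choice rather than something the guide requires.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# SYNTHETIC stand-in for the BRFSS data (not real results): 200 rows with
# gender == 1 and 200 rows with gender == 2, whose weight_change means are
# deliberately set one standard deviation apart.
n = 200
df = pd.DataFrame({
    "gender": np.repeat([1, 2], n),
    "weight_change": np.concatenate([
        rng.normal(loc=0.0, scale=1.0, size=n),
        rng.normal(loc=1.0, scale=1.0, size=n),
    ]),
})

# Task 3.1: compare weight_change between the two gender groups.
male = df.loc[df["gender"] == 1, "weight_change"].dropna()
female = df.loc[df["gender"] == 2, "weight_change"].dropna()
res_gender = stats.ttest_ind(male, female, equal_var=False)

# Task 3.2: shuffle the row positions and cut them in half, which gives two
# random groups whose sizes differ by at most one.
order = rng.permutation(len(df))
half = len(df) // 2
group_a = df["weight_change"].iloc[order[:half]].dropna()
group_b = df["weight_change"].iloc[order[half:]].dropna()
res_random = stats.ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05  # assumed significance level
for name, res in [("male vs female", res_gender), ("random split", res_random)]:
    verdict = "significant" if res.pvalue < alpha else "not significant"
    print(f"{name}: t = {res.statistic:.3f}, p = {res.pvalue:.4g} ({verdict})")
```

On the real data, `df` would be the loaded BRFSS frame with a weight_change column. A random split is expected to show no systematic difference, so it serves as a sanity check against the gender comparison.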
Naming the submission: start with your student ID and name, followed by the unit name and code, i.e., studentID_studentNames_ITEC102_AT3.ipynb, e.g., S00258769_Alice_Zuk_ITEC102_AT3.ipynb

How do I submit?
Submit the Jupyter Notebook (.ipynb) to Assessment 3 via the LEO assessment tile.
Note that: The code will be compared to other students’ submissions in Turnitin to make sure the submission satisfies academic integrity.

Submission checklist
I have formatted my jupyter notebook as per the specifications ☐
I have checked my Turnitin report and taken appropriate actions to ensure that the submission satisfies academic integrity ☐
I have actioned feedback advice provided to me from labs and assessment 2 (if applicable) ☐
I have submitted my work before the due date/time ☐
I have submitted feed forward template along with my assignment submission ☐

Feed Forward Template (example)
A template for students to use and act on feedback and provide recommendations for improvement.
Note: This is a task for any instance of follow-on assignment (assessment 2 and 3). This must be submitted as the first page of the follow-on assignment (assessment 2 and 3) to ensure you acted on the feedback provided to you in the previous assignment (this is not counted as part of the assessment word count).

How did you act on the feedback?
Feedback is an important component of learning. Please consider the feedback you received in your last assignment and provide a response on how you acted on, or intend to act upon, that feedback, and how it has informed the current assignment task. Submit this sheet along with your assignment.

Questions | Your learning from the previous assignment feedback
• How have you acted on the feedback from previous assignment to improve your work in this assignment? (e.g. based on my previous feedback, I made sure that I supported my discussion, position, ideas, concepts with peer reviewed journal references in this assignment)
• What is your expectation around the type of feedback that enhances your learning? (e.g. I want to know where I made a mistake and how I can correct them and not make the same mistake again i.e. I want specific feedback that will help me to improve my learning and performance in the next assignment)
• Did you have any difficulty understanding or acting on previous feedback? Please be as specific as possible so that you can gain further feedback/clarify anything you do not understand in the feedback (e.g. feedback provided in my previous assignment was very generic I did not know how to improve my work. So, I would like the teacher to explain more on xxxx aspects of the feedback or I would like an opportunity to have a dialogue to understand the feedback)

Some Helpful Websites and Resources
LEO listed contents
Anaconda environment: https://docs.anaconda.com/anaconda/
Python official website: https://www.python.org/
Useful python packages: https://numpy.org/ https://pandas.pydata.org/ https://matplotlib.org/

Who can help me?
Academic skills Unit (ASU)
Places
Lecturer Maoying Qiao (via LEO messages)

I’m having problems
Application for Extension (EX) of Time for submission of an Assessment Task: The EX form should be completed by ACU students applying for an extension of time for submission of an assessment task. The completed and signed form must be submitted to the relevant National Lecturer-in-Charge prior to the due date of the assessment task. It must be accompanied by supporting documentary evidence such as EIP, doctor’s certificate or equivalent, death certificate, or a statutory declaration.
https://www.acu.edu.au/-/media/feature/pagecontent/richtext/study-at-acu/_forms/form---ex-application-for-extension-of-time-for-submission-of-an-assessment-task.pdf?la=en&hash=F0B8D12030C4F16ED06803E255FFAB0A
Special Consideration: This form is used by students to apply for Special Consideration for assessable work in studies at Australian Catholic University. Approval of such applications will only be granted to students who are legitimately disadvantaged in their assessment due to exceptional and unforeseen circumstances beyond their control.
https://units.acu.edu.au/__data/assets/word_doc/0006/620655/SC_Application_for_Special_Consideration_20180214.docx

Referencing
All referencing should be in ACU Harvard style; however if you are coming from another faculty, you may choose to use your usual referencing style. If this is the case you must indicate at the top of your reference list what referencing style you are using (e.g. APA, MLA, Chicago, etc). Please ensure your assignment makes use of in-text citations and a reference list. Missing citations or references is equivalent to plagiarism.
https://libguides.acu.edu.au/referencing/harvard

Criteria
The full criteria are compiled in a rubric, which can be found on the following page/s.

Rubric for Assessment 3
Columns: Relevant LO/GAs; Criterion (related to a single GA from the related LO – one GA per criterion); Does not meet expectations; Meets expectations; Exceeds expectations
Grade bands: NN (0-49%), PA (50-64%), CR (65-74%), DI (75-84%), HD (85-100%)

Relevant LO/GAs: GA5, LO3 and LO4. Weight=25 marks, TL=3, Learning stage = I and D
Criterion: Demonstrate correct understanding of the concepts of data processing, analysis and visualisation
NN: Fail to adequately demonstrate correct understanding of the concepts of data processing, analysis and visualisation, i.e., none of the above tasks are addressed and no figures are produced. (0 – 12.25)
PA: Adequately demonstrate correct understanding of the concepts of data processing, analysis and visualisation, i.e., at least one task is addressed and one figure is produced with reasonable quality. (12.5 – 16.0)
CR: Credibly demonstrate correct understanding of the concepts of data processing, analysis and visualisation, i.e., at least two tasks are addressed and one figure is produced with desired quality. (16.25 – 18.5)
DI: Distinctively demonstrate correct understanding of the concepts of data processing, analysis and visualisation, i.e., most of the tasks are addressed and the figures are produced with desired quality. (18.75 – 21.0)
HD: Highly distinctively demonstrate correct understanding of the concepts of data processing, analysis and visualisation, i.e., all tasks are addressed with figures of desired quality. (21.25 – 25)

Relevant LO/GAs: GA4, LO3. Weight=10 marks, TL=3, Learning stage = I and D
Criterion: Demonstrate critical and reflective thinking skills by observing and summarizing output of codes and figures
NN: Fail to adequately demonstrate critical and reflective thinking skills by observing and summarizing output of codes and figures, i.e., no summary and conclusion are drawn around the output (0 – 0.49)
PA: Adequately demonstrate critical and reflective thinking skills by observing
Answered 5 days after May 27, 2021 (ITEC102)


Saravana answered on Jun 01 2021
{
"cells": [
{
"cell_type": "markdown",
"id": "38a80830",
"metadata": {},
"source": [
"## Task 1 (15 marks): Produce a summary statistics graph on current_weight, weight_a_year_ago, and height.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "853fd218",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Unnamed: 0 age weight2 wtyrago wtkg2 htm3 sex\n",
"0 0 39.0 88.636364 88.636364 88.64 180.0 1\n",
"1 1 64.0 75.000000 84.545455 75.00 155.0 2\n",
"2 2 87.0 61.818182 63.636364 61.82 NaN 2\n",
"3 3 51.0 100.000000 100.000000 100.00 183.0 1\n",
"4 4 35.0 63.636364 61.363636 63.64 170.0 2\n",
"(414509, 7)\n",
"count 398484.000000\n",
"mean 78.992337\n",
"std 19.546212\n",
"min 20.000000\n",
"25% 64.545455\n",
"50% 77.272727\n",
"75% 90.909091\n",
"max 309.090909\n",
"Name: weight2, dtype: float64\n",
"count 390399.000000\n",
"mean 79.721319\n",
"std 20.565164\n",
"min 22.727273\n",
"25% 64.545455\n",
"50% 77.272727\n",
"75% 90.909091\n",
"max 342.272727\n",
"Name: wtyrago, dtype: float64\n",
"count 409129.000000\n",
"mean 168.825190\n",
"std 10.352653\n",
"min 61.000000\n",
"25% 160.000000\n",
"50% 168.000000\n",
"75% 175.000000\n",
"max 236.000000\n",
"Name: htm3, dtype: float64\n"
]
}
],
"source": [
"import os\n",
"# set workin gdirectory to the folder containing BRFSS.csv file\n",
"os.chdir(\"/media/priyan/Files/GreyNodes/Assignment20\")\n",
"\n",
"# Use pandas to read CSV file\n",
"import pandas as pd\n",
"dat = pd.read_csv('brfss.csv') \n",
"\n",
"# print 5 rows of the dataset to get a peek in to the data\n",
"print(dat.head(n= 5))\n",
"\n",
"# verify the number of rows and column of the dataset\n",
"print(dat.shape)\n",
"\n",
"# Generate and print the summary statistics of variable 'weight' in the dataset\n",
"wgt_ss = dat['weight2'].describe(include='all')\n",
"print(wgt_ss)\n",
"\n",
"# Generate and print the summary statistics of variable 'last year weight' in the dataset\n",
"wgt_yr_ss = dat['wtyrago'].describe(include='all')\n",
"print(wgt_yr_ss)\n",
"\n",
"# Generate and print the summary statistics of variable 'height' in the dataset\n",
"hgt_ss = dat['htm3'].describe(include = 'all')\n",
"print(hgt_ss)"
]
},
{
"cell_type": "markdown",
"id": "939a48ef",
"metadata": {},
"source": [
"### Creating plot for Task 1"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "318ca64f",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAn0AAAGtCAYAAABjpcJ/AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAA9hAAAPYQGoP6dpAABZp0lEQVR4nO3de1hUBf4/8PeZUUaQm4ojKDOC4AW0VBI1LwGl4GVNs8z1myaSFuFtNbfSwlAMNsty2229JmBbmb8iK7toKJh4QRTdTND1hrKKgq2KIDAyc35/jHPWEQZBZxjgvF/PM8/MuX/moPL2c26CKIoiiIiIiKhZU9i7ACIiIiKyPYY+IiIiIhlg6CMiIiKSAYY+IiIiIhlg6CMiIiKSAYY+IiIiIhlg6CMiIiKSgRb2LqCx0ul02L59O3x8fKBUKu1dDhEREdWBXq/HyZMnMXToUDg4OEjjVSoVVCqVHSuzP4Y+C7Zv344xY8bYuwwiIiKygrfeegtxcXH2LsOuGPos8PHxAQB899138PPzs28xREREVCenT5/GmDFjsG/fPgQGBkrj5d7lAxj6LDId0vXz80NAQICdqyEiIqL6cHNzg6urq73LaFR4IQcRERGRDDD0EREREckAQx8RERGRDPCcPiIiavb0ej1u3bpl7zLIShwcHKBQsG9VXwx9RETUbImiiEuXLuHatWv2LoWsSKFQwNfX1+w+fHRvDH1ERNRsmQKfWq2Gk5MTBEGwd0n0gAwGAy5evIjCwkJotVr+TOuBoY+IiJolvV4vBb527drZuxyyovbt2+PixYuoqqpCy5Yt7V1Ok8ED4kRE1CyZzuFzcnKycyVkbabDunq93s6VNC0MfURE1Kzx8F/zw5/p/WHoIyIiIpIBhj4iIiIiGWDoIyIiIpIBhj4iIqI6KCwE4uKM77YWGRkJQRAQHR1dbVpMTAwEQUBkZKTtC6FmhaGPiIioDgoLgSVLGib0AYBGo8GmTZtQXl4uj
\n",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"from textwrap import wrap\n",
"import numpy as np\n",
"plt.rcParams.update(plt.rcParamsDefault)\n",
"\n",
"fig, ax1 = plt.subplots()\n",
"\n",
"# wrap the x-axis label in the format as given in example\n",
"label1 = [ '\\n'.join(wrap(\"Current_weight\", 8)) ]\n",
"label2 = [ '\\n'.join(wrap(\"Weight_a_year_ago\", 9)) ]\n",
"\n",
"# plot the summary statistics of variable height\n",
"# the individual datapoints are provided with custom markers- colors and shape as in the example figure.\n",
"ax1.plot(label1, wgt_ss['mean'], 'b+') # Plot mean of variable weight with blue + marker\n",
"ax1.plot(label1, wgt_ss['std'], 'yo') # Plot standard deviation of variable weight with yellow circle\n",
"ax1.plot(label1, wgt_ss['mean'] + wgt_ss['std'], 'gv') # Plot mean + sd of variable weight with green lower triangle marker\n",
"ax1.plot(label1, wgt_ss['mean'] - wgt_ss['std'], 'g^') # Plot mean - sd of variable weight with green upper triangle marker\n",
"ax1.plot(label1, wgt_ss['50%'], 'bx') # Plot median of variable weight with blue x marker\n",
"ax1.plot(label1, wgt_ss['75%'], 'rv') # Plot 75% value of variable weight with red lower trianlge marker\n",
"ax1.plot(label1, wgt_ss['25%'], 'r^') # Plot 75% value of variable weight with red lower trianlge marker# Plot 75% value of variable height with red lower trianlge marker\n",
"ax1.plot(label1, wgt_ss['max'], 'kv') # Plot max value of variable weight with black lower triangle marker\n",
"ax1.plot(label1, wgt_ss['min'], 'k^') # Plot min of variable weight with black upper triangle marker\n",
"\n",
"ax1.plot(label2, wgt_yr_ss['mean'], 'b+') # Plot mean of variable last year weight with blue + marker\n",
"ax1.plot(label2, wgt_yr_ss['std'], 'yo') # Plot standard deviation of variable last year weight with yellow circle\n",
"ax1.plot(label2, wgt_yr_ss['mean'] + wgt_ss['std'], 'gv') # Plot mean + sd of variable last year weight with green lower triangle marker\n",
"ax1.plot(label2, wgt_yr_ss['mean'] - wgt_ss['std'], 'g^') # Plot mean - sd of variable last year weight with green upper triangle marker\n",
"ax1.plot(label2, wgt_yr_ss['50%'], 'bx') # Plot median of variable last year weight with blue x marker\n",
"ax1.plot(label2, wgt_yr_ss['75%'], 'rv') # Plot 75% value of variable last year weight with red lower trianlge marker\n",
"ax1.plot(label2, wgt_yr_ss['25%'], 'r^') # Plot 25% value of variable last year weight with red upper triangle marker\n",
"ax1.plot(label2, wgt_yr_ss['max'], 'kv') # Plot max value of variable last year weight with black lower triangle marker\n",
"ax1.plot(label2, wgt_yr_ss['min'], 'k^') # Plot min of variable last year weight with black upper triangle marker\n",
"\n",
"\n",
"ax1.set_ylabel('weight', fontweight='bold',rotation=0,labelpad=25) # X axis label is bolded and made horizontal\n",
"\n",
"ax2 = ax1.twiny() # create a twin Y axis\n",
" \n",
"ax2.plot(\"Height\", hgt_ss['mean'], 'b+', label = 'Mean') # Plot mean of variable height with blue + marker\n",
"ax2.plot(\"Height\", hgt_ss['std'], 'yo', label = 'STD') # Plot standard deviation of variable height with yellow circle\n",
"ax2.plot(\"Height\", hgt_ss['mean'] + wgt_ss['std'], 'gv', label = 'Mean+SD') # Plot mean + sd of variable height with green lower triangle marker\n",
"ax2.plot(\"Height\", hgt_ss['mean'] - wgt_ss['std'], 'g^', label = 'Mean-SD') # Plot mean - sd of variable height with green upper triangle marker\n",
"ax2.plot(\"Height\", hgt_ss['50%'], 'bx', label = 'Median' ) # Plot median of variable height with blue x marker\n",
"ax2.plot(\"Height\", hgt_ss['75%'], 'rv', label = '75%') # Plot 75% value of variable height with red lower trianlge marker\n",
"ax2.plot(\"Height\", hgt_ss['25%'], 'r^', label = '25%') # Plot 25% value of variable height with red lower trianlge marker\n",
"ax2.plot(\"Height\", hgt_ss['max'], 'kv', label = 'Max') # Plot max value of variable height with black lower triangle marker\n",
"ax2.plot(\"Height\", hgt_ss['min'], 'k^', label = 'Min') # Plot min of variable height with black upper triangle marker\n",
"# necessary labels are added for legend\n",
"\n",
"ax2.xaxis.tick_bottom() #X axis label of second y axis is formaated\n",
"ax2.legend() # legend information added \n",
"\n",
"\n",
"ax1.axis([-1, 5, 0, 375]) # X axis location and Y axis limits are defined for weight axis plot\n",
"ax2.axis([-5, 5, 0, 375]) # X axis location and Y axis limits are defined for second 'height' axis plot\n",
"\n",
"rng = np.arange(0, 375, 75) # y ticks range defined\n",
"ax1.set_yticks(rng) # Y ticks set for weight axis plots\n",
"\n",
"\n",
"\n",
"ax2 = ax1.twinx() # second Y axis ticks and values generated\n",
"ax2.set_ylabel('height', fontweight='bold',rotation=0,labelpad=25) # second Y axis label generated and rotated\n",
"ax2.set_yticks(rng)\n",
"\n",
"plt.setp( ax1.get_yticklabels(), visible=False) # Y axis tick values removed only ticks are visible\n",
"plt.setp( ax2.get_yticklabels(), visible=False) # Y axis tick values removed only ticks are visible for second y axis\n",
"\n",
"\n",
"plt.savefig(\"test.svg\") # plot is saved as 'test.svg' in the working directory set above\n",
"\n",
"plt.show() # generated plot is displayed\n"
]
},
{
"cell_type": "markdown",
"id": "7743956b",
"metadata": {},
"source": [
"## Task 2 (15 marks): Calculate correlation"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "ee789df2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Unnamed: 0 age weight2 wtyrago wtkg2 htm3 sex weight_change\n",
"0 0 39.0 88.636364 88.636364 88.64 180.0 1 0.000000\n",
"1 1 64.0 75.000000 84.545455 75.00 155.0 2 -9.545455\n",
"3 3 51.0 100.000000 100.000000 100.00 183.0 1 0.000000\n",
"4 4 35.0 63.636364 61.363636 63.64 170.0 2 2.272727\n",
"5 5 62.0 70.454545 70.454545 70.45 173.0 2 0.000000\n",
"Correlation between weight change and current weight: 0.03413217522408281\n",
"Correlation between weight change and weight a year ago: -0.31911696303346493\n",
"Correlation between weight change and age: -0.06867582903526576\n",
"Correlation between weight change and weight a year ago: -0.31911696303346493 is the most correlated\n"
]
}
],
"source": [
"# weight change variable is calculated\n",
"\n",
"dat['weight_change'] = dat['weight2'] - dat['wtyrago'] # new weight change variable column is computed\n",
"\n",
"print(dat.head(n= 5)) # new dataframe with added column for 'weight change'\n",
"\n",
"dat = dat.dropna()\n",
"\n",
"# correlation between weight_change and current weight\n",
"corr1 = dat['weight2'].corr(dat['weight_change'])\n",
"print('Correlation between weight change and current weight: ' + str(corr1))\n",
"\n",
"# correlation between weight_change and weight_a_year_ago\n",
"corr2 = dat['wtyrago'].corr(dat['weight_change'])\n",
"print('Correlation between weight change and weight a year ago: ' + str(corr2))\n",
"\n",
"# correlation between weight_change and age\n",
"corr3 = dat['age'].corr(dat['weight_change'])\n",
"print('Correlation between weight change and age: ' + str(corr3))\n",
"\n",
"print('Correlation between weight change and weight a year ago: ' + str(corr2) + ' is the most correlated')\n"
]
},
{
"cell_type": "markdown",
"id": "0f2e1075",
"metadata": {},
"source": [
"#### Plot the correlation relation using scatterplot"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "c4ca4fc1",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.\n",
"The PostScript backend does not support transparency; partially transparent artists will be rendered opaque.\n"
]
},
{
"data": {
"image/png":...