The full assignment is in the attached file.


final project.pdf

ITEC 2270 Fall 2020 Final Project

This assignment is worth 100 points (20% of the course grade) and must be completed and turned in before 11:59 on November 30.

Assignment Overview

This assignment will give you more experience with functions and dictionaries. You will practice them by processing a file from a real-life dataset. In general, any time you find yourself copying and pasting your code, you should probably place the copied code into a separate function and then call that function.

Problem Statement

Given a data file of 507 individuals and their physical attributes (weight, height, etc., from the body dataset at http://www.amstat.org/publications/jse/datasets/), create two linear regression models and compute the correlation for each:
• between a person’s BMI and their age.
• between a person’s weight and a combination of physical attributes. The authors propose the following formula:
  o -110 + 1.34(ChestDiameter) + 1.54(ChestDepth) + 1.20(BitrochantericDiameter) + 1.11(WristGirth) + 1.15(AnkleGirth) + 0.177(Height)

Background

BMI, short for Body Mass Index, is a measure based on a person’s weight and height. It is used as an estimator of healthy body weight (see http://en.wikipedia.org/wiki/Body_mass_index).

Linear regression is a form of regression analysis in which the relationship between one or more independent variables and another variable, called the dependent variable, is modeled by a least squares function, called a linear regression equation. A linear regression equation with one independent variable represents a straight line when the predicted value (i.e. the dependent variable from the regression equation) is plotted against the independent variable: this is called a simple linear regression. For example, suppose that a straight line is to be fit to the points (xi, yi), where i = 1, …, n; y is called the dependent variable and x is called the independent variable, and we want to predict y from x.
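The two quantities described above (BMI and the authors' weight formula) could each be wrapped in a small function. The following is only a sketch under stated assumptions: the function and parameter names are hypothetical, BMI is taken as the standard weight in kilograms divided by the square of height in metres, and the data is assumed to record weight in kilograms and height in centimetres (confirm the units in ‘body.txt’ before relying on this).

```python
def bmi(weight_kg, height_cm):
    """Body Mass Index: weight (kg) divided by height (m) squared.

    Assumption: height is recorded in centimetres, so it is converted
    to metres first. Check body.txt for the actual units.
    """
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def predicted_weight(chest_diameter, chest_depth, bitrochanteric_diameter,
                     wrist_girth, ankle_girth, height):
    """Weight predicted by the formula quoted in the problem statement.

    The coefficients are copied from the assignment text; the parameter
    names are illustrative only.
    """
    return (-110
            + 1.34 * chest_diameter
            + 1.54 * chest_depth
            + 1.20 * bitrochanteric_diameter
            + 1.11 * wrist_girth
            + 1.15 * ankle_girth
            + 0.177 * height)


# Illustrative calls with made-up but plausible inputs.
example_bmi = bmi(70.0, 200.0)
example_weight = predicted_weight(28.0, 17.7, 31.5, 16.5, 23.5, 174.0)
```

Keeping the unit conversion inside `bmi` means the caller can pass the values straight from the data file, which is one way to avoid getting the units wrong.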
Least Squares and Correlation

The method we are going to use is called the least squares method. It takes a list of x values and a list of y values (the same number of each) and calculates the slope and intercept of the line that best fits those values. See http://easycalculation.com/statistics/learn-regression.php for an example. To calculate the least squares line, we need to calculate the following values from the data:
• sumX and sumY: the sum of all the X values and the sum of all the Y values
• sumXY: the sum of the products of each corresponding X,Y pair
• sumXSquared and sumYSquared: the sum of the square of every X value and the sum of the square of every Y value
• N: the number of pairs

The calculation then is:
• slope = (N*sumXY - (sumX*sumY)) / (N*sumXSquared - (sumX)²)
• intercept = (sumY - (slope*sumX)) / N

We will also calculate the correlation coefficient, an indication of how “linear” the points are (how closely, in total, the points follow a line). That calculation is:
• corr = (N*sumXY - (sumX*sumY)) / sqrt((N*sumXSquared - (sumX)²) * (N*sumYSquared - (sumY)²))

The correlation value ranges between -1 and 1. A negative value means an inverse correlation, a positive value a positive correlation. Values near -1 or 1 are “good” (strong) correlations; values near 0 are “bad” (weak) correlations. See http://easycalculation.com/statistics/learn-correlation.php

Project Description
• Gather the data from the provided file ‘bodydat.txt’. The file ‘body.txt’ describes the data.
• For the BMI calculation, Age will be the x values and BMI the y values. The BMI calculation must be done with a function.
  o BMI is not a value found in the data. You will have to calculate it using the data.
  o Get the units right when you calculate the BMI!
• For the formula (body weight vs. physical attributes), Weight will be the x values and the formula results the y values. The calculation of the formula must be done with a function.
  o All units are correct as provided in the data for the formula.
• Calculate the slope and intercept of a linear regression line for those two measures. Print those two values for both measures. The calculation must be done with a function.
• Calculate the correlation between the x and y data for both measures. Print the correlation. The calculation must be done with a function.

Deliverables

FinalProjectMyName.py – your source code solution (remember to include your section, the date, project number and comments in your program).
1. Please be sure to use the specified file name, i.e. “FinalProjectMyName.py”
2. Electronically submit a copy of the file in D2L.

Notes and Hints:
• Don’t try to tackle this project all at once. Complete one function (or part of a function) and test it out.
• Test your least squares function on known data to make sure it works.
• You should test your functions before using them in the program. Create some small lists of known x and y values, for example [1,2,3,4,5] for both x and y. The slope and intercept of that should be obvious, as should the correlation. If you don’t get the required answers, fix the function before moving on. Create a small data file with only two or three entries and test that you can parse it correctly. Testing functions will make your life easier.
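The least squares and correlation formulas, together with the [1,2,3,4,5] check suggested in the hints, might be sketched as follows. This is one possible arrangement, not the required one: the function names are hypothetical, and storing the intermediate sums in a dictionary is simply a choice that matches the assignment's emphasis on dictionaries.

```python
import math


def regression_sums(xs, ys):
    """Collect the sums named in the handout into a dictionary."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and the same length")
    return {
        "N": len(xs),
        "sumX": sum(xs),
        "sumY": sum(ys),
        "sumXY": sum(x * y for x, y in zip(xs, ys)),
        "sumXSquared": sum(x * x for x in xs),
        "sumYSquared": sum(y * y for y in ys),
    }


def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line.

    Raises ZeroDivisionError if every x value is identical.
    """
    s = regression_sums(xs, ys)
    numerator = s["N"] * s["sumXY"] - s["sumX"] * s["sumY"]
    denominator = s["N"] * s["sumXSquared"] - s["sumX"] ** 2
    slope = numerator / denominator
    intercept = (s["sumY"] - slope * s["sumX"]) / s["N"]
    return slope, intercept


def correlation(xs, ys):
    """Return the correlation coefficient, a value between -1 and 1."""
    s = regression_sums(xs, ys)
    numerator = s["N"] * s["sumXY"] - s["sumX"] * s["sumY"]
    x_term = s["N"] * s["sumXSquared"] - s["sumX"] ** 2
    y_term = s["N"] * s["sumYSquared"] - s["sumY"] ** 2
    return numerator / math.sqrt(x_term * y_term)


# The known-data check from the hints: identical x and y lists.
known = [1, 2, 3, 4, 5]
slope, intercept = least_squares(known, known)
corr = correlation(known, known)
print(slope, intercept, corr)  # prints: 1.0 0.0 1.0
```

Reversing one of the lists is a quick second check: the slope and the correlation should both come out as -1.0.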
bodydat.txt

42.9 26.0 31.5 17.7 28.0 13.1 10.4 18.8 14.1 106.2 89.5 71.5 74.5 93.5 51.5 32.5 26.0 34.5 36.5 23.5 16.5 21.0 65.6 174.0 1
43.7 28.5 33.5 16.9 30.8 14.0 11.8 20.6 15.1 110.5 97.0 79.0 86.5 94.8 51.5 34.4 28.0 36.5 37.5 24.5 17.0 23.0 71.8 175.3 1
40.1 28.2 33.3 20.9 31.7 13.9 10.9 19.7 14.1 115.1 97.5 83.2 82.9 95.0 57.3 33.4 28.8 37.0 37.3 21.9 16.9 28.0 80.7 193.5 1
44.3 29.9 34.0 18.4 28.2 13.9 11.2 20.9 15.0 104.5 97.0 77.8 78.8 94.0 53.0 31.0 26.2 37.0 34.8 23.0 16.6 23.0 72.6 186.5 1
42.5 29.9 34.0 21.5 29.4 15.2 11.6 20.7 14.9 107.5 97.5 80.0 82.5 98.5 55.4 32.0 28.4 37.7 38.6 24.4 18.0 22.0 78.8 187.2 1
43.3 27.0 31.5 19.6 31.3 14.0 11.5 18.8 13.9 119.8 99.9 82.5 80.1 95.3 57.5 33.0 28.0 36.6 36.1 23.5 16.9 21.0 74.8 181.5 1
43.5 30.0 34.0 21.9 31.7 16.1 12.5 20.8 15.6 123.5 106.9 82.0 84.0 101.0 60.9 42.4 32.3 40.1 40.3 23.6 18.8 26.0 86.4 184.0 1
44.4 29.8 33.2 21.8 28.8 15.1 11.9 21.0 14.6 120.4 102.5 76.8 80.5 98.0 56.0 34.1 28.0 39.2 36.7 22.5 18.0 27.0 78.4 184.5 1
43.5 26.5 32.1 15.5 27.5 14.1 11.2 18.9 13.2 111.0 91.0 68.5 69.0 89.5 50.0 33.0 26.0 35.5 35.0 22.0 16.5 23.0 62.0 175.0 1
42.0 28.0 34.0 22.5 28.0 15.6 12.0 21.1 15.0 119.5 93.5 77.5 81.5 99.8 59.8 36.5 29.2 38.3 38.6 22.2 16.9 21.0 81.6 184.0 1
40.3 29.0 33.0 20.1 30.3 13.4 10.4 19.4 14.5 117.1 97.7 81.9 81.0 98.4 60.5 34.6 27.9 38.9 40.1 23.2 16.2 23.0 76.6 180.0 1
43.7 29.0 31.3 20.5 29.7 15.0 11.7 20.9 16.0 123.5 99.5 82.6 82.5 95.0 58.5 38.5 30.4 39.0 38.4 24.3 18.2 22.0 83.6 177.8 1
47.4 29.6 35.7 20.8 31.4 16.1 11.3 21.5 15.4 116.5 103.0 85.0 94.5 103.0 59.0 33.5 29.0 40.5 40.0 26.0 18.0 20.0 90.0 192.0 1
40.3 27.5 31.4 21.7 28.0 13.3 10.3 18.8 13.2 113.0 99.6 85.6 89.2 98.0 59.1 35.6 29.0 35.8 36.0 21.5 16.6 26.0 74.6 176.0 1
41.0 26.8 32.2 21.9 28.6 14.9 10.6 17.8 14.0 107.5 101.5 78.0 89.5 95.0 57.0 36.0 29.0 34.5 35.0 22.0 16.5 23.0 71.0 174.0 1
45.0 27.0 33.2 21.7 30.6 13.7 11.1 20.7 14.0 112.0 104.1 82.0 84.0 97.0 56.0 34.5 29.5 39.0 35.7 24.0 17.5 22.0 79.6 184.0 1
39.9 30.0 34.5 21.0 29.4 15.6 11.9 21.2 16.0 112.2 100.0 88
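Reading rows like the sample above could be sketched as follows, in the spirit of the hint about testing the parser on a small file first. Everything here is an assumption to be verified against ‘body.txt’: the column positions in `COLUMNS` are inferred from the usual layout of this dataset (25 whitespace-separated fields per row), and the names `COLUMNS`, `EXPECTED_FIELDS`, and `parse_rows` are hypothetical. The sketch reads from an in-memory sample so that it runs without the real file.

```python
import io

# ASSUMED 0-based column positions; confirm each one against body.txt.
COLUMNS = {
    "BitrochantericDiameter": 2,
    "ChestDepth": 3,
    "ChestDiameter": 4,
    "AnkleGirth": 19,
    "WristGirth": 20,
    "Age": 21,
    "Weight": 22,
    "Height": 23,
}
EXPECTED_FIELDS = 25  # assumption: every complete row has 25 values


def parse_rows(lines):
    """Turn whitespace-separated rows into a list of dictionaries.

    Blank or incomplete rows (fewer than EXPECTED_FIELDS values) are skipped.
    """
    people = []
    for line in lines:
        fields = line.split()
        if len(fields) != EXPECTED_FIELDS:
            continue
        values = [float(field) for field in fields]
        people.append({name: values[index] for name, index in COLUMNS.items()})
    return people


# A tiny stand-in for the data file: two complete rows and one cut-off row.
sample = io.StringIO(
    "42.9 26.0 31.5 17.7 28.0 13.1 10.4 18.8 14.1 106.2 89.5 71.5 74.5 "
    "93.5 51.5 32.5 26.0 34.5 36.5 23.5 16.5 21.0 65.6 174.0 1\n"
    "43.7 28.5 33.5 16.9 30.8 14.0 11.8 20.6 15.1 110.5 97.0 79.0 86.5 "
    "94.8 51.5 34.4 28.0 36.5 37.5 24.5 17.0 23.0 71.8 175.3 1\n"
    "39.9 30.0 34.5 21.0 29.4 15.6 11.9 21.2\n"
)
people = parse_rows(sample)
```

Because `parse_rows` accepts any iterable of lines, the same function should work on the real file by passing an open file object, for example `with open("bodydat.txt") as data_file: people = parse_rows(data_file)`.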
Nov 16, 2021