Project 1 CITS1401 Overview For the last few years, the United Nations Sustainable Development Solutions Network has been publishing the World Happiness Report. Details of the 2018 report can be found...

How do I write a program for this assignment? Keep in mind that I cannot import any modules (e.g. pandas, numpy) besides os.


Project 1 CITS1401

Overview

For the last few years, the United Nations Sustainable Development Solutions Network has been publishing the World Happiness Report (https://en.wikipedia.org/wiki/World_Happiness_Report). Details of the 2018 report can be found here (http://worldhappiness.report/ed/2018/). The underlying data, which you can also download from that URL, is a combination of data from specially commissioned surveys undertaken by the Gallup organisation, and statistical and economic data from other sources. The web site linked above also provides the methodology for how the different data have been combined to compute the final score, most dramatically called the Life Ladder.

Here is a sample:

country      Life Ladder  Log GDP per capita  Social support  Healthy life expectancy at birth  Freedom to make life choices  Generosity    Confidence in national government
Afghanistan  2.66171813   7.460143566         0.490880072     52.33952713                       0.427010864                   -0.106340349  0.261178523
Albania      4.639548302  9.373718262         0.637698293     69.05165863                       0.74961102                    -0.035140377  0.457737535
Algeria      5.248912334  9.540244102         0.806753874     65.69918823                       0.436670482                   -0.194670126  (missing)
Argentina    6.039330006  9.843519211         0.906699121     67.53870392                       0.831966162                   -0.186299905  0.305430293
Armenia      4.287736416  9.034710884         0.697924912     65.12568665                       0.613697052                   -0.132166177  0.246900991
Australia    7.25703764   10.71182728         0.949957848     72.78334045                       0.910550177                   0.301693261   0.45340696

The data shown above (and discussed below) can be found in the CSV formatted text file WHR2018Chapter2_reduced_sample.csv (https://lms.uwa.edu.au/bbcswebdav/pid-1477081-dt-content-rid-21364606_1/courses/CITS1401_SEM-2_2019/Project_1/WHR2018Chapter2_reduced_sample.csv).

The actual method used to compute the Life Ladder score is quite complicated, so the aim of this Project, in brief, is to test whether simpler methods can yield similar results. In particular, the Project aims to see whether any of a range of proposed methods yields a similar ranking, when countries are ranked by Life Ladder score in descending order, i.e. from happiest on these measures to least happy.
(The Wikipedia article also discusses criticisms of the World Happiness Report process; see https://en.wikipedia.org/wiki/World_Happiness_Report#Criticism)

Looking at the data sample above, you can see that the column headers occupy the first row, the countries are listed in the first column, while the Life Ladder scores that we are seeking to emulate are in the second column. The third and subsequent columns contain the data from which you will compute your own Life Ladder scores. However, for this exercise, please remember that the aim is not to replicate the precise Life Ladder scores, but rather to replicate the ranking of countries that results from the Life Ladder scores.

Eye-balling the Data

In Data Science projects, it is always a good idea to "eyeball" the data before you attempt to analyse it. The aim is to spot any trends ("this looks interesting") or any issues. So, looking at the sample above (ignoring the first two columns), what do you notice?

• There is a difference in scale across the columns. Healthy Life Expectancy at Birth ranges from 52.3 to 72.8, so in general is valued in the tens, while Social Support is a value in the range 0.0 to 1.0, and Generosity has both negative and positive floating point numbers. (The problem of GDP per Capita being actually valued in the thousands, or tens of thousands, has already been solved by the data collectors taking logs.) The issue is that you don't want a particular attribute to appear significant just because it has much larger values than other attributes.
• The other thing you may have noticed is that sometimes the data is simply missing, e.g. the score for Confidence in National Government for Algeria. Any metric we propose will have to deal with such missing data (which is actually a very common problem).
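Both observations can be checked with a few lines of code. This is a minimal sketch, assuming the six sample rows are already held as lists with None standing in for a missing field; the names HEADERS, ROWS and column_range are illustrative and not part of the specification.

```python
# Sample rows: country, Life Ladder, then the six indicator columns.
# None marks the missing Confidence value for Algeria.
HEADERS = ["Log GDP per capita", "Social support", "Healthy life expectancy",
           "Freedom to make life choices", "Generosity",
           "Confidence in national government"]
ROWS = [
    ["Afghanistan", 2.66171813, 7.460143566, 0.490880072, 52.33952713, 0.427010864, -0.106340349, 0.261178523],
    ["Albania", 4.639548302, 9.373718262, 0.637698293, 69.05165863, 0.74961102, -0.035140377, 0.457737535],
    ["Algeria", 5.248912334, 9.540244102, 0.806753874, 65.69918823, 0.436670482, -0.194670126, None],
    ["Argentina", 6.039330006, 9.843519211, 0.906699121, 67.53870392, 0.831966162, -0.186299905, 0.305430293],
    ["Armenia", 4.287736416, 9.034710884, 0.697924912, 65.12568665, 0.613697052, -0.132166177, 0.246900991],
    ["Australia", 7.25703764, 10.71182728, 0.949957848, 72.78334045, 0.910550177, 0.301693261, 0.45340696],
]


def column_range(rows, col):
    """Return (smallest, largest) for one column, skipping None entries."""
    values = [row[col] for row in rows if row[col] is not None]
    return min(values), max(values)


# The indicator columns start at index 2 (after country and Life Ladder).
for offset, name in enumerate(HEADERS):
    low, high = column_range(ROWS, offset + 2)
    missing = sum(1 for row in ROWS if row[offset + 2] is None)
    print(f"{name:35s} min={low:9.4f} max={high:9.4f} missing={missing}")
```

The printed ranges make the scale mismatch visible at a glance, which is the motivation for normalising each column before combining them.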
Specification: What your program will need to do

Input

Your program needs to call the Python function input three times to:
• get the name of the input data file
• get the name of the metric to be computed across the normalised data for each country. The allowed names are "min", "mean", "median" and "harmonic_mean".
• get the name of the action to be performed. The two options here are: "list", which lists the countries in descending order of the computed metric, or "correlation", which uses Spearman's rank correlation coefficient to compute the correlation between ranks according to the computed metric and the ranks according to the Life Ladder score.

The order of the 3 calls is clearly important.

Output

The output, printed to standard output, will be either a listing of the countries in descending order based on the computed metric, or a statement containing the correlation value (a number between -1.0 and 1.0).

Tasks: A more detailed specification

• Use input to read in 3 strings, representing the input file name, the metric to be applied to the data from the file (excluding the first two columns) and the action to be taken to report to the user.
• Read in the CSV formatted text file. That is, fields in each row are separated by commas, e.g.
    Albania, 4.639548302,9.373718262,0.637698293,69.05165863,0.74961102,-0.035140377,0.457737535
    Algeria, 5.248912334,9.540244102,0.806753874,65.69918823,0.436670482,-0.194670126,
  Apart from the first field, all the other fields are either numbers (so converted using float()) or empty, which can be translated to the Python object None. Each line will be transformed into a row, represented as a list, so you end up with a list of lists.
• For each column apart from the first two, compute the largest and smallest values in the column (ignoring any None values).
• Given the maximum and minimum values for each column, normalise all the values in the respective columns. That is, each value should be normalised by transforming it to a value between 0.0 and 1.0, where 0.0 corresponds to the smallest value, and 1.0 to the largest, with other values falling somewhere between 0.0 and 1.0. For example, the minimum Life Expectancy years in the small dataset is 52.33952713. This is transformed to 0.0. The maximum value is 72.78334045, which is transformed to 1.0. So, working proportionally, 69.05165863 is transformed to 0.81746645. In general, the transformation is (score - min)/(max - min), where max and min are the respective maximum and minimum scores for a given column, and will, of course, differ from column to column.
• For each row, across all the columns except the first two, compute the nominated metric using the normalised values (excluding None). "min" is the minimum value (on the basis that a nation's happiness is bounded by the thing the citizens are grumpiest about), while "mean" and "median" are, respectively, the arithmetic mean and the median value (discussed here: https://www.diffen.com/difference/Mean_vs_Median). The harmonic mean of a list of numbers is defined here (https://en.wikipedia.org/wiki/Harmonic_mean#Definition). For harmonic mean, apart from avoiding None values, you will also have to avoid any zeroes; the other metrics have no problem with 0. The output from this stage is a list of country,score pairs.
• The list of country,score pairs is either to be listed in order of descending score, or Spearman's rank correlation coefficient (https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient) should be computed between the country,score list that you have computed and the Life Ladder list, when sorted by descending score. You can assume there are no tied ranks, which means that the simpler form of the Spearman calculation can be used.
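The parsing, normalisation and metric steps above can be sketched without importing anything. This is one possible arrangement under stated assumptions, not the required design: the function names are illustrative, the header wording in SAMPLE_CSV is guessed from the sample table, no field contains a comma, and every column has at least two distinct non-missing values. A real program would read the lines from the named file instead of a string.

```python
# Sample data as CSV text (header wording is an assumption).
SAMPLE_CSV = """country,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Confidence in national government
Afghanistan,2.66171813,7.460143566,0.490880072,52.33952713,0.427010864,-0.106340349,0.261178523
Albania,4.639548302,9.373718262,0.637698293,69.05165863,0.74961102,-0.035140377,0.457737535
Algeria,5.248912334,9.540244102,0.806753874,65.69918823,0.436670482,-0.194670126,
Argentina,6.039330006,9.843519211,0.906699121,67.53870392,0.831966162,-0.186299905,0.305430293
Armenia,4.287736416,9.034710884,0.697924912,65.12568665,0.613697052,-0.132166177,0.246900991
Australia,7.25703764,10.71182728,0.949957848,72.78334045,0.910550177,0.301693261,0.45340696
"""


def parse_rows(lines):
    """Skip the header line; turn each data line into a list ('' becomes None)."""
    rows = []
    for line in lines[1:]:
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split(",")]
        rows.append([fields[0]] + [float(f) if f else None for f in fields[1:]])
    return rows


def normalise(rows):
    """Return copies of the rows with columns 2 onwards rescaled to 0.0-1.0."""
    result = [row[:] for row in rows]
    for col in range(2, len(rows[0])):
        values = [row[col] for row in rows if row[col] is not None]
        low, high = min(values), max(values)
        for row in result:
            if row[col] is not None:
                row[col] = (row[col] - low) / (high - low)
    return result


def metric_mean(values):
    return sum(values) / len(values)


def metric_median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def metric_harmonic_mean(values):
    # Zeroes are dropped as well as None, as the specification requires.
    nonzero = [v for v in values if v != 0]
    return len(nonzero) / sum(1 / v for v in nonzero)


METRICS = {"min": min, "mean": metric_mean, "median": metric_median,
           "harmonic_mean": metric_harmonic_mean}


def score_rows(rows, metric_name):
    """Return (country, score) pairs computed from the normalised columns."""
    metric = METRICS[metric_name]
    return [(row[0], metric([v for v in row[2:] if v is not None]))
            for row in normalise(rows)]


rows = parse_rows(SAMPLE_CSV.splitlines())
for country, score in score_rows(rows, "harmonic_mean"):
    print(country, round(score, 4))
```

With the sample rows, the harmonic_mean scores printed here should agree with the values shown in the example session, which is a convenient check that the None and zero handling is right.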
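An example of how to compute Spearman's rank correlation can be found here (https://www.statisticshowto.datasciencecentral.com/spearman-rank-correlation-definition-calculate/).

Example

>>> happiness.main()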
 Enter name of file containing World Happiness computation data: WHR2018Chapter2_reduced_sample.csv 
 Choose metric to be tested from: min, mean, median, harmonic_mean mean 
 Chose action to be performed on the data using the specified metric. Options are list, correlation correlation 
 The correlation coefficient between the study ranking and the ranking using the mean metric is 0.8286 
 
 >>> happiness.main() 
 Enter name of file containing World Happiness computation data: WHR2018Chapter2_reduced_sample.csv 
 Choose metric to be tested from: min, mean, median, harmonic_mean harmonic_mean 
 Chose action to be performed on the data using the specified metric. Options are list, correlation list
 Ranked list of countries' happiness scores based the harmonic_mean metric 
 Australia 0.9965 
 Albania 0.5146 
 Armenia 0.3046 
 Afghanistan 0.0981 
 Argentina 0.0884 
 Algeria 0.0733 
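The two actions can be sketched as follows. Because there are no tied ranks, the simpler Spearman formula applies: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between a country's two ranks and n is the number of countries. The harmonic_mean scores below are copied from the listing above and the Life Ladder values from the sample table; the names ranks and spearman are illustrative.

```python
LIFE_LADDER = [("Afghanistan", 2.66171813), ("Albania", 4.639548302),
               ("Algeria", 5.248912334), ("Argentina", 6.039330006),
               ("Armenia", 4.287736416), ("Australia", 7.25703764)]
HARMONIC = [("Afghanistan", 0.0981), ("Albania", 0.5146), ("Algeria", 0.0733),
            ("Argentina", 0.0884), ("Armenia", 0.3046), ("Australia", 0.9965)]


def by_score(pair):
    """Sort key: the score half of a (country, score) pair."""
    return pair[1]


def ranks(pairs):
    """Map each country to its rank, where 1 is the highest score."""
    ordered = sorted(pairs, key=by_score, reverse=True)
    return {country: position
            for position, (country, _) in enumerate(ordered, start=1)}


def spearman(pairs_a, pairs_b):
    """Spearman's rank correlation for two score lists with no tied ranks."""
    rank_a, rank_b = ranks(pairs_a), ranks(pairs_b)
    n = len(rank_a)
    d_squared = sum((rank_a[c] - rank_b[c]) ** 2 for c in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# "list" action: countries in descending order of the computed metric.
for country, score in sorted(HARMONIC, key=by_score, reverse=True):
    print(f"{country} {score:.4f}")

# "correlation" action: compare against the Life Ladder ranking.
print(f"{spearman(HARMONIC, LIFE_LADDER):.4f}")
```

Keeping the ranks in a dictionary keyed by country means the two lists do not have to be in the same order before their ranks are compared.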
The complete table is in file WHR2018Chapter2_reduced.csv (https://lms.uwa.edu.au/bbcswebdav/pid-1477081-dt-content-rid-21364606_1/courses/CITS1401_SEM-2_2019/Project_1/WHR2018Chapter2_reduced.csv).

Important

You will have noticed that you have not been asked to write specific functions. That has been left to you. However, it is important that your program defines the top-level function main(). The idea is that within main() the program calls the other functions, as described above. (Of course, these may call further functions.) The reason this is important is that when I test your program, my testing program will call your main() function. So, if you fail to define main(), my program will not be able to test your program.

Assumptions

Your program can assume a number of things:
• Anything that is meant to be a string (i.e. a name) will be a string, and anything that is meant to be a number (i.e. a score for a country) will be a number.
• The order of columns in each row will follow the order of the headings, though data in particular columns may be missing in some rows.

That being said, there are a number of error conditions that your program should explicitly test for and respond to. One example is detecting whether the named input file exists; for example, the user may have mistyped the name. The way this test can be done is to first:

    import os

Then, assuming the file name is in variable input_filename, use the test:

    if not os.path.isfile(input_filename):
        return None

and test for None in the calling function (likely main()).

Things to avoid

There are a couple of things for your program to avoid.
• Please do not import any Python module, other than os. While use of many of these modules, e.g. csv or scipy, is a perfectly sensible thing
Sep 19, 2021 CITS1401