The description is attached as a file. The work can be done in either R or SAS, but I would prefer R. It also includes a 2,500-word report on the data analysis carried out as part of the assignment; please make it read like the work of a second-year university student. If you need the course syllabus, I will send it as soon as possible so that you can use it as a guide to the expected complexity and difficulty of the code. Thank you very much, and I hope you have a nice day. Only one file can be attached here, so I will send the necessary data and additional guidelines by email.
Referencing is not needed.


STAT7001 STATISTICS FOR PRACTICAL COMPUTING: ASSESSMENT 2 (2017/18 SESSION)

• Your solutions should be your own work and are to be submitted electronically to the course Moodle page by 12 noon on MONDAY, 23RD APRIL 2018.
• Ensure that you electronically ‘sign’ the plagiarism declaration on the Moodle page when submitting your work.
• Late submission will incur a penalty unless there are extenuating circumstances (e.g. medical) supported by appropriate documentation and notified within one week of the deadline above. Penalties, and the procedure in case of extenuating circumstances, are set out in the latest editions of the Statistical Science Department student handbooks which are available from the departmental web pages.
• Failure to submit this in-course assessment will mean that your overall examination mark is recorded as “non-complete”, i.e. you will not obtain a pass for the course.
• Submitted work that exceeds the specified word count will be penalized. The penalties are described in the detailed instructions below.
• Your solutions should be your own work. When uploading your scripts, you will be required to electronically sign a statement confirming this, and that you have read the Statistical Science department’s guidelines on plagiarism and collusion (see below).
• Any plagiarism or collusion can lead to serious penalties for all students involved, and may also mean that your overall examination mark is recorded as non-complete. Guidelines as to what constitutes plagiarism may be found in the departmental student handbooks: the relevant extract is provided on the ‘In-course assessment 2’ tab on the STAT7001 Moodle page. The Turn-It-In plagiarism detection system may be used to scan your submission for evidence of plagiarism and collusion.
• You will receive feedback on your work via Moodle, and you will receive a provisional grade. Grades are provisional until confirmed by the Statistics Examiners’ Meeting in June 2018.

Background and overview

In the European Union (EU), car manufacturers are required to limit the carbon dioxide (CO2) emissions from the vehicles that they sell. This is due to the fact that CO2 is the most significant of the greenhouse gases contributing to human-induced climate change, and road vehicles are responsible for a substantial proportion of total emissions.[1] Since 2009, each EU member state has been required to provide emissions information on each new car registered in its territory. A set of annual datasets, compiled from this information, is available from the web site of the European Environment Agency (https://www.eea.europa.eu/data-and-maps/data/co2-cars-emission-13#tab-european-data). These data are used by the European Commission to calculate the average emissions of CO2 from new passenger cars, and to set emissions targets for car manufacturers. Each manufacturer’s target is based on the total number of cars that they sell, thus allowing them to make a few high-emissions vehicles if they wish to do so, providing they offset this with a large number of low-emissions vehicles. For a useful summary of the regulation, see the UK Vehicle Certification Agency explanatory booklet (http://carfueldata.direct.gov.uk/additional/aug2017/VCA-Booklet-text-Aug-2017.pdf).

[1] See http://www.dft.gov.uk/vca/fcb/cars-and-carbon-dioxide.asp for example.

Car manufacturers are also required to limit emissions of other pollutants including nitrogen oxides (NOx), with the emissions of each new vehicle calculated on the basis of laboratory tests. In September 2015, the manufacturer Volkswagen (which owns other makes including Audi, Porsche, Seat and Škoda) was found to have installed software in its new diesel engines that automatically limited NOx emissions to unrealistically low levels in laboratory tests (see the relevant Wikipedia article, https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal, for more on this).
Other car manufacturers were subsequently discovered to have been doing the same thing. Since this came to light, there has been pressure on manufacturers to disable such ‘defeat devices’ (https://en.wikipedia.org/wiki/Defeat_device), although the extent to which this has been implemented is not yet clear.

On the ‘In-course assessment 2’ tab of the STAT7001 Moodle page, you will find a CSV file called EmissionsData.csv which contains a slightly modified subset of the EU CO2 emissions data since 2010. The file contains 21 156 records (i.e. rows of data): each record contains data from one EU member state, on an individual vehicle type, for a specified year. The years are numbered 1, 2 and 3: these are in chronological order, with years 1 and 2 predating the Volkswagen emissions scandal and year 3 postdating it (I’m not telling you exactly which years they represent!). The first 14 104 records contain information on CO2 emissions (in grams per kilometre) along with other information about the vehicle manufacturer, model, mass, engine size and power, and other potentially relevant features: full details can be found in the Appendix to these instructions. For the final 7 052 records, however, the CO2 emissions figures are not provided: they are given as ‘−1’.

Your task in this assessment is to carry out some data preprocessing and then to use the data from the first 14 104 records to build a statistical model that will help you to:
• Understand the variation in CO2 emissions between vehicles and over time; and
• Estimate the CO2 emissions for each of the 7 052 records where you don’t have this information.

Detailed instructions

You may use either R or SAS for this assessment.

1. Read the data into your chosen software package, and carry out any necessary recoding and preprocessing. Some examples of this include:
• For the CO2 variable, missing values are represented by −1.
• One of the variables, TechnolType, contains character codes for any emissions-reducing ‘innovative technologies’ on a vehicle if these are present; this is supplemented by variable ITReduction which gives, for any vehicle with such innovative technologies, the expected reduction in CO2 emissions (in grams per kilometre). For vehicles with no innovative technologies, the variable TechnolType is blank and ITReduction is recorded as −1. You need to think about a sensible way to handle these two variables, if you plan to use them.
• The variable FuelType gives the type of fuel for the vehicle. However, the data have been compiled from different sources and the fuel type has not been entered consistently. To see this, look at the following table (produced in R):

    > table(EmissionsData$FuelType)

          Biodiesel          diesel          Diesel          DIESEL
                  5             189            5985            4472
    Diesel-electric             E85        Electric        ELECTRIC
                  6              44              10               6
                LPG   NG-biomethane   NG-Biomethane   NG-BIOMETHANE
                229             125               1              33
             petrol          Petrol          PETROL Petrol-electric
                195            5665            4022              73
    Petrol-Electric PETROL-ELECTRIC
                 23              73

  You will need to figure out what to do about this (there are other similar examples in the data set).
• The data were originally compiled from reports submitted separately by each individual member state. It is possible, therefore, that different member states submitted reports for exactly the same car model in any given year: these will lead to identical records in the data unless the member states carry out their own emissions inspections, which could potentially lead to different values of the CO2 variable for the same car model. You should look into this, and consider how best to deal with any duplicated values.
Note, however, that no car model appears in more than one year in the data provided to you: each model appears only for the earliest year in which it appeared.

There may be other examples as well: you need to check all of the variables carefully and ensure that you understand them, before starting your analysis.[2]

[2] This preliminary ‘data screening’ is a vital part of any real-world statistical analysis. In case you think I should have done it for you: I just spent three full days cleaning up the original data downloaded from the EU web site, and I left only the easy bits for you to do!

2. Carry out an exploratory analysis that will help you to start building a sensible statistical model to understand and predict the CO2 emissions for each vehicle. This analysis should aim to identify an appropriate set of candidate variables to take into the subsequent modelling exercise, as well as to identify any important features of the data that may have some implications for the modelling. You will need to consider the context of the problem to guide your choice of exploratory analysis. See the ‘Hints’ below for some ideas.

3. Using your exploratory analysis as a starting point, develop a statistical model that enables you to predict the CO2 emissions for any vehicle based on (a subset of) its characteristics, and also to understand the variation of CO2 between vehicles. To be convincing, you will need to consider a range of models and to use an appropriate suite of diagnostics to assess them. Ultimately however, you are required to recommend a single model that is suitable for interpretation, and to justify your recommendation. Your chosen model should be either a linear model, a generalized linear model or a generalized additive model.

4. Use your chosen model to predict the CO2 emissions for each of the vehicles for which this information is missing, and also to estimate the standard deviation of your prediction errors.
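
As a rough illustration of the preprocessing ideas in step 1 and the prediction step in step 4, here is a minimal R sketch on a small made-up data frame. Only the column names CO2, FuelType, TechnolType and ITReduction come from the instructions; Model, Year and Mass are hypothetical names, all values are invented, and the choices shown (lower-casing fuel types, setting ITReduction to zero, dropping exact duplicates, a one-predictor linear model) are just one possible approach under those assumptions, not a recommended solution.

```r
# Toy stand-in for EmissionsData.csv. Model, Year and Mass are hypothetical
# column names; every value below is invented for illustration.
emis <- data.frame(
  Model       = c("A1", "A1", "B2", "C3", "D4", "E5"),
  Year        = c(1, 1, 1, 2, 3, 3),
  FuelType    = c("diesel", "DIESEL", "Petrol", "PETROL-ELECTRIC", "Diesel", "petrol"),
  Mass        = c(1400, 1400, 1100, 1500, 1600, 1200),
  TechnolType = c("", "", "e1", "", "", ""),
  ITReduction = c(-1, -1, 1.2, -1, -1, -1),
  CO2         = c(120, 120, 130, 90, 140, -1),
  stringsAsFactors = FALSE
)

# The code -1 means "not provided" for CO2, so recode it as NA.
emis$CO2[emis$CO2 == -1] <- NA

# Collapse spelling variants of FuelType by trimming and lower-casing.
emis$FuelType <- factor(tolower(trimws(emis$FuelType)))

# One possible treatment of the technology variables (an assumption):
# a yes/no flag, and a reduction of zero when no technology is present.
emis$HasIT <- emis$TechnolType != ""
emis$ITReduction[emis$ITReduction == -1] <- 0

# Drop records that are identical in every column after recoding.
emis <- emis[!duplicated(emis), ]

# Split into records with and without a CO2 value.
known   <- emis[!is.na(emis$CO2), ]
unknown <- emis[is.na(emis$CO2), ]

# Deliberately simple linear model, used only to show the mechanics.
fit  <- lm(CO2 ~ Mass, data = known)
pred <- predict(fit, newdata = unknown, se.fit = TRUE)

# For a linear model, the prediction-error variance is the variance of the
# fitted mean plus the residual variance.
pred_sd <- sqrt(pred$se.fit^2 + pred$residual.scale^2)
```

The last line applies to linear models only; for a generalized linear or generalized additive model the prediction-error standard deviation would need a different calculation. Records that differ only in their CO2 value are not removed by `duplicated()` here and would need a separate decision.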

Submission for this assessment is electronic, via the STAT7001 Moodle page. You are required to submit three files, as follows:

• A report on your analysis, not exceeding 2 500 words of text plus two pages of graphs and/or tables. The word count includes titles, footnotes, appendices, references etc.; in fact, it includes everything except the two pages of graphs/tables. Your report should be in four sections, as follows:
  I Describe, clearly but briefly, the decisions that you made about preprocessing the data before starting your analysis. You don’t need to go into details of exactly how you did it; just summarise what you did and why.
  II Describe briefly what aspects of the problem context you considered at the outset, how you used these to start your exploratory analysis, and what were the important points to emerge from this exploratory analysis.
  III Describe briefly (without too many technical details) what models you considered in step (3) above, and why you chose the model that you did.
  IV State your final model clearly, summarise what your model tells you about the characteristics associated with variation of CO2 emissions between vehicles and over time, and discuss any potential limitations of the model.
  Your report should not include any computer code. It should include some graphs and/or tables, but only those that support your main points. Your report should be in PDF (recommended) or Word, and should be named as ########_rpt.pdf or ########_rpt.docx as appropriate, where ######## is your student ID number. For example, if your ID number is 150123456 and you are using PDF, your report should be named 150123456_rpt.pdf.
• An R script or SAS program corresponding to your analysis and predictions. Your script/program should run without user intervention on any computer with R or SAS installed, providing the file EmissionsData
Apr 14, 2020