MATH 2032 – Statistics Using R Project: COVID-19 pandemic data analysis

With working code and Harvard style report



You will be investigating the COVID-19 pandemic across the two submission points for the Project. Both submissions will require you to use R to load a provided data set, prepare the data for analysis, answer a set of research questions, and prepare a formal report on your findings. You will use two separate data sets in the two parts, with the second being more challenging.

Assessment 2 - Project Part 2: Student instructions

REMEMBER: You will have to submit two files:
1. Written report as an MS Word or PDF document;
2. R-script used to run the data analysis.
Both files will be assessed and have equal weighting. That is, 50% of the mark comes from the written report and 50% from the R-script.

The data consist of two files, which can be downloaded from the course site:
• Project_Data1.csv (COVID-19 daily information about the number of new cases, deaths, tests, vaccinations, etc.; the same data set as in Part 1)
• Project_Data2.csv (general country information: population, life expectancy, GDP, etc.)

The data set was downloaded from the Our World in Data website [1] (ourworldindata.org). Please be aware that some entries in both data sets are not real countries, e.g., "Africa", "Asia", "High income". You will need to exclude them from the analysis. There are also many NA values in the data; you have to decide how to deal with missing data and report it as part of your analysis. Most of the variables are self-explanatory. Not all variables are required for the analysis in the Project.

[1] Hannah Ritchie, Edouard Mathieu, Lucas Rodés-Guirao, Cameron Appel, Charlie Giattino, Esteban Ortiz-Ospina, Joe Hasell, Bobbie Macdonald, Diana Beltekian and Max Roser (2020) - "Coronavirus Pandemic (COVID-19)". Published online at OurWorldInData.org. Retrieved from: 'https://ourworldindata.org/coronavirus' [Online Resource]

1. WRITTEN REPORT (50%)

In your final report, you must present the following sections and address the points beneath them. Any graphs or tables must be labelled appropriately with meaningful titles.

INTRODUCTION: (5%)
• Write an introduction for the research problem (COVID-19 pandemic in the world) and the data set. Describe the data set, the variables, and the analyses used. There is no need to discuss variables that are not used for the analysis.

ANALYSIS: (35%)
• To analyse the progress of the COVID-19 pandemic in terms of total cases in different countries, you need to extract data for the following countries: Australia, China, India, Sweden, Russia, United Kingdom, United States. Then use ggplot2 to create a graph similar to the graph presented below.
  o You may have seen similar data visualisations in the media. This graph was copied from https://thenewdaily.com.au/news/2020/03/31/australia-flattening-curve-coronavirus/.
  o Pay attention to details. Your data cover a much longer period, and you want to show the full history, so your graph will be somewhat different. However, the beginning of your graph will be almost identical to this example.
  o You don't have to add straight dashed lines, but you can do it as an extra challenge.
  o Provide a brief discussion of the progress of COVID-19 based on your data visualisation.
• Combine the COVID-19 data with the country statistics, provide numerical summaries, and plot a comparison graph of the distributions of new cases per million of population for the same countries as above (Australia, China, India, Sweden, Russia, United Kingdom, United States).
  o Are there any patterns, differences and/or similarities?
  o Provide a brief statement describing your findings.
• Study the possible relationship between the total number of deaths per million of population and the median age in the country. Use all countries in the data set.
  o Extract and transform the data, and create an appropriate data visualisation.
  o Run correlation and regression analysis.
  o Provide an interpretation and brief discussion of the results.
• Based on the variable "gdp_per_capita", create a new variable separating all countries into three categories: rich (top 25% of countries by GDP); average (above 50% but not in the top 25%); poor (below 50% of countries). What is the total number of vaccinations per million of population for each category?
  o Provide appropriate visual and numerical summaries.
  o Provide a brief statement describing your findings.

*NOTE: For the purpose of this assignment, "brief statement" means one paragraph of 4 to 8 lines.

SUMMARY/CONCLUSION: (5%)
• Based on the data and your analysis, provide a brief discussion summarising and interpreting the results of the analyses you performed, and draw a conclusion about the progress of COVID-19, case and death numbers, and vaccination rates.

PRESENTATION: (5%)
Your report needs to include the following components: title, section headings (and sub-headings where appropriate) and page numbers. Tables and graphs must have appropriate headings. The report should be suitable for a non-technical audience.

IMPORTANT: Do not include R code or screenshots of R output from RStudio in the Written Report. Your audience are not programmers, and they do not want to see the code. Your R code and its output will be reviewed separately in the R-script you submit with the report.

2. R-SCRIPT FILE (50%)

The second file you must submit for Assessment 2: Part 2 is the R-script file used to perform all analyses used in the report. As you prepare your final R-script file for submission, ensure you consider the following criteria:
• Use a clear programming style with comments and meaningful variable and function names.
• Use the correct code to support your answers for the requested analyses, including any manipulation of the data needed to conduct the analyses.
• Optimise your code.
  o It might be impossible to avoid loops altogether; however, you should aim to minimise the use of loops where possible and use vectorisation. Be sure to comment on and justify your use of loops where unavoidable.
• Your code should run successfully on other computers without any changes.
  o Don't use hard-coded paths to data files.
  o You can assume that any required package is already installed; you just need to load it before using it.
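The three GDP categories described in the analysis tasks can be built from quantiles without a loop. The following is a minimal sketch on made-up data, not the course's reference solution; the data frame `countries` and the column `wealth` are hypothetical names, and a country sitting exactly on the median is placed in "poor" here, which is one possible reading of the brief.

```r
# Toy data standing in for Project_Data2.csv; the GDP values are invented.
countries <- data.frame(
  location = c("A", "B", "C", "D", "E", "F", "G", "H"),
  gdp_per_capita = c(1200, 3500, 8000, 15000, 22000, 31000, 47000, NA),
  stringsAsFactors = FALSE
)

# Median and 75th percentile of GDP per capita, ignoring missing values.
gdp_breaks <- quantile(countries$gdp_per_capita,
                       probs = c(0.5, 0.75), na.rm = TRUE)

# cut() is vectorised: it assigns every country to an interval in one call.
# Intervals are closed on the right, so a value equal to the median is "poor".
countries$wealth <- cut(
  countries$gdp_per_capita,
  breaks = c(-Inf, gdp_breaks, Inf),
  labels = c("poor", "average", "rich")
)

# Countries with missing GDP stay NA and are counted separately.
table(countries$wealth, useNA = "ifany")
```

Keeping the missing-GDP countries as NA, rather than forcing them into a category, makes the handling of missing data explicit, which the brief asks to be reported.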
Answered 16 days after Feb 28, 2022


Namdeo Dnyandeo answered on Mar 11 2022
107 Votes
3/16/2022
library(dplyr)
library(ggplot2)
library(readr)
# Read the daily COVID-19 data; assumes the file is in the working directory (no hard-coded folder)
D_1 <- read_csv("Project_Data1.csv")
View(D_1)
dim(D_1)
## [1] 139839 7
D_2 = subset(D_1, location != "Africa" & location !="Asia" & location !="High income" )
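# Keep the seven countries needed for the plots; %in% tests membership, which == against a list does not
D_3 = subset(D_2, location %in% c("Australia", "China", "India", "Sweden", "Russia", "United Kingdom", "United States"))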
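The exclusion step above names only three aggregate rows, while the brief gives them as examples ("e.g."). A hedged sketch of a more complete filter follows; the vector `not_countries` is an assumed list of aggregate names and should be checked against the actual `location` values in the files, and the toy data frame `covid` is invented for illustration.

```r
# Toy data; the numbers are invented.
covid <- data.frame(
  location = c("Africa", "Australia", "High income", "Sweden", "World"),
  new_cases = c(100, 5, 300, 7, 900),
  stringsAsFactors = FALSE
)

# ASSUMPTION: names of aggregate rows that are not real countries.
# Verify against unique(covid$location) before relying on this list.
not_countries <- c("Africa", "Asia", "Europe", "European Union",
                   "North America", "South America", "Oceania",
                   "World", "International", "High income",
                   "Upper middle income", "Lower middle income", "Low income")

# Vectorised membership test; no loop over locations is needed.
countries_only <- subset(covid, !(location %in% not_countries))
countries_only$location
```

Comparing the `location` values of the two files, for example with `setdiff()`, is another way to spot rows that do not correspond to a country.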
View(D_3)
View(head(D_3))
# Note: stat_bin() bins the values of new_cases, so each line is a cumulative count of
# observations per bin for one country; it is not cumulative cases over time.
# One layer per country keeps cumsum() from running across countries.
ggplot(D_3,aes(x=new_cases,color =location )) +
stat_bin(data=subset(D_3,location=="Australia"),aes(y=cumsum(..count..)),geom="line")+
stat_bin(data=subset(D_3,location=="China"),aes(y=cumsum(..count..)),geom="line")+
stat_bin(data=subset(D_3,location=="India"),aes(y=cumsum(..count..)),geom="line")+
stat_bin(data=subset(D_3,location=="Sweden"),aes(y=cumsum(..count..)),geom="line")+
stat_bin(data=subset(D_3,location=="Russia"),aes(y=cumsum(..count..)),geom="line")+
stat_bin(data=subset(D_3,location=="United Kingdom"),aes(y=cumsum(..count..)),geom="line")+
stat_bin(data=subset(D_3,location=="United States"),aes(y=cumsum(..count..)),geom="line")
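The brief asks for the progress of total cases over the full history, which suggests a running total plotted against time rather than a binned count. The sketch below shows one way this might look, assuming the data contain a `date` column (that column name does not appear in the code above and is an assumption) and treating missing daily counts as zero. The log scale is a common choice for charts of this kind, not a requirement stated in the brief.

```r
library(ggplot2)

# Toy daily data; `date` is an ASSUMED column name and the counts are invented.
covid <- data.frame(
  location = rep(c("Australia", "Sweden"), each = 5),
  date = rep(seq(as.Date("2020-03-01"), by = "day", length.out = 5), times = 2),
  new_cases = c(10, 20, NA, 40, 80, 5, 15, 30, 60, 120),
  stringsAsFactors = FALSE
)

# Sort by country and date so the running total follows calendar order.
covid <- covid[order(covid$location, covid$date), ]

# Treat a missing daily count as zero new cases (an analysis decision to report).
covid$new_cases_filled <- ifelse(is.na(covid$new_cases), 0, covid$new_cases)

# ave() applies cumsum() within each country without an explicit loop.
covid$total_cases <- ave(covid$new_cases_filled, covid$location, FUN = cumsum)

# One layer handles every country because the grouping comes from `colour`.
p <- ggplot(covid, aes(x = date, y = total_cases, colour = location)) +
  geom_line() +
  scale_y_log10() +
  labs(title = "Total confirmed COVID-19 cases",
       x = "Date", y = "Total cases (log scale)", colour = "Country")
print(p)
```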
T_1 = aggregate(D_2$new_cases,by =list(D_2$location), sum ,na.rm =T )
T_2 = aggregate(D_2$new_deaths,by =list(D_2$location), sum ,na.rm =T )
T_3 = aggregate(D_2$new_tests,by =list(D_2$location), sum ,na.rm =T )
T_4 = aggregate(D_2$new_vaccinations,by =list(D_2$location), sum ,na.rm =T )
View(T_1)
D_4 = cbind(T_1,T_2[,2],T_3[,2],T_4[,2])
Renaming Column Names
names(D_4)[names(D_4)== 'Group.1' ]='location'
names(D_4)[names(D_4)== 'x' ]='Total_cases'
names(D_4)[names(D_4)== 'T_2[, 2]' ]='Total_deaths'
names(D_4)[names(D_4)== 'T_3[, 2]' ]='Total_tests'
names(D_4)[names(D_4)== 'T_4[, 2]' ]='Total_vaccinations'
View(D_4)
dim(D_4)
## [1] 234 5
head (D_4)
## location Total_cases Total_deaths Total_tests Total_vaccinations
## 1 Afghanistan 157648 7328 0 6874
## 2 Albania 203925 3140 737014 1358191
## 3 Algeria 213058 6151 0 170786
## 4 Andorra 19440 133 0 4802
## 5 Angola 65404 1737 0 0
## 6 Anguilla 0 0 0 1421
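The four `aggregate()` calls, the `cbind()` and the renaming can also be written as a single grouped summary with dplyr, which is already loaded at the top of the script. This is an alternative sketch on invented data, not what the answer above does; the toy data frame `covid` is hypothetical, while the column names match those used in the script.

```r
library(dplyr)

# Toy data with the same column names as the script; the values are invented.
covid <- data.frame(
  location = c("A", "A", "B", "B"),
  new_cases = c(1, NA, 3, 4),
  new_deaths = c(0, 1, NA, NA),
  new_tests = c(10, 20, NA, 5),
  new_vaccinations = c(NA, 2, 3, 4),
  stringsAsFactors = FALSE
)

# One pass over the data: totals per country with final column names set directly.
totals <- covid %>%
  group_by(location) %>%
  summarise(
    Total_cases = sum(new_cases, na.rm = TRUE),
    Total_deaths = sum(new_deaths, na.rm = TRUE),
    Total_tests = sum(new_tests, na.rm = TRUE),
    Total_vaccinations = sum(new_vaccinations, na.rm = TRUE)
  )
totals
```

Naming the columns inside `summarise()` avoids matching on generated names such as `T_2[, 2]`, which would break silently if the `cbind()` call were edited.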
Drop the observations that have no information
D_4=subset(D_4, Total_cases != 0 | Total_deaths !=0 | Total_tests !=0 |Total_vaccinations!=0 )
dim(D_4)
## [1] 217 5
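One caveat about the zeros in the totals table: `sum(..., na.rm = TRUE)` returns 0 when every value in a group is NA, so a total of 0 may mean "nothing was reported" rather than "none occurred". The short example below illustrates this; `sum_or_na` is a hypothetical helper introduced here for illustration, not part of the original script.

```r
# A country whose daily test counts are all missing.
all_missing <- c(NA_real_, NA_real_, NA_real_)

# Removing the NAs leaves an empty vector, and the sum of nothing is 0.
zero_total <- sum(all_missing, na.rm = TRUE)

# Hypothetical helper: keep NA when there is no information at all,
# otherwise sum the values that were reported.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

missing_total <- sum_or_na(all_missing)   # stays NA
partial_total <- sum_or_na(c(1, NA, 2))   # sums the reported values
```

With a helper like this, countries without data can be dropped using `is.na()` instead of comparing totals with zero.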
#####Read Project_Data_2.csv data file ############
# Read the country statistics; assumes the file is in the working directory (no hard-coded folder)
D_5 <- read_csv("Project_Data2.csv")
View(D_5)
is.na(D_5$location)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE...
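The brief then calls for combining the COVID-19 totals with the country statistics and expressing them per million of population. A minimal sketch of that step on invented data follows; `population` and `median_age` are assumed column names for the second file, and the data frames `totals` and `info` are hypothetical stand-ins for the tables built above.

```r
# Toy stand-ins for the per-country totals and the country statistics.
totals <- data.frame(
  location = c("A", "B", "C"),
  Total_deaths = c(100, 50, 10),
  stringsAsFactors = FALSE
)
info <- data.frame(
  location = c("A", "B", "D"),
  population = c(2e6, 5e5, 1e6),   # ASSUMED column name
  median_age = c(30, 40, 25),      # ASSUMED column name
  stringsAsFactors = FALSE
)

# Inner join on the country name: only countries present in both tables remain.
combined <- merge(totals, info, by = "location")

# Vectorised rate: deaths per million inhabitants.
combined$deaths_per_million <- combined$Total_deaths / combined$population * 1e6
combined
```

An inner join silently drops countries missing from either file, so it is worth reporting how many rows were lost when describing the treatment of missing data.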