Statistical data analysis


SIT741 Problem Solving Task 1
Unit Chair: Sergiy Shelyag
Due: 27 August 2021

Problem Solving Task 1 contributes 20% of your final SIT741 mark. The full mark is 100. It must be completed individually and submitted to CloudDeakin before the due date: 8 pm, 27/08/2021 (Week 6 Friday).

Learning goals
In this assignment, you will work on a real-world problem to consolidate your learning in the first five weeks, including organising your data as tidy data and performing simple statistical analyses. This activity also serves as scaffolding for the upcoming Assignment 2. Please start early so that you can identify any skill/knowledge gap and seek support from the teaching staff and other students.

Background
In Australia, we experienced extreme heat in 2019. With the inevitable rise of extreme weather events, it is crucial that we better understand their potential impact on our everyday life. In November 2016, a storm in Victoria triggered an unexpected surge of emergency department visits at the local public hospitals. Some consequences of this weather event were captured in this news article: http://bit.ly/2gC8j6U
Apart from such storms, various weather events may affect the demand for care at our emergency departments (EDs). In SIT741, you will use publicly available data to understand the relationship between weather patterns and ED demands. Your analysis could provide crucial knowledge for resource planning at our health care systems. Assignment 1 will focus on the analysis of ED demand data.

Task 1: Obtaining ED demand data (16 points)
First, let's find data measuring ED demands. We will use the emergency departments admissions and attendances data set provided by the Department of Health of Western Australia: http://data.gov.au/dataset/emergency-department-admissisons-and-attendances

Task 1.1
Download the data set using the link below (4 points).
http://bit.ly/2nkCUEh

Task 1.2
Answer the following questions:
How many rows and columns are in the data? (1 point)
How many hospitals are in the data? (1 point)
What data types are in the data? (Use the data type selection tree and provide a detailed explanation.) (2 points for data types, 2 points for explanations)
What time period does the data cover? (1 point)
What's the difference between "Attendance" and "Admissions"? (3 points)
What do the variables Tri_1, Tri_2, … represent? (2 points)
Hint: You may need to consult the relevant dataset description (see the link above).

Task 2: Tidy data (20 points)

Task 2.1 Cleaning up columns
You may notice that the ED csv file has two rows of headings. This is quite common in data generated by BI reporting tools. Let's clean up the column names.

ed_data_link <- 'govhack3.csv'
top_row <- read_csv(ed_data_link, col_names = FALSE, n_max = 1)
second_row <- read_csv(ed_data_link, n_max = 1)
column_names <- second_row %>%
  unlist(., use.names = FALSE) %>%
  make.unique(., sep = "__") # double underscore
column_names[2:8] <- str_c(column_names[2:8], '0', sep = '__')
daily_attendance <- read_csv(ed_data_link, skip = 2, col_names = column_names)

Now print out a list of healthcare facilities (hospitals) in the data set. (1 point)

Task 2.2 Tidying data
Now we have a data frame. Answer the following questions for this data frame.
Does each variable have its own column? (1 point)
Does each observation have its own row? (1 point)
Does each value have its own cell? (1 point)
Use spreading and/or gathering (or their newer equivalents, pivot_wider and pivot_longer) to transform the data frame into tidy data (6 points). The key is to put data from the same measurement source in a column and to put each observation in a row. Please answer the following questions.
How many spreading (or pivot_wider) operations do you need? (1 point)
How many gathering (or pivot_longer) operations do you need? (1 point)
Explain the steps in detail. (3 points)
Do the variables have the expected variable types in R? Clean up the data types. (3 points)
Are there any missing values? Fix the missing data. Justify your actions. (2 points)

Task 3: Exploratory data analysis (20 points)
It is often a good idea to visually check your data before fitting a model. The purpose is to understand the distribution of different measurements and the relations between them.

Task 3.1 Select a hospital
Select a hospital and create a dataset for only the selected hospital. (1 point)
Print out the hospital's name (1 point), the total number of ED attendances (1 point), and the total number of admissions (1 point).
Check if the total number of ED attendances corresponds to the total number of triaged patients and print the difference (2 points).

Task 3.2
For the hospital selected, if we want to compare the volume of ED demands across the year, which plot can we use? Show your plot and explain what the plot shows. (Hint: Which variable measures the ED demands?) (3 points)

Task 3.3
How do the ED demands change during a week? Show it visually using violin plots (2 points), describe the results (2 points) and provide your interpretation (2 points).

Task 3.4
Use the skimr and fitdistrplus libraries to answer the following questions.
Which distributions are appropriate for modelling the ED demand? (1 point)
Which variables meet the assumptions for the Poisson distribution and why? (2 points)
To reduce the dependence between consecutive days, randomly sample 150 records out of the whole dataset (all records for the selected hospital) for modelling (2 points).

Task 4: Fitting distributions (20 points)
As you may have seen in the previous step, although we are dealing with count data, a Poisson distribution may not provide a good fit. In fact, the unconditional Poisson distribution is too restrictive for most real-world applications. In this task, we will fit a couple of distributions to the triage 3 attendance using the same sample as in Task 3.4.

Task 4.1: Fitting distributions (4 points)
Fit a Poisson distribution and a negative binomial distribution on Tri_3. You may use functions provided by the package fitdistrplus.

Task 4.2: Compare distributions (6 points)
Compare the log-likelihood of the two fitted distributions. Which distribution fits the data better? Why?

Task 4.3: Try other distributions (Research question 1) (10 points)
Find which distributions the R stats library includes. Try to fit some of them to different triage variables. Analyse and explain the results. Write a short report (200 words).

Task 5: Research question 2 (15 points)
There is more than one way to fit a distribution to a set of numbers. Produce a short literature review on different distribution fitting methods, showing the pros and cons of each method.
5 points will be given for the relevance of the literature.
7 points will be given for the quality of the comparative analysis of distribution fitting methods.
3 points will be given for the quality of presentation.

Task 6: Ethics question (7 points)
During your work, have you identified any issues that have ethical implications? (2 points)
Does it concern security or privacy? (2 points)
How was the risk mitigated? (3 points)

Task 7: Reflection (2 points)
Answer the following questions:
What help did you receive from other students? What did you learn from them? (1 point)
Please estimate the mark that you will receive for Assignment 1. Please provide both a point estimate and an interval estimate (a confidence interval). You don't need to provide a mathematical model, but please explain how you use conditional information to reach the estimates. Based on the conditional information, explain what you would have done differently to improve that mark. (1 point)

What to submit
By the due date, you are required to submit the following files to the assignment dropbox in CloudDeakin.
An MS Word or PDF file containing your answers to all the assignment questions.
An R notebook file assignment1_submission.rmd filled in with the script for your calculations. The file should be able to run. Include sufficient comments so that the script can be understood by the marker. Indicate all the packages that need to be installed separately.

Marking criteria
Your submission will be marked using the following criteria.
Showing good effort through completed tasks.
Applying statistical thinking to understand the problems and to identify solutions.
Applying statistical programming skills to obtain data and to process them for data analysis.
Applying visualisation techniques to discover distribution patterns and relationships among variables.
Demonstrating creativity and resourcefulness in solutions.
Showing attention to detail through a good quality assignment report.
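As an illustration of the comparison asked for in Tasks 4.1 and 4.2, the following is a minimal sketch rather than a model answer. It assumes the fitdistrplus package is installed, and it uses synthetic overdispersed counts (the hypothetical vector tri_3_sample) in place of the real 150-record Tri_3 sample, which is not reproduced here.

```r
library(fitdistrplus)

set.seed(741)

# Synthetic stand-in for a 150-record Tri_3 sample (assumption: the real
# values would come from the tidied ED data for one hospital).
tri_3_sample <- rnbinom(150, size = 5, mu = 40)

# Maximum-likelihood fits of the two candidate count distributions.
fit_pois <- fitdist(tri_3_sample, "pois")
fit_nbinom <- fitdist(tri_3_sample, "nbinom")

# A higher log-likelihood (and a lower AIC) indicates the better fit.
comparison <- data.frame(
  distribution = c("Poisson", "Negative binomial"),
  loglik = c(fit_pois$loglik, fit_nbinom$loglik),
  aic = c(fit_pois$aic, fit_nbinom$aic)
)
print(comparison)
```

Because the Poisson distribution forces the variance to equal the mean, the negative binomial, which has an extra dispersion parameter, would be expected to reach a higher log-likelihood whenever the counts are overdispersed, as they are in this synthetic sample.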
Answered 1 day after Aug 21, 2021


Subhanbasha answered on Aug 22 2021
140 Votes
## installing required packages
install.packages("readr")
install.packages("stringr")
install.packages("dplyr")
install.packages("tidyr")
install.packages("ggplot2")
install.packages("lubridate")
install.packages("fitdistrplus")
## calling packages
library(readr)
library(dplyr)
library(stringr)
library(tidyr)
library(lubridate)
library(ggplot2)
library(fitdistrplus)
library(stats)
# Task 1
## Reading data into R
Ed_demand<-read.csv("govhack3.csv",header = TRUE, sep = ',',skip=1)
## dimension of the data
dim(Ed_demand)
## finding the number of columns
ncol(Ed_demand)
## showing the data types of each variable
str(Ed_demand)
##changing the data type
Ed_demand$Date<-as.Date(Ed_demand$Date,format = "%d-%b-%Y")
## showing the minimum date
min(Ed_demand$Date)
##showing the maximum date
max(Ed_demand$Date)
# Task 2
## cleaning the columns as required format
ed_data_link <-'govhack3.csv'
top_row <- read_csv(ed_data_link, col_names = FALSE, n_max =...
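
The listing above is cut off at this point. As a rough sketch of how the tidying in Task 2.2 might proceed (not the answer author's code), the example below applies a single pivot_longer call to a small synthetic frame. The frame wide, its values and its dates are invented for illustration; only the Measure__index naming pattern is taken from the column clean-up code in Task 2.1.

```r
library(dplyr)
library(tidyr)

# Synthetic stand-in for daily_attendance: two hospitals (suffixes 0 and 1)
# with two measures each. The values and dates are made up.
wide <- tibble(
  Date = as.Date(c("2021-01-01", "2021-01-02")),
  Attendance__0 = c(235, 209),
  Admissions__0 = c(99, 97),
  Attendance__1 = c(155, 145),
  Admissions__1 = c(10, 12)
)

# One pivot_longer call: the part of each name before "__" stays a column
# (".value"), and the part after it becomes the hospital index.
tidy <- wide %>%
  pivot_longer(
    cols = -Date,
    names_to = c(".value", "hospital_id"),
    names_sep = "__"
  ) %>%
  mutate(hospital_id = as.integer(hospital_id)) %>%
  arrange(hospital_id, Date)

print(tidy)
```

With this layout each row is one hospital on one day and each measure has its own column. On the real data, the hospital index would still need to be mapped to the hospital names held in top_row.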