

Background


In Australia, we experienced extreme heat in 2019. With the inevitable rise of extreme weather events, it is crucial that we better understand their potential impact on our everyday life.


In November 2016, a storm in Victoria triggered an unexpected surge of emergency department visits at the local public hospitals. Some consequences of this weather event were captured in this news article:




http://bit.ly/2gC8j6U


Apart from such storms, various weather events may affect the demand for care at our emergency departments (EDs). In SIT741, you will use publicly available data to understand the relationship between weather patterns and ED demand. Your analysis could provide crucial knowledge for resource planning in our health care systems.


Assignment 1 will focus on the analysis of ED demand data.


Task 1: Obtaining ED demand data (4 points)


First, let’s find data measuring ED demands. We will use the emergency departments admissions and attendances data set provided by the Department of Health of Western Australia:




http://data.gov.au/dataset/emergency-department-admissisons-and-attendances


Task 1.1 Download the data set using the link below.




http://bit.ly/2nkCUEh


Task 1.2 Answer the following questions:



  • How many rows and columns are in the data?

  • How many hospitals are in the data?

  • What data types are in the data?

  • What time period does the data cover?

  • What’s the difference between “Attendance” and “Admissions”?

  • What do the variables Tri_1, Tri_2, … represent?


Hint: You may need to consult the relevant background document, for example, the government webpage here: https://ww2.health.wa.gov.au/About-us/Policy-frameworks/Information-Management/Mandatory-requirements/Emergency-Department-and-Emergency-Services-Patient-Level-Data-Collection-and-Reporting.
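The first few questions can be explored with a handful of R calls. The sketch below is only an illustration: the inline csv is made-up data, not the real file, and the column names `Date`, `Attendance` and `Admissions` as well as the ISO date format are assumptions. It also assumes readr 2.0 or later (for `I()` and `show_col_types`).

```r
library(readr)

# Made-up stand-in for the downloaded file (hypothetical values and layout).
toy_csv <- I("Date,Attendance,Admissions
2013-07-01,235,99
2013-07-02,209,97
2013-07-03,204,84
")

toy <- read_csv(toy_csv, show_col_types = FALSE)

dim(toy)             # number of rows and columns
sapply(toy, class)   # data type of each column
range(toy$Date)      # time period covered by the data
```

On the real file the same calls apply once the heading rows have been handled (see Task 2.1).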


Task 2: Tidy data (5 points)


Task 2.1 Cleaning up columns


You may notice that the ED csv file has two rows of heading. This is quite common in data generated by BI reporting tools. Let’s clean up the column names.



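    library(tidyverse)

    # Right-hand sides shown as "..." did not survive in the source text.
    ed_data_link <- ...   # location of the csv file from Task 1.1
    top_row      <- ...   # first heading row
    second_row   <- ...   # second heading row

    column_names <- second_row %>%
      unlist(., use.names = FALSE) %>%
      make.unique(., sep = "__")  # double underscore

    column_names[2:8] <- ...      # right-hand side missing in the source

    daily_attendance <-
      read_csv(ed_data_link, skip = 2, col_names = column_names)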

Now print out a list of healthcare facilities (hospitals) in the data set.
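To see how a file with two heading rows can be handled, here is a minimal sketch on a made-up csv. The layout (hospital names in the top row, measure names in the second row), the hospital names and the values are all assumptions chosen for illustration; the real file may differ. It assumes readr 2.0 or later.

```r
library(readr)

# Made-up csv with two heading rows (hypothetical layout and values).
toy_csv <- I(",Hospital A,,Hospital B,
Date,Attendance,Admissions,Attendance,Admissions
2013-07-01,235,99,120,30
2013-07-02,209,97,118,28
")

as_text <- cols(.default = col_character())

# Read each heading row on its own, as plain text.
top_row    <- read_csv(toy_csv, n_max = 1, col_names = FALSE, col_types = as_text)
second_row <- read_csv(toy_csv, skip = 1, n_max = 1, col_names = FALSE,
                       col_types = as_text)

# Hospital names are the non-empty cells of the top row.
hospitals <- unlist(top_row, use.names = FALSE)
hospitals <- hospitals[!is.na(hospitals)]
print(hospitals)

# Repeated measure names get a "__<n>" suffix so every column name is unique.
column_names <- make.unique(unlist(second_row, use.names = FALSE), sep = "__")

# Skip both heading rows and supply the cleaned names instead.
daily <- read_csv(toy_csv, skip = 2, col_names = column_names,
                  show_col_types = FALSE)
```

`make.unique()` leaves the first occurrence of a name unchanged and numbers the repeats from 1, so the suffix tells you which hospital a column belongs to.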


Task 2.2 Tidying data



  1. Now we have a data frame. Answer the following questions for this data frame.



  • Does each variable have its own column?

  • Does each observation have its own row?

  • Does each value have its own cell?



  2. Use spreading and/or gathering to transform the data frame into tidy data. The key is to put data from the same measurement source in a column and to put each observation in a row. Please answer the following questions.



  • How many spreading operations do you need?

  • How many gathering operations do you need?

  • Explain the steps.




  3. Do the variables have the expected types in R? Clean up the data types.




  4. Are there any missing values? Fix the missing data. Justify your actions.
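The gather and spread steps, the type clean-up and one way of handling missing values can be sketched on a tiny made-up data frame. Everything here is an assumption for illustration: the column names with a `__<index>` suffix, the `"N/A"` marker for missing values, and the choice to replace missing triage counts with 0. Your own treatment of missing data should follow from what you find in the real file.

```r
library(dplyr)
library(tidyr)

# Made-up wide data: each measure name carries a "__<hospital index>" suffix.
wide <- tibble(
  Date          = as.Date(c("2013-07-01", "2013-07-02")),
  Attendance__0 = c("235", "209"),
  Tri_1__0      = c("N/A", "3"),
  Attendance__1 = c("120", "118"),
  Tri_1__1      = c("1", "N/A")
)

tidy <- wide %>%
  # Gather: one row per date, original column name and value.
  gather(key = "key", value = "value", -Date) %>%
  # Split the original column name into measure and hospital index.
  separate(key, into = c("measure", "hospital_id"), sep = "__") %>%
  # Type clean-up: "N/A" becomes NA, counts and indices become integers.
  mutate(value       = as.integer(na_if(value, "N/A")),
         hospital_id = as.integer(hospital_id)) %>%
  # Spread: one column per measure, one row per date and hospital.
  spread(key = measure, value = value) %>%
  # One possible choice (an assumption): treat a missing count as zero.
  mutate(Tri_1 = replace_na(Tri_1, 0L))

print(tidy)
```

In this sketch one gathering and one spreading operation are enough; whether that holds for the real file depends on how its columns are laid out.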




Task 3: Exploratory Data Analysis (5 points)


It is often a good idea to eyeball your data before fitting a model. The purpose is to understand the distribution of different measurements and their relations.


Task 3.1 Select a hospital


Select a hospital and create a data set for only that hospital. Print out the hospital’s name, the total number of ED attendances and the total number of admissions.
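As a rough sketch of this step, assuming the tidy data has one row per date and hospital, a single `filter()` followed by `summarise()` is enough. The hospital names, column names and counts below are made up for illustration.

```r
library(dplyr)

# Made-up tidy data (hypothetical hospital names and counts).
ed <- tibble(
  Date       = rep(as.Date("2013-07-01") + 0:2, times = 2),
  hospital   = rep(c("Hospital A", "Hospital B"), each = 3),
  Attendance = c(235L, 209L, 204L, 120L, 118L, 131L),
  Admissions = c(99L, 97L, 84L, 30L, 28L, 35L)
)

# Keep only the rows for the chosen hospital.
one_hospital <- ed %>% filter(hospital == "Hospital A")

# Name of the hospital plus its total attendances and admissions.
totals <- one_hospital %>%
  summarise(hospital         = first(hospital),
            total_attendance = sum(Attendance),
            total_admissions = sum(Admissions))
print(totals)
```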


Task 3.2 For the hospital selected, if we want to compare the volume of ED demands across the year, which plot can we use? Show your plot and explain what the plot shows. (Hint: Which variables measure the ED demands?)


Task 3.3 How do the ED demands change during a week? Show it visually.
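Two plot types that may suit Tasks 3.2 and 3.3 are a line plot over time and a box plot per day of the week. The sketch below uses simulated counts rather than real data, and the column names are assumptions; other plot types may serve equally well.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # reproducible made-up data

# Four weeks of simulated daily attendance, starting on a Monday.
one_hospital <- tibble(
  Date       = seq(as.Date("2013-07-01"), by = "day", length.out = 28),
  Attendance = rpois(28, lambda = 200)
) %>%
  # "%u" gives the ISO weekday number (1 = Monday), independent of locale.
  mutate(weekday = factor(as.integer(format(Date, "%u")),
                          levels = 1:7,
                          labels = c("Mon", "Tue", "Wed", "Thu",
                                     "Fri", "Sat", "Sun")))

# Task 3.2 idea: daily attendance across the period as a line plot.
p_year <- ggplot(one_hospital, aes(x = Date, y = Attendance)) +
  geom_line()

# Task 3.3 idea: distribution of attendance for each day of the week.
p_week <- ggplot(one_hospital, aes(x = weekday, y = Attendance)) +
  geom_boxplot()

print(p_year)
print(p_week)
```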


Task 3.4 Which distributions are appropriate for modelling the ED demand? Which variables meet the assumptions for the Poisson distribution? To reduce the dependence between consecutive days, we randomly sample 200 records out of the whole dataset (all records for the selected hospital) for modelling.
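One quick check of the Poisson assumption is to compare the sample mean with the sample variance, since a Poisson variable has both equal to its rate. The sketch below draws the 200-record sample from simulated, deliberately overdispersed counts; the seed, the simulated values and the variable name `tri_2` are assumptions for illustration only.

```r
set.seed(741)  # arbitrary seed, used only to make the sample reproducible

# Simulated daily counts standing in for one hospital's records.
tri_2_all <- rnbinom(365, size = 5, mu = 20)

# Random sample of 200 records, drawn without replacement.
tri_2 <- sample(tri_2_all, size = 200)

# Under a Poisson model the mean and the variance should be roughly equal.
print(c(mean = mean(tri_2), variance = var(tri_2)))

# A ratio well above 1 points to overdispersion.
dispersion <- var(tri_2) / mean(tri_2)
print(dispersion)
```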


Task 4: Fitting distributions (5 points)


As you may have seen in the previous step, although we are dealing with count data, a Poisson distribution may not provide a good fit. In fact, the unconditional Poisson distribution is too restrictive for most real-world applications. In this task, we will fit a couple of distributions to the triage category 2 attendance (Tri_2), using the same sample as in Task 3.4.


Task 4.1: Fitting distributions


Fit a Poisson distribution and a negative binomial distribution on Tri_2. You may use functions provided by the package fitdistrplus.
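A minimal sketch of the fitting step with `fitdist()` from fitdistrplus follows. The data are simulated, not the real Tri_2 sample, and the seed and parameter values are assumptions chosen only to give overdispersed counts.

```r
library(fitdistrplus)

set.seed(741)
tri_2 <- rnbinom(200, size = 5, mu = 20)  # made-up sample of 200 counts

# Maximum likelihood fits of the two candidate distributions.
fit_pois <- fitdist(tri_2, "pois")
fit_nb   <- fitdist(tri_2, "nbinom")

print(fit_pois$estimate)  # lambda
print(fit_nb$estimate)    # size and mu
```

Calling `summary()` on either fitted object prints the estimates together with the log-likelihood, AIC and BIC.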


Task 4.2: Compare distributions


Compare the log-likelihoods of the two fitted distributions.


Which distribution fits the data better? Why?
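To make the comparison concrete, the log-likelihoods can also be computed by hand in base R. This is a sketch of the idea on simulated data, not the fitdistrplus route from Task 4.1: the Poisson fit uses the sample mean as its maximum likelihood estimate, and the negative binomial fit uses `optim()` on the log scale. AIC is added because the negative binomial has one more parameter than the Poisson.

```r
set.seed(741)
x <- rnbinom(200, size = 5, mu = 20)  # made-up overdispersed counts

# Poisson: the MLE of lambda is the sample mean.
ll_pois <- sum(dpois(x, lambda = mean(x), log = TRUE))

# Negative binomial: minimise the negative log-likelihood numerically,
# working with log(size) and log(mu) so both parameters stay positive.
nb_negll <- function(par) {
  -sum(dnbinom(x, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}
m <- mean(x)
v <- var(x)
start <- log(c(m^2 / (v - m), m))  # moment-based start; requires v > m
opt   <- optim(start, nb_negll)
ll_nb <- -opt$value

# AIC = 2 * (number of parameters) - 2 * log-likelihood; lower is better.
aic <- c(poisson = 2 * 1 - 2 * ll_pois,
         nbinom  = 2 * 2 - 2 * ll_nb)

print(c(poisson = ll_pois, nbinom = ll_nb))
print(aic)
```

A higher log-likelihood means the observed counts are more probable under that model. For overdispersed counts like these the negative binomial usually comes out ahead, because its variance can exceed its mean.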

Aug 22, 2021, SIT741, Deakin University