
You have to do Assignment 2; Assignment 1 is the supporting file.


Student Name: Ashish Muttepawar
Student Number: 217437406
Course: Master of Data Analytics
Statistical Data Analysis (SIT 741)
Assignment 1

Task 1: Obtaining ED demand Data
Task 1.2: How many rows are in the data?
Answer: There are 366 rows in total in the given Emergency Department (ED) demand dataset.
What data types are in the data?
Answer: The data we were given is of type ‘factor’.
What time period does the data cover?
Answer: The data ranges from July 2013 to June 2014.
What’s the difference between “Attendance” and “Admissions”?
Answer:
Attendance: an attendance is counted for any emergency patient who is registered in any manner in one of the electronic data collection systems. Attendances may include patients who are dead on arrival (DOA) or who did not wait to be seen.
Admission: admissions are the count of patients coming to the emergency department who are assigned a triage number, excluding patients who were dead on arrival.
What do the variables Tri_1, Tri_2, … represent?
Answer: These variables are attendance criteria that must be met before a patient sees the doctor and is included in the report; meeting them also means the patient is automatically clerically registered.

Task 2: Tidy Data
Task 2.1 Cleaning up columns
Answer: We printed out the list of hospitals in the dataset.
Task 2.2 Tidying data
Does each variable have its own column?
Answer: Yes, each variable in the obtained data frame has its own column.
Does each observation have its own row?
Answer: Yes, each observation in the obtained data frame has its own row.
Does each value have its own cell?
Answer: Yes, each value has its own cell.
Do the variables have the expected variable types in R? Clean up the data types.
Answer: Yes, the variables have the expected types.
Are there any missing values? Fix the missing data. Justify your actions.
Answer: Yes, there were missing values in the dataset. In columns with only a few missing values, the missing entries were replaced with zeroes. Columns where most values were missing were removed entirely, as they carry no significance for further analysis.

Task 3: Exploratory Data Analysis
Task 3.1 Select a hospital and create a data set for only that hospital. Print out the hospital’s name, the total number of ED attendances and the total number of admissions.
Answer: We selected Royal Perth Hospital, which had 82862 attendances and 35126 admissions in total.
Task 3.2 For the hospital selected, if we want to compare the volume of ED demands across the year, which plot can we use? Show your plot and explain what the plot shows. (Hint: Which variables measure the ED demands?)
Answer: A line chart is preferable to a boxplot here because it shows how ED demand changes across the year. The plot shows that ED demand increases at the beginning of each three-month period (every third month).
Task 3.3 How do the ED demands change during a week? Show it visually.
Answer: Within a week, ED demand rises over the first couple of days and then follows a downward trend.
Task 3.4 Which distributions are appropriate for modelling the ED demand? Which variables meet the assumptions for the Poisson distribution? (For simplicity, here we will make a “naive” assumption that counts on consecutive days are independent. We will relax this assumption later in the unit.)
Answer:

Task 4: Fitting distributions
Task 4.1: Fit a Poisson distribution and a negative binomial distribution on Tri_1. You may use functions provided by the package fitdistrplus.
Answer: Poisson and negative binomial distributions were fitted to the Tri_1 attribute of the dataset (stored in a data frame), giving the following results:
Task 4.2: Compare the log-likelihood of two fitted distributions. Which distribution fits the data better? Why?
Answer: Poisson distribution: log-likelihood -5909. Negative binomial distribution: log-likelihood -3472. The negative binomial distribution fits the data better because its log-likelihood is higher than that of the Poisson distribution. A likely reason is that the negative binomial has an extra dispersion parameter that allows the variance to exceed the mean, which suggests the Tri_1 counts are overdispersed relative to a Poisson model.

Task 5: Research question
There is more than one way to fit a distribution to a set of numbers. Produce a short literature review on different distribution fitting methods, showing the pros and cons of each method.
Answer:

Task 6: Ethics question
During your work, have you identified any issues that have ethical implications? Does it concern security or privacy? How do you mitigate the risk?
Answer: No. The dataset raises no ethical, security or privacy concerns: it contains no private information about any patient, so there is no risk of sensitive information being leaked.

Task 7: Reflection
What help did you receive from other students? What did you learn from them?
Answer: I received help with fitting distributions, from which I learned how to access the metadata and how to plot a fitted distribution over the data.
Please estimate the mark that you will receive for assignment 1. Please provide both a point estimate and an interval estimate (a confidence interval). You don’t need to provide a mathematical model, but please explain how you use conditional information to reach the estimates. Based on the conditional information, explain what you would have done differently to improve that mark.
Answer: I am not expecting much from this assignment. My interval estimate is 50% to 60%, at 90% confidence, because I did not put much effort into the assignment and also submitted it late due to some unforeseen incidents in my life. Given the questions I have answered above, my point estimate is around 55%. To improve my mark, I tried to answer as many questions as possible.
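The Task 4.2 comparison can be made more explicit. The sketch below is in Python rather than the R/fitdistrplus code used in the assignment, and every function name in it (`poisson_loglik`, `nbinom_loglik`, `fit_nbinom`, `aic`) is hypothetical. It fits both models by maximum likelihood, assuming the (size, mu) parameterisation of the negative binomial that R's `dnbinom` uses, and adds the Akaike information criterion (AIC = 2k - 2 log L), which penalises the negative binomial's extra parameter. The Tri_1 data are not reproduced here, so the toy counts below are purely illustrative.

```python
import math
from statistics import mean


def poisson_loglik(counts):
    """Poisson log-likelihood at its MLE (lambda = sample mean).

    Assumes at least one non-zero count so that log(lambda) is defined.
    """
    lam = mean(counts)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def nbinom_loglik(counts, size, mu):
    """Negative binomial log-likelihood in the (size, mu) parameterisation."""
    p = size / (size + mu)  # "success" probability implied by size and mu
    return sum(
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log(1 - p)
        for k in counts
    )


def fit_nbinom(counts, lo=1e-3, hi=1e4, iters=200):
    """Maximum-likelihood negative binomial fit (illustrative sketch).

    For fixed size, the MLE of mu is the sample mean, so only size is
    searched: golden-section search on log(size) over [lo, hi], assuming
    the profile log-likelihood is unimodal there.
    """
    mu = mean(counts)

    def profile(t):
        return nbinom_loglik(counts, math.exp(t), mu)

    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5) - 1) / 2  # inverse golden ratio
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if profile(c) > profile(d):
            b, d = d, c  # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d  # maximum lies in [c, b]
            d = a + g * (b - a)
    size = math.exp((a + b) / 2)
    return size, mu, nbinom_loglik(counts, size, mu)


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * loglik


if __name__ == "__main__":
    toy = [0, 0, 1, 2, 10, 15, 3, 0, 25, 4]  # illustrative counts only, not Tri_1
    size, mu, ll_nb = fit_nbinom(toy)
    ll_pois = poisson_loglik(toy)
    print(f"Poisson: logL={ll_pois:.2f}, AIC={aic(ll_pois, 1):.2f}")
    print(f"NegBin:  size={size:.3f}, mu={mu:.2f}, logL={ll_nb:.2f}, AIC={aic(ll_nb, 2):.2f}")
```

Plugging in the reported log-likelihoods, the Poisson model (k = 1) gives AIC = 2 + 11818 = 11820 and the negative binomial (k = 2) gives AIC = 4 + 6944 = 6948. The negative binomial would therefore still be preferred after the one-parameter penalty.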
May 15, 2021, SIT741, Deakin University