Note: Submit your answers in a Word document, transferring the results from the Excel file into it. You must also submit your Excel file (we prefer a single Excel file with one or more worksheets for each question), but note that only the Word document will be marked. If you find any question unclear, state your assumptions (if any) and explain them clearly in your report.




You need to add the coversheet and sign it. Please write the names of your tutor and your lecturer on the coversheet.


The analyses and answers must be your own individual work, without consulting any other person. You are also not allowed to help or advise other students.



SECTION A: Discussion Questions (the word limit for each of the following four questions is 150 words). You do not need to provide any references; the answers should be your own thinking in your own words.






1- Explain the concept of overfitting and how it can be avoided. (4 marks)


2- Give two practical examples of applications of predictive analytics in your area (e.g. supply chain, IT, library, etc.). Provide detailed explanations. You need to explain why you think predictive analytics can be used in those cases; you do not need to provide data or solve them. (4 marks)


3- Explain how missing values can be rectified when the type of missingness is “not missing at random”. Give one detailed example of a situation in which you might observe values that are not missing at random. (4 marks)


4- Assume one of the explanatory variables (named X1) in your logistic regression is a categorical variable with the levels low, average and high, and another explanatory variable (named X2) is also categorical, with the levels Sydney, Melbourne, Hobart and Brisbane. Explain how you will use them in developing your logistic regression model. How many coefficients will you have in your final model? (4 marks)



(4+4+4+4 = 16 marks)
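As an illustration for question 4, one common way to use categorical predictors is dummy (indicator) coding with one reference level dropped per variable. The sketch below is a minimal example under stated assumptions: "low" and "Sydney" are chosen as reference levels (an arbitrary choice, not specified in the question), X1 and X2 are taken to be the only predictors, and the helper names `dummy_code` and `count_coefficients` are hypothetical.

```python
# Minimal sketch of reference-level dummy coding (an assumed approach,
# not one prescribed by the question).
X1_LEVELS = ["low", "average", "high"]                      # first level = reference
X2_LEVELS = ["Sydney", "Melbourne", "Hobart", "Brisbane"]   # first level = reference


def dummy_code(value, levels):
    """Return k-1 indicator values, treating levels[0] as the reference level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels[1:]]


def count_coefficients(*level_lists):
    """Intercept plus (k - 1) dummy coefficients per categorical variable."""
    return 1 + sum(len(levels) - 1 for levels in level_lists)


# One observation with X1 = high and X2 = Hobart.
row = dummy_code("high", X1_LEVELS) + dummy_code("Hobart", X2_LEVELS)
print(row)                                       # indicators for average, high, Melbourne, Hobart, Brisbane
print(count_coefficients(X1_LEVELS, X2_LEVELS))  # intercept + 2 + 3
```

Because X1 is ordered (low, average, high), coding it as a single ordinal score is another option; that would change the coefficient count and assumes equal spacing between levels.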








SECTION B: QUANTITATIVE QUESTIONS


5- The first worksheet of the Excel file (provided for this assignment) contains 500 records of clients who have bought special products from an e-business website. Each record includes data on types of product purchased (between 1 and 5), purchase amount ($), age, gender, family size of the customer, whether the client has a membership and whether the customer has a discount card.



a) Develop a multiple regression model to predict the spend amount based on the other variables in the data set. Write the final equation, interpret the coefficients of the model and explain the accuracy of the model.



b) What would you recommend to improve the accuracy of this model, or to simplify the model without significantly reducing its accuracy?



(7+7=14 marks)







6- A company provides maintenance service for washing machines in Victoria. The collected data are presented in the Excel file (second worksheet).




a) Assume the manager has asked you to analyse the data and provide some insights and recommendations. The report should not exceed 2 pages. (8 marks)


b) Build a model to predict the repair time for a future service booking that needs to be done by James and is a Mechanical repair. Would you suggest assigning this service to the morning shift or the afternoon shift? (6 marks)


c) Which type of repair is more profitable for the company, assuming the salaries of the repairpersons are the same? (2 marks)



(8+6+2 = 16 marks)











7- Worksheet 3 presents a dataset from a blood bank. The data were recorded for apheresis blood donations made by a group of donors over a period of time. The donor ID is unique for each donor. A donor might have donated more than once in this period. At each donation, the donor's blood total protein level was recorded. Use the dataset to answer the following questions:


a) There are some missing values for blood type. Think about how you can fill in the missing values. Explain your approach (step by step), then apply it and fill in as many of the missing values as possible. (Save the results in an Excel worksheet and name it Question 7 Part a.) (4 marks)


b) Calculate the range of total protein for each blood type. Explain your approach (step by step). Report the results in a worksheet and name it Question 7 Part b. (4 marks)


c) Is the number of donations higher on Mondays than on Fridays? (4 marks)


d) Present the two visualisation tools that you think provide the most useful information for this data. (4 marks)





(4+4+4+4= 16 marks)
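For parts a) and b) of question 7, the logic can be prototyped outside Excel before building the worksheet. The sketch below is only an illustration under stated assumptions: since the donor ID is unique and a donor's blood type does not change between donations, a missing blood type is copied from another donation by the same donor where one exists. The rows, the tuple layout and the function names `fill_blood_types` and `protein_range_by_type` are hypothetical and are not taken from worksheet 3.

```python
from collections import defaultdict

# Hypothetical rows: (donor_id, blood_type or None if missing, total_protein)
donations = [
    ("D1", "A+", 68.0),
    ("D1", None, 71.5),   # missing, but D1 has a known type elsewhere
    ("D2", "O-", 64.0),
    ("D2", "O-", 66.5),
    ("D3", None, 70.0),   # missing, and no other record for D3
]


def fill_blood_types(rows):
    """Fill a missing blood type from another donation by the same donor."""
    known = {}
    for donor_id, blood_type, _ in rows:
        if blood_type is not None:
            known.setdefault(donor_id, blood_type)
    # Donors with no known type anywhere stay as None.
    return [(d, b if b is not None else known.get(d), p) for d, b, p in rows]


def protein_range_by_type(rows):
    """Range (max minus min) of total protein for each known blood type."""
    values = defaultdict(list)
    for _, blood_type, protein in rows:
        if blood_type is not None:
            values[blood_type].append(protein)
    return {b: max(v) - min(v) for b, v in values.items()}


filled = fill_blood_types(donations)
print(filled)
print(protein_range_by_type(filled))
```

Donors whose blood type is missing on every record cannot be filled this way, which is why the question asks to fill in as many values as possible rather than all of them.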


8- The data presented in worksheet 4 are the results of a 4-year study conducted to assess how age, weight and gender influence the risk of diabetes. Risk is interpreted as the probability (times 100) that the patient will have diabetes over the next 4-year period.







a) What predictive model would you suggest to relate the risk of diabetes to the person’s age, weight and gender? Why? (4 marks)




b) Develop an estimated multiple regression model that relates the risk of diabetes to the person’s age, weight, gender and lifestyle. Present the regression formula as a mathematical equation. Interpret the coefficients of the regression and comment on the strength of the regression. (4 marks)




c) What is the risk percentage of diabetes over the next 4 years for a 52-year-old woman who lives in a small town and weighs 80 kg? (4 marks)



(4+4+4= 12 marks)







9- Matthew has a new job as a business analyst. He plans to invest 15 percent of his annual after-tax salary into a retirement account at the end of every year for the next 25 years. Suppose that the annual return is 4% and his current before-tax salary is 80k, which grows 3% per year. Tax applies at 15% on salary up to 50k, 20% on salary between 50k and 80k, and 25% on any salary above 80k (for example, if his salary is 105k, he pays 15% tax on the first 50k, 20% on the next 30k and 25% on the remaining 25k). Then:



a) Create a spreadsheet that shows Matthew the balance of the retirement account for various levels of annual investment and return.





b) If the annual return rate is uncertain and lies between 5% and 9% in any given year (hint: use =RANDBETWEEN(2,9)/100), what would be the expected value (average) of Matthew's balance after 25 years? (Run the model 10 times, then report the average.)



(4+4 = 8 marks)
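The bracketed tax rule and the year-by-year balance in question 9 can be checked with a short script before building the spreadsheet. This is a sketch under stated assumptions, not a model answer: the first deposit is based on the current 80k salary, the account starts at zero, each deposit is made at year end (so it earns nothing in its own year), and `random.randint` stands in for Excel's RANDBETWEEN with whole-percent returns. The function names and the `low_pct`/`high_pct` parameters are hypothetical; the defaults follow the 5% to 9% range in the question text, while the hint uses 2 as the lower bound.

```python
import random

# Brackets from the question: (upper limit of bracket, marginal rate).
BRACKETS = [(50_000, 0.15), (80_000, 0.20), (float("inf"), 0.25)]


def income_tax(salary):
    """Tax owed when each slice of salary is taxed at its bracket's rate."""
    tax, lower = 0.0, 0.0
    for upper, rate in BRACKETS:
        if salary > lower:
            tax += (min(salary, upper) - lower) * rate
        lower = upper
    return tax


def retirement_balance(salary=80_000, growth=0.03, invest_share=0.15,
                       years=25, annual_return=0.04, rates=None):
    """Balance after `years` end-of-year deposits of a share of after-tax salary."""
    balance = 0.0
    for year in range(years):
        r = rates[year] if rates is not None else annual_return
        balance *= 1 + r                                          # existing balance earns the return
        balance += invest_share * (salary - income_tax(salary))  # end-of-year deposit
        salary *= 1 + growth                                      # salary grows for next year
    return balance


def simulate_average(runs=10, low_pct=5, high_pct=9, years=25, seed=None):
    """Average final balance over `runs` scenarios with random yearly returns."""
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        rates = [rng.randint(low_pct, high_pct) / 100 for _ in range(years)]
        totals.append(retirement_balance(years=years, rates=rates))
    return sum(totals) / runs


print(income_tax(105_000))              # the worked example from the question
print(round(retirement_balance(), 2))   # fixed 4% return
print(round(simulate_average(seed=1), 2))
```

Passing `low_pct=2` reproduces the range used in the hint. Ten runs give a noisy estimate of the expected balance, so the reported average will vary from one set of runs to the next unless a seed is fixed.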









10- Consider the following validation data set confusion matrix, which is the result of a logistic regression model with heart attack as the dependent variable, related to a range of independent variables. In this model, y=1 indicates a heart attack and y=0 indicates no heart attack. The cutoff value is 50 per cent.







a) Calculate the sensitivity, specificity and overall error of the model.




b) Considering this confusion matrix, do you think we should change the cut-off value? Justify your answer.








































                       Predicted class
                       1         0
Actual class     1     2700      1000
                 0     70        3068
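The three measures asked for in part a) follow directly from the four cells of the confusion matrix. The sketch below assumes the usual reading of the matrix with y=1 as the positive class: 2700 true positives, 1000 false negatives, 70 false positives and 3068 true negatives. The function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and overall error from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),       # share of actual 1s predicted as 1
        "specificity": tn / (tn + fp),       # share of actual 0s predicted as 0
        "overall_error": (fn + fp) / total,  # share of all records misclassified
    }


# Counts read from the matrix (rows = actual class, columns = predicted class).
metrics = confusion_metrics(tp=2700, fn=1000, fp=70, tn=3068)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Comparing sensitivity with specificity shows which kind of error dominates at the 50 per cent cutoff, which is the starting point for part b).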





