
Question

Refer to the pdf attached.

Assignment-1 MIS771 Descriptive Analytics and Visualisations  Page 1 of 9

MIS771 Descriptive Analytics and Visualisation
DEPARTMENT OF INFORMATION SYSTEMS AND BUSINESS ANALYTICS
DEAKIN BUSINESS SCHOOL
FACULTY OF BUSINESS AND LAW, DEAKIN UNIVERSITY

Assignment Two

Background
This is an individual assignment. You need to analyse the given dataset and then interpret and draw conclusions from your analysis. You then need to convey your findings in a written report to an expert in Business Analytics.

Percentage of the final grade: 35%
The Due Date and Time: 8 pm Thursday 20th May 2021

Submission instructions
The assignment must be submitted by the due date, electronically in CloudDeakin. When submitting electronically, you must check that you have submitted the work correctly by following the instructions provided in CloudDeakin. Please note that we will NOT accept any paper or email copies or part of the assignment submitted after the due date.

Information for students seeking an extension BEFORE the due date
If you wish to seek an extension for this assignment before the due date, you need to apply directly to the Unit Chair by completing the Assignment and Online Test Extension Application Form before Thursday 5 pm 20th May 2021. Please make sure you attach all supporting documentation and a draft of your assignment. The request for an extension needs to occur as soon as you become aware that you will have difficulty meeting the due date.

Please note: Unit Chairs can only grant extensions up to two weeks beyond the original due date. If you require more than two weeks or have already been provided with an extension by the Unit Chair and require additional time, you must apply for Special Consideration via StudentConnect within three business days of the due date.

Conditions under which an extension will usually be considered include:
• Medical – to cover medical conditions of a severe nature, e.g. hospitalisation, severe injury or chronic illness.
  Note: temporary minor ailments such as headaches, colds, and minor gastric upsets are not severe medical conditions and are unlikely to be accepted. However, severe cases of these may be considered.
• Compassionate – e.g. death of a close family member, significant family and relationship problems.
• Hardship/Trauma – e.g. sudden loss or gain of employment, severe disruption to domestic arrangements, a victim of crime.

Note: misreading the due date, assignment anxiety, or multiple assignments will not be accepted as grounds for consideration. https://www.deakin.edu.au/students/faculties/buslaw/student-support/assignment-extensions

Information for students seeking an extension AFTER the due date
If the due date has passed, you require more than two weeks' extension, or you have already been provided with an extension and require additional time, you must apply for Special Consideration via StudentConnect. Please be aware that applications are governed by University procedures and must be submitted within three business days of the due date or extension due date.

Please be aware that in most instances, the maximum amount of time that can be granted for an assignment extension is three weeks after the due date, as Unit Chairs are required to have all assignments submitted before results/feedback can be released back to students.

Penalties for late submission
The following marking penalties will apply if you submit an assessment task after the due date without an approved extension:
• 5% will be deducted from available marks for each day, or part thereof, up to five days.
• Work submitted more than five days after the due date will not be marked; you will receive 0% for the task.
Note: 'Day' means calendar day.
The Unit Chair may refuse to accept a late submission where it is unreasonable or impracticable to assess the task after the due date.

Additional information: For advice regarding academic misconduct, special consideration, extensions, and assessment feedback, please refer to the document "Rights and responsibilities as a student" in the "Unit Guide and Information" folder under the "Resources" section in the MIS771 CloudDeakin site.

The assignment uses the dataset file A2T12021.xlsx, which can be downloaded from CloudDeakin. Analysis of the data requires the use of techniques studied in Module-2.

Assurance of Learning
This assignment assesses the following Graduate Learning Outcomes and related Unit Learning Outcomes:

Graduate Learning Outcomes (GLO):
• GLO1: Discipline-specific knowledge and capabilities - appropriate to the level of study related to a discipline or profession.
• GLO2: Communication - using oral, written and interpersonal communication to inform, motivate and effect change.
• GLO5: Problem Solving - creating solutions to authentic (real world and ill-defined) problems.
• GLO6: Self-Management - working and learning independently, and taking responsibility for personal actions.

Unit Learning Outcomes (ULO):
• ULO 1: Apply quantitative reasoning skills to solve complex problems.
• ULO 2: Plan, monitor, and evaluate own learning as a data analyst.
• ULO 3: Deduce clear and unambiguous solutions in a form that is useful for decision making and research purposes and for communication to the wider public.

Feedback before submission
You can seek assistance from the teaching staff to ascertain whether the assignment conforms to submission guidelines.

Feedback after submission
An overall mark, together with feedback, will be released via CloudDeakin, usually within 15 working days.
You are expected to refer and compare your answers to the feedback to understand any areas of improvement.

The Case Study
RogerLake is a leading Australian supermarket chain with 500 stores. Originating from a family-based general store, RogerLake now has stores all over Australia, with the first one being established in 1974. Individual store managers of RogerLake have wide-ranging powers over the day-to-day operations of their stores. However, RogerLake's strategic planning and direction take place in the company Head Office in Adelaide.

RogerLake is anticipating a shift in the business climate within the next five years. The Head Office team is keen to implement the changes introduced during COVID-19 across the supermarket chain. They are confused about the store managers' lack of enthusiasm to open their stores 24x7 or launch an accompanying eStore, given that the Head Office has invested heavily in a digital platform, self-checkout machines and staff.

Subsequently, the Head Office management team has approached ANALYTICS7 and asked them to conduct a study to understand the characteristics of RogerLake stores and their business performance.

The Data
For this study, ANALYTICS7 has collected two sets of data:
1. The first dataset is a random sample of 150 stores extracted from the company's data mart. A complete listing of variables, definitions, and an explanation of their coding are provided in Working Sheet "Variable Description."
2. The second dataset is about quarterly sales of RogerLake stores. The details of the time-series data are available on Working Sheet "Quarterly Sales."

Your Role in ANALYTICS7
You are a modeller at ANALYTICS7. The team leader (Hugo Barra – MBA and MSc in Data Science) has asked you to lead the modelling component for the RogerLake project. You need to review and complete the modelling activities as per the document.
The minutes of the team meeting are below.

Form 210-3
ANALYTICS7 Team Meeting
ANALYTICS7, 727 Collins St, Docklands VIC 3008
Phone: (+61 3 212 66 000)
infor@analytics7.com.au
Reference: AP-210 RogerLake Project
Revised: 24th April 2021
Level: Expert Analysis

Meeting Chair: Hugo Barra
Date: 24 April 2021   Time: 11:00 AM   Location: ANALYTICS7 L4.340
Topic: RogerLake Research Project – Analytics Details
Meeting Purpose: Specifying and Allocating Data Analytics Tasks

Discussion items:
• Modelling Store Sales.
• Modelling the likelihood of a store opening 24x7.
• Modelling the likelihood of a store launching an accompanying eStore.
• Forecasting Quarterly Sales for the upcoming four quarters.
• Producing a technical report.

Detailed Action Items
Who: Modeller
What:
1. Build a regression model to estimate Store Sales.
2. Hugo has performed a separate regression analysis and found that the number of competitors is a significant predictor of Store Sales. He believes that the relationship between Store Sales and the number of competitors should be weaker for those stores that are open 24x7. Model the interaction between the variables to test Hugo's assumption and comment whether there is sufficient evidence to conclude that the interaction term is statistically significant in the model.
3. Build a model to predict the likelihood of a store opening 24x7.
4. Finalise Hugo's model to predict the likelihood of a store launching an eStore.
4.1. Hugo has completed the initial analysis for this task. He has narrowed down the key predictors of the likelihood of a store launching an eStore to "Manager's Age, Experience and Gender". Your task is to continue his work and develop a model to ascertain the "likelihood of a store launching an eStore".
4.2. Hugo is specifically interested in understanding the probability of stores that meet the following criteria to launch an eStore. Those stores with managers:
a) in their mid-thirties;
b) with varying levels of managerial experience (i.e. 2-16 years?);
c) and across both male and female store managers.
He believes that the store manager's age, managerial experience, and gender may influence the decision to launch an eStore. RogerLake wishes to know whether to recruit tech-savvy young store managers for their stores. Accordingly, your job is to visualise the predicted probability of launching an eStore with the attributes described earlier.
5. Develop a time-series model to forecast RogerLake's Sales for the next four quarters.
6. Write a report detailing all aspects of the analysis above (items 1-5). The report should be as

Subhanbasha · Accepted Answer

Technical Report
Introduction:
	The main aim of this analysis is to identify patterns and trends in RogerLake, an Australian supermarket chain of 500 stores that originated as a family-based general store. COVID-19 has changed people's shopping habits substantially, so the Head Office needs to understand these patterns in order to keep its stores profitable.
	We use statistical analysis to address this problem and to recommend further steps to the Head Office. The analysis combines regression analysis, visualization, and forecasting of sales for the upcoming quarters. It can also indicate whether new stores are needed in particular locations, or whether existing stores should open 24x7, which matters because many governments have responded to COVID-19 with lockdowns and restrictions on trading.
	We carry out the analysis step by step, building each model from the variables that are significant in it. This helps us identify the drivers of store performance and supports decisions about new and existing stores. The final part of the analysis forecasts quarterly sales, which gives a view of the stores' future performance so that the Head Office can act on the forecasts.
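The quarterly forecasting step mentioned above can be sketched as a regression of sales on a linear time trend plus dummy variables for quarters 2 to 4. This model choice is an assumption, because the report does not state its forecasting method. The sales figures are hypothetical, not RogerLake's, and the function names are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def quarter_of(t, first_quarter=1):
    """Calendar quarter (1-4) of period t, where t = 1 is the first observation."""
    return (first_quarter - 1 + t - 1) % 4 + 1


def design_row(t, quarter):
    """Intercept, linear trend t, and dummies for quarters 2-4 (quarter 1 is the baseline)."""
    return [1.0, float(t)] + [1.0 if quarter == q else 0.0 for q in (2, 3, 4)]


def fit_trend_seasonal(sales, first_quarter=1):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    rows = [design_row(t, quarter_of(t, first_quarter)) for t in range(1, len(sales) + 1)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, sales)) for i in range(k)]
    return solve(xtx, xty)


def forecast(beta, n_obs, horizon=4, first_quarter=1):
    """Point forecasts for the `horizon` quarters after the last observation."""
    return [
        sum(b * v for b, v in zip(beta, design_row(t, quarter_of(t, first_quarter))))
        for t in range(n_obs + 1, n_obs + horizon + 1)
    ]


# Hypothetical quarterly sales: +2 per quarter trend plus a fixed quarterly pattern.
season = {1: 5.0, 2: -3.0, 3: 0.0, 4: -2.0}
sales = [100 + 2 * t + season[quarter_of(t)] for t in range(1, 13)]
beta = fit_trend_seasonal(sales)
next_four = forecast(beta, len(sales))
print([round(f, 2) for f in next_four])
```

With real data, one option is to hold out the last few quarters, compare the forecasts with them, and only then forecast the upcoming four quarters.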
The analysis proceeds in the following steps:
1. Regression analysis with all independent variables (the default model)
2. Regression analysis with a selected subset of independent variables
3. Visualization of the results
4. Visualization of the predicted probabilities from the final regression model
5. Time series analysis and forecasting
Together, these analyses form the basis of our recommendations to the Head Office.
Analysis:
1. Regression model to estimate Store Sales:
	Using the given sample of 150 stores, we first fit a regression model with Sales as the dependent variable and all other variables as independent variables. The output is as follows.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.928556709
R Square             0.862217562
Adjusted R Square    0.846794155
Standard Error       1.397739139
Observations         150
This output summarises the performance of the model as a whole.
The R Square value of 0.8622 means that the independent variables in the model explain 86.22% of the variation in the dependent variable, Sales, which indicates a good fit. Multiple R (0.9286) is the correlation between the observed and fitted values of Sales and equals the square root of R Square. It is not a probability, and it does not mean that Sales will rise whenever the independent variables rise.
	Adjusted R Square (0.8468) is interpreted like R Square, but it penalises the number of predictors. Adding any variable, even an unrelated one, never lowers R Square, whereas Adjusted R Square rises only if the new variable improves the fit by more than would be expected by chance. It is therefore the better measure for comparing models with different numbers of variables. The standard error of the estimate is 1.3977, in the units of Sales ($m), and represents the typical size of a prediction error.
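The relationships between these measures can be checked directly from the reported figures. Below is a minimal Python sketch using the standard adjusted R Square formula for n observations and k predictors (the intercept is not counted). `implied_k` is an illustrative helper, not part of Excel's output. It recovers k from a reported (R Square, Adjusted R Square) pair, and for the default model above it gives k = 15.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def multiple_r(r2):
    """Multiple R: square root of R^2, i.e. the correlation of observed and fitted values."""
    return math.sqrt(r2)


def implied_k(r2, adj_r2, n):
    """Number of predictors implied by a reported (R^2, adjusted R^2) pair."""
    return round(n - 1 - (1 - r2) * (n - 1) / (1 - adj_r2))


# Figures from the default model's summary output above.
print(round(multiple_r(0.862217562), 6))
print(implied_k(0.862217562, 0.846794155, 150))
print(round(adjusted_r2(0.862217562, 150, 15), 6))
```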
The next part of the output is the ANOVA table. The measures above are calculated from its sums of squares, and we do not discuss it further here.
The final part of the output describes each independent variable and its contribution to the model. From it we can identify which variables help explain the variation in Sales.
	We judge each variable by its p-value. The usual rule of thumb is that a variable with a p-value below 0.05 is significant in the model, and we apply that rule here.
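Applying the 0.05 rule to every variable in a single pass can mislead, because p-values change when a correlated variable leaves the model. A common refinement is backward elimination: drop only the least significant variable, refit, and repeat. The sketch below assumes a hypothetical callback `fit_pvalues` that refits the regression on a list of variable names (for example with Excel's Regression tool or statsmodels `OLS`) and returns their p-values. The p-values in the example are invented for illustration.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    predictors: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Repeatedly refit and drop the single predictor with the largest p-value
    until every remaining predictor has p < alpha."""
    current = list(predictors)
    while current:
        pvals = fit_pvalues(current)  # refit the model on the current predictor set
        worst = max(current, key=lambda name: pvals[name])
        if pvals[worst] < alpha:
            break
        current.remove(worst)
    return current


# Hypothetical p-values: B only becomes significant once the correlated C is dropped.
def fake_fit(cols: List[str]) -> Dict[str, float]:
    base = {"A": 0.001, "B": 0.20, "C": 0.60, "D": 0.03}
    pvals = {c: base[c] for c in cols}
    if "B" in cols and "C" not in cols:
        pvals["B"] = 0.01
    return pvals


print(backward_eliminate(["A", "B", "C", "D"], fake_fit))
```

A single-pass rule would have dropped both B and C. Backward elimination keeps B because it becomes significant once C is removed.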
By this rule, wage, Number of Competitors, Gender Manager and Age Manager are the significant variables, so we refit the model using only these variables.
Refitting the model with only these variables gives the following output.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.846659199
R Square             0.7168318
Adjusted R Square    0.711013275
Standard Error       1.919673658
Observations         150
Here R Square is 0.7168 and Adjusted R Square is 0.7110, both well below the values for the default model.
The main reason is that this model uses only three variables and so leaves out others that help explain Sales. The p-value rule itself was applied correctly. However, the p-values in the full model may be distorted by interaction effects between variables or by multicollinearity, so a variable that appears insignificant there can still be useful.
This suggests that some of the excluded variables do help explain the variation in Sales, so we add them back into the regression.
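Multicollinearity can be checked with the variance inflation factor, VIF_j = 1 / (1 - R_j^2). Here R_j^2 is the R Square from regressing predictor j on all the other predictors. Below is a pure-Python sketch that assumes the predictors are supplied as equal-length lists of numbers. The two short columns in the example are hypothetical, not RogerLake data. A common rule of thumb treats VIF values above about 5 to 10 as a warning sign.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, xs):
    """R^2 from an OLS regression of y on the columns in xs, with an intercept."""
    rows = [[1.0] + [col[i] for col in xs] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2) for each predictor column j."""
    result = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        result.append(1.0 / (1.0 - r_squared(col, others)))
    return result


# Two hypothetical predictor columns (not RogerLake data).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
print([round(v, 3) for v in vif([x1, x2])])
```

Perfectly collinear columns make 1 - R_j^2 zero, so the sketch would raise a division error in that case.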
Here we include the variables wage, Advertising_Expense_...000., Number_of_Staff, Age_of_the_Store..Yrs., Number_Of_Competitors., Hours_Trading., Parking_.Spaces, Membership_Union.. and Open_24X7.
The output of this model is as follows.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.891968804
R Square             0.795608348
Adjusted R Square    0.782468885
Standard Error       1.665517314
Observations         150
This model performs better than the three-variable model, but not as well as the default model that uses all the variables.
The output also shows that Hours_Trading., Parking_.Spaces, Membership_Union.. and Open_24X7 are not significant in the model. However, removing them lowers the fit of the model.
Next, we add another set of variables to the model and examine its performance.
After adding Experience_Manager and eStore to the model above, the output is as follows.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.915194156
R Square             0.837580344
Adjusted R Square    0.824633849
Standard Error       1.495413659
Observations         150
This model fits somewhat better than the previous one: R Square is 0.8376, Adjusted R Square is 0.8246, and the standard error is lower (1.4954 versus 1.6655).
The normal probability plot of the residuals shows that the errors are approximately normally distributed.
Although some of the variables in this model are not significant, we retain it for its better fit, which explains more of the variation in the dependent variable, Sales.
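Because this model's variables are a subset of the default model's, the two can also be compared with a partial F test rather than by Adjusted R Square alone. The nine variables listed earlier plus Experience_Manager and eStore give 11 predictors. With n = 150, the reported R Square and Adjusted R Square values are consistent with 15 predictors in the default model. These predictor counts are inferred, not printed in the output. A large F means the four dropped variables jointly add explanatory power. The p-value comes from the F distribution with (4, 134) degrees of freedom, for example via `scipy.stats.f.sf`.

```python
def partial_f(r2_full, r2_reduced, n, k_full, k_reduced):
    """F statistic for H0: the (k_full - k_reduced) dropped predictors all have zero coefficients."""
    q = k_full - k_reduced
    df_resid = n - k_full - 1
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)


# Default model (15 predictors) versus the 11-predictor model above.
f_stat = partial_f(0.862217562, 0.837580344, n=150, k_full=15, k_reduced=11)
print(round(f_stat, 2))
```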
2. Regression model using two independent variables and their interaction:
Next, we develop the regression model discussed in the team meeting, using Number_Of_Competitors and Open_24X7 as the independent variables. To test whether there is an interaction effect on Sales, we create an interaction variable as the product of these two variables.
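In a spreadsheet the interaction column is simply the row-by-row product of the two variables. Below is a Python sketch of the same step. It assumes each store is a dict keyed by the column names used in this report, with Open_24X7 coded 1 for stores open 24x7 and 0 otherwise. The new column name `Interaction` and the two example stores are hypothetical.

```python
from typing import Dict, List


def add_interaction(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return copies of the rows with an extra column equal to
    Number_Of_Competitors x Open_24X7 (Open_24X7 assumed coded 0/1)."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        new_row["Interaction"] = row["Number_Of_Competitors"] * row["Open_24X7"]
        out.append(new_row)
    return out


# Two hypothetical stores.
stores = [
    {"Number_Of_Competitors": 3, "Open_24X7": 1},
    {"Number_Of_Competitors": 5, "Open_24X7": 0},
]
print([r["Interaction"] for r in add_interaction(stores)])
```

For stores not open 24x7 the interaction column is always 0, so the interaction coefficient only adjusts the competitor effect for 24x7 stores.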
	We then run the regression including the interaction column to test its significance. The regression output is as follows.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.599219472
R Square             0.359063976
Adjusted R Square    0.345894057
Standard Error       2.888101859
Observations         150
This model fits poorly: Adjusted R Square is only 0.3459.
The coefficient table is as follows:

                        Coefficients   Standard Error   t Stat      P-value
Intercept               8.154192       0.853889         9.549475    4.34E-17
Number_Of_Competitors   1.247082       0.317329         3.929934    0.000131
Open_24X7               7.857261       1.036783         7.578502    3.71E-12
Interaction term        -2.71651       0.366981         -7.40232    9.78E-12
The p-values indicate the significance of each term. The interaction term has a p-value of 9.78E-12, far below 0.05, so it is statistically significant in the regression model.
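The Open_24X7 coefficient (7.857) is the estimated difference in Sales between stores that are and are not open 24x7 when the number of competitors is zero. Because the interaction coefficient is negative (-2.717), this gap shrinks by about 2.717 for each additional competitor. For stores with three or more competitors, the model predicts lower Sales for 24x7 stores. Equivalently, the slope on Number_Of_Competitors is 1.247 for stores not open 24x7 but 1.247 - 2.717 = -1.469 for stores open 24x7, so the relationship between Sales and the number of competitors differs markedly between the two groups.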
The residuals again appear approximately normally distributed, which is one of the assumptions of regression.
Therefore, there is sufficient evidence to conclude that the interaction term is statistically significant in the model.
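The fitted equation is Sales = 8.154 + 1.247 x Competitors + 7.857 x Open_24X7 - 2.717 x (Competitors x Open_24X7). The sketch below evaluates it with the reported coefficients, assuming Open_24X7 is coded 1 for stores open 24x7 and 0 otherwise. It prints the slope on Number_Of_Competitors for each group and the number of competitors at which the two groups have equal fitted Sales. The function names are illustrative.

```python
# Coefficients from the interaction model's coefficient table above.
B0 = 8.154192       # Intercept
B_COMP = 1.247082   # Number_Of_Competitors
B_OPEN = 7.857261   # Open_24X7 (assumed coded 1 = open 24x7, 0 = not)
B_INT = -2.71651    # Interaction term (Number_Of_Competitors x Open_24X7)


def predicted_sales(competitors, open_24x7):
    """Fitted Sales from the interaction model."""
    return B0 + B_COMP * competitors + B_OPEN * open_24x7 + B_INT * competitors * open_24x7


def competitor_slope(open_24x7):
    """Change in fitted Sales per additional competitor for the given Open_24X7 value."""
    return B_COMP + B_INT * open_24x7


# Number of competitors at which both groups have the same fitted Sales.
break_even = -B_OPEN / B_INT

print("slope, not 24x7:", round(competitor_slope(0), 3))
print("slope, 24x7:", round(competitor_slope(1), 3))
print("break-even competitors:", round(break_even, 2))
for c in range(6):
    print(c, round(predicted_sales(c, 0), 2), round(predicted_sales(c, 1), 2))
```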
3. Model to predict the likelihood of a store opening 24x7:
Next, we develop a regression model to predict the likelihood of a store opening 24x7, using Open_24X7 as the dependent variable and all other variables as independent variables. The model output is as follows.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.414756081
R Square             0.172022607
Adjusted R Square    0.07933857
Standard Error       0.462108234
Observations         150
This output shows that the model performs very poorly: R Square is 0.1720 and Adjusted R Square is only 0.0793. After adjusting for the number of predictors, the independent variables explain only about 7.9% of the variation in whether a store opens 24x7.
	From the coefficient table we can assess the significance of each variable and remove the non-significant ones.
Gross_Profit ($m), Number_Of_Competitors, Age_Manager and eStore are significant, so we refit the model with only these variables.
The model output is as follows.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.286999302
R Square             0.082368599
Adjusted R Square    0.057054629
Standard Error       0.467667294
Observations         150
This model performs worse than the previous one (Adjusted R Square 0.0571 versus 0.0793).
Next, we add Number_of_Staff, Sales ($m) and Age_Manager to the existing model and examine its performance.
The model output is as follows
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.304552311
R Square             0.09275211
Adjusted R Square    0.061250447
Standard Error       0.466625646
Observations         150
This model (Adjusted R Square 0.0613) is still not as good as the default model. We therefore keep the default model, which uses all other variables as independent variables.
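4. Model for predicting a store launching an eStore:
	Here we use the manager's age, experience and gender as independent variables and eStore as the dependent variable.
Because eStore is binary, with two levels (0 and 1) indicating whether or not the store has an eStore, logistic regression is the natural technique for modelling its likelihood. The output below, however, is a standard least-squares regression summary (it reports R Square and a standard error). It is therefore a linear probability model: its fitted values estimate the probability of an eStore but are not guaranteed to lie between 0 and 1.
	The model output is as follows.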
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.600514715
R Square             0.360617923
Adjusted R Square    0.347479935
Standard Error       0.399112521
Observations         150
Adjusted R Square is 0.3475 and R Square is 0.3606, so the three predictors explain only about 35% of the variation in eStore. R Square measures explained variation and is not the same as classification accuracy.
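If an accuracy figure is wanted, a common choice is the share of stores correctly classified when a predicted probability of 0.5 or more counts as "has an eStore". Below is a minimal sketch. The 0.5 cut-off is an assumption that can be changed, and the outcomes and probabilities are hypothetical, not model output.

```python
from typing import Dict, List


def confusion_counts(actual: List[int], predicted_prob: List[float],
                     cutoff: float = 0.5) -> Dict[str, int]:
    """True/false positive/negative counts when prob >= cutoff is classed as 1."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, p in zip(actual, predicted_prob):
        pred = 1 if p >= cutoff else 0
        if pred == 1 and y == 1:
            counts["TP"] += 1
        elif pred == 1 and y == 0:
            counts["FP"] += 1
        elif pred == 0 and y == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def accuracy(counts: Dict[str, int]) -> float:
    """Share of cases classified correctly."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Hypothetical outcomes (1 = has an eStore) and predicted probabilities.
actual = [1, 0, 1, 1, 0, 0]
probs = [0.8, 0.3, 0.45, 0.9, 0.6, 0.1]
c = confusion_counts(actual, probs)
print(c, round(accuracy(c), 3))
```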
The coefficient table is as follows:

                      Coefficients    Standard Error   t Stat         P-value
Intercept             0.519507258     0.193683553      2.682247671    0.008156005
Age_Manager           -0.018718302    0.004577348      -4.089333082   7.12001E-05
Experience_Manager    0.057377217     0.008610923      6.663306419    5.10648E-10
Gender_Manager        0.166663835     0.068771589      2.423440228    0.01659799
This table gives each independent variable's coefficient and its significance. All p-values are below 0.05, so all three independent variables are significant and useful for predicting the likelihood of a store having an eStore.
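Task 4.2 asks for the predicted probability of launching an eStore for managers aged 35 (mid-thirties) with 2 to 16 years of experience, for both genders. The sketch below tabulates fitted values from the coefficients reported above. These are the values a line chart would plot, with experience on the x-axis and one line per gender. The Variable Description sheet defines which 0/1 Gender_Manager code means male and which means female, and that mapping is not assumed here. Because the model is linear, fitted values are not constrained to [0, 1]: at 2 years of experience and Gender_Manager = 0 the fitted value is about -0.021.

```python
# Coefficients from the eStore coefficient table above (linear model).
INTERCEPT = 0.519507258
B_AGE = -0.018718302
B_EXPERIENCE = 0.057377217
B_GENDER = 0.166663835


def predicted_estore(age, experience, gender):
    """Fitted value of the linear model; gender is the 0/1 Gender_Manager code."""
    return INTERCEPT + B_AGE * age + B_EXPERIENCE * experience + B_GENDER * gender


AGE = 35  # "mid-thirties", as specified in task 4.2
print("Experience  Gender=0  Gender=1")
for exp_years in range(2, 17, 2):
    row = [predicted_estore(AGE, exp_years, g) for g in (0, 1)]
    print(f"{exp_years:>10}  {row[0]:8.3f}  {row[1]:8.3f}")
```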
The following plots show the effect of each variable on the dependent variable, eStore.
The first plot shows the relationship between the manager's age and eStore. Most stores whose managers are older than 30 do not have an eStore, whereas some stores with younger managers do.
The next plot shows the relationship between the manager's gender and eStore: stores with male managers are more likely to have an eStore than stores with female managers.
The last plot shows the relationship between the manager's experience and eStore: stores whose managers have more experience are more likely to have an eStore.
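Because eStore is binary, a logistic regression keeps predicted probabilities strictly between 0 and 1. The sketch below fits one by Newton-Raphson maximum likelihood in plain Python. In practice a library routine such as statsmodels' `Logit` would normally be used instead. The data are hypothetical (a single predictor), not RogerLake data, and the function names are illustrative.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    X: predictor rows without an intercept column; y: 0/1 outcomes.
    Returns [intercept, coef_1, ..., coef_p]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]
        # Score X'(y - p) and information X'WX with W = diag(p(1 - p)).
        score = [sum(r[i] * (yi - pi) for r, yi, pi in zip(rows, y, p)) for i in range(k)]
        info = [[sum(r[i] * r[j] * pi * (1 - pi) for r, pi in zip(rows, p)) for j in range(k)]
                for i in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def predict_proba(beta, x):
    """Predicted probability of outcome 1 for one predictor row x."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


# Hypothetical data: one predictor (e.g. years of experience) and a 0/1 outcome.
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_logistic(X, y)
print([round(b, 3) for b in beta])
print([round(predict_proba(beta, x), 3) for x in X])
```

With the three eStore predictors, each row of `X` would hold Age_Manager, Experience_Manager and Gender_Manager. The task 4.2 table would then be built from `predict_proba` instead of the linear fitted values.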
