Learning objectives:


The key frameworks and concepts covered in modules 1–6 are particularly relevant for this assignment. Assignment 2 relates to the course learning objectives 1, 2 and 4:
1. demonstrate applied knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehouse design, data mining process, data visualisation and performance management) and resulting organisational change and how these apply to implementation of business intelligence in organisation systems and business processes
2. identify and solve complex organisational problems creatively and practically through the use of business intelligence and critically reflect on how evidence based decision making and sustainable business performance management can effectively address real world problems
4. demonstrate the ability to communicate effectively in a clear and concise manner in written report style for senior management with correct and appropriate acknowledgment of main ideas presented and discussed.
Assignment 2 consists of three main tasks and a number of sub tasks.


Task 1 Data Management Architectures (Worth 30 Marks)
For Task 1 you are required to conduct a critical literature review of the related concepts of data
warehouses, data lakes and data marts in order to complete two sub tasks.
Task 1.1) Define and discuss the concepts of (1) a data warehouse, (2) a data lake and (3) a data
mart, drawing on relevant and reputable literature (about 500 words).
Task 1.2) Explain how a data warehouse, a data lake and a data mart would each be used in an
organisation, and provide some real-world examples of how each would be used (about 500 words).
Task 2 Exploratory Data Analysis and Linear Regression Analysis (Worth 40 Marks)
For Task 2, consider a set of observations on a large number of white wine varieties, covering
their chemical properties and their ranking by wine tasters, contained in the white-wines.csv data
set. The wine industry has been growing steadily as social drinking of wine is on the rise. The
price of a wine largely depends on its appreciation by wine tasters, which may vary considerably.
Another key factor in wine certification and quality assessment is physicochemical testing, which
is laboratory-based and takes into account factors like acidity, pH level, presence of sugar and
other chemical properties.
For wine producers, it would be of interest if wine tasters’ perception of wine quality after tasting
could be related to the chemical properties of wine, so that the certification, quality assessment
and assurance process for wines becomes more rigorous.
The white-wines.csv data set consists of 4898 white wine varieties in total (records). All wines are
from one wine-producing region. The data set records 12 different properties of each wine. Quality
is based on sensory data (wine tasters’ perception of the quality of a wine); the rest are chemical
properties of wines, including density, acidity, alcohol content etc. All chemical properties of
wines are coded as continuous numeric variables. Quality is an ordinal variable with a possible
ranking from 1 (worst) to 10 (best). Each white wine variety is tasted by three independent tasters
and the final rank assigned is the median of the ranks given by the tasters.
See Table 1 White Wines Data Set Data Dictionary for full details of the white-wines.csv data set.
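As a small illustration of the ranking rule described above, the sketch below takes the median of
three tasters' ranks for one wine. The function name and the example ranks are hypothetical and are
not part of the data set; the point is only that the median limits the influence of a single
unusually high or low taster.

```python
from statistics import median


def final_rank(taster_ranks):
    """Return the median of the tasters' ranks (1 = worst, 10 = best)."""
    if any(not 1 <= rank <= 10 for rank in taster_ranks):
        raise ValueError("ranks must lie between 1 and 10")
    return median(taster_ranks)


# Hypothetical ranks from three independent tasters for one wine.
print(final_rank([5, 7, 6]))  # prints 6
# One taster far from the others does not move the final rank.
print(final_rank([4, 4, 9]))  # prints 4
```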
Discuss the key results of your exploratory data analysis presented in Table 2.1 and provide a
rationale for why you have selected your top five variables for predicting a wine taster’s ranking
of a white wine, drawing on the results of your EDA and relevant literature (about 500 words).
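One possible way to support the variable selection outside RapidMiner is to rank candidate
predictors by the absolute value of their correlation with quality. The sketch below is only an
illustration under stated assumptions: the column names and the six toy rows are made up (they are
not values from white-wines.csv), and Pearson correlation is used as a rough screen even though
quality is ordinal, so a rank-based measure may be more appropriate in the actual report.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def rank_predictors(columns, target, top_n=5):
    """Return the top_n (name, r) pairs ordered by |r| with the target column."""
    target_values = columns[target]
    scores = [
        (name, pearson(values, target_values))
        for name, values in columns.items()
        if name != target
    ]
    scores.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return scores[:top_n]


# Made-up toy values with hypothetical column names, for illustration only.
toy = {
    "alcohol": [8.8, 9.5, 10.1, 11.0, 12.4, 12.8],
    "density": [1.0010, 0.9980, 0.9956, 0.9940, 0.9912, 0.9900],
    "pH": [3.0, 3.3, 3.1, 3.2, 3.0, 3.3],
    "quality": [5, 5, 6, 6, 7, 8],
}

for name, r in rank_predictors(toy, "quality", top_n=2):
    print(f"{name}: r = {r:+.2f}")
```

A strong negative correlation is as useful for prediction as a strong positive one, which is why the
ranking uses the absolute value of r.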
Task 2.2) Build a Linear Regression model for predicting the quality ranking of a white wine using
a RapidMiner data mining process, an appropriate set of data mining operators and a reduced
set of variables from the white-wines.csv data set determined by your exploratory data analysis in
Task 2.1. Provide these outputs from RapidMiner: (1) Final Linear Regression Model process and
(2) Summary Table of Results of Final Linear Regression Model for Task 2.2 for the white-wines.csv
data set.
Briefly describe your final Linear Regression Model process, and discuss the results of the
Final Linear Regression Model for the white-wines.csv data set, drawing on the key outputs
(coefficients, standardised coefficients, t-statistic values, p-values, significance levels etc.) for
predicting Wine Quality and relevant supporting literature on the interpretation of a Linear
Regression Model (about 500 words).
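To make the outputs listed above more concrete, the following sketch computes a coefficient, its
standard error, the t-statistic and the standardised coefficient for a regression with a single
predictor. It is a simplified illustration and not the RapidMiner process required by the task: the
function name and the toy values are assumptions, the required model uses several predictors, and
the p-value is omitted because it needs the t distribution (with n - 2 degrees of freedom here),
which the Python standard library does not provide.

```python
from math import sqrt


def simple_ols(xs, ys):
    """Fit y = intercept + coefficient * x by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    coefficient = sxy / sxx
    intercept = mean_y - coefficient * mean_x

    # Residual sum of squares and the standard error of the slope.
    sse = sum((y - (intercept + coefficient * x)) ** 2 for x, y in zip(xs, ys))
    std_error = sqrt(sse / (n - 2) / sxx)

    return {
        "intercept": intercept,
        "coefficient": coefficient,
        # Slope rescaled by sd(x) / sd(y); comparable across predictors.
        "std_coefficient": coefficient * sqrt(sxx / syy),
        "std_error": std_error,
        # Coefficient divided by its standard error.
        "t_stat": coefficient / std_error,
    }


# Made-up toy values (hypothetical alcohol content vs. quality rank).
alcohol = [8.8, 9.5, 10.1, 11.0, 12.4, 12.8]
quality = [5, 5, 6, 6, 7, 8]

for key, value in simple_ols(alcohol, quality).items():
    print(f"{key}: {value:.3f}")
```

A large absolute t-statistic relative to the standard error indicates that the coefficient is
unlikely to be zero, and the standardised coefficient allows predictors measured in different units
to be compared on one scale.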
Include all appropriate RapidMiner outputs such as RapidMiner Processes, Graphs and Tables that
support the key aspects of your exploratory data analysis and linear regression model analysis of
the white-wines.csv data set in your Assignment 2 report. Note that you need to export these
outputs from RapidMiner using the File/Print/Export Image option and include them in Task 2 where
relevant or in Appendix A of the Assignment 2 report.





Oct 07, 2019