MIS772 Predictive Analytics (2019 T2), Assignment A2 / Workshops M1-M3: RM

Part LP3 Exec: Briefly define a problem in business terms. Rels: Perform cluster analysis of wines’ text. Conduct segmentation analysis, including both text and structured data. Identify relationships in data. Visualise and interpret results. Answer the management question (A).


Objectives
This assignment covers all workshops in modules M1-M3. By completing the workshops and assignment, students will understand how to use RapidMiner (RM) to explore data, gain insights into the problem domain, create and validate estimation and clustering models, and perform segmentation analysis and text mining.

Methods
The workshop will rely on students’ knowledge of methods and techniques introduced in a series of classes. The assignment will have two deliverables in the form of learning portfolios LP3 and LP4. During the workshop (on-campus and on-cloud) students will work in teams but submit individual reports based on their tasks as related to the data set. The work is expected to use RapidMiner Studio. Demonstrations and lab exercises will assist skill development.

Prerequisites
Before attending RM workshops, students are required to become familiar with class notes and all textbook readings (see the topic schedule with chapter references).

Workshop Schedule
Activities, with the workshop topic in square brackets. No late arrivals for the on-campus sessions!
1. Learn to use RapidMiner Studio. [Preparation]
2. The workshop facilitator will explain the case in the focus of this assignment. Work in groups of up to 4 (groups of 1, 2 or 3 are also acceptable). Start by formulating a business problem (it may change later). [M1, M2T1: Classification, Cross-Validation, Optimisation, Data Prep]
3. Revise classification models (such as k-NN and decision trees), cross-validation, clustering and simple model optimisation. Learn about the problem area and the assignment data. Download your data as a CSV (or JSON if brave) file and explore it. Select attribute types, nominate them as labels and predictors. Do not modify these ‘raw’ data files outside of the RM environment.
4. Learn to parse and represent text data, reduce data dimensionality, perform segmentation analysis, create and evaluate predictive models with attributes derived from text, and visualise results. [M2T2: Text Mining & Sentiment]
5. Use RM to clean and transform data, deal with missing values, produce simple statistics and charts, and build estimation models using multiple regression and neural networks. Learn how to create model ensembles, such as random forests, boosting, stacking and bootstrapping ensembles. [M2T3 & M2T4: Estimation, Neural Nets, Ensembles]
6. Study the techniques associated with the deployment of analytic processes. Extend your work on neural networks with deep learning systems. [M3T1 & M3T2: Deployment, Deep Learning]
7. As a team member, prepare an individual report using the provided template. The report should be in PDF format. Also, include all RM processes in RMP format. If you have altered the data, attach the modified data to your submission. [Report and Executive Summary]
8. By the specified deadline, individually submit two components of your learning portfolio, i.e. the LP3 and later the LP4 parts of the assignment, via the CloudDeakin dropbox. With each submission, include your report in PDF, formatted using the provided template, plus a ZIP archive of all models, i.e. your RapidMiner scripts (.RMP files). Do not use other file formats! [Submission / Learning Portfolio]

Mini Case Study
This mini case study will be used in all workshops of module 1, i.e. M1T1-M1T4. All amendments, extensions and assumptions should be recorded in the final submission.

Australian Wine Importers (AWI) asked you to develop a method of estimating the rating (points) of imported wines based on their text and structured attributes. AWI provided you with a sample of 130,000 wine tasting results, which include:
- Wine “title” (name + vintage);
- Country, Province and Region;
- Variety and Winery;
- Description and Designation;
- Price (US$).
However:
- Taster name and Points are to be excluded.

In the future, AWI would like to get a preliminary insight into wine quality based on social media reviews. The following questions are of interest to AWI:
A) What group of wines is the new wine most similar to, and why / how?
B) What is the estimated rating of the wine newly introduced to the Australian market? (Fractional ratings permitted.)

AWI wants you to clean up and explore the wine tasting data, develop and evaluate a wine rating estimator, and minimize the estimation error in the process.

In technical terms: Your project objectives form a learning portfolio. The first objective (LP3) is to acquire and explore the available data using clustering and segmentation analysis, visualise and report relationships in text and structured data, and prepare data for further processing. The second objective (LP4) is to create an estimation system able to answer management questions using all available data. Text mining will be strongly featured in assignment A2. Reports in PDF format and models developed in LP3 and LP4 in ZIP archives are to be submitted via CloudDeakin by their respective deadlines.

Data: http://www.deakin.edu.au/~jlcybuls/pred/data/Wine-Reviews.zip
Original data source: https://www.kaggle.com/zynicide/wine-reviews

Hints on the process:
Formulate a business problem using plain English statements, but cross-reference them with the technical aspects described in the subsequent sections. When describing the problem and its solution, keep in mind what can be achieved by using the available data. Note that what you have been asked for and what can be delivered are two different things. For example, to solve the problem you may need to narrow or slightly change the problem scope, or the model may provide quality answers only within a specific range of data characteristics. If so, this is what you need to report or recommend to AWI management.

Explore your text and non-text attributes in terms of their clustering and segmentation. Use appropriate visualisations, then analyse and interpret them. As the report template provides very limited space, be selective about what you include in the report. Each chart and table must have a purpose and a description to advance your argument; use them as evidence!

Depending on the model, some attributes may need to be transformed before using them in modelling tasks. You may also have to deal with incorrect or missing values. Look at your modelling options, optimise their parameters and compare evaluation results.

Check the assessment criteria on the next page to see how you are going to be assessed. Stick to the recommended process. Complete the basics first before moving to the more advanced tasks or any extensions and research tasks.

Assignment Submission
You will submit your work in two learning portfolio parts, LP3 and LP4. Each part needs to be lodged via the CloudDeakin dropbox before the deadline. You will be allowed to submit your work once only!

It is essential that your reports use the LP3 and LP4 templates. Follow the instructions embedded in the templates! Both reports must fit into a strict page limit imposed by the template. Only pages within the template limit will be reviewed and assessed! Make sure that the problem statement and the executive summary are aimed at non-technical readers, while the remaining parts of the reports aim at a data / business analyst (and not highly technical programmers).

Your submission must include the report in PDF format and a ZIP archive of .RMP script files (these can be found in the RM project folder; simply ZIP these files). Submissions not in PDF and ZIP format will not be opened or assessed!

There is a strict deadline for each submission. In cases of documented illness, special consideration may be granted but must be applied for well ahead of the deadline. In general, requests for special consideration received less than three days before the deadline will not be considered! An automatic late penalty of 5% of the available marks per day (up to 5 days) will be applied to all late assignment submissions. Late penalties apply immediately past the deadline, even by 1 second!

Both parts LP3 and LP4 will be marked together after part LP4 is submitted. Feedback will be provided on both parts together.

Team work and collaboration are encouraged but plagiarism will be penalised. Team members can share ideas and help each other in solving technical problems. Seek your team’s feedback on all aspects of your assignment, especially before its submission. However, your assignment needs to be completed individually. Ensure that your assignment is unique, otherwise plagiarism will be assumed!

The work will be assessed based on the following criteria. Use RapidMiner for both assignment tasks LP3 and LP4. Other tools can be used for the tasks associated with the research section only. Do not start the advanced tasks before meeting the expectations first (or no points will be given). Use the submission template for both LP3 and LP4.
LP3
Grade ranges: Exceptional 80–90–100%; Meets Expectations 50–65–79%; Unacceptable 0–25–49%.

Problem (5 to 0; one page limit): Identify what decisions need to be drawn and what actions need to be supported. Succinctly state a business problem (or question) and specify what insights need to be generated from data. Unacceptable: Not provided or incomprehensible.

Data Prep (25 to 0; one page limit): Deal with errors and missing values. Reduce data dimensionality. Provide comprehensive analysis, tabulate your results. Answer the management question (A). Parse text attributes. Then, conduct clustering and segmentation analysis of both structured and text data. In the process, identify relationships in data. Visualise and interpret the obtained results. Annotate all charts (with text and arrows) to highlight important insights. Unacceptable: Not meeting expectations. Missing RM process files. Over the page limit.

Include: Report (use template, in PDF) and RMP files (in ZIP), with an explanation of how to reproduce all results.

LP4
Grade ranges: Exceptional 80–90–100%; Meets Expectations 50–65–79%; Unacceptable 0–25–49%.

Exec Report (5 to 0; one page limit): Narrow down the business problem. Identify decisions and actions that will be supported by the analytic solution. Include a list of used academic refs. Restate /
Answered Same Day | Apr 15, 2021 | MIS772 | Deakin University

Answer To: MIS772 Assignment A1 MIS772 Predictive Analytics (2019 T2) Assignment A2 / Workshops M1-M2-M3...

Abr Writing answered on Apr 20 2021
Load data.
Change role to 'regular' for all columns.
Define the target column for the predictive model.
Should define a target column?
Discretize by binning (same range per bin).
Discretize by frequency (same count per bin).
Should discretize numerical target column?
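The two discretization options named above differ in how bin boundaries are chosen. The following is a minimal plain-Python sketch of the two ideas, not the RapidMiner operators themselves; the function names and the sample ratings are made up for illustration.

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Binning by range: every bin covers an interval of the same width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values: List[float], n_bins: int) -> List[int]:
    """Binning by frequency: every bin receives (nearly) the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        # Simplification: ties are split by rank; real tools may keep
        # equal values together in one bin.
        bins[idx] = rank * n_bins // len(values)
    return bins


points = [80, 82, 84, 85, 86, 87, 88, 100]  # hypothetical ratings
print(equal_width_bins(points, 4))      # [0, 0, 0, 1, 1, 1, 1, 3]
print(equal_frequency_bins(points, 4))  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With a skewed target, equal-width bins can end up nearly empty (bin 2 above holds nothing), while equal-frequency bins stay balanced at the cost of uneven interval widths.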

Map some nominal target values to new values.
Should map nominal values?
Make sure that target is binary for positive class mapping.
Potentially define which one should be the positive class.
Should define positive class?
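As a rough illustration of the mapping and positive-class steps above, the sketch below remaps nominal target values and then encodes a chosen positive class, refusing to proceed unless the target is binary. It is plain Python under assumed names (`map_nominal`, `encode_positive`) and made-up labels, not the RapidMiner process itself.

```python
from typing import Dict, List


def map_nominal(values: List[str], mapping: Dict[str, str]) -> List[str]:
    """Replace the listed values; leave every other value unchanged."""
    return [mapping.get(v, v) for v in values]


def encode_positive(values: List[str], positive: str) -> List[int]:
    """Return 1 for the positive class and 0 otherwise (binary targets only)."""
    classes = set(values)
    if len(classes) != 2:
        raise ValueError(f"target must be binary, found {len(classes)} classes")
    if positive not in classes:
        raise ValueError(f"positive class {positive!r} not present in target")
    return [1 if v == positive else 0 for v in values]


labels = ["good", "great", "poor", "good"]       # hypothetical target values
merged = map_nominal(labels, {"great": "good"})  # collapse three classes to two
print(encode_positive(merged, positive="good"))  # [1, 1, 0, 1]
```

Checking that the target is binary before choosing a positive class avoids silently treating several classes as "negative".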

Potentially remove columns.
Should remove columns?
No date processing is desired here, so simply remove the date columns completely.
Check if there actually are any date columns in the data.
Adds an additional column with the date today. This can be useful for calculations of ages etc.
Select the other way around here and store in the macro if that column already exists.
Store if the other way round exists.
Generate the difference for the two date columns in milliseconds.
Both date columns are the same or the other way round already was created - do nothing here!
Only calculate the differences between the two date columns if the columns are not equal and if the other way around has not been calculated yet.
Loop over all combinations of date attributes and calculate their differences (which includes the new today column generated previously).
Remove the generated today column again.
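The date-handling branch described above (add a temporary "today" column, take the difference of every pair of date columns once, in milliseconds, then drop "today") can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the RapidMiner process: the function name, the output column naming and the `bottled` / `reviewed` columns are hypothetical, and the wine data set itself lists no date columns.

```python
from datetime import datetime, timedelta
from itertools import combinations
from typing import Dict, List


def add_date_differences(
    rows: List[Dict[str, datetime]], date_cols: List[str], today: datetime
) -> List[Dict[str, object]]:
    """Append pairwise date differences (in milliseconds) to each row."""
    result = []
    for row in rows:
        work = dict(row)
        work["today"] = today  # temporary helper column
        new_row: Dict[str, object] = dict(row)
        # combinations() yields each unordered pair exactly once, so a column
        # is never paired with itself and the reversed pair is never repeated.
        for a, b in combinations(date_cols + ["today"], 2):
            delta = work[b] - work[a]
            new_row[f"diff({a},{b})"] = delta // timedelta(milliseconds=1)
        result.append(new_row)  # "today" is not copied into the output
    return result


rows = [{"bottled": datetime(2019, 1, 1), "reviewed": datetime(2019, 1, 2)}]
out = add_date_differences(rows, ["bottled", "reviewed"], today=datetime(2019, 1, 3))
print(out[0]["diff(bottled,reviewed)"])  # 86400000
```

Using unordered pairs plays the role of the "other way round already exists" check: each difference is produced once, and its sign follows the column order.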

...