Week 4 Lab 4 Session 4 Lab: Decision Tree 1. Open a browser and enter the following URL: https://tech.knime.org/getting-started 2. Read through the step-by-step process of building a workflow using...

Using the two attached instruction sets, I need two Excel experiments, each with a screenshot of every step, a 2-3 line analysis, and an explanation of all the steps carried out. The data sets are to be taken from kettle.com. I can provide the example data sets used by the lecturer to give an idea of what they should look like, as I was advised to use data sets similar to those used in class.


Week 4 Lab 4
Session 4 Lab: Decision Tree

1. Open a browser and enter the following URL: https://tech.knime.org/getting-started
2. Read through the step-by-step process of building a workflow using KNIME.
3. Open KNIME and choose your workspace.
4. From the node repository, select the appropriate node for reading the data file.
5. Then perform any kind of preprocessing covered in the previous lecture with the help of the Statistics node.
6. Create two separate partitions from the original data set using the Partitioning node.
7. Train a model (decision tree) and apply the trained model.
8. Collect the results and prepare a report that describes the overall process and discusses the quality of your model for this classification task.
9. Submit the report by clicking the weekly submission link named “A2 (Session 4 Part 2 Lab Submission 1%)” under the Assessments folder.
10. Please note: you will get a warning while uploading files with *.knwf or *.zip extensions; please ignore that warning.

A Sample Workflow for Classification task

A4 Data Mining and BI report
Data Mining & Business Intelligence
Assessment Type: Group Assignment
Assessment Number: A4
Assessment Name: Data Mining & BI Report
Student Id and Name:
Muna Lama (200376)
Adeel Ahmed (190771)
Jenson Pathak (190437)
Kishwor Bohora
Date: 06/10/2021

Introduction:

Data Preparation and Extraction Feature:
For the Topic Detection Analysis, data from Twitter was selected, exported, and converted into an Excel file using a website.
The link to the source data: <https://twitter.com/rottentomatoes/status/1443574493813755916?s="20">
The selected data consists of reviews of the movie ‘Venom’. It has 13 columns and 97 rows; a few columns that were not important for the analysis were deleted. Column B contains the text tweeted by the users.
Figure: Output of cleaned data.

Experiment 2: Topic Detection Analysis
Topic detection extracts information from unstructured data to define a number of topics.
We used the KNIME tool to run the experiment on our cleaned data, which consists of reviews of a movie. We followed these steps:

Step 1: Create a new workflow in KNIME and configure the data in the Excel file reader, which performs the reading operation.
Step 2: The “Column Rename” node allows us to rename columns or change their type.
Step 3: The “Strings to Documents” node is added and configured as shown in the figure below. This node converts the specified strings to documents.
Figure: Strings to Documents
Step 4: In the “Column Filter” node, we included only the document column.
Figure: Column Filter
Step 5: The “Punctuation Erasure” node is selected and configured as shown in the figure below.
Figure: Punctuation Erasure
Step 6: The “Number Filter” node is selected to filter out all terms that consist of digits.
Figure: Number Filter
Step 7: The “N Chars Filter” node is selected with N set to 3, which filters out all terms with fewer than N characters.
Figure: N Chars Filter
Step 8: The “Stop Word Filter” node is used at this stage to remove stop words such as “the” and “is”. At least one stop word must be selected.
Figure: Output of Stop Word Filter
Step 9: The “Case Converter” node is used to convert upper-case letters to lower-case letters.
Figure: Output of Case Converter
Step 10: The “Topic Extractor (Parallel LDA)” node is selected to extract the topics and configured as shown in the figure below.
Figure: Topic Extractor (Parallel LDA)
Step 11: In the “GroupBy” node, under manual aggregation, we included the Term column, selected Concatenate as the aggregation, and configured the node as below. This node groups the words by topic.
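For readers who want to see what the cleaning nodes in Steps 5 to 9 do to a single tweet, the following is a minimal Python sketch of the same sequence. It is not KNIME's implementation: the function name `preprocess`, the tiny `STOP_WORDS` set, and the sample sentence are all assumptions made for illustration, and KNIME's own nodes may treat edge cases (decimal numbers, hyphenated words) differently.

```python
import string

# Hypothetical, deliberately tiny stop word list (an assumption for this sketch);
# KNIME provides its own built-in lists.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "it", "was"}


def preprocess(text, min_chars=3):
    """Roughly mimic Steps 5-9: punctuation erasure, number filter,
    N chars filter, stop word filter, and case conversion."""
    # Step 5: erase punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    terms = text.split()
    # Step 6: drop terms that consist only of digits.
    terms = [t for t in terms if not t.isdigit()]
    # Step 7: drop terms with fewer than min_chars characters.
    terms = [t for t in terms if len(t) >= min_chars]
    # Step 8: drop stop words (compared case-insensitively).
    terms = [t for t in terms if t.lower() not in STOP_WORDS]
    # Step 9: convert the remaining terms to lower case.
    return [t.lower() for t in terms]


# Made-up example sentence, not taken from the collected tweets.
print(preprocess("Venom is the BEST movie of 2021, 10/10!!"))
```

The order matters: erasing punctuation first turns "10/10!!" into "1010", so the number filter can then remove it. The resulting term lists are what a topic model such as LDA would take as input.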
Figure: GroupBy

"product","country","date","quantity","amount","card","Cust_ID"
"prod_4","unknown","2008-12-12",1,3,,"Cust_8"
"prod_3","China","2009-04-10",2,160,"N","Cust_2"
"prod_3","China","2009-04-10",2,160,"Y","Cust_5"
"prod_3","China","2009-05-10",2,160,,"Cust_2"
"prod_3","USA","2009-05-20",20,1600,,"Cust_3"
"prod_3","Brazil","2009-06-08",15,1200,,"Cust_7"
"prod_1","USA","2009-07-04",2,70,"Y","Cust_3"
"prod_1","USA","2009-07-14",2,70,,"Cust_6"
"prod_3","USA","2009-08-20",20,1600,,"Cust_3"
"prod_2","Germany","2009-11-02",15,600,,"Cust_1"
"prod_2","Germany","2009-11-22",15,600,"N","Cust_1"
"prod_1","Germany","2009-12-02",1,35,"Y","Cust_1"
"prod_1","China","2009-12-12",1,35,"Y","Cust_2"
"prod_3","USA","2010-01-03",20,1600,,"Cust_3"
"prod_1","Germany","2010-01-10",1,35,"N","Cust_1"
"prod_3","Germany","2010-01-13",1,80,,"Cust_4"
"prod_2","Germany","2010-01-15",25,1000,,"Cust_1"
"prod_2","USA","2010-01-20",2,80,,"Cust_6"
"prod_2","USA","2010-02-12",6,240,"Y","Cust_6"
"prod_2","USA","2010-02-22",6,240,,"Cust_6"
"prod_2","Brazil","2010-03-11",6,240,"N","Cust_7"
"prod_3","China","2010-03-12",1,80,,"Cust_5"
"prod_3","Germany","2010-03-14",2,160,,"Cust_9"
"prod_3","USA","2010-03-17",1,80,"Y","Cust_3"
"prod_2","Germany","2010-03-31",5,200,"Y","Cust_4"
"prod_2","USA","2010-04-22",10,400,"Y","Cust_3"
"prod_3","China","2010-05-12",2,160,"N","Cust_2"
"prod_1","USA","2010-05-17",5,175,"Y","Cust_6"
"prod_2","Germany","2010-06-22",6,240,,"Cust_1"
"prod_1","China","2010-06-28",10,350,"Y","Cust_5"
"prod_2","USA","2010-07-07",12,480,,"Cust_3"
"prod_1","Brazil","2010-07-17",5,175,,"Cust_7"
"prod_1","China","2010-08-28",10,350,"N","Cust_2"
"prod_2","Germany","2010-08-31",5,200,,"Cust_1"
"prod_3","Germany","2010-09-14",2,160,,"Cust_1"
"prod_1","China","2010-10-01",2,70,,"Cust_5"
"prod_1","USA","2010-10-11",2,70,"Y","Cust_6"
"prod_2","USA","2010-12-07",15,600,"N","Cust_6"
"prod_3","China","2011-01-02",8,640,,"Cust_2"
"prod_1","USA","2011-01-10",10,350,"Y","Cust_3"
"prod_2","Germany","2011-02-01",1,40,"Y","Cust_1"
"prod_3","Brazil","2011-02-02",8,640,"Y","Cust_7"
"prod_2","Germany","2011-02-11",1,40,,"Cust_4"
"prod_1","Germany","2011-03-06",10,350,,"Cust_4"
"prod_1","Germany","2011-03-18",1,35,"Y","Cust_4"
"prod_1","Germany","2011-03-20",11,385,"N","Cust_4"
"prod_1","Brazil","2011-04-06",1,35,,"Cust_7"
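The lab asks for two partitions to be created with the Partitioning node before training a decision tree. Assuming the sample rows above are meant as input to such a workflow, a random split might look roughly like the Python sketch below. The 80/20 ratio, the fixed seed, the function name `partition`, and the choice of only the first ten data rows are assumptions for illustration; this is not how the KNIME node is implemented.

```python
import csv
import io
import random

# First ten data rows copied from the sample above; in practice, read the full file.
SAMPLE = '''"product","country","date","quantity","amount","card","Cust_ID"
"prod_4","unknown","2008-12-12",1,3,,"Cust_8"
"prod_3","China","2009-04-10",2,160,"N","Cust_2"
"prod_3","China","2009-04-10",2,160,"Y","Cust_5"
"prod_3","China","2009-05-10",2,160,,"Cust_2"
"prod_3","USA","2009-05-20",20,1600,,"Cust_3"
"prod_3","Brazil","2009-06-08",15,1200,,"Cust_7"
"prod_1","USA","2009-07-04",2,70,"Y","Cust_3"
"prod_1","USA","2009-07-14",2,70,,"Cust_6"
"prod_3","USA","2009-08-20",20,1600,,"Cust_3"
"prod_2","Germany","2009-11-02",15,600,,"Cust_1"
'''


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# csv.DictReader returns every value as a string; an empty "card" field becomes "".
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
train, test = partition(rows)
print(len(train), "training rows,", len(test), "test rows")
```

Note that the "card" column is empty in many rows, so missing values would need to be handled in the preprocessing step before a classifier is trained on it.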
Answered 2 days after Oct 07, 2021

Mohd answered on Oct 08 2021