Assessment criteria Demonstrates ability to perform a detailed exploration, investigation and visualisation of a given real-world dataset using techniques learned in Weeks 2 and 3. Demonstrates...


Assessment criteria



  1. Demonstrates ability to perform a detailed exploration, investigation and visualisation of a given real-world dataset using techniques learned in Weeks 2 and 3.

  2. Demonstrates ability to build a Decision Tree classifier with the dataset for predicting class labels on the dataset.

  3. Demonstrates ability to perform experiments to evaluate the performance of the models built.

  4. Presents a well-structured report that addresses the different components identified in the instruction section. The report is error-free (spelling, grammar and language). Where appropriate, the UniSA Harvard referencing convention is used to cite information taken from external sources.

  5. Presents the two-slide PPT presentation summarising the findings.


Assignment Requirement


In this assignment, you act as a professional writing a report for your client. The report covers a decision tree analysis of a dataset from your client. It must have a cover page of your own design, a table of contents, and an executive summary, in addition to the sections required below. The assignment is quite open, and its quality depends on what you do, how you do it, how you interpret the results, and whether the results are convincing.


You are asked to conduct a classification analysis of the given dataset, found in the Project 1 Resources folder together with its description. The aim of the analysis is to develop models that show how the class variable, a person's income level (i.e., high or low), is determined by the input variables. That is, given some input data about a person, the model predicts whether the person earns a high or low income. You are asked to write a report about this analysis. Your report should cover the following areas:


Part 1: Present the initial handling and inspection of the dataset and discussion on how the analysis may be affected by data properties (e.g. value distribution, null values, and extreme values) of the data.


In this part, you would include the visualization of representative variables such as distributions/histograms, skewness, etc., and your treatment of the data such as imputing null values (if any) and binning continuous variables. If necessary, different columns should be treated differently. You may also discuss rejection of variables and creation of new features. The knowledge of this part relates to Chapters 2 and 3 of the Textbook.


If multiple variables share similar properties, present one and summarise the others. This keeps the report short.


Your description must mention the high-level steps and the principles/methods (and their parameters) that you follow to reach the results/conclusions. For example, when imputing null values, you can say that the median is used. In feature selection, you would mention which method and threshold are used. These mentions do not have to be long, but you need to include them. Do not explain the steps and methods, just mention them.
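As an illustration only, the Part 1 treatments named above (median imputation, a skewness check and binning) could be sketched as follows. The `ages` values, the bin width of 10 and the helper `bin_label` are assumptions made for this sketch; they are not taken from the client dataset or from the assignment.

```python
import statistics

# Hypothetical sample of an "age" column; None marks a missing value.
ages = [25, 38, None, 52, 44, 31, None, 90, 29, 47]

# 1. Impute nulls with the median of the observed values.
observed = [a for a in ages if a is not None]
median_age = statistics.median(observed)
imputed = [median_age if a is None else a for a in ages]

# 2. Check skewness (population form of the Fisher-Pearson coefficient).
#    A clearly positive value points to a long right tail (extreme high values).
mean = statistics.fmean(imputed)
sd = statistics.pstdev(imputed)
skewness = sum(((a - mean) / sd) ** 3 for a in imputed) / len(imputed)

# 3. Bin the continuous variable into fixed-width intervals (width assumed).
def bin_label(age, width=10):
    low = int(age // width) * width
    return f"{low}-{low + width - 1}"

binned = [bin_label(a) for a in imputed]

print("median used for imputation:", median_age)
print("skewness:", round(skewness, 2))
print("bins:", binned)
```

In the report itself, one line per treatment is enough, for example stating that nulls in a column were imputed with the median and that the column was binned with a width of 10.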


Part 2: Apply the decision tree algorithm to the dataset to train decision trees.


In this part, you run the decision tree algorithm at least three times to learn trees in different settings. The trees are expected to be learnt from data with different data treatments and/or different training parameters. In the report, for each tree, you need to describe:



  • purposes/reasons for using these treatments/parameters,

  • data treatments and parameters used in the training.
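One possible way to organise the three runs is sketched below with scikit-learn, which is an assumption here since the assignment does not name a tool. The data are synthetic stand-ins, and the feature names, the labelling rule and the three parameter settings are all hypothetical choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the client data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 1000
age = rng.integers(17, 90, size=n)
hours = rng.integers(1, 99, size=n)
education_years = rng.integers(1, 16, size=n)
X = np.column_stack([age, hours, education_years])
# Hypothetical rule producing a "high income" label (1) or "low income" (0).
y = ((education_years > 10) & (hours > 40)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Three settings, each with a stated purpose.
settings = {
    "baseline: gini, unpruned": {},
    "simpler tree: entropy, max_depth=3": {"criterion": "entropy", "max_depth": 3},
    "less overfitting: min_samples_leaf=50": {"min_samples_leaf": 50},
}

results = {}
for name, params in settings.items():
    tree = DecisionTreeClassifier(random_state=0, **params)
    tree.fit(X_train, y_train)
    # Accuracy on the held-out 30% of the data.
    results[name] = tree.score(X_test, y_test)

for name, accuracy in results.items():
    print(f"{name}: test accuracy = {accuracy:.3f}")
```

Keeping the purpose in the setting name makes it easy to carry both the reason and the parameters into the report for each tree.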


Part 3: Interpretation of the generated models.


In this part, you compare the trees learnt in the previous step, then describe and interpret the best one(s). Interpretation means describing the decision tree in language that people from outside the technical area can understand, and describing the similarities and differences between the trees. Your interpretation is also expected to identify the major variables influencing the output and explain how they influence it.
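As a starting point for a plain-language interpretation, a fitted tree can be printed as if/else rules and its variables ranked by importance. The sketch below assumes scikit-learn and uses synthetic data with hypothetical feature names; the rule that generates the labels is invented for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data with hypothetical feature names.
rng = np.random.default_rng(1)
n = 500
feature_names = ["age", "hours_per_week", "education_years"]
X = np.column_stack([
    rng.integers(17, 90, size=n),
    rng.integers(1, 99, size=n),
    rng.integers(1, 16, size=n),
])
# Invented rule: the label depends on education only.
y = (X[:, 2] > 10).astype(int)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Text rendering of the splits, readable as nested if/else rules.
rules = export_text(tree, feature_names=feature_names)
print(rules)

# Variables ranked by their share of the total impurity reduction.
ranked = sorted(
    zip(feature_names, tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.2f}")
```

Each printed branch can then be restated as a sentence for the client, such as "people with more than 10 years of education are predicted to earn a high income", with the ranking indicating which variables matter most.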


You may search the internet for information and use the information to back up your interpretation. If you do this, make sure that you cite the sources of the information properly. Otherwise, you may commit plagiarism.


Part 4: Prepare a PPT file with exactly two slides (no title page) that summarises your findings: your methods, your best model and your conclusion.


The slides must be readable, and the font size should not be smaller than 18 while size 24 is typical. Do not put any identifier in this ppt/pptx file as this will be merged and published.

Answered 3 days after Mar 11, 2021, Monash University


Rajeswari answered on Mar 14 2021
Adult data set
Histogram for age groups sorted in intervals of 10.
Histogram

Bin     Frequency
25      3060
35      4005
45      3763
55      2631
65      1449
75      489
85      108
95      19
More    0

Cumulative...
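The cumulative column is cut off in the answer preview. As a rough sketch, cumulative counts and percentages can be recomputed from the frequencies that are shown; the figures produced below are derived from those frequencies and are not recovered from the original answer. Treating the bin labels as opaque strings is an assumption, since the preview does not say whether they are upper bounds.

```python
from itertools import accumulate

# Bin labels and frequencies exactly as shown in the answer preview.
bins = ["25", "35", "45", "55", "65", "75", "85", "95", "More"]
freq = [3060, 4005, 3763, 2631, 1449, 489, 108, 19, 0]

total = sum(freq)
cumulative = list(accumulate(freq))               # running totals
cumulative_pct = [100 * c / total for c in cumulative]

print(f"{'Bin':<6}{'Frequency':>10}{'Cumulative':>12}{'Cum. %':>9}")
for label, f, c, p in zip(bins, freq, cumulative, cumulative_pct):
    print(f"{label:<6}{f:>10}{c:>12}{p:>9.1f}")
```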