Assessment criteria Demonstrates ability to perform a detailed exploration, investigation and...

Question

Assessment criteria

Demonstrates ability to perform a detailed exploration, investigation and visualisation of a given real-world dataset using techniques learned in Weeks 2 and 3.

Demonstrates ability to build a Decision Tree classifier with the dataset for predicting class labels on the dataset.

Demonstrates ability to perform experiments to evaluate the performance of the models built.

Presents a well-structured report that addresses the different components identified in the instruction section. This report is error free (spelling, grammar and language). Where appropriate, UniSA Harvard referencing convention is employed to cite information that is taken from external sources.

Presents the two-slide PPT presentation summarising the findings.

Assignment Requirement

In this assignment, you act as a professional writing a report for your client. The report is on a decision tree analysis of a dataset from your client. The report must have a cover page which you design, a table of contents, and an executive summary, in addition to the sections required below. The assignment is quite open, and the quality depends on what you do, how you do it, how you interpret the results, and whether the results are convincing.

You are asked to conduct a classification analysis of the given dataset (provided in the Project 1 Resources folder, which includes the dataset and its description). The aim of the analysis is to develop models that show how the class variable, a person’s income level (i.e., high or low), is determined by the input variables. That is, given some input data about a person, the model predicts whether the person earns a high or low income. You are asked to write a report about this analysis. Your report should cover the following areas:

Part 1: Present the initial handling and inspection of the dataset and discussion on how the analysis may be affected by data properties (e.g. value distribution, null values, and extreme values) of the data.

In this part, you would include the visualization of representative variables such as distributions/histograms, skewness, etc., and your treatment of the data such as imputing null values (if any) and binning continuous variables. If necessary, different columns should be treated differently. You may also discuss rejection of variables and creation of new features. The knowledge of this part relates to Chapters 2 and 3 of the Textbook.

If multiple variables share similar properties, present one and summarise the others. This enables you to keep the report short.

Your description must mention the high-level steps and principles/methods (and the parameters) that you follow to get the results/conclusions. For example, when imputing null values, you can say that the median is used. In feature selection, you would mention which method and threshold are used. These do not have to be long, but you need to include them. Do not explain the steps and methods, just mention them.
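The treatments named in Part 1 (inspecting nulls and skewness, median imputation, binning a continuous variable) can be sketched as follows. This is only an illustration on a made-up table: the column names `age` and `hours_per_week`, the values, and the choice of 10-year bins are assumptions, not taken from the client dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the client dataset;
# the column names and values are assumptions for illustration.
df = pd.DataFrame({
    "age": [23, 37, 45, np.nan, 52, 61, 29, 90],
    "hours_per_week": [40, 50, np.nan, 20, 45, 60, 38, 10],
})

# Inspect: count null values and measure skewness per column.
null_counts = df.isna().sum()
skewness = df.skew()

# Impute: replace nulls with the column median
# (the median is less sensitive to extreme values than the mean).
df_imputed = df.fillna(df.median())

# Bin: cut age into 10-year intervals (15, 25], (25, 35], ...
age_bins = list(range(15, 106, 10))
df_imputed["age_group"] = pd.cut(df_imputed["age"], bins=age_bins)

print(null_counts)
print(skewness)
print(df_imputed["age_group"].value_counts().sort_index())
```

In the report itself, one line per treatment is enough, for example "nulls in numeric columns imputed with the median; age binned in intervals of 10".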

Part 2: Apply the decision tree algorithm to the dataset to train decision trees.

In this part, you run the decision tree algorithm at least three times to learn trees in different settings. The trees are expected to be learnt from data with different data treatments and/or different training parameters. In the report, for each tree, you need to describe:

purposes/reasons for using these treatments/parameters,

data treatments and parameters used in the training
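One possible way to organise the three (or more) runs is a small table of settings, each trained and scored on the same split. The sketch below assumes scikit-learn and uses synthetic data from `make_classification` in place of the client dataset; the three settings shown (`unpruned`, `shallow`, `entropy_min_leaf`) are example choices, not settings prescribed by the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared dataset (an assumption for illustration).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Three example settings; the names and parameter values are hypothetical.
settings = {
    "unpruned": {},                      # baseline, grows until leaves are pure
    "shallow": {"max_depth": 3},         # small tree that is easy to explain
    "entropy_min_leaf": {"criterion": "entropy", "min_samples_leaf": 20},
}

results = {}
for name, params in settings.items():
    tree = DecisionTreeClassifier(random_state=0, **params).fit(X_train, y_train)
    results[name] = {
        "depth": tree.get_depth(),
        "leaves": tree.get_n_leaves(),
        "train_acc": tree.score(X_train, y_train),
        "test_acc": tree.score(X_test, y_test),
    }

for name, row in results.items():
    print(name, row)
```

Recording depth, leaf count, and train versus test accuracy side by side makes it easier to state the purpose of each setting, for example limiting depth to reduce overfitting.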

Part 3: Interpretation of the generated models.

In this part, you compare the trees learnt in the previous step, then describe and interpret the best one(s). Interpretation means describing the decision tree in language that people outside the technical area can understand, and describing the similarities and differences between the trees. Your interpretation is also expected to include the major variables influencing the output and how they influence it.
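As a starting point for a plain-language interpretation, a fitted tree can be printed as if/else rules and its variables ranked by importance. The sketch below assumes scikit-learn; the feature names and the rule generating the labels are invented for illustration and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Hypothetical features; names and ranges are assumptions for illustration.
X = pd.DataFrame({
    "age": rng.integers(18, 80, size=500),
    "education_years": rng.integers(6, 20, size=500),
    "hours_per_week": rng.integers(10, 70, size=500),
})
# Synthetic labelling rule, for illustration only.
y = np.where(X["education_years"] > 12, "high", "low")

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Text rules: each root-to-leaf path reads as "if ... and ... then class".
rules = export_text(tree, feature_names=list(X.columns))

# Importance scores: which variables the tree relies on most.
importances = pd.Series(
    tree.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(rules)
print(importances)
```

Each printed path can then be rewritten as a sentence for the client, and the importance ranking gives a first indication of the major variables to discuss.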

You may search the internet for information and use the information to back up your interpretation. If you do this, make sure that you cite the sources of the information properly. Otherwise, you may commit plagiarism.

Part 4: Prepare a PPT file with exactly two slides (no title page) to summarise your findings: your methods, your best model and your conclusion.

The slides must be readable, and the font size should not be smaller than 18, while size 24 is typical. Do not put any identifier in this ppt/pptx file, as the files will be merged and published.

project-1-resources-20210311-szd1e11s-s0qxvzgt.zip screen-shot-2021-03-11-at-62653-pm-jr2hmp3s-ntb4efzz.png

Rajeswari · Accepted Answer

Adult data set
Histogram for age groups sorted in intervals of 10.
Histogram
Bin	Frequency	Cumulative %
25	3060	0.19711414583870138
35	4005	0.45510177789229578
45	3763	0.69750064416387525
55	2631	0.86698015975264109
65	1449	0.
75	489
85	108
95	19
More	0
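The relationship between the two series in the histogram output can be sketched as a running total of the frequencies divided by the overall count. The snippet below uses only the frequencies listed in the answer; it assumes the "Cumulative %" column is stored as a fraction between 0 and 1, which is what the listed values suggest.

```python
import numpy as np

# Bin labels and frequencies as listed in the answer's histogram output.
bin_labels = ["25", "35", "45", "55", "65", "75", "85", "95", "More"]
frequency = np.array([3060, 4005, 3763, 2631, 1449, 489, 108, 19, 0])

# Cumulative share: running total of the counts divided by the overall total.
cumulative_share = np.cumsum(frequency) / frequency.sum()

for label, count, share in zip(bin_labels, frequency, cumulative_share):
    print(f"{label:>4}  {count:>5}  {share:.4f}")
```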
