Make it as given in the attached document



Guide to Project for Supervisors
ITECH 7401 LEADERSHIP IN IT PROJECT MANAGEMENT
Assignment 2: Analytics Report

Overview
The purpose of this task is to provide students with practical experience in writing a data analytics report that provides useful insights, patterns and trends in a chosen dataset, in light of the tasks set out in this document. The dataset will be chosen from the UC Irvine Machine Learning Repository [1]. This activity will give students the opportunity to show innovation and creativity in applying the WEKA data mining software, and in designing useful visualization and data mining solutions presented as an analytics report.
[1] https://archive.ics.uci.edu/ml/index.php

Timelines and Expectations
Percentage Value of Task: 35%
Due: Week 11
Minimum time expectation: Preparation for this task will take approximately 40 hours

Project Details
You will use an analytical tool (i.e. WEKA) to explore, analyse and visualise a dataset of your choosing. An important part of this work is preparing a good quality report, which details your choices, content, and analysis, and that is of an appropriate style.
The dataset should be chosen from the following repository: UC Irvine Machine Learning Repository https://archive.ics.uci.edu/ml/index.php
The aim is to use the dataset allocated to provide interesting insights, trends and patterns amongst the data. Your intended audience is the CEO and middle management of the company for whom you are employed, and who have tasked you with this analysis.

Tasks
Task 1 - Data choice. Choose any dataset from the repository that has at least five attributes, and for which the default task is classification. Transform this dataset into the ARFF format required by WEKA.

Task 2 - Background information. Write a description of the dataset and project, and its importance for the organization. Provide an overview of what the dataset is about, including from where and how it has been gathered, and for what purpose. Discuss the main benefits of using data mining to explore datasets such as this. This discussion should be suitable for a general audience. Information must come from at least two appropriate sources and be appropriately referenced.

Task 3 - Data description. Describe how many instances the dataset contains, how many attributes there are in the dataset, their names, and which is the class attribute. Include in your description details of any missing values, and any other relevant characteristics. For at least 5 attributes, describe the range of possible values of the attributes, and visualise these in a graphical format.

Task 4 - Data preprocessing. Preprocess the dataset attributes using WEKA's filters. Useful techniques will include removing certain attributes, exploring different ways of discretizing continuous attributes and replacing missing values. Discretizing is the conversion of numeric attributes into "nominal" ones by binning numeric values into intervals [2]. Missing values in ARFF files are represented with the character "?" [3]. If you replaced missing values, explain what strategy you used to select a replacement for the missing values. Use and describe at least three different preprocessing techniques.
[2] See: weka.filter.DiscretizeFilter
[3] See: weka.filter.ReplaceMissingValuesFilter

Task 5 - Data mining. Compare and contrast at least three different data mining algorithms on your data, for instance: k-nearest neighbour, Apriori association rules, decision tree induction. For each experiment you ran, describe the data you used for the experiments, that is, whether you used the entire dataset or just a subset of it. You must include screenshots and results from the techniques you employ.

Task 6 - Discussion of findings. Explain your results and include the usefulness of the approaches for the purpose of the analysis. Include any assumptions that you may have made about the analysis. In this discussion you should explain what each algorithm provides to the overall analysis task. Summarize your main findings.

Task 7 - Report writing. Present your work in the form of an analytics report.

Submission
The assignment is to be submitted via the Assignment submission box in Moodle. This can be found in the Assessments section of the course Moodle shell. Your report file will be submitted as either an MS Word file or a PDF. If you are using MacOS, please submit as a PDF.
Your report will include the following in the order provided below:
- A cover page with your name and student ID
- Table of Contents
- Table of Figures / Tables
- Data choice
- Background information
- Data description
- Data preprocessing
- Data mining
- Discussion of findings
- References
- Appendices
Your references should use the APA referencing style; information is available here:
https://federation.edu.au/library/student-resources/help-with-referencing
https://federation.edu.au/library/student-resources/fedcite
Identify all sources of information used. You are reminded to read the "Plagiarism" section of the course description.
A passing grade will be awarded to assignments adequately addressing all assessment criteria. Higher grades require better quality and more effort. For example, a minimum is set on the wider reading required. A student reading vastly more than this minimum will be better prepared to discuss the issues in depth and consequently their report is likely to be of a higher quality. So before submitting, please read through the assessment criteria very carefully.

Marking Criteria/Rubric
(Columns: Tasks, Marks, Awarded, Comments)
1 - Data choice (10 marks)
i. Data correctly transformed into the ARFF format.
2 - Background information (4 x 5 = 20 marks)
i. Description of the dataset including what the dataset is about, including from where and how it has been gathered, and for what purpose.
ii. Description of project, and its importance for the organization.
iii. Main benefits of using data mining to explore datasets such as this.
iv. At least two references from peer reviewed sources.
3 - Data description (5 + 15 = 20 marks)
i. General details of dataset.
ii. Detailed description of five attributes.
4 - Data preprocessing (3 x 10 = 30 marks)
i. Implemented and described three different preprocessing techniques.
5 - Data mining (3 x 10 = 30 marks)
Three different data mining algorithms used. Description of techniques with screenshots and results.
6 - Discussion of findings (3 x 10 = 30 marks)
Explanation of results. Usefulness of the data mining algorithms. Summary of main findings.
7 - Presentation of Report (10 marks)
Report is well-written and presented professionally, containing all required sections.
Total Marks: 150
Total Marks out of 35 (35%)

Feedback
Feedback and marks will be provided in Moodle. Marks will also be available in FDL Marks.

Plagiarism
Plagiarism is the presentation of the expressed thought or work of another person as though it is one's own without properly acknowledging that person. You must not allow other students to copy your work and must take care to safeguard against this happening. More information about the plagiarism policy and procedure for the university can be found at http://federation.edu.au/students/learning-and-study/online-help-with/plagiarism
Please refer to the Course Description for information regarding late assignments, extensions, and special consideration. A reminder that all academic regulations can be accessed via the university's website, see: http://federation.edu.au/staff/governance/legal/feduni-legislation

CRICOS Provider No. 00103D
ITECH1103 Big Data and Analytics
Answered 4 days after May 24, 2022


Suraj answered on May 29 2022
Data Analysis using WEKA
Data Choice:
Since the project uses an openly available, internet-sourced dataset, I have selected data from the medical field, in which many variables describing the health state of patients are measured and the response variable is the death event, that is, whether the person survived or not. The population of the data set is all patients who previously suffered from cardiovascular disease, together with whether they survived or not.
Background information:
This is an interesting question that helps build analytical thinking skills. The data set is a medical data set, so the analysis focuses on medical aspects of human life. For example, does a diabetic person have a higher chance of death? Does the presence of certain disease features lead to a heart attack, and does the person survive or not? All the variables are important for the analysis, so no variable is deleted from the data set.
Data Description:
A brief explanation of the variables is given as follows:
Description Table
There are 299 rows and 12 different variables in the data set. The data set does not contain any missing values in any row.
The names of all 12 attributes are given in the description table above. Here, the death event is the class variable.
Missing Value Filling
The data set does not contain any missing values. If a variable did contain missing values, they could be replaced by the mean, the median or the previous value; there are many ways to fill in missing values. The steps to replace missing values in WEKA are as follows:
a. Go to the Filter option.
b. Choose unsupervised >> attribute >> ReplaceMissingValues
c. Click the Apply button.
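The effect of this replacement step can be illustrated outside WEKA. The sketch below is not WEKA code. It is a minimal Python illustration that assumes the replacement strategy is the attribute mean for numeric columns and the most frequent value (mode) for nominal columns. The function name `replace_missing` and the column values are hypothetical, and `None` stands in for the "?" marker used in ARFF files.

```python
from collections import Counter
from statistics import mean


def replace_missing(column):
    """Fill None entries with the mean (numeric column) or the mode (nominal column)."""
    present = [v for v in column if v is not None]
    if not present:
        return list(column)  # no observed values to derive a replacement from
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)  # numeric attribute: use the mean of observed values
    else:
        fill = Counter(present).most_common(1)[0][0]  # nominal attribute: use the mode
    return [fill if v is None else v for v in column]


# Hypothetical values, not taken from the dataset; None plays the role of "?".
ages = [60, None, 50, 70]
diabetes = ["yes", "no", None, "no"]

filled_ages = replace_missing(ages)
filled_diabetes = replace_missing(diabetes)
print(filled_ages)
print(filled_diabetes)
```

Replacing with the median or the previous value, as mentioned above, would only change how `fill` is computed.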
In WEKA we can find the descriptive statistics of any variable by selecting that variable. The appropriate graph and four descriptive statistics, that is, the mean, standard deviation, minimum and maximum, are calculated automatically.
We will show this for two variables: one quantitative variable and one qualitative variable.
Age
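The four statistics reported for a numeric attribute such as age can also be computed by hand as a cross-check. The following is a minimal Python sketch, not WEKA output: the values in `ages` are hypothetical rather than taken from the dataset, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
from statistics import mean, stdev

# Hypothetical ages, not taken from the dataset.
ages = [40, 50, 60, 70, 80]

summary = {
    "minimum": min(ages),
    "maximum": max(ages),
    "mean": mean(ages),
    "std_dev": stdev(ages),  # sample standard deviation (assumed n - 1 denominator)
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```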
Data Pre-processing:
In this section we apply three pre-processing techniques to the data before creating a model. Since the data does not contain any missing values, the missing value method is not applicable here. The data consists of many attributes with different dimensions. So, to...