Hi, need help with this one: MLN 601_Assessment 2 Brief_Source Code and Presentation_Module 8 Page 1 of...

Question

Hi, need help with this one:

MLN 601_Assessment 2 Brief_Source Code and Presentation_Module 8 Page 1 of 8

Task Summary

Customer churn, also known as customer attrition, refers to the movement of customers from one service provider to another. It is well known that attracting new customers costs significantly more than retaining existing customers. Additionally, long-term customers are found to be less costly to serve and less sensitive to competitors' marketing activities. Thus, predicting customer churn is valuable to telecommunication industries, utility service providers, paid television channels, insurance companies and other business organisations providing subscription-based services. Customer-churn prediction allows for targeted retention planning.

In this Assessment, you will build a machine learning (ML) model to predict customer churn using the principles of ML and big data tools.

As part of this Assessment, you will write a 1,000-word report that will include the following:
a) A predictive model from a given dataset that follows data mining principles and techniques;
b) Explanations as to how to handle missing values in a dataset; and
c) An interpretation of the outcomes of the customer churn analysis.

Please refer to the Task Instructions (below) for details on how to complete this task.

ASSESSMENT 2 BRIEF

Subject Code and Title: BDA601: Big Data and Analytics
Assessment: Visualisation and Model Development
Individual/Group: Individual
Length: Source Code and Report 1,000 words (+/-10%)
Learning Outcomes: The Subject Learning Outcomes demonstrated by the successful completion of the task below include:
c) Apply data science principles to the cleaning, manipulation, and visualisation of data;
d) Design analytical models based on a given problem; and
e) Effectively report and communicate findings to an appropriate audience.
Submission: Due by 11.55 pm AEST on the Sunday at the end of Module 8.
Weighting: 30%
Total Marks: 100 marks

Task Instructions

1. Dataset Construction

The Kaggle telco churn dataset is a sample dataset from IBM, containing 21 attributes of approximately 7,043 telecommunication customers. In this Assessment, you are required to work with a modified version of this dataset (the dataset can be found at the URL provided below). Modify the dataset by removing the following attributes: MonthlyCharges, OnlineSecurity, StreamingTV, InternetService and Partner.

As the dataset is in .csv format, any spreadsheet application, such as Microsoft Excel or Open Office Calc, can be used to modify it. You will use your resulting dataset, which should comprise 7,043 observations and 16 attributes, to complete the subsequent tasks. The 'Churn' attribute (i.e., the last attribute in the dataset) is the target of your churn analysis.

Kaggle.com. (2020). Telco customer churn: IBM sample data sets. Retrieved from https://www.kaggle.com/blastchar/telco-customer-churn [Accessed 05 August 2020].

2. Model Development

From the dataset constructed in the previous step, present appropriate data visualisation and descriptive statistics, then develop a 'decision-tree' model to predict customer churn. The model can be developed in Jupyter Notebook using Python and Spark's Machine Learning Library (PySpark MLlib). You can use any other platform if you find it more efficient. The notebook should include the following sections:

a) Problem Statement
In this section, briefly state the context and the problem you will solve in the notebook.

b) Exploratory Data Analysis
In this section, perform both a visual and statistical exploratory analysis to gain insights about the dataset.

c) Data Cleaning and Feature Selection
In this section, perform data pre-processing and feature selection for the model, which you will build in the next section.

d) Model Building
In this section, use the pre-processed data and the selected features to build a 'decision-tree' model to predict customer churn.

In the notebook, the code should be well documented, the graphs and charts should be neatly labelled, the narrative text should clearly state the objectives and a logical justification for each of the steps should be provided.

3. Handling Missing Values

The given dataset has very few missing values; however, in a real-world scenario, data scientists often need to work with datasets with many missing values. If an attribute is important to build an effective model and has significant missing values, then the data scientists need to come up with strategies to handle any missing values.

From the 'decision-tree' model, built in the previous step, identify the most important attribute. If a significant number of values were missing in the most important attribute column, implement a method to replace the missing values and describe that method in your report.

4. Interpretation of Churn Analysis

Modelling churn is difficult because there is inherent uncertainty when measuring churn. Thus, it is important not only to understand any limitations associated with a churn analysis but also to be able to interpret the outcomes of a churn analysis.

In your report, interpret and describe the key findings that you were able to discover as part of your churn analysis. Describe the following facts with supporting details:
• The effectiveness of your churn analysis: What was the percentage of time at which your analysis was able to correctly identify the churn? Can this be considered a satisfactory outcome? Explain why or why not;
• Who is churning: Describe the attributes of the customers who are churning and explain what is driving the churn; and
• Improving the accuracy of your churn analysis: Describe the effects that your previous steps, model development and handling of missing values had on the outcome of your churn analysis and how the accuracy of your churn analysis could be improved.

Submission Instructions

• Zip the following files and submit the .zip files via the Assessment link in the main navigation menu in BDA601: Big Data and Analytics:
o Modified dataset (.csv file) constructed in Task 1;
o Notebook (.ipynb file) from Task 2; and
o Report (.pdf file) from Task 3.

The Learning Facilitator will provide feedback via the Grade Centre in the LMS portal. Feedback can be viewed in My Grades.

Academic Integrity Declaration

I declare that except where referenced, the work I am submitting for this assessment task is my own work. I have read and am aware of the Academic Integrity Policy and Procedure of Torrens University, Australia, viewable online at http://www.torrens.edu.au/policies-and-forms.

I am also aware that I need to keep a copy of all submitted material and any drafts and I agree to do so.

Assessment Rubric

Grade bands: Fail (Yet to Achieve Minimum Standard) 0–49%; Pass (Functional) 50–64%; Credit (Proficient) 65–74%; Distinction (Advanced) 75–84%; High Distinction (Exceptional) 85–100%.

Knowledge and understanding of exploratory data analysis (15%)

Fail: Demonstrates partial or unsatisfactory knowledge and understanding of the exploratory data analysis. Demonstrates unsatisfactory skills in:
• Exploring the data using both the measure of central tendency and the measure of dispersions; and/or
• Exploring the data using various visual representations, such as a histogram, scatter plot, box plot, heatmap, pair plot or probability distribution plot.

Pass: Demonstrates functional knowledge and understanding of the exploratory data analysis. Demonstrates satisfactory skills in:
• Exploring the data using both the measure of central tendency and the measure of dispersions; and
• Exploring the data using various visual representations, such as a histogram, scatter plot, box plot, heatmap, pair plot or probability distribution plot.

Credit: Demonstrates solid knowledge and understanding of the exploratory data analysis. Demonstrates solid skills in:
• Exploring the data using both the measure of central tendency and the measure of dispersions; and
• Exploring the data using various visual representations, such as a histogram, scatter plot, box plot, heatmap, pair plot or probability distribution plot.
• Only selective statistics were produced from the above-mentioned visuals.

Distinction: Demonstrates advanced knowledge and understanding of the exploratory data analysis. Demonstrates advanced skills in:
• Exploring the data using both the measure of central tendency and the measure of dispersions; and
• Exploring the data using various visual representations, such as a histogram, scatter plot, box plot, heatmap, pair plot or probability distribution plot.
• Appropriate statistics were produced from the above-mentioned visuals.

High Distinction: Demonstrates exceptional knowledge and understanding of the exploratory data analysis. Demonstrates exemplary skills in:
• Exploring the data using both the measure of central tendency and the measure of dispersions; and
• Exploring the data using various visual representations, such as a histogram, scatter plot, box plot, heatmap, pair plot or probability distribution plot.
• Appropriate statistics were produced from the above-mentioned visuals.
• Gained unique insights about the dataset through the statistical observations.

Analytical design for data pre-processing and feature selection (15%)

Fail: Demonstrates partial or unsatisfactory knowledge and understanding of data pre-processing and feature selection. Completed less than 50% of the following tasks and the tasks completed were unsatisfactory in terms of quality, accuracy and completeness:
• Handling data anomalies;
• Conducting the redundancy and correlation analysis; and/or
• Selecting the feature for model building.

Pass: Demonstrates satisfactory knowledge and understanding of data pre-processing and feature selection. Completed most of the following tasks with accuracy and completeness to a

Kushal · Accepted Answer

Data science has been established as an important emerging scientific field, driving research progress in disciplines such as statistics, computing science and information science, as well as practical transformation. The field includes areas of computing, data analytics, AI, pattern recognition, natural language understanding, and big data manipulation. It also deals with new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimisation, and visualisation, to integrative analysis across heterogeneous and interconnected complex resources for better decision-making, collaboration, and, ultimately, value creation. With the recent boom in data science and the exponential growth of unstructured data around us, data science is emerging as an answer to many real-life problems.
Data cleaning typically takes up most of a data scientist's time, and automated data cleaning has been a heavily researched topic over the past few years. Some companies, such as IBM, provide automation and tooling for data cleaning. Another recent development in the field of data science has been feature tools that offer a solution for automated feature engineering. Companies are now investing heavily in tools and services that make the process easier and cheaper. With the huge amount of data available, data storage has become an even bigger concern.
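For the missing-values step in the brief, one simple cleaning strategy is to fill gaps with the column median (numeric attributes) or mode (categorical attributes). The sketch below is illustrative only: the rows are made-up values, `impute` is a hypothetical helper name, and it uses just the Python standard library rather than PySpark MLlib, so treat it as a starting point and not as the required method.

```python
from statistics import median, mode


def impute(rows, column, strategy):
    """Return a copy of `rows` with None values in `column` filled in.

    strategy "median" suits numeric columns; "mode" suits categorical ones.
    """
    present = [row[column] for row in rows if row[column] is not None]
    if strategy == "median":
        fill = median(present)
    elif strategy == "mode":
        fill = mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Build new dicts so the original rows are left untouched.
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Made-up sample rows (assumption); None marks a missing value.
rows = [
    {"tenure": 1, "Contract": "Month-to-month", "TotalCharges": 29.85},
    {"tenure": 34, "Contract": "One year", "TotalCharges": None},
    {"tenure": 2, "Contract": None, "TotalCharges": 108.15},
    {"tenure": 45, "Contract": "Month-to-month", "TotalCharges": 1840.75},
]

filled = impute(rows, "TotalCharges", "median")
filled = impute(filled, "Contract", "mode")

for row in filled:
    print(row)
```

The median is usually a safer default than the mean for skewed columns such as charges, because a few very large values do not drag it upwards. Whichever method is chosen, the report should say why it suits the attribute.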
Moreover, we have to face the fact of non-stationarity and nonlinearity in the data. Some methods, such as empirical mode decomposition (EMD), have already been developed to analyze nonlinear and non-stationary data. EMD uses an iterative algorithm based solely on the data rather than on a fixed basis. EMD, Bayesian methods, Kalman filtering, and machine learning techniques can be considered adaptive analysis.
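To connect this back to the brief's decision-tree step, a tree chooses each split by how much it reduces class impurity, and the attribute that reduces it most is a reasonable first guess at the "most important attribute". The sketch below shows that idea with Gini impurity on made-up rows. It is a simplification under stated assumptions: the data is invented, `gini`, `gini_gain` and `best_attribute` are hypothetical helper names, and it splits once on every distinct category value, whereas a library decision tree typically uses binary splits and aggregates impurity reduction over all nodes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(rows, attribute, target="Churn"):
    """Impurity reduction from splitting `rows` on each value of `attribute`."""
    parent = gini([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


def best_attribute(rows, attributes, target="Churn"):
    """Attribute whose single split reduces Gini impurity the most."""
    return max(attributes, key=lambda a: gini_gain(rows, a, target))


# Made-up sample rows (assumption), not taken from the real dataset.
rows = [
    {"Contract": "Month-to-month", "PaperlessBilling": "Yes", "Churn": "Yes"},
    {"Contract": "Month-to-month", "PaperlessBilling": "No", "Churn": "Yes"},
    {"Contract": "Month-to-month", "PaperlessBilling": "Yes", "Churn": "Yes"},
    {"Contract": "Two year", "PaperlessBilling": "Yes", "Churn": "No"},
    {"Contract": "Two year", "PaperlessBilling": "No", "Churn": "No"},
    {"Contract": "One year", "PaperlessBilling": "No", "Churn": "No"},
]

candidates = ["Contract", "PaperlessBilling"]
for name in candidates:
    print(name, round(gini_gain(rows, name), 4))
print("best:", best_attribute(rows, candidates))
```

In a notebook built with a real library, the fitted model's own feature-importance output should be used for the report; this sketch only illustrates what that number measures.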
