
Please follow and complete Sections 3-6 only. Each section has a minimum word count (highlighted in yellow); together they come to about 2,000 words. Refer to Sections 1 and 2; Section 1 has a link to the dataset (FYI). Thank you!


BTA 350 Analytics Project: Bank Marketing Data Set
Name: Steve Gallegos

Project Section 1: Bank Marketing Data Set
· Bank-additional-full.csv with all examples (41188) and 20 inputs, ordered by date (from May 2008 to November 2010), very close to the data analysed in [Moro et al., 2014]
· Link to data set: UCI Machine Learning Repository: Bank Marketing Data Set

Project Section 2: CONTEXT
Structured approach of the Bank marketing data set: The data set relates to a banking institution and records the direct phone marketing campaigns it ran to attract clients to subscribe to term deposits. The aim of these campaigns is to increase deposits and to understand which customers agree to a term deposit. The dataset was obtained from the UCI Machine Learning Repository. It provides insights for developing the bank's marketing strategies in the years to come, which in turn supports better customer relationship management as a whole. The bank client data includes inputs that give personal, demographic, educational and job-related information about the clients. These inputs help in analyzing the output variable, which records whether the client subscribed to a term deposit (yes or no). Such a study involves a detailed analysis of the existing customer database, which can also help identify the potential clients most likely to subscribe to a term deposit. The key goal of analyzing the data set is to develop better strategies for future marketing campaigns, which should also improve sales and revenue for the institution.
Project Section 3: Defining the problem
Instructions (delete after you have completed this section): Define the problem you will be investigating in the project, the question you will be asking and their business relevance. The problem should be a higher-level business problem that the organization may be facing that you can then turn into a specific question to be asked of the dataset. The default question in the suggested dataset is: “to predict if the client will subscribe a term deposit”. You can adopt this question for your analysis, and if you do so, work your way up to identify a business problem this question is related to. You can also identify your own business problem and a suitable question for this dataset, or your own dataset. This section should contain: 1) Problem definition, 2) Question definition and link with the problem, 3) Explanation of the business relevance of the problem. This section should have around 200-300 words.

Project Section 4: Data
Instructions (delete after you have completed this section): Describe the dataset. Identify the type of data and its source. Conduct exploratory data analysis. Report on the number of instances and variables, including their type. Provide some summary statistics of some variables you think may be important and explain how you identified them. Provide at least one plot of at least one variable. Assess if the data you have is right for the question you are asking and if there are any problems in the dataset. This section should contain around 300-400 words and at least one plot.

Project Section 5: Analysis
Instructions (delete after you have completed this section): Define the outcome variable for your analysis and show how it is linked with the question and problem you identified. Define at least two models with at least two different variables that help predict the outcome variable. Describe these models in words and as formulas. Report on the analyses you conducted in Microsoft Excel using these models by describing the analyses and their results and providing at least one table and at least one plot for each model. This section should contain around 500-600 words, at least two tables and at least two plots.

Project Section 6: Evaluation
Instructions (delete after you have completed this section): Evaluate the results of your analyses by comparing and contrasting the two models. 1) Confounding: provide at least two other possible explanations of the apparent relationship between variables you used. 2) Overfitting: assess if you have enough data and the right complexity of the model to reduce overfitting. 3) Causality: assess if you can make claims about causation or should you limit your claims to correlation. 4) Significance: assess whether your findings have business significance. 5) Effect size: compare the effect size for your models and comment on its relevance. Provide a general comment on the quality and suitability of your results. Evaluate whether you are satisfied with the results or whether you have to develop alternative or more complex models. Comment on your thinking regarding this and whether you decided to conduct any further analysis. Provide details of any further analyses and models conducted. Comment on the final model you selected and how you arrived at this decision. This section should contain about 600-700 words.
Answered 9 days after Apr 18, 2022


Suraj answered on Apr 26 2022
Defining the Problem:
Problem definition: In this analysis we examine the data set from three angles. First, we test whether the client's marital status is related to the outcome variable. Second, and most important, we predict whether the client will subscribe to a term deposit. Third, we test whether the client's education has any effect on term deposit subscription.
Questions: The questions arising from the problem defined above are as follows:
Question 1: Does the client's marital status have any impact on the outcome variable, i.e., are marital status and the outcome variable independent or dependent?
Question 2: This question concerns predicting the outcome variable. We are interested in building a model that predicts whether a client with given characteristics will subscribe to a term deposit.
The link between this question and the problem is that such a model lets us classify each client as a likely subscriber or a likely non-subscriber.
Question 3: Does the client's education level have any impact on term deposit subscription, i.e., are the two variables dependent or independent?
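Questions 1 and 3 ask whether two categorical variables are independent, which is what a chi-square test of independence checks. The sketch below is a minimal Python illustration of that test, not the author's Excel workflow. It assumes the data sits in a pandas DataFrame with the UCI column names `marital`, `education` and `y`. The helper name `independence_test` and the toy rows are made up for the example.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def independence_test(df: pd.DataFrame, column: str, target: str = "y") -> dict:
    """Pearson chi-square test of independence between a categorical column and the target."""
    # Observed counts: one row per category of `column`, one column per outcome ("no"/"yes").
    table = pd.crosstab(df[column], df[target])
    # correction=False gives the plain Pearson statistic (no Yates continuity correction).
    chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
    return {"table": table, "chi2": chi2, "p_value": p_value, "dof": dof, "expected": expected}


# Made-up toy data for illustration only; the real analysis would use the full dataset.
demo = pd.DataFrame({
    "marital": ["married"] * 6 + ["single"] * 6,
    "y": ["no"] * 5 + ["yes"] + ["no"] * 2 + ["yes"] * 4,
})
result = independence_test(demo, "marital")
print(result["table"])
print(f"chi2 = {result['chi2']:.3f}, dof = {result['dof']}, p = {result['p_value']:.3f}")
```

On the full data, `independence_test(df, "marital")` and `independence_test(df, "education")` would address Questions 1 and 3. A p-value below the chosen significance level (commonly 0.05) suggests the variables are not independent. Very rare categories can produce expected counts below about 5, which weakens the chi-square approximation. Merging those categories is one option.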
Business relevance of the problem:
The problems defined above matter to the business. If we can classify whether a customer is likely to subscribe, the bank can target offers at the customers most likely to respond, because it knows which factors attract them to its term deposit service.
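The answer does not say which model it uses for Question 2. Logistic regression is one common choice for a yes/no outcome because it models the log-odds of subscribing as a linear function of the predictors. The sketch below assumes pandas and scikit-learn, a DataFrame with the UCI column names, and a hypothetical helper `fit_subscription_model`. The synthetic data and the 70/30 split are illustrative choices, not results from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def fit_subscription_model(df: pd.DataFrame, features: list, target: str = "y", seed: int = 0):
    """Fit log(p / (1 - p)) = b0 + b1*x1 + ... + bk*xk, where p = P(y = "yes")."""
    # One-hot encode text columns (dropping one reference level); numeric columns pass through.
    X = pd.get_dummies(df[features], drop_first=True).astype(float)
    y = (df[target] == "yes").astype(int)
    # Hold out 30% of rows, keeping the yes/no ratio the same in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    # Baseline: always predict the most common class in the test set.
    baseline = max(y_test.mean(), 1 - y_test.mean())
    return model, list(X.columns), accuracy, baseline


# Synthetic data for illustration only: older clients are made more likely to say "yes".
rng = np.random.default_rng(0)
n = 400
age = rng.integers(18, 90, size=n)
demo = pd.DataFrame({
    "age": age,
    "marital": rng.choice(["divorced", "married", "single"], size=n),
    "y": np.where(rng.random(n) < 1 / (1 + np.exp(-(-4 + 0.05 * age))), "yes", "no"),
})
model, columns, accuracy, baseline = fit_subscription_model(demo, ["age", "marital"])
print(dict(zip(columns, model.coef_[0].round(3))))
print(f"test accuracy = {accuracy:.3f}, majority-class baseline = {baseline:.3f}")
```

Two points matter when applying this to the real file. First, the outcome is imbalanced (far more "no" than "yes"), so accuracy should be compared with the majority-class baseline returned above. Second, the UCI documentation notes that `duration` is only known after a call ends, so it is usually left out of a realistic predictive model.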
Data:
In this section we describe the data set used in the analysis and present summary statistics and an exploratory data analysis.
The variables in the data set are described below:
Age: age of the client (numeric)
Job: type of job (categorical)
Marital: marital status (categorical)
Education: education level (categorical)
Default: has credit in default? (categorical: no, yes, unknown)
Housing: has housing loan? (categorical: no, yes, unknown)
Loan: has personal loan? (categorical: no, yes, unknown)
Contact: contact communication type (categorical: cellular, telephone)
Month: last contact month of year (categorical)
Day_of_week: last contact day of the week (categorical)
Duration: last contact duration, in seconds (numeric)
Campaign: number of contacts performed during this campaign and for this client (numeric, includes the last contact)
Pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means the client was not previously contacted)
Previous: number of contacts performed before this campaign and for this client (numeric)
Poutcome: outcome of the previous marketing campaign (categorical: failure, nonexistent, success)
Emp.var.rate: employment variation rate - quarterly indicator (numeric)
Cons.price.idx: consumer price index - monthly indicator (numeric)
Cons.conf.idx: consumer confidence index - monthly indicator (numeric)
Euribor3m: Euribor 3-month rate - daily indicator (numeric)
Nr.employed: number of employees - quarterly indicator (numeric)
Output variable (desired target):
Y - has the client subscribed a term deposit? (binary: "yes", "no")
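The variable list above mixes numeric and categorical inputs, and `pdays` uses 999 as a code rather than a real count. The following is a minimal loading sketch. It assumes pandas and a file that keeps the UCI layout (semicolon-separated, quoted headers). The `previously_contacted` column and the two sample rows are illustrative additions and are not part of the original dataset.

```python
import io

import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def load_bank_data(source) -> pd.DataFrame:
    """Read the UCI bank marketing file and handle the pdays sentinel."""
    # The UCI files use semicolons, not commas, as the field separator.
    df = pd.read_csv(source, sep=";")
    # pdays = 999 means "not previously contacted", not a real 999-day gap.
    df["previously_contacted"] = df["pdays"] != 999
    df["pdays"] = df["pdays"].replace(999, np.nan)
    return df


# Two made-up rows in the file's format, for illustration; in practice pass the path
# to bank-additional-full.csv instead of this in-memory sample.
sample = '"age";"marital";"pdays";"y"\n56;"married";999;"no"\n37;"single";6;"yes"\n'
df = load_bank_data(io.StringIO(sample))
# Non-numeric columns are the categorical inputs (bool counts as numeric here).
categorical = [c for c in df.columns if not is_numeric_dtype(df[c])]
print(df.dtypes)
print("categorical columns:", categorical)
```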
The summary statistics table for the data is given as follows:
We discuss two or three of the variables here. In the job variable, the largest group of clients works in admin. jobs and the smallest group is retired. In education, most clients hold a university degree and relatively few fall into the other categories.
Some variables also contain missing values, which are labelled "unknown".
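The summary statistics and the "unknown" labels described above can be reproduced with a few lines of pandas. This is a sketch under assumptions: the helper names `frequency_table` and `unknown_counts` are hypothetical, and the toy rows only mimic the UCI category labels.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def unknown_counts(df: pd.DataFrame) -> pd.Series:
    """Number of "unknown" placeholders in each non-numeric column (zeros dropped)."""
    text_cols = [c for c in df.columns if not is_numeric_dtype(df[c])]
    counts = (df[text_cols] == "unknown").sum()
    return counts[counts > 0].sort_values(ascending=False)


def frequency_table(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Absolute and relative frequency of each category, most common first."""
    counts = df[column].value_counts()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Made-up rows for illustration only; the category labels follow the UCI coding.
demo = pd.DataFrame({
    "age": [41, 29, 63, 35],
    "job": ["admin.", "admin.", "retired", "unknown"],
    "education": ["university.degree", "high.school", "unknown", "unknown"],
})
print(demo.describe())  # count, mean, std, min, quartiles, max for numeric columns
print(frequency_table(demo, "job"))
print(unknown_counts(demo))
```

Treating "unknown" as its own category keeps every row. Dropping or imputing those rows are alternatives, and their effect on the results should be checked.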
The following are some graphs produced from the data set.
This bar plot shows the marital status of the clients. From this visual we can say that most clients are married, followed by single and then divorced clients. The variable also contains a few missing values (labelled "unknown").
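A bar plot like the marital-status chart described above can be drawn from the category counts. The sketch assumes pandas and matplotlib, using the non-interactive Agg backend so it runs without a display. The helper `plot_category_counts`, the output file name and the toy values are placeholders.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # draw to a file; no screen needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_category_counts(df: pd.DataFrame, column: str, path: str) -> pd.Series:
    """Save a bar chart of how many clients fall into each category of `column`."""
    counts = df[column].value_counts()  # most common category first
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(list(counts.index.astype(str)), counts.to_numpy())
    ax.set_xlabel(column)
    ax.set_ylabel("Number of clients")
    ax.set_title(f"Clients by {column}")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return counts


# Made-up marital values for illustration only.
demo = pd.DataFrame({"marital": ["married", "married", "married", "single", "single", "divorced", "unknown"]})
path = os.path.join(tempfile.gettempdir(), "marital_counts.png")
counts = plot_category_counts(demo, "marital", path)
print(counts)
```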
Next, consider the boxplot of the age variable grouped by the outcome variable:
The boxplot shows two things about the variable: its distribution and the outliers present in it. Here, the age distribution is approximately skewed and outliers are...
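A boxplot marks as outliers the points beyond its whiskers. matplotlib's default (`whis=1.5`) places the whiskers at most 1.5 × IQR beyond the first and third quartiles, which is Tukey's rule. The NumPy sketch below computes those fences for each outcome group so that the outliers in the age boxplot can be listed explicitly. The ages are made up, and other tools (including Excel) may use slightly different quartile definitions.

```python
import numpy as np


def iqr_outliers(values, k: float = 1.5):
    """Tukey's rule: flag points more than k * IQR below Q1 or above Q3."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])  # linear interpolation between data points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, x[(x < lower) | (x > upper)]


# Made-up ages split by outcome, for illustration only.
ages_by_outcome = {
    "no": [25, 30, 32, 35, 38, 40, 41, 45, 50, 88],
    "yes": [30, 31, 32, 33, 34],
}
for outcome, ages in ages_by_outcome.items():
    lower, upper, outliers = iqr_outliers(ages)
    print(f"y={outcome}: fences [{lower:.2f}, {upper:.2f}], outliers {outliers.tolist()}")
```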