PPT #18 Cluster Analysis with WEKA and Instructions for ICA #9

Read over the slides; the activity instructions are at the end, labeled "Activity 9".


Learning Objectives for the topic of data mining techniques

Describe the goals of data mining (its role in decision support systems).
Describe several different types of data mining techniques and understand the goals of each technique.
Describe what a data mining model is and have a basic understanding of how models are developed using discovery algorithms (which are also called “machine learning techniques”).
Identify business-related decisions that may be addressed, or business-related problems that may be resolved, by using data mining techniques to derive useful information.

As just mentioned, this is the first presentation of a set of five about data mining techniques. The learning objectives for the general topic of data mining techniques are listed here. Once you complete these lessons, you should be able to (1) describe the goals of data mining (that is, its role within a decision support system); (2) describe several different types of data mining techniques and understand the goals of each technique; (3) describe what a data mining model is and have a basic understanding of how models are developed using discovery algorithms (which are also called “machine learning techniques”); and (4) identify business-related decisions that may be addressed, or business-related problems that may be resolved, by using data mining techniques to derive useful information.

Cluster Analysis

Cluster analysis is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Market segmentation analysis uses cluster analysis.

Scenario

A BMW dealership has customers who come to browse. Some will wander around the lot but not come into the showroom. Some will come into the showroom and ask about a particular car such as the 3-series, Z4 or the M5.
Some will ultimately buy a car, some will lease a car, and others are just browsing. The dealership would like a means of identifying whether a customer is going to be a “buyer” or a “browser” and whether it is best to try to interest the customer in the 3-series, Z4 or M5.

You will need this file

Here’s the data

Dealership: 0 = did not visit to browse; 1 = did visit to browse
Showroom: 0 = did not come into the showroom; 1 = did come inside the showroom
Computer Search: 0 = did not look at the cars online; 1 = did look at the cars online
M5: 0 = did not show interest in the M5; 1 = did show interest in the M5
Z4: 0 = did not show interest in the Z4; 1 = did show interest in the Z4
3-series: 0 = did not show interest in the 3-series; 1 = did show interest in the 3-series
Financing: 0 = did not qualify for financing; 1 = did qualify for financing
Purchase: 0 = did not purchase or lease a car; 1 = did purchase or lease a car

Start WEKA. Click [Explorer]

Start up WEKA and click the EXPLORER button to start the cluster analysis.

Open the file BMW_browsers.arff
Select [Cluster] analysis
Select the simple K means method
Change parameters of cluster analysis
Right Click
Change number of clusters to 5

The number of clusters is a guess. Usually, the cluster analysis will be repeated using different values for K.

Run the clustering analysis

Review Results

0 Dreamers – can’t afford a BMW
1 M5 lookers but not buyers
2 Throw-aways – don’t look, don’t buy
3 Got to have a Z4 or M5
4 Starting out with a 3-series

Cluster 0: This group we can call the "Dreamers," as they appear to wander around the dealership, looking at cars parked outside on the lots, but trail off when it comes to coming into the dealership, and worst of all, they don't purchase anything.

Cluster 1: We'll call this group the "M5 Lovers" because they tend to walk straight to the M5s, ignoring the 3-series cars and the Z4.
However, they don't have a high purchase rate: only 52 percent. This is a potential problem and could be a focus for improvement for the dealership, perhaps by sending more salespeople to the M5 section.

Cluster 2: This group is so small we can call them the "Throw-Aways" because they aren't statistically relevant, and we can't draw any good conclusions from their behavior. (This happens sometimes with clusters and may indicate that you should reduce the number of clusters you've created.)

Cluster 3: This group we'll call the "BMW Babies" because they always end up purchasing a car and always end up financing it. They walk around the lot looking at cars, then turn to the computer search available at the dealership. Ultimately, they tend to buy M5s or Z4s (but never 3-series). This cluster tells the dealership that it should consider making its search computers more prominent around the lots (outdoor search computers?), and perhaps making the M5 or Z4 much more prominent in the search results. Once the customer has made up his mind to purchase the vehicle, he always qualifies for financing and completes the purchase.

Cluster 4: This group we'll call the "Starting Out With BMW" because they always look at the 3-series and never look at the much more expensive M5. They walk right into the showroom rather than around the lot, and tend to ignore the computer search terminals. While 50 percent get to the financing stage, only 32 percent ultimately purchase. The dealership could draw the conclusion that these customers looking to buy their first BMWs know exactly what kind of car they want (the 3-series entry-level model) and are hoping to qualify for financing to be able to afford it. The dealership could possibly increase sales to this group by relaxing its financing standards or by reducing the 3-series prices.
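The WEKA steps above (load the 0/1 data, choose simple K-means, set the number of clusters, run, then read the cluster centroids) can be sketched outside WEKA. The following is a minimal sketch of basic K-means using only the Python standard library. It is not WEKA's implementation, and the rows are made-up records in the same eight-column layout, not the contents of BMW_browsers.arff; the names kmeans, ROWS, and COLUMNS are hypothetical.

```python
import random

# Hypothetical 0/1 visit records (NOT taken from BMW_browsers.arff).
COLUMNS = ["Dealership", "Showroom", "ComputerSearch", "M5", "Z4",
           "3-series", "Financing", "Purchase"]
ROWS = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 1, 1],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Column-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(rows, k, seed=0, max_iter=100):
    """Basic K-means: returns (centroids, cluster index for each row)."""
    rng = random.Random(seed)
    # Start from k rows picked at random as the initial centroids.
    centroids = [list(r) for r in rng.sample(rows, k)]
    assignments = None
    for _ in range(max_iter):
        # Assign every row to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, a in zip(rows, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return centroids, assignments


if __name__ == "__main__":
    centroids, assignments = kmeans(ROWS, k=3)
    for c, centroid in enumerate(centroids):
        size = assignments.count(c)
        profile = ", ".join(f"{name}={value:.2f}"
                            for name, value in zip(COLUMNS, centroid))
        print(f"Cluster {c} ({size} customers): {profile}")
```

Because every attribute is coded 0 or 1, each centroid value is the fraction of that cluster's customers with a 1 for that attribute, which is how a figure such as "52 percent purchase" is read off a cluster.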
Activity #9 Cluster Analysis: bank customer segmentation analysis

Scenario

I wonder whether my bank customers fall into distinguishable groups. Also, I wonder what factors (variables) can be used to distinguish one group from another. I am going to do a customer segmentation analysis using a clustering technique.

Now here is the scenario for Activity #9. Here we have Robert. Robert works at a bank in the marketing department. Robert would like to do some targeted promotions of bank products and services. He is just not sure what the “targets” look like. In other words, he believes that his bank customers are not all the same in terms of their needs for bank products and services. Some customers have a greater need for loan products such as auto loans, credit cards, and maybe mortgage loans. Other customers may have a greater need for investment products such as money market accounts, savings, and retirement accounts such as IRAs. If he is going to do a marketing campaign to, say, solicit credit card customers, he would probably be wasting his money and effort trying to recruit older people who may be near retirement and who don’t have much use for another credit card.

What Robert would like to do is customer segmentation analysis, so he can see how and whether his customers fall into distinct groups and identify the factors that are most important in distinguishing one group from another. Robert has learned that cluster analysis can be used to do customer segmentation analysis.
So Robert thinks, “I am going to do customer segmentation analysis using a clustering technique.”

You need this file for Activity #9

Bank Customer Data (600 records)

Field Name          Description
Customer ID         ID code
Age                 Integer
Gender              F or M or J (for Joint account)
Region              Where they live: Inner City, Suburb, Downtown, Rural area
Income              Annual income in dollars
Married?            Y or N
Children            Number of children living in same household
Auto Loan?          Do they have an auto loan account? Y or N
Savings Account?    Do they have a savings account? Y or N
Checking Account?   Do they have a checking account? Y or N
Mortgage?           Do they have a mortgage account? Y or N

Activity #9

I think that the K-means clustering technique would be the best one to use for this analysis. But I don’t know how many clusters there are. What value should I use for K?

So Robert collected the data he thinks will be relevant to this analysis, and he has his data set of 600 customers. He decides to use the K-means clustering technique because it is less complex than other clustering techniques. The drawback of the K-means technique is that you have to guess how many clusters there are before you run the algorithm. Sometimes plotting methods can be used to visualize the data points so you can see whether they fall into clusters and, if so, how many clusters there are. But often, when there are many variables in the data set, it is not possible to visualize clusters. So what Robert is going to do is run the K-means analysis several times, each time incrementing the value of K by one. In other words, he’ll run K-means analysis using 2 for the value of K, then 3, 4, 5, and 6. He will see which value of K provides the best-fit solution.

How to find the best value for K

[Figure: SSE plotted against Number of Clusters (k), with the point of diminishing returns marked]

Run k-means cluster analysis with K = 2, 3, 4, 5, and 6 (five times).
Each time, an SSE statistic is generated.
SSE (sum of squared errors) is the sum of the squared differences between each observation and its group's mean. It can be used as a measure of variation within a cluster.
The lower the SSE is, the better the clustering model.
But every time K increases, SSE decreases. SSE doesn’t go to 0 until K = n (600 cases).
So we find the point of diminishing returns: at what point does SSE no longer decline by much?

SSE when K=2

We are going to guess that our bank customers fall into some number of clusters between 2 and 6. So we will run the cluster analysis 5 times using these different values for K. Each time cluster analysis is run, an SSE statistic is generated. SSE (sum of squared errors) is the sum of the squared differences between each observation and its group's mean. It can be used as a measure of variation within a cluster.
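The procedure described above (run K-means for K = 2 through 6, record the SSE each time, and look for the point of diminishing returns) can be sketched as follows. This is a minimal illustration using only the Python standard library, not WEKA's implementation. The data are synthetic two-dimensional points generated around three centers, not the bank customer file, and every function name here (make_points, kmeans, sse, best_sse) is hypothetical. For this sketch, real bank records would first need their Y/N and categorical fields encoded as numbers.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, seed=0, max_iter=100):
    """Basic K-means: returns (centroids, cluster index for each row)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        for c in range(k):
            members = [r for r, a in zip(rows, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, assignments


def sse(rows, centroids, assignments):
    """Sum of squared differences between each row and its cluster mean."""
    return sum(squared_distance(row, centroids[a])
               for row, a in zip(rows, assignments))


def best_sse(rows, k, restarts=10):
    """Lowest SSE over several random starts (K-means depends on the start)."""
    return min(sse(rows, *kmeans(rows, k, seed=s)) for s in range(restarts))


def make_points(seed=1):
    """Synthetic data: 20 points scattered around each of three centers."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
            for cx, cy in centers for _ in range(20)]


if __name__ == "__main__":
    points = make_points()
    previous = None
    for k in range(2, 7):  # K = 2, 3, 4, 5, 6
        current = best_sse(points, k)
        note = "" if previous is None else f"  (drop: {previous - current:.2f})"
        print(f"K={k}  SSE={current:.2f}{note}")
        previous = current
```

Reading the printed drops from one K to the next is the numerical version of looking for the bend in the SSE plot: the value of K after which the drops become small is the candidate point of diminishing returns.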
Oct 25, 2021