
I need help doing this assignment


Assignment 7: Clustering
Due Tuesday by 11:59pm — Points 80 — Submitting a file upload

I. Do the following experiments. Use the original loan data set (without any discretization or nominalization of the numeric attributes) represented as an ARFF file. Apply the following clustering algorithms with the specified parameter settings. Use the guidelines provided in the Week 13 Handout "Clustering Experiments".

1. k-means (weka.clusterers.SimpleKMeans): Generate two clusters in Classes to clusters evaluation mode. Compare the cluster centroids with the data models generated by the Weka classifiers, that is, if we consider these centroids as instances, how are they classified according to the models produced by OneR or J48?
2. EM (weka.clusterers.EM): Generate two clusters in Classes to clusters evaluation mode (change the default number of clusters to 2). Compare the clusters and their parameters with the class descriptions produced by the Naive Bayes classifier. Are they similar?
3. Cobweb (weka.clusterers.Cobweb): Use training set cluster mode and ignore the class attribute (loan approval). Describe only the top-level split (the successors of the root) in the clustering hierarchy, using the approach described in the Handout "Clustering Experiments". Compute the classes-to-clusters evaluation error for the top-level clustering by hand (see again the Handout "Clustering Experiments"). Weka cannot be used for this purpose because it computes the evaluation for the leaf clusters only.

Hints about parameter tuning:

1. For k-means, try several different seeds (used by the random number generator) to find the best clustering.
2. In EM, use numClusters = 2 and try several seeds as well.
3. In Cobweb, try different cutoffs (category utility threshold) to generate the smallest possible clustering hierarchy (with a minimal number of clusters).

II. Write a report on the experiments described above, including all comparisons, analyses and the answers to all questions. Use tables whenever possible and include only experimental data that are relevant to the analysis.
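The experiments are meant to be run in the Weka Explorer, but for reference the same classes-to-clusters run can also be scripted with the Weka Java API. The sketch below is only an outline: the file name loan.arff and the assumption that the class attribute (loan approval) is the last attribute are placeholders, not part of the assignment. EM and Cobweb can be dropped in for SimpleKMeans in the same pattern (weka.clusterers.EM with setNumClusters(2), weka.clusterers.Cobweb with setCutoff(...)).

```java
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Minimal sketch of a classes-to-clusters run; "loan.arff" and the
// class-is-last-attribute assumption are placeholders for the real data set.
public class LoanClustering {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("loan.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // loan approval assumed to be last

        // The clusterer must not see the class attribute, so remove it first.
        Remove rm = new Remove();
        rm.setAttributeIndices(String.valueOf(data.classIndex() + 1));
        rm.setInputFormat(data);
        Instances train = Filter.useFilter(data, rm);

        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(2);
        km.setSeed(10);                 // try several seeds, as the hints suggest
        km.buildClusterer(train);

        // Classes-to-clusters evaluation uses the original data with the class set.
        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(km);
        eval.evaluateClusterer(data);
        System.out.println(km);                          // cluster centroids
        System.out.println(eval.clusterResultsToString());
    }
}
```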
Attached course material — slides for Chapter 4 (Algorithms: Clustering) of Data Mining: Practical Machine Learning Tools and Techniques by I. H. Witten, E. Frank, M. A. Hall and C. J. Pal.

Clustering
• Clustering techniques apply when there is no class to be predicted: they perform unsupervised learning.
• Aim: divide instances into "natural" groups.
• As we have seen, clusters can be:
  • disjoint vs. overlapping
  • deterministic vs. probabilistic
  • flat vs. hierarchical
• We will look at a classic clustering algorithm called k-means.
• k-means clusters are disjoint, deterministic, and flat.

The k-means algorithm
• Step 1: Choose k random cluster centers.
• Step 2: Assign each instance to its closest cluster center based on Euclidean distance.
• Step 3: Recompute cluster centers by computing the average (aka centroid) of the instances pertaining to each cluster.
• Step 4: If cluster centers have moved, go back to Step 2.
• This algorithm minimizes the squared Euclidean distance of the instances from their corresponding cluster centers.
• It determines a solution that achieves a local minimum of the squared Euclidean distance.
• Equivalent termination criterion: stop when the assignment of instances to cluster centers has not changed.

The k-means algorithm: example
[Figure-only slide: the instances and the initial cluster centres, and the resulting clusters.]

Discussion
• The algorithm minimizes squared distance to the cluster centers.
• The result can vary significantly based on the initial choice of seeds.
• It can get trapped in a local minimum.
• To increase the chance of finding the global optimum: restart with different random seeds.
• Hierarchical k-means: apply recursively with k = 2 to the resulting clusters.
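To make the four steps above concrete, here is a small self-contained sketch of the basic algorithm on toy two-dimensional points. The data, k = 2 and the seed are made up for illustration; this is not the Weka implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Bare-bones k-means (Lloyd's algorithm) on 2-D points, for illustration only.
public class KMeansSketch {
    public static void main(String[] args) {
        double[][] x = {{1, 1}, {1.5, 2}, {3, 4}, {5, 7}, {3.5, 5}, {4.5, 5}, {3.5, 4.5}};
        int k = 2;
        Random rnd = new Random(10);    // the seed matters: different seeds, different local minima

        // Step 1: choose k distinct random instances as the initial centers.
        List<double[]> pool = new ArrayList<>(Arrays.asList(x));
        Collections.shuffle(pool, rnd);
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = pool.get(c).clone();

        int[] assign = new int[x.length];
        Arrays.fill(assign, -1);
        boolean changed = true;
        while (changed) {               // Step 4: repeat until assignments stop changing
            changed = false;
            // Step 2: assign each instance to its closest center (squared Euclidean distance).
            for (int i = 0; i < x.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < x[i].length; j++) {
                        double diff = x[i][j] - centers[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            // Step 3: recompute each center as the mean (centroid) of its instances.
            for (int c = 0; c < k; c++) {
                double[] sum = new double[x[0].length];
                int count = 0;
                for (int i = 0; i < x.length; i++) {
                    if (assign[i] != c) continue;
                    for (int j = 0; j < sum.length; j++) sum[j] += x[i][j];
                    count++;
                }
                if (count > 0) {
                    for (int j = 0; j < sum.length; j++) centers[c][j] = sum[j] / count;
                }
            }
        }
        System.out.println("Centroids:   " + Arrays.deepToString(centers));
        System.out.println("Assignments: " + Arrays.toString(assign));
    }
}
```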
Faster distance calculations
• Can we use kD-trees or ball trees to speed up the process? Yes, we can:
  • First, build the tree data structure, which remains static, for all the data points.
  • At each node, store the number of instances and the sum of all instances (summary statistics).
  • In each iteration of k-means, descend the tree and find out which cluster each node belongs to.
  • We can stop descending as soon as we find out that a node belongs entirely to a particular cluster.
  • Use the summary statistics stored previously at the nodes to compute the new cluster centers.

Example scenario (using a ball tree)
[Figure-only slide.]

Choosing the number of clusters
• Big question in practice: what is the right number of clusters, i.e., what is the right value for k?
• We cannot simply optimize squared distance on the training data to choose k: squared distance decreases monotonically with increasing values of k.
• We need some measure that balances distance with the complexity of the model, e.g., based on the MDL principle (covered later).
• Finding the right-size model using MDL becomes easier when applying a recursive version of k-means (bisecting k-means):
  • Compute A: the information required to store the data centroid and the location of each instance with respect to this centroid.
  • Split the data into two clusters using 2-means.
  • Compute B: the information required to store the two new cluster centroids and the location of each instance with respect to these two.
  • If A > B, split the data and recurse (just like in other tree learners).

Hierarchical clustering
• Bisecting k-means performs hierarchical clustering in a top-down manner.
• Standard hierarchical clustering works bottom-up; it performs agglomerative clustering:
  • First, make each instance in the dataset into a trivial mini-cluster.
  • Then, find the two closest clusters and merge them; repeat.
  • Clustering stops when all clusters have been merged into a single cluster.
• The outcome is determined by the distance function that is used:
  • Single-linkage clustering: the distance of two clusters is measured by finding the two closest instances, one from each cluster, and taking their distance.
  • Complete-linkage clustering: use the two most distant instances instead.
  • Average-linkage clustering: take the average distance between all instances.
  • Centroid-linkage clustering: take the distance of the cluster centroids.
  • Group-average clustering: take the average distance in the merged clusters.
  • Ward's method: optimize the k-means criterion (i.e., squared distance).

Example: complete linkage / Example: single linkage
[Figure-only slides.]
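A corresponding sketch of bottom-up agglomerative clustering with single linkage, again on made-up one-dimensional data; replacing the linkage function with the maximum or mean pairwise distance gives the complete- and average-linkage variants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Agglomerative (bottom-up) clustering with single linkage on toy 1-D data.
public class SingleLinkageSketch {
    public static void main(String[] args) {
        double[] x = {1.0, 1.2, 4.0, 4.1, 9.0, 9.3};

        // Start: every instance is its own trivial mini-cluster (stored as index lists).
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < x.length; i++) clusters.add(new ArrayList<>(Arrays.asList(i)));

        // Repeatedly merge the two closest clusters until only one remains.
        while (clusters.size() > 1) {
            int bestA = 0, bestB = 1;
            double bestDist = Double.MAX_VALUE;
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = singleLinkage(clusters.get(a), clusters.get(b), x);
                    if (d < bestDist) { bestDist = d; bestA = a; bestB = b; }
                }
            }
            System.out.println("Merging " + clusters.get(bestA) + " and " + clusters.get(bestB)
                    + " at distance " + bestDist);
            clusters.get(bestA).addAll(clusters.get(bestB));
            clusters.remove(bestB);
        }
    }

    // Single linkage: distance between the two closest members, one from each cluster.
    static double singleLinkage(List<Integer> a, List<Integer> b, double[] x) {
        double min = Double.MAX_VALUE;
        for (int i : a)
            for (int j : b)
                min = Math.min(min, Math.abs(x[i] - x[j]));
        return min;
    }
}
```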
Incremental clustering
• Heuristic approach (COBWEB/CLASSIT).
• Forms a hierarchy of clusters incrementally.
• Start: the tree consists of an empty root node.
• Then: add instances one by one and update the tree appropriately at each stage.
  • To update, find the right leaf for an instance; this may involve restructuring the tree using merging or splitting of nodes.
• Update decisions are based on a goodness measure called category utility.

Clustering the weather data
  ID  Outlook   Temp.  Humidity  Windy
  A   Sunny     Hot    High      False
  B   Sunny     Hot    High      True
  C   Overcast  Hot    High      False
  D   Rainy     Mild   High      False
  E   Rainy     Cool   Normal    False
  F   Rainy     Cool   Normal    True
  G   Overcast  Cool   Normal    True
  H   Sunny     Mild   High      False
  I   Sunny     Cool   Normal    False
  J   Rainy     Mild   Normal    False
  K   Sunny     Mild   Normal    True
  L   Overcast  Mild   High      True
  M   Overcast  Hot    Normal    False
  N   Rainy     Mild   High      True
[Figures: the clustering tree built incrementally over stages 1–5.]
• Merge the best host and the runner-up.
• Consider splitting the best host if merging does not help.

Final clustering
[Figure-only slide.]

Example: clustering a subset of the iris data
[Figure-only slide.]

Example: iris data with cutoff
[Figure-only slide.]

The category utility measure
• Category utility: a quadratic loss function defined on conditional probabilities:

  CU(C_1, C_2, \ldots, C_k) = \frac{\sum_l P(C_l) \sum_i \sum_j \left( P(a_i = v_{ij} \mid C_l)^2 - P(a_i = v_{ij})^2 \right)}{k}

• If every instance is in a different category, the numerator becomes

  m - \sum_i \sum_j P(a_i = v_{ij})^2

  where m is the number of attributes (the maximum value of the conditional-probability term).

Numeric attributes?
• Assume a normal distribution:

  f(a) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(a-\mu)^2}{2\sigma^2}}

• Then:

  \sum_j P(a_i = v_{ij})^2 \;\Leftrightarrow\; \int f(a_i)^2 \, da_i = \frac{1}{2\sqrt{\pi}\,\sigma_i}

• Thus

  CU(C_1, \ldots, C_k) = \frac{\sum_l P(C_l) \sum_i \sum_j \left( P(a_i = v_{ij} \mid C_l)^2 - P(a_i = v_{ij})^2 \right)}{k}

  becomes

  CU = \frac{1}{k} \sum_l P(C_l) \, \frac{1}{2\sqrt{\pi}} \sum_i \left( \frac{1}{\sigma_{il}} - \frac{1}{\sigma_i} \right)

• A pre-specified minimum variance can be enforced to combat overfitting (called the acuity parameter).
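To make the nominal-attribute formula concrete, the following sketch computes the category utility of a candidate two-cluster partition. The toy instances and the partition are invented for illustration, but the score is of the kind Cobweb uses when deciding where to place a new instance.

```java
import java.util.HashMap;
import java.util.Map;

// Category utility for a partition of nominal instances:
// CU = (1/k) * sum_l P(C_l) * sum_i sum_j ( P(a_i=v_ij | C_l)^2 - P(a_i=v_ij)^2 )
public class CategoryUtilitySketch {
    public static void main(String[] args) {
        // Toy nominal data (rows = instances, columns = attributes), made up for illustration.
        String[][] data = {
            {"Sunny", "Hot"}, {"Sunny", "Hot"}, {"Rainy", "Cool"},
            {"Rainy", "Cool"}, {"Overcast", "Mild"}, {"Rainy", "Mild"}
        };
        int[] clusterOf = {0, 0, 1, 1, 0, 1};   // candidate 2-cluster partition
        System.out.println("CU = " + categoryUtility(data, clusterOf, 2));
    }

    static double categoryUtility(String[][] data, int[] clusterOf, int k) {
        int n = data.length, m = data[0].length;
        double cu = 0;
        for (int l = 0; l < k; l++) {
            int nl = 0;
            for (int c : clusterOf) if (c == l) nl++;
            if (nl == 0) continue;
            double inner = 0;
            for (int i = 0; i < m; i++) {
                // Counts of each value of attribute i, overall and within cluster l.
                Map<String, Integer> overall = new HashMap<>();
                Map<String, Integer> inCluster = new HashMap<>();
                for (int r = 0; r < n; r++) {
                    overall.merge(data[r][i], 1, Integer::sum);
                    if (clusterOf[r] == l) inCluster.merge(data[r][i], 1, Integer::sum);
                }
                for (String v : overall.keySet()) {
                    double pCond = inCluster.getOrDefault(v, 0) / (double) nl;
                    double pAll = overall.get(v) / (double) n;
                    inner += pCond * pCond - pAll * pAll;
                }
            }
            cu += (nl / (double) n) * inner;   // P(C_l) * inner sum
        }
        return cu / k;
    }
}
```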
Multi-instance learning
• Recap: multi-instance learning is concerned with examples corresponding to sets (really, bags or multi-sets) of instances.
• All instances have the same attributes, but the full set of instances is split into subsets of related instances that are not necessarily independent.
• These subsets are the examples for learning.
• Example applications of multi-instance learning: image classification, classification of molecules.
• The simplicity-first methodology can be applied to multi-instance learning with surprisingly good results.
• Two simple approaches to multi-instance learning, both using standard single-instance learners:
  • Manipulate the input to learning.
  • Manipulate the output of learning.

Aggregating the input
• Idea: convert the multi-instance learning problem into a single-instance one.
• Summarize the instances in a bag by computing the mean, mode, minimum and maximum, etc., of each attribute as new attributes.
• The "summary" instance retains the class label of its bag.
• To classify a new bag the same process is used.
• Any single-instance learning method, e.g., decision tree learning, can be applied to the resulting data.
• This approach discards most of the information in a bag of instances but works surprisingly well in practice.
• It should be used as a baseline when evaluating more advanced approaches to multi-instance learning.
• More sophisticated region-based summary statistics can be applied to extend this basic approach.

Aggregating the output
• Idea: learn a single-instance classifier directly from the original instances in all the bags; each instance is given the class of the bag it originates from.
• But: bags can contain differing numbers of instances, so give each instance a weight inversely proportional to the bag's size.
• To classify a new bag:
  • Produce a prediction for each instance in the bag.
  • Aggregate the predictions to produce a prediction for the bag as a whole.
  • One approach: treat predictions as votes for the various class labels; alternatively, average the class probability estimates.
• This approach treats all instances as independent at training time, which is not the case in true multi-instance applications.
• Nevertheless, it often works very well in practice and should be used as a baseline.

Clustering
• Clustering is an unsupervised learning method: there is no target value (class label) to be predicted; the goal is finding common patterns or grouping similar examples.
• Differences between models/algorithms for clustering:
  – Conceptual


Mohd answered on Apr 06 2022
1. k-means (weka.clusterers.SimpleKMeans): Generate two clusters in Classes to clusters evaluation mode. Compare the cluster centroids with the data models generated by the Weka classifiers, that is, if we consider these centroids as instances, how are they classified according to the models produced by OneR or J48?
A. We have already run k-means clustering on the data to obtain two clusters; each cluster centroid can then be treated as a new instance and classified with the J48 model to find out which class it belongs to. J48 has higher accuracy than the k-means clustering.
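For reference, the J48 model and the k-means centroids being compared can be printed side by side with a short Weka API sketch (the file name loan.arff and the class-attribute position are assumptions); the printed centroids can then be traced down the printed tree by hand.

```java
import weka.classifiers.trees.J48;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Prints the J48 tree and the k-means centroids so the centroids can be
// classified by following the tree by hand. File name and class position assumed.
public class CentroidsVsJ48 {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("loan.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);                        // the decision tree

        // Cluster the data with the class attribute removed.
        Remove rm = new Remove();
        rm.setAttributeIndices(String.valueOf(data.classIndex() + 1));
        rm.setInputFormat(data);
        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(2);
        km.setSeed(10);
        km.buildClusterer(Filter.useFilter(data, rm));
        System.out.println(km.getClusterCentroids());    // centroids as instances
    }
}
```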
     
Results for weka.clusterers.SimpleKMeans (classes-to-clusters evaluation):

  Seed   Number of clusters   Incorrectly classified instances
  10     2                    35%
  2      2                    45%
  5      2                    45%
  15     2                    45%
  20     2                    35%
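The seed sweep behind this table can be scripted as a single loop (a sketch under the same assumptions about the ARFF file and class attribute); the percentages above correspond to the "incorrectly clustered instances" line of the printed evaluation.

```java
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Repeats the classes-to-clusters evaluation of SimpleKMeans for several seeds.
public class SeedSweep {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("loan.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        Remove rm = new Remove();
        rm.setAttributeIndices(String.valueOf(data.classIndex() + 1));
        rm.setInputFormat(data);
        Instances train = Filter.useFilter(data, rm);

        for (int seed : new int[]{2, 5, 10, 15, 20}) {
            SimpleKMeans km = new SimpleKMeans();
            km.setNumClusters(2);
            km.setSeed(seed);
            km.buildClusterer(train);

            ClusterEvaluation eval = new ClusterEvaluation();
            eval.setClusterer(km);
            eval.evaluateClusterer(data);
            System.out.println("=== seed " + seed + " ===");
            System.out.println(eval.clusterResultsToString());
        }
    }
}
```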
2. EM (weka.clusterers.EM): Generate two clusters in Classes to clusters evaluation mode (change the default number of clusters to 2). Compare the clusters and their parameters with the class descriptions produced by the Naive Bayes classifier. Are they similar?
A. They are somewhat similar in terms of the class attribute levels and the number of clusters. In Naive Bayes we have a distinct response variable used to classify the data, whereas in EM clustering we have two clusters to...