Sys800_Lab2_Fall2021_EN

Assignment 2: Classification methods

I – Objective

In the first laboratory, you first extracted feature vectors from digit images using the projection and LBP techniques; you then reduced the dimension of the feature vectors using PCA. During this second laboratory, you are going to implement, test and compare different classification algorithms.

II - Introduction

First of all, classification algorithms can be grouped into two categories: those acting through modeling and those acting through separation (see Figure 1). The objective of the first category (Figure 1-a) is to determine the most accurate possible model of each class, while the second type (Figure 1-b) seeks to optimize a decision boundary that separates the classes as well as possible. In the first case, the decision is made on the basis of a similarity measure comparing a given sample to each class model; in the second case, it is made on the basis of the position of the sample relative to the separation boundaries.

Figure 1. Two categories of classification algorithms: (a) classification by modeling; (b) classification by separation.

III - Classification Algorithms

As part of this laboratory, three classification approaches will be discussed: one acting by modeling (Quadratic Bayes and k-NN), another acting by separation (SVM), and a last one combining the decisions of several classifiers (Random Forests).

III.1. Quadratic Bayes

It is conventional in pattern recognition to use parametric approaches, which consist of making an assumption about the nature of the data distribution and estimating the distribution parameters from the training data. The Quadratic Bayes classifier is thus based on the assumption that the data of each class follow a multivariate normal distribution defined by

p(\mathbf{x} \mid \omega_j) = \frac{1}{(2\pi)^{d/2} \, |\Sigma_j|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_j)^T \Sigma_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j) \right),    (1)

where Σ_j and µ_j are the covariance matrix and the mean vector of class ω_j, d is the dimension of the feature vectors, and |Σ_j| denotes the determinant of the covariance matrix of class ω_j.

The training phase consists in using the training data to estimate, for each class, the mean vector and the covariance matrix. These parameters are then saved so that they can be used later to classify the test examples, using Bayes' rule to estimate the probabilities of unseen observations:

P(\omega_j \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_j) \, P(\omega_j)}{p(\mathbf{x})},    (2)

where P(ω_j) represents the a priori probability of class ω_j and p(x) represents the evidence, defined by

p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j) \, P(\omega_j),    (3)

where c represents the total number of classes. In practice, it is preferable to compute the discriminant functions associated with each class, obtained by taking the natural logarithm of equation (2) and removing the constant terms:

d_j(\mathbf{x}) = \ln \hat{P}(\omega_j) - \frac{1}{2} \ln |\hat{\Sigma}_j| - \frac{1}{2} (\mathbf{x} - \hat{\boldsymbol{\mu}}_j)^T \hat{\Sigma}_j^{-1} (\mathbf{x} - \hat{\boldsymbol{\mu}}_j).    (4)

Note that in our case the ten classes are equiprobable, i.e. all the classes have the same prior probability P(ω_j) = 1/c; the first term of Eq. (4) therefore does not influence the decision and can be removed. Moreover, it is interesting to note that the last term of equation (4) is, up to the factor −1/2, the squared Mahalanobis distance between the sample to classify and the distribution of class ω_j. Finally, the decision is made by looking for the maximum value among the discriminant functions associated with the classes.
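As a concrete illustration of the training and decision phases described above, here is a minimal sketch of a quadratic Bayes classifier implementing the discriminant of Eq. (4). It is illustrative only, not the required implementation for the lab: the function names (fit_quadratic_bayes, predict_quadratic_bayes) are ours, and it assumes equiprobable classes (so the prior term is dropped) and integer class labels.

```python
import numpy as np

def fit_quadratic_bayes(X, y):
    """Training phase: estimate the mean vector and covariance matrix of each class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def predict_quadratic_bayes(params, X):
    """Decision phase: pick the class maximizing d_j(x) of Eq. (4).
    The prior term ln P(omega_j) is dropped since the classes are equiprobable."""
    classes = sorted(params)
    scores = np.empty((len(X), len(classes)))
    for k, c in enumerate(classes):
        mu, sigma = params[c]
        inv = np.linalg.inv(sigma)
        _, logdet = np.linalg.slogdet(sigma)              # numerically stable ln|Sigma|
        diff = X - mu
        maha = np.einsum('ij,jk,ik->i', diff, inv, diff)  # squared Mahalanobis distances
        scores[:, k] = -0.5 * logdet - 0.5 * maha
    return np.array(classes)[scores.argmax(axis=1)]
```

After PCA the covariance matrices are usually well conditioned, but if the inversion fails, adding a small ridge (sigma + eps * np.eye(d)) is a common fix.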
III.2. Random Forests

To avoid having to make an assumption about the nature of the data distribution, it is possible to use a non-parametric method such as the Random Forests algorithm.

The Random Forests method is a supervised learning algorithm that belongs to the family of ensemble learning methods. Its main idea is to combine several decision trees, each built using a different sample of the same training base. The predicted class is obtained by taking the majority vote over the decisions of all the trees.

A decision tree, which is at the heart of random forests, is a decision-making process. It can be illustrated as a decision diagram containing a root node, interior nodes, and leaves, all connected by branches. Like a biological tree, a decision tree diagram begins at the root node and branches out from node to node until it reaches a leaf. Each node asks a question in order to make a decision, and the branches represent the different answers that this node can take.

Figure 2. Concept of decision trees: a root node and interior nodes connected by Yes/No branches and ending in leaf nodes.

As can be seen in Figure 3, the Random Forests algorithm is composed of several decision trees, each with the same kind of nodes but trained on different data, which leads to different leaves. The algorithm merges the multiple decisions of the individual trees in order to make a final decision. The algorithm is called random for two main reasons: for each tree, the subset of training data is chosen at random; and at each node, the subset of candidate variables is chosen at random.

Figure 3. Random forest method: the dataset is divided into random subsets, one per tree; each tree predicts a class and the final decision is made by majority voting.
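The bootstrap-and-vote idea of Figure 3 can be sketched as follows. This is a conceptual sketch under our own assumptions (integer class labels, scikit-learn's DecisionTreeClassifier as the base tree, our own function names); in practice, sklearn.ensemble.RandomForestClassifier implements the same algorithm directly.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=100, seed=0):
    """Build n_trees decision trees, each on a bootstrap sample of the training base.
    Randomness comes both from the bootstrap sample (random data subset per tree)
    and from the random subset of features examined at each node."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # sampling with replacement
        tree = DecisionTreeClassifier(max_features='sqrt')
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest

def predict_forest(forest, X):
    """Majority vote over the predictions of all trees (Figure 3)."""
    votes = np.stack([tree.predict(X) for tree in forest])   # (n_trees, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```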
Adankon, "Optimisation de ressources pour la sélection de modèle des SVM ", Mémoire de Master, École de Technologie Supérieure, Montréal, Septembre 2005 yiαi K (xi ,x ) i =1 n ∑ +b = 0 ú û ù ê ë é - åå == n ji jijiji n i i xxKyy 1,1 ),( 2 1max aaa a yiαi i=1 n ∑ = 0 0 ≤αi ≤C, i =1,...,n f (x ) = yiαi K (xi ,x ) i =1 m ∑ +b 5 However, in the case of real applications, the data is unfortunately not always linearly separable. It will then be necessary to use other types of kernels, such as polynomial kernels ?,?# , ?"- = (??# ∙ ?" + ?)', and gaussian kernel ?,?# , ?"- = exp(−?C?# − ?"C (). An illustration of the utility of the polynomial kernel is shown in Figure 5. The data can then not be separated by a straight boundary, but the use of a 2nd order polynomial kernel to determine a non-linear boundary of separation is better adapted. (a) (b) Figure 5. Borders of separation obtained by a linear SVM (a) and non-linear SVM (b) Moreover, in most pattern recognition problems, the number of classes is greater than two. It is then necessary to decompose the classification problem into binary sub-problems that can be solved by different SVMs. The simplest strategy to implement is to build many SVMs. Each SVM is then driven to distinguish samples from one class to those of all other classes. One strategy is "one against all" and the decision is made by looking for the maximum value among the outputs of the various SVMs. IV - Role of different databases The training data will be used to build the three classifiers, while the test database is used to evaluate their ability of generalization. Indeed, it is common that the error rate evaluated on the training database is much lower than that estimated using other data. Therefore, this phenomenon of data overfitting justifies the use of a second database to evaluate the ability of a classifier to generalize. Moreover, in order to be rigorous, no parameters of the classifier will be set taking into account the results obtained on the test database. Thus, in some cases it may be necessary to divide the training set into two subsets. Part of the data is then used for training the classifiers while the rest is used as the base to optimize some parameters of the classifier, such as the number of k neighbors for k-NN, or the hyperparameters (kernel parameters and C) for SVM. The second subset is called validation database and generally consists of one third (1/3) of the data for the training database. V - Criteria for comparing the classifiers The primary criterion to compare different classification algorithms will be the ability to generalize, which will be evaluated by measuring the