Machine Learning: Including both Python programming problems and writings

Question


CSci 5521: Fall’20 Introduction To Machine Learning
Homework 2 (Due Friday, October 23, 11:59 pm)

1. (25 points) Let $\mathcal{X} = \{x_1, \ldots, x_n\}$ be a set of $n$ samples drawn i.i.d. from a univariate distribution with density function $p(x|\theta)$, where $\theta$ is an unknown parameter. In general, $\theta$ will belong to a specified subset of $\mathbb{R}$, the set of real numbers. For the following choices of $p(x|\theta)$, derive the maximum likelihood estimate of $\theta$ based on the samples $\mathcal{X}$:¹

(a) (5 points) $p(x|\theta) = \frac{1}{\sqrt{2\pi}\,\theta} \exp\left(-\frac{(x-2)^2}{2\theta^2}\right)$, $\theta > 0$.

(b) (5 points) $p(x|\theta) = \frac{1}{\theta} \exp\left(-\frac{x}{\theta}\right)$, $0 \le x < \infty$, $\theta > 0$.

(c) (5 points) $p(x|\theta) = \frac{1}{2\theta^3}\, x^2 \exp\left(-\frac{x}{\theta}\right)$, $0 \le x < \infty$, $\theta > 0$.

(d) (5 points) $p(x|\theta) = \theta x^{\theta-1}$, $0 \le x \le 1$, $0 < \theta < \infty$.

(e) (5 points) $p(x|\theta) = \frac{1}{\theta}$, $0 \le x \le \theta$, $\theta > 0$.

2. (25 points) Let $\mathcal{X} = \{x_1, \ldots, x_n\}$, $x_i \in \mathbb{R}^d$, be a set of $n$ samples drawn i.i.d. from a multivariate Gaussian distribution in $\mathbb{R}^d$ with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$. Recall that the density function of a multivariate Gaussian distribution is given by:

$p(x|\mu,\Sigma) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right]$.

(a) (10 points) Derive the maximum likelihood estimates for the mean $\mu$ and covariance $\Sigma$ based on the sample set $\mathcal{X}$.¹,²

(b) (7 points) Let $\hat{\mu}_n$ be the maximum likelihood estimate of the mean. Is $\hat{\mu}_n$ a biased estimate of the true mean $\mu$? Clearly justify your answer by computing $E[\hat{\mu}_n]$, the expectation of $\hat{\mu}_n$.

(c) (8 points) Let $\hat{\Sigma}_n$ be the maximum likelihood estimate of the covariance matrix. Is $\hat{\Sigma}_n$ a biased estimate of the true covariance $\Sigma$? Clearly justify your answer by computing $E[\hat{\Sigma}_n]$, the expectation of $\hat{\Sigma}_n$.

Programming assignment: The next problem involves programming. For Question 3, we will be using the 2-class classification datasets from Boston50, Boston75, and the 10-class classification dataset from Digits which were used in Homework 1.

3.
(50 points) We will develop two parametric classifiers by modeling each class’s conditional distribution $p(x|C_i)$ as multivariate Gaussians with (a) full covariance matrix $\Sigma_i$ and (b) diagonal covariance matrix $\Sigma_i$. In particular, using the training data, we will compute the maximum likelihood estimate of the class prior probabilities $p(C_i)$ and the class conditional probabilities $p(x|C_i)$ based on the maximum likelihood estimates of the mean $\hat{\mu}_i$ and the (full/diagonal) covariance $\hat{\Sigma}_i$ for each class $C_i$. The classification will be done based on the following discriminant function:

$g_i(x) = \log p(C_i) + \log p(x|C_i)$.

We will develop code for a class MultiGaussClassify with two key functions: MultiGaussClassify.fit(self,X,y) and MultiGaussClassify.predict(self,X).

For the class, the __init__(self,k,d,diag) function can initialize the parameters for each class to be uniform prior, zero mean, and identity covariance, i.e., $p(C_i) = 1/k$, $\mu_i = 0$ and $\Sigma_i = I$, $i = 1, \ldots, k$. Here, the number of classes k and the dimensionality d of features are passed as arguments to the constructor of MultiGaussClassify. Further, diag is a boolean (True or False) which indicates whether the estimated class covariance matrices should be full matrices (diag=False) or diagonal matrices (diag=True).

For fit(self,X,y), the inputs (X, y) are respectively the feature matrix and class labels, and the function will learn (estimate) the parameters for each class, i.e., $p(C_i)$, $\mu_i$, and $\Sigma_i$, $i = 1, \ldots, k$.

For predict(self,X), the input X is the feature matrix corresponding to the test set and the output should be the predicted labels for each point in the test set.

¹ You have to show the details of your derivation. A correct answer without the details will not get any credit.
² You can use material from the Matrix Cookbook and/or the textbook for your derivation.
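As an illustration of the discriminant function described above, expanding the Gaussian log-density gives $g_i(x) = \log p(C_i) - \frac{d}{2}\log 2\pi - \frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)$. The following is a minimal sketch of evaluating that expression with NumPy, assuming the priors, means, and covariances have already been estimated. The function name gaussian_discriminants and its argument layout are hypothetical and are not part of the assignment template.

```python
import numpy as np


def gaussian_discriminants(X, priors, means, covs):
    """Return an (n_samples, k) array whose column i holds g_i(x).

    Hypothetical helper: priors is a length-k sequence, means a sequence of
    (d,) arrays, covs a sequence of (d, d) arrays (full or diagonal).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    scores = np.empty((n, len(priors)))
    for i, (prior, mean, cov) in enumerate(zip(priors, means, covs)):
        diff = X - mean                          # (n, d) deviations from the class mean
        _, logdet = np.linalg.slogdet(cov)       # log|Sigma_i| without overflow
        # Solve Sigma_i z = diff^T rather than forming the inverse explicitly.
        z = np.linalg.solve(cov, diff.T).T       # (n, d)
        maha = np.sum(diff * z, axis=1)          # squared Mahalanobis distances
        scores[:, i] = np.log(prior) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    return scores
```

A predicted label for each row would then be the index of the largest score, for example np.argmax(scores, axis=1). One caveat under this sketch: a maximum likelihood covariance estimate can be singular (for instance, when a feature is constant within a class), in which case np.linalg.solve fails. Adding a small multiple of the identity to the covariance is one common workaround, but that is a design choice the assignment text does not specify.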
We will compare the performance of three models: (i) MultiGaussClassify with full class covariance matrices, (ii) MultiGaussClassify with diagonal covariance matrices, and (iii) LogisticRegression³ applied to three datasets: Boston50, Boston75, and Digits. Using cross_val_score with 5-fold cross-validation, report the error rates in each fold as well as the mean and standard deviation of error rates across folds for the three models applied to the three classification datasets.

You will have to submit (a) code and (b) summary of results:

(a) Code: You will have to submit code for MultiGaussClassify as well as a wrapper code hw2q3(). For the class, please use the following template:

    class MultiGaussClassify:
        def __init__(self, k, d, diag=False):
            ...
        def fit(self, X, y):
            ...
        def predict(self, X):
            ...

Your class MultiGaussClassify should not inherit any base class in sklearn. Again, the three functions you must implement in the MultiGaussClassify class are __init__, fit, and predict.

The wrapper code hw2q3() (main file) has no input and is used to prepare the datasets and make calls to cross_val_score(method,X,y,cv) to generate the error rate results for each dataset and each method. You should use cross_val_score() in sklearn for cross-validation. For the method argument in cross_val_score, you can call the method corresponding to MultiGaussClassify with full covariance matrix just ‘multigaussclassify’ and the method corresponding to MultiGaussClassify with diagonal covariance matrix ‘multigaussdiagclassify’. The results should be printed to the terminal (without generating an additional file in the folder). Make sure the calls to cross_val_score(method,X,y,k) are made in the following order, and add a print to the terminal before each call to show which method and dataset is being used:

1. MultiGaussClassify with full covariance matrix on Boston50,
2. MultiGaussClassify with full covariance matrix on Boston75,
3. MultiGaussClassify with full covariance matrix on Digits,
4. MultiGaussClassify with diagonal covariance matrix on Boston50,
5. MultiGaussClassify with diagonal covariance matrix on Boston75,
6. MultiGaussClassify with diagonal covariance matrix on Digits,
7. LogisticRegression with Boston50,
8. LogisticRegression with Boston75, and
9. LogisticRegression with Digits.

For example, the first call to cross_val_score(method,X,y,k) should result in the following output:

    Error rates for MultiGaussClassify with full covariance matrix on Boston50:
    Fold 1: ###
    Fold 2: ###
    ...
    Fold 5: ###
    Mean: ###
    Standard Deviation: ###

(b) Summary of results: For each dataset and each method, report the test set error rates for each of the k = 5 folds, the mean error rate over the k folds, and the standard deviation of the error rates over the k folds. Make a table to present the results for each method and each dataset (9 tables in total). Each column of the table represents a fold; add two columns at the end to show the overall mean error rate and standard deviation over the k folds. For example:

    Error rates for MGC with full cov matrix on Boston50
    Fold 1   Fold 2   Fold 3   Fold 4   Fold 5   Mean   SD
    #        #        #        #        #        #      #

³ You should use LogisticRegression from scikit-learn, similar to HW1.

Additional instructions: Code can only be written in Python (not IPython notebook); no other programming languages will be accepted. One should be able to execute all programs directly from the command prompt (e.g., “python3 hw2q3.py”) without the need to run the Python interactive shell first. You are allowed to use only LogisticRegression and cross_val_score from scikit-learn. Test your code yourself before submission and suppress any warning messages that may be printed. Your code must run on a CSE lab machine (e.g., csel-kh1260-01.cselabs.umn.edu).
Please make sure you specify the version of Python you are using as well as instructions on how to run your program in the README file (must be readable through a text editor such as Notepad). Information on the size of the datasets, including number of data points and dimensionality of features, as well as number of classes can be readily extracted from the datasets in scikit-learn. Each function must take the inputs in the order specified in the problem and display the output via the terminal or as specified. For each part, you can submit additional files/functions (as needed) which will be used by the main file. Please put comments in your code so that one can follow the key parts and steps in your code. Follow the rules strictly. If we cannot run your code, you will not get any credit.

• Things to submit

1. hw2.pdf: A document which contains the solution to Problems 1, 2, and 3, including the summary of results for 3. This document must be in PDF format (Word documents, photos, etc. are not accepted). If you submit a scanned copy of a hand-written document, make sure the copy is clearly readable; otherwise no credit may be given.
2. Python code for Problem 3 (must include the required hw2q3.py).
3. README.txt: README file that contains your name, student ID, email, instructions on how to run your code, the full Python version you are using, any assumptions you are making, and any other necessary details. The file must be readable by a text editor such as Notepad.
4. Any other files, except the data, which are necessary for your code.
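The terminal output layout requested in Problem 3(a) could be produced by a small helper along these lines. This is only a sketch: the name format_error_report, the four-decimal formatting, and the use of the population standard deviation are assumptions, since the assignment does not state which standard deviation convention to use. If the fold scores returned by cross_val_score are accuracies, each error rate would be one minus the corresponding score.

```python
import statistics


def format_error_report(title, fold_errors):
    """Build the report text for one method/dataset pair (hypothetical helper)."""
    lines = [f"Error rates for {title}:"]
    for i, err in enumerate(fold_errors, start=1):
        lines.append(f"Fold {i}: {err:.4f}")
    lines.append(f"Mean: {statistics.mean(fold_errors):.4f}")
    # Assumption: population standard deviation (divide by k), which matches
    # NumPy's default; use statistics.stdev for the sample version instead.
    lines.append(f"Standard Deviation: {statistics.pstdev(fold_errors):.4f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up numbers purely to show the layout; these are not real results.
    demo_errors = [0.20, 0.10, 0.30, 0.20, 0.20]
    print(format_error_report(
        "MultiGaussClassify with full covariance matrix on Boston50", demo_errors))
```

Returning a string rather than printing directly keeps the helper easy to check and lets the wrapper print the nine reports in the required order.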

