Homework #4 (Problem Set)
CSE 3521/5521, Autumn 2021
Due: Friday, November 12 by 11:59pm. Worth 75 points.

I. Preparation

1. Read/study the assigned reading sections from AI: A Modern Approach, 3rd edition (Russell and Norvig).
2. Read/study the assigned reading sections from PRML: Pattern Recognition and Machine Learning (Bishop).

II. Collaboration & Piazza

Please read the course policy on Academic Misconduct in the syllabus. You may discuss the problems with your classmates at a high level only; you must formulate your own solution without any help from others or from third-party sources. In your solution, list with whom you discussed each problem (please do so on the first page). Do not post any part of your solution, or spoilers, on Piazza.

III. Guidelines

You will place your solutions to this portion of the homework (Problem Set) into a single PDF file named HW4_problems_name_dotnumber.pdf (e.g., HW4_problems_shareef_1.pdf). It is highly suggested that you type your solutions directly and save them as a PDF file. If you write your solutions on paper and scan them, you run the risk that the TA will not be able to read your solution; in that case the TA will not give credit for any portion that is not readable. Write down the derivation of your answers, and highlight your answers clearly!

IV. Problems

Linear Regression (7 points)

(i) In lecture we derived a closed-form solution for linear regression:

$\tilde{w} = (\tilde{X}\tilde{X}^\top)^{-1}\tilde{X}y$,

where $\tilde{X} = [\tilde{x}_1, \dots, \tilde{x}_N]$, $\tilde{x}_n = [x_n^\top, 1]^\top$, and $y = [y_1, \dots, y_N]^\top$. A potential problem is that $\tilde{X}\tilde{X}^\top$ may not be invertible. Consider a different minimization problem to find $\tilde{w}$. Again, let $E(\tilde{w}) = \sum_n (\tilde{w}^\top \tilde{x}_n - y_n)^2$. Instead of minimizing $E(\tilde{w})$, we will minimize

$E(\tilde{w}) + \lambda \|\tilde{w}\|_2^2 = E(\tilde{w}) + \lambda\, \tilde{w}^\top \tilde{w}$, where $\lambda > 0$.

[8 points] Show that the solution $\tilde{w}$ which minimizes $E(\tilde{w}) + \lambda\, \tilde{w}^\top \tilde{w}$ is

$\tilde{w} = (\tilde{X}\tilde{X}^\top + \lambda I)^{-1}\tilde{X}y$,

where $I$ is a $(D+1)$-by-$(D+1)$ identity matrix and $D$ is the dimensionality of $x_n$. Write your derivation in no more than 8 lines. In machine learning, this solution is called ridge regression. Note that $\tilde{X}\tilde{X}^\top + \lambda I$ is now invertible.

Regression, Gauss-Newton, and gradient descent: Part 1 (24 points)

Figure 1: (a) A dataset of one data instance; (b) the corresponding $f(x_1; w) - y_1$; (c) the corresponding $E(w) = (f(x_1; w) - y_1)^2$.

Figure 1(a) shows a dataset of just one data instance: $(x_1, y_1) = (2, 2)$. Now we want to fit a nonlinear curve $f(x; w) = e^{wx}$ to the dataset, using the sum of squared error (SSE): $E(w) = (f(x_1; w) - y_1)^2$. Figures 1(b) and 1(c) show the curve $f(x_1; w) - y_1$ w.r.t. $w$ and the curve $E(w)$ w.r.t. $w$, respectively.

(ii) Gauss-Newton method (12 points)

Use the Gauss-Newton method introduced in the lectures to find the solution that minimizes $E(w)$. Note that the optimal solution is 0.347. Since the Gauss-Newton method is an iterative method, the solution depends on the number of iterations.

1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of Gauss-Newton to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform three iterations of Gauss-Newton to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
3. Begin with the initialization $\hat{w}^{(0)} = -1.0$ and perform three iterations of Gauss-Newton to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]

Your answers must contain numerical values. For example, use 0.500 but not 1/2 in your answer. Round your answer to the third decimal: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333. You should see that iterative methods are sensitive to the initialization.

(iii) Gradient descent (12 points)

Use the gradient descent (GD) method introduced in the lectures to find the solution $w^*$ that minimizes $E(w)$. Since GD is an iterative method, the solution depends on the number of iterations.
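As a study aid (not a solution to submit), the two iterative updates used in this part can be sketched in code. This is a minimal sketch under the assumption that the model is $f(x; w) = e^{wx}$ fit to the single instance $(x_1, y_1) = (2, 2)$, which is consistent with the stated optimum $0.347 \approx \ln(2)/2$; all function names are illustrative, not from the lectures.

```python
import math

# Single data instance (x1, y1) = (2, 2); assumed model f(x; w) = e^{w x}.
# Residual r(w) = f(x1; w) - y1; SSE objective E(w) = r(w)^2.
X1, Y1 = 2.0, 2.0

def residual(w):
    return math.exp(w * X1) - Y1

def residual_prime(w):
    # dr/dw = x1 * e^{w x1}
    return X1 * math.exp(w * X1)

def gauss_newton_step(w):
    # With a single residual, the Gauss-Newton update is w - r(w) / r'(w).
    return w - residual(w) / residual_prime(w)

def gradient_descent_step(w, eta):
    # dE/dw = 2 r(w) r'(w), so the GD update is w - eta * 2 r(w) r'(w).
    return w - eta * 2.0 * residual(w) * residual_prime(w)

def iterate(step, w0, num_iters):
    w = w0
    for _ in range(num_iters):
        w = step(w)
    return w

# Example: three Gauss-Newton iterations from w = 1.5, and three GD
# iterations (eta = 0.1) from the same start.
w_gn = iterate(gauss_newton_step, 1.5, 3)
w_gd = iterate(lambda w: gradient_descent_step(w, 0.1), 1.5, 3)
print(round(w_gn, 3), round(w_gd, 3))
```

Note that with a single residual, the Gauss-Newton step coincides with Newton's method on the residual equation $r(w) = 0$, which is why it can converge in very few iterations when it converges at all.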
For our problem, where $w$ is one-dimensional, given the current guess $\hat{w}^{(t)}$, GD performs the following update:

$\hat{w}^{(t+1)} \leftarrow \hat{w}^{(t)} - \eta\, \frac{dE}{dw}(\hat{w}^{(t)})$,

where $\frac{dE}{dw}(\hat{w}^{(t)})$ is the derivative computed at $\hat{w}^{(t)}$ and $\eta$ is the step size or learning rate.

1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
3. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 0.001$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]

Your answers must contain numerical values. For example, use 0.500 but not 1/2 in your answer. Round your answer to the third decimal: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333. You should see that iterative methods are sensitive to the initialization.

Regression, Gauss-Newton, and gradient descent: Part 2 (44 points)

Figure 2: (a) A dataset of one data instance; (b) the corresponding $f(x_1; w) - y_1$; (c) the corresponding $E(w) = (f(x_1; w) - y_1)^2$.

Consider a simpler problem: fit a linear function $f(x; w) = wx$ to the same data, again using the sum of squared error (SSE): $E(w) = (f(x_1; w) - y_1)^2$. Figure 2(a) shows the same data as Figure 1, but Figures 2(b) and 2(c) show the curve of $f(x_1; w) - y_1$ w.r.t. $w$ and the curve of $E(w)$ w.r.t. $w$, respectively, using $f(x; w) = wx$.

(iv) Gauss-Newton method (10 points)

Let us again use the Gauss-Newton method to find the solution $w^*$ that minimizes $E(w)$ in Figure 2. Note that the optimal solution is 1.000.

1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform just one iteration of Gauss-Newton to get $\hat{w}^{(1)}$. What is $\hat{w}^{(1)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform just one iteration of Gauss-Newton to get $\hat{w}^{(1)}$. What is $\hat{w}^{(1)}$ (a numerical value)?
[4 points]

Your answers must contain numerical values. For example, use 0.500 but not 1/2 in your answer. Round your answer to the third decimal: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333.

Describe the difference, and explain the reason, if your observations differ from those in "Regression, Gauss-Newton, and gradient descent: Part 1" above using Gauss-Newton. Write "No difference" if you do not find any difference. [2 points]

(v) Gradient descent (14 points)

Let us again use the gradient descent (GD) method to find the solution $w^*$ that minimizes $E(w)$ in Figure 2. For our problem, where $w$ is one-dimensional, given the current guess $\hat{w}^{(t)}$, GD performs the following update:

$\hat{w}^{(t+1)} \leftarrow \hat{w}^{(t)} - \eta\, \frac{dE}{dw}(\hat{w}^{(t)})$,

where $\frac{dE}{dw}(\hat{w}^{(t)})$ is the derivative computed at $\hat{w}^{(t)}$ and $\eta$ is the step size or learning rate.

1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
3. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 1.0$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]

Your answers must contain numerical values. For example, use 0.500 but not 1/2 in your answer. Round your answer to the third decimal: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333.

Can GD obtain the optimal solution in a few iterations? [2 points]

Naive Bayes and MLE (20 points)

Please write down the derivation of your answers, and highlight your answers clearly!

Calculation: Round your results to the second decimal, unless stated otherwise. For example, 1.245 becomes 1.25 and -1.228 becomes -1.23.

Figure 3: A two-dimensional labeled dataset with 10 data instances.

Figure 3 gives a labeled dataset $\{(x_n, y_n)\}_{n=1}^{N}$ of 10 data instances. Each $x_n$ is two-dimensional: the first dimension $x_n[1]$ is binary (i.e., in $\{0, 1\}$); the second dimension $x_n[2]$ is a real number. The label $y_n$ is binary (i.e., either class 0 or class 1).

Denote by $X$ the two-dimensional random variable of data instances and by $Y$ the binary random variable of class labels. You are to construct the Naive Bayes classifier to predict the class label $\hat{y}$ of a future data instance $x$:

$\hat{y} = \arg\max_{c \in \{0,1\}} P(Y = c \mid X = x) = \arg\max_{c \in \{0,1\}} P(Y = c) \times \prod_{d=1}^{2} P(X[d] = x[d] \mid Y = c)$.

You will begin by estimating the parameters of

- $P(Y)$
- $P(X[d] \mid Y = c)$ for all $c \in \{0, 1\}$ and $d \in \{1, 2\}$

from the labeled dataset, using Maximum Likelihood Estimation (MLE). Note that $P(Y)$ is a Bernoulli distribution; $P(X[1] \mid Y = c)$ is a Bernoulli distribution of $X[1]$; $P(X[2] \mid Y = c)$ is a Gaussian distribution of $X[2]$.

(vi) Prior distributions

[2 points] What is $P(Y = 1)$? Write your derivation/calculation. The answer is a real number.

(vii) Conditional distributions

[4 points] What is $P(X[1] = 1 \mid Y = 0)$? Write your derivation/calculation.
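To make the MLE recipe concrete (Bernoulli prior, Bernoulli and Gaussian class-conditionals, all fit by maximum likelihood), here is a minimal sketch. The actual values in Figure 3 are not reproduced in this text, so the `data` list below is a hypothetical placeholder with the same structure ($X[1]$ binary, $X[2]$ real, $Y$ binary), and every name in the sketch is illustrative rather than part of the assignment.

```python
import math

# HYPOTHETICAL placeholder dataset (NOT the Figure 3 values):
# each tuple is (x[1] in {0,1}, x[2] real, y in {0,1}).
data = [
    (1, 2.3, 0), (0, 1.1, 0), (1, 0.4, 0), (0, 2.8, 1), (1, 3.1, 1),
    (0, 0.9, 0), (1, 2.0, 1), (0, 3.5, 1), (1, 1.7, 0), (0, 2.6, 1),
]

def mle_parameters(data):
    n = len(data)
    # P(Y): Bernoulli, so the MLE is the empirical frequency of Y = 1.
    params = {"P(Y=1)": sum(y for _, _, y in data) / n}
    for c in (0, 1):
        rows = [(x1, x2) for x1, x2, y in data if y == c]
        m = len(rows)
        # P(X[1] | Y=c): Bernoulli, MLE is the frequency of X[1]=1 in class c.
        params[f"P(X[1]=1|Y={c})"] = sum(x1 for x1, _ in rows) / m
        # P(X[2] | Y=c): Gaussian, MLE gives the within-class sample mean
        # and the biased (divide-by-m) variance.
        mu = sum(x2 for _, x2 in rows) / m
        params[f"mu(X[2]|Y={c})"] = mu
        params[f"var(X[2]|Y={c})"] = sum((x2 - mu) ** 2 for _, x2 in rows) / m
    return params

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(x1, x2, p):
    # argmax over c of P(Y=c) * P(X[1]=x1 | Y=c) * P(X[2]=x2 | Y=c)
    scores = {}
    for c in (0, 1):
        prior = p["P(Y=1)"] if c == 1 else 1 - p["P(Y=1)"]
        b = p[f"P(X[1]=1|Y={c})"]
        bern = b if x1 == 1 else 1 - b
        gauss = gaussian_pdf(x2, p[f"mu(X[2]|Y={c})"], p[f"var(X[2]|Y={c})"])
        scores[c] = prior * bern * gauss
    return max(scores, key=scores.get)
```

For the Bernoulli parameters, MLE reduces to counting frequencies; for the Gaussian, it yields the within-class sample mean and variance, which is the same recipe you will apply by hand to the Figure 3 values.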