Homework #4 (Problem Set)
CSE 3521/5521, Autumn 2021
Due: Friday, November 12 by 11:59pm. Worth 75 points.

I. Preparation

1. Read/study the assigned reading sections from AI: A Modern Approach, 3rd edition (Russell and Norvig).
2. Read/study the assigned reading sections from PRML: Pattern Recognition and Machine Learning (Bishop).

II. Collaboration & Piazza

Please read the course policy on Academic Misconduct in the syllabus. You may discuss the problems with your classmates at a high level only; you must formulate your own solution without any help from others or from third-party sources. In your solution, list with whom you discussed each problem (please do so on the first page). Do not post any part of your solution, or spoilers, on Piazza.

III. Guidelines

You will place your solutions to this portion of the homework (Problem Set) into a single PDF file named HW4_problems_name_dotnumber.pdf (e.g., HW4_problems_shareef_1.pdf). It is highly suggested that you type your solutions directly and save them as a PDF file. If you write your solutions on paper and scan them, you run the risk that the TA will not be able to read your solution; in that case the TA will not give credit for any portion that is not readable. Write down the derivation of your answers, and highlight your answers clearly!

IV. Problems

Linear Regression (7 points)

(i) In lecture we derived a closed-form solution for linear regression:

$\tilde{w} = (\tilde{X}\tilde{X}^\top)^{-1}\tilde{X}y$,

where $\tilde{X} = [\tilde{x}_1, \dots, \tilde{x}_N]$, $\tilde{x}_n = [x_n^\top, 1]^\top$, and $y = [y_1, \dots, y_N]^\top$. A potential problem is that $\tilde{X}\tilde{X}^\top$ may not be invertible. Consider a different minimization problem to find $\tilde{w}$. Again, let $E(\tilde{w}) = \sum_n (\tilde{w}^\top \tilde{x}_n - y_n)^2$. Instead of minimizing $E(\tilde{w})$, we will minimize

$E(\tilde{w}) + \lambda \|\tilde{w}\|_2^2 = E(\tilde{w}) + \lambda\, \tilde{w}^\top \tilde{w}$, where $\lambda > 0$.

[8 points] Show that the solution $\tilde{w}$ which minimizes $E(\tilde{w}) + \lambda\, \tilde{w}^\top \tilde{w}$ is

$\tilde{w} = (\tilde{X}\tilde{X}^\top + \lambda I)^{-1}\tilde{X}y$,

where $I$ is a $(D+1)$-by-$(D+1)$ identity matrix and $D$ is the dimensionality of $x_n$. Write your derivation in no more than 8 lines. In machine learning, this solution is called ridge regression. Note that $\tilde{X}\tilde{X}^\top + \lambda I$ is now invertible.

Regression, Gauss-Newton, and gradient descent: Part 1 (24 points)

Figure 1: (a) A dataset of one data instance; (b) the corresponding $f(x_1; w) - y_1$; (c) the corresponding $E(w) = (f(x_1; w) - y_1)^2$.

Figure 1(a) shows a dataset of just one data instance: $(x_1, y_1) = (2, 2)$. Now we want to fit a nonlinear curve $f(x; w) = e^{wx}$ to the dataset, using the sum of squared error (SSE): $E(w) = (f(x_1; w) - y_1)^2$. Figures 1(b) and 1(c) show the curve $f(x_1; w) - y_1$ w.r.t. $w$ and the curve $E(w)$ w.r.t. $w$, respectively.

(ii) Gauss-Newton method (12 points)

Use the Gauss-Newton method introduced in the lectures to find the solution that minimizes $E(w)$. Note that the optimal solution is 0.347. Since the Gauss-Newton method is an iterative method, the solution depends on the number of iterations.

1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of Gauss-Newton to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform three iterations of Gauss-Newton to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
3. Begin with the initialization $\hat{w}^{(0)} = -1.0$ and perform three iterations of Gauss-Newton to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]

Your answers must contain numerical values. For example, use 0.500 but not 1/2 in your answer. Round your answer to the third decimal: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333. You should see that iterative methods are sensitive to the initialization.

(iii) Gradient descent (12 points)

Use the gradient descent (GD) method introduced in the lectures to find the solution $w^*$ that minimizes $E(w)$. Since GD is an iterative method, the solution depends on the number of iterations.
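As a study aid (not a solution to submit), the two iterative updates used in this part can be sketched in code. This is a minimal sketch under the assumption that the model is $f(x; w) = e^{wx}$ fit to the single instance $(x_1, y_1) = (2, 2)$, which is consistent with the stated optimum $0.347 \approx \ln(2)/2$; all function names are illustrative, not from the lectures.

```python
import math

# Single data instance (x1, y1) = (2, 2); assumed model f(x; w) = e^{w x}.
# Residual r(w) = f(x1; w) - y1; SSE objective E(w) = r(w)^2.
X1, Y1 = 2.0, 2.0

def residual(w):
    return math.exp(w * X1) - Y1

def residual_prime(w):
    # dr/dw = x1 * e^{w x1}
    return X1 * math.exp(w * X1)

def gauss_newton_step(w):
    # With a single residual, the Gauss-Newton update is w - r(w) / r'(w).
    return w - residual(w) / residual_prime(w)

def gradient_descent_step(w, eta):
    # dE/dw = 2 r(w) r'(w), so the GD update is w - eta * 2 r(w) r'(w).
    return w - eta * 2.0 * residual(w) * residual_prime(w)

def iterate(step, w0, num_iters):
    w = w0
    for _ in range(num_iters):
        w = step(w)
    return w

# Example: three Gauss-Newton iterations from w = 1.5, and three GD
# iterations (eta = 0.1) from the same start.
w_gn = iterate(gauss_newton_step, 1.5, 3)
w_gd = iterate(lambda w: gradient_descent_step(w, 0.1), 1.5, 3)
print(round(w_gn, 3), round(w_gd, 3))
```

Note that with a single residual, the Gauss-Newton step coincides with Newton's method on the residual equation $r(w) = 0$, which is why it can converge in very few iterations when it converges at all.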
For our problem, where $w$ is one-dimensional, given the current guess $\hat{w}^{(t)}$, GD performs the following update:

$\hat{w}^{(t+1)} \leftarrow \hat{w}^{(t)} - \eta\, \frac{dE}{dw}(\hat{w}^{(t)})$,

where $\frac{dE}{dw}(\hat{w}^{(t)})$ is the derivative computed at $\hat{w}^{(t)}$ and $\eta$ is the step size or learning rate.

1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
3. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 0.001$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]

Your answers must contain numerical values. For example, use 0.500 but not 1/2 in your answer. Round your answer to the third decimal: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333. You should see that iterative methods are sensitive to the initialization.

Regression, Gauss-Newton, and gradient descent: Part 2 (44 points)

Figure 2: (a) A dataset of one data instance; (b) the corresponding $f(x_1; w) - y_1$; (c) the corresponding $E(w) = (f(x_1; w) - y_1)^2$.

Consider a simpler problem: fit a linear function $f(x; w) = wx$ to the same data, again using the sum of squared error (SSE): $E(w) = (f(x_1; w) - y_1)^2$. Figure 2(a) shows the same data as Figure 1, but Figures 2(b) and 2(c) show the curve of $f(x_1; w) - y_1$ w.r.t. $w$ and the curve of $E(w)$ w.r.t. $w$, respectively, using $f(x; w) = wx$.

(iv) Gauss-Newton method (10 points)

Let us again use the Gauss-Newton method to find the solution $w^*$ that minimizes $E(w)$ in Figure 2. Note that the optimal solution is 1.000.

1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform just one iteration of Gauss-Newton to get $\hat{w}^{(1)}$. What is $\hat{w}^{(1)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform just one iteration of Gauss-Newton to get $\hat{w}^{(1)}$. What is $\hat{w}^{(1)}$ (a numerical value)?
[4 points]

Your answers must contain numerical values. For example, use 0.500 but not 1/2 in your answer. Round your answer to the third decimal: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333.

Describe the difference, and explain the reason, if your observations differ from those in "Regression, Gauss-Newton, and gradient descent: Part 1" above using Gauss-Newton. Write "No difference" if you do not find any difference. [2 points]

(v) Gradient descent (14 points)

Let us again use the gradient descent (GD) method to find the solution $w^*$ that minimizes $E(w)$ in Figure 2. For our problem, where $w$ is one-dimensional, given the current guess $\hat{w}^{(t)}$, GD performs the following update:

$\hat{w}^{(t+1)} \leftarrow \hat{w}^{(t)} - \eta\, \frac{dE}{dw}(\hat{w}^{(t)})$,

where $\frac{dE}{dw}(\hat{w}^{(t)})$ is the derivative computed at $\hat{w}^{(t)}$ and $\eta$ is the step size or learning rate.

1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
3. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 1.0$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]

Your answers must contain numerical values. For example, use 0.500 but not 1/2 in your answer. Round your answer to the third decimal: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333.

Can GD obtain the optimal solution in a few iterations? [2 points]

Naive Bayes and MLE (20 points)

Please write down the derivation of your answers, and highlight your answers clearly!

Calculation: Round your results to the second decimal, unless stated otherwise. For example, 1.245 becomes 1.25 and -1.228 becomes -1.23.

Figure 3: A two-dimensional labeled dataset with 10 data instances.

Figure 3 gives a labeled dataset $\{(x_n, y_n)\}_{n=1}^{N}$ of 10 data instances. Each $x_n$ is two-dimensional: the first dimension $x_n[1]$ is binary (i.e., in $\{0, 1\}$); the second dimension $x_n[2]$ is a real number. The label $y_n$ is binary (i.e., either class 0 or class 1).

Denote by $X$ the two-dimensional random variable of data instances and by $Y$ the binary random variable of class labels. You are to construct the Naive Bayes classifier to predict the class label $\hat{y}$ of a future data instance $x$:

$\hat{y} = \arg\max_{c \in \{0,1\}} P(Y = c \mid X = x) = \arg\max_{c \in \{0,1\}} P(Y = c) \times \prod_{d=1}^{2} P(X[d] = x[d] \mid Y = c)$.

You will begin by estimating the parameters of

- $P(Y)$
- $P(X[d] \mid Y = c)$ for all $c \in \{0, 1\}$ and $d \in \{1, 2\}$

from the labeled dataset, using Maximum Likelihood Estimation (MLE). Note that $P(Y)$ is a Bernoulli distribution; $P(X[1] \mid Y = c)$ is a Bernoulli distribution of $X[1]$; $P(X[2] \mid Y = c)$ is a Gaussian distribution of $X[2]$.

(vi) Prior distributions

[2 points] What is $P(Y = 1)$? Write your derivation/calculation. The answer is a real number.

(vii) Conditional distributions

[4 points] What is $P(X[1] = 1 \mid Y = 0)$? Write your derivation/calculation.
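To make the MLE recipe concrete (Bernoulli prior, Bernoulli and Gaussian class-conditionals, all fit by maximum likelihood), here is a minimal sketch. The actual values in Figure 3 are not reproduced in this text, so the `data` list below is a hypothetical placeholder with the same structure ($X[1]$ binary, $X[2]$ real, $Y$ binary), and every name in the sketch is illustrative rather than part of the assignment.

```python
import math

# HYPOTHETICAL placeholder dataset (NOT the Figure 3 values):
# each tuple is (x[1] in {0,1}, x[2] real, y in {0,1}).
data = [
    (1, 2.3, 0), (0, 1.1, 0), (1, 0.4, 0), (0, 2.8, 1), (1, 3.1, 1),
    (0, 0.9, 0), (1, 2.0, 1), (0, 3.5, 1), (1, 1.7, 0), (0, 2.6, 1),
]

def mle_parameters(data):
    n = len(data)
    # P(Y): Bernoulli, so the MLE is the empirical frequency of Y = 1.
    params = {"P(Y=1)": sum(y for _, _, y in data) / n}
    for c in (0, 1):
        rows = [(x1, x2) for x1, x2, y in data if y == c]
        m = len(rows)
        # P(X[1] | Y=c): Bernoulli, MLE is the frequency of X[1]=1 in class c.
        params[f"P(X[1]=1|Y={c})"] = sum(x1 for x1, _ in rows) / m
        # P(X[2] | Y=c): Gaussian, MLE gives the within-class sample mean
        # and the biased (divide-by-m) variance.
        mu = sum(x2 for _, x2 in rows) / m
        params[f"mu(X[2]|Y={c})"] = mu
        params[f"var(X[2]|Y={c})"] = sum((x2 - mu) ** 2 for _, x2 in rows) / m
    return params

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(x1, x2, p):
    # argmax over c of P(Y=c) * P(X[1]=x1 | Y=c) * P(X[2]=x2 | Y=c)
    scores = {}
    for c in (0, 1):
        prior = p["P(Y=1)"] if c == 1 else 1 - p["P(Y=1)"]
        b = p[f"P(X[1]=1|Y={c})"]
        bern = b if x1 == 1 else 1 - b
        gauss = gaussian_pdf(x2, p[f"mu(X[2]|Y={c})"], p[f"var(X[2]|Y={c})"])
        scores[c] = prior * bern * gauss
    return max(scores, key=scores.get)
```

For the Bernoulli parameters, MLE reduces to counting frequencies; for the Gaussian, it yields the within-class sample mean and variance, which is the same recipe you will apply by hand to the Figure 3 values.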