4 statistics problems with Python: Homework 5 ESE 402/542 Due on 11/20/2019 (For Problems 1 and 2, no...

Question

Homework 5, ESE 402/542, due on 11/20/2019. (For Problems 1 and 2, no other package except numpy and matplotlib should be used for the programming questions. For Problem 3 you can use the packages of your choice.)

Problem 1.

(a) In this problem we will analyze logistic regression as learned in class. The sigmoid function can be written as

$$S(x) = \frac{1}{1 + e^{-x}}$$

• For a given variable $X$, assume $P(Y = +1 \mid X)$ is modeled as $P(Y = +1 \mid X) = S(\beta_0 + \beta_1 X)$. Plot a 3D figure showing the relation between the output and the variables $\beta_0$ and $\beta_1$ when $X = 1$. Take values in $[-2, 2]$ for both $\beta_0$ and $\beta_1$ with a step size of 0.1.

(b) In class, we did binary classification with labels $Y \in \{0, 1\}$. In this problem, we will use the labels $Y \in \{-1, 1\}$, as this makes it easier to derive the likelihood $P(Y \mid X)$.

• Show that if $Y \in \{-1, 1\}$, the probability of $Y$ given $X$ can be written as (not programming)

$$P(Y \mid X) = \frac{1}{1 + e^{-y(\beta_0 + \beta_1 x)}}$$

• We have learned that the coefficients $\beta_0$ and $\beta_1$ can be found using maximum likelihood estimation (MLE). Show that the log-likelihood function for $m$ data points can be written as (not programming)

$$\ln L(\beta_0, \beta_1) = -\sum_{i=1}^{m} \ln\left(1 + e^{-y_i(\beta_0 + \beta_1 x_i)}\right)$$

• Plot a 3D figure showing the relation between the log-likelihood function and the variables $\beta_0$, $\beta_1$ when $X = 1, Y = -1$ and when $X = 1, Y = 1$. Take values in $[-2, 2]$ for both $\beta_0$ and $\beta_1$ with a step size of 0.1.

• Based on the graph, is it possible to maximize this function?

Problem 2.

1. While we can formalize the likelihood function, there is no closed-form expression for the coefficients $\beta_0, \beta_1$ that maximize the log-likelihood in Problem 1. Hence, we will use an iterative algorithm to solve for the coefficients. Maximizing the log-likelihood is equivalent to minimizing its negative:

$$\arg\max_{\beta_0, \beta_1} \left(-\sum_{i=1}^{m} \ln\left(1 + e^{-y_i(\beta_0 + \beta_1 x_i)}\right)\right) = \arg\min_{\beta_0, \beta_1} \left(\sum_{i=1}^{m} \ln\left(1 + e^{-y_i(\beta_0 + \beta_1 x_i)}\right)\right)$$

We define our loss function as

$$L = \frac{1}{m} \sum_{i=1}^{m} \ln\left(1 + e^{-y_i(\beta_0 + \beta_1 x_i)}\right)$$

Our objective is to iteratively decrease this loss as we keep updating the coefficients. Here $x_i \in \mathbb{R}$.

In this problem we will be working with real image data, where the goal is to classify whether the image is a 0 or a 1 using logistic regression. The input $X \in \mathbb{R}^{m \times d}$ is a matrix with dimensions [m x d], where a single data point $x_i \in \mathbb{R}^d$ with $d = 784$. The labels form a vector $Y \in \mathbb{R}^m$, where each label $y_i \in \{0, 1\}$.

• Load the data into memory and visualize one input as an image for each of label 0 and label 1. (The data should be reshaped back to [28 x 28] to be able to visualize it.)

• The data takes values between 0 and 255. Normalise the data to lie between 0 and 1.

• Set $y_i = 1$ for images labeled 0 and $y_i = -1$ for images labeled 1. Split the data randomly into train and test sets with a ratio of 80:20. Why is random splitting better than sequential splitting in our case?

• Initialize the coefficients using a univariate "normal" (Gaussian) distribution with mean 0 and variance 1. (Remember that the coefficients are a vector $[\beta_0, \beta_1, \ldots, \beta_d]$, where $d$ is the dimension of the input.)

• Compute the loss using the loss $L$ above. The loss can be written as

$$L = \frac{1}{m} \sum_{i=1}^{m} \ln\left(1 + e^{-y_i\left(\beta_0 + \sum_{j=0}^{d-1} \beta_{j+1} \cdot x_{i,j}\right)}\right)$$

where $(i, j)$ denotes the $j$-th dimension of the $i$-th data point $x_i$, with $i \in \{1, 2, \ldots, m\}$ and $j \in \{0, \ldots, d-1\}$.

• To minimize the loss function, a widely known algorithm is to move in the direction opposite to the gradient of the loss function. (It is helpful to write the coefficients $[\beta_1, \ldots, \beta_d]$ as a vector $\beta$, and $\beta_0$ as a scalar, so that $\beta \in \mathbb{R}^d$ and $\beta_0 \in \mathbb{R}$.) We can write the gradients of the loss function as matrix operations:

$$\frac{\partial L}{\partial \beta_0} = -\frac{1}{m} \sum_{i=1}^{m} \frac{e^{-y_i(\beta_0 + \beta \cdot x_i^T)}}{1 + e^{-y_i(\beta_0 + \beta \cdot x_i^T)}} \, y_i = d\beta_0$$

$$\frac{\partial L}{\partial \beta} = -\frac{1}{m} \sum_{i=1}^{m} \frac{e^{-y_i(\beta_0 + \beta \cdot x_i^T)}}{1 + e^{-y_i(\beta_0 + \beta \cdot x_i^T)}} \, y_i x_i = d\beta$$

Write a function to compute the gradients.

• Update the parameters as

$$\beta = \beta - 0.05 \cdot d\beta, \qquad \beta_0 = \beta_0 - 0.05 \cdot d\beta_0$$

(Gradient updates should be computed based on the train set.)

• Repeat the process for 50 iterations and report the loss after the 50th epoch.

• Plot the loss for each iteration for the train and test sets.

• Logistic regression is a classification method. We classify as +1 if $P(Y = 1 \mid X) \geq 0.5$. Derive the classification rule for the threshold 0.5. (Not a programming question.)

• For the classification rule derived, compute the accuracy on the test set for each iteration and plot the accuracy.

The final code should follow this format:

    import numpy as np
    from matplotlib import pyplot as plt

    def compute_loss(data, labels, B, B_0):
        return logloss

    def compute_gradients(data, labels, B, B_0):
        return dB, dB_0

    if __name__ == '__main__':
        x = np.load(data)
        y = np.load(label)

        ## Split the data to train and test
        x_train, y_train, x_test, y_test = #split_data

        B = np.random.randn(1, x.shape[1])
        B_0 = np.random.randn(1)

        lr = 0.05

        for _ in range(50):
            ## Compute Loss
            loss = compute_loss(x_train, y_train, B, B_0)

            ## Compute Gradients
            dB, dB_0 = compute_gradients(x_train, y_train, B, B_0)

            ## Update Parameters
            B = B - lr*dB
            B_0 = B_0 - lr*dB_0

            ##Compute Accuracy and Loss on Test set (x_test, y_test)
            accuracy_test =
            loss_test =

        ##Plot Loss and Accuracy

Make sure to vectorize the code. Ideally, 50 iterations should run in 10 seconds or less. If possible, avoid using for loops, except for the 50 iterations of gradient updates given in the sample code.

Problem 3.

Recall that in classification we assume that each data point is an i.i.d. sample from an (unknown) distribution $P(X = x, Y = y)$. In this question, we are going to design the data distribution $P$ and evaluate the performance of logistic regression on data generated using $P$. Keep in mind that we would like to make $P$ as simple as possible. In the following, we assume $x \in \mathbb{R}$ and $y \in \{0, 1\}$, i.e. the data is one-dimensional and the label is binary. Write $P(X = x, Y = y) = P(X = x) P(Y = y \mid X = x)$. We will generate $X = x$ according to the uniform distribution on the interval $[0, 1]$ (thus $P(X = x)$ is just the pdf of the uniform distribution).

1. Design $P(Y = y \mid X = x)$ such that (i) $P(y = 0) = P(y = 1) = 0.5$; (ii) the classification accuracy of any classifier is at most 0.9; and (iii) the accuracy of the Bayes optimal classifier is at least 0.8.

2. Using Python, generate $n = 100$ training data points according to the distribution you designed above and train a binary classifier using logistic regression on the training data.

3. Generate $n = 100$ test data points according to the distribution you designed in part 1 and compute the prediction accuracy (on the test data) of the classifier that you trained in part 2. Also, compute the accuracy of the Bayes optimal classifier on the test data. Why do you think the Bayes optimal classifier performs better?

4. Redo parts 2 and 3 with $n = 1000$. Are the results any different from those in part 3? Why?

Problem 4.

K-means clustering can be viewed as an optimization problem that attempts to minimize some objective function. For the given objectives, determine the update rule for the centroid $c_k$ of the $k$-th cluster $C_k$. In other words, find the optimal $c_k$ that minimizes the objective function. The data $x$ contains $p$ features.

1. Show that setting the objective to the sum of the squared Euclidean distances of points from the center of their clusters,

$$\sum_{k=1}^{K} \sum_{x \in C_k} \sum_{i=1}^{p} (c_{ki} - x_i)^2$$

results in an update rule where the optimal centroid is the mean of the points in the cluster.

2. Show that setting the objective to the sum of the Manhattan distances of points from the center of their clusters,

$$\sum_{k=1}^{K} \sum_{x \in C_k} \sum_{i=1}^{p} |c_{ki} - x_i|$$

results in an update rule where the optimal centroid is the median of the points in the cluster.
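
The loss and gradient formulas of Problem 2 translate fairly directly into vectorized numpy. The following is a minimal sketch, not a reference solution. The function names and the shapes of `B` and `B_0` follow the assignment's skeleton; the synthetic data (`x`, `y`, `true_w`), the random seed, and the use of `np.logaddexp` for numerical stability are assumptions made here for illustration, since the image data files are not part of this page.

```python
import numpy as np


def compute_loss(data, labels, B, B_0):
    """Mean logistic loss (1/m) * sum(ln(1 + exp(-y_i * z_i))) for labels in {-1, +1}."""
    margins = labels * (data @ B.ravel() + B_0)  # y_i * (beta_0 + beta . x_i), shape (m,)
    # np.logaddexp(0, -t) equals ln(1 + e^{-t}) and avoids overflow for large |t|
    return float(np.mean(np.logaddexp(0.0, -margins)))


def compute_gradients(data, labels, B, B_0):
    """Gradients of the mean logistic loss with respect to B (shape (1, d)) and B_0."""
    margins = labels * (data @ B.ravel() + B_0)
    # e^{-t} / (1 + e^{-t}) = 1 / (1 + e^{t}) = exp(-ln(1 + e^{t}))
    weights = -labels * np.exp(-np.logaddexp(0.0, margins))  # shape (m,)
    dB = (weights @ data / data.shape[0]).reshape(B.shape)
    dB_0 = np.mean(weights)
    return dB, dB_0


# Synthetic stand-in for the image data (assumption): m points with d features in [0, 1]
rng = np.random.default_rng(0)
m, d = 200, 5
x = rng.random((m, d))
true_w = rng.normal(size=d)            # hypothetical direction used only to create labels
scores = x @ true_w
y = np.where(scores > np.median(scores), 1.0, -1.0)  # balanced labels in {-1, +1}

B = rng.normal(size=(1, d))
B_0 = rng.normal(size=1)
lr = 0.05

losses = []
for _ in range(50):
    losses.append(compute_loss(x, y, B, B_0))
    dB, dB_0 = compute_gradients(x, y, B, B_0)
    B = B - lr * dB
    B_0 = B_0 - lr * dB_0

# Threshold 0.5 on the sigmoid corresponds to the sign of the linear score
pred = np.where(x @ B.ravel() + B_0 >= 0, 1.0, -1.0)
accuracy = float(np.mean(pred == y))
print("first loss:", losses[0], "last loss:", losses[-1], "accuracy:", accuracy)
```

The prediction line uses the fact that $S(z) \geq 0.5$ exactly when $z \geq 0$, so no sigmoid evaluation is needed for classification. On the real data the same two functions would be called on the train and test splits separately.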

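For the two non-programming parts of Problem 1(b), one possible derivation is sketched below, under the usual assumption that the $m$ data points are independent. The last line gives the classification rule for the 0.5 threshold.

```latex
\begin{align*}
P(Y = +1 \mid X = x) &= S(\beta_0 + \beta_1 x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \\
P(Y = -1 \mid X = x) &= 1 - S(\beta_0 + \beta_1 x)
  = \frac{e^{-(\beta_0 + \beta_1 x)}}{1 + e^{-(\beta_0 + \beta_1 x)}}
  = \frac{1}{1 + e^{+(\beta_0 + \beta_1 x)}} \\
\Rightarrow \quad P(Y = y \mid X = x) &= \frac{1}{1 + e^{-y(\beta_0 + \beta_1 x)}},
  \qquad y \in \{-1, +1\} \\
\ln L(\beta_0, \beta_1) &= \ln \prod_{i=1}^{m} P(Y = y_i \mid X = x_i)
  = -\sum_{i=1}^{m} \ln\left(1 + e^{-y_i(\beta_0 + \beta_1 x_i)}\right) \\
P(Y = +1 \mid X = x) \geq \tfrac{1}{2} &\iff e^{-(\beta_0 + \beta_1 x)} \leq 1
  \iff \beta_0 + \beta_1 x \geq 0
\end{align*}
```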
Kshitij · Accepted Answer

Annotation 2019-11-13 021225.jpg
Annotation 2019-11-13 031900.jpg
Problem-1.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 1 (a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from mpl_toolkits.mplot3d import axes3d # for 3-d plots\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Defining the sigmoid function\n",
    "def sigmoid(f):\n",
    "    return 1/(1 + np.exp(-f))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# P(Y = +1|X) = S(beta_0 + beta_1 * X)\n",
    "# Take X = 1 and both beta_0 and beta_1 in [-2, 2] with a step size of 0.1 for the 3-d plot\n",
    "\n",
    "# Initializing X\n",
    "X = 1\n",
    "\n",
    "# Initializing beta_0 and beta_1; the stop is 2.1 because np.arange excludes the stop value\n",
    "# np.round rounds the array to 2 decimal places (removes floating-point noise from np.arange)\n",
    "beta_0 = np.round(np.arange(-2, 2.1, 0.1), 2)\n",
    "beta_1 = np.round(np.arange(-2, 2.1, 0.1), 2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# np.meshgrid - Return coordinate matrices from coordinate vectors.\n",
    "# Make N-D coordinate arrays for vectorized evaluations of N-D scalar/vector fields over N-D grids, \n",
    "# given one-dimensional coordinate arrays x1, x2,..., xn.\n",
    "x, y = np.meshgrid(beta_0, beta_1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[-2.  -2. ]\n",
      " [-1.9 -2. ]\n",
      " [-1.8 -2. ]\n",
      " [-1.7 -2. ]\n",
      " [-1.6 -2. ]\n",
      " [-1.5 -2. ]\n",
      " [-1.4 -2. ]\n",
      " [-1.3 -2. ]\n",
      " [-1.2 -2. ]\n",
      " [-1.1 -2. ]\n",
      " [-1.  -2. ]\n",
      " [-0.9 -2. ]\n",
      " [-0.8 -2. ]\n",
      " [-0.7 -2. ]\n",
      " [-0.6 -2. ]\n",
      " [-0.5 -2. ]\n",
      " [-0.4 -2. ]\n",
      " [-0.3 -2. ]\n",
      " [-0.2 -2. ]\n",
      " [-0.1 -2. ]\n",
      " [ 0.  -2. ]\n",
      " [ 0.1 -2. ]\n",
      " [ 0.2 -2. ]\n",
      " [ 0.3 -2. ]\n",
      " [ 0.4 -2. ]\n",
      " [ 0.5 -2. ]\n",
      " [ 0.6 -2. ]\n",
      " [ 0.7 -2. ]\n",
      " [ 0.8 -2. ]\n",
      " [ 0.9 -2. ]\n",
      " [ 1.  -2. ]\n",
      " [ 1.1 -2. ]\n",
      " [ 1.2 -2. ]\n",
      " [ 1.3 -2. ]\n",
      " [ 1.4 -2. ]\n",
      " [ 1.5 -2. ]\n",
      " [ 1.6 -2. ]\n",
      " [ 1.7 -2. ]\n",
      " [ 1.8 -2. ]\n",
      " [ 1.9 -2. ]\n",
      " [ 2.  -2. ]\n",
      " [-2.  -1.9]\n",
      " [-1.9 -1.9]\n",
      " [-1.8 -1.9]\n",
      " [-1.7 -1.9]\n",
      " [-1.6 -1.9]\n",
      " [-1.5 -1.9]\n",
      " [-1.4 -1.9]\n",
      " [-1.3 -1.9]\n",
      " [-1.2 -1.9]\n",
      " [-1.1 -1.9]\n",
      " [-1.  -1.9]\n",
      " [-0.9 -1.9]\n",
      " [-0.8 -1.9]\n",
      " [-0.7 -1.9]\n",
      " [-0.6 -1.9]\n",
      " [-0.5 -1.9]\n",
      " [-0.4 -1.9]\n",
      " [-0.3 -1.9]\n",
      " [-0.2 -1.9]]\n"
     ]
    }
   ],
   "source": [
    "# Stack the grid into (beta_0, beta_1) pairs, one per row, and print the first 60\n",
    "xy = np.array([x.flatten(), y.flatten()]).T\n",
    "print(xy[:60])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1681, 2)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xy.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Evaluate the sigmoid on the grid; let the output be z\n",
    "# z = sigmoid(x + y)   # equivalent here, since X = 1\n",
    "z = sigmoid(x + np.dot(y, X))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0.01798621, 0.01984031, 0.02188127, ..., 0.450166  , 0.47502081,\n",
       "        0.5       ],\n",
       "       [0.01984031, 0.02188127, 0.02412702, ..., 0.47502081, 0.5       ,\n",
       "        0.52497919],\n",
       "       [0.02188127, 0.02412702, 0.02659699, ..., 0.5       , 0.52497919,\n",
       "        0.549834  ],\n",
       "       ...,\n",
       "       [0.450166  , 0.47502081, 0.5       , ..., 0.97340301, 0.97587298,\n",
       "        0.97811873],\n",
       "       [0.47502081, 0.5       , 0.52497919, ..., 0.97587298, 0.97811873,\n",
       "        0.98015969],\n",
       "       [0.5       , 0.52497919, 0.549834  , ..., 0.97811873, 0.98015969,\n",
       "        0.98201379]])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png":

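
The notebook export above is cut off at the plot output, so the figure itself is not available here. The grid values it does show can still be cross-checked with a small sketch. This is not the notebook author's code: `np.linspace` is swapped in as an alternative to the rounded `np.arange`, and the names `beta`, `b0`, and `b1` are chosen here for illustration.

```python
import numpy as np


def sigmoid(f):
    return 1 / (1 + np.exp(-f))


X = 1

# 41 evenly spaced values from -2 to 2 inclusive (step 0.1); linspace includes both endpoints
beta = np.linspace(-2, 2, 41)

# b0 varies along columns, b1 along rows, matching np.meshgrid(beta_0, beta_1) in the notebook
b0, b1 = np.meshgrid(beta, beta)

# P(Y = +1 | X = 1) over the whole (beta_0, beta_1) grid
z = sigmoid(b0 + b1 * X)

print(z.shape)
print(z[0, 0], z[0, -1], z[-1, -1])
```

The corner values correspond to $S(-4)$, $S(0)$, and $S(4)$, which agree with the first and last entries of the `z` array printed in the notebook. Passing `b0`, `b1`, and `z` to `plot_surface` on a 3D matplotlib axes is the usual way to draw such a surface.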