Homework 6
ESE 402/542
Due December 1, 2019 at 11:59pm

Type or scan your answers as a single PDF file and submit on Canvas.

Problem 1. Principal Component Analysis. Consider the following dataset:

x  y
0  1
1  1
2  1
2  3
3  2
3  3
4  5

(a) Standardize the data and derive the two principal components in sorted order. What is the new transformed dataset using the first principal component?

(b) Repeat the previous analysis, but this time do not standardize the original data. Is Principal Component Analysis scale invariant?

Problem 2. k-means is sub-optimal. Recall that in class we defined the k-means problem as the task of minimizing the k-means objective:

$$\min_{c_1, c_2, \cdots, c_k} \sum_{i=1}^{n} \| x_i - c(x_i) \|_2^2, \qquad (1)$$

where $c(x_i)$ is the closest center to $x_i$. In this problem, we aim to show that the k-means algorithm does not always find the best solution of the above problem. For every number $t > 1$, show that there exists an instance of the k-means problem for which the k-means algorithm might find a solution whose objective value is at least $t \times \mathrm{OPT}$, where $\mathrm{OPT}$ is the minimum of the k-means objective. In other words, you should find a set of points $x_1, \cdots, x_n$ for which the k-means algorithm may (if initialized badly) output a set of centers whose objective value is at least a factor $t$ times the optimal value of problem (1).

Problem 3. Polynomial Regression. Load the dataset poly_data.csv. The first column is a vector of predictors x and the second column is a vector of responses y. Suppose we believe the data were generated by some polynomial of the predictors with Gaussian error, and we would like to recover the true coefficients of the underlying process. A polynomial regression can be estimated by including all powers of x as predictors in the model. For example, to estimate a quadratic regression, we include the predictors x and x^2 as well as the intercept.

(a) Pick a set of polynomial models. Compute the k-fold cross-validation error with respect to mean squared error for each of these models. Report the value of k that you use and plot the cross-validation error as a function of polynomial degree.

(b) Choose a model from your initial set and re-fit it on the entire dataset. Report the coefficients and make a scatter plot of x and y with your fitted polynomial. Justify your selection.

Problem 4. Extra Credit. Load the Labeled Faces in the Wild dataset from sklearn. You can load this data as follows:

from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=60)

For this exercise, we will use PCA on image data, in particular pictures of faces, to extract features.

(a) Perform PCA on the dataset to find the first 150 components. Since this is a large dataset, you should use randomized PCA instead, which can also be found in sklearn. Show the eigenfaces associated with the first 1 through 25 principal components.

(b) Using the first 150 components you found, reconstruct a few faces of your choice and compare them with the original input images.
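For intuition on Problem 2, here is one standard construction (a sketch only, not necessarily the intended solution). Take $k = 2$ and the four corners of a flat rectangle, $(0,0), (w,0), (0,h), (w,h)$, with $w \ge \sqrt{t}\, h$. Splitting the rectangle vertically gives centers $(0, h/2)$ and $(w, h/2)$ and objective $4(h/2)^2 = h^2$, so $\mathrm{OPT} \le h^2$. The horizontal split, with centers $(w/2, 0)$ and $(w/2, h)$, is a fixed point of the k-means algorithm: each point is at distance $w/2$ from its own center but $\sqrt{(w/2)^2 + h^2}$ from the other. A badly initialized run can therefore terminate there with objective $4(w/2)^2 = w^2 \ge t\, h^2 \ge t \times \mathrm{OPT}$.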
Ximi answered on Dec 02, 2021
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Problem 1\n",
"data = [\n",
" [0, 1],\n",
" [1, 1],\n",
" [2, 1],\n",
" [2, 3],\n",
" [3, 2],\n",
" [3, 3],\n",
" [4, 5],\n",
"]"
]
},
{
"cell_type": "code",
"execution_coun
t": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.decomposition import PCA"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Standardizing the data as per requirement\n",
"scaled_data = StandardScaler().fit_transform(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pca = PCA(n_components=2)\n",
"pca.fit(scaled_data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pca.explained_variance_"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Explained variance\n",
"pca.explained_variance_ratio_"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Transformed data \n",
"pca.transform(data)"
]
},
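{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Added cross-check (a sketch, not part of the original answer): derive the\n",
"# principal components by hand as eigenvectors of the covariance matrix of\n",
"# the standardized data, sorted by decreasing eigenvalue, then project onto\n",
"# the first component only, as Problem 1(a) asks. Signs may differ from\n",
"# sklearn's output; a component is only defined up to sign.\n",
"import numpy as np\n",
"cov = np.cov(scaled_data.T)\n",
"eigvals, eigvecs = np.linalg.eigh(cov)\n",
"order = np.argsort(eigvals)[::-1]  # sort eigenvalues from largest to smallest\n",
"eigvals, eigvecs = eigvals[order], eigvecs[:, order]\n",
"print(\"sorted eigenvalues:\", eigvals)\n",
"print(\"principal components (columns):\\n\", eigvecs)\n",
"print(\"data transformed by the first PC:\", scaled_data @ eigvecs[:, 0])"
]
},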
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Applying on non-standardized data \n",
"pca = PCA(n_components=2)\n",
"pca.fit(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pca.explained_variance_"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pca.explained_variance_ratio_"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Transformed data\n",
"pca.transform(data)"
]
},
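{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Added comparison (a sketch): for this dataset the component directions\n",
"# found on the raw data differ from those found on the standardized data,\n",
"# so PCA is not scale invariant -- rescaling a feature changes the\n",
"# covariance matrix and therefore the principal directions.\n",
"print(\"components on raw data:\\n\", pca.components_)\n",
"print(\"components on standardized data:\\n\",\n",
"      PCA(n_components=2).fit(scaled_data).components_)"
]
},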
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"n_digits: 10, \t n_samples 1797, \t n_features 64\n",
"__________________________________________________________________________________\n",
"init\t\ttime\tinertia\thomo\tcompl\tv-meas\tARI\tAMI\tsilhouette\n",
"k-means++\t0.17s\t69432\t0.602\t0.650\t0.625\t0.465\t0.621\t0.146\n",
"random \t0.16s\t69694\t0.669\t0.710\t0.689\t0.553\t0.686\t0.147\n",
"PCA-based\t0.03s\t70804\t0.671\t0.698\t0.684\t0.561\t0.681\t0.118\n",
"__________________________________________________________________________________\n"
]
}
],
"source": [
"# Problem 2\n",
"from time import time\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from sklearn import metrics\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.datasets import load_digits\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.preprocessing import scale\n",
"\n",
"np.random.seed(42)\n",
"\n",
"digits = load_digits()\n",
"data = scale(digits.data)\n",
"\n",
"n_samples, n_features = data.shape\n",
"n_digits = len(np.unique(digits.target))\n",
"labels...