CS 615 - Deep Learning: Assignment 3 - Auto-Encoders and Multi-Layer Perceptrons (Winter 2021)

I need to complete the Part 1 theory questions and choose either the Part 2 or the Part 3 programming component.


CS 615 - Deep Learning
Assignment 3 - Auto-Encoders and Multi-Layer Perceptrons
Winter 2021

Introduction
In this assignment we'll implement an auto-denoiser and a multi-layer artificial neural network for the task of image recognition.

Programming Language/Environment
While you may work in a language of your choosing, if you work in any language other than Matlab, you must make sure your code can compile (if applicable) and run on tux. If you need a package installed on tux, reach out to the professor so that he can initiate the request.

Allowable Libraries/Functions
You cannot use any libraries to do the training or evaluation for you. Using basic statistical and linear algebra functions like mean, std, cov, etc. is fine, but using ones like train, confusion, etc. is not. Using any ML-related functions may result in a zero for the programming component. In general, use the "spirit of the assignment" (where we're implementing things from scratch) as your guide, but if you want clarification on whether you can use a particular function, DM the professor on Slack.

Grading
Part 1 (Theory Questions): 20pts
Part 2 (Programming): 80pts
Part 3 (E.C.): 20pts
TOTAL: 120pts

Datasets
Yale Faces Dataset: This dataset consists of 154 images (each of which is 243x320 pixels) taken from 14 people at 11 different viewing conditions (for our purposes, the first person was removed from the official dataset, so person ID=2 is the first person). The filename of each image encodes class information: subject<id>.<condition>
Data obtained from: http://cvc.cs.yale.edu/cvc/projects/yalefaces/yalefaces.html

1 Theory
1. One common activation function that we didn't explicitly compute the partial of is the hyperbolic tangent function, tanh, which is defined as:

$$\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}} \qquad (1)$$

What is the gradient of the output of this activation function with regards to its input? Show your work in coming up with this gradient (20pts).

For our final assignment, you may choose to do just ONE of the following programming components. If you do both correctly, you will receive 20 extra points on this assignment.

2 Auto-Denoiser
We discussed in class that an auto-encoder can be used for denoising! In this part we'll take the Yale Faces dataset, apply known noise to it, train a (relatively) simple auto-denoiser to learn the parameters to denoise a training set, then evaluate (both objectively and subjectively) how this performs in denoising new images.

First download and extract the dataset yalefaces.zip from Blackboard. This dataset has 154 images (N = 154), each of which is a 243x320 image (D = 77760). In order to process this data your script will need to:
1. Read in the list of files
2. Create a 154x1600 data matrix such that for each image file:
(a) Read in the image as a 2D array (243x320 pixels)
(b) Subsample/resize the image to become a 40x40 pixel image (for processing speed). I suggest you use your image processing library to do this for you.
(c) Flatten the image to a 1D array (1x1600)
(d) Concatenate this as a row of your data matrix.

Write a script that:
1. Divides the data by 255 so that it consists of doubles in the range of [0, 1].
2. Generates random noise to add to the data. This should be uniform random noise on the range of ±0.1. Add this to your data, then clamp any values less than zero to zero, and any numbers greater than one to one.
3. Shuffles the observations (i.e. the rows) of the data randomly, separating about 2/3 for training and the remaining for validation.
4. Trains an auto-encoder. Since our outputs are continuous (kind of), we'll use a multi-output squared error objective function. The core architecture will have two fully connected layers (one to encode/compress, one to decode). You can play with the amount of compression and/or any activation functions.
5. Once your system is trained, applies it to your validation data.

What you will need for your report
• Your architecture, design, preprocessing, and hyperparameter decisions.
• A plot of epoch vs. J showing the training process.
• Your final training and validation RMSE values.
• A sample of an original training image, its noised version, and its denoised version.
• A sample of an original validation image, its noised version, and its denoised version.

Notes/Hints
• It is suggested that you save this data so that you don't need to read the images in every time you run your script. Matlab has a way to save variables to a .mat file. I'm not sure if Python has a similar ability, but you could always write out the data matrices to a CSV file after reducing their size according to the assignment description.
• My toy solution gives decent results with a logistic activation function slapped on the end to clamp the pixel values to [0, 1].
• I also found that I got better results when using the Adam algorithm explored in HW2, since otherwise I found myself in either local extrema or saddle points.

3 Multi-Layer Perceptron
In this programming part, you'll implement a multi-layer perceptron. First, extract and pre-process your data the same way as in the previous part (of course, now you won't be adding noise). Now, given a list of fully-connected layer output sizes, sizes, you should train and validate an MLP for the purpose of identifying the image subjects (each person in the dataset is a class). For our purposes, the architecture will be:
1. The input layer
2. L sequences of:
(a) A fully connected layer, whose input size is based on the prior layer, and whose output size is specified in the sizes parameter.
(b) An activation function.
3. The output layer with an appropriate objective function.

You may choose the objective function and activation functions to your liking; however, the activation functions must not be the identity function. You should run your code for three different values of the sizes list: one with just one size (this implies two fully connected layers), one with two, and one with more than two. For instance, if I provide the vector sizes = [100, 30, 20], the weights of my MLP's first fully-connected layer will be of size D×100, the second one would be of size 100×30, the third one would be of size 30×20, and the final one would be of size 20×K, where K is the number of classes.

What you will need for your report
1. Your chosen architecture and the values of the hyperparameters that you chose.
2. A table of network configurations and their associated number of iterations till termination, plus training and validation accuracies.

Submission
For your submission, upload to Blackboard a single zip file containing:
1. PDF Writeup
2. Source Code
3. readme.txt file

The readme.txt file should contain information on how to run your code to reproduce results for each part of the assignment. The PDF document should contain the following:
1. Part 1:
(a) Your solutions to the theory question
2. Part 2:
(a) Your architecture, design, preprocessing, and hyperparameter decisions.
(b) A plot of epoch vs. J
(c) Final validation accuracy
(d) A sample of an original training image, its noised version, and its denoised version.
(e) A sample of an original validation image, its noised version, and its denoised version.
3. Part 3:
(a) Your architecture, design, preprocessing, and hyperparameter decisions.
(b) Table of results for different network configurations.
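The posted answer below covers only the programming component; for Part 1, the gradient follows directly from the quotient rule. A standard derivation sketch: since the derivative of $e^z - e^{-z}$ is $e^z + e^{-z}$ and vice versa,

$$\frac{d}{dz}\tanh(z) = \frac{(e^z + e^{-z})^2 - (e^z - e^{-z})^2}{(e^z + e^{-z})^2} = 1 - \left(\frac{e^z - e^{-z}}{e^z + e^{-z}}\right)^2 = 1 - \tanh^2(z)$$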
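For reference, a minimal sketch of the preprocessing the assignment describes, assuming Pillow and NumPy are available (the directory name and grayscale conversion are assumptions about the local setup):

import os
import numpy as np
from PIL import Image

# Build the 154x1600 data matrix: read each image, resize to 40x40, flatten
files = sorted(f for f in os.listdir("yalefaces") if f.startswith("subject"))
X = np.array([
    np.asarray(Image.open(os.path.join("yalefaces", f)).convert("L").resize((40, 40)),
               dtype=np.float64).ravel()
    for f in files
])

X /= 255.0                                     # doubles in [0, 1]
noise = np.random.uniform(-0.1, 0.1, X.shape)  # uniform noise on the range ±0.1
X_noisy = np.clip(X + noise, 0.0, 1.0)         # clamp to [0, 1]

# Shuffle rows and split roughly 2/3 train, 1/3 validation
idx = np.random.permutation(X.shape[0])
split = (2 * X.shape[0]) // 3
train_idx, valid_idx = idx[:split], idx[split:]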
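The answer below implements Part 3 (the MLP). For Part 2, a minimal two-layer auto-encoder sketch in the same TF1 style, using the squared-error objective, a logistic output to clamp pixels to [0, 1], and Adam, per the assignment's hints (the layer size H and the learning rate are illustrative):

import tensorflow as tf

D, H = 1600, 256                                  # 40x40 inputs; H is the compression size (tunable)
x_noisy = tf.placeholder(tf.float32, [None, D])   # noised images
x_clean = tf.placeholder(tf.float32, [None, D])   # original images as targets

encoded = tf.layers.dense(x_noisy, H, activation=tf.nn.relu)     # encode/compress
decoded = tf.layers.dense(encoded, D, activation=tf.nn.sigmoid)  # decode; sigmoid clamps to [0, 1]

loss = tf.reduce_mean(tf.square(decoded - x_clean))  # multi-output squared error (J)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)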

Answer

Pulkit answered on Feb 26 2021
Solution/helper.py

import cv2
import numpy as np
import os
from skimage import io
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was renamed in sklearn 0.18
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

# Use OpenCV's bundled cascade file instead of a hard-coded Windows path
# (requires an opencv-python build that exposes cv2.data)
OPENCV_HAAR_CSC_PATH = os.path.join(cv2.data.haarcascades, "haarcascade_frontalface_default.xml")
def create_data_set(x_crop=150, y_crop=150, train_size=.8):
    """Load the Yale Faces data set, extract the faces from the images, and
    generate labels for each image.

    Returns: Train and validation samples with their labels. The training
    samples are flattened arrays of size 22500 (150 * 150); the labels are
    one-hot-encoded values for each category.
    """
    images_path = [os.path.join("yalefaces", item) for item in os.listdir("yalefaces")]
    image_data = []
    image_labels = []

    for i, im_path in enumerate(images_path):
        # as_gray (not the misspelled as_grey) in current scikit-image
        im = io.imread(im_path, as_gray=True)
        image_data.append(np.array(im, dtype='uint8'))
        # Filenames look like "subject02.happy"; derive a 0-based label
        label = int(os.path.split(im_path)[1].split(".")[0].replace("subject", "")) - 1
        image_labels.append(label)

    # Create the cascade classifier once, outside the per-image loop
    faceDetectClassifier = cv2.CascadeClassifier(OPENCV_HAAR_CSC_PATH)

    cropped_faces = []
    for im in image_data:
        # Use the first detected face and crop a fixed-size window from it
        facePoints = faceDetectClassifier.detectMultiScale(im)
        x, y = facePoints[0][:2]
        cropped = im[y: y + y_crop, x: x + x_crop]
        cropped_faces.append(cropped / 255)

    X_ = np.array(cropped_faces).astype('float32')
    enc = LabelEncoder()
    y_ = enc.fit_transform(np.array(image_labels))
    y_ = np_utils.to_categorical(y_)
    X_train, X_test, y_train, y_test = train_test_split(X_, y_, train_size=train_size, random_state=22)
    return (X_train.reshape((X_train.shape[0], X_train.shape[1] * X_train.shape[2])),
            X_test.reshape((X_test.shape[0], X_test.shape[1] * X_test.shape[2])),
            y_train, y_test)
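For instance, an illustrative call (the 2/3 split mirrors the assignment's suggestion):

X_train, X_test, y_train, y_test = create_data_set(train_size=2/3)
# X_train: (N_train, 22500) float32 rows; y_train: one-hot labels, one column per subject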



Solution/network.py

# Note: this file targets the TensorFlow 1.x API (tf.placeholder, tf.Session)
import tensorflow as tf
from datetime import datetime
import os
import numpy as np
class FaceDetector(object):

    def __init__(self, dropout=.2, epochs=3, batch_size=5, learning_rate=.00001):
        # (A `Layers()` object was referenced here but never defined or used,
        # so it has been removed)
        self.learning_rate = learning_rate
        self.dropout = dropout
        self.epochs = epochs
        self.batch_size = batch_size

    def neural_net(self, x):
        """Multi-layer perceptron.

        Returns the output logits of a multi-layer perceptron built with
        tensorflow.

        Positional arguments:
        x -- tensorflow placeholder for input data
        """
        # Hidden fully connected layers with 512 neurons each; a non-identity
        # activation (ReLU) is used, as the assignment requires
        layer_1 = tf.layers.dense(x, 512, activation=tf.nn.relu)
        layer_2 = tf.layers.dense(layer_1, 512, activation=tf.nn.relu)
        # Output fully connected layer with one neuron per class (logits)
        out_layer = tf.layers.dense(layer_2, self.num_classes)

        return out_layer
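    # The assignment sweeps over a list of layer sizes; a hedged sketch of a
    # generalized builder (this method and its `sizes` parameter are
    # illustrative additions, not part of the original answer):
    def neural_net_sized(self, x, sizes=(100, 30, 20)):
        """Chains one ReLU dense layer per entry of sizes, then a K-class output."""
        layer = x
        for n in sizes:
            layer = tf.layers.dense(layer, n, activation=tf.nn.relu)
        return tf.layers.dense(layer, self.num_classes)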

    def build_model(self, input_size, output_size):
        """Build a tensorflow model for multi-class classification.

        Positional arguments:
        input_size -- dimension of the input samples
        output_size -- dimension of the labels
        """
        input_x = tf.placeholder(tf.float32, [None, input_size], name="input_x")
        # One-hot targets; softmax_cross_entropy_with_logits expects float labels
        input_y = tf.placeholder(tf.float32, [None, output_size], name="input_y")

        y_pred = self.neural_net(input_x)

        with tf.name_scope('cross_entropy'):
            cross_entropy = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=input_y, logits=y_pred))

        optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate)
        train = optimizer.minimize(cross_entropy)

        return train, cross_entropy, input_x, input_y, y_pred

    def fit(self, X_train, y_train, X_valid=None, y_valid=None):
        """Fit a tensorflow model.

        Positional arguments:
        X_train -- numpy ndarray of training input
        y_train -- one-hot-encoded training labels

        Keyword arguments:
        X_valid -- numpy ndarray of validation input
        y_valid -- one-hot-encoded validation labels
        """
        self.num_classes = y_train.shape[1]

        train, cross_entropy, input_x, input_y, y_pred = self.build_model(X_train.shape[1], self.num_classes)
        init = tf.global_variables_initializer()
        steps = X_train.shape[0]
        with tf.Session() as sess:
            saver = tf.train.Saver()
            sess.run(init)

            # Count the trainable parameters of the network
            total_parameters = 0
            for variable in tf.trainable_variables():
                variable_parameters = 1
                for dim in variable.get_shape():
                    variable_parameters *= dim.value
                total_parameters += variable_parameters
            print("total_parameters", total_parameters)

            while True:
                # One epoch: mini-batch passes over the training set
                for i in range(0, steps, self.batch_size):
                    x_batch_train = X_train[i: i + self.batch_size]
                    y_batch_train = y_train[i: i + self.batch_size]
                    _, ce = sess.run([train, cross_entropy],
                                     feed_dict={input_x: x_batch_train, input_y: y_batch_train})
                    print("{} iteration: {} loss: {}".format(str(datetime.now()), i, ce))

                if X_valid is not None:
                    print("\n\nEvaluation...")
                    # Evaluate only; do not run the `train` op on validation data
                    ce = sess.run(cross_entropy, feed_dict={input_x: X_valid, input_y: y_valid})
                    print("{} validation loss: {}".format(str(datetime.now()), ce))
                    print("\n")

                self.epochs -= 1
                if self.epochs == 0:
                    # Save the trained model before leaving the training loop
                    if not os.path.exists(os.path.join(os.getcwd(), 'saved_model')):
                        os.makedirs(os.path.join(os.getcwd(), 'saved_model'))
                    saver.save(sess, os.path.join(os.getcwd(), 'saved_model', 'my_test_model'))
                    break
    def predict(self, X_valid, y_valid):
        """Returns a dictionary containing the model's predictions as well as
        the ground truth, encoded as..."""
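A minimal end-to-end usage sketch (illustrative; assumes the two files above sit on the import path):

from helper import create_data_set
from network import FaceDetector

X_train, X_test, y_train, y_test = create_data_set()
model = FaceDetector(epochs=10, batch_size=5, learning_rate=1e-5)
model.fit(X_train, y_train, X_valid=X_test, y_valid=y_test)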