Assignment 1: Fun with Basic C and Handwriting Recognition In this assignment you will review your C skills from A48 (loops, arrays, dynamic memory allocation, strings) and practice some of your newly...


Assignment 1: Fun with Basic C and Handwriting Recognition

In this assignment you will review your C skills from A48 (loops, arrays, dynamic memory allocation, strings) and practice some of your newly gained skilled (command line arguments, file IO).All assignments in this term's B09 will center around implementing solutions to image recognition problems, an important class of machine learning problems. Image recognitionis the ability of software to identify objects,places, people, writing and actions in images. It is the basic building block of many applications, ranging from photo apps like Google Lens or Face recognition to self-driving cars.The specific problem that we will focus on in our assignments is handwriting recognition. More specifically, you will develop programs that can read in images of hand-written digits and automatically determine which digit the image shows. The images below show some examples of input images and the labels that your program should generate.Screen Shot 2020-12-28 at 13.03.30.pngYou will accomplish this by using machine learning. Generally speaking, the goal of your machine learning algorithm is to use a set of sample images that are already correctly labelled (called thetraining set) and try to extract knowledge from it that it can use to later label new images (called thetest set) that it has not yet seen. In machine learning lingo, this is called aclassification problem: we are trying to assign an input item (an image in our case) to one of tenclasses(corresponding to the digits 1 to 10) based onfeaturesof the input items (the pixels of each image make up its key features).In A1 we will use a simple machine learning technique for classification calledk-nearest-neighbour (kNN).The basic idea is to label a new image by finding thekmost similar images in the training set and assign the new image the label that is most common among thoseksimilar images. That's it! You have just mastered your first machine learning algorithm :-)The only thing you need before you can get started is a bit of background on how images are represented, how we determine the similarity of two images and a more detailed specification for the hand-writing program we are asking you to write.

The PGM Image Format

The PGM (or Portable Gray Map) image format allows for images that are encoded in human-readable ASCII text (as opposed to a binary format), so the contents can be viewed in a text editor. You can think of an image as a matrix of pixels. Each pixel for a grayscale image is encoded by one integer number representing the grayscale of the pixel. A PGM image file contains information on all the pixels in the image (in the body of the image file), preceded by a header. The 31 lines below are a sample pgm file representing the digit '9'. You can download ithere

Preview the document

, if you want to view it in a text editor. The first 3 lines comprise the header and the remaining lines the body.



P2
28 28
255
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 36 56 137 201 199 95 37 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 45 152 234 254 254 254 254 254 250 211 151 6 0 0 0 0 0
0 0 0 0 0 0 0 0 0 46 153 240 254 254 227 166 133 251 200 254 229 225 104 0 0 0 0 0
0 0 0 0 0 0 0 0 153 234 254 254 187 142 8 0 0 191 40 198 246 223 253 21 0 0 0 0
0 0 0 0 0 0 8 126 253 254 233 128 11 0 0 0 0 210 43 70 254 254 254 21 0 0 0 0
0 0 0 0 0 0 72 243 254 228 54 0 0 0 0 3 32 116 225 242 254 255 162 5 0 0 0 0
0 0 0 0 0 0 75 240 254 223 109 138 178 178 169 210 251 231 254 254 254 232 38 0 0 0 0 0
0 0 0 0 0 0 9 175 244 253 255 254 254 251 254 254 254 254 254 252 171 25 0 0 0 0 0 0
0 0 0 0 0 0 0 0 16 136 195 176 146 153 200 254 254 254 254 150 16 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 162 254 254 241 99 3 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 118 250 254 254 90 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 100 242 254 254 211 7 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 54 241 254 254 242 59 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 131 254 254 244 64 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 13 249 254 254 152 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 12 228 254 254 208 8 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 78 255 254 254 66 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 209 254 254 137 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 227 255 233 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 113 255 108 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


Image Header

You can think of the image as having two parts, aheaderand abody. Theheaderconsists of four entries:

P2 28 28 255

P2is a "magic number". It indicates what type of image this is. For this assignment it will always be P2. P2 stands for ASCII encoded grayscale images.
Next comes the number of columns and the number of rows in the image (28x28). This specifies that the image consists of28rows and28columns of pixels.Finally, we have the maximum gray value255. Each of the 28 x 28 pixels will be represented by one integer number which specifies the gray value of that pixel,where the maximum white value (255in this case) represents white, while0always represents the color black.Note that in the sample file the four components (the magic number, # columns, # rows, maximum gray value) of the header are on three different lines. This formatting is to improve human readability. The pgm standard only requires that the four components are all separated by at least one whitespace (e.g. a blank, a TAB, a newline).

Image Body

Theimage bodycontains the actual picture information.You can see that the body of the sample image above has 28 rows, and each row has 28 numbers, corresponding to the 28 pixels in each row (since the file has 28 columns).Each number represents the grayscale ranging from 0 (black) to the max value 255 (white) for the corresponding pixel.Note that pixel values must be separated by at least one space, but there could be additional whitespace, which would be ignored by an image viewer. In the sample pgm file above there is one line for each row of the matrix, i.e. we have all the pixels of one row of the matrix in one line. We have done that for better readability, but the format does not require this. E.g. a valid pgm image could also have all the pixels of the image (i.e. all the rows in the matrix) in one line, or on the other extreme each pixel in the file could be on a separate line. The important thing is that the pixels are listed in the right order, starting with the elements of the first row of the matrix, then the second row, and so on.

How do you determine the similarity between two images?

In order to find the correct label for an input image the kNN algorithms needs to find the k most similar images in the training set. That means we need some definition of "similarity" between two images. In machine learning research, people have experimented with many different "distance functions" that determine the distance between two data points. One that is commonly used and that we will use in this assignment is the Euclidean distance, also referred to as theLaTeX: l^2l2norm. For two vectorsLaTeX: ppandLaTeX: qqthe Euclidean distanceLaTeX: d(p,q)d(p,q)is defined as:Screen Shot 2020-12-29 at 12.01.47.pngIf you think of an image as a vector that consists of its pixel values, than you can apply the formula above to compute the Euclidean distance between two images. The most similar image will be the one with the smallest Euclidean distance.

Your Handwriting Recognition Program

Your job is to write a program that takes three command line arguments:- the valuekof the kNN algorithm- the name of a file that contains the filenames of the images that make up the training set- the name of a file that contains the filenames of the images that make up the test data setYou will complete the starter code that we provide such that your program will read all the images in the training set and then use theknn_predictfunction to predict the labels for all the images in the test data set, and compare for each test image the predicted lable to the correct label (see the section below on how your program knows the correct label for an image). Once you have completed comparing the prediction for all the test images, output to STDOUT one integer representing thetotal number of correct predictions
.

We are providing starter code and image files here: (link coming soon).

How does your program know the correct label for an image?


To make use of the training images and to know whether the label the algorithm produces for a test image is correct you need to know the "ground truth", i.e. you need to know for each image the correct label. We have encoded the correct label for each image in its filename. Filenames of all images we provide are of the form[somestring]-[label].pgm.


For example, an image with the namefooboo-4.pgmwill contain an image depicting the digit 4. The label will always be the digit between a hyphen ("-") and the file ending ".pgm". You may assume that the filename contains only one ".", but it may contain multiple hyphens.


Some important hints


Here are a few things to remember as you implement and test your solution:



  1. Think about what you need to store in memory (e.g. in arrays and what size the arrays would have to be) and what not.

  2. Runtar xvzf datasets.tgzonce you have extracted the starter to uncompress all the image files (60K training, 10K testing).

  3. Sample lists have been provided to you to test with, including ones for the full training/testing datasets and ones with 1000 images from each for testing. You will probably want to use command line utilities to make even smaller ones for your initial debugging.

  4. The final number of correct predictions in the testing set should be output to STDOUT, so you should use printf. Your program should not produce any other output, besides this. The only exception is an error message, in case there is a problem with the command line arguments or an error from the standard library functions.

  5. Start by testing with a small number of files and a simple case (e.g. where the test and the training set are identical) to make sure things work as expected.

  6. You may assume that the input text file provided to your program has at least 1 image path in it, and all the paths provided are valid.

  7. If you want to view pgm files, you need to use an image viewer that supports pgm. It seems that on Macs new versions of Preview support pgm, on linux machines gimp can open pgm. Alternatively, you can convert a pgm file to a a more common format, like jpg, that any image viewer supports using a utility called mogrify. mogrify is part of a software suite called imagemagick. If you want to run this on your own computer, you will have to download imagemagickhere(Links to an external site.)and install it. The following command will convert a pgm image named input.pgm to a jpg image named input.jpg:

    mogrify -format jpg -compress none input.pgm


  8. Your program should support any type of valid pgm P2 image as an input. In particular, there might be an arbitrary number of whitespaces separating the different fields, and the row, column and max color fields could be any positive integer value.
    If you read theformal specifications(Links to an external site.)you will see that pgm supports a few things we have not talked about, including for example comments (lines that start with a #). Your program is not required to support any pgm features that are not discussed in this assignment handout.

  9. In practice, the values for k are small (between 1 to 5). So you do not need to optimize the algorithm or data structure you use to keep track of the top k nearest neighbours, e.g. no fancy sorting is needed. Simple is best!

  10. Test thoroughly before submitting a final version. We are using automated grading tools, so your program must compile and run according to the specifications. In particular, your program must be able to run on any valid pgm input file and produce output exactly as specified.

  11. If your program encounters any errors (e.g. files that do not exist, wrong number of command line arguments, filename has the wrong format, etc.) your program should return 1 and exit.



Starter code


You can download directly on the machine you're working on by running the commands below. If you're working remotely on the lab machines, run these after SSHing in and going to thecscb09w21_spacedirectory (or something similar).


# Create A1 directory
mkdir A1 && cd A1
# Download zipped starter
wget
http://www.cs.toronto.edu/~mustafa/B09/A1_Starter.zip(Links to an external site.)

# Extract starter (do this inside A1 directory)
unzip A1_Starter.zip


Alternatively, it is available here:A1_Starter.zip
.


Submission and Marking


We are using automated grading tools to provide functional feedback, so it's important that your submission be fully submitted and compile cleanly.


Your program must compile on the lab machines or mathlab, so please test it there before submission. We will be usinggccto compile program with the flags-Walland-std=c99:

gcc -Wall -std=c99 -lm -o classifier classifier.c knn.c



Once you are done verifying the program runs, if you are testing with the whole dataset it might help to compile with the-O3flag to enable compiler optimizations and make your program run faster.


Your program should not produce any error or warning messages when compiled. Programs that do not compile will receive a 0. Programs that produce warning messages will be penalized.


You will be submitting your assignments via Markus. For this assignment you will need to submit two files:classifer.candknn.c



https://markus.utsc.utoronto.ca/cscb09w21Links to an external site.


You want an extra challenge? (no credit, just for fun)


If you already have advanced knowledge of C and you are looking for an extra challenge (no credit) here are two things that you could look into:



  • Time your program to see how long it runs. Try to figure out where most of the time is spent. Do you have ideas for speeding up the program?

  • Can you think of any ideas for improving the prediction accuracy? For example, in practice there is lot of pre-processing to images before you run training and prediction to achieve better accuracy. One example is that the digit in one image might be slightly shifted to the left or to the right, compared to another image. When comparing them the pixel-wise similarity might not be high. You could try shifting each image a bit up/ down/ left /right and then compare and find the shortest distance. You could also look into other distance functions. Any other ideas you could come up with? Maybe look at the images that get classified incorrectly and get ideas from there?



Feb 09, 2021
SOLUTION.PDF

Get Answer To This Question

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here