


Last updated on Tuesday, September 18, 2018

2018 IFN680 - Assignment Two (Siamese network)

Assessment information
• Code and report submission due on Sunday 28th October, 23:59
• Use Blackboard to submit your work
• Recommended group size: three people per submission. Smaller groups are allowed (1 or 2 people is OK, but completion of the same tasks is required).

Overview
• You will implement a deep neural network classifier to predict whether two images belong to the same class.
• The approach you are asked to follow is quite generic and can be applied to problems where we seek to determine whether two inputs belong to the same equivalence class.
• You will write code, perform experiments and report on your results.

Introduction
Despite impressive results in object classification, verification and recognition, most recognition systems based on deep neural networks become brittle when the viewpoint of the camera changes dramatically. Robustness to geometric transformations is highly desirable for applications like wildlife monitoring, where there is no control over the pose of the objects of interest. The images of different objects viewed from various observation points define equivalence classes where, by definition, two images are said to be equivalent if they are views of the same object. These equivalence classes can be learned via embeddings that map the input images to high-dimensional vectors. During training, equivalent images are mapped to vectors that get pulled closer together, whereas the vectors of non-equivalent images get pushed apart.

Background
Common machine learning tasks like classification and recognition involve learning an appearance model. These tasks can be interpreted as, and even reduced to, the problem of learning manifolds from a training set. Useful appearance models create an invariant representation of the objects of interest under a range of conditions. A good representation should combine invariance and discriminability.
For example, in facial recognition, where the task is to compare two images and determine whether they show the same person, the output of the system should be invariant to the pose of the heads. More generally, the category of an object contained in an image should be invariant to viewpoint changes.

This assignment borrows ideas from a manta ray recognition system. The motivation for our research work is the lack of fully automated identification systems for manta rays. The techniques developed for such systems can potentially also be applied to other marine species that bear a unique pattern on their surface. The task of recognizing manta rays is challenging because of the heterogeneity of the photographic conditions and equipment used in acquiring manta ray photo ID images like those in the figures below.

Figure: Two images of the same manta ray

Many of those pictures are submitted by recreational divers. For those pictures, the camera parameters are generally not known. The state of the art for manta ray recognition is a system that requires user input to manually align and normalize the 2D orientation of the ray within the image. Moreover, the user has to select a rectangular region of interest containing the spot pattern. The images also have to be of high quality. In practice, marine biologists still prefer to use their own decision tree, which they run manually.

In order to develop robust algorithms for recognizing manta spot patterns, a research student (Olga, one of the tutors of this unit) and I have considered the problem of recognizing artificially generated patterns subjected to projective transformations that simulate changes in the camera viewpoint. Artificial data allowed us to experiment with a large number of patterns and compare different network architectures to select the most suitable for learning geometric equivalence.
Our experiments have demonstrated that Siamese convolutional neural networks (defined in the next section) are able to discriminate between patterns subjected to large homographic transformations. Promising results have also been obtained with real images of manta rays. Training such a complex neural network requires access to good computing facilities. This is why, in this assignment, you will work with a simpler dataset, namely the MNIST dataset. You will build a classifier to predict whether two images correspond to the same digit or not. Note this is not the same problem as building a digit classifier!

Learning equivalence classes
A Siamese network consists of two identical subnetworks that share the same weights, followed by a distance calculation layer. The input of a Siamese network is a pair of images Pi and Pj. If the two images are deemed to be from the same equivalence class, the pair is called a positive pair, whereas a pair of images from different equivalence classes is called a negative pair. The input images Pi and Pj are fed to the twin subnetworks to produce two vector representations f(Pi) and f(Pj) that are used to calculate a proxy distance. The training of a Siamese network is done with a collection of positive and negative pairs. Learning is performed by optimizing a contrastive loss function (see details later). The aim of the training algorithm is to minimize the distance between a pair of images from the same equivalence class while maximizing the distance between a pair of images from different equivalence classes.

With a conventional neural network classifier, we need to know the number of classes in advance. In practice, this is not always possible. Consider a face recognition system. If its use is limited to a fixed group of, let us say, 10,000 people, you can train a convolutional neural network with 10,000 outputs where each output corresponds to a person.
But this approach is not suitable when the group of people is more fluid. We want to be able to use the same network without having to retrain it when new people arrive. What is needed is a neural network that can take two input images and predict whether these are images of the same person. This neural network learns to compare pairs of faces in general, and does not learn about specific people. This is a significant difference. You do not have to retrain the neural network if your population changes.

Siamese network architecture
A Siamese network consists of two identical subnetworks that share the same weights, followed by a distance calculation layer. The input of a Siamese network is a pair of images (Pi, Pj) and a label yij. If the two images are deemed to be from the same equivalence class, the pair is called a positive pair, and the target value is yij = 0. For a pair of images from different equivalence classes, the pair is called a negative pair, and the target value is yij = 1. The target value yij can be interpreted as the desired distance between the embedding vectors. The input images (Pi, Pj) are fed to the twin subnetworks to produce two vector representations f(Pi) and f(Pj) that are used to calculate a proxy distance. The training of a Siamese network is done on a collection of positive and negative pairs. Learning is performed by optimizing a contrastive loss function, where D(Pi, Pj) is the Euclidean distance between f(Pi) and f(Pj). The margin m > 0 determines how far the embeddings of a negative pair should be pushed apart.

Required tasks and experiments
• Load the MNIST dataset (use keras.datasets.mnist.load_data).
• Split the dataset such that
  ◦ the digits in [2,3,4,5,6,7] are used for training and testing
  ◦ the digits in [0,1,8,9] are only used for testing. None of these digits should be used during training.
• Implement and test the contrastive loss function described earlier in this document.
• Build a Siamese network.
• Train your Siamese network on your training set. Plot the training and validation error vs time.
• Evaluate the generalization capability of your network by
  ◦ testing it with pairs from [2,3,4,5,6,7] x [2,3,4,5,6,7]
  ◦ testing it with pairs from [2,3,4,5,6,7] x [0,1,8,9]
  ◦ testing it with pairs from [0,1,8,9] x [0,1,8,9]
• Present your results in tables and figures.

Implementation hints
• You need the functional model of Keras to implement the Siamese network. Refer to https://keras.io/getting-started/functional-api-guide/ for examples.
• For the shared network of the Siamese network, you can adopt the CNN architecture used in the solution to the Week 07 prac.
• There are plenty of examples of loss functions at https://github.com/keras-team/keras/blob/master/keras/losses.py
• Keep 80% of the [2,3,4,5,6,7] digits for training (and 20% for testing).
• Keep 100% of the [0,1,8,9] digits for testing.

Submission
You should submit via Blackboard
• A report in pdf format strictly limited to 8 pages in total, in which you present your experimental results using tables and figures. Only one person per group has to submit the assignment. Make sure you list the members of your group in the report and in the code. The feedback will be given to the person submitting the assignment and is expected to be shared with the other team members.
• Your Python file my_submission.py containing all your code and instructions on how to repeat your experiments.

Marking criteria
• Report: 10 marks
  ◦ Structure (sections, page numbers), grammar, no typos.
  ◦ Clarity of explanations.
  ◦ Figures and tables (use for explanations and to report performance).

Levels of Achievement: 10 Marks, 7 Marks, 5 Marks, 3 Marks, 1 Mark
10 Marks: Report written at the highest professional standard with respect to spelling, grammar, formatting, structure, and language terminology.
7 Marks: Report is very well written and understandable throughout, with only a few insignificant presentation errors. Methodology and experiments are clearly presented.
5 Marks: The report is generally well written and understandable but with a few small presentation errors that make one or two points unclear
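
As a starting point for the task of implementing and testing the contrastive loss, the sketch below gives a plain-Python reference version that a Keras implementation could be checked against. The formula itself is not reproduced in this text, so the sketch assumes the commonly used form L = (1 - y) * D^2 / 2 + y * max(0, m - D)^2 / 2, which agrees with the conventions stated in the Siamese network architecture section (y = 0 for a positive pair, y = 1 for a negative pair, margin m > 0). The function names, the default margin of 1.0 and the factor 1/2 are assumptions made for illustration and are not part of the assignment specification.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance D between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(y, d, margin=1.0):
    """Contrastive loss for a single pair (assumed form, see the text above).

    y      -- target value: 0 for a positive pair, 1 for a negative pair
    d      -- distance D(Pi, Pj) between the two embeddings
    margin -- m > 0, how far the embeddings of a negative pair are pushed apart
    """
    # Positive pairs are penalised for being far apart.
    positive_term = (1 - y) * 0.5 * d ** 2
    # Negative pairs are penalised only while they are closer than the margin.
    negative_term = y * 0.5 * max(0.0, margin - d) ** 2
    return positive_term + negative_term


def mean_contrastive_loss(labels, distances, margin=1.0):
    """Average the per-pair loss over a batch of pairs."""
    losses = [contrastive_loss(y, d, margin) for y, d in zip(labels, distances)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    d = euclidean_distance([0.0, 0.0], [3.0, 4.0])  # 3-4-5 triangle
    print(d)                                      # 5.0
    print(contrastive_loss(0, d))                 # 12.5 (positive pair, far apart)
    print(contrastive_loss(1, d, margin=1.0))     # 0.0  (negative pair beyond margin)
    print(contrastive_loss(1, 0.5, margin=1.0))   # 0.125 (negative pair inside margin)
```

Hand-computed cases like these (a positive pair at distance zero, a negative pair beyond the margin, a negative pair inside the margin) are a cheap way to test a tensor-based loss before training: feed the same labels and distances to both versions and compare the values.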