
Question

Artificial Intelligence
Last updated on Tuesday, September 18, 2018

2018 IFN680 - Assignment Two (Siamese network)

Assessment information
• Code and report submission due on Sunday 28th October, 23:59.
• Use Blackboard to submit your work.
• Recommended group size: three people per submission. Smaller groups (one or two people) are allowed, but must complete the same tasks.

Overview
• You will implement a deep neural network classifier to predict whether two images belong to the same class.
• The approach you are asked to follow is quite generic and can be applied to problems where we seek to determine whether two inputs belong to the same equivalence class.
• You will write code, perform experiments and report on your results.

Introduction
Despite impressive results in object classification, verification and recognition, most recognition systems based on deep neural networks become brittle when the viewpoint of the camera changes dramatically. Robustness to geometric transformations is highly desirable for applications like wildlife monitoring, where there is no control over the pose of the objects of interest. The images of different objects viewed from various observation points define equivalence classes, where by definition two images are said to be equivalent if they are views of the same object. These equivalence classes can be learned via embeddings that map the input images to high-dimensional vectors. During training, equivalent images are mapped to vectors that get pulled closer together, whereas the vectors associated with non-equivalent images get pushed apart.

Background
Common machine learning tasks like classification and recognition involve learning an appearance model. These tasks can be interpreted as, and even reduced to, the problem of learning manifolds from a training set. Useful appearance models create an invariant representation of the objects of interest under a range of conditions.
A good representation should combine invariance and discriminability. For example, in facial recognition, where the task is to compare two images and determine whether they show the same person, the output of the system should be invariant to the pose of the heads. More generally, the category of an object contained in an image should be invariant to viewpoint changes.

This assignment borrows ideas from a system developed for manta ray recognition. The motivation for our research work is the lack of fully automated identification systems for manta rays. The techniques developed for such systems can also potentially be applied to other marine species that bear a unique pattern on their surface. The task of recognizing manta rays is challenging because of the heterogeneity of photographic conditions and equipment used in acquiring manta ray photo ID images like those in the figures below.

[Figure: Two images of the same manta ray]

Many of those pictures are submitted by recreational divers. For those pictures, the camera parameters are generally not known. The state of the art for manta ray recognition is a system that requires user input to manually align and normalize the 2D orientation of the ray within the image. Moreover, the user has to select a rectangular region of interest containing the spot pattern. The images also have to be of high quality. In practice, marine biologists still prefer to use their own decision tree, which they run manually.

In order to develop robust algorithms for recognizing manta spot patterns, a research student (Olga, one of the tutors of this unit) and I have considered the problem of recognizing artificially generated patterns subjected to projective transformations that simulate changes in the camera viewpoint.
Artificial data allowed us to experiment with a large number of patterns and compare different network architectures to select the most suitable for learning geometric equivalence. Our experiments have demonstrated that Siamese convolutional neural networks (defined in the next section) are able to discriminate between patterns subjected to large homographic transformations. Promising results have also been obtained with real images of manta rays.

Training such a complex neural network requires access to good computing facilities. This is why, in this assignment, you will work with a simpler dataset, namely the MNIST dataset. You will build a classifier to predict whether two images correspond to the same digit or not. Note that this is not the same problem as building a digit classifier!

Learning equivalence classes
A Siamese network consists of two identical subnetworks that share the same weights, followed by a distance calculation layer. The input of a Siamese network is a pair of images Pi and Pj. If the two images are deemed to be from the same equivalence class, the pair is called a positive pair, whereas a pair of images from different equivalence classes is called a negative pair. The input images Pi and Pj are fed to the twin subnetworks to produce two vector representations f(Pi) and f(Pj) that are used to calculate a proxy distance. The training of a Siamese network is done with a collection of positive and negative pairs. Learning is performed by optimizing a contrastive loss function (see details later). The aim of the training algorithm is to minimize the distance between a pair of images from the same equivalence class while maximizing the distance between a pair of images from different equivalence classes.

With a conventional neural network classifier, we need to know in advance the number of classes. In practice, this is not always possible. Consider a face recognition system.
If its use is limited to a fixed group of, let us say, 10,000 people, you can train a convolutional neural network with 10,000 outputs where each output corresponds to a person. But this approach is not suitable when the group of people is more fluid. We want to be able to use the same network without having to retrain it when new people arrive. What is needed is a neural network that can take two input images and predict whether these are images of the same person. This neural network learns to compare pairs of faces in general, and does not learn about specific people. This is a significant difference. You do not have to retrain the neural network if your population changes.

Siamese network architecture
A Siamese network consists of two identical subnetworks that share the same weights, followed by a distance calculation layer. The input of a Siamese network is a pair of images (Pi, Pj) and a label yij. If the two images are deemed to be from the same equivalence class, the pair is called a positive pair, and the target value is yij = 0. For a pair of images from different equivalence classes, the pair is called a negative pair, and the target value is yij = 1. The target value yij can be interpreted as the desired distance between the embedding vectors. The input images (Pi, Pj) are fed to the twin subnetworks to produce two vector representations f(Pi), f(Pj) that are used to calculate a proxy distance. The training of a Siamese network is done on a collection of positive and negative pairs. Learning is performed by optimizing a contrastive loss function in which D(Pi, Pj) is the Euclidean distance between f(Pi) and f(Pj), and the margin m > 0 determines how far the embeddings of a negative pair should be pushed apart.

Required tasks and experiments
• Load the MNIST dataset (use keras.datasets.mnist.load_data).
• Split the dataset such that
  ◦ the digits in [2,3,4,5,6,7] are used for training and testing
  ◦ the digits in [0,1,8,9] are only used for testing. None of these digits should be used during training.
• Implement and test the contrastive loss function described earlier in this document.
• Build a Siamese network.
• Train your Siamese network on your training set. Plot the training and validation error vs time.
• Evaluate the generalization capability of your network by
  ◦ testing it with pairs from [2,3,4,5,6,7] x [2,3,4,5,6,7]
  ◦ testing it with pairs from [2,3,4,5,6,7] x [0,1,8,9]
  ◦ testing it with pairs from [0,1,8,9] x [0,1,8,9]
• Present your results in tables and figures.

Implementation hints
• You need the functional model of Keras to implement the Siamese network. Refer to https://keras.io/getting-started/functional-api-guide/ for examples.
• For the shared network of the Siamese network, you can adopt the CNN architecture used in the Week 07 prac solution.
• There are plenty of examples of loss functions at https://github.com/keras-team/keras/blob/master/keras/losses.py
• Keep 80% of the [2,3,4,5,6,7] digits for training (and 20% for testing).
• Keep 100% of the [0,1,8,9] digits for testing.

Submission
You should submit via Blackboard
• A report in PDF format, strictly limited to 8 pages in total, in which you present your experimental results using tables and figures. Only one person per group has to submit the assignment. Make sure you list the members of your group in the report and in the code. The feedback will be given to the person submitting the assignment and is expected to be shared with the other team members.
• Your Python file my_submission.py containing all your code and instructions on how to repeat your experiments.
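
The exact contrastive loss formula is not reproduced in this copy of the brief. As a minimal sketch, the code below assumes the commonly used form L = (1 - y) * D^2 / 2 + y * max(0, m - D)^2 / 2, which is consistent with the label convention stated earlier (y = 0 for a positive pair, y = 1 for a negative pair, margin m > 0). It also shows one possible way to label pairs drawn from the two digit groups. The names `euclidean_distance`, `contrastive_loss` and `make_pairs` are hypothetical, and the code is plain Python rather than Keras, so it is a reference for checking values by hand, not the required implementation.

```python
import math
import random

TRAIN_DIGITS = (2, 3, 4, 5, 6, 7)   # used for training and testing
UNSEEN_DIGITS = (0, 1, 8, 9)        # used for testing only


def euclidean_distance(u, v):
    """Euclidean distance D between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(y, d, margin=1.0):
    """Assumed contrastive loss for a single pair (not taken from the brief).

    y = 0 (positive pair): the loss grows with the distance d.
    y = 1 (negative pair): the loss is non-zero only while d < margin.
    """
    positive_term = (1 - y) * 0.5 * d ** 2
    negative_term = y * 0.5 * max(0.0, margin - d) ** 2
    return positive_term + negative_term


def make_pairs(labels, digits_a, digits_b, n_pairs, seed=0):
    """Sample n_pairs index pairs (i, j, y).

    Image i has a label in digits_a, image j has a label in digits_b.
    y is 0 when both images show the same digit, 1 otherwise.
    Sampling is uniform, so most pairs are negative and an image can
    be paired with itself; balancing the pairs is left out of this sketch.
    """
    rng = random.Random(seed)
    idx_a = [i for i, lab in enumerate(labels) if lab in digits_a]
    idx_b = [i for i, lab in enumerate(labels) if lab in digits_b]
    pairs = []
    for _ in range(n_pairs):
        i = rng.choice(idx_a)
        j = rng.choice(idx_b)
        pairs.append((i, j, 0 if labels[i] == labels[j] else 1))
    return pairs
```

Under these assumptions, a positive pair at distance 2 costs 2.0, while a negative pair at distance 2 with margin 1 costs nothing because it is already pushed past the margin. A Keras version of the loss could be checked against `contrastive_loss` on a handful of such hand-computed cases.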
Marking criteria
• Report: 10 marks
  ◦ Structure (sections, page numbers), grammar, no typos.
  ◦ Clarity of explanations.
  ◦ Figures and tables (use for explanations and to report performance).

Levels of achievement (10 marks, 7 marks, 5 marks, 3 marks, 1 mark)
• 10 marks: Report written at the highest professional standard with respect to spelling, grammar, formatting, structure, and language terminology.
• 7 marks: Report is very well written and understandable throughout, with only a few insignificant presentation errors. Methodology and experiments are clearly presented.
• 5 marks: The report is generally well written and understandable but with a few small presentation errors that make one or two points unclear
