Detecting Kinship from a Pair of Images
November 21, 2011 — Matthias Odisio, Mathematica Algorithm R&D
We just attended the International Conference on Image Processing (ICIP) in Brussels and the 14th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in Toronto. Both events proved to be great occasions to demonstrate how Mathematica can aid research in image processing, from investigation and prototyping to deployment. Last but not least, we seized the opportunity to listen to the experts in the field to extend the functionality of Mathematica in the right direction.
Last year, Ruogu Fang, Kevin D. Tang, Noah Snavely, and Tsuhan Chen received the best paper award at ICIP for their study of kinship detection from pairs of images. I decided to build a detector by following their proposed framework with Mathematica.
A lot has been said and studied about our human ability to recognize faces; computer programs can be quite good at it, too. Similarly, when looking at faces, we may display a canny ability to detect an ancestry relationship, or kinship.
In these pictures, is the person on the left the father of the person on the right? Yes, and we can figure it out with Mathematica.
I developed a Mathematica kinship detector according to the following methodology, which comprises three steps:
- Collect a set of pairs of parent and child images. Each image is roughly cropped to a nearly frontal view of the face.
- Extract facial features from each image. The features need to be potential good discriminants.
- Build a classifier from the set of features vector. The classifier accepts as input differences between parents’ and children’s features. The construction process yields both a list of the most discriminative features as well as the final trained classifier.
On with the first step. The authors have made their dataset available as a ZIP file. I grabbed it and unzipped all the pictures on my machine. Importing the pictures in Mathematica is straightforward.
These are low resolution images; from what I can see, the dataset comprises a lot of celebrities—not that it would matter for a computer, but it biases our own perception. I quickly designed this interface to visualize the dataset (nearly all the images have dimensions of 100 by 100):
Good sign: I could start discerning some kinship patterns myself. Moreover, there is a lot of variability in the images, which is due to factors not related to kinship, such as lighting, scale of the face, compression artifacts, acquisition noise, and lifespan moment of the acquisition. Given this variability and the size of the dataset (it only consists of about 150 pairs), it may be difficult to build a robust classifier. Poses and expressions vary too, but arguably they could be an indicator of kinship.
We need now to represent each image through a set of facial features. Let’s consider various shape and appearance descriptors as our prospective image features. We make use of facial landmarks that we can also find on the images. The authors used a semi-automatic, semi-manual method for the labeling. I figured that with less than 300 images, it was likely faster and more accurate, but also more tedious, to go all manual—expert labeling, as it’s called.
Thanks to the complete integration of the Image object in Mathematica, all I had to do was display in sequence a magnified version of each image, click the facial landmarks of interest, and then export the coordinates with sub-pixel precision to a text file so that I could easily recover the data later. Below is how it looks. Note that the two points in the hair define a rectangle from which the hair color can be extracted.
From the coordinates of the landmarks, we can compute a variety of properties describing the face. We can try many measurements and let the module in charge of building the classifier figure out which ones are relevant, that is, discriminative for kinship detection. Surely there could be other data-based approaches relying more on knowledge and less on information, but these days that’s the direction the pendulum swings.
Our model of a face includes descriptors of the appearance (color of the eyes, skin, hair, etc.), of the shape (normalized length of the nose, eye to mouth normalized distance, etc.), and of the overall texture expressed as the histogram of the gradient magnitude, which indicates how smooth or rough an image appears.
It was easy to implement in Mathematica, and even the required bookkeeping could be resolved elegantly.
For example, the hair color is extracted as follows:
I exported the 26 computed features in text files; we shall just import them in the notebook. The features are arranged according to the following list:
That’s it. Each image can be modeled by a vector of features. Finally we can start constructing the classifier. First we shall assess individually each feature with respect to its intrinsic performance at detecting a kinship relation. To follow the original authors, I went with a classifier called k-nearest neighbors (k-NN). As its name suggests, the k-NN classifier functions in two steps, first by finding the k training samples that are the closest to the query sample, and then by returning the class that is the most common class among the neighbors. For example, with k set to 5, if the 5 training samples that are the closest to the query sample consist of 4 positive kinship samples and 1 negative kinship sample, the query sample will be detected as a positive kinship sample.
It is simple to design a k-NN classifier in Mathematica. The function Nearest takes as input training samples and the classes they belong to and returns a function that can be used to give the nearest neighbors’ classes of a particular query. In our application, the training samples that correspond to a kinship pair belong to the class 1, and the other training samples belong to the class -1. Using this convention, classifying a test sample boils down to testing whether the sum of all the neighbors’ classes is positive. The function kNNFunction takes as its argument two sets of data, one set of positive instances that constitute the class 1, and one set of negative instances that constitute the class -1:
To understand how k-NN classifiers work, let’s look at a toy example where we classify points of the 2D plane. We consider two classes of points, which correspond to two different distributions. The functions randomsamples1 and randomsamples2 create n 2D points drawn from normal distributions.
Let’s set k to 5 and study the query point (0, 0), displayed in green below. Out of its 5 nearest neighbors (highlighted in yellow), 3 are blue. Blue points represent the positive, or 1, class, and red points the negative, or -1 class. Hence it is classified as belonging to the blue class, and the classifier returns True:
As a test of the classifier, let’s draw 10 samples from the class 1 and classify them. A perfect classification would yield only True responses, so-called true positives. Here we get, for the 10 responses, 9 that are true positives and 1 that is false negative:
Similarly, a perfect classification of 10 samples drawn from the class -1 would yield only False responses, so-called true negatives. What we get are 6 true negatives and 4 false positives.
To better assess the classifier’s performance, I have run the same two experiments, but using 100 samples from the class 1 and 100 samples from the class -1. From the 100 samples from class 1, 75 were correctly detected (true positive) and 25 were not (false negative), while from the 100 samples from class 2, 82 were correctly not detected (true negative) and 18 incorrectly were not (false positive). Overall the accuracy of the classifier is estimated to be just below 80%—the formula is simply
(75 + 82) / (100 + 100).
Before going back to our application, we need to introduce another classic tool to test our classifiers, the 5-fold cross validation. This procedure can be effective in producing a more realistic estimate of a classifier’s accuracy when it is not possible to draw an arbitrary number of samples for testing—precisely the case in our kinship detection study where the dataset is small. In this process, the data samples are randomly partitioned into 5 sets, 1 being used for testing and the other 4 sets being used for training. This is repeated 5 times by rotating the set used for testing, and finally averaging the 5 responses.
It was quick to implement this rudimentary classification package, so let’s go back to kinship detection. How to select the most discriminative features? We are going to first pick the features that alone perform better than chance. The classifier’s input is the difference between the parents’ features and the children’s features. In addition to the positive samples we already have, we also create negative samples by randomly pairing parents and children:
In my experiments, I considered the nearest 11 neighbors. This number happens to yield the best results. This table shows the classification accuracy for the 20 features that perform individually better than chance:
From this pool of feature candidates, the final classifier is built in a greedy fashion. Starting with the best individual feature, the feature yielding the best accuracy when combined with the previously retained features is selected; this process iterates until all the features have been selected.
The agglomerated features together with their combined accuracies are:
Based on this table, the final classifier could be built using the 7 features that, when combined, are best at kinship detection. That would lead to a feature vector of length 18, yielding an estimated accuracy just above 73%. Since the database is rather small, I felt selecting only the top 4 features was a proper trade-off between accuracy and risk of over-fitting. The retained features are: the color of the skin, the color of the mouth, the intensity of the right eye, and the histogram of the gradient magnitude. Even though features related to the facial shape are not retained, they would be necessary, would one want to achieve the best estimated accuracy.
With these 4 features, each image is described by a vector of length 15, for an overall accuracy of 73%. The authors’ classifier comprises 6 features resulting in a vector of length 10 for an accuracy of 71%; my implementation reached 71% accuracy using 3 features resulting in a vector of length 7. We are in the same ballpark, and it is not a surprise that our results differ slightly: we used different landmarks and different parameters for the feature extraction, as well as different trainers and random number generators.
Now is time to try our detector:
This is an example where the detector is incorrect:
This interface allows us to see how a pair of images is detected:
There is much more to explore on this fascinating topic, but that’s beyond the scope of this blog post. How well do we, human beings, perform at this task? Asking people to make perceptual judgments would provide a comparison, but this dataset cannot be used for this purpose, as it contains too many celebrities for whom different cognitive processes are likely to be involved. I leave this endless topic here for now. If you wish to go further with your own ideas, you may want to download the attached ZIP file, which will allow you to reproduce the described study and enrich it.
R. Fang, K. D. Tang, N. Snavely, T. Chen. “Towards Computational Models of Kinship Verification.” In IEEE International Conference on Image Processing, 1577–1580, 2010.