Seeing Skin with Mathematica
May 9, 2013 — Matthias Odisio, Mathematica Algorithm R&D
Detecting skin in images can be quite useful: it is one of the primary steps for various sophisticated systems aimed at detecting people, recognizing gestures, detecting faces, content-based filtering, and more. In spite of this host of applications, when I decided to develop a skin detector, my main motivation lay elsewhere. The research and development department I work in at Wolfram Research just underwent a gentle reorganization. With my colleagues who work on probability and statistics becoming closer neighbors, I felt like developing a small application that would make use of both Mathematica‘s image processing and statistics features; skin detection just came to my mind.
Skin tones and appearances vary, and so do flavors of skin detectors. The detector I wanted to develop is based on probabilistic models of pixel colors. For each pixel of an image given as input, the skin detector provides a probability that the pixel color belongs to a skin region.
Let’s first look at our detection task wearing probability goggles. We would like to estimate the probability that a pixel belongs to skin given its color. Using Bayes’s rule, this can be expressed as (we’ll call this equation 1):
Please note that in this blog post, probabilities are denoted with an uppercase P[.].
The three terms on the right-hand side of this equation can be expressed in computable forms. That’s where the image processing goggles and these two training datasets will come in handy: one that consists of pixels that belong to skin regions, and one that consists of pixels that do not. We will train a statistical model and derive a computable formula for P[color|skin]. Estimating a priori P[skin] can be arbitrary in the sense that it depends on the final application that the skin detector is built for; we will make a simple choice and estimate the a priori probability of skin as the proportion of skin pixels from the two training datasets. The third term, P[color], is at first more problematic because robustly modeling the probabilities of every possible color requires an immense training dataset. Thankfully, the law of total probability allows us to work around this problem by decomposing this term as:
Finally, our probabilistic skin detector will be an implementation of this formula (we’ll call this equation 2):
Let’s now put on image processing goggles, create the two required datasets, and train our probabilistic models from it.
I know firsthand that creating datasets of images is a fine art that requires a lot of dedicated effort. Frankly, this time we won’t try so hard, gathering a mere dozen images containing skin regions instead. It’s likely about a few hundred images short of being able to derive statistically meaningful results with complex skin models!
Using Mathematica‘s Graphics menu, I can manually select regions with skin in an image and repeat the process for the images of the datasets—admittedly not the best part of my day:
There have been reports that the standard RGB color space is not the best color space to implement models of skin and non-skin colors. Most of the difficulties arise from how different skin can look like depending, for example, on complexion or changes in illuminating conditions.
As you can see, the skin and non-skin colors (displayed in red and green, respectively) in the datasets do not overlap much, but they do span a large portion of the RGB 3D space:
Let’s experiment with another color space where such changes in skin appearance are supposedly modeled more robustly.
We can transfer to the CIE XYZ color space and then decompose a skin color into two chromaticity coordinates and one luminance coordinate. To build models more robust to changes in illuminating conditions, we only keep the two chromaticity coordinates, x and y, defined as x = X / (X + Y + Z) and y = Y / (X + Y + Z):
This function colorConvertxy gives a two-channel xy image:
The extraction of the list of skin xy pairs is performed seamlessly using the image, the mask of skin regions, and the functions PixelValuePositions and PixelValue. Similarly, one can extract the list of non-skin xy pairs.
In the 2D xy chromaticity space, the skin coordinates for all the training images are not as widespread as in the RGB color space, and the distinction between skin and non-skin is more apparent. In the following representation of the skin and non-skin color distributions, skin colors are displayed in shades of red and non-skin regions in shades of green:
Working in the 2D xy chromaticity plane looks promising. Let’s move ahead and implement data-based statistical models for equation 2. For reading convenience, here is the formula again:
The proportion of skin pixels is about 13% in the million-pixel training datasets. We’ll keep this empirical value in our model for the a priori probability P[skin]:
To model the probability density functions of P[color|skin] and P[color|nonskin], a mixture of Gaussian distributions come to mind. These distributions could be suited to the xy data represented above. However, on my laptop computations are a bit snappier when selecting a model based on smooth kernel distributions instead:
Finally, the probability that a given xy color corresponds to skin is implemented as the following skin-ness function:
How does this implemented model perform? Let’s apply the function to a few test images:
Not bad, is it? With this other test image below, it is interesting that the blurred region at the boundary between the red T-shirt and the green foliage has been incorrectly given a very high skin-ness.
We’re nearly done. Actually detecting skin requires deciding whether a pixel belongs to skin. To create such a binary image, we could threshold the probabilistic images we just obtained. How should the threshold be chosen? If the threshold is too high, actual skin pixels may not be detected. If the threshold is too low, non-skin pixels may be incorrectly detected. This is moving on to receiver operating characteristics (ROC) graphs and analysis, and while it would be a pleasant recreation with Mathematica, I feel it deserves more attention. Another blog post, another time.
An alternative strategy is to compare the probabilities of a pixel being skin or non-skin. One last time, let’s wear the probability goggles and revisit equation 1: P[skin|color] == P[color|skin]*P[skin]/P[color]. Similarly, we can express the a posteriori probability of not belonging to a skin region:
To classify a color as skin or as non-skin, we just need to find the greater of the two a posteriori probabilities. The term P[color] can be eliminated, and thereby a pixel will be detected as skin if: P[color|skin]*P[skin] > P[color|nonskin]*P[nonskin].
Here comes today’s skin detector:
A quick assessment on test images suggests it can perform quite well:
Let’s leave this topic for now, and wrap it up with a skin detector app:
My initial intention was to build a skin detector for the mere pleasure of exploring the interplay between probability, statistics, and image processing. Pretty much every step that leads to our final skin detector (e.g. training dataset, choice of statistical distributions, classifier, quantitative assessment) can be studied further and improved rather easily using the large breadth of Mathematica‘s features.