Removing Haze from a Color Photo Image Using the Near Infrared with the Wolfram Language
For most of us, taking bad pictures is incredibly easy. Band-Aid or remedy, digital post-processing can often rescue such shots without our having to alter the photographed scene itself. Say you’re trekking through the mountains taking photos of the horizon, or you’re walking down the street and catch a beautiful perspective of the city, or it’s finally the right time to put the new, expensive phone camera to good use and capture the magic of this riverside… Just why do all the pictures look so bad? They’re all foggy! It’s not that you’re a bad photographer—OK, maybe you are—but that you’ve stumbled on a characteristic problem in outdoor photography: haze.
What is haze? Technically, haze is scattered light, photons bumped around by the molecules in the air and deprived of their original color, which they got by bouncing off the objects you are trying to see. The problem gets worse with distance: the more the light has to travel, the more it gets scattered around, and the more the scene takes that foggy appearance.
What can we do? What can possibly help our poor photographer? Science, of course.
Wolfram recently attended and sponsored the 2014 IEEE International Conference on Image Processing (ICIP), which ended October 30 in Paris. It was a good occasion to review the previous year’s best papers from the conference, and we noticed an interesting take on the haze problem proposed by Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, and Sabine Süsstrunk [1]. Let’s give their method a try and implement their “dehazing” algorithm.
The core idea behind the paper is to leverage the fact that light’s susceptibility to scattering depends on its wavelength. Light with a longer wavelength, such as red light, is more likely to travel around the dust, the smog, and all the other particles present in the air than shorter-wavelength colors, like green or blue. Therefore, the red channel of an image carries better information about the non-hazy content of the scene.
But what if we could go even further? What prevents us from using the part of the spectrum slightly beyond the visible light? Nothing really—save for the fact we need an infrared camera.
Provided we are well equipped, we can then use the four channels of data (near infrared, red, green, and blue) to estimate the haze color and distribution and proceed to remove it from our image.
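For instance, assuming we have a registered RGB/NIR image pair on disk (the file names below are hypothetical placeholders), the data could be loaded like this:

```wolfram
(* hypothetical file names for a registered RGB/NIR pair of the same scene *)
rgb = Import["scene_rgb.png"];
nir = ColorConvert[Import["scene_nir.png"], "Grayscale"];
{r, g, b} = ColorSeparate[rgb];  (* the three visible channels *)
```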
In order to get some sensible assessments, we need a sound model of how an image is formed. In a general haze model, the content of each pixel is composed of two parts:
- The light reflected by the objects in the scene (which will be called J)
- The light scattered by the sky (A)
It is a good approximation to say that the “color of the air” A is constant for a specific place and time, while the “real color” J is different for each pixel. Depending on the amount of air the light had to travel through, a fraction (t) of the real color is transmitted to the camera, and the remaining portion (1-t) is replaced by scattered light.
We can summarize these concepts in a single haze equation:

I_RGB = J t + A (1 − t)    (1)
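To build intuition, here is a toy sketch of equation (1) in the Wolfram Language: it synthesizes haze of a given air-light color over a clean image, assuming for simplicity a single global transmission t (in a real scene, t varies per pixel):

```wolfram
(* toy illustration of equation (1); a is the air-light color, t the global transmission *)
addHaze[j_Image, a : {_, _, _}, t_?NumericQ] := ImageApply[t # + (1 - t) a &, j]

(* example: a bluish-gray haze with half of the light transmitted *)
(* addHaze[img, {0.8, 0.85, 0.9}, 0.5] *)
```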
We need to determine J, t, and A. Let’s first estimate the global air-light color A. For a moment we will assume that portions of the image are extremely hazed (no transmission, i.e. t = 0). Then we can estimate the color A simply from the pixel values of those extremely hazed regions.
On the image below, a mouse click on one of these extremely hazed regions directly yields a pixel value for A.
However, our assumption that the transmission is zero in the haziest regions clearly does not hold, since we can always distinguish distant objects through the haze. Moreover, for images where the haze is never intense, picking A with a click of the mouse is not possible, and we have to resort to some image processing to produce a solid estimate for images with any amount of haze.
Let’s say first that it proved difficult to obtain good dehazing results on our example images when reproducing the ICIP paper’s method for estimating the air-light color. As an alternative, we estimate the air-light color using the concept of the dark channel [2].
The so-called dark channel prior is based on the observation that among natural images, it is almost always the case that within the vicinity of each pixel, one of the three channels (red, green, or blue) is much darker than the others, mainly because of the presence of shadows, dark surfaces, and colorful objects.
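As a sketch, the dark channel can be computed as a per-pixel minimum across the three channels followed by a local minimum filter (the window radius of 7 is our assumption):

```wolfram
(* dark channel sketch after [2]: channelwise minimum, then a local minimum filter *)
darkChannel[img_Image, r_Integer: 7] := MinFilter[ImageApply[Min, img], r]
```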
If at least one channel of every pixel must be naturally dark, then wherever this condition does not hold, we can attribute the brightness to scattered light—that is, to the haze we are looking for. So we look for a good estimate of A among the brightest pixels of the image (maximum haze or illumination) within the regions where the dark channel has high values (heaviest haze).
We extract the positions of the brightest pixels in the dark channel images, extract the corresponding pixel values in the hazed image, and finally cluster these pixel values:
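Here is a possible sketch of these three steps; the 99.9% brightness quantile used as a threshold is our own assumption:

```wolfram
(* keep the brightest dark-channel pixels, read off their colors, and cluster them *)
dark = darkChannel[rgb];
threshold = Quantile[Flatten[ImageData[dark]], 0.999];
positions = PixelValuePositions[Binarize[dark, threshold], 1];
colors = PixelValue[rgb, positions];
clusters = FindClusters[colors];
```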
The selected pixels marked in red below will be clustered; here they all belong to a single region, but it may not be the case on other images:
We are looking for the cluster with the highest average luminance:
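A minimal way to pick it, using the usual Rec. 601 luminance weights:

```wolfram
(* average the cluster whose mean color has the highest luminance *)
luminance[{r_, g_, b_}] := 0.299 r + 0.587 g + 0.114 b;
airLight = Mean[First[MaximalBy[clusters, luminance[Mean[#]] &]]]
```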
This is our estimate of the air-light color:
Looking once more at equation (1), we’ve made some progress: we are now left with computing only the transmission t and the haze-free pixel value J for each pixel.
Since we choose an optimization approach to solve this problem, we first compute coarse estimates, t0 and J0, that will serve as initial conditions for our optimization system.
On to finding a coarse estimate for the transmission t0. Here’s the trick and an assumption: if we assume the transmission does not change much within a small region of the image (which we call Ω), we can treat t0 as locally constant. Dividing both sides of equation (1) by A and applying the local minimum operator min both over the color channels k and over the pixels y in each region Ω yields:

min_{y ∈ Ω} min_k (I_RGB^k(y) / A^k) = t0 · min_{y ∈ Ω} min_k (J^k(y) / A^k) + (1 − t0)

But min_{y ∈ Ω} min_k (J^k(y) / A^k) is exactly the definition of the dark channel of the haze-free image J and, since A^k is a positive number, this term of the equation is practically zero everywhere, given our prior assumption that natural images have at least one almost-zero channel among the pixels of any region. Using this simplification yields:

t0 = 1 − min_{y ∈ Ω} min_k (I_RGB^k(y) / A^k)
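In code, a sketch reusing darkChannel and airLight from above (converting to a real-valued image so that no clipping occurs):

```wolfram
(* coarse transmission estimate: t0 = 1 - darkChannel(I/A) *)
normalized = ImageApply[#/airLight &, Image[rgb, "Real"]];
t0 = ImageSubtract[ConstantImage[1, ImageDimensions[rgb]], darkChannel[normalized]]
```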
This is the t0 image. The darker the image area, the hazier it is assumed to be:
Now the real transmission map cannot be that “blocky.” We’ll take care of that in a second. In the ICIP 2013 paper, there is another clever process to make sure we keep a small amount of haze so that the dehazed image still looks natural. This step involves information from the near-infrared image; we describe this step in the companion notebook that you can download at the bottom of this post. Here is an updated transmission map estimate after this step:
To further refine this estimate by removing these unwanted block artifacts, we apply a technique named guided filtering. It is beyond the scope of this blog post to describe the details of a guided filter. Let’s just say that here, the guided filtering of the transmission map t0, using the original hazed image as a guide, allows us, by jointly processing both images, to realign the gradient of t0 with the gradient of the hazed image—a desired property that was lost due to the blocking artifacts. The function ImageGuidedFilter is defined in the companion notebook at the end of this post.
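For the curious, here is a minimal single-channel guided filter sketch following He et al.’s formulation; it is not necessarily identical to the blog’s ImageGuidedFilter:

```wolfram
(* guided filter sketch: p is the image to filter, g a grayscale guide,
   r the window radius, eps the regularization strength *)
guidedFilter[p0_Image, g0_Image, r_Integer, eps_?NumericQ] :=
 Module[{p = Image[p0, "Real"], g = Image[g0, "Real"],
   mg, mp, varg, covgp, a, b},
  mg = MeanFilter[g, r];
  mp = MeanFilter[p, r];
  varg = ImageSubtract[MeanFilter[ImageMultiply[g, g], r], ImageMultiply[mg, mg]];
  covgp = ImageSubtract[MeanFilter[ImageMultiply[g, p], r], ImageMultiply[mg, mp]];
  a = ImageApply[#1/(#2 + eps) &, {covgp, varg}];  (* local linear coefficients *)
  b = ImageSubtract[mp, ImageMultiply[a, mg]];
  ImageAdd[ImageMultiply[MeanFilter[a, r], g], MeanFilter[b, r]]]
```

The transmission map could then be refined with something like t0 = guidedFilter[t0, ColorConvert[rgb, "Grayscale"], 20, 0.001], where the radius and regularization values are assumptions to be tuned.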
As too much dehazing would not look realistic, and too little dehazing would look too, well, hazed, we adjust the transmission map t0 by stretching it to run from 0.1 to 0.95:
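One way to perform this stretching:

```wolfram
(* rescale the transmission map onto the working range [0.1, 0.95] *)
t0 = ImageApply[0.1 + 0.85 # &, ImageAdjust[t0]]
```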
Thanks to our estimates for the air-light color A and the transmission map t0, another manipulation of equation (1) gives us the estimate for the dehazed image J0:

J0 = (I_RGB − A) / t0 + A
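A pixelwise sketch of this inversion:

```wolfram
(* invert the haze model: J0 = (I - A)/t0 + A; t0 >= 0.1 keeps the division
   well behaved, and ImageClip restores the valid [0, 1] range *)
aImg = ConstantImage[airLight, ImageDimensions[rgb]];
j0 = ImageClip[
  ImageAdd[ImageDivide[ImageSubtract[Image[rgb, "Real"], aImg], t0], aImg]]
```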
You can compare with the original image just by positioning your mouse on top of the graphic:
It’s a good start, but a flat subtraction may be too harsh for certain areas of the image or introduce some undesirable artifacts. In this last part, we will use some optimization techniques to try to bridge this gap and enlist the help of the infrared image to keep a higher level of detail even in the most hazed regions.
The key is in the always useful Bayes’ rule for inference. The question we are asking ourselves here is: which pair of t and J is the most likely to produce the observed images I_RGB (the color image) and I_NIR (the near-infrared image)?
In the language of probability, we want to maximize the posterior distribution:

P(t, J | I_RGB, I_NIR)

Using Bayes’ theorem, we rewrite it, up to a normalization constant, as:

P(t, J | I_RGB, I_NIR) ∝ P(I_RGB, I_NIR | t, J) P(t, J)

And we simplify it by assuming that the transmission map t and the reflectance map J are uncorrelated, so their joint probability is simply the product of their individual ones, and that the two observations are conditionally independent given t and J:

P(t, J | I_RGB, I_NIR) ∝ P(I_RGB | t, J) P(I_NIR | t, J) P(t) P(J)
In order to write this in a form that can be optimized, we now assume that each probability term is distributed according to:

P(x) ∝ e^(−‖x − x̂‖)

That is, it peaks in correspondence with the “best candidate” x̂. This allows us to exploit one of the properties of the exponential function—e^(−a) e^(−b) e^(−c) ⋯ = e^(−(a + b + c + ⋯))—and, provided that the addends in the exponent are all positive, to move from the maximization of a probability to the minimization of a sum of penalty terms.
We are now left with the task of finding the “best candidate” for each term, so let’s dig a bit into their individual meaning guided by our knowledge of the problem.
The first term is the probability of observing a given RGB image for specific t and J. As we are working within the framework of equation (1)—the haze model I_RGB = J t + A(1 − t)—the natural choice is to pick:

‖I_RGB − (J t + A(1 − t))‖
The second term relates the color image to the infrared image. We want to leverage the infrared channel for details about the underlying structure, because small variations are less susceptible to being hidden by haze in the infrared image. We do this by establishing a relationship between the gradients (the 2D derivatives) of the infrared image and the reconstructed image:
‖∇J − ∇I_NIR‖
This relation should take into account the distance between the scene element and the camera, mattering more at greater distances. Therefore, we weight it by a coefficient inversely related to the transmission: the lower the transmission, the more we rely on the infrared gradients.
The last two terms are the prior probabilities of the transmission and reflectance maps. They correspond to what we expect the most likely maps to look like before any observation. Since we don’t have much information in this regard, a safe bet is to favor smooth maps, that is, to expect their gradients to be close to zero everywhere, so the corresponding terms are simply:
‖∇t‖

And:

‖∇J‖
Putting all these terms together brings us to the final minimization problem, written schematically as:

argmin_{t, J} ‖I_RGB − (J t + A(1 − t))‖ + λ1 ‖∇J − ∇I_NIR‖ + λ2 ‖∇t‖ + λ3 ‖∇J‖    (6)

where the regularization coefficients λ1, λ2, λ3, the transmission-dependent weight on the infrared term, and the exponents α and β applied to the norms are taken from the ICIP paper.
To solve this problem, we can start from the initial conditions t0 and J0, move them around a bit, and see if we are doing better. If that is the case, we use the new images (let’s call them t1 and J1) for a second step and calculate t2 and J2. After many iterations, when the new images are barely better than those of the previous step, we stop and extract the final result.
This new image J tends to be slightly darker than the original one; in the paper, a technique called tone mapping is applied to correct for this effect, where the channel values are rescaled in a nonlinear fashion to adjust the illumination:
V′ = V^γ
During our experiments, we found instead that we were better off applying the tone mapping first, as it helped during the optimization.
To help us find the correct value for the exponent γ, we can look at the difference between the low-haze—that is, high-transmission—parts of the original image I_RGB and the reflectance map J0. We want to find the value of γ that makes this difference as small as possible and adjust J0 accordingly:
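Here is a possible sketch of this fit, where the transmission threshold of 0.8 that defines “low haze” is an assumption:

```wolfram
(* fit gamma so that the gamma-corrected J0 matches the original
   in the high-transmission (low-haze) areas *)
mask = ImageData[Binarize[t0, 0.8]];
loss[gamma_?NumericQ] :=
  Total[(mask (ImageData[j0]^gamma - ImageData[rgb]))^2, 3];
gammaBest = gamma /. Last[FindMinimum[loss[gamma], {gamma, 1}]];
j0 = ImageApply[#^gammaBest &, j0];
```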
We now implement a simplified version of the steepest descent algorithm to solve the optimization problem of equation (6). The function IterativeSolver is defined in the companion notebook at the end of this post.
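For illustration, here is a generic steepest-descent skeleton; it is a stand-in for, not a copy of, the actual IterativeSolver:

```wolfram
(* repeat x -> x - step*grad[x] until the gradient is small or maxIter is reached *)
steepestDescent[grad_, x0_, step_ : 0.1, tol_ : 10^-6, maxIter_ : 100] :=
 NestWhile[# - step grad[#] &, x0, Norm[grad[#]] > tol &, 1, maxIter]
```

For example, steepestDescent[2 # &, 5.] walks the function x² down to nearly zero; the solver of the companion notebook applies the same recipe to the discretized energy of equation (6).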
When that optimization is done, our final best guess for the amount of haze in the image is:
And finally, you can see the unhazed result below. To compare it with the original, hazy image, just position your mouse on top of the graphics:
We encourage you to download the companion CDF notebook to dig deeper into dehazing experiments.
Let’s now leave the peaceful mountains and the award-winning dehazing method from ICIP 2013 and move to Paris, where ICIP 2014 just took place. Wolfram colleagues staffing our booth at the conference confirmed that dehazing (and air pollution) is still an active research topic. Such conferences have proven to be excellent opportunities to demonstrate how the Wolfram Language and Mathematica 10 can facilitate research in image processing, from investigation and prototyping to deployment. And we love to interact with experts so we can continue to develop the Wolfram Language in the right direction.
Download this post as a Computable Document Format (CDF) file.
References:
[1] C. Feng, S. Zhuo, X. Zhang, L. Shen, and S. Süsstrunk. “Near-Infrared Guided Color Image Dehazing,” IEEE International Conference on Image Processing, Melbourne, Australia, September 2013 (ICIP 2013).
[2] K. He, J. Sun, X. Tang. “Single Image Haze Removal Using Dark Channel Prior,” IEEE Conference on Computer Vision and Pattern Recognition, Miami, Florida, June 2009 (CVPR’09).
Images taken from:
[3] L. Schaul, C. Fredembach, and S. Süsstrunk. “Color Image Dehazing Using the Near-Infrared,” IEEE International Conference on Image Processing, Cairo, Egypt, November 2009 (ICIP’09).