What is haze? Technically, haze is scattered light, photons bumped around by the molecules in the air and deprived of their original color, which they got by bouncing off the objects you are trying to see. The problem gets worse with distance: the more the light has to travel, the more it gets scattered around, and the more the scene takes that foggy appearance.

What can we do? What can possibly help our poor photographer? Science, of course.

Wolfram recently attended and sponsored the 2014 IEEE International Conference on Image Processing (ICIP), which ended October 30 in Paris. It was a good occasion to review the previous years’ best papers at the conference, and we noticed an interesting take on the haze problem proposed by Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, and Sabine Süsstrunk [1]. Let’s give their method a try and implement their “dehazing” algorithm.

The core idea behind the paper is to leverage the different susceptibilities of the light being scattered, which depend on the wavelength of the light. Light with a larger wavelength, such as red light, is more likely to travel around the dust, the smog, and all the other particles present in the air than shorter wavelength colors, like green or blue. Therefore, the red channel in an image carries better information about the non-hazy content of the scene.

But what if we could go even further? What prevents us from using the part of the spectrum slightly beyond the visible light? Nothing really—save for the fact we need an infrared camera.

Provided we are well equipped, we can then use the four channels of data (near infrared, red, green, and blue) to estimate the haze color and distribution and proceed to remove it from our image.

In order to get some sensible assessments, we need a sound model of how an image is formed. In a general haze model, the content of each pixel is composed of two parts:

- The light reflected by the objects in the scene (which will be called **J**)
- The light scattered by the sky (**A**)

It is a good approximation to say that the “color of the air” **A** is constant for a specific place and time, while the “real color” **J** is different for each pixel. Depending on the amount of air the light had to travel through, a fraction (**t**) of the real color is transmitted to the camera, and the remaining portion (1-**t**) is replaced by scattered light.

We can summarize these concepts in a single haze equation:
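The equation itself appeared as an image in the original post; reconstructed from the description above and from the haze model as it is restated later in the text, it reads:

```latex
I(x) \;=\; \underbrace{J(x)\,t(x)}_{\text{transmitted scene light}} \;+\; \underbrace{A\,\bigl(1 - t(x)\bigr)}_{\text{scattered air light}} \tag{1}
```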

We need to determine **J**, **t**, and **A**. Let’s first estimate the global air-light color **A**. For a moment we will assume that portions of the image are extremely hazed (no transmission, i.e. **t** = 0). Then we can estimate the color **A** simply from the pixel values of those extremely hazed regions.

On the image below, a mouse click yields the value of **A**.

However, our assumption that the transmission is zero in the haziest regions clearly does not hold, as we can always distinguish distant objects through the haze. This means that for images where the haze is never intense, we cannot pick **A** with a click of the mouse, and we have to resort to some image processing to produce a solid estimate for images with any amount of haze.

We should say first that reproducing the ICIP paper’s method for estimating the air-light color proved difficult to get good dehazing results with on our example images. As an alternative, we estimate the air-light color using the concept of the dark channel.

The so-called dark channel prior is based on the observation that among natural images, it is almost always the case that within the vicinity of each pixel, one of the three channels (red, green, or blue) is much darker than the others, mainly because of the presence of shadows, dark surfaces, and colorful objects.

If at least one channel near every pixel must be naturally dark, then wherever this condition fails, we can attribute the brightness to scattered light—that is, the hazed region we’re looking for. So we estimate **A** from the brightest pixels of the image (maximum haze or illumination) within the region where the dark channel has its highest values (densest haze).

We extract the positions of the brightest pixels in the dark channel images, extract the corresponding pixel values in the hazed image, and finally cluster these pixel values:
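The post does this in the Wolfram Language (see the companion notebook). As a rough illustration of the idea, here is a NumPy sketch that erodes the per-pixel channel minimum into a dark channel, picks the brightest dark-channel positions, and averages the corresponding hazy-image colors. The real pipeline clusters those pixels rather than averaging them, and the patch size and top fraction below are illustrative choices, not the paper’s values:

```python
import numpy as np

def dark_channel(img, patch=3):
    # Per-pixel minimum over the color channels, then a local minimum
    # ("erosion") over a small patch around each pixel.
    dc = img.min(axis=2)
    pad = np.pad(dc, patch // 2, mode="edge")
    out = np.empty_like(dc)
    for i in range(dc.shape[0]):
        for j in range(dc.shape[1]):
            out[i, j] = pad[i:i + patch, j:j + patch].min()
    return out

def estimate_airlight(img, top_fraction=0.001):
    # Average the hazy-image colors at the brightest dark-channel positions
    # (a stand-in for the clustering step described in the text).
    dc = dark_channel(img)
    n = max(1, int(dc.size * top_fraction))
    idx = np.argsort(dc.reshape(-1))[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)

# Synthetic scene: flat gray with one bright, "hazy" corner patch.
scene = np.full((8, 8, 3), 0.2)
scene[:2, :2] = [0.9, 0.85, 0.8]
A = estimate_airlight(scene)
```

On the synthetic scene, the only pixel that survives the local-minimum erosion lies in the bright corner patch, so the estimate recovers that patch’s color.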

The selected pixels marked in red below will be clustered; here they all belong to a single region, but it may not be the case on other images:

We are looking for the cluster with the highest average luminance:

This is our estimate of the air-light color:

Looking once more at the equation (1), we’ve made some progress, because we are *only* left with computing the transmission **t** and the haze-free pixel value **J** for each pixel:

Since we choose an optimization approach to solve this problem, we first compute coarse estimates, **t0** and **J0**, that will serve as initial conditions for our optimization system.

On to finding a coarse estimate for the transmission, **t0**. Here’s the trick, and an assumption: if the transmission does not change too much within a small region of the image (which we call Ω), we can treat **t0** as locally constant. Dividing both sides of equation (1) by **A** and applying the minimum operator *min* both over the color channels and over the pixels in each region Ω yields:

But this minimum is exactly the definition of the dark channel of the haze-free image **J**, and since each **A**_{k} is a positive number, we infer that this term of the equation is practically zero everywhere, given our prior assumption that natural images have at least one almost-zero channel in the pixels of any region. Using this simplification yields:
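Written out, with the dark-channel term of **J** set to zero, the coarse transmission estimate becomes (this is the standard dark channel prior formula of He et al. [2]; the post’s own equation was an image):

```latex
t_0 \;=\; 1 \;-\; \min_{y \in \Omega} \, \min_{k \in \{R,G,B\}} \frac{I_k(y)}{A_k}
```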

This is the **t0** image. The darker the image area, the hazier it is assumed to be:

Now the real transmission map cannot be that “blocky.” We’ll take care of that in a second. In the ICIP 2013 paper, there is another clever process to make sure we keep a small amount of haze so that the dehazed image still looks natural. This step involves information from the near-infrared image; we describe this step in the companion notebook that you can download at the bottom of this post. Here is an updated transmission map estimate after this step:

To further refine this estimate by removing the unwanted block artifacts, we apply a technique named guided filtering. Describing the details of a guided filter is beyond the scope of this blog post. Let’s just say that here, the guided filtering of the transmission map **t0**, using the original hazed image as a guide, jointly processes the filtered image and the guide image to realign the gradient of **t0** with the gradient of the hazed image, a desired property that was lost to the blocking artifacts. The function `ImageGuidedFilter` is defined in the companion notebook at the end of this post.

As too much dehazing would not look realistic, and too little dehazing would look too, well, hazed, we adjust the transmission map **t0** by stretching it to run from 0.1 to 0.95:
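A minimal sketch of that stretch (the post does this in the Wolfram Language; the 0.1–0.95 range comes from the text):

```python
import numpy as np

def stretch_transmission(t0, lo=0.1, hi=0.95):
    # Linearly rescale the transmission map so its values run from lo to hi.
    t_min, t_max = t0.min(), t0.max()
    return lo + (t0 - t_min) * (hi - lo) / (t_max - t_min)

t = stretch_transmission(np.array([0.0, 0.5, 1.0]))
```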

Thanks to our estimates for the air-light color **A** and the transmission map **t0**, another manipulation of equation (1) gives us the estimate for the dehazed image **J0**:
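Rearranging equation (1) with the estimates **A** and **t0** in hand gives the coarse dehazed image (a reconstruction of the formula shown as an image in the post):

```latex
J_0 \;=\; A \;+\; \frac{I - A}{t_0}
```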

You can compare with the original image just by positioning your mouse on top of the graphic:

It’s a good start, but a flat subtraction may be too harsh for certain areas of the image or introduce undesirable artifacts. In this last part, we will use optimization techniques to close this gap and enlist the help of the infrared image to keep a higher level of detail even in the most hazed regions.

The key is the always useful Bayes’ rule for inference. The question we are asking ourselves here is: which pair of **t** and **J** is the most likely to produce the observed images *I*_{RGB} and *I*_{NIR}?

In the language of probability, we want to calculate the joint distribution of **t** and **J** given the observations, *P*(**t**, **J** | *I*_{RGB}, *I*_{NIR}):

Using Bayes’ theorem, we rewrite it as:

And we simplify it by assuming that the transmission map **t** and the reflectance map **J** are uncorrelated, so that their joint probability is simply the product of their individual ones:

In order to write this in a form that can be optimized, we now assume that each probability term is distributed according to:

That is, each term peaks at its “best candidate” value. This allows us to exploit one of the properties of the exponential function, *e*^{-a}*e*^{-b}*e*^{-c}… = *e*^{-(a+b+c+…)}, to turn the product of probabilities into a sum of terms.
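In other words, maximizing the product of exponential terms is the same as minimizing the sum of their exponents:

```latex
e^{-a}\,e^{-b}\,e^{-c}\cdots \;=\; e^{-(a+b+c+\cdots)}
\qquad\Longrightarrow\qquad
\arg\max_{t,\,J}\; \prod_i e^{-E_i} \;=\; \arg\min_{t,\,J}\; \sum_i E_i
```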

We are now left with the task of finding the “best candidate” for each term, so let’s dig a bit into their individual meaning guided by our knowledge of the problem.

The first term is the probability of observing a given RGB image for specific **t** and **J**. As we are working within the framework of equation (1)—the haze model *I*^{RGB} = *Jt* + *A*(1 – *t*)—the natural choice is to pick:

||*I*_{RGB} – (*Jt* + *A*(1 – *t*))||

The second term relates the color image to the infrared image. We want to leverage the infrared channel for details about the underlying structure, because in the infrared image the small variations are less likely to be hidden by haze. We do this by establishing a relationship between the gradients (the 2D derivatives) of the infrared image and the reconstructed image:

||∇*J* – ∇*I*_{NIR}||

This relation should take into account the distance between the scene element and the camera, mattering more at greater distances. Therefore we multiply it by a coefficient inversely related to the transmission:

The last two terms are the prior probabilities of the transmission and reflectance maps. They correspond to what we expect to be the most likely values for each pixel before any observation. Since we don’t have any information in this regard, a safe bet is to assume them roughly constant, and since we don’t care about which constant, we just ask that their derivatives be close to zero everywhere, so the corresponding terms are simply:

||∇*t*||

And:

||∇*J*||

Putting all these terms together brings us to the final minimization problem:
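Collecting the four norms above, a plausible form of the cost function is the following. The original equation was an image; the exact placement of the exponents α and β here is a guess, and the authoritative version is in the ICIP paper [1]:

```latex
\min_{t,\,J}\;
\bigl\lVert I_{RGB} - \bigl(Jt + A(1-t)\bigr) \bigr\rVert^{\alpha}
\;+\; \lambda_1\,\frac{1}{t}\,\bigl\lVert \nabla J - \nabla I_{NIR} \bigr\rVert
\;+\; \lambda_2\,\lVert \nabla t \rVert
\;+\; \lambda_3\,\lVert \nabla J \rVert^{\beta}
```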

where the regularization coefficients λ_{1,2,3} and the exponents α and β are taken from the ICIP paper.

To solve this problem, we can insert the initial conditions **t0** and **J0**, move around a bit, and see if we are doing better. If so, we use the new images (let’s call them **t1** and **J1**) for a second step and calculate **t2** and **J2**. After many iterations, when the new images are barely better than those of the previous step, we stop and extract the final result.
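The actual solver (`IterativeSolver`, in the companion notebook) alternates updates of **t** and **J** against the dehazing cost. The skeleton of such an iterate-until-converged loop, demonstrated here on a toy quadratic objective rather than the real cost function, looks like:

```python
import numpy as np

def iterate_until_converged(step, x0, tol=1e-6, max_iter=10_000):
    # Apply `step` repeatedly until successive iterates barely change,
    # mirroring the "stop when the new images are not much better" criterion.
    x = x0
    for _ in range(max_iter):
        x_new = step(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy objective f(x) = ||x - target||^2, minimized by plain gradient steps.
target = np.array([1.0, -2.0])
lr = 0.1
solution = iterate_until_converged(lambda x: x - lr * 2 * (x - target),
                                   np.zeros(2))
```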

This new image **J** tends to be slightly darker than the original one; in the paper, a technique called tone mapping is applied to correct for this effect, where the channel values are rescaled in a nonlinear fashion to adjust the illumination:

*V’* = *V ^{γ}*
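A one-line sketch of that tone mapping (γ = 0.8 is an arbitrary illustrative value, not the one used in the post):

```python
import numpy as np

def tone_map(img, gamma=0.8):
    # Nonlinear per-channel rescaling V' = V**gamma; values assumed in [0, 1].
    return np.clip(img, 0.0, 1.0) ** gamma
```

Values below 1 are brightened when γ < 1, which counteracts the darkening introduced by the dehazing step.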

During our experiments, we found instead that we were better off applying the tone mapping first, as it helped during the optimization.

To help us find the correct value for the exponent *γ*, we can look at the difference between the low-haze (that is, high-transmission) parts of the original image *I*_{RGB} and the reflectance map **J0**:

We now implement a simplified version of the steepest descent algorithm to solve the optimization problem of equation (6). The function `IterativeSolver` is defined in the companion notebook at the end of this post.

When that optimization is done, our final best guess for the amount of haze in the image is:

And finally, you can see the unhazed result below. To compare it with the original, hazy image, just position your mouse on top of the graphics:

We encourage you to download the companion CDF notebook to engage deeper in dehazing experiments.

Let’s now leave the peaceful mountains and the award-winning dehazing method from ICIP 2013 and move to Paris, where ICIP 2014 just took place. Wolfram colleagues staffing our booth at the conference confirmed that dehazing (and air pollution) is still an active research topic. Attending such conferences has proven to be an excellent opportunity to demonstrate how the Wolfram Language and *Mathematica* 10 can facilitate research in image processing, from investigation and prototyping to deployment. And we love to interact with experts so we can continue to develop the Wolfram Language in the right direction.

Download this post as a Computable Document Format (CDF) file.

References:

[1] C. Feng, S. Zhuo, X. Zhang, L. Shen, and S. Süsstrunk. “Near-Infrared Guided Color Image Dehazing,” IEEE International Conference on Image Processing, Melbourne, Australia, September 2013 (ICIP 2013).

[2] K. He, J. Sun, and X. Tang. “Single Image Haze Removal Using Dark Channel Prior,” IEEE Conference on Computer Vision and Pattern Recognition, Miami, Florida, June 2009 (CVPR’09).

Images taken from:

[3] L. Schaul, C. Fredembach, and S. Süsstrunk. “Color Image Dehazing Using the Near-Infrared,” IEEE International Conference on Image Processing, Cairo, Egypt, November 2009 (ICIP’09).

Professor of Materials Science and Engineering Emeritus, University of Illinois

**Mark Kotanchek**

CEO, Evolved Analytics LLC

**John Michopoulos**

Head of Computational Multiphysics Systems Laboratory, Naval Research Laboratory

**Rodrigo Murta**

Retail Intelligence Manager, St Marche Supermercados

Professor of Mathematics, Randolph-Macon College

**Yves Papegay**

Research Scientist, French National Institute for Research in Computer Science and Control

**Chad Slaughter**

System Architect, Enova Financial

Earlier this year, European Innovator Award winners were announced at the European Wolfram Technology Conference in Frankfurt, Germany:

Associate Professor, Department of Analysis, University of Szeged

Professor, Institute of Earth and Environmental Sciences, University of Potsdam

Congratulations to all of our 2014 Wolfram Innovator Award winners! Read more about our deserving recipients and their accomplishments.

In Tweet-a-Program’s first few exciting months, we’ve already seen a number of awesome fractal examples like these:

To win, tweet your submissions to @WolframTaP by the end of the week (11:59pm PDT on Sunday, November 23). So that you don’t waste valuable code space, we don’t require a hashtag with your submissions. However, we do want you to share your code with your friends by retweeting your results with hashtag #MandelbrotWL.

We can’t wait to see what you come up with!

In previous years, One-Liner submissions were allowed 140 characters and 2D typesetting constructs. This year, in the spirit of Tweet-a-Program, we limited entries to 128-character, *tweetable* Wolfram Language programs. That’s right: we challenged them to write a useful or entertaining program that fits in a single tweet.

And the participants rose to the occasion. Entries were blind-judged by a panel of Wolfram Research developers, who awarded two honorable mentions and first, second, and third prizes.

One honorable mention went to Michael Sollami for his “Mariner Valley Flyby,” which takes you on a flight through the terrain of the Mariner Valley on Mars. The judges were greatly impressed by the idea and the effect. Unfortunately, an error in the code produces a small glitch visible at the start of the output. Since Michael’s submission is right up against the 128-character limit, it would have taken some clever tweaking to fix it.

An honorable mention also went to Filip Novotný for a program that rolls a “marble” in the direction that you tilt your laptop. Yeah, yeah, we’ve all seen that before; every laptop has an accelerometer these days. But… Filip’s code doesn’t use the accelerometer. Instead, it tracks the view seen by the laptop camera and infers from it the laptop’s tilt. All in 128 characters.

Filip’s entry was also awarded a dishonorable mention for the impressively dense syntax form a@@@b@c@d@e~f~g@h, which kept his entry under the 128-character limit, and kept the judges busy trying to decipher it.

Second place went to Jesse Friedman for his “Spoonerism Generator.” Each time you evaluate his code, you get a different wacky rendering of Poe’s poem *The Raven*, where the first letters of words that begin with consonants are scrambled.

Incredibly, Jesse claimed one of the few 6-letter Internationalized Resource Identifiers in existence just for the competition, by finding an unusual character (Ｒ) that had not yet been snapped up. At 13 years old, Jesse is the youngest prize winner by far in the One-Liner Competition.

First place went to Alex Hirsbrunner for his “Boeing 767 Flight Range,” a One-Liner entry that actually does something useful. His code makes a world map showing how far a Boeing 767 can fly from the conference location. The judges were impressed by his combined use of `SemanticInterpretation`, to get at obscure information (the range of the aircraft), and `GeoGraphics`, to make a nice presentation of it. He even had enough characters to spare to include a title in the graphic, so the whole thing is self-explanatory.

Thanks to all participants for entertaining us with their abundant creativity. If you have thoughts of attending the Wolfram Technology Conference in 2015, get started now honing your Wolfram Language skills for next year’s One-Liner Competition.

You can see all of this year’s One-Liner entries by downloading this notebook.

If you want to follow along, you can download a trial of *SystemModeler*. It’s also available with a student license, or you can buy a home-use license. All hardware used in this blog post can be bought for less than $50.

After downloading the library from the *SystemModeler* Library Store, installing it is as easy as double-clicking the package and accepting the license agreement. With the library, you can connect any Firmata-capable board to *SystemModeler*. This includes all Arduino boards.

I’m using an Arduino Uno board in this blog post. The easiest example I can think of is blinking the LED that’s on the board.

*Arduino Uno board with internal LED highlighted*

To do this, I construct a simple model with a Boolean signal that’s transmitted to the digital pin 13 on my board, where the LED is. The LED will blink with the same duration as the pulse I send in.

*The model in SystemModeler using the ModelPlug components digital pin and the Arduino board*

Next up I can place an LED on a breadboard and connect it to pin 9 on my Arduino:

*Schematic of connecting an external LED to port 9 on an Arduino Uno board*

When I connect a sine function to the analog output on my board, the real-valued signal is converted to the voltage needed at the pin where the LED is connected.

*A sine wave connected to a pin dimming an LED*

I can now see how the LED varies between full light and no light with the sine wave. Without a single line of “Arduino coding” on my part!

If we take a step back and look at the big picture, there are basically four different scenarios we can imagine: one where we connect simulated input to simulated components, one where we connect simulated input to real hardware (as I did in the previous two examples), one where we connect hardware input to simulated components, and a fourth scenario where everything is in hardware.

Below I structured the four scenarios in a grid.

*Different scenarios where the ModelPlug library can be useful*

The first case, where both input and components are modeled, is readily available in an out-of-the-box installation of *SystemModeler*, and for many uses this is all that is needed. That’s why I didn’t highlight it in the grid. When you are connecting hardware input to hardware components, it can be controlled and facilitated from *SystemModeler*, and signals can be filtered and processed in *SystemModeler*. The most interesting scenarios, in my mind, are where some parts are simulated and some parts exist in hardware. Let’s look at some examples where the components are simulated and the input comes from hardware.

Here I set up my Uno board to read the values from pin 14. There I’ve connected a light-dependent resistor that will read light levels.

*Uno board with photoresistor connected to the analog input*

In my model in *SystemModeler*, I now connect this analog signal from pin 14 to another component.

This component will take the values from the light-dependent resistor and compare them to a threshold value. If the value drops below 0.02 and the simulation time is greater than 5, it will terminate the simulation.

*Compare the light value to a threshold and see if the simulation should be terminated*

If I cover the light-dependent resistor with my hand for a short while, it will get dark, the values will drop below 0.02, and the simulation will end.

Here’s an example where both inputs and outputs are hardware: instead of using the signal from the light-sensitive resistor to terminate the simulation, I can filter it with a lowpass filter in *SystemModeler*. I then scale it to turn a servo between -90° and 90°. Whenever I hold my hand over the light-dependent resistor, the servo will turn, and when I release it, the servo will turn back.

*Analog signal connected via software components to a servo*


The previous example shows not only that you can connect “hardware to hardware” via ModelPlug, but also how to realize functionality with components defined by equations. Instead of connecting resistors and capacitors in a circuit to lowpass filter the signal, or writing a program that filters the signal, I use model components to do the filtering. This enables you to prototype very quickly.
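For intuition, the same first-order lowpass behavior that a Modelica filter block provides can be written for a sampled signal as follows (this is a language-neutral sketch, not ModelPlug code; the smoothing constant is an illustrative choice):

```python
def lowpass(samples, alpha=0.1):
    # First-order exponential smoothing: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
    y = samples[0]
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

smoothed = lowpass([0.0, 1.0, 1.0, 1.0])
```

A step in the input ramps up gradually in the output, which is exactly the smoothing we want before driving a servo.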

To take this further, I’m going to use a bigger software model and connect it to analog input values. I’ll use a model of an inverted pendulum on a cart. The complete system, including motor, gear, and 3D components for the pendulum and the cart, is a software model. To this model I’ll connect signals generated by an accelerometer connected to the Arduino. In the model below, the analog signal enters from the left. The first highpass filter removes stationary trends in the signal, and the lowpass filter smooths out the signal.

At the tip of the pendulum, I’ll connect a force component that will convert the analog signal to a force pushing the top of the pendulum.

*Mechanical system with force attached*

Finally I connect an accelerometer to an analog pin.

*Accelerometer connected to inverted pendulum*

Now when I quickly move my accelerometer, a disturbance is generated and the control system for the pendulum will have to try to adapt. Can you move it fast enough to knock the pendulum down?


The ModelPlug library is free and available for download from the *SystemModeler* Library Store. It works on Mac, Windows, and Linux. Try it out and let us know what you think, in the comments or in the *SystemModeler* group on Wolfram Community.

Download the *SystemModeler* Modelica (.mo) file as a compressed (.zip) file.

To guide us through the computational science of pandemics, I have reached out to Dr. Marco Thiel, who was already describing various Ebola models on Wolfram Community (where readers could join the open discussion). We have worked with him to code the global pandemic model below, a task made considerably easier by many of the new features recently added to the Wolfram Language. Marco is an applied mathematician with training in theoretical physics and dynamical systems. His research was featured on *BBC News*, and due to its applied mathematical nature, concerns very diverse subjects, from the stability of our solar system to patterns in the mating behavior of fireflies to forensic mathematics, and much more. Dealing with this diversity of real-world problems, Marco and his colleagues and students at the University of Aberdeen have made Wolfram technologies part of their daily lives. For example, the core of code for this blog entry was written by India Bruckner, a very bright young pupil from Aberdeen’s St Margaret’s School for Girls, with whom Marco had a summer project.

The current Ebola outbreak “is the deadliest, eclipsing an outbreak in 1976, the year the virus was discovered,” according to *The New York Times*. Its data summary as of October 27, 2014, states that at least 18 Ebola patients have been treated or are being treated in Europe and America, mostly health and aid workers who contracted the virus in West Africa and traveled to their home countries for treatment. The C.D.C. reported in September that a worst-case scenario could exceed a million Ebola cases in four months. There are no FDA-approved drugs or vaccines to defend against the virus, which is fatal in 60 to 90 percent of cases and spreads via contact with infected bodily fluids. Here is the current situation in West Africa, the locus of the pandemic, according to the numbers from *The New York Times*:

Data Source: *The New York Times*

**Vitaliy**: Marco, do you think mathematical modeling can help stop pandemics?

**Marco**: The recent outbreak of the Ebola virus disease (EVD) has shown how quickly diseases can spread in human populations. This threat is, of course, not limited to EVD; there are many pathogens, such as various types of influenza (H5N1, H7N9, etc.) with the potential to cause a pandemic. Therefore, mathematical modeling of the transmission pathways becomes ever more important. Health officials need to make decisions as to how to counter the threat. There are a large number of scientific publications on the subject, such as the recent

**Vitaliy**: How does one set up a computational model of a spreading disease?

**Marco**: Detailed online models, such as GLEAMviz, are available and can be run by anyone interested in the subject. That particular model contains, just like many other similar models, three main layers: (1) an epidemic model that describes the transmission of the disease in a hypothetical, homogeneous population; (2) population data, that is, the distribution of people and population densities; and (3) a mobility layer that describes how people move. I used a similar model that uses the powerful algorithms of

There are many different types of epidemic models. In what follows, we will mainly deal with the so-called **S**usceptible **I**nfected **R**ecovered (SIR) model. It models a population that consists of three compartments: susceptibles can become infected upon contact with the infected, and the infected recover at a certain rate.

To model the outbreak with the Wolfram Language, we need equations describing the number of people in each of these categories as functions of time. We will first use time-discrete equations. If we suppose first that there are only three categories and no interaction between them, we could get the following:

This means that the number, actually the percentage, of Susceptibles/Infected/Recovered at time *t*+1 is the same as at time *t*. Let’s assume that a random contact of an infected with a susceptible leads to a new infection with probability *b*; the probability of a random encounter is proportional to the number of susceptibles (Sus) and also to the number of infected (Inf). This assumption means that people are taken out of the compartment of the susceptibles and go into the infected category.

Next, we assume that people recover with a probability *c*; the recovery is proportional to the sick people; that is, the more who are sick, the more who recover.
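Putting the two interaction terms together, the time-discrete SIR update reads (a reconstruction of the equations shown as images in the post, assembled from the description above):

```latex
\begin{aligned}
\mathrm{Sus}_{t+1} &= \mathrm{Sus}_t - b\,\mathrm{Sus}_t\,\mathrm{Inf}_t\\
\mathrm{Inf}_{t+1} &= \mathrm{Inf}_t + b\,\mathrm{Sus}_t\,\mathrm{Inf}_t - c\,\mathrm{Inf}_t\\
\mathrm{Rec}_{t+1} &= \mathrm{Rec}_t + c\,\mathrm{Inf}_t
\end{aligned}
```

The interaction terms on the right-hand sides sum to zero, so the total Sus + Inf + Rec is conserved, as discussed next.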

We also need initial values for the percentages of people in the respective categories. Note that the “interaction terms” on the right-hand side always add up to zero, so that the overall population size does not change. If we start at initial conditions that add up to one, the population size will always stay one. This is an important feature of the model. Every person has to stay in one of the three compartments; we will take great care to make sure that this is also true for the SIR model on the network that we describe later! There is, however, some flexibility of how we can interpret the three compartments. In our final example we will, for example, consider deaths. It might seem logical to think that these “leave” our population. In order to keep our population constant, which is important for our model, we will then use a simple trick: we will interpret the last group, the Recovered (Rec), as a set that contains the truly recovered and the dead. It is a reasonable assumption that neither the dead nor the recovered infect other people, so they are inert to our model. Our simple assumption will be that a fixed percentage of people of the Rec group will be alive and the remainder will be dead. Hence, we include dead people in our model—so that they don’t actually leave the groups—and we do not consider births. This results in a constant population size.

This is a naive implementation of the SIR model, which allows you to change the parameters:

We use vectors Sus, Inf, and Rec and iterate them. We will later develop a more direct implementation. Note that the parameters *b* and *c* “parametrize” many effects that are at this stage not directly modeled. For example, the infection rate *b* does describe the risk of infection and therefore models things like population density (high density might lead to more infections) and behavior of people (if there are many mass events, that might increase the infection probability—so does schooling!). The recovery rate *c* might describe things like quality of the health care system, availability of physicians, and so on. Later we will try to model some of these effects more directly.
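The post’s implementation is a Wolfram Language `Manipulate`; an equivalent plain-Python version of the iteration (with illustrative parameter values, including a 5% initial infected fraction) is:

```python
def sir_step(sus, inf, rec, b, c):
    new_infections = b * sus * inf   # random contacts, infection probability b
    recoveries = c * inf             # recovery proportional to the infected
    return (sus - new_infections,
            inf + new_infections - recoveries,
            rec + recoveries)

def simulate(b, c, inf0=0.05, steps=500):
    # Start with 5% infected (an illustrative initial condition).
    sus, inf, rec = 1.0 - inf0, inf0, 0.0
    for _ in range(steps):
        sus, inf, rec = sir_step(sus, inf, rec, b, c)
    return sus, inf, rec

sus, inf, rec = simulate(b=0.5, c=0.1)
```

Because the interaction terms cancel, the three compartments always sum to one, exactly as the text requires.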

The SIR model might not be the most suitable to describe an Ebola outbreak. It is, however, not too far off either. People get infected by contact; the Recovered category might be interpreted as holding the percentage of people who have either survived or died, if we assume that reinfection is unlikely. In different countries/circumstances, the recovery/death rate might vary substantially—something we will model later explicitly.

A more systematic way of looking at the overall behavior of the SIR model is to study the so-called parameter space. We can represent how different characteristics, like the highest number of infected or the total number of people who get infected in the course of the outbreak, depend on the parameters. The axes of the following diagram show the infection and recovery rates, and the percentage of people who contract the disease during the outbreak is color-coded:

This shows that for small recovery rates and large infection rates, more than 90% of people contract the disease, whereas for large recovery rates and low infection rates, the total percentage of infected is about 5%, which, in fact, equals the initial percentage of infected.
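A sketch of such a parameter sweep (again in Python rather than the post’s Wolfram Language; the 5% initial infected fraction matches the baseline described in the text):

```python
def total_infected(b, c, inf0=0.05, steps=1000):
    # Fraction of the population that ever contracts the disease:
    # everyone who has left the susceptible compartment by the end.
    sus, inf = 1.0 - inf0, inf0
    for _ in range(steps):
        new_inf = b * sus * inf
        sus, inf = sus - new_inf, inf + new_inf - c * inf
    return 1.0 - sus

# Coarse grid over recovery rate c (rows) and infection rate b (columns).
grid = [[total_infected(b, c) for b in (0.1, 0.3, 0.5)]
        for c in (0.1, 0.3, 0.5)]
```

The corners reproduce the behavior described above: a large infection rate with a small recovery rate infects more than 90% of the population, while the opposite corner stays near the initial 5%.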

**Vitaliy**: To go from pure mathematical to real-world simulations, we would need data, such as populations and their geographic locations. How could data be accessed?

**Marco**: We will later couple different subpopulations (e.g. airports, cities, countries, etc.) and study the spreading of the disease among these. Each subpopulation is described by an SIR model. When we start coupling the subpopulations, their individual sizes will play a crucial role. Population data, like many other types of data, is built right into

We will use built-in data to improve our model toward the end, but for a start we could use the international network of all airports to model the transport of the disease. We first need a list of all airports and all flight connections. On the website openflights.org you will find all the data we need. I saved the file “airports.dat” and the file “routes.dat.”

**Vitaliy**: We could use the latest Semantic Data Import feature to interface with external data.

**Marco**: Yes, indeed.

**Vitaliy**: Yellow-framed entries are semantically processed as

So we notice that `SemanticImport` automatically classified the third and fourth columns as cities and countries and converted them to `Entity`, which is the built-in data representation in the Wolfram Language.

**Marco**: We can now plot all airports worldwide.

**Vitaliy**: Indeed, with the new functionality

The fifth column in **airports** is a three-letter IATA airport code. We will need this airport identification code because it identifies connecting routes between airports in the second dataset. Not all data entries have it; for example, here are the last 100 cases:

Some of these entries are also invalid because they contain numbers. We will clean the data by removing entries without a valid IATA code. Here are the original entries:

After the cleanup, the following number of rows is retained:
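As a rough Python illustration of this cleaning step (the actual filtering is done in the Wolfram Language; the sample rows and the three-uppercase-letter rule are assumptions for this sketch):

```python
import re

# Hypothetical rows shaped like airports.dat; the IATA code is the
# fifth column. A valid code is exactly three uppercase letters.
rows = [
    [1, "Goroka", "Goroka", "Papua New Guinea", "GKA"],
    [2, "Small Strip", "Nowhere", "Atlantis", "\\N"],  # missing code
    [3, "Numbered", "Town", "Somewhere", "1AB"],       # invalid: digit
]

iata = re.compile(r"[A-Z]{3}")
clean = [row for row in rows if iata.fullmatch(str(row[4]))]
print(len(clean), "of", len(rows), "rows retained")
```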

**Marco**: Next, we create a list of rules for all airport IDs and their coordinates:

**Vitaliy**: We used the

**Marco**: Now we can calculate the connections:

Not every IATA code has geo coordinates. Let’s clean out those with missing data:

Out of a total of 67,210 routes, we will plot just 15,000 chosen at random, which gives a good impression of the full picture:

**Vitaliy**: Once we have the data, how can it be integrated with mathematical models?

**Marco**: We need to describe the mobility pattern of the population. We will use the global air transport network to build a first model of a pandemic. We think of the flights as connections between different areas. These areas could be interpreted as “catchment areas” of the respective airports.

**Vitaliy**: We could make use of the

In the resulting graph, vertices are given by IATA codes.

As we can see below, there are several disconnected components that are negligible due to their relatively small size. We will discard them, as they will not significantly alter the dynamics:

From our graph, we can construct the adjacency matrix:

**Marco**: The (

Now comes a tricky step. We need to define the parameters for our model. The problem is that each of them “parametrizes” many effects and situations. For our model, we need:

- The probability of infection. This is a factor that determines in our model how likely it is that a contact leads to an infection. This factor will certainly depend on many things: population density, local behavior of people, health education, and so on. To get our modeling started, we will choose the following for all airports:

- The rate of recovery. The rate of recovery will very much depend on the type of disease and the quality of the health system. It will also depend on whether everyone has access to health insurance. In countries where only a fraction of the population has access to high-quality health care, diseases will generally spread faster. For our initial modeling, we will set the following for all airports:

With these two parameters, the epidemic model is determined. But we still need one more parameter.

- Migration factor. This is a factor of proportionality that describes the propensity of a certain population (in the catchment area of an airport) to travel. In this model we take it to be constant, but it would certainly also depend on the financial situation in that country and other factors. It describes, roughly speaking, the percentage of people in a catchment area/country who travel. We do not use a multi-agent model in which the movement of individuals is described; we use a compartmental population model in which we describe the percentages of the population who travel. We do (at least later in the post, for the country-based simulation) take the different population sizes into consideration, and, in the form of the multigraph, also how many flights there are from country to country. Using the coupling matrix, we will introduce a migration not of individuals, but of percentages of the population between different airports. We will first use a general migration factor (the same for all), which we set to the following:

This is a strong assumption. We will use—at first—the same migration factor for all three categories: susceptibles, infected, and recovered. In reality, the infected, particularly in the infectious phase where they might have developed symptoms, will probably have a different mobility pattern. Also, the mobility will be different in different countries, and it will depend on the distance traveled as well. We will later choose more realistic parameters.

We next initialize our model and assume that at first there are only susceptibles in all cities and no infected or recovered:

Now we introduce 5% infected into the catchment area of the originating airport.

**Vitaliy**: The outbreak began in Guinea in December 2013, then spread to Liberia and Sierra Leone. According to

We choose the CKY code for the Conakry International Airport in the capital and the largest city and compute its index:

**Marco**: Before we write down the SIR equations with the coupling terms, we introduce two objects:

This is the total number of (potential) passengers at each airport. We condense the coupling matrix into a smaller list that for each airport only contains a list of connected airports:

This is very useful, because the coupling matrix is sparse and `sumind` will speed up the calculation dramatically. Now we can write down the SIR equations:

The coupling terms are highlighted in orange. Basically, we calculate a weighted average over the number of people in each compartment for all neighboring airports. There are many other types of coupling that one could choose.
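In Python, the scheme described above, local SIR dynamics per airport plus a migration term built from condensed per-node neighbor lists (the role `sumind` plays in the post), might look as follows; the three-airport chain and all rates are illustrative:

```python
# 3-airport chain 0 -- 1 -- 2, as condensed neighbor lists (the compact
# form the post builds from the sparse coupling matrix via `sumind`).
neighbors = [[1], [0, 2], [1]]

def couple(X, mu):
    # Migration term: weighted average of each node's neighbors
    # minus the node's own value, scaled by the migration factor mu.
    return [mu * (sum(X[k] for k in nb) / len(nb) - X[j])
            for j, nb in enumerate(neighbors)]

def step(S, I, R, rho=0.3, lam=0.1, mu=0.05, dt=0.05):
    # One forward-Euler step of the coupled SIR equations.
    cS, cI, cR = couple(S, mu), couple(I, mu), couple(R, mu)
    n = len(S)
    S2 = [S[j] + dt * (-rho * S[j] * I[j] + cS[j]) for j in range(n)]
    I2 = [I[j] + dt * (rho * S[j] * I[j] - lam * I[j] + cI[j]) for j in range(n)]
    R2 = [R[j] + dt * (lam * I[j] + cR[j]) for j in range(n)]
    return S2, I2, R2

S, I, R = [0.95, 1.0, 1.0], [0.05, 0.0, 0.0], [0.0, 0.0, 0.0]  # seed at node 0
for _ in range(2000):
    S, I, R = step(S, I, R)
print("recovered fraction per airport:", [round(r, 2) for r in R])
```

Note that this particular coupling conserves each node's total S + I + R, so the compartments remain percentages of each catchment area's population.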

Next we iterate the equations and calculate the time course:

To get a first impression, we can plot the **S**, **I**, and **R** curves for some 200 airports:

Next, we calculate the maximal number of sick people at any of the airports:

We will generate a list of airport coordinates ordered as vertices of our graph:

We can now look at a couple of frames:

Time progresses by columns from top to bottom and between the columns from left to right. Here and below in a similar simulation graphic, the color codes the number of infected people.

Note that there are three main regions: Europe, which gets infected first, then the Americas and Asia. This kind of spreading is related to the network structure. We can try to visualize this with `Graph`.

**Vitaliy**: The clustering algorithms of

We’ll discard Antarctica and build a database denoting which airport codes belong to which continent:

This is a function that can tell what continent a particular code belongs to:

For instance:

There are many differently sized communities in our network:

Communities are clusters in which many flights join airports of the same community, compared to few flights joining airports of different communities. To avoid overcrowding the plot with labels, let’s label only those communities whose size is greater than 60, based on the largest fraction of airport codes belonging to the labeling continent:
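The majority-continent labeling rule can be sketched as follows; the community, the airport codes, and the continent map here are hypothetical:

```python
from collections import Counter

# Hypothetical airport-code -> continent map.
continent_of = {"LHR": "Europe", "CDG": "Europe", "JFK": "North America",
                "ORD": "North America", "NRT": "Asia"}

def community_label(codes):
    # Label a community by the continent that the largest fraction
    # of its airport codes belongs to.
    counts = Counter(continent_of[c] for c in codes if c in continent_of)
    return counts.most_common(1)[0][0] if counts else None

print(community_label(["LHR", "CDG", "JFK"]))  # Europe wins 2 to 1
```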

Now we can visualize the network structure and see how major hubs enable transportation to smaller ones. In this particular plot, colors just differentiate between different communities and do not represent infected people:

**Marco**: The three dominating communities become apparent. The graphic also shows via which of the main communities smaller groups are infected. This could have implications for decisions about preventative measures. We can now generate a graph that is similar to one presented in Dirk Brockmann’s paper:

**Vitaliy**: Again, time progresses by columns from top to bottom and between the columns from left to right. We used “

**Marco**: In the center of the representation, we find the airport where the outbreak started. The layer around it represents all the airports that can be reached with direct connections; they are the first to be hit. The next layer is the airports that can be reached with one connecting flight; they are the next to be infected, and so on. This shows that the structure of the network, rather than geographical distance, is important.

**Vitaliy**: How can we make this model a bit more realistic?

**Marco**: We have studied some simple, prototypical cases of a model that locally describes the outbreak of a disease with an SIR model and then couples many SIR models based on connections of some transport network. However, there are many problems that still would have to be addressed:

- The probability of transmission will depend on many factors, for example, the health system and the population density in a region.
- The recovery rate will also depend on many factors, for example, the health system.
- Not all possible links (roads/flight trajectories) will be taken with equal probability. Published papers suggest that there is a power law: the longer the distance, the less likely someone is to travel.
- The migration/traveling rate will depend on the categories susceptibles, infected, recovered; sick people are less likely to travel.

Also we might want to be able to model different attempts by governments to control the outbreak. So let’s try to address some of these issues and generate another, global model of an Ebola outbreak. If we wanted to model all cities and all airports worldwide, that would probably be asking too much from an ordinary laptop. Based on the sections above, it should be clear how to extend the model, but I want to highlight some further approaches that might be useful.

Our model will represent all (or most) countries worldwide. The connections among them will be modeled based on the respective flight connections. To start with, we will collect some data for our model.

As before, we will import the airport data and the flight connections:

The `SemanticImport` function directly recognizes the countries that the airports belong to. We can easily construct a list of airport-to-country data:

This time we construct the graph using the `Graph` command. As we want to study the connections between countries, we substitute the airports by the respective countries:

**Vitaliy**: Are you saying that the graph connections indicate country adjacencies rather than airport connections?

**Marco**: The connections are generated by the flights; that is, the flights’ paths are the edges. They go from airport to airport, but we are actually interested in modeling the connections between countries. Therefore, we substitute (identify) the airports with their respective countries. Formally, you can think of this as constructing the network of all airport connections and then identifying all airports (nodes) that correspond to the same country. We could say that the mathematical term for that is vertex identification.
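Vertex identification can be sketched by relabeling each flight by the countries of its endpoints and counting the multiplicities; the airport-to-country map and the routes here are hypothetical:

```python
from collections import Counter

# Hypothetical airport -> country map and flight routes.
country_of = {"JFK": "US", "ORD": "US", "LHR": "UK", "CDG": "France"}
routes = [("JFK", "LHR"), ("ORD", "LHR"), ("LHR", "CDG"), ("JFK", "ORD")]

# Vertex identification: replace each airport by its country and count
# how many flights connect each pair of distinct countries.
country_edges = Counter(
    (country_of[a], country_of[b])
    for a, b in routes
    if country_of[a] != country_of[b]  # drop edges inside one country
)
print(country_edges)
```

The edge multiplicities are exactly what makes the contracted graph a multigraph: more flights between two countries means a stronger coupling.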

It turns out that data on some airports is missing; most of the missing data is from very small airports. We delete all airports for which the country is not known:

Last but not least, we construct the coupling/adjacency matrix:

We can also get the list of all countries that will form part of our model:

It is clear that a higher population density might lead to a higher infection rate. This is, of course, quite an assumption, because what is important is mostly the local population density: if a country is very large, but everybody lives in one city, the effective density is much higher than the average.

To take the population into consideration, we will need data on the population size and density for all countries:

Now we will build a simple “model” for how the population density might (partially) determine the infection rate. In the first model we used an infection rate of *ρ*=0.2, and that gave “reasonable” results, in the sense that we saw a pattern of the spreading that was as expected. We want to extend the model, without completely modifying the parameter range. So it is “reasonable”—as a first guess—to choose the parameters for the extended model in the same range. As a matter of fact, we observe that the crucial thing is the ratio of *λ* and *ρ*. So basically we are saying that we want to start from about the same ratio as in the first simulation.

To modify the infection rate with respect to the population density, we look first at the histogram of the population densities:

We can also calculate the median:

We will make the assumption that the infection rate will increase with the population density. For the “median population density” we wish to obtain an infection rate of 0.2. We will make the bold assumption that the relationship is given by the following chart:

We can calculate the infection rates for all countries:

Of course, this “model” for the dependence of the infection rate on the population density is very crude. I invite the reader to improve that model!
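A minimal sketch of such a density-dependent infection rate: the post does not give the exact functional form, so the sublinear power law and the exponent 0.4 here are assumptions in the spirit of the argument for an exponent smaller than 0.5 discussed below, anchored at *ρ* = 0.2 for the median density:

```python
MEDIAN_DENSITY = 80.0  # people per km^2; purely illustrative

def infection_rate(density, base_rate=0.2, exponent=0.4):
    # Sublinear power law anchored so that the median density gives the
    # base rate 0.2 used in the first model. The exponent 0.4 (< 0.5)
    # is an illustrative choice, not the post's actual value.
    return base_rate * (density / MEDIAN_DENSITY) ** exponent

print(infection_rate(80.0))    # the anchor: equals the base rate
print(infection_rate(320.0))   # denser country: higher, but sublinear
```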

**Vitaliy**: It would be really interesting to know some typical laws, based on real data, for how the infection rate varies with population density. It is great to have this law as a changeable condition so that readers can try their own versions of it and observe the change in the simulations. Percolation phenomena should perhaps be considered, where there is a sudden steep change in the infection rate when the population density reaches some critical value. There could also be some saturation behavior due to a city’s infrastructure. But I am just guessing here. Would you please share your own take on this?

**Marco**: If there are more people in a confined space, the infection rate should increase monotonically. It also appears to make sense that the infection rate does not increase linearly, but that its derivative decreases (perhaps even saturates). The infection rate should somewhat depend on the distance to the next neighbor, that is, go with the square root of the density, which would lead to an exponent of 0.5. But then the movement of individual people would be blocked by their neighbors, effectively decreasing the slope, so we would need an exponent smaller than 0.5. Well, that was at least my thought. The article “The Scaling of Contact Rates with Population Density for the Infectious Disease Models” by Hu et al., 2013, shows that our assumptions are reasonable—at least for this crude model.

To estimate the recovery rate, we will make another very daring assumption. We will assume that the health system is better in countries where the life expectancy is higher. To build this submodel, we will make use of the life expectancy data built into *Mathematica*:

We can visualize the distribution of life expectancies for all countries in a histogram:

Note that there are a few countries that are lacking this data. We set their life expectancies to a typical 70; this will not influence the results, because these countries (very small countries, e.g. islands, and Antarctica) will not play a role in our model.

The median life expectancy:

Now, like before, we will formulate our assumption that life expectancy is a proxy (approximate indicator) for the quality of the health system, which is a proxy for the recovery rate. This assumption is again quite crude, because cases in relatively rich countries like Spain and the US suggest that recovery rates might not be substantially higher in wealthy countries:

As with the infection rates, we calculate the recovery rates for all countries:
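A sketch of this life-expectancy submodel: the fallback of 70 years for missing data follows the text, while the linear anchoring at the median and the base rate of 0.1 are assumptions for illustration:

```python
MEDIAN_LIFE_EXPECTANCY = 71.0  # years; illustrative

def recovery_rate(life_expectancy, base_rate=0.1):
    # Life expectancy as a proxy for health-system quality, scaled
    # linearly and anchored at the median. The linear form and the
    # base rate 0.1 are assumptions echoing the post's crude submodel.
    if life_expectancy is None:
        life_expectancy = 70.0  # the post's fallback for missing data
    return base_rate * life_expectancy / MEDIAN_LIFE_EXPECTANCY

print(recovery_rate(82.0))  # long-lived country: above the base rate
print(recovery_rate(None))  # missing data falls back to 70 years
```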

We can also represent what that means for all countries; the median values are marked in white and blue, while the parity of the infection rate and the recovery rate is indicated by the black dashed line. The background color represents the percentage of people who contract the disease at any time during the outbreak, as derived from the SIR model (see above):

**Vitaliy**: A typical modeling process entails running the simulation many times, trying to understand the system behavior and the influence of parameter values. Our particular values result in a common-sense final outcome for what we can imagine happening in a very tough but survivable pandemic: countries with weak economies will suffer the highest damage, while their counterparts should count smaller losses. What is important here, I think, is to understand how that contrast depends on mobility network topology, demographics, and other real-world factors. We should also expect that these complex components and nonlinearity could sometimes result in counterintuitive behavior, favoring the weak and damaging the strong.

Marco, could you please explain in greater detail the plot above?

**Marco**: The plot above shows the main two parameters of our model: infection versus recovery rate. Each point denotes a country—hovering over them will display the name. The white and green lines indicate the median values. They cut the diagram into four areas. The upper-left box (high infection rate/low recovery rate) contains countries that have the most challenging conditions. There are countries such as Sierra Leone, Nigeria, and Bangladesh in it. In the lower right (low infection rate/high recovery rate) are the countries that have the best prospects of containing the disease: United States, Canada, and Sweden. The black dashed line indicates a critical separator: above the line there are countries where the infection rate outpaces the recovery rate. In those countries an outbreak is very likely. Below the line the recovery rate dominates the infection rate, and the chances are that the disease can be contained. Note that these are just “base parameters”; they do not take the network into consideration yet. Note that countries below the dashed line can block the spread of the disease. If there are enough of them, that again will decrease the number of casualties.

**Vitaliy**: As we have mentioned before, our simulation is significantly exaggerated to amplify and see clearly various effects of a pandemic. This is why we intentionally shifted the system above the black dashed line. Our readers are welcome to consider more realistic parameter values.

**Marco**: To address another flaw of our first models, we will assume that susceptibles, infected, and recovered travel at different rates. There is one additional point to consider: if we set the migration/movement rates too high, regional differences will average out! If susceptibles, infected, and recovered all travel (fast enough), the populations of different countries mix over the period of the simulation, so that we just see an average. In reality, the percentages of people traveling over the simulation period will be relatively low, nearly zero. In order to see the disease spreading, we will set relatively high rates for the movement of the infected; we will ignore the traveling of the healthy and recovered. This is not a bad assumption, because they do not contribute to the spreading of the disease anyway:

Let’s clear the variables to make sure that we start with a clean slate:

As before, we initiate all countries to have purely susceptible populations:

The recent outbreak started in Sierra Leone, so we will need to find out which vertex models that country:

Next we set a low number of infected people in that country:

We then iterate as before:

Now we can actually iterate:

This is a first test of whether the simulation worked—we plot time series for all countries:

Note that at the end of the simulation, most of the lines have saturated, indicating that the outbreak has come to a standstill. Here is also the time course of the total numbers of sick and dead over the course of the outbreak:

We can now visualize the global spread of Ebola with the help of a few time frames:

Note that the number of dead people is extremely high in all of our simulations. About three billion dead people at the end would be an absolutely devastating outbreak, most likely much, much larger than what we are currently observing. This might be due to the choice of parameters; the ratio of the infection rate to the recovery rate might be different. On the other hand, we model an outbreak where the systems are relatively “inert,” meaning that they do not take efficient countermeasures. If we were to model that, we would need to decrease the infection rate over time, due to the actions of the governments.

The model is indeed just a conceptual model at this point. Readers can change certain features, for example, close down airports, and see what the effect is. They can change infection and recovery rates and see how that influences the outcome. There are many things you can play with. (A really bad infection rate/recovery rate ratio might be important if Ebola should mutate and become airborne, for example.) We have not (yet) tried to choose the parameters so that they optimally describe the current outbreak.

**Vitaliy**: I ran our model many times for different parameter values. For example, in a situation opposite to the one above, where the recovery rate is greater than the infection rate and the initial values are smaller:

We get much lower numbers of infected and sick:

Also note the different peak behavior, which tells us that pandemics can relapse, and with greater potential casualties. *The New York Times* data shows such relapses in the section “How Does This Compare to Past Outbreaks?” The C.D.C. projection and the current stage of the pandemic probably correspond to the approximately exponential increase at the very beginning (left side) of the peaks in the graphs.

Can we calibrate our model with real data, for instance, the time scale and absolute values of the infection and recovery rates?

**Marco**: There are many papers that study effects of, say, population density on infection rates, such as the one mentioned above by Hu et al. We can use models or observed data to calibrate our model.

**Vitaliy**: The

**Marco**: The most important factor is that the high migration rates in the first models lead to a mixing of the populations so that at the end, all populations have the same rates of people in each of the three categories. Just like in Dirk Brockmann’s paper, we see that the network is very important for the spreading of the disease. The distance on the graph is important. This is particularly so because countries for which the recovery rate is higher than the infection rate basically block the disease from spreading.

**Vitaliy**: Marco, I thank you very much for your inspiring insights and time. Do you have some ideas how we could improve the model further or any other concluding remarks for our readers?

**Marco**: It is very easy to come up with more ideas for improving the model: we could include further transport mechanisms, that is, streets/cars, boats, train networks, etc. Each would have an effect on how people move. We could use mobile phone data or other data to better model how people actually move. Also, the SIR model does not take into consideration the incubation time, that is, the time from infection until you show the first symptoms. Our model describes populations in compartments; the numbers of infected etc. are given as percentages. This is only a valid approximation if the number of infected is large enough (just as describing a concentration by a real number is a good approximation only if there are lots of molecules!), but particularly at the beginning of an outbreak, when there are few infected, other types of models, such as a multi-agent model, might be more realistic. The model we have presented should only be taken as a first step; more effects can be included and their relevance can be studied. By trial and error, we can try to describe the real outbreak better and better. But we also need to be careful: the more effects we include in our model, the more parameters we will get. That can lead to all sorts of problems when we need to estimate them.

We may note that the simulation does not correctly describe the cases in the US and Europe. Currently we know that 18 people were brought there on purpose; they are not at all part of the natural propagation of the disease. Also, we are speaking about “percents or more” of a population; we are not even discussing individual cases. Epidemiologically speaking, there is nothing going on in the US or Spain at the moment regarding Ebola. Sierra Leone and other countries in Africa do get into a regime where the model is “more valid.”

The model is not specific to the Ebola epidemic. There is a lot of estimation and guesswork, and we chose the parameters so that the epidemic spreading becomes clear, but the model shows a potential order of the spreading: first mainly western/central African countries, then Europe, then the US. This is very reasonable and coincides with much more complicated models. Of course, there are many other effects, such as that the spreading between neighboring countries in Africa will probably not be via flights, but rather via very local transport, that is, people crossing borders by foot/car or similar. Our model is certainly conceptual insofar as it only considers one means of transport, namely flights, which is not the full picture, of course, but it does show that dramatic differences between countries are to be expected. In fact, we would expect a much lower percentage of casualties in Europe and the US than in the African countries where everything originated. The model shows that the highest probability of spreading is between neighboring African countries, which is what larger models predict as well. It also says that certain countries in Europe, such as Germany, the UK, and France, are more at risk than others because of their flight connections. The US would be less at risk than the European countries, that is, it would get significant numbers of infected later. All of that seems to be qualitatively quite correct. Australia and Greenland would get the disease very late, or not at all, again in agreement with our model (well, at least if *ρ* ≈ *λ* ≈ 0.1).

In that sense, I suppose that it is more realistic to choose *ρ* equal to or smaller than *λ*. That will mostly limit the spreading to some African countries, with low probabilities of infection in Europe and the US. Even though the “global average” infection rate could be even lower than the recovery rate, at least in some countries the local infection rate might be higher than the recovery rate, due to population density and the (lack of) quality of health systems. Also, the number of infected will obviously be lower than what our model predicts, as our model does not take countermeasures by governments or the WHO into consideration. That is easy to model by reducing the infection rate over time, more in rich countries, less in poor countries.

The predictions have to be seen as probabilities, or risks, that certain countries experience large natural outbreaks; that is, our model does not consider individual cases or cases that are transported to hospitals on purpose. Also, the infection rate is certainly very different for people who work in hospitals with patients. That means that the probability of a nurse contracting Ebola is much higher than for the average person, and of course we do not model that. The precise propagation of an epidemic also depends on many “random” factors, which cannot reasonably be taken into account in this or any other model. One can introduce random factors and run the model several times to make predictions about likely scenarios, or probabilities. In that sense, our model actually performs qualitatively very well.

The conceptual nature of the model allows us to look at different scenarios: What if the ratio of infection rate to recovery rate changes? What if there is more mobility, that is, *μ* changes? What if we also include local transport, etc.? We did not try to fit the parameters, network, etc. to optimally describe the Ebola outbreak; rather, we provide the basic model to develop several scenarios and to understand the basic ingredients for this type of modeling.

**Vitaliy**: I once again thank Marco and our readers for being part of this interview and invite everyone to participate in the related Wolfram Community thread where Marco and others discuss and share related ideas, code, and in-depth studies.

The conference kicked off with a keynote by Stephen Wolfram, and then rolled right into the other 125 scheduled talks. Also featured were a connected devices playground, “Meet-Ups,” social/networking events, small group meetings, roundtable discussions, and tours of Wolfram and the Blue Waters supercomputer at the University of Illinois.

Attendees came from 20 countries spanning 6 continents, and 80 of them were at the conference for the first time. Topics covered every industry/specialty, including, but not limited to, engineering, finance, computer science, physics, astronomy, mathematics, image processing, robotics, and even quilting.

Wolfram developers and attendees alike gave talks and workshops, illustrating not only how the technology was built, but also some of the diverse applications our users have implemented in their professional fields. Their inspiring and innovative projects ranged from a corporate search engine using the Wolfram Language to stitch-coding and movie color maps; they demonstrated everything from integrating *Mathematica* and the Unity Game Engine and creating online courses to high-frequency trading, connected devices, and embedding code.

Numerous possibilities with our technology were showcased, demonstrating how devices can be integrated with *Mathematica*, Wolfram|Alpha, and the Wolfram Language to perform real-time data analysis. In the connected devices playground, multiple devices were set up for attendees to interact with, including two Raspberry Pis with breadboards and LEDs, an 8Cube from hypnocube, two Tinkerforge Weather Stations hooked up to Raspberry Pis, a Sphero 2.0, an Intel Edison, an Arduino, and Electric Imps connected to a temperature/humidity sensor and a heart rate monitor.

We also brought back a favorite—the One-Liner competition. Attendees were tasked with thinking up “amazing things” using just one tweet of Wolfram Language code. Stay tuned for the announcement of the results and the winners of the 2014 Innovator Awards!

Take some inspiration from these examples, while you come up with your creepy codes:

In order to win, your Halloween-themed submissions must be tweeted to us before the clock strikes midnight, Pacific time, on All Hallows’ Eve (11:59pm PDT, Friday, October 31). So you don’t waste needed code space, no hashtag is required with your original submission, but we encourage you to share your results by retweeting them with hashtag #SpookyCode.

We’re excited about the possibilities—just keep an eye on your creation and make sure that it doesn’t… come alive!

This annual Summit offers an exclusive group of thought leaders an opportunity to meet and share insights into new and ongoing projects. But in light of the high caliber and exceptionally broad interest of this year’s presentations—and for the first time ever—we are sharing videos of the Summit presentations with the public, including the keynote from Stephen Wolfram, CEO of Wolfram Research and creator of Wolfram|Alpha.

For more information on next year’s event or to apply for an invitation, please visit the Wolfram Data Summit website!

A century ago,

Martin Gardner was born in Oklahoma.

He philosophized for his diploma.

He wrote on Hex and Tic-Tac-Toe.

The Icosian game and polyomino.

Flexagons from paper trim,

Samuel Loyd, the game of Nim.

Digital roots and Soma stairs,

mazes, logic, magic squares.

Squaring squares, the golden Phi.

Solved the spider and the fly.

Packing circles (with corrections),

ellipses, pi, and conic sections.