February 21, 2014 — Wolfram Blog Team

*Editorial note: This post was written by Paul-Jean Letourneau as a follow-up to his post Mathematica Gets Big Data with HadoopLink*.

In my previous blog post I described how to write MapReduce algorithms in *Mathematica* using the *HadoopLink* package. Now let’s go a little deeper and write a more serious MapReduce algorithm.

I’ve blogged in the past about some of the cool genomics features in Wolfram|Alpha. You can even search the human genome for DNA sequences you’re interested in. Biologists often need to search for the locations of DNA fragments they find in the lab, in order to know what animal the fragment belongs to, or what chromosome it’s from. Let’s use *HadoopLink* to build a genome search engine!
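To make the idea concrete, here is a minimal sketch of a MapReduce-style sequence search, written in plain Python rather than with *HadoopLink*. The function names, the chunking scheme, and the toy sequence are all assumptions made for illustration; they are not taken from the post or from the *HadoopLink* API. The mapper reports every position where the query occurs in one chunk of a chromosome, and the reducer gathers those positions per query.

```python
from collections import defaultdict


def split_chunks(chrom, sequence, size, overlap):
    """Split a sequence into chunks that overlap, so boundary matches are not lost.

    Assumption: overlap == len(query) - 1, which places every match in exactly one chunk.
    """
    for offset in range(0, len(sequence), size):
        yield chrom, offset, sequence[offset:offset + size + overlap]


def map_search(chrom, offset, chunk, query):
    """Mapper: emit (query, (chromosome, absolute position)) for each match in a chunk."""
    start = chunk.find(query)
    while start != -1:
        yield query, (chrom, offset + start)
        start = chunk.find(query, start + 1)  # allow overlapping matches


def reduce_hits(pairs):
    """Reducer: group the emitted positions by query and sort them."""
    grouped = defaultdict(list)
    for key, hit in pairs:
        grouped[key].append(hit)
    return {key: sorted(hits) for key, hits in grouped.items()}


def search_genome(chunks, query):
    """Run the map step over every chunk, then reduce the combined output."""
    pairs = []
    for chrom, offset, chunk in chunks:
        pairs.extend(map_search(chrom, offset, chunk, query))
    return reduce_hits(pairs)


# Toy data: a made-up 12-base sequence standing in for a chromosome.
chunks = list(split_chunks("chr1", "ACGTACGTTACG", size=5, overlap=2))
hits = search_genome(chunks, "ACG")
```

On a real cluster the map calls would run in parallel on different nodes and the framework would handle the grouping; the loop in `search_genome` only stands in for that machinery.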

July 31, 2013 — Paul-Jean Letourneau, Senior Data Scientist, Wolfram Research

*HadoopLink* is a package that lets you write MapReduce programs in *Mathematica* and run them on your Hadoop cluster.

July 15, 2013 — Matthias Odisio, Mathematica Algorithm R&D

Or: How I Learned to Watch the Best Movies in the Best Way

I remember that when I lived across the street from an art movie theater called Le Club, looking at the movie posters on my way back home was often enough to get me in the ticket line. The director or main actors would ring a bell, or a close friend had recommended the title. Sometimes the poster alone would be appealing enough to lure me in. Even today there are still occasions when I make decisions from limited visual information, like when flipping through movie kiosks, TV guides, or a stack of DVDs written in languages I can’t read.

So how can *Mathematica* help? We’ll take a look at the top 250 movies rated on IMDb. Based on their posters and genres, how can one create a program that suggests which movies to see? What is the best way to see the most popular movies in sequence?

May 22, 2013 — Jon McLoone, International Business & Strategic Development

The benefits of linking from *Mathematica* to other languages and tools differ from case to case. But unusually, in the case of the new *RLink* in *Mathematica* 9, I think the benefits have very little to do with R, the language. The real benefit, I believe, is in the connection it makes to the R community.

When we first added the *MathLink* libraries for C, there were real benefits in farming out intensive numerical work (though *Mathematica* performance improvements over the years and development of the compiler have greatly reduced the occasions where that would be worth the effort). Creating an Excel link added an alternative interface paradigm to *Mathematica* that wasn’t available in the *Mathematica* front end. But in the case of R, it isn’t immediately obvious that it does many things that you can’t already do in *Mathematica* or many that it does significantly better.

However, with *RLink* I now have immediate access to the work of the R community through the add-on libraries that they have created to extend R into their field. A great zoo of these free libraries fills out thousands of niches–sometimes popular, sometimes obscure–but lots of them. There are over 4,000 right here and more elsewhere. At a stroke, all of them are made immediately available to the *Mathematica* environment, interpreted through the R language runtime.

May 9, 2013 — Matthias Odisio, Mathematica Algorithm R&D

Detecting skin in images can be quite useful: it is one of the primary steps for various sophisticated systems aimed at detecting people, recognizing gestures, detecting faces, content-based filtering, and more. In spite of this host of applications, when I decided to develop a skin detector, my main motivation lay elsewhere. The research and development department I work in at Wolfram Research just underwent a gentle reorganization. With my colleagues who work on probability and statistics becoming closer neighbors, I felt like developing a small application that would make use of both *Mathematica*’s image processing and statistics features; skin detection just came to my mind.

Skin tones and appearances vary, and so do flavors of skin detectors. The detector I wanted to develop is based on probabilistic models of pixel colors. For each pixel of an image given as input, the skin detector provides a probability that the pixel color belongs to a skin region.
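As a rough illustration of what a probabilistic pixel-color model can look like, the Python sketch below applies Bayes' rule to two Gaussian color models, one for skin and one for everything else. This is not the detector from the post: the choice of chromaticity features, the independent Gaussians, and every numeric parameter are hypothetical values picked for the example, not fitted to data.

```python
import math

# Hypothetical (mean, std) pairs for the red and green chromaticity channels.
# These numbers are illustrative assumptions, not values fitted to real images.
SKIN_MODEL = ((0.45, 0.05), (0.31, 0.03))
NON_SKIN_MODEL = ((0.33, 0.10), (0.33, 0.10))
PRIOR_SKIN = 0.3  # assumed prior probability that a pixel shows skin


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def chromaticity(rgb):
    """Normalize out brightness: return (r, g) as fractions of r + g + b."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return (1.0 / 3.0, 1.0 / 3.0)  # treat black as neutral
    return (r / total, g / total)


def likelihood(features, model):
    """Product of independent per-channel Gaussian densities."""
    p = 1.0
    for x, (mean, std) in zip(features, model):
        p *= gaussian_pdf(x, mean, std)
    return p


def skin_probability(rgb):
    """Posterior probability that a pixel color belongs to a skin region."""
    features = chromaticity(rgb)
    p_skin = likelihood(features, SKIN_MODEL) * PRIOR_SKIN
    p_other = likelihood(features, NON_SKIN_MODEL) * (1.0 - PRIOR_SKIN)
    return p_skin / (p_skin + p_other)


def skin_map(image):
    """Apply the detector to every pixel of an image given as rows of RGB tuples."""
    return [[skin_probability(pixel) for pixel in row] for row in image]
```

Calling `skin_map` on an image returns a grid of probabilities with the same shape, which can then be thresholded or displayed as a grayscale mask.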

April 24, 2013 — Stephen Wolfram

More than a million people have now used our Wolfram|Alpha Personal Analytics for Facebook. And as part of our latest update, in addition to collecting some anonymized statistics, we launched a Data Donor program that allows people to contribute detailed data to us for research purposes.

A few weeks ago we decided to start analyzing all this data. And I have to say that if nothing else it’s been a terrific example of the power of *Mathematica* and the Wolfram Language for doing data science. (It’ll also be good fodder for the Data Science course I’m starting to create.)

We’d always planned to use the data we collect to enhance our Personal Analytics system. But I couldn’t resist also trying to do some basic science with it.

I’ve always been interested in people and the trajectories of their lives. But I’ve never been able to combine that with my interest in science. Until now. And it’s been quite a thrill over the past few weeks to see the results we’ve been able to get. Sometimes confirming impressions I’ve had; sometimes showing things I never would have guessed. And all along reminding me of phenomena I’ve studied scientifically in *A New Kind of Science*.

So what does the data look like? Here are the social networks of a few Data Donors, with clusters of friends given different colors. (Anyone can find their own network using Wolfram|Alpha or the `SocialMediaData` function in *Mathematica*.)

March 5, 2013 — Wolfram Blog Team

Using *Mathematica* and other Wolfram technologies, Joseph Hirl, founder of Agilis Energy, has developed a new approach to energy analytics that is helping building owners and energy equipment suppliers around the world cut energy consumption and costs.

At the core of the company’s success is its *Mathematica*-based dynamic energy analysis application, which gives the full picture of a building’s performance, measures the impact of potential operational changes, and quantifies the results. About *Mathematica*’s role in the development of the tool and the Agilis business, Hirl says, “The flexibility of *Mathematica* is tremendous. Our ability to build and develop this program with a lean staff has allowed us to build out a substantial business.”

The application, which has now been used at more than 800 sites in at least 12 different industries, begins with data streams, including high-interval smart meter data as well as *Mathematica*’s built-in `WeatherData`. It then applies sophisticated statistics and dynamic visualization functionality to generate what Hirl calls an “MRI of a building,” a dynamic interface with a simulation of the building’s energy use and demand and forecasting and benchmarking tools.

February 4, 2013 — Oleksandr Pavlyk, Kernel Technology

On January 23, 1913, in the Julian calendar, Andrey A. Markov presented to the Royal Academy of Sciences in St. Petersburg his analysis of Pushkin’s *Eugene Onegin*. He found that the sequence of consonants and vowels in the text could be well described as a random sequence, where the likely category of a letter depended only on the category of the previous or previous two letters.

At the time, the Russian Empire was using the Julian calendar. The 100th anniversary of the celebrated presentation is actually February 5, 2013, in the now-used Gregorian calendar.

To perform his analysis, Markov invented what are now known as “Markov chains,” which can be represented as probabilistic state diagrams where the transitions between states are labeled with the probabilities of their occurrences.
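A first-order version of this analysis is easy to sketch in Python. The example below classifies each letter as a vowel or a consonant and estimates the transition probabilities between the two states from the observed pairs. It assumes English text and the five vowels a, e, i, o, u, whereas Markov worked with the Russian text, so treat it as an illustration of the technique rather than a reproduction of his counts.

```python
import string
from collections import Counter

VOWELS = set("aeiou")  # assumption: English vowels only
LETTERS = set(string.ascii_lowercase)


def categories(text):
    """Map each ASCII letter to 'V' (vowel) or 'C' (consonant); skip everything else."""
    return ["V" if ch in VOWELS else "C" for ch in text.lower() if ch in LETTERS]


def transition_probabilities(states):
    """Estimate P(next state | current state) from consecutive pairs of states."""
    pair_counts = Counter(zip(states, states[1:]))
    from_counts = Counter(states[:-1])
    return {
        (current, nxt): count / from_counts[current]
        for (current, nxt), count in pair_counts.items()
    }


probs = transition_probabilities(categories("banana"))
```

In the resulting dictionary, the probabilities for each starting state sum to 1, which is exactly the labeling on the outgoing arrows of a state in the diagram.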

January 8, 2013 — Jeffrey Bryant, Scientific Information Group

The physics involved in simulating galaxy collisions can be extremely complex. The most accurate simulations take into account not just points representing stars, but also magnetic fields and invisible dark matter, as well as *n*-body interactions allowing the individual stars to interact with each other. These complex simulations are usually carried out on large-scale supercomputers over long periods of time. One of the more interesting aspects of galaxy collisions is that they can create density variations resulting in all kinds of emergent structure. Density waves can develop that lead to star formation from compressed gas clouds.

A couple of years ago, I wrote a Demonstration that provides a simplified solution to galaxy collisions. This Demonstration is designed to run in real time inside a `Manipulate`, so the problem has been simplified by removing *n*-body interactions, dark matter, magnetic fields, and so on. Basically, it treats the two galaxies as large point masses with lots of massless test particles orbiting them. The test particles respond only to the two galaxy “centers.” In a real galaxy collision, the chances of two stars getting close enough to each other to interact directly are very remote, so it’s not too much of a stretch to ignore this effect for a first-order approximation.

The more stars that are included in the simulation (by reducing the star separation parameter), the more intricate the results (and the more computationally intensive the calculation). In fact, as more stars are added, it becomes easier to see density variations where many test masses cluster together, but it still looks very discrete. Real galaxies, like the Milky Way, can have hundreds of billions of stars. Trying to carry out a point simulation with that many stars becomes a bit taxing on most home systems, and definitely exceeds the time constraints of a real-time dynamic tool like `Manipulate`.

So how can we better visualize these density variations? I decided to try to modify my Demonstration to use one of the new features in *Mathematica* 9, namely volumetric rendering. This way, we can simulate the galaxy collisions with fewer points, but render the results as if there were billions of stars, resulting in a more realistic and informative visualization.
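The simplified model described above, in which massive galaxy centers attract massless test particles, can be sketched in a few lines of Python. This is not the code of the Demonstration: the two-dimensional setup, the leapfrog integrator, the unit system with G = 1, and the softening length are all assumptions chosen to keep the example short.

```python
import math

G = 1.0           # gravitational constant in simulation units (assumption)
SOFTENING = 1e-3  # hypothetical softening length; avoids division by zero


def acceleration(pos, sources):
    """Acceleration at pos caused by point masses given as (mass, [x, y]) pairs."""
    ax = ay = 0.0
    for mass, (sx, sy) in sources:
        dx, dy = sx - pos[0], sy - pos[1]
        r2 = dx * dx + dy * dy + SOFTENING ** 2
        inv_r3 = 1.0 / (r2 * math.sqrt(r2))
        ax += G * mass * dx * inv_r3
        ay += G * mass * dy * inv_r3
    return ax, ay


def accelerations(centers, particles):
    """Centers attract each other; test particles feel only the centers."""
    center_acc = [
        acceleration(c["pos"], [(o["mass"], o["pos"]) for o in centers if o is not c])
        for c in centers
    ]
    sources = [(c["mass"], c["pos"]) for c in centers]
    particle_acc = [acceleration(p["pos"], sources) for p in particles]
    return center_acc + particle_acc


def step(centers, particles, dt):
    """Advance all bodies in place by one kick-drift-kick leapfrog step."""
    bodies = centers + particles
    for body, (ax, ay) in zip(bodies, accelerations(centers, particles)):
        body["vel"][0] += 0.5 * dt * ax  # first half kick
        body["vel"][1] += 0.5 * dt * ay
    for body in bodies:
        body["pos"][0] += dt * body["vel"][0]  # drift
        body["pos"][1] += dt * body["vel"][1]
    for body, (ax, ay) in zip(bodies, accelerations(centers, particles)):
        body["vel"][0] += 0.5 * dt * ax  # second half kick
        body["vel"][1] += 0.5 * dt * ay


# Example setup: one center at rest and one test particle on a circular orbit.
centers = [{"mass": 1.0, "pos": [0.0, 0.0], "vel": [0.0, 0.0]}]
particles = [{"pos": [1.0, 0.0], "vel": [0.0, 1.0]}]
for _ in range(1000):
    step(centers, particles, dt=0.01)
```

Because the test particles never act on each other, the cost of a step grows linearly with the number of particles, which is what makes the approach feasible for a real-time tool. A collision is set up by adding a second entry to `centers` with its own ring of particles.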

December 6, 2012 — Stephen Wolfram

There aren’t very many qualitatively different types of computer interfaces in use in the world today. But with the release of *Mathematica* 9 I think we have the first truly practical example of a new kind—the computed predictive interface.

If one’s dealing with a system that has a small fixed set of possible actions or inputs, one can typically build an interface out of elements like menus or forms. But if one has a more open-ended system, one typically has to define some kind of language. Usually this will be basically textual (as it is for the most part for *Mathematica*); sometimes it may be visual (as for Wolfram *SystemModeler*).

The challenge is then to make the language broad and powerful, while keeping it as easy as possible for humans to write and understand. And as a committed computer language designer for the past 30+ years, I have devoted an immense amount of effort to this.

But with Wolfram|Alpha I had a different idea. Don’t try to define the best possible artificial computer language that humans then have to learn. Instead, use natural language, just like humans do among themselves, and then have the computer do its best to understand this. At first, it was not at all clear that such an approach was going to work. But one of the big things we’ve learned from Wolfram|Alpha is that with enough effort (and enough built-in knowledge), it can. And indeed two years ago in *Mathematica* 8 we used what we’d done with Wolfram|Alpha to add to *Mathematica* the capability of taking free-form natural language input, and automatically generating from it precise *Mathematica* language code.

But let’s say one’s just got some output from *Mathematica*. What should one do next? One may know the appropriate *Mathematica* language input to give. Or at least one may be able to express what one wants to do in free-form natural language. But in both cases there’s a kind of creative act required: starting from nothing, one has to figure out what to say.

So can we make this easier? The answer, I think, is yes. And that’s what we’ve now done with the Predictive Interface in *Mathematica* 9.

The concept of the Predictive Interface is to take what you’ve done so far, and from it predict a few possibilities for what you’re likely to want to do next.