Computable Data Functions: A Crazy Idea That Just Works

Sometime rather alarmingly late in the Mathematica 6 release cycle it started to emerge that Stephen had a bunch of people working on an insane idea: including in Version 6 an entirely new set of features never before considered and definitely not on the release plan. Somehow this didn’t surprise anyone.

It was to be a system whereby people could access large amounts of useful data by way of simple function calls inside Mathematica, with those calls automatically going off to our servers to get updated information, or even real-time feeds like current stock prices. Needless to say, none of the server- or client-side technology to make this possible existed, but hey, it sounded like a good idea.

It turned out to be a very good idea.

As discussed in my previous blog post about building interfaces in five minutes, Mathematica is a great system for building interfaces in five minutes (or less). An example like this is marvelously simple:

Instant dynamic interactivity
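
The original figure is not reproduced in this text. A minimal sketch of the sort of example meant, assuming a sine curve whose frequency follows a slider (the variable name a and the ranges are illustrative choices, not taken from the figure):

```mathematica
(* Drag the slider to change the frequency of the sine wave. *)
Manipulate[Plot[Sin[a x], {x, 0, 2 Pi}, PlotRange -> {-1, 1}], {a, 1, 5}]
```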

But there’s no depth to it, and frankly once you’ve seen one interactive trig function, you’ve seen ’em all. It’s a great technology demonstration, and you can clearly see where you might go with it to make something really useful, but it is not, in and of itself, really all that great.

The problem is that to add richness and real-world texture to an example generally requires a lot of stuff—either a lot of code or a lot of data—and how can you do that in five minutes? This is one of the problems that data functions solve. (Data functions are also very useful for more serious work, but I’m concentrating here on what you can do in five minutes or less.)

Suppose that instead of having the slider move a sine function around, you want to have it flip through the historical charts of GDP for all the NATO countries. How many lines of code would that take?

The answer is two:

Manipulate[DateListPlot[CountryData[c, {{"GDP"}, {1970, 2005}}], PlotLabel -> c],
{c, CountryData["NATO"], ControlType -> Slider}]

Country Data

That’s a lot of richness and depth for two lines of code. You might even learn something.

This example is possible because we have a data function, called CountryData, that lets you get a large fraction of the available quantitative data about all the world’s countries, instantly. The key idea—the brilliance of it—is to stop thinking about this data as something in a table, or a database, and start thinking about it just like any other mathematical function. Instead of “sine of x,” it’s “GDP of country.”

CountryData["France", "GDP"]
2.12658×10^12

CountryData["France", "GDP", "Units"]
USDollarsPerYear

You can, if you like, think of this as a call into a database, with the arguments being the equivalent of the query language. But many of the problems inherent in things like SQL go away when you have a powerful symbolic language, rather than a rigidly structured query syntax, to use inside the queries. And even more so when you have a powerful symbolic language to wrap around the queries to arrange and refine the data.

It’s really more productive to forget about the database aspect of things and just think of data functions like any other Mathematica functions that return answers, and that can be used just like Table, Integrate or Factor. In particular, don’t worry about calling the data functions many times: they cache cleverly and are very fast after the initial download or update.
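
As a small illustration of that point, here is a sketch that drops CountryData into Table exactly as you would any built-in function. It reuses only the "GDP" property and the "NATO" group from the earlier examples. Whether the values come back as plain numbers or with units attached may depend on the version you run.

```mathematica
(* One {country, GDP} pair per NATO member. The data function is called once
   per country; as noted above, repeated calls are cached after the first download. *)
Table[{c, CountryData[c, "GDP"]}, {c, CountryData["NATO"]}]
```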

Another example: We’ve got a data function for chemical compounds. Here’s the call you need to get the names of all the compounds we know about whose names contain the string “chloro” followed somewhere later by “hex”:

ChemicalData["*chloro*hex*"]

{Hexachlorocyclohexane(MixedIsomers), Dichlorobis(2-Chlorocyclohexyl)Selenium,
Dichlorobis(2-Ethoxycyclohexyl)Selenium, Trichloro[2-(3-Cyclohexen-1-yl)Ethyl]silane,
Trichloro(Cyclohexylmethyl)Silane, Alpha-Hexachlorocyclohexane, Beta-Hexachlorocyclohexane,
Delta-Hexachlorocyclohexane, 1,6-Dichlorohexane, 1,5-Dichlorohexamethyltrisiloxane,
Trans-1,2-Dichlorocyclohexane, 2,2,6,6-Tetrachlorocyclohexanol,
2,3,4,5,6,6-Hexachloro-2,4-Cyclohexadienone, 2,2-Dichlorohexanal, 1,2-Dichlorohexafluorocyclopentene}

And here’s the single line of code you need to display all their 3D structural plots (minus a few that are missing for whatever reason):

Map[ChemicalData[#, "MoleculePlot"] &, ChemicalData["*chloro*hex*"]]
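
The Map simply applies the same two-argument call to each name in turn. Here is a sketch of the single-compound form, plus one way to drop the missing entries, assuming they come back as Missing[...] expressions ("Caffeine" is an arbitrary example, not part of the list above):

```mathematica
(* One compound, one plot. *)
ChemicalData["Caffeine", "MoleculePlot"]

(* The same Map as above, with any Missing[...] results filtered out. *)
DeleteCases[
 Map[ChemicalData[#, "MoleculePlot"] &, ChemicalData["*chloro*hex*"]],
 _Missing]
```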

Chemical Data

I’m sorry, but if that isn’t amazing, I don’t know what is.

Keep in mind that Mathematica is not a system focused on dealing with chemistry or molecular modeling, or anything like that. If it were, this example wouldn’t be amazing. Any chemical structure database program that can’t do this pretty easily should be ashamed of itself.

But Mathematica isn’t such a program. It’s a general system, and this is just a bit of top-level code. In other words, this isn’t a special case: there are a million other things, in almost any field you care to name, that you can do just as easily. (Did I mention that in Mathematica you can click and rotate any of those molecules in real time?)

Another example: GraphPlot is an amazing function, able to automatically lay out networks of connected nodes and edges. Here’s a lovely collection of connected graphs generated using the Mod function to determine connectivity:

GraphPlot[Table[i -> Mod[i^2, 234], {i, 300}]]
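
To see what GraphPlot is being handed, it may help to look at a few of the rules the Table generates. Each rule i -> Mod[i^2, 234] is one edge, and the Mod is what folds large squares back onto smaller node numbers (the range 14 to 17 is picked here only because it straddles the first wrap-around):

```mathematica
(* 16^2 = 256 is the first square to exceed 234, so it wraps around to 22. *)
Table[i -> Mod[i^2, 234], {i, 14, 17}]
(* {14 -> 196, 15 -> 225, 16 -> 22, 17 -> 55} *)
```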

Connected Graphs

Pretty, but abstract. Not that there’s anything wrong with purely mathematical structure, but in my mind this Mod graph example does not even begin to compare with the next example, which uses a data function (the graph is very large; click the small image to blow it up to a size where you can see individual nodes, and scroll around to study the whole thing):

GraphPlot[Flatten[Map[Function[{n},
Map[n -> # &,
Select[IsotopeData[n, "DaughterNuclides"], StringQ]]], IsotopeData[]]]]
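
If the nested Maps are hard to read, here is the same computation spelled out in named steps. This is only a sketch: decayEdges and allEdges are hypothetical names introduced for illustration, and it assumes, as the code above does, that isotope names come back as strings (anything that is not a string is discarded by StringQ).

```mathematica
(* Hypothetical helper: one rule from isotope n to each of its daughter nuclides. *)
decayEdges[n_] :=
 Map[n -> # &, Select[IsotopeData[n, "DaughterNuclides"], StringQ]]

(* Collect the edges for every known isotope into one flat list, then lay it out. *)
allEdges = Flatten[Map[decayEdges, IsotopeData[]]];
GraphPlot[allEdges]
```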

Isotope Data

What is this, DNA or something? No, it’s a complete map of every radioactive decay chain of every isotope that’s ever been measured, showing how all the elements and isotopes are linked to each other. It represents what must be hundreds of billions of dollars in government nuclear research over decades: 3,150 isotopes, thousands of decay modes, all read into Mathematica and plotted, in two lines of code.

It also represents a profound image of astonishing richness and complexity that reveals something poetic about the structure of the chemical elements. It is deep, it rewards study. It is a moving testimony to how far we have come as a civilization, to know this much about a subject that didn’t even exist when people alive today were born.

You just don’t do that in two lines of code.

I forgive you for thinking these are just special cases, lucky accidents and hand-picked examples. All I can say is, learn the language, and you will see why they are not flukes.