I’m pleased to announce that as of today, the Wolfram Data Repository is officially launched! It’s been a long road. I actually initiated the project a decade ago—but it’s only now, with all sorts of innovations in the Wolfram Language and its symbolic ways of representing data, as well as with the arrival of the Wolfram Cloud, that all the pieces are finally in place to make a true computable data repository that works the way I think it should.

It’s happened to me a zillion times: I’m reading a paper or something, and I come across an interesting table or plot. And I think to myself: “I’d really like to get the data behind that, to try some things out”. But how can I get the data?

If I’m lucky there’ll be a link somewhere in the paper. But it’s usually a frustrating experience to follow it. Because even if there’s data there (and often there actually isn’t), it’s almost never in a form where one can readily use it. It’s usually quite raw—and often hard to decode, and perhaps even intertwined with text. And even if I can see the data I want, I almost always find myself threading my way through footnotes to figure out what’s going on with it. And in the end I usually just decide it’s too much trouble to actually pull out the data I want.

And I suppose one might think that this is just par for the course in working with data. But in modern times, we have a great counterexample: the Wolfram Language. It’s been one of my goals with the Wolfram Language to build into it as much data as possible—and make all of that data immediately usable and computable. And I have to say that it’s worked out great. Whether you need the mass of Jupiter, or the masses of all known exoplanets, or Alan Turing’s date of birth—or a trillion much more obscure things—you just ask for them in the language, and you’ll get them in a form where you can immediately compute with them.

Here’s the mass of Jupiter (and, yes, one can use “Wolfram|Alpha-style” natural language to ask for it):

Dividing it by the mass of the Earth immediately works:

Here’s a histogram of the masses of known exoplanets, divided by the mass of Jupiter:

And here, for good measure, is Alan Turing’s date of birth, in an immediately computable form:

Of course, it’s taken many years and lots of work to make everything this smooth, and to get to the point where all those thousands of different kinds of data are fully integrated into the Wolfram Language—and Wolfram|Alpha.

But what about other data—say data from some new study or experiment? It’s easy to upload it someplace in some raw form. But the challenge is to make the data actually useful.

And that’s where the new Wolfram Data Repository comes in. Its idea is to leverage everything we’ve done with the Wolfram Language—and Wolfram|Alpha, and the Wolfram Cloud—to make it as easy as possible to make data as broadly usable and computable as possible.

There are many parts to this. But let me state our basic goal. I want it to be the case that if someone is dealing with data they understand well, then they should be able to prepare that data for the Wolfram Data Repository in as little as 30 minutes—and then have that data be something that other people can readily use and compute with.

It’s important to set expectations. Making data fully computable—to the standard of what’s built into the Wolfram Language—is extremely hard. But there’s a lower standard that still makes data extremely useful for many purposes. And what’s important about the Wolfram Data Repository (and the technology around it) is it now makes that standard easy to achieve—with the result that it’s now practical to publish data in a form that can really be used by many people.

Each item published in the Wolfram Data Repository gets its own webpage. Here, for example, is the page for a public dataset about meteorite landings:

At the top is some general information about the dataset. But then there’s a piece of a Wolfram Notebook illustrating how to use the dataset in the Wolfram Language. And by looking at this notebook, one can start to see some of the real power of the Wolfram Data Repository.

One thing to notice is that it’s very easy to get the data. All you do is ask for `ResourceData["Meteorite Landings"]`. And whether you’re using the Wolfram Language on a desktop or in the cloud, this will give you a nice symbolic representation of data about 45716 meteorite landings (and, yes, the data is carefully cached so this is as fast as possible, etc.):

And then the important thing is that you can immediately start to do whatever computation you want on that dataset. As an example, this takes the `"Coordinates"` element from all rows, then takes a random sample of 1000 results, and geo plots them:

Many things have to come together for this to work. First, the data has to be reliably accessible—as it is in the Wolfram Cloud. Second, one has to be able to tell where the coordinates are—which is easy if one can see the dataset in a Wolfram Notebook. And finally, the coordinates have to be in a form in which they can immediately be computed with.

This last point is critical. Just storing the textual form of a coordinate—as one might in something like a spreadsheet—isn’t good enough. One has to have it in a computable form. And needless to say, the Wolfram Language has such a form for geo coordinates: the symbolic construct `GeoPosition[{`*lat*`,`*lon*`}]`.

There are other things one can immediately see from the meteorites dataset too. Like notice there’s a `"Mass"` column. And because we’re using the Wolfram Language, masses don’t have to just be numbers; they can be symbolic `Quantity` objects that correctly include their units. There’s also a `"Year"` column in the data, and again, each year is represented by an actual, computable, symbolic `DateObject` construct.

There are lots of different kinds of possible data, and one needs a sophisticated data ontology to handle them. But that’s exactly what we’ve built for the Wolfram Language, and for Wolfram|Alpha, and it’s now been very thoroughly tested. It involves 10,000 kinds of units, and tens of millions of “core entities”, like cities and chemicals and so on. We call it the Wolfram Data Framework (WDF)—and it’s one of the things that makes the Wolfram Data Repository possible.

Today is the initial launch of the Wolfram Data Repository, and to get ready for this launch we’ve been adding sample content to the repository for several months. Some of what we’ve added are “obvious” famous datasets. Some are datasets that we found for some reason interesting, or curious. And some are datasets that we created ourselves—and in some cases that I created myself, for example, in the course of writing my book *A New Kind of Science*.

There’s plenty already in the Wolfram Data Repository that’ll immediately be useful in a variety of applications. But in a sense what’s there now is just an example of what can be there—and the kinds of things we hope and expect will be contributed by many other people and organizations.

The fact that the Wolfram Data Repository is built on top of our Wolfram Language technology stack immediately gives it great generality—and means that it can handle data of any kind. It’s not just tables of numerical data as one might have in a spreadsheet or simple database. It’s data of any type and structure, in any possible combination or arrangement.

There are time series:

There are training sets for machine learning:

There’s gridded data:

There’s the text of many books:

There’s geospatial data:

Many of the data resources currently in the Wolfram Data Repository are quite tabular in nature. But unlike traditional spreadsheets or tables in databases, they’re not restricted to having just one level of rows and columns—because they’re represented using symbolic Wolfram Language `Dataset` constructs, which can handle arbitrarily ragged structures, of any depth.

But what about data that normally lives in relational or graph databases? Well, there’s a construct called `EntityStore` that was recently added to the Wolfram Language. We’ve actually been using something like it for years inside Wolfram|Alpha. But what `EntityStore` now does is to let you set up arbitrary networks of entities, properties and values, right in the Wolfram Language. It typically takes more curation than setting up something like a `Dataset`—but the result is a very convenient representation of knowledge, on which all the same functions can be used as with built-in Wolfram Language knowledge.

Here’s a data resource that’s an entity store:

This adds the entity stores to the list of entity stores to be used automatically:

Now here are 5 random entities of type `"MoMAArtist"` from the entity store:

For each artist, one can extract a dataset of values:

This queries the entity store to find artists with the most recent birth dates:

The Wolfram Data Repository is built on top of a new, very general thing in the Wolfram Language called the “resource system”. (Yes, expect all sorts of other repository and marketplace-like things to be rolling out shortly.)

The resource system has “resource objects”, that are stored in the cloud (using `CloudObject`), then automatically downloaded and cached on the desktop if necessary (using `LocalObject`). Each `ResourceObject` contains both primary content and metadata. For the Wolfram Data Repository, the primary content is data, which you can access using `ResourceData`.

The Wolfram Data Repository that we’re launching today is a public resource, that lives in the public Wolfram Cloud. But we’re also going to be rolling out private Wolfram Data Repositories, that can be run in Enterprise Private Clouds—and indeed inside our own company we’ve already set up several private data repositories, that contain internal data for our company.

There’s no limit in principle on the size of the data that can be stored in the Wolfram Data Repository. But for now, the “plumbing” is optimized for data that’s at most about a few gigabytes in size—and indeed the existing examples in the Wolfram Data Repository make it clear that an awful lot of useful data never even gets bigger than a few megabytes in size.

The Wolfram Data Repository is primarily intended for the case of definitive data that’s not continually changing. For data that’s constantly flowing in—say from IoT devices—we released last year the Wolfram Data Drop. Both Data Repository and Data Drop are deeply integrated into the Wolfram Language, and through our resource system, there’ll be some variants and combinations coming in the future.

Our goal with the Wolfram Data Repository is to provide a central place for data from all sorts of organizations to live—in such a way that it can readily be found and used.

Each entry in the Wolfram Data Repository has an associated webpage, which describes the data it contains, and gives examples that can immediately be run in the Wolfram Cloud (or downloaded to the desktop).

On the webpage for each repository entry (and in the `ResourceObject` that represents it), there’s also metadata, for indexing and searching—including standard Dublin Core bibliographic data. To make it easier to refer to the Wolfram Data Repository entries, every entry also has a unique DOI.

The way we’re managing the Wolfram Data Repository, every entry also has a unique readable registered name, that’s used both for the URL of its webpage, and for the specification of the `ResourceObject` that represents the entry.

It’s extremely easy to use data from the Wolfram Data Repository inside a Wolfram Notebook, or indeed in any Wolfram Language program. The data is ultimately stored in the Wolfram Cloud. But you can always download it—for example right from the webpage for any repository entry.

The richest and most useful form in which to get the data is the Wolfram Language or the Wolfram Data Framework (WDF)—either in ASCII or in binary. But we’re also setting it up so you can download in other formats, like JSON (and in suitable cases CSV, TXT, PNG, etc.) just by pressing a button.

Of course, even formats like JSON don’t have native ways to represent entities, or quantities with units, or dates, or geo positions—or all those other things that WDF and the Wolfram Data Repository deal with. So if you really want to handle data in its full form, it’s much better to work directly in the Wolfram Language. But then with the Wolfram Language you can always process some slice of the data into some simpler form that does makes sense to export in a lower-level format.

The Wolfram Data Repository as we’re releasing it today is a platform for publishing data to the world. And to get it started, we’ve put in about 500 sample entries. But starting today we’re accepting contributions from anyone. We’re going to review and vet contributions much like we’ve done for the past decade for the Wolfram Demonstrations Project. And we’re going to emphasize contributions and data that we feel are of general interest.

But the technology of the Wolfram Data Repository—and the resource system that underlies it—is quite general, and allows people not just to publish data freely to the world, but also to share data in a more controlled fashion. The way it works is that people prepare their data just like they would for submission to the public Wolfram Data Repository. But then instead of actually submitting it, they just deploy it to their own Wolfram Cloud accounts, giving access to whomever they want.

And in fact, the general workflow is that even when people are submitting to the public Wolfram Data Repository, we’re going to expect them to have first deployed their data to their own Wolfram Cloud accounts. And as soon as they do that, they’ll get webpages and everything—just like in the public Wolfram Data Repository.

OK, so how does one create a repository entry? You can either do it programmatically using Wolfram Language code, or do it more interactively using Wolfram Notebooks. Let’s talk about the notebook way first.

You start by getting a template notebook. You can either do this through the menu item `File > New > Data Resource`, or you can use `CreateNotebook["DataResource"]`. Either way, you’ll get something that looks like this:

Basically it’s then a question of “filling out the form”. A very important section is the one that actually provides the content for the resource:

Yes, it’s Wolfram Language code—and what’s nice is that it’s flexible enough to allow for basically any content you want. You can either just enter the content directly in the notebook, or you can have the notebook refer to a local file, or to a cloud object you have.

An important part of the Construction Notebook (at least if you want to have a nice webpage for your data) is the section that lets you give examples. When the examples are actually put up on the webpage, they’ll reference the data resource you’re creating. But when you’re filling in the Construction Notebook the resource hasn’t been created yet. The symbolic character of the Wolfram Language comes to the rescue, though. Because it lets you reference the content of the data resource symbolically as `$$Data` in the inputs that’ll be displayed, but lets you set `$$Data` to actual data when you’re working in the Construction Notebook to build up the examples.

Alright, so once you’ve filled out the Construction Notebook, what do you do? There are two initial choices: set up the resource locally on your computer, or set it up in the cloud:

And then, if you’re ready, you can actually submit your resource for publication in the public Wolfram Data Repository (yes, you need to get a Publisher ID, so your resource can be associated with your organization rather than just with your personal account):

It’s often convenient to set up resources in notebooks. But like everything else in our technology stack, there’s a programmatic Wolfram Language way to do it too—and sometimes this is what will be best.

Remember that everything that is going to be in the Wolfram Data Repository is ultimately a `ResourceObject`. And a `ResourceObject`—like everything else in the Wolfram Language—is just a symbolic expression, which happens to contain an association that gives the content and metadata of the resource object.

Well, once you’ve created an appropriate `ResourceObject`, you can just deploy it to the cloud using `CloudDeploy`. And when you do this, a private webpage associated with your cloud account will automatically be created. That webpage will in turn correspond to a `CloudObject`. And by setting the permissions of that cloud object, you can determine who will be able to look at the webpage, and who will be able to get the data that’s associated with it.

When you’ve got a `ResourceObject`, you can submit it to the public Wolfram Data Repository just by using `ResourceSubmit`.

By the way, all this stuff works not just for the main Wolfram Data Repository in the public Wolfram Cloud, but also for data repositories in private clouds. The administrator of an Enterprise Private Cloud can decide how they want to vet data resources that are submitted (and how they want to manage things like name collisions)—though often they may choose just to publish any resource that’s submitted.

The procedure we’ve designed for vetting and editing resources for the public Wolfram Data Repository is quite elaborate—though in any given case we expect it to run quickly. It involves doing automated tests on the incoming data and examples—and then ensuring that these continue working as changes are made, for example in subsequent versions of the Wolfram Language. Administrators of private clouds definitely don’t have to use this procedure—but we’ll be making our tools available if they want to.

OK, so let’s say there’s a data resource in the Wolfram Data Repository. How can it actually be used to create a data-backed publication? The most obvious answer is just for the publication to include a link to the webpage for the data resource in the Wolfram Data Repository. And once people go to the page, it immediately shows them how to access the data in the Wolfram Language, use it in the Wolfram Open Cloud, download it, or whatever.

But what about an actual visualization or whatever that appears in the paper? How can people know how to make it? One possibility is that the visualization can just be included among the examples on the webpage for the data resource. But there’s also a more direct way, which uses Source Links in the Wolfram Cloud.

Here’s how it works. You create a Wolfram Notebook that takes data from the Wolfram Data Repository and creates the visualization:

Then you deploy this visualization to the Wolfram Cloud—either using Wolfram Language functions like `CloudDeploy` and `EmbedCode`, or using menu items. But when you do the deployment, you say to include a source link (`SourceLink->Automatic` in the Wolfram Language). And this means that when you get an embeddable graphic, it comes with a source link that takes you back to the notebook that made the graphic:

So if someone is reading along and they get to that graphic, they can just follow its source link to see how it was made, and to see how it accesses data from the Wolfram Data Repository. With the Wolfram Data Repository you can do data-backed publishing; with source links you can also do full notebook-backed publishing.

Now that we’ve talked a bit about how the Wolfram Data Repository works, let’s talk again about why it’s important—and why having data in it is so valuable.

The #1 reason is simple: it makes data immediately useful, and computable.

There’s nice, easy access to the data (just use `ResourceData["..."]`). But the really important—and unique—thing is that data in the Wolfram Data Repository is stored in a uniform, symbolic way, as WDF, leveraging everything we’ve done with data over the course of so many years in the Wolfram Language and Wolfram|Alpha.

Why is it good to have data in WDF? First, because in WDF the meaning of everything is explicit: whether it’s an entity, or quantity, or geo position, or whatever, it’s a symbolic element that’s been carefully designed and documented. (And it’s not just a disembodied collection of numbers or strings.) And there’s another important thing: data in WDF is already in precisely the form it’s needed for one to be able to immediately visualize, analyze or otherwise compute with it using any of the many thousands of functions that are built into the Wolfram Language.

Wolfram Notebooks are also an important part of the picture—because they make it easy to show how to work with the data, and give immediately runnable examples. Also critical is the fact that the Wolfram Language is so succinct and easy to read—because that’s what makes it possible to give standalone examples that people can readily understand, modify and incorporate into their own work.

In many cases using the Wolfram Data Repository will consist of identifying some data resource (say through a link from a document), then using the Wolfram Language in Wolfram Notebooks to explore the data in it. But the Wolfram Data Repository is fully integrated into the Wolfram Language, so it can be used wherever the language is used. Which means the data from the Wolfram Data Repository can be used not just in the cloud or on the desktop, but also in servers and so on. And, for example, it can also be used in APIs or scheduled tasks, using the exact same `ResourceData` functions as ever.

The most common way the Wolfram Data Repository will be used is one resource at a time. But what’s really great about the uniformity and standardization that WDF provides is that it allows different data resources to be used together: those dates or geo positions mean the same thing even in different data resources, so they can immediately be put together in the same analysis, visualization, or whatever.

The Wolfram Data Repository builds on the whole technology stack that we’ve been assembling for the past three decades. In some ways it’s just a sophisticated piece of infrastructure that makes a lot of things easier to do. But I can already tell that its implications go far beyond that—and that it’s going to have a qualitative effect on the extent to which people can successfully share and reuse a wide range of kinds of data.

It’s a big win to have data in the Wolfram Data Repository. But what’s involved in getting it there? There’s almost always a certain amount of data curation required.

Let’s take a look again at the meteorite landings dataset I showed earlier in this post. It started from a collection of data made available in a nicely organized way by NASA. (Quite often one has to scrape webpages or PDFs; this is a case where the data happens to be set up to be downloadable in a variety of convenient formats.)

As is fairly typical, the basic elements of the data here are numbers and strings. So the first thing to do is to figure out how to map these to meaningful symbolic constructs in WDF. For example, the “mass” column is labeled as being “(g)”, i.e. in grams—so each element in it should get converted to `Quantity[`*value*`,"Grams"]`. It’s a little trickier, though, because for some rows—corresponding to some meteorites—the value is just blank, presumably because it isn’t known.

So how should that be represented? Well, because the Wolfram Language is symbolic it’s pretty easy. And in fact there’s a standard symbolic construct `Missing[...]` for indicating missing data, which is handled consistently in analysis and visualization functions.

As we start to look further into the dataset, we see all sorts of other things. There’s a column labeled “year”. OK, we can convert that into `DateObject[{`*value*`}]`—though we need to be careful about any BC dates (how would they appear in the raw data?).

Next there are columns “reclat” and “reclong”, as well as a column called “GeoLocation” that seems to combine these, but with numbers quoted a different precision. A little bit of searching suggests that we should just take reclat and reclong as the latitude and longitude of the meteorite—then convert these into the symbolic form `GeoPosition[{`*lat*`,`*lon*`}]`.

To do this in practice, we’d start by just importing all the data:

OK, let’s extract a sample row:

Already there’s something unexpected: the date isn’t just the year, but instead it’s a precise time. So this needs to be converted:

Now we’ve got to reset this to correspond only to a date at a granularity of a year:

Here is the geo position:

And we can keep going, gradually building up code that can be applied to each row of the imported data. In practice there are often little things that go wrong. There’s something missing in some row. There’s an extra piece of text (a “footnote”) somewhere. There’s something in the data that got misinterpreted as a delimiter when the data was provided for download. Each one of these needs to be handled—preferably with as much automation as possible.

But in the end we have a big list of rows, each of which needs to be assembled into an association, then all combined to make a `Dataset` object that can be checked to see if it’s good to go into the Wolfram Data Repository.

The example above is fairly typical of basic curation that can be done in less than 30 minutes by any decently skilled user of the Wolfram Language. (A person who’s absorbed my book *An Elementary Introduction to the Wolfram Language* should, for example, be able to do it.)

It’s a fairly simple example—where notably the original form of the data was fairly clean. But even in this case it’s worth understanding what hasn’t been done. For example, look at the column labeled `"Classification"` in the final dataset. It’s got a bunch of strings in it. And, yes, we can do something like make a word cloud of these strings:

But to really make these values computable, we’d have to do more work. We’d have to figure out some kind of symbolic representation for meteorite classification, then we’d have to do curation (and undoubtedly ask some meteorite experts) to fit everything nicely into that representation. The advantage of doing this is that we could then ask questions about those values (“what meteorites are above L3?”), and expect to compute answers. But there’s plenty we can already do with this data resource without that.

My experience in general has been that there’s a definite hierarchy of effort and payoff in getting data to be computable at different levels—starting with the data just existing in digital form, and ending with the data being cleanly computable enough that it can be fully integrated in the core Wolfram Language, and used for repeated, systematic computations.

Let’s talk about this hierarchy a bit.

The zeroth thing, of course, is that the data has to exist. And the next thing is that it has to be in digital form. If it started on handwritten index cards, for example, it had better have been entered into a document or spreadsheet or something.

But then the next issue is: how are people supposed to get access to that document or spreadsheet? Well, a good answer is that it should be in some kind of accessible cloud—perhaps referenced with a definite URI. And for a lot of data repositories that exist out there, just making the data accessible like this is the end of the story.

But one has to go a lot further to make the data actually useful. The next step is typically to make sure that the data is arranged in some definite structure. It might be a set of rows and columns, or it might be something more elaborate, and, say, hierarchical. But the point is to have a definite, known structure.

In the Wolfram Language, it’s typically trivial to take data that’s stored in any reasonable format, and use `Import` to get it into the Wolfram Language, arranged in some appropriate way. (As I’ll talk about later, it might be a `Dataset`, it might be an `EntityStore`, it might just be a list of `Image` objects, or it might be all sorts of other things.)

But, OK, now things start getting more difficult. We need to be able to recognize, say, that such-and-such a column has entries representing countries, or pairs of dates, or animal species, or whatever. `SemanticImport` uses machine learning and does a decent job of automatically importing many kinds of data. But there are often things that have to be fixed. How exactly is missing data represented? Are there extra annotations that get in the way of automatic interpretation? This is where one starts needing experts, who really understand the data.

But let’s say one’s got through this stage. Well, then in my experience, the best thing to do is to start visualizing the data. And very often one will immediately see things that are horribly wrong. Some particular quantity was represented in several inconsistent ways in the data. Maybe there was some obvious transcription or other error. And so on. But with luck it’s fairly easy to transform the data to handle the obvious issues—though to actually get it right almost always requires someone who is an expert on the data.

What comes out of this process is typically very useful for many purposes—and it’s the level of curation that we’re expecting for things submitted to the Wolfram Data Repository.

It’ll be possible to do all sorts of analysis and visualization and other things with data in this form.

But if one wants, for example, to actually integrate the data into Wolfram|Alpha, there’s considerably more that has to be done. For a start, everything that can realistically be represented symbolically has to be represented symbolically. It’s not good enough to have random strings giving values of things—because one can’t ask systematic questions about those. And this typically requires inventing systematic ways to represent new kinds of concepts in the world—like the `"Classification"` for meteorites.

Wolfram|Alpha works by taking natural language input. So the next issue is: when there’s something in the data that can be referred to, how do people refer to it in natural language? Often there’ll be a whole collection of names for something, with all sorts of variations. One has to algorithmically capture all of the possibilities.

Next, one has to think about what kinds of questions will be asked about the data. In Wolfram|Alpha, the fact that the questions get asked in natural language forces a certain kind of simplicity on them. But it makes one also need to figure out just what the linguistics of the questions can be (and typically this is much more complicated than the linguistics for entities or other definite things). And then—and this is often a very difficult part—one has to figure out what people want to compute, and how they want to compute it.

At least in the world of Wolfram|Alpha, it turns out to be quite rare for people just to ask for raw pieces of data. They want answers to questions—that have to be computed with models, or methods, or algorithms, from the underlying data. For meteorites, they might want to know not the raw information about when a meteorite fell, but compute the weathering of the meteorite, based on when it fell, what climate it’s in, what it’s made of, and so on. And to have data successfully be integrated into Wolfram|Alpha, those kinds of computations all need to be there.

For full Wolfram|Alpha there’s even more. Not only does one have to be able to give a single answer, one has to be able to generate a whole report, that includes related answers, and presents them in a well-organized way.

It’s ultimately a lot of work. There are very few domains that have been added to Wolfram|Alpha with less than a few skilled person-months of work. And there are plenty of domains that have taken person-years or tens of person-years. And to get the right answers, there always has to be a domain expert involved.

Getting data integrated into Wolfram|Alpha is a significant achievement. But there’s further one can go—and indeed to integrate data into the Wolfram Language one has to go further. In Wolfram|Alpha people are asking one-off questions—and the goal is to do as well as possible on individual questions. But if there’s data in the Wolfram Language, people won’t just ask one-off questions with it: they’ll also do large-scale systematic computations. And this demands a much greater level of consistency and completeness—which in my experience rarely takes less than person-years per domain to achieve.

But OK. So where does this leave the Wolfram Data Repository? Well, the good news is that all that work we’ve put into Wolfram|Alpha and the Wolfram Language can be leveraged for the Wolfram Data Repository. It would take huge amounts of work to achieve what’s needed to actually integrate data into Wolfram|Alpha or the Wolfram Language. But given all the technology we have, it takes very modest amounts of work to make data already very useful. And that’s what the Wolfram Data Repository is about.

With the Wolfram Data Repository (and Wolfram Notebooks) there’s finally a great way to do true data-backed publishing—and to ensure that data can be made available in an immediately useful and computable way.

For at least a decade there’s been lots of interest in sharing data in areas like research and government. And there’ve been all sorts of data repositories created—often with good software engineering—with the result that instead of data just sitting on someone’s local computer, it’s now pretty common for it to be uploaded to a central server or cloud location.

But the problem has been that the data in these repositories is almost always in a quite raw form—and not set up to be generally meaningful and computable. And in the past—except in very specific domains—there’s been no really good way to do this, at least in any generality. But the point of the Wolfram Data Repository is to use all the development we’ve done on the Wolfram Language and WDF to finally be able to provide a framework for having data in an immediately computable form.

The effect is dramatic. One goes from a situation where people are routinely getting frustrated trying to make use of data to one in which data is immediately and readily usable. Often there’s been lots of investment and years of painstaking work put into accumulating some particular set of data. And it’s often sad to see how little the data actually gets used—even though it’s in principle accessible to anyone. But I’m hoping that the Wolfram Data Repository will provide a way to change this—by allowing data not just to be accessible, but also computable, and easy for anyone to immediately and routinely use as part of their work.

There’s great value to having data be computable—but there’s also some cost to making it so. Of course, if one’s just collecting the data now, and particularly if it’s coming from automated sources, like networks of sensors, then one can just set it up to be in nice, computable WDF right from the start (say by using the data semantics layer of the Wolfram Data Drop). But at least for a while there’s going to still be a lot of data that’s in the form of things like spreadsheets and traditional databases—-that don’t even have the technology to support the kinds of structures one would need to directly represent WDF and computable data.

So that means that there’ll inevitably have to be some effort put into curating the data to make it computable. Of course, with everything that’s now in the Wolfram Language, the level of tools available for curation has become extremely high. But to do curation properly, there’s always some level of human effort—and some expert input—that’s required. And a key question in understanding the post-Wolfram-Data-Repository data publishing ecosystem is who is actually going to do this work.

In a first approximation, it could be the original producers of the data—or it could be professional or other “curation specialists”—or some combination. There are advantages and disadvantages to all of these possibilities. But I suspect that at least for things like research data it’ll be most efficient to start with the original producers of the data.

The situation now with data curation is a little similar to the historical situation with document production. Back when I was first doing science (yes, in the 1970s) people handwrote papers, then gave them to professional typists to type. Once typed, papers would be submitted to publishers, who would then get professional copyeditors to copyedit them, and typesetters to typeset them for printing. It was all quite time consuming and expensive. But over the course of the 1980s, authors began to learn to type their own papers on a computer—and then started just uploading them directly to servers, in effect putting them immediately in publishable form.

It’s not a perfect analogy, but in both data curation and document editing there are issues of structure and formatting—and then there are issues that require actual understanding of the content. (Sometimes there are also more global “policy” issues too.) And for producing computable data, as for producing documents, almost always the most efficient thing will be to start with authors “typing their own papers”—or in the case of data, putting their data into WDF themselves.

Of course, to do this requires learning at least a little about computable data, and about how to do curation. And to assist with this we’re working with various groups to develop materials and provide training about such things. Part of what has to be communicated is about mechanics: how to move data, convert formats, and so on. But part of it is also about principles—and about how to make the best judgement calls in setting up data that’s computable.

We’re planning to organize “curateathons” where people who know the Wolfram Language and have experience with WDF data curation can pair up with people who understand particular datasets—and hopefully quickly get all sorts of data that they may have accumulated over decades into computable form—and into the Wolfram Data Repository.

In the end I’m confident that a very wide range of people (not just techies, but also humanities people and so on) will be able to become proficient at data curation with the Wolfram Language. But I expect there’ll always be a certain mixture of “type it yourself” and “have someone type it for you” approaches to data curation. Some people will make their data computable themselves—or will have someone right there in their lab or whatever who does. And some people will instead rely on outside providers to do it.

Who will these providers be? There’ll be individuals or companies set up much like the ones who provide editing and publishing services today. And to support this we’re planning a “Certified Data Curator” program to help define consistent standards for people who will work with the originators of a wide range of different kinds of data putting it into computable form.

But in additional to individuals or specific “curation companies”, there are at least two other kinds of entities that have the potential to be major facilitators of making data computable.

The first is research libraries. The role of libraries at many universities is somewhat in flux these days. But something potentially very important for them to do is to provide a central place for organizing—and making computable—data from the university and beyond. And in many ways this is just a modern analog of traditional library activities like archiving and cataloging.

It might involve the library actually having a private cloud version of the Wolfram Data Repository—and it might involve the library having its own staff to do curation. Or it might just involve the library providing advice. But I’ve found there’s quite a bit of enthusiasm in the library community for this kind of direction (and it’s perhaps an interesting sign that at our company people involved in data curation have often originally been trained in library science).

In addition to libraries, another type of organization that should be involved in making data computable is publishing companies. Some might say that publishing companies have had it a bit easy in the last couple of decades. Back in the day, every paper they published involved all sorts of production work, taking it from manuscript to final typeset version. But for years now, authors have been delivering their papers in digital forms that publishers don’t have to do much work on.

With data, though, there’s again something for publishers to do, and again a place for them to potentially add great value. Authors can pretty much put raw data into public repositories for themselves. But what would make publishers visibly add value is for them to process (or “edit”) the data—putting in the work to make it computable. The investment and processes will be quite similar to what was involved on the text side in the past—it’s just that now instead of learning about phototypesetting systems, publishers should be learning about WDF and the Wolfram Language.

It’s worth saying that as of today all data that we accept into the Wolfram Data Repository is being made freely available. But we’re anticipating in the near future we’ll also incorporate a marketplace in which data can be bought and sold (and even potentially have meaningful DRM, at least if it’s restricted to being used in the Wolfram Language). It’ll also be possible to have a private cloud version of the Wolfram Data Repository—in which whatever organization that runs it can set up whatever rules it wants about contributions, subscriptions and access.

One feature of traditional paper publishing is the sense of permanence it provides: once even just a few hundred printed copies of a paper are on shelves in university libraries around the world, it’s reasonable to assume that the paper is going to be preserved forever. With digital material, preservation is more complicated.

If someone just deploys a data resource to their Wolfram Cloud account, then it can be available to the world—but only so long as the account is maintained. The Wolfram Data Repository, though, is intended to be something much more permanent. Once we’ve accepted a piece of data for the repository, our goal is to ensure that it’ll continue to be available, come what may. It’s an interesting question how best to achieve that, given all sorts of possible future scenarios in the world. But now that the Wolfram Data Repository is finally launched, we’re going to be working with several well-known organizations to make sure that its content is as securely maintained as possible.

The Wolfram Data Repository—and private versions of it—is basically a powerful, enabling technology for making data available in computable form. And sometimes all one wants to do is to make the data available.

But at least in academic publishing, the main point usually isn’t the data. There’s usually a “story to be told”—and the data is just backup for that story. Of course, having that data backing is really important—and potentially quite transformative. Because when one has the data, in computable form, it’s realistic for people to work with it themselves, reproducing or checking the research, and directly building on it themselves.

But, OK, how does the Wolfram Data Repository relate to traditional academic publishing? For our official Wolfram Data Repository we’re going to have definite standards for what we accept—and we’re going to concentrate on data that we think is of general interest or use. We have a whole process for checking the structure of data, and applying software quality assurance methods, as well as expert review, to it.

And, yes, each entry in the Wolfram Data Repository gets a DOI, just like a journal article. But for our official Wolfram Data Repository we’re focused on data—and not the story around it. We don’t see it as our role to check the methods by which the data was obtained, or to decide whether conclusions drawn from it are valid or not.

But given the Wolfram Data Repository, there are lots of new opportunities for data-backed academic journals that do in effect “tell stories”, but now have the infrastructure to back them up with data that can readily be used.

I’m looking forward, for example, to finally making the journal *Complex Systems* that I founded 30 years ago a true data-backed journal. And there are many existing journals where it makes sense to use versions of the Wolfram Data Repository (often in a private cloud) to deliver computable data associated with journal articles.

But what’s also interesting is that now that one can take computable data for granted, there’s a whole new generation of “Journal of Data-Backed ____” journals that become possible—that not only use data from the Wolfram Data Repository, but also actually present their results as Wolfram Notebooks that can immediately be rerun and extended (and can also, for example, contain interactive elements).

I’ve been talking about the Wolfram Data Repository in the context of things like academic journals. But it’s also important in corporate settings. Because it gives a very clean way to have data shared across an organization (or shared with customers, etc.).

Typically in a corporate setting one’s talking about private cloud versions. And of course these can have their own rules about how contributions work, and who can access what. And the data can not only be immediately used in Wolfram Notebooks, but also in automatically generated reports, or instant APIs.

It’s been interesting to see—during the time we’ve been testing the Wolfram Data Repository—just how many applications we’ve found for it within our own company.

There’s information that used to be on webpages, but is now in our private Wolfram Data Repository, and is now immediately usable for computation. There’s information that used to be in databases, and which required serious programming to access, but is now immediately accessible through the Wolfram Language. And there are all sorts of even quite small lists and so on that used to exist only in textual form, but are now computable data in our data repository.

It’s always been my goal to have a truly “computable company”—and putting in place our private Wolfram Data Repository is an important step in achieving this.

In addition to public and corporate uses, there are also great uses of Wolfram Data Repository technology for individuals—and particularly for individual researchers. In my own case, I’ve got huge amounts of data that I’ve collected or generated over the course of my life. I happen to be pretty organized at keeping things—but it’s still usually something of an adventure to remember enough to “bring back to life” data I haven’t dealt with in a decade or more. And in practice I make much less use of older data than I should—even though in many cases it took me immense effort to collect or generate the data in the first place.

But now it’s a different story. Because all I have to do is to upload data once and for all to the Wolfram Data Repository, and then it’s easy for me to get and use the data whenever I want to. Some data (like medical or financial records) I want just for myself, so I use a private cloud version of the Wolfram Data Repository. But other data I’ve been getting uploaded into the public Wolfram Data Repository.

Here’s an example. It comes from a page in my book *A New Kind of Science*:

The page says that by searching about 8 trillion possible systems in the computational universe I found 199 that satisfy some particular criterion. And in the book I show examples of some of these. But where’s the data?

Well, because I’m fairly organized about such things, I can go into my file system, and find the actual Wolfram Notebook from 2001 that generated the picture in the book. And that leads me to a file that contains the raw data—which then takes a very short time to turn into a data resource for the Wolfram Data Repository:

We’ve been systematically mining data from my research going back into the 1980s—even from Mathematica Version 1 notebooks from 1988 (which, yes, still work today). Sometimes the experience is a little less inspiring. Like to find a list of people referenced in the index of *A New Kind of Science*, together with their countries and dates, the best approach seemed to be to scrape the online book website:

And to get a list of the books I used while working on *A New Kind of Science* required going into an ancient FileMaker database. But now all the data—nicely merged with Open Library information deduced from ISBNs—is in a clean WDF form in the Wolfram Data Repository. So I can do such things as immediately make a word cloud of the titles of the books:

Many things have had to come together to make today’s launch of the Wolfram Data Repository possible. In the modern software world it’s easy to build something that takes blobs of data and puts them someplace in the cloud for people to access. But what’s vastly more difficult is to have the data actually be immediately useful—and making that possible is what’s required the whole development of our Wolfram Language and Wolfram Cloud technology stack, which are now the basis for the Wolfram Data Repository.

But now that the Wolfram Data Repository exists—and private versions of it can be set up—there are lots of new opportunities. For the research community, the most obvious is finally being able to do genuine data-backed publication, where one can routinely make underlying data from pieces of research available in a way that people can actually use. There are variants of this in education—making data easy to access and use for educational exercises and projects.

In the corporate world, it’s about making data conveniently available across an organization. And for individuals, it’s about maintaining data in such a way that it can be readily used for computation, and built on.

But in the end, I see the Wolfram Data Repository as a key enabling technology for defining how one can work with data in the future—and I’m excited that after all this time it’s finally now launched and available to everyone.

]]>It’s National Pet Day on April 11, the day we celebrate furry, feathered or otherwise nonhuman companions. To commemorate the date, we thought we’d use some new features in the Wolfram Language to map a dog walk using pictures taken with a smartphone along the way. After that, we’ll use some neural net functions to identify the content in the photos. One of the great things about Wolfram Language 11.1 is pre-trained neural nets, including Inception V3 trained on ImageNet Competition data and Inception V1 trained on Places365 data, among others, making it super easy for a novice programmer to implement them. These two pre-trained neural nets make it easy to: 1) identify objects in images; and 2) tell a user what sort of landscape an image represents.

The Wolfram Documentation Center also makes this especially easy.

First, we need to talk a little bit about metadata stored in digital photographs. When you snap a photo on your smartphone or digital camera, all sorts of data is saved with the image, including the location where the picture was taken. The exchangeable image file format (EXIF) is a standard developed in 1985 that organizes the types of metadata stored. For our purposes, we’re interested in geolocation so we can make a map of our dog walk.

To demonstrate how image metadata works, let’s start with a picture of my cats, Chairman Meow and Detective Biscuits, and see how the Wolfram Language can extract where the picture was taken using `GeoPosition`.

Fantastic—we have some coordinates. Now let’s see where on Earth this is on a map using `GeoGraphics`. We’ve defined those coordinates as `"catlocation"` above. Now, using coordinates from the image of my cats dutifully keeping the bed warm, we define our map as `"catmap"`.

Excellent. Now let’s use the zoom tool to show where these coordinates are.

So, yes, this picture was taken in my old neighborhood in Baton Rouge, Louisiana, where I was a grad student before starting work here at Wolfram Research (Geaux Tigers!). Very cool, and good to know that data is stored in my iPhone pictures.

Just for fun, let’s see if Wolfram’s built-in knowledge has any data on my old neighborhood, known as the Garden District, using Ctrl + = followed by input, which allows us to use the Wolfram Language’s free-form input capability.

Fantastic. Let’s get a quick map of Baton Rouge.

And how about a map showing the rough outline of the Garden District?

This provides us with a rough outline of the Garden District in Baton Rouge. There is a ton of built-in socioeconomic data in the Wolfram Language we could look at, but we’ll save that for another blog post.

Since I’m not a dog owner, I asked a coworker if I could join her on a dog walk to snap some pictures to first map the walk using nothing but photos from a smartphone, then use a neural net to identify the content of the photos.

So I can just drag and drop the photos into a notebook and define them, and then their locations, from their metadata.

OK, that’s pretty good, but let’s add some points, change up some colors and add tooltips to show the images at each stop.

In the Wolfram Notebook (or if you download the CDF), when you hover the mouse over each point, it shows the image that was taken at that location. Very cool.

Next, let’s move on to a new feature in Wolfram Language 11.1, pre-trained neural nets. First, we need to define our net, and we’re going to use Inception V3 trained on ImageNet Competition data, implemented with one impressively simple line of code. Inception V3 is a dataset used for image recognition using a convolutional neural network. Sounds complicated, but we can implement it easily and figure out what objects are in the images.

Now all we have to do is put in an image, and we’ll get an accurate result of what our image is.

Fantastic. Let’s try another picture and see how sure the neural net is of its determination by using `"TopProbabilities"` as an option.

So its best guess is a goose at a 0.901 probability. Pretty good. Let’s try another image.

The net is less sure in this case what kind of dog this is—which is exactly the right answer: this dog, Maxie, is a mixed border collie/corgi.

Just for fun, let’s see what it thinks my cats are.

Wow. That’s pretty impressive, since they are indeed tabby cats. And I guess it’s reasonable there’s a 0.0446 probability my cats look like platypuses.

Along our dog walk, we took a picture of a pond. Let’s use a different pre-trained neural net to see if it can tell us what kind of landscape it is. For this, we’ll use Inception V1 trained on Places365 data, again implemented with one line of amazingly simple code. This particular neural net identifies landscapes based on a training set of images taken at various locations.

Very neat. Let’s try something else.

OK, sure, a pasture rather than a park. But you can see that it had other things in mind. This kind of reminds me of Stephen Wolfram’s blog post on overcoming artificial stupidity. Since it was written, neural nets (and Wolfram|Alpha) have certainly come a long way.

And let’s see if we can confuse it with a picture of a sculpture.

Not bad.

As you can see, the pre-trained neural nets work really well. If you want to train your own net, Wolfram Language 11.1 makes it painless to do so. So next time you’re out for a walk taking pictures of random objects and want to recreate your walk from images, you can use these new features in the Wolfram Cloud or Wolfram Desktop.

Happy coding, and happy National Pet Day!

Vibration measurement is an important tool for fault detection in rotating machinery. In a previous post, “How to Use Your Smartphone for Vibration Analysis, Part 1: The Wolfram Language,” I described how you can perform a vibration analysis with a smartphone and Mathematica. Here, I will show how this technique can be improved upon using the Wolfram Cloud. One advantage with this is that I don’t need to bring my laptop.

The configuration of files may vary depending on whether you use an iPhone or an Android. I used an iPhone. I also used Dropbox for storing the sound file. At the moment, the file format from the default app in iPhone, Voice Memo, saves the sound file only in M4A format, and that format can’t yet be imported with the Wolfram Language. Therefore, I used the app Awesome Voice Recorder, AVR, which can also store the file in Dropbox.

- First, create a Wolfram Cloud Account.
- Install the Wolfram Cloud app on your smartphone.
- If you don’t already have a Dropbox account, create one.
- Download an app that can save the sound file in MP3 format to your Dropbox. I use AVR.

You are now ready to create a vibration analysis tool.

With my iPhone, I recorded 10 seconds of the sound of my finger rubbing around the rim of a wine glass.

I named the file “wineglass.mp3” and stored it on Dropbox. Some simple code for the Fourier transform of that sound looks like the following:

It is now easy to deploy the same code to the Wolfram Cloud. Name the file “FFT Basic”. With modifications to obtain the sound data from a form instead of the file system, the FFT code looks like this:

Now go to the top-left menu in the mobile cloud app. Select Deployments and then Instant Web Forms, and you’ll see the executable file. Click it and then click to select the file, and the fast Fourier transform (FFT) results will be shown.

The tool can be generalized by allowing the user to change the scale and draw a line to show known frequencies. Typical frequencies that you may want to follow in vibration analysis are gear mesh frequencies, imbalances, blade-passing frequencies and known resonances. For the wine glass, we don’t currently have a known disturbance or resonance frequency, but I included a line anyway:

Deploy the code to the Wolfram Cloud, naming the file “FFT Analysis”:

With the improved code, I can also work with the FFT plot in the app:

This was my first experience with the Wolfram Cloud, and within an hour I had an application ready to go. The resonant frequency of the glass is 771 Hz, a frequency the human voice is capable of, so the opera trick of shattering a glass by singing is plausible. We tried using a loudspeaker playing a tone at 771 Hz, but had no success. There will be YouTube videos when we have managed this.

]]>Industry 4.0, the fourth industrial revolution of cyber-physical systems, is on the way! With it come sensors and boards that are much cheaper than they used to be. All of these components are connected through some kind of network or cloud so that they are able to talk to each other. This is where the OPC Unified Architecture (OPC UA) comes in. OPC UA is a machine-to-machine communication protocol for industrial automation. It is designed to be the successor to the older OPC Classic protocol that is bound to the Microsoft-only process exchange COM/DCOM (if you are interested in the OPCClassic library for Wolfram SystemModeler, you can find it here).

Having a standard for communication in the upcoming Industry 4.0 is important for every new connected device. With a carefully considered protocol, all of the following aspects can be included in a smart way:

- Platform independence
- Security
- Extensibility
- Information modeling

More information can be found on the OPC Foundation website. As you can see, a lot of focus is on the secure communication between all devices in the OPC UA network. Some of the companies that are extensively using OPC UA include Siemens, SAP, Honeywell and Yokogawa.

While the OPC Classic (also known as OPC Data Access) has had widespread adoption within many industries, including everything from paper and pulp to automotive manufacturing, it relies on traditional elements such as programmable logic controllers (PLCs) and SCADA (supervisory control and data acquisition) control systems.

With the advent of the Internet of Things and Industry 4.0, this traditional hierarchy is quickly being replaced with more flexible solutions. Computing power is becoming cheaper, and making smart sensors with built-in logic is no longer prohibitively pricy.

Since OPC UA is cross-platform compatible, it can be run on almost any device. This can, for example, be great for prototyping where you want to use the existing IT infrastructure but use inexpensive equipment for the trials. OPC UA can be configured on devices such as a Raspberry Pi, and even on your smartphone.

As in all networks and systems, some complex behavior might need to be calculated or modeled, and this is where Wolfram SystemModeler comes in. To easily be able to set up a control system that can connect to the machines using the OPC UA protocol will both provide easy integration and work as an efficient and cost-effective testing platform.

Here’s an example of creating an easy-to-understand test model in SystemModeler using the OPCUA library. As you can see in the model diagram below, a tank is set up (top left) with some input controlling the inflow to the tank. The tank model then communicates its value (current liquid level) through OPC UA to an OPC UA server. This value is then read from the server and fed as an input to a control loop that changes the liquid level of the second tank (bottom right).

We have created the entire system in this model, i.e. in this test model we are not connecting to any hardware and can communicate through the OPC UA protocol and test the control system and its response on the system.

Now we are ready to connect with hardware and run this model in real time, communicating with some real-world tanks. We could just modify the model slightly by replacing the tank models with components that connect to the real tanks through an OPC server, and use the measured values of the tanks as inputs to the SystemModeler model. That model would look like this:

Connecting to your OPC system with SystemModeler is actually this easy! In the system without the tanks, we get the measurements from the OPC server by reading the nodes “tank1” and “tank2” that are connected to sensors measuring the values of the real-world tanks. In the same way, we set the desired flow to tank 2 by writing a signal to the OPC server on node “tank2”. The node that we write this value to is then connected to a valve that adjusts accordingly.

The result of having an OPCUA Modelica Library is that we have all the power of Modelica, SystemModeler and Mathematica in our communications network. That means that all analysis tools, control systems and computational power can be integrated directly in the OPC UA industrial system network.

Imagine a scenario where you would like to connect the SystemModeler simulation to external hardware; for example, if you would like to send a control signal from an OPC UA server to a simple Arduino or Raspberry Pi, then combining OPC UA with the ModelPlug library would make a lot of sense. The ModelPlug library allows you to connect via the Firmata standard to devices such as Arduino boards.

Let’s use this in an example. In our server room, we have a Raspberry Pi that monitors and logs the temperature in the room. An OPC UA server was installed on the Raspberry Pi that allows any other OPC-configured client on the network to poll the server for the current temperature data. The OPCUA library is one such client. Using only two blocks, we can get the real-time temperature from the sensor into our simulation models.

With two additional blocks from the ModelPlug library, we can feed that data into an Arduino board, where it can be used to move, for example, an actuator. For now, let us construct a very simple prototype whereby the onboard light on the Arduino blinks at different intervals depending on the room temperature.

With just a few blocks (and no lines of code), we can thus create a simple prototype of a logic control system that is ready to run without any of the pain that usually accompanies network programming. The light will always blink for 0.1 seconds, but the time when it blinks again will change depending on the temperature. If we press the play button, our Arduino will start blinking right away:

The higher the temperature, the shorter the blink interval, and vice versa.

This can also be great for testing out code and programs that you want to run in a production environment without risking damaging sensitive equipment. In Modelica, code and compiled executables can be imported, connected and used as a block in your model.

Wolfram SystemModeler can be utilized very effectively when combining different Modelica libraries, such as ModelPlug and OPCUA, to either create virtual prototypes of systems or test them in the real world using cheap devices like Arduinos or Raspberry Pis. The tested code for the system can then easily be exported to another system, or used directly in a HIL (hardware-in-the-loop) simulation.

]]>This year’s Wolfram Summer School will be held at Bentley University in Waltham, Massachusetts, from June 18 to July 7, 2017.

Maybe you’re a researcher who wants to study the dynamics of galaxies with cellular automata. Perhaps you’re an innovator who wants to create a way to read time from pictures of analog clocks or build a new startup with products that use RFID (radio-frequency identification) to track objects. You might be an educator who wants to build an algebra feedback system or write a textbook that teaches designers how to disinvent the need for air conditioning. These projects show the diversity and creativity of some of our recent Summer School students. Does this sound like you? If so, we want you to join us this summer!

We offer three tracks: Science, Technology & Innovation and Educational Innovation. As a student, you will design and execute a challenging, practical project and acquire an incredible amount of knowledge along the way. At the Summer School, you will gain skills that are useful universally and network with a community of smart, creative people from all over the world. Many of our students go on to work for us in some capacity, whether that be as an employee, intern, ambassador, etc.

If you’re accepted to the Summer School, you will be mentored by top-notch staff who have made powerful and important contributions to the Wolfram Language, Mathematica, Wolfram|Alpha and more—including the inventor of the Wolfram Language, Stephen Wolfram.

The Wolfram Language has a vast depth of built-in algorithms and knowledge. Recent feature additions include a dynamic and ever-growing set of machine learning capabilities (including neural networks), as well as advancements in the areas of image processing, visualization, 3D printing, geographic information systems, natural language processing and computational linguistics and database and web frameworks, among many others.

Whatever your background, profession or future plans, the Wolfram Summer School can equip you with the computational tools you need to bring any idea to reality.

For more information, visit our website.

]]>I’m pleased to announce the release today of Version 11.1 of the Wolfram Language (and Mathematica). As of now, Version 11.1 is what’s running in the Wolfram Cloud—and desktop versions are available for immediate download for Mac, Windows and Linux.

What’s new in Version 11.1? Well, actually a remarkable amount. Here’s a summary:

There’s a lot here. One might think that a .1 release, nearly 29 years after Version 1.0, wouldn’t have much new any more. But that’s not how things work with the Wolfram Language, or with our company. Instead, as we’ve built our technology stack and our procedures, rather than progressively slowing down, we’ve been continually accelerating. And now even something as notionally small as the Version 11.1 release packs an amazing amount of R&D, and new functionality.

There’s one very obvious change in 11.1: the documentation looks different. We’ve spiffed up the design, and on the web we’ve made everything responsive to window width—so it looks good even when it’s in a narrow sidebar in the cloud, or on a phone.

We’ve also introduced some new design elements—like the mini-view of the Details section. Most people like to see examples as soon as they get to a function page. But it’s important not to forget the Details—and the mini-view provides what amounts to a little “ad” for them.

Here’s a word cloud of new functions in Version 11.1:

Altogether there are an impressive 132 new functions—together with another 98 that have been significantly enhanced. These functions represent the finished output of our R&D pipeline in just the few months that have elapsed since Version 11.0 was released.

When we bring out a major “integer” release—like Version 11—we’re typically introducing a bunch of complete, new frameworks. In (supposedly) minor .1 releases like Version 11.1, we’re not aiming for complete new frameworks. Instead, there’s typically new functionality that’s adding to existing frameworks—together with a few (sometimes “experimental”) hints of major new frameworks to come. Oh, and if a complete, new framework does happen to be finished in time for a .1 release, it’ll be there too.

One very hot area in which Version 11.1 makes some big steps forward is neural nets. It’s been exciting over the past few years to see this area advance so quickly in the world at large, and it’s been great to see the Wolfram Language at the very leading edge of what’s being done.

Our goal is to define a very high-level interface to neural nets, that’s completely integrated into the Wolfram Language. Version 11.1 adds some new recently developed building blocks—in particular 30 new types of neural net layers (more than double what was there in 11.0), together with automated support for recurrent nets. The concept is always to let the neural net be specified symbolically in the Wolfram Language, then let the language automatically fill in the details, interface with low-level libraries, etc. It’s something that’s very convenient for ordinary feed-forward networks (tensor sizes are all knitted together automatically, etc.)—but for recurrent nets (with variable-length sequences, etc.) it’s something that’s basically essential if one’s going to avoid lots of low-level programming.

Another crucial feature of neural nets in the Wolfram Language is that it’s set up to be automatic to encode images, text or whatever in an appropriate way. In Version 11.1, `NetEncoder` and `NetDecoder` cover a lot of new cases—extending what’s integrated into the Wolfram Language.

It’s worth saying that underneath the whole integrated symbolic interface, the Wolfram Language is using a very efficient low-level library—currently MXNet—which takes care of optimizing ultimate performance for the latest CPU and GPU configurations. By the way, another feature enhanced in 11.1 is the ability to store complete neural net specifications, complete with encoders, etc. in a portable and reusable .wlnet file.

There’s a lot of power in treating neural nets as symbolic objects. In 11.1 there are now functions like `NetMapOperator` and `NetFoldOperator` that symbolically build up new neural nets. And because the neural nets are symbolic, it’s easy to manipulate them, for example breaking them apart to monitor what they’re doing inside, or systematically comparing the performance of different structures of net.

In some sense, neural net layers are like the machine code of a neural net programming system. In 11.1 there’s a convenient function—`NetModel`—that provides pre-built trained or untrained neural net models. As of today, there are a modest number of famous neural nets included, but we plan to add more every week—surfing the leading edge of what’s being developed in the neural net research community, as well as adding some ideas of our own.

Here’s a simple example of `NetModel` at work:

Now apply the network to some actual data—and see it gets the right answer:

But because the net is specified symbolically, it’s easy to “go inside” and “see what it’s thinking”. Here’s a tiny (but neat) piece of functional programming that visualizes what happens at every layer in the net—and, yes, in the end the first square lights up red to show that the output is 0:

Neural nets are an important method for machine learning. But one of the core principles of the Wolfram Language is to provide highly automated functionality, independent of underlying methods. And in 11.1 there’s a bunch more of this in the area of machine learning. (As it happens, much of it uses the latest deep learning neural net methods, but for users what’s important is what it does, not how it does it.)

My personal favorite new machine learning function in 11.1 is `FeatureSpacePlot`. Give it any collection of objects, and it’ll try to lay them out in an appropriate “feature space”. Like here are the flags of countries in Europe:

What’s particularly neat about `FeatureSpacePlot` is that it’ll immediately use sophisticated pre-trained feature extractors for specific classes of input—like photographs, texts, etc. And there’s also now a `FeatureNearest` function that’s the analog of `Nearest`, but operates in feature space. Oh, and all the stuff with `NetModel` and pre-trained net models immediately flows into these functions, so it becomes trivial, say, to experiment with “meaning spaces”:

Particularly with `NetModel`, there are all sorts of very useful few-line neural net programs that one can construct. But in 11.1 there are also some major new, more infrastructural, machine learning capabilities. Notable examples are `ActiveClassification` and `ActivePrediction`—which build classifiers and predictors by actively sampling a space, learning how to do this as efficiently as possible. There will be lots of end-user applications for `ActiveClassification` and `ActivePrediction`, but for us internally the most immediately interesting thing is that we can use these functions to optimize all sorts of meta-algorithms that are built into the Wolfram Language.

Version 11.0 began the process of making audio—like images—something completely integrated into the Wolfram Language. Version 11.1 continues that process. For example, for desktop systems, it adds `AudioCapture` to immediately capture audio from a microphone on your computer. (Yes, it’s nontrivial to automatically handle out-of-core storage and processing of large audio samples, etc.) Here’s an example of me saying “hello”:

Play Audio

You can immediately take this, and, say, make a cepstrogram (yes, that’s another new audio function in 11.1):

Version 11.1 has quite an assortment of new features for images and visualization. `CurrentImage` got faster and better. `ImageEffect` has lots of new effects added. There are new functions and options to support the latest in computational photography and computational microscopy. And images got even more integrated as first-class objects—that one can for example now immediately do arithmetic with:

Something else with images—that I’ve long wanted—is the ability to take a bitmap image, and find an approximate vector graphics representation of it:

`TextRecognize` has also become significantly stronger—in particular being able to pick out structure in text, like paragraphs and columns and the like.

Oh, and in visualization, there are things like `GeoBubbleChart`, here showing the populations of the largest cities in the US:

There’s lots of little (but nice) stuff too. Like support for arbitrary callouts in pie charts, optimized labeling of discrete histograms and full support of scaling functions for `Plot3D`, etc.

There’s always new data flowing into the Wolfram Knowledgebase, and there’ve also been plenty of completely new things added since 11.0: 130,000+ new types of foods, 250,000+ atomic spectral lines, 12,000+ new mountains, 10,000+ new notable buildings, 300+ types of neurons, 650+ new waterfalls, 200+ new exoplanets (because they’ve recently been discovered), and lots else (not to mention 7,000+ new spelling words). There’s also, for example, much higher resolution geo elevation data—so now a 3D-printable Mount Everest can have much more detail:

Something new in Version 11.1 are integrated external services—that allow built-in functions that work by calling external APIs. Two examples are `WebSearch` and `WebImageSearch`. Here are thumbnail images found by searching the web for “colorful birds”:

For the heck of it, let’s see what `ImageIdentify` thinks they are (oh, and in 11.1. `ImageIdentify` is much more accurate, and you can even play with the network inside it by using `NetModel`):

Since `WebSearch` and `WebImageSearch` use external APIs, users need to pay for them separately. But we’ve set up what we call Service Credits to make this seamless. (Everything’s in the language, of course, so there’s for example `$ServiceCreditsAvailable`.)

There will be quite a few more examples of integrated services in future versions, but in 11.1, beyond web searching, there’s also `TextTranslation`. `WordTranslation` (new in 11.0) handles individual word translation for hundreds of languages; now in 11.1 `TextTranslation` uses external services to also translate complete pieces of text between several tens of languages:

A significant part of our R&D organization is devoted to continuing our three-decade effort to push the frontiers of mathematical and algorithmic computation. So it should come as no surprise that Version 11.1 has all sorts of advances in these areas. There’s space-filling curves, fractal meshes, ways to equidistribute points on a sphere:

There are new kinds of spatial, robust and multivariate statistics. There are Hankel transforms, built-in modular inverses, and more. Even in differentiation, there’s something new: *n*^{th} order derivatives, for symbolic *n*:

Here’s something else about differentiation: there are now functions `RealAbs` and `RealSign` that are versions of `Abs` and `Sign` that are defined only by the real axis, and so can freely be differentiated, without having to give any assumptions about variables.

In Version 10.1, we introduced the function `AnglePath`, that computes a path from successive segments with specified lengths and angles. At some level, `AnglePath` is like an industrial-scale version of Logo (or Scratch) “turtle geometry”. But `AnglePath` has turned out to be surprisingly broadly useful, so for Version 11.1, we’ve generalized it to `AnglePath3D` (and, yes, there are all sorts of subtleties about frames and Euler angles and so on).

When we say “June 23, 1988”, what do we mean? The beginning of that day? The whole 24-hour period from midnight to midnight? Or what? In Version 11.1 we’ve introduced the notion of granularity for dates—so you can say whether a date is supposed to represent a day, a year, a second, a week starting Sunday—or for that matter just an instant in time.

It’s a nice application of the symbolic character of the Wolfram Language—and it solves all sorts of problems in dealing with dates and times. In a way, it’s a little like precision for numbers, but it’s really its own thing. Here for example is how we now represent “the current week”:

Here’s the current decade:

This is the next month from now:

This says we want to start from next month, then add 7 weeks—getting another month:

And here’s the result to the granularity of a month:

Talking of dates, by the way, one of the things that’s coming across the system is the use of `Dated` as a qualifier, for example for properties of entities of the knowledgebase (so this asks for the population of New York City in 1970):

I’m very proud of how smooth the Wolfram Language is to use—and part of how that’s been achieved is that for 30 years we’ve been continually polishing it. We’re always making sure everything fits perfectly together—and we’re always adding little conveniences.

One of our principles is that if there’s a lump of computational work that people repeatedly do, then so long as there’s a good name for it (that people will readily remember, and readily recognize when they see it in a piece of code), it should be inserted as a built-in function. A very simple example in Version 11.1 is `ReverseSort`:

(One might think: what’s the point of this—it’s just `Reverse[Sort[...]]`. But it’s very common to want to map what’s now `ReverseSort` over a bunch of objects, and it’s smoother to be able to say `ReverseSort /@ ...` rather than `Reverse[Sort[#]]& /@ ...` or `Reverse@*Sort /@ ...`).

Another little convenience: `Nearest` now has special ways to specify useful things to return. For example, this gives the distances from 2.7 to the 5 nearest values:

`CellularAutomaton` is a very broad function. Version 11.1 makes it easier to use for common cases by allowing rules to be specified by associations with labeled elements:

We’re always trying to make sure that patterns we’ve established get used as broadly as possible. Like in 11.1, you can use `UpTo` in lots of new places, like in `ImageSize` specifications.

We also always trying to make sure that things are as general as possible. Like `IntegerString` now works not only with the standard representation of integers, but also with traditional ones used for different purposes around the world:

And `IntegerName` can also now handle different types and languages of names:

And there are lots more examples—each making the experience of using the Wolfram Language just a little bit smoother.

If you make a definition list ` x=7`, or

`PersistentValue` lets you specify a name (like `"foo"`), and a `"persistence location"`. (It also allows options like `PersistenceTime` and `ExpirationDate`.) The persistence location can just be `"KernelSession"`—which means that the value lasts only for a single kernel session. But it can also be `"FrontEndSession"`, or `"Local"` (meaning that it should be the same whenever you use the same computer), or `"Cloud"` (meaning that it’s globally synchronized across the cloud).

`PersistentValue` is pretty general. It lets you have values in different places (like different private clouds, for example); then there’s a `$PersistencePath` that defines the order to look at them in, and a `MergingFunction` that specifies how (if at all) the values should be merged.

One of the goals of the Wolfram Language is to be able to interact as broadly as possible with all computational ecosystems. Version 11.1 adds support for the M4A audio format, the .ubj binary JSON format, as well as .ini files and Java .properties files. There’s also a new function, `BinarySerialize`, that converts any Wolfram Language expression into a new binary (“WXF”) form, optimized for speed or size:

`BinaryDeserialize` gets it back:

Version 11.0 introduced `WolframScript`—a command-line interface to the Wolfram Language, running either locally or in the cloud. With `WolframScript` you can create standalone Wolfram Language programs that run from the shell. There are several enhancements to `WolframScript` itself in 11.1, but there’s also now a new `New` > `Script` menu item that gives you a notebook interface for creating .wls (=“Wolfram Language Script”) files to be run by `WolframScript`:

One of the major ways the Wolfram Language has advanced in recent times has been in its deployability. We’ve put a huge amount of work into making sure that the Wolfram Language can be robustly deployed at scale (and there are now lots of examples of successes out in the world).

We make updates to the Wolfram Cloud very frequently (and invisibly), steadily enhancing server performance and user interface capabilities. Along with Version 11.1 we’ve made some major updates. There are a few signs of this in the language.

Like there’s now an option `AutoCopy` that can be set for any cloud object—and that means that every time the object is accessed, one should get a fresh copy of it. This is very useful if, for example, you want to have a notebook that lots of people can separately modify. (“Explore these ideas; here’s a notebook to start from…”, etc.)

`CloudDeploy[APIFunction[...]]` makes it extremely easy to deploy web APIs. In Version 11.1 there are some options to automate aspects of how those APIs behave. For example, there’s `AllowedCloudExtraParameters`, which lets you say that APIs can have parameters like `"_timeout"` or `"_geolocation"` automated. There’s also `AllowedCloudParameterExtensions` (no, it’s not the longest name in the system; that honor currently goes to `MultivariateHypergeometricDistribution`). What `AllowedCloudParameterExtensions` does is to let you say not just ` x=value`, but

Another thing about Version 11.1 is that it’s got various features added to support private instances of the Wolfram Cloud—and our major new Wolfram Enterprise Private Cloud product (with a first version released late last year). For example, in addition to `$WolframID` for the Wolfram Cloud, there’s also `$CloudUserID` that’s generalized to allow authentication on private clouds. And inside the system, there are all sorts of new capabilities associated with “multicloud authentication” and so on. (Yes, it’s a complicated area—but the symbolic character of the Wolfram Language lets one handle it rather beautifully.)

OK, so I’ve summarized some of what’s in 11.1. There’s a lot more I could say. New functions, and new capabilities—each of which is going to be exciting to somebody. But to me it’s actually pretty amazing that I can write this long a post about a .1 release! It’s a great testament to the strength of the R&D pipeline—and to how much can be done with the framework we’ve built in the Wolfram Language over the past 30 years.

We always work on a portfolio of projects—from small ones that get delivered very quickly, to ones that may take a decade or more to mature. Version 11.1 has the results of several multi-year projects (e.g. in machine learning, computational geometry, etc.), and a great many shorter projects. It’s exciting for us to be able to deliver the fruits of our efforts, and I look forward to hearing what people do with Version 11.1—and to seeing successive versions continue to be developed and delivered.

]]>In Mathematica 10, we introduced support for anatomical structures in `EntityValue`, which included, among many other things, a “Graphics3D” property that returns a 3D model of the anatomical structure in question. We also styled the models and aligned them with the concepts in the Unified Medical Language System (UMLS).

The output is a standard `Graphics3D` expression, but it contains metadata in the form of an `Annotation` that allows for additional exploration.

This means each model knows what lower-level structures it’s made of.

I should note that the models being used are not just eye candy. If that were the intent, we might explore low polygon count models and use textures for a more realistic appearance. But these models are not just good for looking at—you can also use them for computation. For meaningful results, you need accurate models, which may be large, so you may need to be patient when downloading and/or rendering them. Keep in mind that some entities, like the brain, have lots of internal structures. So the model may be larger than you expect, although you may not see this internal structure from the outside.

One example of using these models for computation would be calculating the eigenfrequencies of an air-filled transverse colon (let the jokes fly). Finite element mesh (FEM) calculations are common in medical research today. By retrieving the mesh from `AnatomyData`, we can perform computations on the model.

Now we can obtain the resonant frequencies of the transverse colon.

You can use `Sound` to listen to the resonant frequencies of the transverse colon.

Play Audio

We can find the surface area of the model by directly operating on the `MeshRegion`. Units are in square millimeters.

To compute the volume, we need to convert the `MeshRegion` into a `BoundaryMeshRegion` first. Units are in cubic millimeters.

You can even compute the distance between the transverse colon and the region centroid of the large intestine. Units are in millimeters.

Lower-resolution models will give lower-quality results.

To make it easy to render anatomical structures, we introduced `AnatomyPlot3D`.

`AnatomyPlot3D` allows directives to modify its “primitives,” similar in syntax to `Graphics3D`.

The output of `AnatomyPlot3D` doesn’t contain the original `Entity` objects. They get resolved at evaluation time to 3D graphics primitives, and the normal rules of the graphics language apply to them. The output of `AnatomyPlot3D` is a `Graphics3D` object.

Because `AnatomyPlot3D` can be thought of as an extension to the `Graphics3D` language, you can mix anatomical structures with normal `Graphics3D` primitives.

In `AnatomyPlot3D`, the `Graphics3D` language has been extended at multiple levels to make use of anatomical structures. Within `AnatomyPlot3D`, anatomical entities work just like graphics primitives do in `Graphics3D`. But they can also be used in place of coordinates in `Graphics3D` primitives like `Point` and `Line`. In that context, the entity represents the region centroid of the structure. This allows you, for example, to draw a line from one entity to another.

This concept can be applied to annotate a 3D model using arrows and labels.

You can refer to the subparts of structures and apply styles to them using `AnatomyForm`. It applies only to anatomical structures (not `Graphics3D` primitives) and supports a couple of different forms. The following example behaves similar to a standalone directive, except that it applies only to the anatomical structure, not the `Cuboid`.

A more useful form can be used to style subparts.

`AnatomyForm` works by allowing you to associate specified directives with specified entities that may or may not exist in the structure you are visualizing. Any `Directive` supported by Graphics3D is supported, including `Lighting`, `Opacity`, `EdgeForm`, `FaceForm`, `ClipPlanes` and any combination thereof. In addition to supporting styles for specific entities, `AnatomyForm` also supports a default case via the use of an underscore. The following example shows the left humerus in red and everything else transparent and backlit, giving an X-ray-like appearance.

`PlotRange` can make use of entities to constrain what would otherwise include all of the referenced entities. The following example includes several bones of the left lower limb, but the `PlotRange` is centered on the left patella and padded out from there by a fixed amount.

`SkinStyle` is a convenient way to include any enclosing skin that can be found around the specified entities.

The default styling can be overridden.

You can use `ClipPlanes` to peel away layers of skin.

Use multiple clip planes to peel away successive anatomical layers.

Apply geometric transformations to anatomical structures to rotate them. The following example includes many bones in the skull, but applies a rotation to just the elements of the lower jaw around the temporomandibular joint.

A mix of styles can be useful for highlighting different tissue types in the head.

A similar approach can be used in the torso for different organs.

Here is an advanced example showing the use of `ClipPlanes` to remove muscles below a specific cutting plane.

Inner structures can be differentiated using styles, in this case within the brain.

Here are links to some animations rendered using `AnatomyPlot3D`:

Scaling Transform Applied to the Bones of the Skull

As time goes on, we will continue to add additional models and tools allowing you to explore human anatomy more deeply.

To download this post as a CDF, click here. New to CDF? Get your copy for free with this one-time download.

]]>Until now, it has been difficult for the average engineer to perform simple vibration analysis. The initial cost for simple equipment, including software, may be several thousand dollars—and it is not unusual for advanced equipment and software to cost ten times as much. Normally, a vibration specialist starts an investigation with a hammer impact test. An accelerometer is mounted on a structure, and a special impact hammer is used to excite the structure at several locations in the simplest and most common form of hammer impact testing. The accelerometer and hammer-force signals are recorded. Modal analysis is then used to get a preliminary understanding of the behavior of the system. The minimum equipment requirements for such a test are an accelerometer, an impact hammer, amplifiers, a signal recorder and analysis software.

I’ve figured out how to use the Wolfram Language on my smartphone to sample and analyze machine vibration and noise, and to perform surprisingly good vibration analysis. I’ll show you how, and give you some simple Wolfram Language code to get you started.

Throughout the history of the development of machines, vibration and sound measurements have been important issues. There are two reasons for this:

- Vibrations are a source of noise, can decrease operating performance and can also cause workplace injuries such as “vibration white fingers,” also known as hand-arm vibration syndrome (HAVS), resulting from continuous use of handheld machinery vibrating at certain key frequencies.
- Vibrations can indicate wear or defects of the bearings and gears, and so analysis of vibrations is a useful and commonly used tool for fault detection.

The many applications of vibration analysis have led to a huge number engineers studying this subject in universities all over the world. The research area of “machine vibrations” has its own conferences and publications. Companies specialize in developing different kinds of equipment and services. Most large machine-building industries have departments specializing in vibrations.

So here we go: recording the sound of a vibrating machine with an iPhone is simple. I used the iPhone’s built-in digital voice recorder, Voice Memo. The recording can be converted to an MP3 file just by saving it in the MP3 format in iTunes. The default Apple M4A format cannot be used directly in the Wolfram Language. If iTunes is not available, there are a lot of free converters on the internet that will change your M4A files to MP3 files.

Play Audio

I’ll use an industrial gearbox as an example. The example itself is not that important, but it does suggest possible areas where the method can be used. Some data and dimensions have been modified from the original application.

A gearbox is making a lot of noise. What is the problem?

In the figure below, a motor drives the input shaft. The shaft speed is reduced by the gearbox. The power is used by the output shaft. Typically, the motor is a diesel or electrical engine; in this example, it is a diesel engine. The number of teeth are z1 = 23, z2 = 65, z3 = 27 and z4 = 43, respectively. The largest wheels are about one meter in diameter.

We ran the engine at 1200 rpm, recording five seconds of sound with my iPhone. Converted to MP3, the sound file was named “measurement.mp3”. Then all I needed to do was import it into the Wolfram Language to plot the frequency spectrum.

The excitation frequency in Hertz of the gear contact on the input shaft is z1*rpm/60, and

(z1/z2)*z3 *rpm/60 on the output shaft. Marking the histogram for the input and output shafts’ gear mesh excitation frequencies—red and green, respectively—makes it clear that these frequencies are correlated with the spectrogram.

Let’s make the spectrogram interactive: often we don’t want to use the whole sound file, so we add an option to select a start and end time within the file. Let’s also make it possible to change the rpm.

The analysis toolkit is ready. The options `PlotRange`, `MaxValue` and `Manipulate` in the plot above are set manually. Of course, this can be developed further. But we stop here to keep it simple.

So what happened with the real application investigation? Well, the same analysis as above was performed over the whole rpm region. The maxima of the peaks at each rpm are plotted below.

The input shaft’s highest value is 747.7 rpm, and the output shaft’s is 1,800 rpm. Both become excited at about 287 Hz, the gearbox fundamental resonance frequency. Note that

747.7/60*23 = 286.6 Hz and 1800*23/65*27/60 = 286.6 Hz

We concluded that the gear mesh was not optimized for smooth running and the gearbox had a bad resonance frequency. We opened the gearbox and were able to confirm wear on the teeth, which suggested possibilities for improving the contact pattern. We improved contact by selecting an optimum helical angle, as well as tip relief and correct crowning. Tip relief is a surface modification of a tooth profile; a small amount of material is removed near the tip of the gear tooth. Crowning is a surface modification in the lengthwise direction to prevent contact at the teeth ends, where a small amount of material is removed near the end of the gear tooth.

I have used this method, utilizing my smartphone and the Wolfram Language, several times for real-world and often complex investigations and applications. Often, measurement specialists have already gotten involved before I arrive. But they may have missed the basics because they are using comprehensive measurement programs.

The method I describe here may sometimes yield a similar—or even better—understanding of the problem in just a few minutes at no cost. Well worth trying.

]]>

I will touch on two aspects of her scientific work that were mentioned in the film: orbit calculations and reentry calculations. For the orbit calculation, I will first exactly follow what Johnson did and then compare with a more modern, direct approach utilizing an array of tools made available with the Wolfram Language. Where the movie mentions the solving of differential equations using Euler’s method, I will compare this method with more modern ones in an important problem of rocketry: computing a reentry trajectory from the rocket equation and drag terms (derived using atmospheric model data obtained directly from within the Wolfram Language).

The movie doesn’t focus much on the math details of the types of problems Johnson and her team dealt with, but for the purposes of this blog, I hope to provide at least a flavor of the approaches one might have used in Johnson’s day compared to the present.

One of the earliest papers that Johnson coauthored, “Determination of Azimuth Angle at Burnout for Placing a Satellite over a Selected Earth Position,” deals with the problem of making sure that a satellite can be placed over a specific Earth location after a specified number of orbits, given a certain starting position (e.g. Cape Canaveral, Florida) and orbital trajectory. The approach that Johnson’s team used was to determine the azimuthal angle (the angle formed by the spacecraft’s velocity vector at the time of engine shutoff with a fixed reference direction, say north) to fire the rocket in, based on other orbital parameters. This is an important step in making sure that an astronaut is in the correct location for reentry to Earth.

In the paper, Johnson defines a number of constants and input parameters needed to solve the problem at hand. One detail to explain is the term “burnout,” which refers to the shutoff of the rocket engine. After burnout, orbital parameters are essentially “frozen,” and the spacecraft moves solely under the Earth’s gravity (as determined, of course, through Newton’s laws). In this section, I follow the paper’s unit conventions as closely as possible.

For convenience, some functions are defined to deal with angles in degrees instead of radians. This allows for smoothly handling time in angle calculations:

Johnson goes on to describe several other derived parameters, though it’s interesting to note that she sometimes adopted values for these rather than using the values returned by her formulas. Her adopted values were often close to the values obtained by the formulas. For simplicity, the values from the formulas are used here.

Semilatus rectum of the orbit ellipse:

Angle in orbit plane between perigee and burnout point:

Orbit eccentricity:

Orbit period:

Eccentric anomaly:

To describe the next parameter, it’s easiest to quote the original paper: “The requirement that a satellite with burnout position *φ*1, *λ*1 pass over a selected position *φ*2, *λ*2 after the completion of *n* orbits is equivalent to the requirement that, during the first orbit, the satellite pass over an equivalent position with latitude *φ*2 the same as that of the selected position but with longitude *λ*2e displaced eastward from *λ*2 by an amount sufficient to compensate for the rotation of the Earth during the *n* complete orbits, that is, by the polar hour angle *n ω _{E} T*. The longitude of this equivalent position is thus given by the relation”:

Time from perigee for angle *θ*:

Part of the final solution is to determine values for intermediate parameters *δλ*_{1-2e} and *θ*_{2e}. This can be done in a couple of ways. First, I can use `ContourPlot` to obtain a graphical solution via equations 19 and 20 from the paper:

`FindRoot` can be used to find the solutions numerically:

Of course, Johnson didn’t have access to `ContourPlot` or `FindRoot`, so her paper describes an iterative technique. I translated the technique described in the paper into the Wolfram Language, and also solved for a number of other parameters via her iterative method. Because the base computations are for a spherical Earth, corrections for oblateness are included in her method:

Graphing the value of *θ*2e for the various iterations shows a quick convergence:

I can convert the method in a `FindRoot` command as follows (this takes the oblateness effects into account in a fully self-consistent manner and calculates values for all nine variables involved in the equations):

Interestingly, even the iterative root-finding steps of this more complicated system converge quite quickly:

With the orbital parameters determined, it is desirable to visualize the solution. First, some critical parameters from the previous solutions need to be extracted:

Next, the latitude and longitude of the satellite as a function of azimuth angle need to be derived:

*φ*s and *λ*s are the latitudes and longitudes as a function of *θ*s:

The satellite ground track can be constructed by creating a table of points:

Johnson’s paper presents a sketch of the orbital solution including markers showing the burnout, selected and equivalent positions. It’s easy to reproduce a similar plain diagram here:

For comparison, here is her original diagram:

A more visually useful version can be constructed using `GeoGraphics`, taking care to convert the geocentric coordinates into geodetic coordinates:

Today, virtually every one of us has, within immediate reach, access to computational resources far more powerful than those available to the entirety of NASA in the 1960s. Now, using only a desktop computer and the Wolfram Language, you can easily find direct numerical solutions to problems of orbital mechanics such as those posed to Katherine Johnson and her team. While perhaps less taxing of our ingenuity than older methods, the results one can get from these explorations are no less interesting or useful.

To solve for the azimuthal angle *ψ* using more modern methods, let’s set up parameters for a simple circular orbit beginning after burnout over Florida, assuming a spherically symmetric Earth (I’ll not bother trying to match the orbit of the Johnson paper precisely, and I’ll redefine certain quantities from above using the modern SI system of units). Starting from the same low-Earth orbit altitude used by Johnson, and using a little spherical trigonometry, it is straightforward to derive the initial conditions for our orbit:

The relevant physical parameters can be obtained directly from within the Wolfram Language:

Next, I obtain a differential equation for the motion of our spacecraft, given the gravitational field of the Earth. There are several ways you can model the gravitational potential near the Earth. Assuming a spherically symmetric planet and utilizing a Cartesian coordinate system throughout, the potential is merely:

Alternatively, you can use a more realistic model of Earth’s gravity, where the planet’s shape is taken to be an oblate ellipsoid of revolution. The exact form of the potential from such an ellipsoid (assuming constant mass-density over ellipsoidal shells), though complicated (containing multiple elliptic integrals), is available through `EntityValue`:

For a general homogeneous triaxial ellipsoid, the potential contains piecewise functions:

Here, *κ* is the largest root of *x*^{2}/(*a*^{2}+*κ*)+*y*^{2}/(*b*^{2}+*κ*)+*z*^{2}/(*c*^{2}+*κ*)=1. In the case of an oblate ellipsoid, the previous formula can be simplified to contain only elementary functions…

… where *κ*=((2 *z*^{2} (*a*^{2}-*c*^{2}+*x*^{2}+*y*^{2})+(-*a*^{2}+*c*^{2}+*x*^{2}+*y*^{2})^{2}+*z*^{4})^{1/2}-*a*^{2}-*c*^{2}+*x^{2}+y^{2}+z^{2})*/2.

A simpler form that is widely used in the geographic and space science community, and that I will use here, is given by the so-called International Gravity Formula (IGF). The IGF takes into account differences from a spherically symmetric potential up to second order in spherical harmonics, and gives numerically indistinguishable results from the exact potential referenced previously. In terms of four measured geodetic parameters, the IGF potential can be defined as follows:

I could easily use even better values for the gravitational force through `GeogravityModelData`. For the starting position, the IGF potential deviates only 0.06% from a high-order approximation:

With these functional forms for the potential, finding the orbital path amounts to taking a gradient of the potential to get the gravitational field vector and then applying Newton’s third law. Doing so, I obtain the orbital equations of motion for the two gravity models:

I am now ready to use the power of `NDSolve` to compute orbital trajectories. Before doing this, however, it will be nice to display the orbital path as a curve in three-dimensional space. To give these curves context, I will plot them over a texture map of the Earth’s surface, projected onto a sphere. Here I construct the desired graphics objects:

While the orbital path computed in an inertial frame forms a periodic closed curve, when you account for the rotation of the Earth, it will cause the spacecraft to pass over different points on the Earth’s surface during each subsequent revolution. I can visualize this effect by adding an additional rotation term to the solutions I obtain from `NDSolve`. Taking the number of orbital periods to be three (similar to John Glenn’s flight) for visualization purposes, I construct the following `Manipulate` to see how the orbital path is affected by the azimuthal launch angle *ψ*, similar to the study in Johnson’s paper. I’ll plot both a path assuming a spherical Earth (in white) and another path using the IGF (in green) to get a sense of the size of the oblateness effect (note that the divergence of the two paths increases with each orbit):

In the notebook attached to this blog, you can see this `Manipulate` in action, and note the speed at which each new solution is obtained. You would hope that Katherine Johnson and her colleagues at NASA would be impressed!

Now, varying the angle *ψ* at burnout time, it is straightforward to calculate the position of the spacecraft after, say, three revolutions:

The movie also mentions Euler’s method in connection with the reentry phase. After the initial problem of finding the azimuthal angle has been solved, as done in the previous sections, it’s time to come back to Earth. Rockets are fired to slow down the orbiting body, and a complex set of events happens as the craft transitions from the vacuum of space to an atmospheric environment. Changing atmospheric density, rapid deceleration and frictional heating all become important factors that must be taken into account in order to safely return the astronaut to Earth. Height, speed and acceleration as a function of time are all problems that need to be solved. This set of problems can be solved with Euler’s method, as done by Katherine Johnson, or by using the differential equation-solving functionality in the Wolfram Language.

For simple differential equations, one can get a detailed step-by-step solution with a specified quadrature method. An equivalent of Newton’s famous *F* = *m a* for a time-dependent mass *m*(*t*) is the so-called ideal rocket equation (in one dimension)…

… where *m*(*t*) is the rocket mass, *v*_{e} the engine exhaust velocity and *m ^{‘}_{p}*(

With initial and final conditions for the mass, I get the celebrated rocket equation (Tsiolkovsky 1903):

The details of solving this equation with concrete parameter values and e.g. with the classical Euler method I can get from Wolfram|Alpha. Here are those details together with a detailed comparison with the exact solution, as well as with other numerical integration methods:

Following the movie plot, I will now implement a minimalistic ODE model of the reentry process. I start by defining parameters that mimic Glenn’s flight:

I assume that the braking process uses 1% of the thrust of the stage-one engine and runs, say, for 60 seconds. The equation of motion is:

Here, **F**_{grav} is the gravitational force, **F**_{exhaust}(*t*) the explicitly time-dependent engine force and **F**_{friction}(* x*(

For the height-dependent air density, I can conveniently use the `StandardAtmosphereData` function. I also account for a height-dependent area because of the parachute that opened about 8.5 km above ground:

This gives the following set of coupled nonlinear differential equations to be solved. The last `WhenEvent``[...]` specifies to end the integration when the capsule reaches the surface of the Earth. I use vector-valued position and velocity variables X and V:

With these definitions for the weight, exhaust and air friction force terms…

… total force can be found via:

In this simple model, I neglected the Earth’s rotation, intrinsic rotations of the capsule, active flight angle changes, supersonic effects on the friction force and more. The explicit form of the differential equations in coordinate components is the following. The equations that Katherine Johnson solved would have been quite similar to these:

Supplemented by the initial position and velocity, it is straightforward to solve this system of equations numerically. Today, this is just a simple call to `NDSolve`. I don’t have to worry about the method to use, step size control, error control and more because the Wolfram Language automatically chooses values that guarantee meaningful results:

Here is a plot of the height, speed and acceleration as a function of time:

Plotting as a function of height instead of time shows that the exponential increase of air density is responsible for the high deceleration. This is not due to the parachute, which happens at a relatively low altitude. The peak deceleration happens at a very high altitude as the capsule goes from a vacuum to an atmospheric environment very quickly:

And here is a plot of the vertical and tangential speed of the capsule in the reentry process:

Now I repeat the numerical solution with a fixed-step Euler method:

Qualitatively, the solution looks the same as the previous one:

For the used step size of the time integration, the accumulated error is on the order of a few percent. Smaller step sizes would reduce the error (see the previous Wolfram|Alpha output):

Note that the landing time predicted by the Euler method deviates only 0.11% from the previous time. (For comparison, if I were to solve the equation with two modern methods, say `"BDF"` vs. `"Adams"`, the error would be smaller by a few orders of magnitude.)

Now, the reentry process generates a lot of heat. This is where the heat shield is needed. At which height is the most heat per area *q* generated? Without a detailed derivation, I can, from purely dimensional grounds, conjecture :

Many more interesting things could be calculated (Hicks 2009), but just like the movie had to fit everything into two hours and seven minutes, I will now end my blog for the sake of time. I hope I can be pardoned for the statement that, with the Wolfram Language, the sky’s the limit.

To download this post as a Computable Document Format (CDF) file, click here. New to CDF? Get your copy for free with this one-time download.

]]>In my recent Wolfram Community post, “How many animals can one find in a random image?,” I looked into the pareidolia phenomenon from the viewpoints of pixel clusters in random (2D) black-and-white images. Here are some of the shapes I found, extracted, rotated, smoothed and colored from the connected black pixel clusters of a single 800×800 image of randomly chosen, uncorrelated black-and-white pixels.

For an animation of such shapes arising, changing and disappearing in a random gray-level image with slowly time-dependent pixel values, see here. By looking carefully at a selected region of the image, at the slowly changing, appearing and disappearing shapes, one frequently can “see” animals and faces.

The human mind quickly sees faces, animals, animal heads and ghosts in these shapes. Human evolution has optimized our vision system to recognize predators and identify food. Our recognition of an eye (or a pair of eyes) in the above shapes is striking. For the neuropsychological basis of seeing faces in a variety of situations where actual faces are absent, see Martinez-Conde2016.

A natural question: is this feature of our vision specific to 2D silhouette shapes, or does the same thing happen for 3D shapes? So here, I will look at random shapes in 3D images and the 2D projections of these 3D shapes. Various of the region-related functions that were added in the last versions of the Wolfram Language make this task possible, straightforward and fun.

I should explain the word Arp-imals from the title. With the term “Arp-imals” I refer to objects in the style of the sculptures by Jean Arp, meaning smooth, round, randomly curved biomorphic forms. Here are some examples.

Forms such as these hide frequently in 3D images made from random black-and-white voxels. Here is a quick preview of shapes we will extract from random images.

We will also encounter what I call Moore-iens, in the sense of the sculptures by the slightly later artist Henry Moore.

With some imagination, one can also see forms of possible aliens in some of the following 2D shapes. (See Domagal-Goldman2016 for a discussion of possible features of alien life forms.)

As in the 2D case, we start with a random image: this time, a 3D image of voxels of values 0 and 1. For reproducibility, we will seed the random number generator. The Arp-imals are so common that virtually any seed produces them. And we start with a relatively small image. Larger images will contain many more Arp-imals.

Hard to believe at first, but the blueprints of the above-shown 3D shapes are in the last 3D cube. In the following, we will extract them and make them more visible.

As in the 2D case, we again use `ImageMesh` to extract connected regions of white cells. The regions still look like a random set of connected polyhedra. After smoothing the boundaries, nicer shapes will arise.

Here are the regions, separated into non-touching ones, using the function `ConnectedMeshComponents`. The function `makeShapes3D` combines the image creation, the finding of connected voxel regions, and the region separation.

For demonstration purposes, in the next example, we use a relatively low density of white voxels to avoid the buildup of a single large connected region that spans the whole cube.

Here are the found regions individually colored in their original positions in the 3D image.

To smooth the outer boundaries, thereby making the shapes more animal-, Arp-imal- and alien-like, the function `smooth3D` (defined in the accompanying notebook) is a quick-and-dirty implementation of the Loop subdivision algorithm. (As the 3D shapes might have a higher genus, we cannot use `BSplineSurface` directly, which would have been the direct equivalent to the 2D case.) Here are successive smoothings of the third of the above-extracted regions.

Using the region plot theme `"SmoothShading"` of the function `BoundaryMeshRegion`, we can add normals to get the feeling of a genuinely smooth boundary.

And for less than $320 one can obtain this Arp-inspired piece in brass. A perfect, unique, stunning post-Valentine’s gift. For hundreds of alternative shapes to print, see below. We use `ShellRegion` to reduce the price and save some internal material by building a hollow region.

Here is the smoothing procedure shown for another of the above regions.

And for three more.

Many 3D shapes can now be extracted from random and nonrandom 3D images. The next input calculates the region corresponding to lattice points with coprime coordinates.

In the above example, we start with a coarse 3D region, which feels polyhedral due to the obvious triangular boundary faces. It is only after the smoothing procedure that we obtain “interesting-looking” 3D shapes. The details of the applied smoothing procedure do not matter, as long as sharp edges and corners are softened.

Human perception is optimized for smooth shapes, and most plants and animals have smooth boundaries. This is why we don’t see anything interesting in the collection of regions returned from `ImageMesh` applied to a 3D image. This is quite similar to the 2D case. In the following visualization of the 2D case, we start with a set of randomly selected points. Then we connect these points through a curve. Filling the curve yields a deformed checkerboard-like pattern that does not remind us of a living being. Rasterizing the filled curve in a coarse-grained manner still does not remind us of organic shapes. The connected region, and especially the smoothed region, do remind most humans of living beings.

The following `Manipulate` (available in the notebook) allows us to explore the steps and parameters involved in an interactive session.

And here is a corresponding 3D example.

In her reply to my community post, Marina Shchitova showed some examples of faces and animals in shadows of hands and fingers. Some classic examples from the Cassel1896 book are shown here.

So, what do projections/shadows of the above two 3D shapes look like? (For a good overview of the use of shadows in art at the time and place of the young Arp, see Forgione1999.)

The projections of these 3D shapes are exactly the types of shapes I encountered in the connected smoothed components of 2D images. The function `projectTo2D` takes a 3D graphic complex and projects it into a thin slice parallel to the three coordinate planes. The result is still a `Graphics3D` object.

These are the 2×3 projections of the above two 2D shapes. Most people recognize animal shapes in the projections.

We get exactly these projections if we just look at the 3D shape from a larger distance with a viewpoint and direction parallel to the coordinate axes.

For comparison, here are three views of the first object from very far away, effectively showing the projections.

By rotating the 3D shapes, we can generate a large variety of different shapes in the 2D projections. The following `Manipulate` allows us to explore the space of projections’ shapes interactively. Because we need the actual rotated coordinates, we define a function `rotate`, rather than using the built-in function `Rotate`.

Here is an array of 16 projections into the x-z plane for random orientations of the 3D shape.

The initial 3D image does not have to be completely random. In the next example, we randomly place circles in 3D and color a voxel white if the circle intersects the voxel. As a result, the 3D shapes corresponding to the connected voxel regions have a more network-like shape.

2D projection shapes of 3D animals typically have no symmetry. Even if an animal has a symmetry, the visible shape from a given viewpoint and a given animal posture does not have a symmetry. But most animals have a bilateral symmetry. I will now use random images that have a bilateral symmetry. As a result, many of the resulting shapes will also have a bilateral symmetry. Not all of the shapes, because some regions do not intersect the symmetry plane. Bilateral symmetry is important for the classic Rorschach inkblot test: “The mid-line appears to attract the patient’s attention with a sort of magical power,” noted Rorschach (Schott2013). The function `makeSymmetricShapes3D` will generate regions with bilateral symmetry.

Here are some examples.

And here are smoothed and colored versions of these regions. The viewpoint is selected in such a way as to make the bilateral symmetry most obvious.

To get a better feeling for the connection between the pixel values of the 3D image and the resulting smoothed shape, the next `Manipulate` allows us to specify each pixel value for a small-sized 3D image. The grids/matrices of checkboxes represent the voxel values of one-half of a 3D image with bilateral symmetry.

Randomly and independently selecting the voxel value of a 3D image makes it improbable that very large connected components without many holes form. Using instead random functions and deriving voxel values from these random continuous functions yields different-looking types of 3D shapes that have a larger uniformity over the voxel range. Effectively, the voxel values are no longer totally uncorrelated.

Here are some examples of the resulting regions, as well as their smoothed versions.

Our notebook contains in the initialization section more than 400 selected regions of “interesting” shapes classified into five types (mostly arbitrarily, but based on human feedback).

Let’s look at some examples of these regions. Here is a list of some selected ones. Many of these shapes found in random 3D images could be candidates for Generation 8 Pokémon or even some new creatures, tentatively dubbed Mathtubbies.

Many of the shapes are reminiscent of animals, even if the number of legs and heads is not always the expected number.

To see all of the 400+ shapes from the initialization cells, one could carry out the following.

`Do[Print[Framed[Style[t, Bold, Gray], FrameStyle -> Gray]];
Do[Print[Rasterize @ makeRegion @ r], {r, types[t]}], {t,Keys[types]}]`

The shapes in the list above were manually selected. One could now go ahead and partially automate the finding of interesting animal-looking shapes and “natural” orientations using machine learning techniques. In the simplest case, we could just use `ImageIdentify`.

This seems to be a stegosaurus-poodle crossbreed. But we will not pursue this direction here and now, but rather return to the 2D projections. (For using software to find faces in architecture and general equipment, see Hong2014.)

Before returning to the 2D projections, we will play for a moment with the 3D shapes generated and modify them for a different visual appearance.

For instance, we could tetrahedralize the regions and fill the tetrahedra with spheres.

Or with smaller tetrahedra.

Or add some spikes.

Or fill the shapes with cubes.

Or thicken or thin the shapes.

Or thicken and add thin bands.

Or just add a few stripes as camouflage.

Or model the inside through a wireframe of cylinders.

Or build a stick figure.

Or fill the surface with a tube.

Or a Kelvin inversion.

If we look at the 2D projections of some of these 3D shapes, we can see again (with some imagination) a fair number of faces, witches, kobolds, birds and other animals. Here are some selected examples. We show the 3D shape in the original orientation, a randomly oriented version of the 3D shape, and the three coordinate-plane projections of the randomly rotated 3D shape.

Unsurprisingly, some are recognizable 3D shapes, like these projections that look like bird heads.

Others are much more surprising, like the two heads in the projections of the two-legged-two-finned frog-dolphin.

Different orientations of the 3D shape can yield quite different projections.

For the reader’s amusement, here are some more projections.

Now that we have looked at 2D projections of 3D shapes, the next natural step would be to look at 3D projections of 4D shapes. And while there is currently no built-in function `Image4D`, it is not too difficult to implement for finding the connected components of white 4D voxels. We implement this through the graph theory function `ConnectedComponents` and consider two 4D voxels as being connected by an edge if they share a common 3D cube face. As an example, we use a 10*10*10*10 voxel 4D image. `makeVoxels4D` makes the 4D image data and `whitePositionQ` marks the position of the white voxels for quick lookup.

The 4D image contains quite a few connected components.

Here are the four canonical projections of the 4D complex.

We package the finding of the connected components into a function `getConnected4DVoxels`.

We also define a function `rotationMatrix4D` for conveniently carrying rotations in the six 2D planes of the 4D space.

Once we have the 3D projections, we can again use the above function to smooth the corresponding 3D shapes.

In the absence of Tralfamadorian vision, we can visualize a 4D connected voxel complex, rotate this complex in 4D, then project into 3D, smooth the shapes and then project into 2D. For a single 4D shape, this yields a large variety of possible 2D projections. The function `projectionGrid3DAnd2D` projects the four 3D projections canonically into 2D. This means we get 12 projections. Depending on the shape of the body, some might be identical.

We show the 3D shape in a separate graphic so as not to cover up the projections. Again, many of the 2D projections, and also some of the 3D projections, remind us of animal shapes.

The following `Manipulate` allows us to rotate the 4D shape. The human mind sees many animal shapes and faces.

Here is another example, with some more scary animal heads.

We could now go to 5D images, but this will very probably bring no new insights. To summarize some of the findings: After rotation and smoothing, a few percent of the connected regions of black voxels in random 3D images have an animal-like shape, or an artistic rendering of an animal-like shape. A large fraction (~10%) of the projections of these 3D shapes into 2D pronouncedly show the pareidolia phenomenon, in the sense that we believe we can recognize animals and faces in these projections. 4D images, due to the voxel count that increases exponentially with dimension, yield an even larger number of possible animal and face shapes.

To download this post as a CDF, click here. New to CDF? Get your copy for free with this one-time download.

]]>