New in 14: Chemistry, Life Sciences & Knowledgebase

Two years ago we released Version 13.0 of Wolfram Language. Here are the updates in chemistry, life sciences and the Knowledgebase since then, including the latest features in 14.0. The contents of this post are compiled from Stephen Wolfram’s Release Announcements for 13.1, 13.2, 13.3 and 14.0.

Chemistry

Chemical Computation (January 2024)

We began the process of introducing chemical computation into the Wolfram Language in Version 12.0, and by Version 13 we had good coverage of atoms, molecules, bonds and functional groups. Now in Version 14 we’ve added coverage of chemical formulas, amounts of chemicals—and chemical reactions.

Here’s a chemical formula, which basically just gives a “count of atoms”:

Now here are specific molecules with that formula:

Let’s pick one of these molecules:

Now in Version 14 we have a way to represent a certain quantity of molecules of a given type—here 1 gram of methylcyclopentane:

ChemicalConvert can convert to a different specification of quantity, here moles:

And here a count of molecules:
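
To make the arithmetic behind these conversions concrete, here is a minimal Python sketch (not Wolfram Language code, and not a description of how ChemicalConvert is implemented). It assumes rounded standard atomic weights and uses the fact that methylcyclopentane has the formula C6H12. The helper names are hypothetical.

```python
# Illustrative arithmetic only; the helper names below are hypothetical.
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

# Rounded standard atomic weights in g/mol (an assumption of this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008}


def molar_mass(atom_counts):
    """Molar mass in g/mol from a mapping of element symbol to atom count."""
    return sum(ATOMIC_MASS[el] * n for el, n in atom_counts.items())


def grams_to_moles(grams, atom_counts):
    """Amount of substance (mol) contained in the given mass."""
    return grams / molar_mass(atom_counts)


def moles_to_molecules(moles):
    """Number of molecules in the given amount of substance."""
    return moles * AVOGADRO


# Methylcyclopentane: C6H12
methylcyclopentane = {"C": 6, "H": 12}
moles = grams_to_moles(1.0, methylcyclopentane)
molecules = moles_to_molecules(moles)
print(f"{moles:.5f} mol, {molecules:.3e} molecules")
```

With these atomic weights, 1 gram works out to roughly 0.0119 mol, or about 7.16 × 10^21 molecules.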

But now the bigger story is that in Version 14 we can represent not just individual types of molecules, and quantities of molecules, but also chemical reactions. Here we give a “sloppy” unbalanced representation of a reaction, and ReactionBalance gives us the balanced version:
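
For readers curious what balancing involves, here is a minimal Python sketch of one standard approach: treat the coefficients as unknowns and solve the element-conservation equations with exact rational arithmetic. This is not necessarily how ReactionBalance works internally. The function name `balance` and the methane combustion example are assumptions chosen for illustration, not the reaction shown in the post.

```python
from fractions import Fraction
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+


def balance(reactants, products):
    """Return the smallest positive integer coefficients for a reaction.

    Each species is a dict of element -> atom count. The sketch assumes the
    balanced reaction is unique up to scaling (one free variable).
    """
    species = reactants + products
    elements = sorted({el for s in species for el in s})
    n = len(species)
    # One row per element; product columns are negated so that a balanced
    # reaction satisfies rows . coefficients = 0.
    rows = [
        [Fraction(s.get(el, 0) * (1 if i < len(reactants) else -1))
         for i, s in enumerate(species)]
        for el in elements
    ]
    # Gauss-Jordan elimination to reduced row echelon form.
    pivot_cols = []
    r = 0
    for c in range(n):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivot_cols]
    if len(free) != 1:
        raise ValueError("reaction is not uniquely balanceable")
    # Set the free coefficient to 1 and read off the pivot coefficients.
    coeffs = [Fraction(0)] * n
    coeffs[free[0]] = Fraction(1)
    for row_index, c in enumerate(pivot_cols):
        coeffs[c] = -rows[row_index][free[0]]
    # Clear denominators, then reduce to the smallest integers.
    scale = lcm(*(x.denominator for x in coeffs))
    ints = [int(x * scale) for x in coeffs]
    divisor = gcd(*ints)
    ints = [v // divisor for v in ints]
    if any(v <= 0 for v in ints):
        raise ValueError("no all-positive balancing exists")
    return ints


# Hypothetical example: CH4 + O2 -> CO2 + H2O
coefficients = balance(
    [{"C": 1, "H": 4}, {"O": 2}],
    [{"C": 1, "O": 2}, {"H": 2, "O": 1}],
)
print(coefficients)
```

For this input the sketch returns `[1, 2, 1, 2]`, corresponding to CH4 + 2 O2 -> CO2 + 2 H2O.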

And now we can extract the formulas for the reactants:

We can also give a chemical reaction in terms of molecules:

But with our symbolic representation of molecules and reactions, there’s now a big thing we can do: represent classes of reactions as “pattern reactions”, and work with them using the same kinds of concepts as we use in working with patterns for general expressions. So, for example, here’s a symbolic representation of the hydrohalogenation reaction:

Now we can apply this pattern reaction to particular molecules:

Here’s a more elaborate example, in this case entered using a SMARTS string:

Here we’re applying the reaction just once:

And now we’re doing it repeatedly

in this case generating longer and longer molecules (which happen to be polypeptides):

Representing Amounts of Chemicals (June 2022)

Molecule lets one symbolically represent a molecule. Quantity lets one symbolically represent a quantity with units. In Version 13.1 we now have the new construct ChemicalInstance that’s in effect a merger of these, allowing one to represent a certain quantity of a certain chemical.

This gives a symbolic representation of 1 liter of acetone (by default at standard temperature and pressure):

We can ask what the mass of this instance of this chemical is:

ChemicalConvert lets us do a conversion returning particular units:

Here’s instead a conversion to moles:

This directly gives the amount of substance that 1 liter of acetone corresponds to:

This generates a sequence of straight-chain hydrocarbons:

Here’s the amount of substance corresponding to 1 g of each of these chemicals:
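
As a rough cross-check of this kind of calculation, here is a small Python sketch. It assumes the straight-chain hydrocarbons are the alkanes CnH2n+2, uses rounded standard atomic weights, and picks an arbitrary range of chain lengths; none of these choices come from the post.

```python
# Rounded standard atomic weights in g/mol (an assumption of this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008}


def alkane_formula(n):
    """Atom counts for the straight-chain alkane with n carbons."""
    return {"C": n, "H": 2 * n + 2}


def moles_per_gram(atom_counts):
    """Amount of substance (mol) in 1 g of the given compound."""
    molar_mass = sum(ATOMIC_MASS[el] * k for el, k in atom_counts.items())
    return 1.0 / molar_mass


# Chain lengths 1..6 (methane through hexane) are chosen for illustration.
amounts = [moles_per_gram(alkane_formula(n)) for n in range(1, 7)]
for n, mol in enumerate(amounts, start=1):
    print(f"C{n}H{2 * n + 2}: {mol:.4f} mol per gram")
```

Because molar mass grows with chain length, the amount of substance per gram falls steadily along the series.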

ChemicalInstance lets you specify not just the amount of a substance, but also its state, in particular temperature and pressure. Here we’re converting 1 kg of water at 4 °C to be represented in terms of volume:

Chemistry as Rule Application: Symbolic Pattern Reactions (June 2022)

At the core of the Wolfram Language is the abstract idea of applying transformations to symbolic expressions. And at some level one can view chemistry and chemical reactions as a physical instantiation of this idea, where one’s not dealing with abstract symbolic constructs, but instead with actual molecules and atoms.

In Version 13.1 we’re introducing PatternReaction as a symbolic representation for classes of chemical reactions—in effect providing an analog for chemistry of Rule for general symbolic expressions.

Here’s an example of a “pattern reaction”:

The first argument specifies a pair of “reactant” molecule patterns to be transformed into “product” molecule patterns. The second argument specifies which atoms in which reactant molecules map to which atoms in which product molecules. If you mouse over the resulting pattern reaction, you’ll see corresponding atoms “light up”:

Given a pattern reaction, we can use ApplyReaction to apply the reaction to concrete molecules:

Here are plots of the resulting product molecules:

The molecule patterns in the pattern reaction are matched against subparts of the concrete molecules, then the transformation is done, leaving the other parts of the molecules unchanged. In a sense it’s the direct analog of something like

where the b in the symbolic expression is replaced, and the result is “knitted back” to fill in where the b used to be.
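
The same idea can be sketched outside the Wolfram Language. The Python fragment below is only an analogy: nested tuples stand in for symbolic expressions, and the `replace_all` helper and the sample expression are hypothetical. It replaces a subpart wherever it occurs and rebuilds the surrounding structure around the replacement.

```python
def replace_all(expr, old, new):
    """Replace every occurrence of `old` inside a nested tuple expression."""
    if expr == old:
        return new
    if isinstance(expr, tuple):
        # Rebuild the enclosing structure around the rewritten parts.
        return tuple(replace_all(part, old, new) for part in expr)
    return expr


# A hypothetical expression f[a, g[b, c], b], written as nested tuples.
expr = ("f", "a", ("g", "b", "c"), "b")
result = replace_all(expr, "b", ("h", "x", "y"))
print(result)
```

Everything other than the matched `b` is carried over untouched, which is the sense in which ApplyReaction leaves the rest of a molecule unchanged.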

You can do what amounts to various kinds of “chemical functional programming” with ApplyReaction and PatternReaction. Here’s an example where we’re essentially building up a polymer by successive nesting of a reaction:

It’s often convenient to build pattern reactions symbolically using Wolfram Language “chemical primitives”. But PatternReaction also lets you specify reactions as SMARTS strings:

Computable Species

Millions of Species Become Computable (January 2024)

We first introduced computable data on biological organisms back when Wolfram|Alpha was released in 2009. But in Version 14—following several years of work—we’ve dramatically broadened and deepened the computable data we have about biological organisms.

So for example here’s how we can figure out what species have cheetahs as predators:

And here are pictures of these:

Here’s a map of countries where cheetahs have been seen (in the wild):

We now have data—curated from a great many sources—on more than a million species of animals, as well as most of the plants, fungi, bacteria, viruses and archaea that have been described. And for animals, for example, we have nearly 200 properties that are extensively filled in. Some are taxonomic properties:

Some are physical properties:

Some are genetic properties:

Some are ecological properties (yes, the cheetah is not the apex predator):

It’s useful to be able to get properties of individual species, but the real power of our curated computable data shows up when one does larger-scale analyses. Like here’s a plot of the lengths of genomes for organisms with the longest ones across our collection of organisms:

Or here’s a histogram of the genome lengths for organisms in the human gut microbiome:

And here’s a scatterplot of the lifespans of birds against their weights:

Following the idea that cheetahs aren’t apex predators, this is a graph of what’s “above” them in the food chain:

Knowledgebase

The Knowledgebase Is Always Growing (January 2024)

Every minute of every day, new data is being added to the Wolfram Knowledgebase. Much of it is coming automatically from real-time feeds. But we also have a very large-scale ongoing curation effort with humans in the loop. We’ve built sophisticated (Wolfram Language) automation for our data curation pipeline over the years—and this year we’ve been able to increase efficiency in some areas by using LLM technology. But it’s hard to do curation right, and our long-term experience is that to do so ultimately requires human experts being in the loop, which we have.

So what’s new since Version 13.0? 291,842 new notable current and historical people; 264,467 music works; 118,538 music albums; 104,024 named stars; and so on. Sometimes the addition of an entity is driven by the new availability of reliable data; often it’s driven by the need to use that entity in some other piece of functionality (e.g. stars to render in AstroGraphics). But more than just adding entities there’s the issue of filling in values of properties of existing entities. And here again we’re always making progress, sometimes integrating newly available large-scale secondary data sources, and sometimes doing direct curation ourselves from primary sources.

A recent example where we needed to do direct curation was in data on alcoholic beverages. We have very extensive data on hundreds of thousands of types of foods and drinks. But none of our large-scale sources included data on alcoholic beverages. So that’s an area where we need to go to primary sources (in this case typically the original producers of products) and curate everything for ourselves.

So, for example, we can now ask for something like the distribution of flavors of different varieties of vodka (actually, personally, not being a consumer of such things, I had no idea vodka even had flavors…):

But beyond filling out entities and properties of existing types, we’ve also steadily been adding new entity types. One recent example is geological formations, 13,706 of them:

So now, for example, we can specify where T. rex have been found

and we can show those regions on a map: