On June 23 we celebrate the 30th anniversary of the launch of Mathematica. Most software from 30 years ago is now long gone. But not Mathematica. In fact, it feels in many ways like even after 30 years, we’re really just getting started. Our mission has always been a big one: to make the world as computable as possible, and to add a layer of computational intelligence to everything.
Our first big application area was math (hence the name “Mathematica”). And we’ve kept pushing the frontiers of what’s possible with math. But over the past 30 years, we’ve been able to build on the framework that we defined in Mathematica 1.0 to create the whole edifice of computational capabilities that we now call the Wolfram Language—and that corresponds to Mathematica as it is today.
From when I first began to design Mathematica, my goal was to create a system that would stand the test of time, and would provide the foundation to fill out my vision for the future of computation. It’s exciting to see how well it’s all worked out. My original core concepts of language design continue to infuse everything we do. And over the years we’ve been able to just keep building and building on what’s already there, to create a taller and taller tower of carefully integrated capabilities.
It’s fun today to launch Mathematica 1.0 on an old computer, and compare it with today:
Yes, even in Version 1, there’s a recognizable Wolfram Notebook to be seen. But what about the Mathematica code (or, as we would call it today, Wolfram Language code)? Well, the code that ran in 1988 just runs today, exactly the same! And, actually, I routinely take code I wrote at any time over the past 30 years and just run it.
Of course, it’s taken a lot of long-term discipline in language design to make this work. And without the strength and clarity of the original design it would never have been possible. But it’s nice to see that all that daily effort I’ve put into leadership and consistent language design has paid off so well in long-term stability over the course of 30 years.
Back in 1988, Mathematica was a big step forward in high-level computing, and people were amazed at how much it could do. But it’s absolutely nothing compared to what Mathematica and the Wolfram Language can do today. And as one way to see this, here’s how the different major areas of functionality have “lit up” between 1988 and today:
There were 551 built-in functions in 1988; there are now more than 5100. And the expectations for each function have vastly increased too. The concept of “superfunctions” that automate a swath of algorithmic capability already existed in 1988—but their capabilities pale in comparison to our modern superfunctions.
Back in 1988 the core ideas of symbolic expressions and symbolic programming were already there, working essentially as they do today. And there were also all sorts of functions related to mathematical computation, as well as to things like basic visualization. But in subsequent years we were able to conquer area after area.
Partly it’s been the growth of raw computer power that’s made new areas possible. And partly it’s been our ability to understand what could conceivably be done. But the most important thing has been that—through the integrated design of our system—we’ve been able to progressively build on what we’ve already done to reach one new area after another, at an accelerating pace. (Here’s a plot of function count by version.)
I recently found a todo list I wrote in 1991—and I’m happy to say that now, in 2018, essentially everything on it has been successfully completed. But in many cases it took building a whole tower of capabilities—over a large number of years—to be able to achieve what I wanted.
From the very beginning—and even from projects of mine that preceded Mathematica—I had the goal of building as much knowledge as possible into the system. At the beginning the knowledge was mostly algorithmic, and formal. But as soon as we could routinely expect network connectivity to central servers, we started building in earnest what’s now our immense knowledgebase of computable data about the real world.
Back in 1988, I could document pretty much everything about Mathematica in the 750-page book I wrote. Today if we were to print out the online documentation it would take perhaps 36,000 pages. The core concepts of the system remain as simple and clear as they ever were, though—so it’s still perfectly possible to capture them even in a small book.
Thirty years is basically half the complete history of modern digital computing. And it’s remarkable—and very satisfying—that Mathematica and the Wolfram Language have had the strength not only to persist, but to retain their whole form and structure, across all that time.
Thirty years ago Mathematica (all 2.2 megabytes of it) came in boxes available at “neighborhood software stores”, and was distributed on collections of floppy disks (or, for larger computers, on various kinds of magnetic tapes). Today one just downloads it anytime (about 4 gigabytes), accessing its knowledgebase (many terabytes) online—or one just runs the whole system directly in the Wolfram Cloud, through a web browser. (In a curious footnote to history, the web was actually invented back in 1989 on a collection of NeXT computers that had been bought to run Mathematica.)
Thirty years ago there were “workstation class computers” that ran Mathematica, but were pretty much only owned by institutions. In 1988, PCs used MS-DOS, and were limited to 640K of working memory—which wasn’t enough to run Mathematica. The Mac could run Mathematica, but it was always a tight fit (“2.5 megabytes of memory required; 4 megabytes recommended”)—and in the footer of every notebook was a memory gauge that showed you how close you were to running out of memory. Oh, yes, and there were two versions of Mathematica, depending on whether or not your machine had a “numeric coprocessor” (which let it do floating-point arithmetic in hardware rather than in software).
Back in 1988, I had got my first cellphone—which was the size of a shoe. And the idea that something like Mathematica could “run on a phone” would have seemed preposterous. But here we are today with the Wolfram Cloud app on phones, and Wolfram Player running natively on iPads (and, yes, they don’t have virtual memory, so our tradition of tight memory management from back in the old days comes in very handy).
In 1988, computers that ran Mathematica were always things you plugged into a power outlet to use. And the notion of, for example, using Mathematica on a plane was basically inconceivable (well, OK, even in 1981 when I lugged my Osborne 1 computer running CP/M onto a plane, I did find one power outlet for it at the very back of a 747). It wasn’t until 1991 that I first proudly held up at a talk a Compaq laptop that was (creakily) running Mathematica off batteries—and it wasn’t routine to run Mathematica portably for perhaps another decade.
For years I used to use 1989^1989 as my test computation when I tried Mathematica on a new machine. And in 1989 I would usually be counting the seconds waiting for the computation to be finished. (1988^1988 was usually too slow to be useful back in 1988: it could take minutes to return.) Today, of course, the same computation is instantaneous. (Actually, a few years ago, I did the computation again on the first Raspberry Pi computer—and it again took several seconds. But that was a $25 computer. And now even it runs the computation very fast.)
The increase in computer speed over the years has had not only quantitative but also qualitative effects on what we’ve been able to do. Back in 1988 one basically did a computation and then looked at the result. We talked about being able to interact with a Mathematica computation in real time (and there was actually a demo on the NeXT computer that did a simple case of this even in 1989). But it basically took 18 years before computers were routinely fast enough that we could implement Manipulate and Dynamic—with “Mathematica in the loop”.
I considered graphics and visualization an important feature of Mathematica from the very beginning. Back then there were “paint” (bitmap) programs, and there were “draw” (vector) programs. We made the decision to use the then-new PostScript language to represent all our graphics output resolution-independently.
We had all sorts of computational geometry challenges (think of all those little shattered polygons), but even back in 1988 we were able to generate resolution-independent 3D graphics. And in preparing for the original launch of Mathematica we found the “most complicated 3D graphic we could easily generate”, and ended up with the original icosahedral “spikey”—which has evolved today into our rhombic hexecontahedron logo:
In a sign of a bygone software era, the original Spikey also graced the elegant, but whimsical, Mathematica startup screen on the Mac:
Back in 1988, there were command-line interfaces (like the Unix shell), and there were word processors (like WordPerfect). But it was a new idea to have “notebooks” (as we called them) that mixed text, input and output—as well as graphics, which more usually were generated in a separate window or even on a separate screen.
Even in Mathematica 1.0, many of the familiar features of today’s Wolfram Notebooks were already present: cells, cell groups, style mechanisms, and more. There was even the same doubled-cell-bracket evaluation indicator—though in those days longer rendering times meant there needed to be more “entertainment”, which Mathematica provided in the form of a bouncing-string-figure wait cursor that was computed in real time during the vertical retrace interrupt associated with refreshing the CRT display.
In what would now be standard good software architecture, Mathematica from the very beginning was always divided into two parts: a kernel doing computations, and a front end supporting the notebook interface. The two parts communicated through the MathLink protocol (still used today, but now called WSTP) that in a very modern way basically sent symbolic expressions back and forth.
Back in 1988—with computers like Macs straining to run Mathematica—it was common to run the front end on a local desktop machine, and then have a “remote kernel” on a heftier machine. Sometimes that machine would be connected through Ethernet, or rarely through the internet. More often one would use a dial-up connection, and, yes, there was a whole mechanism in Version 1.0 to support modems and phone dialing.
When we first built the notebook front end, we thought of it as a fairly thin wrapper around the kernel—that we’d be able to “dash off” for the different user interfaces of different computer systems. We built the front end first for the Mac, then (partly in parallel) for the NeXT. Within a couple of years we’d built separate codebases for the then-new Microsoft Windows, and for X Windows.
But as we polished the notebook front end it became more and more sophisticated. And so it was a great relief in 1996 when we managed to create a merged codebase that ran on all platforms.
And for more than 15 years this was how things worked. But then along came the cloud, and mobile. And now, out of necessity, we again have multiple notebook front end codebases. Maybe in a few years we’ll be able to merge them again. But it’s funny how the same issues keep cycling around as the decades go by.
Unlike the front end, we designed the kernel from the beginning to be as robustly portable as possible. And over the years it’s been ported to an amazing range of computers—very often as the first serious piece of application software that a new kind of computer runs.
From the earliest days of Mathematica development, there was always a raw command-line interface to the kernel. And it’s still there today. And what’s amazing to me is how often—in some new and unfamiliar situation—it’s really nice to have that raw interface available. Back in 1988, it could even make graphics—as ASCII art—but that’s not exactly in so much demand today. But still, the raw kernel interface is what for example wolframscript uses to provide programmatic access to the Wolfram Language.
There’s much of the earlier history of computing that’s disappearing. And it’s not so easy in practice to still run Mathematica 1.0. But after going through a few early Macs, I finally found one that still seemed to run well enough. We loaded up Mathematica 1.0 from its distribution floppies, and yes, it launched! (I guess the distribution floppies were made the week before the actual release on June 23, 1988; I vaguely remember a scramble to get the final disks copied.)
Needless to say, when I wanted to livestream this, the Mac stopped working, showing only a strange zebra pattern on its screen. Whacking the side of the computer (a typical 1980s remedy) didn’t do anything. But just as I was about to give up, the machine suddenly came to life, and there I was, about to run Mathematica 1.0 again.
I tried all sorts of things, creating a fairly long notebook. But then I wondered: just how compatible is this? So I saved the notebook on a floppy, and put it in a floppy drive (yes, you can still get those) on a modern computer. At first, the modern operating system didn’t know what to do with the notebook file.
But then I added our old “.ma” file extension, and opened it. And… oh my gosh… it just worked! The latest version of the Wolfram Language successfully read the 1988 notebook file format, and rendered the live notebook (and also created a nice, modern “.nb” version):
There’s a bit of funny spacing around the graphics, reflecting the old way that graphics had to be handled back in 1988. But if one just selects the cells in the notebook, and presses Shift + Enter, up comes a completely modern version, now with color outputs too!
Before Mathematica, sophisticated technical computing was at best the purview of a small “priesthood” of technical computing experts. But as soon as Mathematica appeared on the scene, this all changed—and suddenly a typical working scientist or mathematician could realistically expect to do serious computation with their own hands (and then to save or publish the results in notebooks).
Over the past 30 years, we’ve worked very hard to open progressively more areas to immediate computation. Often there’s great technical sophistication inside. But our goal is to be able to let people translate highlevel computational thinking as directly and automatically as possible into actual computations.
The result has been incredibly powerful. And it’s a source of great satisfaction to see how much has been invented and discovered with Mathematica over the years—and how many of the world’s most productive innovators use Mathematica and the Wolfram Language.
But amazingly, even after all these years, I think the greatest strengths of Mathematica and the Wolfram Language are only just now beginning to become broadly evident.
Part of it has to do with the emerging realization of how important it is to systematically and coherently build knowledge into a system. And, yes, the Wolfram Language has been unique in all these years in doing this. And what this now means is that we have a huge tower of computational intelligence that can be immediately applied to anything.
To be fair, for many of the past 30 years, Mathematica and the Wolfram Language were primarily deployed as desktop software. But particularly with the increasing sophistication of the general computing ecosystem, we’ve been able in the past 5–10 years to build out extremely strong deployment channels that have now allowed Mathematica and the Wolfram Language to be used in an increasing range of important enterprise settings.
Mathematica and the Wolfram Language have long been standards in research, education and fields like quantitative finance. But now they’re in a position to bring the tower of computational intelligence that they embody to any area where computation is used.
Since the very beginning of Mathematica, we’ve been involved with what’s now called artificial intelligence (and in recent times we’ve been leaders in supporting modern machine learning). We’ve also been very deeply involved with data in all forms, and with what’s now called data science.
But what’s becoming clearer only now is just how critical the breadth of Mathematica and the Wolfram Language is to allowing data science and artificial intelligence to achieve their potential. And of course it’s satisfying to see that all those capabilities that we’ve built over the past 30 years—and all the design coherence that we’ve worked so hard to maintain—are now so important in areas like these.
The concept of computation is surely the single most important intellectual development of the past century. And it’s been my goal with Mathematica and the Wolfram Language to provide the best possible vehicle to infuse highlevel computation into every conceivable domain.
For pretty much every field X (from art to zoology) there either is now, or soon will be, a “computational X” that defines the future of the field by using the paradigm of computation. And it’s exciting to see how much the unique features of the Wolfram Language are allowing it to help drive this process, and become the “language of computational X”.
Traditional non-knowledge-based computer languages are fundamentally set up as a way to tell computers what to do—typically at a fairly low level. But one of the aspects of the Wolfram Language that’s only now beginning to be recognized is that it’s not just intended to be for telling computers what to do; it’s intended to be a true computational communication language, that provides a way of expressing computational thinking that’s meaningful both to computers and to humans.
In the past, it was basically just computers that were supposed to “read code”. But like a vast generalization of the idea of mathematical notation, the goal with the Wolfram Language is to have something that humans can readily read, and use to represent and understand computational ideas.
Combining this with the idea of notebooks brings us the notion of computational essays—which I think are destined to become a key communication tool for the future, uniquely made possible by the Wolfram Language, with its 30-year history.
Thirty years ago it was exciting to see so many scientists and mathematicians “discover computers” through Mathematica. Today it’s exciting to see so many new areas of “computational X” being opened up. But it’s also exciting to see that—with the level of automation we’ve achieved in the Wolfram Language—we’ve managed to bring sophisticated computation to the point where it’s accessible to essentially anyone. And it’s been particularly satisfying to see all sorts of kids—at middle-school level or even below—start to get fluent in the Wolfram Language and the high-level computational ideas it provides access to.
If one looks at the history of computing, it’s in many ways a story of successive layers of capability being added, and becoming ubiquitous. First came the early languages. Then operating systems. Later, around the time Mathematica came on the scene, user interfaces began to become ubiquitous. A little later came networking and then large-scale interconnected systems like the web and the cloud.
But now what the Wolfram Language provides is a new layer: a layer of computational intelligence—that makes it possible to take for granted a high level of built-in knowledge about computation and about the world, and an ability to automate its application.
Over the past 30 years many people have used Mathematica and the Wolfram Language, and many more have been exposed to their capabilities, through systems like Wolfram|Alpha built with them. But what’s possible now is to let the Wolfram Language provide a truly ubiquitous layer of computational intelligence across the computing world. It’s taken decades to build a tower of technology and capabilities that I believe are worthy of this—but now we are there, and it’s time to make this happen.
But the story of Mathematica and the Wolfram Language is not just a story of technology. It’s also a story of the remarkable community of individuals who’ve chosen to make Mathematica and the Wolfram Language part of their work and lives. And now, as we go forward to realize the potential for the Wolfram Language in the world of the future, we need this community to help explain and implement the paradigm that the Wolfram Language defines.
Needless to say, injecting new paradigms into the world is never easy. But doing so is ultimately what moves forward our civilization, and defines the trajectory of history. And today we’re at a remarkable moment in the ability to bring ubiquitous computational intelligence to the world.
But for me, as I look back at the 30 years since Mathematica was launched, I am thankful for everything that’s allowed me to singlemindedly pursue the path that’s brought us to the Mathematica and Wolfram Language of today. And I look forward to our collective effort to move forward from this point, and to contribute to what I think will ultimately be seen as a crucial element in the development of technology and our world.
In a sense, you can view neural network regression as a kind of intermediary solution between true regression (where you have a fixed probabilistic model with some underlying parameters you need to find) and interpolation (where your goal is mostly to draw an eye-pleasing line between your data points). Neural networks can get you something from both worlds: the flexibility of interpolation and the ability to produce predictions with error bars like when you do regression.
For those of you who already know about neural networks, I can give a very brief hint as to how this works: you build a randomized neural network with dropout layers that you train like you normally would, but after training you don’t deactivate the dropout layers and keep using them to sample the network several times while making predictions to get a measure of the errors. Don’t worry if that sentence didn’t make sense to you yet, because I will explain all of this in more detail.
To start, let’s do some basic neural network regression on the following data I made by taking points on a bell curve (the function Exp[-x^2/2]) and adding random numbers to it:
exampleData = {{1.8290606952826973`, 0.34220332868351117`}, {0.6221091101205225`, 0.6029615713235724`},
   {1.2928624443456638`, 0.14264805848673934`}, {1.7383127604822395`, 0.09676233458358859`},
   {2.701795903782372`, 0.1256597483577385`}, {1.7400006797156493`, 0.07503425036465608`},
   {0.6367237544480613`, 0.8371547667282598`}, {2.482802633037993`, 0.04691691595492773`},
   {0.9566109777301293`, 0.3860569423794188`}, {2.551790012296368`, 0.037340684890464014`},
   {0.6626176509888584`, 0.7670620756823968`}, {2.865357628008809`, 0.1120949485036743`},
   {0.024445094773154707`, 1.3288343886644758`}, {2.6538667331049197`, 0.005468132072381475`},
   {1.1353110951218213`, 0.15366247144719652`}, {3.209853579579198`, 0.20621896435600656`},
   {0.13992534568622972`, 0.8204487134187859`}, {2.4013110392840886`, 0.26232722849881523`},
   {2.1199290467312526`, 0.09261482926621102`}, {2.210336371360782`, 0.02664895740254644`},
   {0.33732886898809156`, 1.1701573388517288`}, {2.2548343241910374`, 0.3576908508717164`},
   {1.4077788877461703`, 0.269393680956761`}, {3.210242875591371`, 0.21099679051999695`},
   {0.7898064016052615`, 0.6198835029596128`}, {2.1835077887328893`, 0.08410415228550497`},
   {0.008631687647122632`, 1.0501425654209409`}, {2.1792531502694334`, 0.11606480328877161`},
   {3.231947584552822`, 0.2359904673791076`}, {0.7980615888830211`, 0.5151437742866803`}};
plot = ListPlot[exampleData, PlotStyle -> Red]
A regression neural network is basically a chain of alternating linear and nonlinear layers: the linear layers give your net a lot of free parameters to work with, while the nonlinear layers make sure that things don’t get boring. Common examples of nonlinear layers are the hyperbolic tangent, logistic sigmoid and the ramp function. For simplicity, I will stick with the Ramp nonlinearity, which simply puts kinks into straight lines (meaning that you get regressions that are piecewise linear):
netRamp = NetChain[
   {LinearLayer[100], Ramp, LinearLayer[100], Ramp, LinearLayer[]},
   "Input" -> "Real", "Output" -> "Real"
];
trainedRamp = NetTrain[
   netRamp,
   <|"Input" -> exampleData[[All, 1]], "Output" -> exampleData[[All, 2]]|>,
   Method -> "ADAM", LossFunction -> MeanSquaredLossLayer[],
   TimeGoal -> 120, TargetDevice -> "GPU"
];
Show[
 Plot[trainedRamp[x], {x, -3.5, 3.5}, PlotLabel -> "Overtrained network"],
 plot, ImageSize -> Full, PlotRange -> All
]
As you can see, the network more or less just follows the points because it doesn’t understand the difference between the trend and the noise in the data. Toward the edges of the data range, the mix-up between trend and noise is particularly bad. The longer you train the network and the larger your linear layers, the stronger this effect will be. Obviously this is not what you want, since you’re really interested in fitting the trend of the data. Besides: if you really want to fit noise, you could just use interpolation instead. To prevent this overfitting of the data, you regularize the network (as explained in this tutorial) by using any or all of the following: a ValidationSet, L2 regularization or a DropoutLayer. I will focus on the L2 regularization coefficient λ2 and on dropout layers (in the next section you’ll see why), so let me briefly explain how they work:
To get a feeling for how these two methods regularize the regression, I made the following parameter sweeps of λ2 and pdrop:
log\[Lambda]List = Range[-5, -1];
regularizedNets = NetTrain[
     netRamp,
     <|"Input" -> exampleData[[All, 1]], "Output" -> exampleData[[All, 2]]|>,
     LossFunction -> MeanSquaredLossLayer[],
     Method -> {"ADAM", "L2Regularization" -> 10^#},
     TimeGoal -> 20
] & /@ log\[Lambda]List;
With[{xvals = Range[-3.5, 3.5, 0.1]},
 Show[
  ListPlot[
   TimeSeries[Transpose@Through[regularizedNets[xvals]], {xvals},
    ValueDimensions -> Length[regularizedNets]],
   PlotLabel -> "\!\(\*SubscriptBox[\(L\), \(2\)]\)-regularized networks",
   Joined -> True,
   PlotLegends -> Map[StringForm["`1` = `2`", Subscript[\[Lambda], 2], HoldForm[10^#]] &, log\[Lambda]List]
  ],
  plot, ImageSize -> 450, PlotRange -> All
 ]
]
pDropoutList = {0.0001, 0.001, 0.01, 0.05, 0.1, 0.5};
dropoutNets = NetChain[
     {LinearLayer[300], Ramp, DropoutLayer[#], LinearLayer[]},
     "Input" -> "Real", "Output" -> "Real"
] & /@ pDropoutList;
trainedDropoutNets = NetTrain[
     #,
     <|"Input" -> exampleData[[All, 1]], "Output" -> exampleData[[All, 2]]|>,
     LossFunction -> MeanSquaredLossLayer[],
     Method -> {"ADAM"},
     TimeGoal -> 20
] & /@ dropoutNets;
With[{xvals = Range[-3.5, 3.5, 0.1]},
 Show[
  ListPlot[
   TimeSeries[Transpose@Through[trainedDropoutNets[xvals]], {xvals},
    ValueDimensions -> Length[trainedDropoutNets]],
   PlotLabel -> "Dropout-regularized networks",
   Joined -> True,
   PlotLegends -> Map[StringForm["`1` = `2`", Subscript[p, drop], #] &, pDropoutList]
  ],
  plot, ImageSize -> 450, PlotRange -> All
 ]
]
To summarize: larger values of λ2 penalize large network weights and smooth the regression, while larger dropout probabilities pdrop make the network noisier and also smooth the result; push either too far and the fit becomes useless.
Both regularization methods mentioned previously were originally proposed as ad hoc solutions to the overfitting problem. However, recent work has shown that there are actually very good fundamental mathematical reasons why these methods work. Even more importantly, it has been shown that you can use them to do better than just produce a regression line! For those of you who are interested, I suggest reading this blog post by Yarin Gal. His thesis “Uncertainty in Deep Learning” is also well worth a look and is the main source for what follows in the rest of this post.
As it turns out, there is a link between stochastic regression neural networks and Gaussian processes, which are freeform regression methods that let you predict values and put error bands on those predictions. To do this, we need to consider neural network regression as a proper Bayesian inference procedure. Normally, Bayesian inference is quite computationally expensive, but as it conveniently turns out, you can do an approximate inference with minimal extra effort on top of what I already did above.
The basic idea is to use dropout layers to create a noisy neural network that is trained on the data as normal. However, I’m also going to use the dropout layers when doing predictions: for every value where I need a prediction, I will sample the network multiple times to get a sense of the errors in the predictions.
Furthermore, it’s good to keep in mind that you, as a newly converted Bayesian, are also dealing with priors. In particular, the network weights are now random variables with a prior distribution and a posterior distribution (i.e. the distributions before and after learning). This may sound rather difficult, so let me try to answer two questions you may have at this point:
Q1: Does that mean that I actually have to think hard about my prior now?
A1: No, not really, because it simply turns out that our old friend λ2, the regularization coefficient, is really just the inverse standard deviation of the prior on the network weights: if you choose a larger λ2, that means you’re only allowing small network weights.
Q2: So what about the posterior distribution of the weights? Don’t I have to integrate the predictions over the posterior weight distribution to get a posterior predictive distribution?
A2: Yes, you do, and that’s exactly what you do (at least approximately) when you sample the trained network with the dropout layers active. The sampling of the network is just a form of Monte Carlo integration over the posterior distribution.
So as you can see, being a Bayesian here really just means giving things a different name without having to change your way of doing things very much.
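In equations, the Monte Carlo integration from A2 looks like this (a sketch following Gal’s thesis; the number of forward passes T, the per-pass network outputs ŷ_t and the constant noise variance τ⁻¹ are my notation, not from the text above):

```latex
% Monte Carlo dropout: run T stochastic forward passes \hat{y}_1(x),\dots,\hat{y}_T(x)
% with the dropout layers still active, then estimate the predictive moments:
\mathbb{E}[y \mid x] \;\approx\; \frac{1}{T} \sum_{t=1}^{T} \hat{y}_t(x)

\operatorname{Var}[y \mid x] \;\approx\; \tau^{-1}
  \;+\; \frac{1}{T} \sum_{t=1}^{T} \hat{y}_t(x)^2
  \;-\; \left( \frac{1}{T} \sum_{t=1}^{T} \hat{y}_t(x) \right)^{2}
```

Here τ⁻¹ is a constant variance term fixed by the prior parameters (the length scale, λ2 and p); the sampling code later in this post computes exactly these two moments.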
Let’s start with the simplest type of regression in which the noise level of the data is assumed constant across the x axis. This is also called homoscedastic regression (as opposed to heteroscedastic regression, where the noise is a function of x). It does not, however, mean that the prediction error will also be constant: the prediction error depends on the noise level but also on the uncertainty in the network weights.
So let’s get to it and see how this works out, shall we? First I will define my network with a dropout layer. Normally you’d put a dropout layer before every linear layer, but since the input is just a number, I’m omitting the first dropout layer:
\[Lambda]2 = 0.01;
pdrop = 0.1;
nUnits = 300;
activation = Ramp;
net = NetChain[
  {LinearLayer[nUnits], ElementwiseLayer[activation], DropoutLayer[pdrop], LinearLayer[]},
  "Input" -> "Real", "Output" -> "Real"
]
trainedNet = NetTrain[
   net,
   <|"Input" -> exampleData[[All, 1]], "Output" -> exampleData[[All, 2]]|>,
   LossFunction -> MeanSquaredLossLayer[],
   Method -> {"ADAM", "L2Regularization" -> \[Lambda]2},
   TimeGoal -> 10
];
Next, we need to produce predictions from this model. To calibrate the model, you need to provide a prior length scale l that expresses your belief in how correlated the data is over a distance (just like in Gaussian process regression). Together with the regularization coefficient λ2, the dropout probability p and the number of training data points N, you have to add the following variance to the sample variance of the network: σ² = 2 λ2 N / (l² (1 − p)).
The following function takes a trained net and samples it multiple times with the dropout layers active (using NetEvaluationMode → "Train"). It then constructs a time series object of the –1, 0 and +1σ bands of the predictions:
sampleNet[net : (_NetChain | _NetGraph), xvalues_List, sampleNumber_Integer?Positive,
   {lengthScale_, l2reg_, prob_, nExample_}] := TimeSeries[
   Map[
    With[{
       mean = Mean[#],
       stdv = Sqrt[Variance[#] + (2 l2reg nExample)/(lengthScale^2 (1 - prob))]
      },
      mean + stdv*{-1, 0, 1}
    ] &,
    Transpose@Select[
      Table[net[xvalues, NetEvaluationMode -> "Train"], {i, sampleNumber}],
      ListQ
    ]
   ],
   {xvalues},
   ValueDimensions -> 3
];
Now we can go ahead and plot the predictions with 1σ error bands. The prior seems to work reasonably well, though in real applications you’d need to calibrate it with a validation set (just like you would with λ2 and p).
l = 2;
samples = sampleNet[trainedNet, Range[-5, 5, 0.05], 200, {l, \[Lambda]2, pdrop, Length[exampleData]}];
Show[
 ListPlot[samples, Joined -> True, Filling -> {1 -> {2}, 3 -> {2}},
  PlotStyle -> {Lighter[Blue], Blue, Lighter[Blue]}],
 ListPlot[exampleData, PlotStyle -> Red],
 ImageSize -> 600, PlotRange -> All
]
As you can see, the network has a tendency to do linear extrapolation due to my choice of the ramp nonlinearity. Picking different nonlinearities will lead to different extrapolation behaviors. In terms of Gaussian process regression, the choice of your network design influences the effective covariance kernel you’re using.
If you’re curious to see how the different network parameters influence the look of the regression, skip down a few paragraphs and try the manipulates, where you can interactively train your own network on data you can edit on the fly.
In heteroscedastic regression, you let the neural net try and find the noise level for itself. This means that the regression network outputs two numbers instead of one: a mean and a standard deviation. However, since the outputs of the network are real numbers, it’s easier to work with the log-precision log τ = log(1/σ²) instead of the standard deviation σ:
\[Lambda]2 = 0.01; pdrop = 0.1; nUnits = 300; activation = Ramp;
regressionNet = NetGraph[
  {LinearLayer[nUnits], ElementwiseLayer[activation], DropoutLayer[pdrop], LinearLayer[], LinearLayer[]},
  {
   NetPort["Input"] -> 1 -> 2 -> 3,
   3 -> 4 -> NetPort["Mean"],
   3 -> 5 -> NetPort["LogPrecision"]
  },
  "Input" -> "Real", "Mean" -> "Real", "LogPrecision" -> "Real"
]
Next, instead of using a MeanSquaredLossLayer to train the network, you minimize the negative log-likelihood of the observed data. Again, you replace σ with the log of the precision and multiply everything by 2 to be in agreement with the convention of MeanSquaredLossLayer.
FullSimplify[
 -2*LogLikelihood[NormalDistribution[\[Mu], \[Sigma]], {yobs}] /.
   \[Sigma] -> 1/Sqrt[Exp[log\[Tau]]],
 Assumptions -> log\[Tau] \[Element] Reals]
Discarding the constant term gives us the following loss:
loss = Function[{y, mean, logPrecision},
   (y - mean)^2*Exp[logPrecision] - logPrecision
   ];
netHetero = NetGraph[
  <|"reg" -> regressionNet, "negLoglikelihood" -> ThreadingLayer[loss]|>,
  {
   NetPort["x"] -> "reg",
   {NetPort["y"], NetPort[{"reg", "Mean"}], NetPort[{"reg", "LogPrecision"}]} -> "negLoglikelihood" -> NetPort["Loss"]
  },
  "y" -> "Real", "Loss" -> "Real"
]
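A quick numeric sanity check of this loss, as a plain Python sketch with my own naming: for a fixed squared error, the loss is minimized when the predicted precision equals 1/(y − mean)², i.e. when the network reports exactly the observed noise level.

```python
import math

def nll_loss(y, mean, log_precision):
    # 2x Gaussian negative log-likelihood, constant term dropped:
    # (y - mean)^2 * exp(log_precision) - log_precision
    return (y - mean) ** 2 * math.exp(log_precision) - log_precision

# setting the derivative w.r.t. log_precision to zero gives
# exp(log_precision) = 1 / (y - mean)^2
y, mean = 1.0, 0.5
best = -math.log((y - mean) ** 2)
```

Nudging the log-precision in either direction away from `best` should increase the loss, which is easy to verify numerically.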
trainedNetHetero = NetTrain[
  netHetero,
  <|"x" -> exampleData[[All, 1]], "y" -> exampleData[[All, 2]]|>,
  LossFunction -> "Loss",
  Method -> {"ADAM", "L2Regularization" -> \[Lambda]2}
];
Again, the predictions are sampled multiple times. The predictive variance is now the sum of the variance of the predicted means and the mean of the predicted noise variances. The priors no longer influence the variance directly, but only through the network training:
sampleNetHetero[net : (_NetChain | _NetGraph), xvalues_List, sampleNumber_Integer?Positive] :=
 With[{regressionNet = NetExtract[net, "reg"]},
  TimeSeries[
   With[{
     samples = Select[Table[regressionNet[xvalues, NetEvaluationMode -> "Train"], {i, sampleNumber}], AssociationQ]
     },
    With[{
      mean = Mean[samples[[All, "Mean"]]],
      stdv = Sqrt[Variance[samples[[All, "Mean"]]] + Mean[Exp[-samples[[All, "LogPrecision"]]]]]
      },
     Transpose[{mean - stdv, mean, mean + stdv}]
    ]
   ],
   {xvalues},
   ValueDimensions -> 3
  ]
 ];
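The same combination can be sketched in pure Python (names are mine): by the law of total variance, the spread of the sampled means and the average predicted noise variance simply add.

```python
import math
import statistics

def predictive_bands(means, variances):
    """Combine per-sample predicted means and noise variances:
    total variance = Var[predicted means] + E[predicted noise variance]."""
    mu = statistics.fmean(means)
    stdv = math.sqrt(statistics.variance(means) + statistics.fmean(variances))
    return (mu - stdv, mu, mu + stdv)
```

This mirrors the −1σ / mean / +1σ triple that the Wolfram function above packs into a TimeSeries.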
Now you can plot the predictions with 1σ error bands:
samples = sampleNetHetero[trainedNetHetero, Range[-5, 5, 0.05], 200];
Show[
 ListPlot[samples, Joined -> True, Filling -> {1 -> {2}, 3 -> {2}},
  PlotStyle -> {Lighter[Blue], Blue, Lighter[Blue]}],
 ListPlot[exampleData, PlotStyle -> Red],
 ImageSize -> 600, PlotRange -> All
]
Of course, it’s still necessary to do validation of this network; one network architecture might be much better suited to the data at hand than another, so there is still the need to use validation sets to decide which model you have to use and with what parameters. Attached to the end of this blog post, you’ll find a notebook with an interactive demo of the regression method I just showed. With this code, you can find out for yourself how the different model parameters influence the predictions of the network.
The code in this section shows how to implement the loss function described in the paper “Dropout Inference in Bayesian Neural Networks with Alpha-Divergences” by Li and Gal. For an interpretation of the α parameter used in this work, see e.g. figure 2 in “Black-Box α-Divergence Minimization” by Hernández-Lobato et al. (2016).
In the paper by Li and Gal, the authors propose a modified loss function ℒ for a stochastic neural network to solve a weakness of the standard loss function I used above: it tends to underfit the posterior and give overly optimistic predictions. Optimistic predictions are a problem: when you fit your data to try and get a sense of what the real world might give you, you don’t want to be thrown a curveball afterwards.
During training, the training inputs xᵢ (with i indexing the training examples) are fed through the network K times to sample the outputs ŷᵢ,ₖ and compared to the training outputs yᵢ. Given a particular standard loss function l (e.g. mean square error, negative log-likelihood, cross-entropy) and regularization function r(θ) for the weights θ, the modified loss function ℒ is given as:

ℒ(θ) = −(1/α) Σᵢ [ LogSumExp₍ₖ₌₁…K₎( −α l(yᵢ, ŷᵢ,ₖ) ) − log K ] + r(θ)
The parameter α is the divergence parameter, which is typically tuned to 0.5 (though you can pick other values as well, if you want). It can be thought of as a “pessimism” parameter: the higher it is, the more the network will tend to err on the side of caution and produce larger error estimates. Practically speaking, a higher α makes the loss function more lenient to the presence of large losses among the K samples, meaning that after training the network will produce a larger spread of predictions when sampled. The literature suggests that α = 0.5 is a pretty good value to start with. In the limit α→0, the LogSumExp simply becomes the sample average over the K losses.
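Both limiting behaviors are easy to check with a small pure-Python sketch of the α-weighted LogSumExp term for one training example (function name is mine):

```python
import math

def alpha_loss(losses, alpha):
    # -(1/alpha) * log( (1/K) * sum_k exp(-alpha * l_k) ),
    # computed with the max factored out for numerical stability
    m = max(-alpha * l for l in losses)
    total = sum(math.exp(-alpha * l - m) for l in losses)
    return -(m + math.log(total / len(losses))) / alpha
```

For tiny α this reduces to the plain average of the K losses; as α grows, the outcome is dominated by the smallest per-sample losses, i.e. the loss becomes more lenient toward the occasional large one.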
As can be seen, we need to sample the network several times during training. We can accomplish this with NetMapOperator. As a simple example, suppose we want to apply a dropout layer 10 times to the same input. To do this, we duplicate the input and then wrap a NetMapOperator around the dropout layer and map it over the duplicated input:
input = Range[5];
NetChain[{
   ReplicateLayer[10],
   NetMapOperator[DropoutLayer[0.5]]
   }][input, NetEvaluationMode -> "Train"]
Next, define a net that will try to fit the data points with a normal distribution like in the previous heteroscedastic example. The output of the net is now a length-2 vector containing the mean and the log-precision (we can’t have two output ports, because we’re going to have to wrap the whole thing in NetMapOperator):
alpha = 0.5; pdrop = 0.1; units = 300; activation = Ramp;
\[Lambda]2 = 0.01; (* L2 regularization coefficient *)
k = 25; (* number of samples of the network for calculating the loss *)
regnet = NetInitialize@NetChain[{
    LinearLayer[units],
    ElementwiseLayer[activation],
    DropoutLayer[pdrop],
    LinearLayer[]
    },
   "Input" -> "Real", "Output" -> {2}
   ];
You will also need a network element to calculate the LogSumExp operator that aggregates the losses of the different samples of the regression network. I implemented the α-weighted LogSumExp by factoring out the largest term before feeding the vector into the exponent, to make it more numerically stable. Note that I’m ignoring the constant log K term, since it doesn’t matter for the purpose of training the network.
logsumexp\[Alpha][alpha_] := NetGraph[
   <|
    "timesAlpha" -> ElementwiseLayer[Function[-alpha #]],
    "max" -> AggregationLayer[Max, 1],
    "rep" -> ReplicateLayer[k],
    "sub" -> ThreadingLayer[Subtract],
    "expAlph" -> ElementwiseLayer[Exp],
    "sum" -> SummationLayer[],
    "logplusmax" -> ThreadingLayer[Function[{sum, max}, Log[sum] + max]],
    "invalpha" -> ElementwiseLayer[Function[-(#/alpha)]]
    |>,
   {
    NetPort["Input"] -> "timesAlpha",
    "timesAlpha" -> "max" -> "rep",
    {"timesAlpha", "rep"} -> "sub" -> "expAlph" -> "sum",
    {"sum", "max"} -> "logplusmax" -> "invalpha"
    },
   "Input" -> {k}
   ];
logsumexp\[Alpha][alpha]
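The max-factoring trick itself is standard; a standalone Python version (my own naming) shows why it matters. With large arguments a naive `exp` overflows, while the factored form never exponentiates a positive number:

```python
import math

def logsumexp(xs):
    # log(sum(exp(x))) with the largest term factored out:
    # log(sum(exp(x))) = m + log(sum(exp(x - m))), m = max(xs)
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))
```

A direct `math.log(sum(math.exp(x) for x in xs))` would raise an OverflowError already at inputs around 710, whereas the factored form handles them exactly.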
Define the network that will be used for training:
net\[Alpha][alpha_] := NetGraph[
   <|
    "rep1" -> ReplicateLayer[k], (* replicate the inputs and outputs of the network *)
    "rep2" -> ReplicateLayer[k],
    "map" -> NetMapOperator[regnet],
    "mean" -> PartLayer[{All, 1}],
    "logprecision" -> PartLayer[{All, 2}],
    "loss" -> ThreadingLayer[
      Function[{mean, logprecision, y}, (mean - y)^2*Exp[logprecision] - logprecision]],
    "logsumexp" -> logsumexp\[Alpha][alpha]
    |>,
   {
    NetPort["x"] -> "rep1" -> "map",
    "map" -> "mean",
    "map" -> "logprecision",
    NetPort["y"] -> "rep2",
    {"mean", "logprecision", "rep2"} -> "loss" -> "logsumexp" -> NetPort["Loss"]
    },
   "x" -> "Real", "y" -> "Real"
   ];
net\[Alpha][alpha]
… and train it:
trainedNet\[Alpha] = NetTrain[
   net\[Alpha][alpha],
   <|"x" -> exampleData[[All, 1]], "y" -> exampleData[[All, 2]]|>,
   LossFunction -> "Loss",
   Method -> {"ADAM", "L2Regularization" -> \[Lambda]2},
   TargetDevice -> "CPU",
   TimeGoal -> 60
   ];
sampleNet\[Alpha][net : (_NetChain | _NetGraph), xvalues_List, nSamples_Integer?Positive] :=
 With[{regnet = NetExtract[net, {"map", "Net"}]},
  TimeSeries[
   Map[
    With[{
       mean = Mean[#[[All, 1]]],
       stdv = Sqrt[Variance[#[[All, 1]]] + Mean[Exp[-#[[All, 2]]]]]
       },
      mean + stdv*{-1, 0, 1}
      ] &,
    Transpose@Select[Table[regnet[xvalues, NetEvaluationMode -> "Train"], {i, nSamples}], ListQ]],
   {xvalues},
   ValueDimensions -> 3
  ]
 ];
samples = sampleNet\[Alpha][trainedNet\[Alpha], Range[-5, 5, 0.05], 200];
Show[
 ListPlot[samples, Joined -> True, Filling -> {1 -> {2}, 3 -> {2}},
  PlotStyle -> {Lighter[Blue], Blue, Lighter[Blue]}],
 ListPlot[exampleData, PlotStyle -> Red],
 ImageSize -> 600, PlotRange -> All
]
I’ve discussed that dropout layers and the regularization coefficient in neural network training can actually be seen as components of a Bayesian inference procedure that approximates Gaussian process regression. By simply training a network with dropout layers like normal and then running the network several times in NetEvaluationMode → "Train", you can get an estimate of the predictive posterior distribution, which not only includes the noise inherent in the data but also the uncertainty in the trained network weights.
If you’d like to learn more about this material or have any questions you’d like to ask, please feel free to visit my discussion on Wolfram Community.
Recognizing words is one of the simplest tasks a human can do, yet it has proven extremely difficult for machines to achieve similar levels of performance. Things have changed dramatically with the ubiquity of machine learning and neural networks, though: the performance achieved by modern techniques is vastly better than the results from just a few years ago. In this post, I’m excited to show a reduced but practical and educational version of the speech recognition problem—the assumption is that we’ll consider only a limited set of words. This has two main advantages: first of all, we have easy access to a dataset through the Wolfram Data Repository (the Spoken Digit Commands dataset), and, maybe most importantly, all of the classifiers/networks I’ll present can be trained in a reasonable time on a laptop.
It’s been about two years since the initial introduction of the Audio object into the Wolfram Language, and we are thrilled to see so many interesting applications of it. One of the main additions to Version 11.3 of the Wolfram Language was tight integration of Audio objects into our machine learning and neural net framework, and this will be a cornerstone in all of the examples I’ll be showing today.
Without further ado, let’s squeeze out as much information as possible from the Spoken Digit Commands dataset!
Let’s get started by accessing and inspecting the dataset a bit:
ro=ResourceObject["Spoken Digit Commands"] 
The dataset is a subset of the Speech Commands dataset released by Google. We wanted to have a “spoken MNIST,” which would let us produce small, self-contained examples of machine learning on audio signals. Since the Spoken Digit Commands dataset is a ResourceObject, it’s easy to get all the training and testing data within the Wolfram Language:
trainingData=ResourceData[ro,"TrainingData"]; testingData=ResourceData[ro,"TestData"]; RandomSample[trainingData,3]//Dataset 
One important thing we made sure of is that the speakers in the training and testing sets are different. This means that in the testing phase, the trained classifier/network will encounter speakers that it has never heard before.
Intersection[trainingData[[All,"SpeakerID"]],testingData[[All,"SpeakerID"]]] 
The possible output values are the digits from 0 to 9:
classes=Union[trainingData[[All,"Output"]]] 
Conveniently, the length of all the input data is between 0.5 and 1 seconds, with the majority of the signals being one second long:
Dataset[trainingData][Histogram[#, ScalingFunctions -> "Log"] &@*Duration, "Input"]
In Version 11.3, we built a collection of audio encoders in NetEncoder and properly integrated it into the rest of the machine learning and neural net framework. Now we can seamlessly extract features from a large collection of audio recordings; inject them into a net; and train, test and evaluate networks for a variety of applications.
Since there are multiple features that one might want to extract from an audio signal, we decided that it was a good idea to have one encoder per feature rather than a single generic "Audio" one. Here is the full list:
• "Audio"
• "AudioSTFT"
• "AudioSpectrogram"
• "AudioMelSpectrogram"
• "AudioMFCC"
The first step (which is common in all encoders) is the preprocessing: the signal is reduced to a single channel, resampled to a fixed sample rate and can be padded or trimmed to a specified duration.
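In Python terms, that common preprocessing step amounts to something like the following sketch (resampling omitted for brevity; the function and names are my own, not the encoder's actual implementation):

```python
def preprocess(channels, target_length):
    """Down-mix a multi-channel signal to mono by averaging the channels,
    then zero-pad or trim to a fixed number of samples."""
    mono = [sum(samples) / len(samples) for samples in zip(*channels)]
    if len(mono) < target_length:
        mono += [0.0] * (target_length - len(mono))
    return mono[:target_length]
```

For example, a three-sample stereo signal padded to five samples comes out as five mono samples, and trimmed to two samples comes out as two.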
The simplest one is NetEncoder["Audio"], which just returns the raw waveform:
encoder=NetEncoder["Audio"] 
encoder[RandomChoice[trainingData]["Input"]]//Flatten//ListLinePlot 
The starting point for all of the other audio encoders is the short-time Fourier transform, where the signal is partitioned into (potentially overlapping) chunks, and the Fourier transform is computed on each of them. This way we get both time information (since each chunk is at a very specific time) and frequency information (thanks to the Fourier transform). We can visualize this process by using the Spectrogram function:
a=AudioGenerator[{"Sin",TimeSeries[{{0,1000},{1,4000}}]},2]; Spectrogram[a] 
The main parameters for this operation that are common to all of the frequency domain features are WindowSize and Offset, which control the sizes of the chunks and their offsets.
Each NetEncoder supports the "TargetLength" option. If this is set to a specific number, the input audio will be trimmed or padded to the correct duration; otherwise, the length of the output of the NetEncoder will depend on the length of the original signal.
For the scope of this blog post, I’ll be using the "AudioMFCC" NetEncoder, since it is a feature that packs a lot of information about the signal while keeping the dimensionality low:
encoder = NetEncoder[{"AudioMFCC", "TargetLength" -> All, "SampleRate" -> 16000,
   "WindowSize" -> 1024, "Offset" -> 570, "NumberOfCoefficients" -> 28, "Normalization" -> True}]
encoder[RandomChoice[trainingData]["Input"]] // Transpose // MatrixPlot
As I mentioned at the beginning, these encoders are quite fast: this specific one on my not-very-new machine runs through all 10,000 examples in slightly more than two seconds:
encoder[trainingData[[All,"Input"]]];//AbsoluteTiming 
Now we have the data and an efficient way of extracting features. Let’s find out what Classify can do for us.
To start, let’s massage our data into a format that Classify would be happier with:
classifyTrainingData = #Input -> #Output & /@ trainingData;
classifyTestingData = #Input -> #Output & /@ testingData;
Classify does have some trouble dealing with variable-length sequences (which hopefully will be improved on soon), so we’ll have to find ways to work around that.
To make the problem simpler, we can get rid of the variable length of the features. One naive way is to compute the mean of the sequence:
cl = Classify[classifyTrainingData, FeatureExtractor -> (Mean@*encoder), PerformanceGoal -> "Quality"];
The result is a bit disheartening, but not unexpected, since we are trying to summarize each signal with only 28 parameters:
cm=ClassifierMeasurements[cl,classifyTestingData]; cm["Accuracy"] cm["ConfusionMatrixPlot"] 
To improve the results of Classify, we can feed it more information about the signal by adding the standard deviation of each sequence as well:
cl = Classify[classifyTrainingData,
   FeatureExtractor -> (Flatten[{Mean[#], StandardDeviation[#]}] &@*encoder),
   PerformanceGoal -> "Quality"];
Some effort does pay off:
cm=ClassifierMeasurements[cl,classifyTestingData]; cm["Accuracy"] cm["ConfusionMatrixPlot"] 
We can follow this strategy a bit more, and also add the Kurtosis of the sequence:
cl = Classify[classifyTrainingData,
   FeatureExtractor -> (Flatten[{Mean[#], StandardDeviation[#], Kurtosis[#]}] &@*encoder),
   PerformanceGoal -> "Quality"];
The improvement is not as huge, but it is there:
cm=ClassifierMeasurements[cl,classifyTestingData]; cm["Accuracy"] cm["ConfusionMatrixPlot"] 
We could keep adding statistical information about the sequences, with smaller and smaller returns. But with this specific dataset, we can follow a simpler strategy: remember how we noticed that most recordings were about 1 second long? That means that if we fix the length of the extracted feature to the equivalent of 1 second (about 28 frames) using the "TargetLength" option, the encoder will take care of doing the padding or trimming as appropriate. This way, all the inputs to Classify will have the same dimensions of {28,28}:
encoderFixed = NetEncoder[{"AudioMFCC", "TargetLength" -> 28, "SampleRate" -> 16000,
   "WindowSize" -> 1024, "Offset" -> 570, "NumberOfCoefficients" -> 28, "Normalization" -> True}]
cl = Classify[classifyTrainingData, FeatureExtractor -> encoderFixed, PerformanceGoal -> "DirectTraining"];
The training time is longer, but we do still get an accuracy bump:
cm=ClassifierMeasurements[cl,classifyTestingData]; cm["Accuracy"] cm["ConfusionMatrixPlot"] 
This is about as far as we can get with Classify and low-level features. Time to ditch the automation and bring out the neural network machinery!
Let’s remember that we’re playing with a spoken version of MNIST, so what could be a better starting place than LeNet? This is a network that is often used as a benchmark on the standard image MNIST, and it is very fast to train (even without a GPU).
We’ll use the same strategy as in the last Classify example: we’ll fix the length of the signals to about one second, and we’ll tune the parameters of the NetEncoder so that the input will have the same dimensions as the MNIST images. This is one of the reasons we can confidently use a CNN architecture for this job: we are dealing with 2D matrices (images, in essence—actually, that’s how we usually look at MFCC), and we want the network to infer information from their structures.
Let’s grab LeNet from NetModel:
lenet=NetModel["LeNet Trained on MNIST Data","UninitializedEvaluationNet"] 
Since the "AudioMFCC" NetEncoder produces two-dimensional data (time × frequency), and the net requires three-dimensional inputs (where the first dimension is the channel dimension), we can use ReplicateLayer to make them compatible:
lenet=NetPrepend[lenet,ReplicateLayer[1]] 
Using NetReplacePart, we can attach the "AudioMFCC" NetEncoder to the input and the appropriate NetDecoder to the output:
audioLeNet = NetReplacePart[lenet, {
   "Input" -> NetEncoder[{"AudioMFCC", "TargetLength" -> 28, "SampleRate" -> 16000,
      "WindowSize" -> 1024, "Offset" -> 570, "NumberOfCoefficients" -> 28, "Normalization" -> True}],
   "Output" -> NetDecoder[{"Class", classes}]
   }]
To speed up convergence and prevent overfitting, we can use NetReplace to add a BatchNormalizationLayer after every convolution:
audioLeNet = NetReplace[audioLeNet, {x_ConvolutionLayer :> NetChain[{x, BatchNormalizationLayer[]}]}]
NetInformation allows us to visualize at a glance the net’s structure:
NetInformation[audioLeNet,"SummaryGraphic"] 
Now our net is ready for training! After defining a validation set on 5% of the training data, we can let NetTrain worry about all hyperparameters:
resultObject = NetTrain[
  audioLeNet,
  trainingData,
  All,
  ValidationSet -> Scaled[.05]
]
Seems good! Now we can use ClassifierMeasurements on the net to measure the performance:
cm=ClassifierMeasurements[resultObject["TrainedNet"],classifyTestingData]; cm["Accuracy"] cm["ConfusionMatrixPlot"] 
It looks like the added effort paid off!
We can also embrace the variable-length nature of the problem by specifying "TargetLength" → All in the encoder:
encoder = NetEncoder[{"AudioMFCC", "TargetLength" -> All, "NumberOfCoefficients" -> 28,
   "SampleRate" -> 16000, "WindowSize" -> 1024, "Offset" -> 571, "Normalization" -> True}]
This time we’ll use an architecture based on the GatedRecurrentLayer. Used on its own, it returns its state for each time step, but we are only interested in the classification of the entire sequence, i.e. we want a single output for all time steps. We can use SequenceLastLayer to extract the last state of the sequence. After that, we can add a couple of fully connected layers to do the classification:
rnn = NetChain[{
    GatedRecurrentLayer[32, "Dropout" -> {"VariationalInput" -> 0.3}],
    GatedRecurrentLayer[64, "Dropout" -> {"VariationalInput" -> 0.3}],
    SequenceLastLayer[],
    LinearLayer[64],
    Ramp,
    LinearLayer[Length@classes],
    SoftmaxLayer[]},
   "Input" -> encoder,
   "Output" -> NetDecoder[{"Class", classes}]
   ]
Again, we’ll let NetTrain worry about all hyperparameters:
resultObjectRNN = NetTrain[
  rnn,
  trainingData,
  All,
  ValidationSet -> Scaled[.05]
]
… and measure the performance:
cm=ClassifierMeasurements[resultObjectRNN["TrainedNet"],classifyTestingData]; cm["Accuracy"] cm["ConfusionMatrixPlot"] 
It seems that treating the input as a pure sequence and letting the network figure out how to extract meaning from it works quite well!
Now that we have some trained networks, we can play with them a bit. First of all, let’s take the recurrent network and chop off the last two layers:
choppedNet=NetTake[resultObjectRNN["TrainedNet"],{1,5}] 
This leaves us with something that produces a vector of 64 numbers for each input signal. We can try to use this chopped network as a feature extractor and plot the results:
FeatureSpacePlot[
 Style[#["Input"], ColorData[97][#["Output"] + 1]] -> #["Output"] & /@ testingData,
 FeatureExtractor -> choppedNet]
It looks like the various classes get properly separated!
We can also record a signal, and test the trained network on it:
a=AudioTrim@AudioCapture[] 
resultObjectRNN["TrainedNet"][a] 
We can attempt something more adventurous on this dataset: up until now, we have simply done classification (a sequence goes in, a single class comes out). What if we tried transduction: a sequence (the MFCC features) goes in, and another sequence (the characters) comes out?
First of all, let’s add string labels to our data:
labels = <|0 -> "zero", 1 -> "one", 2 -> "two", 3 -> "three", 4 -> "four",
   5 -> "five", 6 -> "six", 7 -> "seven", 8 -> "eight", 9 -> "nine"|>;
trainingDataString = Append[#, "Target" -> labels[#Output]] & /@ trainingData;
testingDataString = Append[#, "Target" -> labels[#Output]] & /@ testingData;
We need to remember that once trained, this will not be a general speech-recognition network: it will only have been exposed to one word at a time, to a limited set of characters and to only 10 words!
Union[Flatten@Characters@Values@labels]//Sort 
A recurrent architecture would output a sequence of the same length as the input, which is not what we want. Luckily, we can use the CTCBeamSearch NetDecoder to take care of this. Say that the input sequence is n steps long, and the decoding has m different classes: the NetDecoder will expect an input of dimensions n × (m + 1) (there are m possible states, plus a special blank character). Given this information, the decoder will find the most likely sequence of states by collapsing all of the repeated states that are not separated by the blank symbol.
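The collapsing step of CTC decoding is easy to sketch in Python (this is the greedy, per-frame best path rather than a beam search; the function name is mine):

```python
def ctc_collapse(path, blank="_"):
    """Merge consecutive repeated labels, then drop the blank symbol.
    Repeats that are separated by a blank survive as distinct characters."""
    out, prev = [], None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)
```

For example, the frame-level path "ss_iii_xx" collapses to "six", while "s_s" keeps both s characters because the blank separates them.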
Another difference from the previous architecture will be the use of NetBidirectionalOperator. This operator applies a net to a sequence and to its reverse, concatenating both results into one single output sequence:
net = NetGraph[{
    NetBidirectionalOperator@GatedRecurrentLayer[64, "Dropout" -> {"VariationalInput" -> 0.4}],
    NetBidirectionalOperator@GatedRecurrentLayer[64, "Dropout" -> {"VariationalInput" -> 0.4}],
    NetMapOperator[{LinearLayer[128], Ramp, LinearLayer[], SoftmaxLayer[]}]},
   {NetPort["Input"] -> 1 -> 2 -> 3 -> NetPort["Target"]},
   "Input" -> NetEncoder[{"AudioMFCC", "TargetLength" -> All, "NumberOfCoefficients" -> 28,
      "SampleRate" -> 16000, "WindowSize" -> 1024, "Offset" -> 571, "Normalization" -> True}],
   "Target" -> NetDecoder[{"CTCBeamSearch", Alphabet[]}]]
To train the network, we need a way to compute the loss that takes the decoding into account. This is what the CTCLossLayer is for:
trainedCTC = NetTrain[net, trainingDataString,
   LossFunction -> CTCLossLayer["Target" -> NetEncoder[{"Characters", Alphabet[]}]],
   ValidationSet -> Scaled[.05], MaxTrainingRounds -> 20];
Let’s pick a random example from the test set:
a=RandomChoice@testingDataString 
Look at how the trained network behaves:
trainedCTC[a["Input"]] 
We can also look at the output of the net just before the CTC decoding takes place. This represents the probability of each character at each time step:
probabilities = NetReplacePart[trainedCTC, "Target" -> None][a["Input"]];
ArrayPlot[Transpose@probabilities, DataReversed -> True,
 FrameTicks -> {Thread[{Range[26], Alphabet[]}], None}]
We can also show these probabilities superimposed on the spectrogram of the signal:
Show[{ArrayPlot[Transpose@probabilities, DataReversed -> True,
   FrameTicks -> {Thread[{Range[26], Alphabet[]}], None}],
  Graphics@{Opacity[.5],
    Spectrogram[a["Input"], DataRange -> {{0, Length[probabilities]}, {0, 27}}, PlotRange -> All][[1]]}}]
There is definitely the possibility that the network will make small spelling mistakes (e.g. “sixo” instead of “six”). We can visually inspect these spelling mistakes by applying the net to all classes and getting a WordCloud for each of them:
WordCloud[StringJoin/@trainedCTC[#[[All,"Input"]]]]&/@GroupBy[testingDataString,Last] 
Most of these spelling mistakes are quite small, and a simple Nearest function might be enough to correct them:
nearest=First@*Nearest[Values@labels]; nearest["sixo"] 
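A plausible stand-in for this correction step in Python is a Levenshtein edit distance plus a minimum search (my own implementation, not how Nearest works internally):

```python
def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def nearest_label(word, labels):
    # snap a possibly misspelled word to the closest known label
    return min(labels, key=lambda label: edit_distance(word, label))

digits = ["zero", "one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine"]
```

With this, "sixo" is one edit away from "six" and gets corrected accordingly.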
To measure the performance of the net and the Nearest function, first we need to define a function that, given an output of the net (a list of characters), computes the probability for each class:
probs = AssociationThread[Values[labels] -> 0];
getProbabilities[chars : {___String}] := Append[probs, nearest[StringJoin[chars]] -> 1]
Let’s check that it works:
getProbabilities[{"s","i","x","o"}] getProbabilities[{"f","o","u","r"}] 
Now we can use ClassifierMeasurements by giving an association of probabilities and the correct label for each example as input:
cm=ClassifierMeasurements[getProbabilities/@trainedCTC[testingDataString[[All,"Input"]]],testingDataString[[All,"Target"]]] 
The accuracy is quite high!
cm["Accuracy"] cm["ConfusionMatrixPlot"] 
Up until now, the architectures we have been experimenting with are fairly straightforward. We can now attempt to do something more ambitious: an encoder/decoder architecture. The basic idea is that we’ll have two main components in the net: the encoder, whose job is to encode all the information about the input features into a single vector (of 128 elements, in our case); and the decoder, which will take this vector (the “encoded” version of the input) and be able to produce a “translation” of it as a sequence of characters.
Let’s define the NetEncoder that will deal with the strings:
targetEnc = NetEncoder[{"Characters", {Alphabet[], {StartOfString, EndOfString} -> Automatic}, "UnitVector"}]
… and the one that will deal with the Audio objects:
inputEnc = NetEncoder[{"AudioMFCC", "TargetLength" -> All, "NumberOfCoefficients" -> 28,
   "SampleRate" -> 16000, "WindowSize" -> 1024, "Offset" -> 571, "Normalization" -> True}]
Our encoder network will consist of a single GatedRecurrentLayer and a SequenceLastLayer to extract the last state, which will become our encoded representation of the input signal:
encoderNet = NetChain[{GatedRecurrentLayer[128, "Dropout" -> {"VariationalInput" -> 0.3}], SequenceLastLayer[]}]
The decoder network will take a vector of 128 elements and a sequence of vectors as input, and will return a sequence of vectors:
decoderNet = NetGraph[{
    SequenceMostLayer[],
    GatedRecurrentLayer[128, "Dropout" -> {"VariationalInput" -> 0.3}],
    NetMapOperator[LinearLayer[]],
    SoftmaxLayer[]},
   {NetPort["Input"] -> 1 -> 2 -> 3 -> 4,
    NetPort["State"] -> NetPort[2, "State"]}
   ]
We then need to define a network to train the encoder and decoder. This configuration is usually called a “teacher forcing” network:
teacherForcingNet = NetGraph[
  <|"encoder" -> encoderNet, "decoder" -> decoderNet,
    "loss" -> CrossEntropyLossLayer["Probabilities"], "rest" -> SequenceRestLayer[]|>,
  {NetPort["Input"] -> "encoder" -> NetPort["decoder", "State"],
   NetPort["Target"] -> NetPort["decoder", "Input"],
   "decoder" -> NetPort["loss", "Input"],
   NetPort["Target"] -> "rest" -> NetPort["loss", "Target"]},
  "Input" -> inputEnc, "Target" -> targetEnc]
Using NetInformation, we can look at the whole structure with one glance:
NetInformation[teacherForcingNet,"FullSummaryGraphic"] 
The idea is that the decoder is presented with the encoded input and most of the target, and its job is to predict the next character. We can now go ahead and train the net:
trainedEncDec = NetTrain[teacherForcingNet, trainingDataString, ValidationSet -> Scaled[.05]]
Now let’s inspect what happened. First of all, we have a trained encoder:
trainedEncoder = NetReplacePart[NetExtract[trainedEncDec, "encoder"], "Input" -> inputEnc]
This takes an Audio object and outputs a single vector of 128 elements. Hopefully, all of the interesting information of the original signal is included here:
example=RandomChoice[testingDataString] 
Let’s use the trained encoder to encode the example input:
encodedVector=trainedEncoder[example["Input"]]; ListLinePlot[encodedVector] 
Of course, this doesn’t tell us much on its own, but we could use the trained encoder as feature extractor to visualize all of the testing set:
FeatureSpacePlot[
 Style[#["Input"], ColorData[97][#["Output"] + 1]] -> #["Output"] & /@ testingData,
 FeatureExtractor -> trainedEncoder]
To extract information from the encoded vector, we need help from our trusty decoder (which has been trained as well):
trainedDecoder=NetExtract[trainedEncDec,"decoder"] 
Let’s add some processing of the input and output:
decoder = NetReplacePart[trainedDecoder, {"Input" -> targetEnc, "Output" -> NetDecoder[targetEnc]}]
If we feed the decoder the encoded state and a seed string to start the reconstruction and iterate the process, the decoder will do its job nicely:
res = decoder[<|"State" -> encodedVector, "Input" -> "c"|>]
res = decoder[<|"State" -> encodedVector, "Input" -> res|>]
res = decoder[<|"State" -> encodedVector, "Input" -> res|>]
We can make this decoding process more compact, though; we want to construct a net that will compute the output automatically until the endofstring character is reached. As a first step, let’s extract the two main components of the decoder net:
gru=NetExtract[trainedEncDec,{"decoder",2}] linear=NetExtract[trainedEncDec,{"decoder",3,"Net"}] 
Define some additional processing of the input and output of the net that includes special classes to indicate the start and end of the string:
classEnc=NetEncoder[{"Class",Append[Alphabet[],StartOfString],"UnitVector"}]; classDec=NetDecoder[{"Class",Append[Alphabet[],EndOfString]}]; 
Define a characterlevel predictor that takes a single character, runs one step of the GatedRecurrentLayer and produces a single softmax prediction:
charPredictor = NetChain[
  {ReshapeLayer[{1, 27}], gru, ReshapeLayer[{128}], linear, SoftmaxLayer[]},
  "Input" -> classEnc, "Output" -> classDec]
Now we can use NetStateObject to inject the encoded vector into the state of the recurrent layer:
sobj = NetStateObject[charPredictor, <|{2, "State"} -> encodedVector|>]
If we now feed this predictor the StartOfString character, this will predict the next character:
sobj[StartOfString] 
Then we can iterate the process:
sobj[%]
sobj[%]
sobj[%]
We can now encapsulate this process in a single function:
predict[input_] := Module[{encoded, sobj, res},
  encoded = trainedEncoder[input];
  sobj = NetStateObject[charPredictor, <|{2, "State"} -> encoded|>];
  res = NestWhileList[sobj, StartOfString, # =!= EndOfString &];
  StringJoin@res[[2 ;; -2]]]
This way, we can directly compute the full output:
predict[example["Input"]] 
Again, we need to define a function that, given an output of the net, computes the probability for each class:
probs = AssociationThread[Values[labels] -> 0];
getProbabilities[in_] := Append[probs, nearest@predict[in] -> 1];
Now we can use ClassifierMeasurements by giving as input an association of probabilities and the correct label for each example:
cm=ClassifierMeasurements[getProbabilities/@testingDataString[[All,"Input"]],testingDataString[[All,"Target"]]] 
cm["Accuracy"]
cm["ConfusionMatrixPlot"]
Audio signals are less ubiquitous than images in the machine learning world, but that doesn’t mean they are less interesting to analyze. As we continue to complete and optimize audio analysis using modern machine learning and neural net approaches in the Wolfram Language, we are also excited to use it ourselves to build high-level applications in the domains of speech analysis, music understanding and many other areas.
The more one does computational thinking, the better one gets at it. And today we’re launching the Wolfram Challenges site to give everyone a source of bite-sized computational thinking challenges based on the Wolfram Language. Use them to learn. Use them to stay sharp. Use them to prove how great you are.
The Challenges typically have the form: “Write a function to do X”. But because we’re using the Wolfram Language—with all its builtin computational intelligence—it’s easy to make the X be remarkably sophisticated.
The site has a range of levels of Challenges. Some are good for beginners, while others will require serious effort even for experienced programmers and computational thinkers. Typically each Challenge has at least some known solution that’s at most a few lines of Wolfram Language code. But what are those lines of code?
There may be many different approaches to a particular Challenge, leading to very different kinds of code. Sometimes the code will be smaller, sometimes it will run faster, and so on. And for each Challenge, the site maintains a leaderboard that shows who’s got the smallest, the fastest, etc. solution so far.
What does it take to be able to tackle Challenges on the site? If you’ve read my An Elementary Introduction to the Wolfram Language, for example, you should be well prepared—maybe with some additional help on occasion from the main Wolfram Language documentation. But even if you’re more of a beginner, you should still be able to do simpler Challenges, perhaps looking at parts of my book when you need to. (If you’re an experienced programmer, a good way to jumpstart yourself is to look at the Fast Introduction for Programmers.)
There are lots of different kinds of Challenges on the site. Each Challenge is tagged with topic areas. And on the front page there are a number of “tracks” that you can use as guides to sequences of related Challenges. Here are the current Challenges in the Real-World Data track:
Click one you want to try—and you’ll get a webpage that explains the Challenge:
Now you can choose either to download the Challenge notebook to the desktop, or just open it directly in your web browser in the Wolfram Cloud. (It’s free to use the Wolfram Cloud for this, though you’ll have to have a login—otherwise the system won’t be able to give you credit for the Challenges you’ve solved.)
Here’s the cloud version of this particular notebook:
You can build up your solution in the Scratch Area, and try it out there. Then when you’re ready, put your code where it says “Enter your code here”. Then press Submit.
What Submit does is to send your solution to the Wolfram Cloud—where it’ll be tested to see if it’s correct. If it’s not correct, you’ll get something like this:
But if it’s correct, you’ll get this, and you’ll be able to go to the leaderboard and see how your solution compared to other people’s. You can submit the same Challenge as many times as you want. (By the way, you can pick your name and icon for the leaderboard from the Profile tab.)
The range of Challenges on the site is broad both in terms of difficulty level and topic. (And, by the way, we’re planning to progressively grow the site, not least through material from outside contributors.)
Here’s an example of a simple Challenge, that for example I can personally solve in a few seconds:
Here’s a significantly more complicated Challenge, that took me a solid 15 minutes to solve at all well:
Some of the Challenges are in a sense “pure algorithm challenges” that don’t depend on any outside data:
Some of the Challenges are “real-world”, and make use of the Wolfram Knowledgebase:
And some of the Challenges are “mathy”, and make use of the math capabilities of the Wolfram Language:
We’ve been planning to launch a site like Wolfram Challenges for years, but it’s only now, with the current state of the Wolfram Cloud, that we’ve been able to set it up as we have today—so that anyone can just open a web browser and start solving Challenges.
Still, we’ve had unannounced preliminary versions for about three years now—complete with a steadily growing number of Challenges. And in fact, a total of 270 people have discovered the preliminary version—and produced in all no less than 11,400 solutions. Some people have solved the same Challenge many times, coming up with progressively shorter or progressively faster solutions. Others have moved on to different Challenges.
It’s interesting to see how diverse the solutions to even a single Challenge can be. Here are word clouds of the functions used in solutions to three different Challenges:
And when it comes to lengths of solutions (here in characters of code), there can be quite a variation for a particular Challenge:
Here’s the distribution of solution lengths for all solutions submitted during the prelaunch period, for all Challenges:
It’s not clear what kind of distribution this is (though it seems close to log-normal). But what’s really nice is how concentrated it is on solutions that aren’t much more than a line long. (81% of them would even fit in a 280-character tweet!)
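As a rough illustration of how one might probe that observation in the Wolfram Language (the data here is synthetic, standing in for the actual list of solution lengths):

```wolfram
(* synthetic stand-in for the list of solution lengths, in characters *)
lengths = RandomVariate[LogNormalDistribution[4, 0.7], 1000];

(* fit a log-normal distribution to the data *)
fitted = EstimatedDistribution[lengths, LogNormalDistribution[m, s]];

(* compare the fitted density against the empirical histogram *)
Show[Histogram[lengths, Automatic, "PDF"],
 Plot[PDF[fitted, x], {x, 0, Max[lengths]}]]
```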
And in fact what we’re seeing can be viewed as a great tribute to the Wolfram Language. In any other programming language most Challenges—if one could do them at all—would take pages of code. But in the Wolfram Language even sophisticated Challenges can often be solved with just tweet-length amounts of code.
Why is this? Well, basically it’s because the Wolfram Language is a different kind of language: it’s a knowledge-based language where lots of knowledge about computation and other things is built right into the language (thanks to 30+ years of hard work on our part).
But then are the Challenges still “real”? Of course! It’s just that the Wolfram Language lets one operate at a higher level. One doesn’t have to worry about writing out the low-level mechanics of how even sophisticated operations get implemented—one can just concentrate on the pure high-level computational thinking of how to get the Challenge done.
OK, so what have been some of the challenges in setting up the Wolfram Challenges site? Probably the most important is how to check whether a particular solution is correct. After all, we’re not just asking to compute some single result (say, 42) that we can readily compare with. We’re asking to create a function that can take a perhaps infinite set of possible arguments, and in each case give the correct result.
So how can we know if the function is correct? In some simple cases, we can actually see if the code of the function can be transformed in a meaningpreserving way into code that we already know is correct. But most of the time—like in most practical software quality assurance—the best thing to do is just to try test cases. Some will be deterministically chosen—say based on checking simple or corner cases. Others can be probabilistically generated.
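A minimal sketch of this style of checking might look like the following (the reference function, the corner cases and the random-case generator here are hypothetical stand-ins, not the site’s actual test machinery):

```wolfram
(* hypothetical reference implementation for a Challenge *)
reference[n_Integer] := Total[Range[n]^2];

(* deterministic corner cases plus probabilistically generated ones *)
checkSolution[submitted_] := Module[{cases},
  cases = Join[{0, 1, 2}, RandomInteger[{3, 1000}, 25]];
  AllTrue[cases, submitted[#] === reference[#] &]];

checkSolution[Function[n, n (n + 1) (2 n + 1)/6]]  (* True *)
```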
But in the end, if we find that the function isn’t correct, we want to give the user a simple case that demonstrates this. Often in practice we may first see failure in some fairly complicated case—but then the system tries to simplify the failure as much as possible.
OK, so another issue is: how does one tell whether a particular value of a function is correct? If the value is just something like an integer (say, 343) or a string (say, “hi”), then it’s easy. But what if it’s an approximate number (say, 3.141592…)? Well, then we have to start worrying about numerical precision. And what if it’s a mathematical expression (say, 1 + 1/x)? What transformations should we allow on the expression?
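For the approximate-number case, one plausible approach is a tolerance-based comparison (the tolerance value here is an illustrative choice, not the system’s actual setting):

```wolfram
(* compare an approximate result against a target, within a tolerance *)
approximatelyEqualQ[got_?NumericQ, expected_?NumericQ, tol_ : 10^-6] :=
  Abs[got - expected] < tol;

approximatelyEqualQ[3.141592, N[Pi]]  (* True *)
```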
There are many other cases too. If it’s a network, we’ll probably want to say it’s correct if it’s isomorphic to what we expect (i.e. the same up to relabeling nodes). If it’s a graphic, we’ll probably want to say it’s correct if it visually looks the same as we expected, or at least is close enough. And if we’re dealing with real-world data, then we have to make sure to recompute our expected result, to take account of data in our knowledgebase that’s changed because of changes out there in the real world.
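The network case, for instance, maps directly onto a built-in test:

```wolfram
(* two graphs that differ only in how the nodes are labeled *)
g1 = Graph[{1 <-> 2, 2 <-> 3, 3 <-> 1}];
g2 = Graph[{"a" <-> "b", "b" <-> "c", "c" <-> "a"}];

IsomorphicGraphQ[g1, g2]  (* True: the same up to relabeling of nodes *)
```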
Alright, so let’s say we’ve concluded that a particular function is correct. Well now, to fill in the leaderboard, we have to make some measurements on it. First, how long is the code?
We can just format the code in InputForm, then count characters. That gives us one measure. One can also apply ByteCount to just count bytes in the definition of the function. Or we can apply LeafCount, to count the number of leaves in the expression tree for the definition. The leaderboard separately tracks the values for all these measures of “code size”.
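For a concrete (hypothetical) solution, the three measures look like this:

```wolfram
(* a hypothetical one-line solution to some Challenge *)
solution = Function[n, Total[Range[n]^2]];

StringLength[ToString[solution, InputForm]]  (* characters of code *)
ByteCount[solution]                          (* bytes in the definition *)
LeafCount[solution]                          (* leaves in the expression tree *)
```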
OK, so how about the speed of the code? Well, that’s a bit tricky. First because speed isn’t something abstract like “total number of operations on a Turing machine”—it’s actual speed running on a computer. And so it has to be normalized for the speed of the computer hardware. Then it has to somehow discard idiosyncrasies (say associated with caching) seen in particular test runs, as achieved by RepeatedTiming. Oh, and even more basically, it has to decide which instances of the function to test, and how to average them. (And it has to make sure that it won’t waste too much time chasing an incredibly slow solution.)
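In its simplest form, the timing side might use RepeatedTiming on a hypothetical solution (this is a sketch of the idea, not the site’s actual scoring code):

```wolfram
solution = Function[n, Total[Range[n]^2]];

(* averages over multiple runs, damping idiosyncrasies such as caching *)
{time, result} = RepeatedTiming[solution[10^6]];
time
```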
Well, to actually do all these things, one has to make a whole sequence of specific decisions. And in the end what we’ve done is to package everything up into a single “speed score” that we report in the leaderboard.
A final metric in the leaderboard is “memory efficiency”. Like “speed score”, this is derived in a somewhat complicated way from actual test runs of the function. But the point is that within narrow margins, the results should be repeatable between identical solutions. (And, yes, the speed and memory leaderboards might change when they’re run in a new version of the Wolfram Language, with different optimizations.)
We first started testing what’s now the Wolfram Challenges site at the Wolfram Summer School in 2016—and it was rapidly clear that many people found the kinds of Challenges we’d developed quite engaging. At first we weren’t sure how long—and perhaps whimsical—to make the Challenges. We experimented with having whole “stories” in each Challenge (like some math competitions and things like Project Euler do). But pretty soon we decided to restrict Challenges to be fairly short to state—albeit sometimes giving them slightly whimsical names.
We tested our Challenges again at the 2017 Wolfram Summer School, as well as at the Wolfram High School Summer Camp—and we discovered that the Challenges were addictive enough that some people systematically went through trying to solve all of them.
We were initially not sure what forms of Challenges to allow. But after a while we made the choice to (at least initially) concentrate on “write a function to do X”, rather than, for example, just “compute X”. Our basic reason was that we wanted the solutions to the Challenges to be more openended.
If the challenge is “compute X”, then there’s typically just one final answer, and once you have it, you have it. But with “write a function to do X”, there’s always a different function to write—that might be faster, smaller, or just different. At a practical level, with “compute X” it’s easier to “spoil the fun” by having answers posted on the web. With “write a function”, yes, there could be one version of code for a function posted somewhere, but there’ll always be other versions to write—and if you always submit versions that have been seen before it’ll soon be pretty clear you have to have just copied them from somewhere.
As it turns out, we’ve actually had quite a bit of experience with the “compute X” format. Because in my book An Elementary Introduction to the Wolfram Language all 655 exercises are basically of the form “write code to compute X”. And in the online version of the book, all these exercises are automatically graded.
Now, if we were just doing “cheap” automatic grading, we’d simply look to see if the code produces the correct result when it runs. But that doesn’t actually check the code. After all, if the answer was supposed to be 42, someone could just give 42 (or maybe 41 + 1) as the “code”.
Our actual automatic grading system is much more sophisticated. It certainly looks at what comes out when the code runs (being careful not to blindly evaluate Quit in a piece of code—and taking account of things like random numbers or graphics or numerical precision). But the real meat of the system is the analysis of the code itself, and the things that happen when it runs.
Because the Wolfram Language is symbolic, “code” is the same kind of thing as “data”. And the automatic grading system makes extensive use of this—not least in applying sequences of symbolic code transformations to determine whether a particular piece of code that’s been entered is equivalent to one that’s known to represent an appropriate solution. (The system has ways to handle “completely novel” code structures too.)
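A toy illustration of this code-is-data idea (the rewrite rule here is illustrative only, not one of the grading system’s actual transformations):

```wolfram
(* two ways of writing the same operation parse to the identical expression *)
Hold[f /@ list] === Hold[Map[f, list]]  (* True *)

(* a meaning-preserving rewrite applied to held (unevaluated) code *)
Hold[Total[Range[10]]] /. HoldPattern[Total[l_]] :> (Plus @@ l)
```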
Code equivalence is a difficult (in fact, in general, undecidable) problem. A slightly easier problem (though still in general undecidable) is equivalence of mathematical expressions. And a place where we’ve used this kind of equivalence extensively is in our Wolfram Problem Generator:
Of course, exactly what equivalence we want to allow may depend on the kind of problem we’re generating. Usually we’ll want 1 + x and x + 1 to be considered equivalent. But (1 + x)/x might or might not want to be considered equivalent to 1 + 1/x. It’s not easy to get these things right (and many online grading systems do horribly at it). But by using some of the sophisticated math and symbolic transformation capabilities available in the Wolfram Language, we’ve managed to make this work well in Wolfram Problem Generator.
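The (1 + x)/x example can be made concrete:

```wolfram
(* structurally identical after evaluation, since Plus is Orderless *)
1 + x === x + 1  (* True *)

(* not structurally identical, but mathematically equal *)
(1 + x)/x === 1 + 1/x                   (* False *)
Simplify[(1 + x)/x - (1 + 1/x)] === 0   (* True *)
```

Whether the second pair should count as “equivalent” is exactly the kind of per-problem decision described above.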
The Wolfram Challenges site as it exists today is only the beginning. We intend it to grow. And the best way for it to grow—like our long-running Wolfram Demonstrations Project—is for people to contribute great new Challenges for us to include.
At the bottom of the Wolfram Challenges home page you can download the Challenges Authoring Notebook:
Fill this out, press “Submit Challenge”—and off this will go to us for review.
I’m not surprised that Wolfram Challenges seem to appeal to people who like solving math puzzles, crosswords, brain teasers, sudoku and the like. I’m also not surprised that they appeal to people who like gaming and coding competitions. But personally—for better or worse—I don’t happen to fit into any of these categories. And in fact when we were first considering creating Wolfram Challenges I said “yes, lots of people will like it, but I won’t be one of them”.
Well, I have to say I was wrong about myself. Because actually I really like doing these Challenges—and I’m finding I have to avoid getting started on them because I’ll just keep doing them (and, yes, I’m a finisher, so there’s a risk I could just keep going until I’ve done them all, which would be a very serious investment of time).
So what’s different about these Challenges? I think the answer for me is that they feel much more real. Yes, they’ve been made up to be Challenges. But the kind of thinking that’s needed to solve them is essentially just the same as the kind of thinking I end up doing all the time in “real settings”. So when I work on these Challenges, I don’t feel like I’m “just doing something recreational”; I feel like I’m honing my skills for real things.
Now I readily recognize that not everyone’s motivation structure is the same—and many people will like doing these Challenges as true recreations. But I think it’s great that Challenges can also help build real skills. And of course, if one sees that someone has done lots of these Challenges, it shows that they have some real skills. (And, yes, we’re starting to use Challenges as a way to assess applicants, say, for our summer programs.)
It’s worth saying there are some other nice “potentially recreational” uses of the Wolfram Language too.
One example is competitive livecoding. The Wolfram Language is basically unique in being a language in which interesting programs can be written fast enough that it’s fun to watch. Over the years, I’ve done large amounts of (non-competitive) livecoding—both in person and livestreamed. But in the past couple of years we’ve been developing the notion of competitive livecoding as a kind of new sport.
We’ve done some trial runs at our Wolfram Technology Conference—and we’re working towards having robust rules and procedures. In what we’ve done so far, the typical challenges have been of the “compute X” form—and people have taken between a few seconds and perhaps ten minutes to complete them. We’ve used what’s now our Wolfram Chat functionality to distribute Challenges and let contestants submit solutions. And we’ve used automated testing methods—together with human “refereeing”—to judge the competitions.
A different kind of recreational application of the Wolfram Language is our Tweet-a-Program service, released in 2014. The idea here is to write Wolfram Language programs that are short enough to fit in a tweet (and when we launched Tweet-a-Program that meant just 128 characters)—and to make them produce output that is as interesting as possible:
We’ve also had a live analog of this at our Wolfram Technology Conference for some time: our annual One-Liner Competition. And I have to say that even though I (presumably) know the Wolfram Language well, I’m always amazed at what people actually manage to do with just a single line of Wolfram Language code.
At our most recent Wolfram Technology Conference, in recognition of our advances in machine learning, we decided to also do a “Machine-Learning Art Competition”—to make the most interesting possible restyled “Wolfie”:
In the future, we’re planning to do machine learning challenges as part of Wolfram Challenges too. In fact, there are several categories of Challenges we expect to add. We’ve already got Challenges that make use of the Wolfram Knowledgebase, and the builtin data it contains. But we’re also planning to add Challenges that use external data from the Wolfram Data Repository. And we want to add Challenges that involve creating things like neural networks.
There’s a new issue that arises here—and that’s actually associated with a large category of possible Challenges. Because with most uses of things like neural networks, one no longer expects to produce a function that definitively “gets the right answer”. Instead, one just wants a function that does the best possible job on a particular task.
There are plenty of examples of Challenges one can imagine that involve finding “the lowest-cost solution”, or the “best fit”. And it’s a similar setup with typical machine learning tasks: find a function (say based on a neural network) that performs best on classifying a certain test set, etc.
And, yes, the basic structure of Wolfram Challenges is well set up to handle a situation like this. It’s just that instead of it definitively telling you that you’ve got a correct solution for a particular Challenge, it’ll just tell you how your solution ranks relative to others on the leaderboard.
The Challenges in the Wolfram Challenges site always have very well-defined end goals. But one of the great things about the Wolfram Language is how easy it is to use it to explore and create in an open-ended way. And as a kind of analog of Challenges one can always give seeds for this. One example is the Go Further sections of the Explorations in Wolfram Programming Lab. And other examples are the many kinds of project suggestions we make for things like our summer programs.
What is the right output for an openended exploration? I think a good answer in many cases is a computational essay, written in a Wolfram Notebook, and “telling a story” with a mixture of ordinary text and Wolfram Language code. Of course, unlike Challenges, where one’s doing something that’s intended to be checked and analyzed by machine, computational essays are fundamentally about communicating with humans—and don’t have right or wrong “answers”.
One of my overarching goals in creating the Wolfram Language has been to bring computational knowledge and computational thinking to as many people as possible. And the launch of the Wolfram Challenges site is the latest step in the long journey of doing this.
It’s a great way to engage with programming and computational thinking. And it’s set up to always let you know how you’re getting on. Did you solve that Challenge? How did you do relative to other people who’ve also solved the Challenge?
I’m looking forward to seeing just how small and efficient people can make the solutions to these Challenges. (And, yes, large numbers of equivalent solutions provide great raw material for doing machine learning on program transformations and optimization.)
Who will be the leaders on the leaderboards of Wolfram Challenges? I think it’ll be a wide range of people—with different backgrounds and education. Some will be young; some will be old. Some will be from the most tech-rich parts of the world; some, I hope, will be from tech-poor areas. Some will already be energetic contributors to the Wolfram Language community; others, I hope, will come to the Wolfram Language through Challenges—and perhaps even be “discovered” as talented programmers and computational thinkers this way.
But most of all, I hope lots of people get lots of enjoyment and fulfillment out of Wolfram Challenges—and get a chance to experience that thrill that comes with figuring out a particularly clever and powerful solution that you can then see run on your computer.
We sat down with Daniel to learn more about his research and how the Wolfram Language plays a part in it.
This was actually a perfect choice in my research area, and the timing was perfect, since within one week after I joined the group, there was the first gravitational wave detection by LIGO, and things got very exciting from there.
I was very fortunate to work in the most exciting fields of astronomy as well as computer science. At the [NCSA] Gravity Group, I had complete freedom to work on any project that I wanted, and funding to avoid any teaching duties, and a lot of support and guidance from my advisors and mentors who are experts in astrophysics and supercomputing. Also, NCSA was an ideal environment for interdisciplinary research.
Initially, my research was focused on developing gravitational waveform models using post-Newtonian methods, calibrated with massively parallel numerical relativity simulations using the Einstein Toolkit on the Blue Waters petascale supercomputer.
These waveform models are used to generate templates that are required for the existing matched-filtering method (a template-matching method) to detect signals in the data from LIGO and estimate their properties.
However, these template-matching methods are slow and extremely computationally expensive, and not scalable to all types of signals. Furthermore, they are not optimal for the complex non-Gaussian noise background in the LIGO detectors. This meant a new approach was necessary to solve these issues.
My article was featured in the special issue commemorating the Nobel Prize in 2017.
Even though peer review is done for free by referees in the scientific community and the expenses to host online articles are negligible, most high-profile journals today are behind expensive paywalls and charge thousands of dollars for publication. However, Physics Letters B is completely open access to everyone in the world for free and has no publication charges for the authors. I believe all journals should follow this example to maximize scientific progress by promoting open science.
This was the main reason why we chose Physics Letters B as the very first journal where we submitted this article.
I think the attendees and judges found this very impressive, since it was connecting high-performance parallel numerical simulations with artificial intelligence methods based on deep learning to enable real-time analysis of big data from LIGO for gravitational wave and multimessenger astrophysics. Basically, this research is at the interface of all these exciting topics receiving a lot of hype recently.
I was always interested in artificial intelligence since my childhood, but I had no background in deep learning or even machine learning until November 2016, when I attended the Supercomputing Conference (SC16).
There was a lot of hype about deep learning at this conference, especially a lot of demos and workshops by NVIDIA, which got me excited to try out these techniques for my research. This was also right after the new neural network functionality was released in Version 11 of the Wolfram Language. I already had the training data of gravitational wave signals from my research with the NCSA Gravity Group, as mentioned before. So all these came together, and this was a perfect time to try out applying deep learning to tackle the problem of gravitational wave analysis.
Since I had no background in this field, I started out by taking an online course by Geoffrey Hinton on Coursera and CS231 at Stanford, and quickly read through the Deep Learning book by Bengio [Courville and Goodfellow], all in about a week.
Then it took only a couple of days to get used to the neural net framework in the Wolfram Language by reading the documentation. I decided to give time series inputs directly into 1D convolutional neural networks instead of images (spectrograms). Amazingly, the very first convolutional network I tried performed better than expected for gravitational wave analysis, which was very encouraging.
Here are some advantages of using deep learning over matched filtering:
1) Speed: The analysis can be carried out within milliseconds using deep learning (with minimal computational resources), which will help in finding the electromagnetic counterpart using telescopes faster. Enabling rapid follow-up observations can lead to new physical insights.
2) Covering more parameters: Only a small subset of the full parameter space of signals can be searched for using matched filtering (template matching), since the computational cost explodes exponentially with the number of parameters. Deep learning is highly scalable and requires only a one-time training process, so the high-dimensional parameter space can be covered.
3) Generalization to new sources: The article shows that signals from new classes of sources beyond the training data, such as spin-precessing or eccentric compact binaries, can be automatically detected with this method with the same sensitivity. This is because, unlike template-matching techniques, deep learning can interpolate to points within the training data and generalize beyond it to some extent.
4) Resilience to non-Gaussian noise: The results show that this deep learning method can distinguish signals from transient non-Gaussian noises (glitches) and works even when a signal is contaminated by a glitch, unlike matched filtering. For instance, the occurrence of a glitch in coincidence with the recent detection of the neutron star merger delayed the analysis by several hours using existing methods and required manual inspection. The deep learning technique can automatically find these events and estimate their parameters.
5) Interpretability: Once the deep learning method detects a signal and predicts its parameters, this can be quickly cross-validated using matched filtering with a few templates around these predicted parameters. Therefore, this can be seen as a method to accelerate matched filtering by narrowing down the search space—so the interpretability of the results is not lost.
I have been using Mathematica since I was an undergraduate at IIT Bombay. I have used it for symbolic calculation as well as numerical computation.
The Wolfram Language is very coherent, unlike other languages such as Python, and includes all the functionality across different domains of science and engineering without relying on any external packages that have to be loaded. All the 6,000 or so functions have explicit names and are designed with a very similar syntax, which means that most of the time you can simply guess the name and usage without referring to any documentation. The documentation is excellent, and it is all in one place.
Overall, the Wolfram Language saves a researcher’s time by a factor of 2–3x compared to other programming languages. This means you can do twice as much research. If everyone used Mathematica, we could double the progress of science!
I also used it for all my coursework, and submitted Mathematica notebooks exported into PDFs, while everyone else in my class was still writing things down with pen and paper.
The Wolfram Language neural network framework was extremely helpful for me. It is a very highlevel framework and doesn’t require you to worry about what is happening under the hood. Even someone with zero background in deep learning can use it successfully for their projects by simply referring to just the documentation.
Using GPUs to do training with the Wolfram Language was as simple as including the string TargetDevice -> "GPU" in the code. With this small change, everything ran on GPUs like magic on any of my machines on Windows, OSX or Linux, including my laptop, Blue Waters, the Campus Cluster, the Volta and Pascal NVIDIA DGX-1 deep learning supercomputers and the hybrid machine with four P100 GPUs at the NCSA Innovative Systems Lab.
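The option itself is the only change needed in a training call (net and trainingData here are placeholders for an actual network and dataset):

```wolfram
(* train on a single GPU *)
trained = NetTrain[net, trainingData, TargetDevice -> "GPU"];

(* or train on all available GPUs *)
trained = NetTrain[net, trainingData, TargetDevice -> {"GPU", All}];
```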
I used about 12 GPUs in parallel to try out different neural network architectures as well.
I completed the whole project, including the research, writing the paper and posting on arXiv, within two weeks after I came up with the idea at SC16, even though I had never done any deep learning–related work before. This was only possible because I used the Wolfram Language.
I had drafted the initial version of the research paper as a Mathematica notebook. This allowed me to write paragraphs of text and typeset everything, even mathematical equations and figures, and organize into sections and subsections just like in a Word document. At the end, I could export everything into a LaTeX file and submit to the journal.
Everything, including the data preparation, preprocessing, training and inference with the deep convolutional neural nets, along with the preparation of figures and diagrams of the neural net architecture, was done with the Wolfram Language.
Apart from programming, I regularly use Mathematica notebooks as a word processor and to create slides for presentations. All this functionality is included with Mathematica.
Read the documentation, which is one of the greatest strengths of the language.
There are a lot of included examples of using deep learning for various types of problems—such as classification and regression—in fields such as time series analysis, natural language processing and image processing.
The Wolfram Neural Net Repository is a unique feature of the Wolfram Language that is super helpful. You can directly import state-of-the-art neural network models that are pre-trained for hundreds of different tasks and use them in your code. You can also perform “net surgery” on these models to customize them as you please for your research/applications.
The Mathematica Stack Exchange is a very helpful resource, as is the Fast Introduction for Programmers, along with Mathematica Programming—An Advanced Introduction by Leonid Shifrin.
Deep Learning for Real-Time Gravitational Wave Detection and Parameter Estimation: Results with Advanced LIGO Data (Physics Letters B)
Glitch Classification and Clustering for LIGO with Deep Transfer Learning (NIPS 2017, Deep Learning for Physical Sciences)
Deep Neural Networks to Enable Real-Time Multimessenger Astrophysics (Physical Review D)
Last September we released Version 11.2 of the Wolfram Language and Mathematica—with all sorts of new functionality, including 100+ completely new functions. Version 11.2 was a big release. But today we’ve got a still bigger release: Version 11.3 that, among other things, includes nearly 120 completely new functions.
This June 23rd it’ll be 30 years since we released Version 1.0, and I’m very proud of the fact that we’ve now been able to maintain an accelerating rate of innovation and development for no less than three decades. Critical to this, of course, has been the fact that we use the Wolfram Language to develop the Wolfram Language—and indeed most of the things that we can now add in Version 11.3 are only possible because we’re making use of the huge stack of technology that we’ve been systematically building for more than 30 years.
We’ve always got a large pipeline of R&D underway, and our strategy for .1 versions is to use them to release everything that’s ready at a particular moment in time. Sometimes what’s in a .1 version may not completely fill out a new area, and some of the functions may be tagged as “experimental”. But our goal with .1 versions is to be able to deliver the latest fruits of our R&D efforts on as timely a basis as possible. Integer (.0) versions aim to be more systematic, and to provide full coverage of new areas, rounding out what has been delivered incrementally in .1 versions.
In addition to all the new functionality in 11.3, there’s a new element to our process. Starting a couple of months ago, we began livestreaming internal design review meetings that I held as we brought Version 11.3 to completion. So for those interested in “how the sausage is made”, there are now almost 122 hours of recorded meetings, from which you can find out exactly how some of the things you can now see released in Version 11.3 were originally invented. And in this post, I’m going to be linking to specific recorded livestreams relevant to features I’m discussing.
OK, so what’s new in Version 11.3? Well, a lot of things. And, by the way, Version 11.3 is available today on both desktop (Mac, Windows, Linux) and the Wolfram Cloud. (And yes, it takes extremely nontrivial software engineering, management and quality assurance to achieve simultaneous releases of this kind.)
In general terms, Version 11.3 not only adds some completely new directions, but also extends and strengthens what’s already there. There’s lots of strengthening of core functionality: still more automated machine learning, more robust data import, knowledgebase predictive prefetching, more visualization options, etc. There are all sorts of new conveniences: easier access to external languages, immediate input iconization, direct currying, etc. And we’ve also continued to aggressively push the envelope in all sorts of areas where we’ve had particularly active development in recent years: machine learning, neural nets, audio, asymptotic calculus, external language computation, etc.
Here’s a word cloud of new functions that got added in Version 11.3:
There are so many things to say about 11.3, it’s hard to know where to start. But let’s start with something topical: blockchain. As I’ll be explaining at much greater length in future posts, the Wolfram Language—with its built-in ability to talk about the real world—turns out to be uniquely suited to defining and executing computational smart contracts. The actual Wolfram Language computation for these contracts will (for now) happen off the blockchain, but it’s important for the language to be able to connect to blockchains—and that’s what’s being added in Version 11.3. [Livestreamed design discussion.]
The first thing we can do is just ask about blockchains that are out there in the world. Like here’s the most recent block added to the main Ethereum blockchain:
✕ BlockchainBlockData[-1, BlockchainBase -> "Ethereum"] 
Now we can pick up one of the transactions in that block, and start looking at it:
✕ BlockchainTransactionData["735e1643c33c6a632adba18b5f321ce0e13b612c90a3b9372c7c9bef447c947c", BlockchainBase -> "Ethereum"] 
And we can then start doing data science—or whatever analysis—we want about the structure and content of the blockchain. For the initial release of Version 11.3, we’re supporting Bitcoin and Ethereum, though other public blockchains will be added soon.
But already in Version 11.3, we’re supporting a private (Bitcoin-core) Wolfram Blockchain that’s hosted in our Wolfram Cloud infrastructure. We’ll be periodically publishing hashes from this blockchain out in the world (probably in things like physical newspapers). And it’ll also be possible to run versions of it in private Wolfram Clouds.
It’s extremely easy to write something to the Wolfram Blockchain (and, yes, it charges a small number of Cloud Credits):
✕ BlockchainPut[Graphics[Circle[]]] 
The result is a transaction hash, which one can then look up on the blockchain:
✕ BlockchainTransactionData["9db73562fb45a75dd810456d575abbeb313ac19a2ec5813974c108a6935fcfb9"] 
Here’s the circle back again from the blockchain:
By the way, the Hash function in the Wolfram Language has been extended in 11.3 to immediately support the kinds of hashes (like “RIPEMD160SHA256”) that are used in cryptocurrency blockchains. And by using Encrypt and related functions, it’s possible to start setting up some fairly sophisticated things on the blockchain—with more coming soon.
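For instance (a minimal sketch; the input string is arbitrary), the composed RIPEMD-160-of-SHA-256 hash used for cryptocurrency addresses can now be computed directly:

```wolfram
(* Compute the composed RIPEMD160SHA256 hash as a hex string;
   the input here is just an arbitrary example string *)
Hash["Hello, blockchain!", "RIPEMD160SHA256", "HexString"]
```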
Alright, so now let’s talk about something really big that’s new—at least in experimental form—in Version 11.3. One of our long-term goals in the Wolfram Language is to be able to compute about anything in the world. And in Version 11.3 we’re adding a major new class of things that we can compute about: complex engineering (and other) systems. [Livestreamed design discussions 1 and 2.]
Back in 2012 we introduced Wolfram SystemModeler: an industrial-strength system modeling environment that’s been used to model things like jet engines with tens of thousands of components. SystemModeler lets you both run simulations of models, and actually develop models using a sophisticated graphical interface.
What we’re adding (experimentally) in Version 11.3 is the built-in capability for the Wolfram Language to run models from SystemModeler—or in fact basically any model described in the Modelica language.
Let’s start with a simple example. This retrieves a particular model from our built-in repository of models:
✕ SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"] 
If you press the [+] you see more detail:
But the place where it gets really interesting is that you can actually run this model. SystemModelPlot makes a plot of a “standard simulation” of the model:
✕ SystemModelPlot[ SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"]] 
What actually is the model underneath? Well, it’s a set of equations that describe the dynamics of how the components of the system behave. And for a very simple system like this, these equations are already pretty complicated:
✕ SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"]["SystemEquations"] 
It comes with the territory in modeling real-world systems that there tend to be lots of components, with lots of complicated interactions. SystemModeler is set up to let people design arbitrarily complicated systems graphically, hierarchically connecting together components representing physical or other objects. But the big new thing is that once you have the model, then with Version 11.3 you can immediately work with it in the Wolfram Language.
Every model has lots of properties:
✕ SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"]["Properties"] 
One of these properties gives the variables that characterize the system. And, yes, even in a very simple system like this, there are already lots of those:
✕ SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"]["SystemVariables"] 
Here’s a plot of how one of those variables behaves in the simulation:
✕ SystemModelPlot[SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"], "idealTriac.capacitor.p.i"] 
A typical thing one wants to do is to investigate how the system behaves when parameters are changed. This simulates the system with one of its parameters changed, then makes a plot:
✕ SystemModelSimulate[SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"], <|"ParameterValues" -> {"V.freqHz" -> 2.5}|>] 
✕ SystemModelPlot[%, "idealTriac.capacitor.p.i"] 
We could go on from here to sample lots of different possible inputs or parameter values, and do things like studying the robustness of the system to changes. Version 11.3 provides a very rich environment for doing all these things as an integrated part of the Wolfram Language.
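A sketch of such a parameter study (the frequency values here are arbitrary choices, assuming the same association form of SystemModelSimulate used above):

```wolfram
(* Sketch: simulate the triac circuit at several source frequencies,
   then plot the capacitor current for each run *)
sims = Table[
   SystemModelSimulate[
    SystemModel["Modelica.Electrical.Analog.Examples.IdealTriacCircuit"],
    <|"ParameterValues" -> {"V.freqHz" -> f}|>], {f, {1., 2., 4.}}];
SystemModelPlot[#, "idealTriac.capacitor.p.i"] & /@ sims
```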
In 11.3 there are already over 1000 ready-to-run models included—of electrical, mechanical, thermal, hydraulic, biological and other systems. Here’s a slightly more complicated example—the core part of a car:
✕ SystemModel["IndustryExamples.AutomotiveTransportation.Driveline.DrivelineModel"] 
If you expand the icon, you can mouse over the parts to find out what they are:
This gives a quick summary of the model, showing that it involves 1110 variables:
✕ SystemModel["IndustryExamples.AutomotiveTransportation.Driveline.DrivelineModel"]["Summary"] 
In addition to complete ready-to-run models, there are also over 6000 components included in 11.3, from which models can be constructed. SystemModeler provides a full graphical environment for assembling these components. But one can also do it purely with Wolfram Language code, using functions like ConnectSystemModelComponents (which essentially defines the graph of how the connectors of different components are connected):
✕ components = {"R" \[Element] "Modelica.Electrical.Analog.Basic.Resistor", "L" \[Element] "Modelica.Electrical.Analog.Basic.Inductor", "AC" \[Element] "Modelica.Electrical.Analog.Sources.SineVoltage", "G" \[Element] "Modelica.Electrical.Analog.Basic.Ground"}; 
✕ connections = {"G.p" -> "AC.n", "AC.p" -> "L.n", "L.p" -> "R.n", "R.p" -> "AC.n"}; 
✕ model = ConnectSystemModelComponents[components, connections] 
You can also create models directly from their underlying equations, as well as making “blackbox models” purely from data or empirical functions (say from machine learning).
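As a sketch of the equation-based route (using a hypothetical damped oscillator, not an example from the text):

```wolfram
(* Sketch: create a system model directly from a differential equation
   (a hypothetical damped oscillator), then simulate and plot it *)
osc = CreateSystemModel[{x''[t] + 0.3 x'[t] + x[t] == 0, x[0] == 1}, t];
SystemModelPlot[SystemModelSimulate[osc, 20], "x"]
```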
It’s taken a long time to build all the system modeling capabilities that we’re introducing in 11.3. And they rely on a lot of sophisticated features of the Wolfram Language—including large-scale symbolic manipulation, the ability to robustly solve systems of differential-algebraic equations, handling of quantities and units, and much more. But now that system modeling is integrated into the Wolfram Language, it opens all sorts of important new opportunities—not only in engineering, but in all fields that benefit from being able to readily simulate multi-component real-world systems.
We first introduced notebooks in Version 1.0 back in 1988—so by now we’ve been polishing how they work for no less than 30 years. Version 11.3 introduces a number of new features. A simple one is that closed cell groups now by default have an “opener button”, as well as being openable using their cell brackets:
I find this helpful, because otherwise I sometimes don’t notice closed groups, with extra cells inside. (And, yes, if you don’t like it, you can always switch it off in the stylesheet.)
Another small but useful change is the introduction of “indefinite In/Out labels”. In a notebook that’s connected to an active kernel, successive cells are labeled In[1], Out[1], etc. But if one’s no longer connected to the same kernel (say, because one saved and reopened the notebook), the In/Out numbering no longer makes sense. So in the past, there were just no In, Out labels shown. But as of Version 11.3, there are still labels, but they’re grayed down, and they don’t have any explicit numbers in them:
Another new feature in Version 11.3 is Iconize. Here’s the basic problem it solves. Let’s say you’ve got some big piece of data or other input that you want to store in the notebook, but you don’t want it to visually fill up the notebook. Well, one thing you can do is to put it in closed cells. But then to use the data you have to do something like creating a variable and so on. Iconize provides a simple, inline way to save data in a notebook.
Here’s how you make an iconized version of an expression:
✕ Iconize[Range[10]] 
Now you can use this iconized form in place of giving the whole expression; it just immediately evaluates to the full expression:
✕ Reverse[{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}] 
Another convenient use of Iconize is to make code easier to read, while still being complete. For example, consider something like this:
✕ Plot[Sin[Tan[x]], {x, 0, 10}, Filling -> Axis, PlotTheme -> "Scientific"] 
You can select the options here, then go to the right-click menu and say to Iconize them:
The result is an easier-to-read piece of code—that still evaluates just as it did before:
✕ Plot[Sin[Tan[x]], {x, 0, 10}, Sequence[Filling -> Axis, PlotTheme -> "Scientific"]] 
In Version 11.2 we introduced ExternalEvaluate, for evaluating code in external languages (initially Python and JavaScript) directly from the Wolfram Language. (This is supported on the desktop and in private clouds; for security and provisioning reasons, the public Wolfram Cloud only runs pure Wolfram Language code.)
In Version 11.3 we’re now making it even easier to enter external code in notebooks. Just start an input cell with a > and you’ll get an external code cell (you can stickily select the language you want):
✕ ExternalEvaluate["Python", "import platform; platform.platform()"] 
And, yes, what comes back is a Wolfram Language expression that you can compute with:
✕ StringSplit[%, "-"] 
We put a lot of emphasis on documenting the Wolfram Language—and traditionally we’ve had basically three kinds of components to our documentation: “reference pages” that cover a single function, “guide pages” that give a summary with links to many functions, and “tutorials” that provide narrative introductions to areas of functionality. Well, as of Version 11.3 there’s a fourth kind of component: workflows—which is what the gray tiles at the bottom of the “root guide page” lead to.
When everything you’re doing is represented by explicit Wolfram Language code, the In/Out paradigm of notebooks is a great way to show what’s going on. But if you’re clicking around, or, worse, using external programs, this isn’t enough. And that’s where workflows come in—because they use all sorts of graphical devices to present sequences of actions that aren’t just entering Wolfram Language input.
So if you’re getting coordinates from a plot, or deploying a complex form to the web, or adding a banner to a notebook, then expect to follow the new workflow documentation that we have. And, by the way, you’ll find links to relevant workflows from reference pages for functions.
Another big new interface-related thing in Version 11.3 is Presenter Tools—a complete environment for creating and running presentations that include live interactivity. What makes Presenter Tools possible is the rich notebook system that we’ve built over the past 30 years. But what it does is to add all the features one needs to conveniently create and run really great presentations.
People have been using our previous SlideShow format to give presentations with Wolfram Notebooks for about 20 years. But it was never a complete solution. Yes, it provided nice notebook features like live computation in a slide show environment, but it didn’t do “PowerPoint-like” things such as automatically scaling content to screen resolution. To be fair, we expected that operating systems would just intrinsically solve problems like content scaling. But it’s been 20 years and they still haven’t. So now we’ve built the new Presenter Tools, which both solves such problems and adds a whole range of features to make creating great presentations with notebooks as easy as possible.
To start, just choose File > New > Presenter Notebook. Then pick your template and theme, and you’re off and running:
Here’s what it looks like when you’re editing your presentation (and you can change themes whenever you want):
When you’re ready to present, just press Start Presentation. Everything goes full screen and is automatically scaled to the resolution of the screen you’re using. But here’s the big difference from PowerPoint-like systems: everything is live, interactive, editable, and scrollable. For example, you can have a Manipulate right inside a slide, and you can immediately interact with it. (Oh, and everything can be dynamic, say recreating graphics based on data that’s being imported in real time.) You can also use things like cell groups to organize content in slides. And you can edit what’s on a slide, and for example, do live-coding, running your code as you go.
When you’re ready to go to a new slide, just press a single key (or have your remote do it for you). By default, the key is Page Down (so you can still use arrow keys in editing), but you can set a different key if you want. You can have Presenter Tools show your slides on one display, then display notes and controls on another display. When you make your slides, you can include SideNotes and SideCode. SideNotes are “PowerPoint-like” textual notes. But SideCode is something different. It’s actually based on something I’ve done in my own talks for years. It’s code you’ve prepared, that you can “magically” insert onto a slide in real time during your presentation, immediately evaluating it if you want.
I’ve given a huge number of talks using Wolfram Notebooks over the years. A few times I’ve used the SlideShow format, but mostly I’ve just done everything in an ordinary notebook, often keeping notes on a separate device. But now I’m excited that with Version 11.3 I’ve got basically exactly the tools I need to prepare and present talks. I can predefine some of the content and structure, but then the actual talk can be very dynamic and spontaneous—with live editing, live-coding and all sorts of interactivity.
While we’re discussing interface capabilities, here’s another new one: Wolfram Chat. When people are interactively working together on something, it’s common to hear someone say “let me just send you a piece of code” or “let me send you a Manipulate”. Well, in Version 11.3 there’s now a very convenient way to do this, built directly into the Wolfram Notebook system—and it’s called Wolfram Chat. [Livestreamed design discussion.]
Just select File > New > Chat; you’ll get asked who you want to “chat with”—and it could be anyone anywhere with a Wolfram ID (though of course they do have to accept your invitation):
Then you can start a chat session, and, for example, put it alongside an ordinary notebook:
The neat thing is that you can send anything that can appear in a notebook, including images, code, dynamic objects, etc. (though it’s sandboxed so people can’t send “code bombs” to each other).
There are lots of obvious applications of Wolfram Chat, not only in collaboration, but also in things like classroom settings and technical support. And there are some other applications too. Like for running live-coding competitions. And in fact one of the ways we stress-tested Wolfram Chat during development was to use it for the live-coding competition at the Wolfram Technology Conference last fall.
One might think that chat is something straightforward. But actually it’s surprisingly tricky, with a remarkable number of different situations and cases to cover. Under the hood, Wolfram Chat is using both the Wolfram Cloud and the new pub-sub channel framework that we introduced in Version 11.0. In Version 11.3, Wolfram Chat is only being supported for desktop Wolfram Notebooks, but it’ll be coming soon to notebooks on the web and on mobile.
We’re always polishing the Wolfram Language to make it more convenient and productive to use. And one way we do this is by adding new little “convenience functions” in every version of the language. Often what these functions do is pretty straightforward; the challenge (which has often taken years) is to come up with really clean designs for them. (You can see quite a bit of the discussion about the new convenience functions for Version 11.3 in livestreams we’ve done recently.)
Here’s a function that it’s sort of amazing we’ve never explicitly had before—a function that just constructs an expression from its head and arguments:
✕ Construct[f, x, y] 
Why is this useful? Well, it can save explicitly constructing pure functions with Function or &, for example in a case like this:
✕ Fold[Construct, f, {a, b, c}] 
Another function that at some level is very straightforward (but about whose name we agonized for quite a while) is Curry. Curry (named after “currying”, which is in turn named after Haskell Curry) essentially makes operator forms, with Curry[f,n] “currying in” n arguments:
✕ Curry[f, 3][a][b][c][d][e] 
The one-argument form of Curry itself is:
✕ Curry[f][x][y] 
Why is this useful? Well, some functions (like Select, say) have built-in “operator forms”, in which you give one argument, then you “curry in” others:
✕ Select[# > 5 &][Range[10]] 
But what if you wanted to create an operator form yourself? Well, you could always explicitly construct it using Function or &. But with Curry you don’t need to do that. Like here’s an operator form of D, in which the second argument is specified to be x:
✕ Curry[D][x] 
Now we can apply this operator form to actually do differentiation with respect to x:
✕ %[f[x]] 
Yes, Curry is at some level rather abstract. But it’s a nice convenience if you understand it—and understanding it is a good exercise in understanding the symbolic structure of the Wolfram Language.
Talking of operator forms, by the way, NearestTo is an operator-form analog of Nearest (the one-argument form of Nearest itself generates a NearestFunction):
✕ NearestTo[2.3][{1, 2, 3, 4, 5}] 
Here’s an example of why this is useful. This finds the 5 chemical elements whose densities are nearest to 10 g/cc:
✕ Entity["Element", "Density" -> NearestTo[Quantity[10, "Grams"/"Centimeters"^3], 5]] // EntityList 
In Version 10.1 in 2015 we introduced a bunch of functions that operate on sequences in lists. Version 11.3 adds a couple more such functions. One is SequenceSplit. It’s like StringSplit for lists: it splits lists at the positions of particular sequences:
✕ SequenceSplit[{a, b, x, x, c, d, x, e, x, x, a, b}, {x, x}] 
Also new in the “Sequence family” is the function SequenceReplace:
✕ SequenceReplace[{a, b, x, x, c, d, x, e, x, x, a, b}, {x, n_} -> {n, n, n}] 
Just as we’re always polishing the core programming functionality of the Wolfram Language, we’re also always polishing things like visualization.
In Version 11.0, we added GeoHistogram, here showing “volcano density” in the US:
✕ GeoHistogram[GeoPosition[GeoEntities[Entity["Country", "UnitedStates"], "Volcano"]]] 
In Version 11.3, we’ve added GeoSmoothHistogram:
✕ GeoSmoothHistogram[GeoPosition[GeoEntities[Entity["Country", "UnitedStates"], "Volcano"]]] 
Also new in Version 11.3 are callouts in 3D plots, here random words labeling random points (but note how the words are positioned to avoid each other):
✕ ListPointPlot3D[Table[Callout[RandomReal[10, 3], RandomWord[]], 25]] 
We can make a slightly more meaningful plot of words in 3D by using the new machine-learning-based FeatureSpacePlot3D (notice for example that “vocalizing” and “crooning” appropriately end up close together):
✕ FeatureSpacePlot3D[RandomWord[20]] 
Talking of machine learning, Version 11.3 continues our aggressive development of automated machine learning, building both general tools, and specific functions that make use of machine learning.
An interesting example of a new function is FindTextualAnswer, which takes a piece of text, and tries to find answers to textual questions. Here we’re using the Wikipedia article on “rhinoceros”, asking how much a rhino weighs:
✕ FindTextualAnswer[ WikipediaData["rhinoceros"], "How much does a rhino weigh?"] 
It almost seems like magic. Of course it doesn’t always work, and it can do things that we humans would consider pretty stupid. But it’s using state-of-the-art machine learning methodology, together with a lot of unique training data based on Wolfram|Alpha. We can see a little more of what it does if we ask not just for its top answer about rhino weights, but for its top 5:
✕ FindTextualAnswer[ WikipediaData["rhinoceros"], "How much does a rhino weigh?", 5] 
Hmmm. So what’s a more definitive answer? Well, for that we can use our actual curated knowledgebase:
✕ Entity["Species", "Family:Rhinocerotidae"][EntityProperty["Species", "Weight"]] 
Or in tons:
✕ UnitConvert[%, "ShortTons"] 
FindTextualAnswer is no substitute for our whole data curation and computable data strategy. But it’s useful as a way to quickly get a first guess of an answer, even from completely unstructured text. And, yes, it should do well at critical reading exercises, and could probably be made to do well at Jeopardy! too.
We humans respond a lot to human faces, and with modern machine learning it’s possible to do all sorts of face-related computations—and in Version 11.3 we’ve added systematic functions for this. Here FindFaces pulls out faces (of famous physicists) from a photograph:
✕ FindFaces[CloudGet["https://wolfr.am/sWoDYqbb"], "Image"] 
FacialFeatures uses machine learning methods to estimate various attributes of faces (such as the apparent age, apparent gender and emotional state):
✕ FacialFeatures[CloudGet["https://wolfr.am/sWRQARe8"]]//Dataset 
These features can for example be used as criteria in FindFaces, here picking out physicists who appear to be under 40:
✕ FindFaces[CloudGet["https://wolfr.am/sWoDYqbb"], #Age < 40 &, "Image"] 
There are now all sorts of functions in the Wolfram Language (like FacialFeatures) that use neural networks inside. But for several years we’ve also been energetically building a whole subsystem in the Wolfram Language to let people work directly with neural networks. We’ve been building on top of low-level libraries (particularly MXNet, to which we’ve been big contributors), so we can make use of all the latest GPU and other optimizations. But our goal is to build a high-level symbolic layer that makes it as easy as possible to actually set up neural net computations. [Livestreamed design discussions 1, 2 and 3.]
There are many parts to this. Setting up automatic encoding and decoding to standard Wolfram Language constructs for text, images, audio and so on. Automatically being able to knit together individual neural net operations, particularly ones that deal with things like sequences. Being able to automate training as much as possible, including automatically doing hyperparameter optimization.
But there’s something perhaps even more important too: having a large library of existing, trained (and untrained) neural nets, that can both be used directly for computations, and can be used for transfer learning, or as feature extractors. And to achieve this, we’ve been building our Neural Net Repository:
There are networks here that do all sorts of remarkable things. And we’re adding new networks every week. Each network has its own page, that includes examples and detailed information. The networks are stored in the cloud. But all you have to do to pull them into your computation is to use NetModel:
✕ NetModel["3D Face Alignment Net Trained on 300W Large Pose Data"] 
Here’s the actual network used by FindTextualAnswer:
✕ NetModel["Wolfram FindTextualAnswer Net for WL 11.3"] 
One thing that’s new in Version 11.3 is the iconic representation we’re using for networks. We’ve optimized it to give you a good overall view of the structure of net graphs, but then to allow interactive drill-down to any level of detail. And when you train a neural network, the interactive panels that come up have some spiffy new features—and with NetTrainResultsObject, we’ve now made the actual training process itself computable.
Version 11.3 has some new layer types like CTCLossLayer (particularly to support audio), as well as lots of updates and enhancements to existing layer types (10x faster LSTMs on GPUs, automatic variable-length convolutions, extensions of many layers to support arbitrary-dimension inputs, etc.). In Version 11.3 we’ve had a particular focus on recurrent networks and sequence generation. And to support this, we’ve introduced things like NetStateObject—that basically allows a network to have a persistent state that’s updated as a result of input data the network receives.
In developing our symbolic neural net framework we’re really going in two directions. The first is to make everything more and more automated, so it’s easier and easier to set up neural net systems. But the second is to be able to readily handle more and more neural net structures. And in Version 11.3 we’re adding a whole collection of “network surgery” functions—like NetTake, NetJoin and NetFlatten—to let you go in and tweak and hack neural nets however you want. Of course, our system is designed so that even if you do this, our whole automated system—with training and so on—still works just fine.
For more than 30 years, we’ve been on a mission to make as much mathematics as possible computational. And in Version 11.3 we’ve finally started to crack an important holdout area: asymptotic analysis.
Here’s a simple example: find an approximate solution to a differential equation near x = 0:
✕ AsymptoticDSolveValue[x^2 y'[x] + (x^2 + 1) y[x] == 0, y[x], {x, 0, 10}] 
At first, this might just look like a power series solution. But look more carefully: there’s an e^{(1/x)} factor that would just give infinity at every order as a power series in x. But with Version 11.3, we’ve now got asymptotic analysis functions that handle all sorts of scales of growth and oscillation, not just powers.
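To see where that factor comes from: separating variables gives y′/y = −(x² + 1)/x² = −1 − 1/x², which integrates to 1/x − x, so y(x) = C e^(1/x − x), with the e^(1/x) piece blowing up as x → 0⁺. Here is a quick numerical sanity check of that closed form (my own Python sketch, not part of the post):

```python
import math

def y(x):
    # Candidate closed form y(x) = exp(1/x - x), from separating variables:
    # y'/y = -(x^2 + 1)/x^2 = -1 - 1/x^2, which integrates to 1/x - x.
    return math.exp(1.0 / x - x)

def residual(x, h=1e-6):
    # Residual of x^2 y'[x] + (x^2 + 1) y[x] == 0, with y' estimated
    # by a central finite difference.
    dy = (y(x + h) - y(x - h)) / (2 * h)
    return x**2 * dy + (x**2 + 1) * y(x)

# The residual should vanish (up to finite-difference error) at sample points.
for x in [0.5, 1.0, 2.0]:
    print(round(residual(x), 6))
```

The residual is zero to within the finite-difference error at every sample point, confirming the exact solution behind the asymptotic expansion.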
Back when I made my living as a physicist, it always seemed like some of the most powerful dark arts centered around perturbation methods. There were regular perturbations and singular perturbations. There were things like the WKB method, and the boundary layer method. The point was always to compute an expansion in some small parameter, but it seemed to always require different trickery in different cases to achieve it. But now, after a few decades of work, we finally in Version 11.3 have a systematic way to solve these problems. Like here’s a differential equation where we’re looking for the solution for small ε:
✕ AsymptoticDSolveValue[{\[Epsilon] y''[x] + (x + 1) y[x] == 0, y[0] == 1, y[1] == 0}, y[x], x, {\[Epsilon], 0, 2}] 
Back in Version 11.2, we added a lot of capabilities for dealing with more sophisticated limits. But with our asymptotic analysis techniques we’re now also able to do something else, that’s highly relevant for all sorts of problems in areas like number theory and computational complexity theory, which is to compare asymptotic growth rates.
This is asking: is 2^(n^k) asymptotically less than (n^m)! as n → ∞? The result: yes, subject to certain conditions:
✕ AsymptoticLess[2^(n^k), (n^m)!, n -> \[Infinity]] 
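The comparison itself is easy to spot-check numerically by working with logarithms: log 2^(n^k) = n^k log 2, while by Stirling’s formula log (n^m)! grows like m n^m log n. A Python sketch (with k = m = 2 chosen purely for illustration, a case where the factorial side wins):

```python
import math

def log_lhs(n, k):
    # log of 2^(n^k)
    return (n ** k) * math.log(2)

def log_factorial(x):
    # log(x!) via the log-gamma function: log Gamma(x + 1)
    return math.lgamma(x + 1)

# With k = m = 2, 2^(n^2) is eventually dwarfed by (n^2)!,
# since log((n^2)!) ~ 2 n^2 log n outgrows n^2 log 2.
for n in [10, 100, 1000]:
    print(log_lhs(n, 2) < log_factorial(n ** 2))
```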
One of the features of Wolfram|Alpha popular among students is its “Show Steps” functionality, in which it synthesizes “on-the-fly tutorials” showing how to derive answers it gives. But what actually are the steps, in, say, a Show Steps result for algebra? Well, they’re “elementary operations” like “add the corresponding sides of two equations”. And in Version 11.3, we’re including functions to just directly do things like this:
✕ AddSides[a == b, c == d] 
✕ MultiplySides[a == b, c == d] 
And, OK, it seems like these are really trivial functions, that basically just operate on the structure of equations. And that’s actually what I thought when I said we should implement them. But as our Algebra R&D team quickly pointed out, there are all sorts of gotchas (“what if b is negative?”, etc.), that are what students often get wrong—but that with all of the algorithmic infrastructure in the Wolfram Language it’s easy for us to get right:
✕ MultiplySides[x/b > 7, b] 
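To make the “negative b” gotcha concrete, here is a toy Python sketch of the sign-flip rule for numeric inequalities (the function multiply_sides below is my illustration only; the Wolfram Language MultiplySides handles the symbolic case, emitting conditions when the sign is unknown):

```python
def multiply_sides(relation, factor):
    """Multiply both sides of a numeric inequality by a factor.

    relation is a tuple (lhs, op, rhs) with op in {"<", ">"}.
    This toy illustrates the gotcha students hit: multiplying by a
    negative number flips the inequality's direction.
    """
    lhs, op, rhs = relation
    if factor == 0:
        raise ValueError("multiplying by zero collapses the inequality")
    if factor < 0:
        op = {"<": ">", ">": "<"}[op]
    return (lhs * factor, op, rhs * factor)

print(multiply_sides((3, "<", 7), 2))   # direction preserved
print(multiply_sides((3, "<", 7), -2))  # direction flips
```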
The Wolfram Language is mostly about computing results. But given a result, one can also ask why it’s correct: one can ask for some kind of proof that demonstrates that it’s correct. And for more than 20 years I’ve been wondering how to find and represent general proofs in a useful and computable way in the Wolfram Language. And I’m excited that finally in Version 11.3 the function FindEquationalProof provides an example—which we’ll be generalizing and building on in future versions. [Livestreamed design discussion.]
My all-time favorite success story for automated theorem proving is the tiny (and in fact provably simplest) axiom system for Boolean algebra that I found in 2000. It’s just a single axiom, with a single operator that one can think of as corresponding to the Nand operation. For 11 years, FullSimplify has actually been able to use automated theorem-proving methods inside, to be able to compute things. So here it’s starting from my axiom for Boolean algebra, then computing that Nand is commutative:
✕ FullSimplify[nand[p, q] == nand[q, p], ForAll[{a, b, c}, nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]] 
But this just tells us the result; it doesn’t give any kind of proof. Well, in Version 11.3, we can now get a proof:
✕ proof = FindEquationalProof[nand[p, q] == nand[q, p], ForAll[{a, b, c}, nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]] 
What is the proof object? We can see from the summary that the proof takes 102 steps. Then we can ask for a “proof graph”. The green arrow at the top represents the original axiom; the red square at the bottom represents the thing being proved. All the nodes in the middle are intermediate lemmas, proved from each other according to the connections shown.
✕ proof = FindEquationalProof[nand[p, q] == nand[q, p], ForAll[{a, b, c}, nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]]; proof["ProofGraph"] 
What’s actually in the proof? Well, it’s complicated. But here’s a dataset that gives all the details:
✕ proof = FindEquationalProof[nand[p, q] == nand[q, p], ForAll[{a, b, c}, nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]]; proof["ProofDataset"] 
You can get a somewhat more narrative form as a notebook too:
✕ proof = FindEquationalProof[nand[p, q] == nand[q, p], ForAll[{a, b, c}, nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]]; proof["ProofNotebook"] 
And then you can also get a “proof function”, which is a piece of code that can be executed to verify the result:
✕ proof = FindEquationalProof[nand[p, q] == nand[q, p], ForAll[{a, b, c}, nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]]; proof["ProofFunction"] 
Unsurprisingly, and unexcitingly, it gives True if you run it:
✕ proof = FindEquationalProof[nand[p, q] == nand[q, p], ForAll[{a, b, c}, nand[nand[nand[a, b], c], nand[a, nand[nand[a, c], a]]] == c]]; proof["ProofFunction"][] 
Now that we can actually generate symbolic proof structures in the Wolfram Language, there’s a lot of empirical metamathematics to do—as I’ll discuss in a future post. But given that FindEquationalProof works on arbitrary “equationlike” symbolic relations, it can actually be applied to lots of things—like verifying protocols and policies, for example in popular areas like blockchain.
The Wolfram Knowledgebase grows every single day—partly through systematic data feeds, and partly through new curated data and domains being explicitly added. If one asks what happens to have been added between Version 11.2 and Version 11.3, it’s a slightly strange grab bag. There are 150+ new properties about public companies. There are 900 new named features on Pluto and Mercury. There are 16,000 new anatomical structures, such as nerve pathways. There are nearly 500 new “notable graphs”. There are thousands of new mountains, islands, notable buildings, and other geo-related features. There are lots of new properties of foods, and new connections to diseases. And much more.
But in terms of typical everyday use of the Wolfram Knowledgebase the most important new feature in Version 11.3 is the entity prefetching system. The knowledgebase is obviously big, and it’s stored in the cloud. But if you’re using a desktop system, the data you need is “magically” downloaded for you.
Well, in Version 11.3, the magic got considerably stronger. Because now when you ask for one particular item, the system will try to figure out what you’re likely to ask for next, and it’ll automatically start asynchronously prefetching it, so when you actually ask for it, it’ll already be there on your computer—and you won’t have to wait for it to download from the cloud. (If you want to do the prefetching “by hand”, there’s the function EntityPrefetch to do it. Note that if you’re using the Wolfram Language in the cloud, the knowledgebase is already “right there”, so there’s no downloading or prefetching to do.)
The whole prefetching mechanism is applied quite generally. So, for example, if you use Interpreter to interpret some input (say, US state abbreviations), information about how to do the interpretations will also get prefetched—so if you’re using the desktop, the interpretations can be done locally without having to communicate with the cloud.
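The idea is easy to model in miniature. Here is a toy Python sketch of a predictive prefetching cache (an entirely hypothetical structure of my own, not how the Wolfram Knowledgebase is actually implemented): fetching one key also fetches the keys a predictor says are likely to come next, so later lookups hit the local cache.

```python
class PrefetchingStore:
    """Toy model of predictive prefetching: a remote fetch for one key
    also pulls in predicted follow-up keys, so subsequent lookups are
    served locally. Illustrative only."""

    def __init__(self, fetch, predict_next):
        self.fetch = fetch                # expensive "cloud" lookup
        self.predict_next = predict_next  # guesses likely follow-up keys
        self.cache = {}
        self.remote_calls = 0

    def _remote(self, key):
        self.remote_calls += 1
        return self.fetch(key)

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self._remote(key)
            for nxt in self.predict_next(key):
                if nxt not in self.cache:
                    self.cache[nxt] = self._remote(nxt)
        return self.cache[key]

# Hypothetical example: asking for one state's capital prefetches a neighbor's.
data = {("AZ", "capital"): "Phoenix", ("NM", "capital"): "Santa Fe"}
store = PrefetchingStore(
    fetch=data.get,
    predict_next=lambda key: [("NM", "capital")] if key[0] == "AZ" else [])
store.get(("AZ", "capital"))
print(store.remote_calls)  # both entries fetched in the first round
store.get(("NM", "capital"))
print(store.remote_calls)  # no extra remote call: it was prefetched
```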
You’ve been able to send email from the Wolfram Language (using SendMail) for a decade. But starting in Version 11.3, it can use full HTML formatting, and you can embed lots of things in it—not just graphics and images, but also cloud objects, datasets, audio and so on. [Livestreamed design discussion.]
Version 11.3 also introduces the ability to send text messages (SMS and MMS) using SendMessage. For security reasons, though, you can only send to your own mobile number, as given by the value of $MobilePhone (and, yes, obviously, the number gets validated).
The Wolfram Language has been able to import mail messages and mailboxes for a long time, and with MailReceiverFunction it’s also able to respond to incoming mail. But in Version 11.3 something new that’s been added is the capability to deal with live mailboxes.
First, connect to an (IMAP, for now) mail server (I’m not showing the authentication dialog that comes up):
✕ mail = MailServerConnect[] 
Then you can basically use the Wolfram Language as a programmable mail client. This gives you a dataset of current unread messages in your mailbox:
✕ MailSearch[<|"From" -> "fahim"|>] 
Now we can pick out one of these messages, and we get a symbolic MailItem object, that for example we can delete:
✕ MailSearch[<|"From" -> "fahim"|>][[1]] 
✕ MailExecute["Delete", %%["MailItem"]] 
Version 11.3 supports a lot of new systems-level operations. Let’s start with a simple but useful one: remote program execution. The function RemoteRun is basically like Unix rsh: you give it a host name (or IP address) and it runs a command there. The Authentication option lets you specify a username and password. If you want to run a persistent program remotely, you can now do that with RemoteRunProcess, which is the remote analog of the local RunProcess.
In dealing with remote computer systems, authentication is always an issue—and for several years we’ve been building a progressively more sophisticated symbolic authentication framework in the Wolfram Language. In Version 11.3 there’s a new AuthenticationDialog function, which pops up a whole variety of appropriately configured authentication dialogs. Then there’s GenerateSecuredAuthenticationKey—which generates OAuth SecuredAuthenticationKey objects that people can use to authenticate calls into the Wolfram Cloud from the outside.
Also at a systems level, there are some new import/export formats, like BSON (JSON-like binary serialization format) and WARC (web archive format). There are also HTTPResponse and HTTPRequest formats, that (among many other things) you can use to basically write a web server in the Wolfram Language in a couple of lines.
We introduced ByteArray objects into the Wolfram Language quite a few years ago—and we’ve been steadily growing support for them. In Version 11.3, there are BaseEncode and BaseDecode for converting between byte arrays and Base64 strings. Version 11.3 also extends Hash (which, among other things, works on byte arrays), adding various types of hashing (such as double SHA256 and RIPEMD) that are used for modern blockchain and cryptocurrency purposes.
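For comparison, both operations have close analogs in Python’s standard library, which makes it easy to see what BaseEncode/BaseDecode and the blockchain-style hashes compute (a sketch only; the Wolfram functions operate on ByteArray objects):

```python
import base64
import hashlib

# Base64 round-trip, analogous to BaseEncode/BaseDecode on a ByteArray.
data = bytes([1, 2, 3, 255])
encoded = base64.b64encode(data).decode("ascii")
print(encoded)  # "AQID/w=="
assert base64.b64decode(encoded) == data

# "Double SHA256" as used in Bitcoin block and transaction hashing:
# SHA-256 applied twice.
def double_sha256(payload: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()

print(double_sha256(b"hello").hex())
```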
We’re always adding more kinds of data that we can make computable in the Wolfram Language, and in Version 11.3 one addition is system process data, of the sort that you might get from a Unix ps command:
✕ SystemProcessData[] 
Needless to say, you can do very detailed searches for processes with specific properties. You can also use SystemProcesses to get an explicit list of ProcessObject symbolic objects, which you can interrogate and manipulate (for example, by using KillProcess).
✕ RandomSample[SystemProcesses[], 3] 
Of course, because everything is computable, it’s easy to do things like make plots of the start times of processes running on your computer (and, yes, I last rebooted a few days ago):
✕ TimelinePlot[SystemProcessData[][All, "StartTime"]] 
If you want to understand what’s going on around your computer, Version 11.3 provides another powerful tool: NetworkPacketRecording. You may have to do some permissions setup, but then this function can record network packets going through any network interface on your computer.
Here’s just 0.1 seconds of packets going in and out of my computer as I quietly sit here writing this post:
✕ NetworkPacketRecording[.1] 
You can drill down to look at each packet; here’s the first one that was recorded:
✕ NetworkPacketRecording[.1][[1]] 
Why is this interesting? Well, I expect to use it for debugging quite regularly—and it’s also useful for studying computer security, not least because you can immediately feed everything into standard Wolfram Language visualization, machine learning and other functionality.
This is already a long post—but there are lots of other things in 11.3 that I haven’t even mentioned. For example, there’ve been all sorts of updates for importing and exporting. Like much more efficient and robust XLS, CSV, and TSV import. Or export of animated PNGs. Or support for metadata in sound formats like MP3 and WAV. Or more sophisticated color quantization in GIF, TIFF, etc. [Livestreamed design discussions 1 and 2.]
We introduced symbolic Audio objects in 11.0, and we’ve been energetically developing audio functionality ever since. Version 11.3 has made audio capture more robust (and supported it for the first time on Linux). It’s also introduced functions like AudioPlay, AudioPause and AudioStop that control open AudioStream objects.
Also new is AudioDistance, which supports various distance measures for audio. Meanwhile, AudioIntervals can now automatically break audio into sections that are separated by silence. And, in a somewhat different area, $VoiceStyles gives the list of possible voices available for SpeechSynthesize.
Here’s a little new math function—that in this case gives a sequence of 0s and 1s in which every length-4 block appears exactly once:
✕ DeBruijnSequence[{0, 1}, 4] 
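The defining property is easy to verify independently. Here is a Python sketch that builds a binary de Bruijn sequence of order 4 using the classic greedy “prefer-one” algorithm (a standard construction, not necessarily the one DeBruijnSequence uses) and checks that every length-4 block occurs exactly once cyclically:

```python
def de_bruijn_binary(n):
    """Binary de Bruijn sequence of order n via the greedy 'prefer-one'
    algorithm: start from n zeros and append a 1 whenever the resulting
    length-n window is new, otherwise a 0; stop when neither works."""
    seq = [0] * n
    seen = {tuple(seq)}
    while True:
        for bit in (1, 0):
            window = tuple(seq[-(n - 1):]) + (bit,) if n > 1 else (bit,)
            if window not in seen:
                seen.add(window)
                seq.append(bit)
                break
        else:
            break  # neither extension is new: all windows visited
    # Drop the trailing n-1 symbols to get the cyclic sequence of length 2^n.
    return seq[:-(n - 1)] if n > 1 else seq

s = de_bruijn_binary(4)
print(len(s))  # 16 = 2^4
# Every length-4 block appears exactly once when the sequence wraps around.
wrapped = s + s[:3]
blocks = {tuple(wrapped[i:i + 4]) for i in range(16)}
print(len(blocks))  # 16 distinct blocks
```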
The Wolfram Language now has sophisticated support for quantities and units—both explicit quantities (like 2.5 kg) and symbolic “quantity variables” (“p which has units of pressure”). But once you’re inside, doing something like solving an equation, you typically want to “factor the units out”. And in 11.3 there’s now a function that systematically does this: NondimensionalizationTransform. There’s also a new mechanism in 11.3 for introducing new kinds of quantities, using IndependentPhysicalQuantity.
Much of the built-in Wolfram Knowledgebase is ultimately represented in terms of entity stores, and in Version 11 we introduced an explicit EntityStore construct for defining new entity stores. Version 11.3 introduces the function EntityRegister, which lets you register an entity store, so that you can refer to the types of entities it contains just like you would refer to built-in types of entities (like cities or chemicals).
Another thing that’s being introduced as an experiment in Version 11.3 is the MongoLink package, which supports connection to external MongoDB databases. We use MongoLink ourselves to manage terabyte-and-beyond datasets for things like machine learning training. And in fact MongoLink is part of our large-scale development effort—whose results will be seen in future versions—to seamlessly support extremely large amounts of externally stored data.
In Version 11.2 we introduced ExternalEvaluate to run code in external languages like Python. In Version 11.3 we’re experimenting with generalizing ExternalEvaluate to control web browsers, by setting up a WebDriver framework. You can give all sorts of commands, both ones that have the same effect as clicking around an actual web browser, and ones that extract things you can see on the page.
Here’s how you can use Chrome (we support both it and Firefox) to open a webpage, then capture it:
✕ ExternalEvaluate["WebDriver-Chrome", {"OpenWebPage" -> "https://www.wolfram.com", "CaptureWebPage"}] // Last 
Well, this post is getting long, but there’s certainly more I could say. Here’s a more complete list of functions that are new or updated in Version 11.3:
But to me it’s remarkable how much there is that’s in a .1 release of the Wolfram Language—and that’s emerged in just the few months since the last .1 release. It’s a satisfying indication of the volume of R&D that we’re managing to complete—by building on the whole Wolfram Language technology stack that we’ve created. And, yes, even in 11.3 there are a great many new corners to explore. And I hope that lots of people will do this, and will use the latest tools we’ve created to discover and invent all sorts of new and important things in the world.
While the Wolfram Language has extensive functionality for web operations, this example requires only the most basic: Import. By default, Import will grab the entire plaintext of a page:
✕
url = "http://www.wrh.noaa.gov/forecast/wxtables/index.php?lat=38.02&lon=-122.13"; 
✕
Import[url] 
Sometimes plaintext scraping is a good start (e.g. for a text analysis workflow). But it’s important to remember there’s a layer of structured HTML telling your browser how to display everything. The elements we use as visual cues can also help the computer organize data, in many cases better and faster than our eyes.
In this case, we are trying to get data from a table. Information presented in tabular format is often stored in list and table HTML elements. You can extract all of the lists and tables on a page using the “Data” element of Import:
✕
data = Import[url, "Data"] 
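The same idea—pulling tabular data out of the HTML structure rather than the plain text—can be sketched in Python with the standard-library html.parser (a minimal toy of my own, far less capable than Import’s “Data” element):

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Minimal analog of Import[url, "Data"] for tables: collect the cell
    text of each HTML table as nested lists of rows. Sketch only."""

    def __init__(self):
        super().__init__()
        self.tables, self.row, self.cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self.row = []
        elif tag in ("td", "th"):
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.tables[-1].append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

html = "<table><tr><th>Day</th><th>Temp</th></tr><tr><td>Mon</td><td>62</td></tr></table>"
parser = TableExtractor()
parser.feed(html)
print(parser.tables)  # [[['Day', 'Temp'], ['Mon', '62']]]
```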
Now that you have a list of elements, you can sift through to pick out the information you need. For visually inspecting a list like this, syntax highlighting can save a lot of time (and eye strain!). In the Wolfram Language, placing the cursor directly inside any grouping symbol—parentheses, brackets or in this case, curly braces—highlights that symbol, along with its opening/closing counterpart. Examining these sublists is an easy way to get a feel for the structure of the overall list. Clicking inside the first inner brace of the imported data shows that the first element is a list of links from the navigation bar:
This means the list of actual weather information (precipitation, temperature, humidity, wind, etc.) is located in the second element. By successively clicking inside curly braces, you can find the smallest list that contains all the weather data—unsurprisingly, it’s the one that starts with “Custom Weather Forecast Table”:
✕
data[[2]] 
Now use FirstPosition to get the correct list indices:
✕
FirstPosition[data, "Custom Weather Forecast Table"] 
Dropping the final index to go up one level, here’s the full table:
✕
table = data[[2, 2, 1, 2]] 
Now that you have the data, you can do some analysis. On the original webpage, some rows of the table only have one value per day, while the others have four. In the imported data, this translates to differing row lengths—either seven items or 28, with optional row headings:
✕
Length /@ table 
So if you want tomorrow’s temperatures, you can find the row with the appropriate heading and take the first four entries after the heading:
✕
FirstPosition[table, "Temp"] 
✕
table[[10, 2 ;; 5]] 
Conveniently, the temperature data is recognized as numerical, so it’s easy to pass directly to functions. Here is the Mean of all temperatures for the week (I use Rest to omit the row labels that start each list):
✕
N@Mean@Rest@table[[10]] 
And here’s a ListLinePlot of all temperatures for the week:
✕
ListLinePlot[Rest@table[[10]]] 
Interpreter can be used for parsing other data types. For a simple example, take the various weather elements that are reported as percentages:
✕
percents = table[[{5, 11, 13}]] 
These values are currently represented as strings, which aren’t friendly to numerical computations. Applying Interpreter["Percent"] automatically converts each value to a numerical Quantity with percent as the unit:
✕
{precip, clouds, humidity} = Interpreter["Percent"] /@ (Rest /@ percents) 
Now that they’re recognized as percentages, you can plot them together:
✕
labels = First /@ percents 
✕
✕ ListLinePlot[{precip, clouds, humidity}, PlotLabels -> labels] 
By extracting the date and time information attached to those values and parsing them with DateObject, you can convert the data into a TimeSeries object:
✕
dates = DateObject /@ Flatten@Table[ table[[2, j]] <> " " <> i, {j, Length@table[[2]]}, {i, table[[9, 2 ;; 5]]}]; 
✕
ts = TimeSeries /@ (Transpose[{dates, #}] & /@ {precip, clouds, humidity}); 
This is perfect for a DateListPlot, which labels the x axis with dates:
✕
✕ DateListPlot[ts, PlotLabels -> labels] 
Getting the data you need is easy with the Wolfram Language, but that’s just the beginning of the story! With our integrated data framework, you can do so much more: automate the import process, simplify data access and even create your own permanent data resources.
In my next post, I’ll explore some advanced structuring and cleaning techniques, demonstrating how to create a structured dataset from scraped data.
Some trees are planted in an orchard. What is the maximum possible number of distinct lines of three trees? In his 1821 book Rational Amusement for Winter Evenings, J. Jackson put it this way:
Fain would I plant a grove in rows
But how must I its form compose
With three trees in each row;
To have as many rows as trees;
Now tell me, artists, if you please:
’Tis all I want to know.
Those familiar with tic-tac-toe’s three-in-a-row might wonder how difficult this problem could be, but it’s actually been looked at by some of the most prominent mathematicians of the past and present. This essay presents many new solutions that haven’t been seen before, shows a general method for finding more solutions and points out where current best solutions are improvable.
Various classic problems in recreational mathematics are of this type:
Here is a graphic for the last problem, 11 trees with 16 lines of 3 trees. Subsets[points, {3}] collects all sets of 3 points. Abs[Det[Append[#, 1] & /@ #]] computes twice the area of the triangle through each set. The sets with area 0 are the lines.
✕
✕ Module[{points, lines}, points = {{-1, -1}, {-1, 1}, {-1, 2 + Sqrt[5]}, {0, -1}, {0, 0}, {0, 1/2 (1 + Sqrt[5])}, {1, -1}, {1, 1}, {1, 2 + Sqrt[5]}, {-(1/Sqrt[5]), 1 + 2/Sqrt[5]}, {1/Sqrt[5], 1 + 2/Sqrt[5]}}; lines = Select[Subsets[points, {3}], Abs[Det[Append[#, 1] & /@ #]] == 0 &]; Graphics[{EdgeForm[{Black, Thick}], Line[#] & /@ lines, White, Disk[#, .1] & /@ points}, ImageSize -> 540]] 
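The vanishing-determinant test is easy to replicate outside the Wolfram Language too. Here is a small Python sketch (mine, not from the essay) that counts lines of three in the 3×3 grid—3 rows, 3 columns and 2 diagonals—using the same collinearity criterion:

```python
from itertools import combinations

def collinear(p, q, r):
    # Three points are collinear exactly when the 3x3 determinant
    # |px py 1; qx qy 1; rx ry 1| vanishes (it equals twice the signed
    # triangle area) -- the same test as Abs[Det[Append[#, 1] & /@ #]].
    (px, py), (qx, qy), (rx, ry) = p, q, r
    return (qx - px) * (ry - py) - (qy - py) * (rx - px) == 0

# The 3x3 grid has 8 lines of three: 3 rows, 3 columns, 2 diagonals.
grid = [(x, y) for x in range(3) for y in range(3)]
lines = [t for t in combinations(grid, 3) if collinear(*t)]
print(len(lines))  # 8
```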
This solution for 12 points matches the known limit of 19 lines, but uses simple integer coordinates and seems to be new. Lines are found with GatherBy and RowReduce, which quickly find a canonical line form for any 2 points in either 2D or 3D space.
✕
✕ Module[{name, root, vals, points, lines, lines3, lines2g}, name = "12 Points in 19 Lines of Three"; points = {{0, 0}, {-6, -6}, {6, 6}, {-2, 6}, {2, -6}, {-6, 6}, {6, -6}, {-6, 0}, {6, 0}, {0, -3}, {0, 3}}; lines = Union[Flatten[#, 1]] & /@ GatherBy[Subsets[points, {2}], RowReduce[Append[#, 1] & /@ #] &]; lines3 = Select[lines, Length[#] == 3 &]; lines2g = Select[lines, Length[#] == 2 && (#[[2, 2]] - #[[1, 2]])/(#[[2, 1]] - #[[1, 1]]) == 3/2 &]; Text@Column[{name, Row[{"Point ", Style["\[FilledCircle]", Green, 18], " at infinity"}], Graphics[{Thick, EdgeForm[Thick], Line[Sort[#]] & /@ lines3, Green, InfiniteLine[#] & /@ lines2g, {White, Disk[#, .5]} & /@ points}, ImageSize -> 400, PlotRange -> {{-7, 7}, {-7, 7}}]}, Alignment -> Center]] 
This blog goes far beyond those old problems. Here’s how 27 points can make 109 lines of 3 points. If you’d like to see the best-known solutions for 7 to 27 points, skip to the gallery of solutions at the end. For the math, code and methodology behind these solutions, keep reading.
✕
✕ With[{n = 27}, Quiet@zerosumGraphic[If[orchardsolutions[[n, 2]] > orchardsolutions[[n, 3]], orchardsolutions[[n, 6]], Quiet@zerotripsymm[orchardsolutions[[n, 4]], Floor[(n - 1)/2]]], n, {260, 210} 2]] 
What is the behavior as the number of trees increases? MathWorld’s orchard-planting problem, Wikipedia’s orchard-planting problem and the On-Line Encyclopedia of Integer Sequences sequence A003035 list some of what is known. Let m be the number of lines containing exactly three points for a set of p points. In 1974, Burr, Grünbaum and Sloane (BGS) gave solutions for particular cases and proved the bounds:
Here’s a table.
✕
✕ droppoints = 3; Style[Text@Grid[Transpose[Drop[Prepend[Transpose[{Range[7, 28], Drop[#[[2]] & /@ orchardsolutions, 6], {6, 7, 10, 12, 16, 19, 22, 26, 32, 37, 42, 48, 54, 60, 67, 73, 81, 88, 96, 104, 113, 121}, (Floor[# (# - 3)/6] + 1) & /@ Range[7, 28], Min[{Floor[#/3 Floor[(# - 1)/2]], Floor[(Binomial[#, 2] - Ceiling[3 #/7])/3]}] & /@ Range[7, 28], {2, 2, 3, 5, 6, 7, 9, 10, 12, 15, 16, 18, 20, 23, 24, 26, 28, 30, 32, "?", "?", "?"}, {2, 2, 3, 5, 6, 7, 9, 10, 12, 15, 16, 18, 28, 30, 31, 38, 40, 42, 50, "?", "?", "?"}}], {"points", "maximum known lines of three", "proven upper bound", "BGS lower bound", "BGS upper bound", "4-orchard lower bound", "4-orchard upper bound"}], droppoints]], Dividers -> {{2 -> Red}, {2 -> Red, 4 -> Blue, 6 -> Blue}}], 12] 
Terence Tao and Ben Green recently proved that the maximum number of lines is the BGS lower bound most of the time (“On Sets Defining Few Ordinary Lines”), but they did not describe how to get the sporadic exceptions. The existing literature does not show the more complicated solutions. In this blog, I share a method for getting elegant-looking solutions to the three-orchard problem, and I describe and demonstrate the power of a method for finding the sporadic solutions. Most of the embeddings shown in this blog are new, but they all match existing known records.
For a given number of points p, let q = ⌊(p–1)/2⌋; select the 3-subsets of {–q, –q+1, …, q} that have a sum of 0 (mod p). That gives ⌊(p–3)p/6⌋+1 3-subsets. Here are the triples from p = 8 to p = 14. This number of triples is the same as the lower bound for the orchard problem, which Tao and Green proved is the best solution most of the time.
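As a quick cross-check of that count (in Python rather than the Wolfram Language used in this post; the function name is my own), we can enumerate the 3-subsets directly and compare against the ⌊(p–3)p/6⌋+1 formula:

```python
from itertools import combinations

def zero_sum_triples(p):
    """3-subsets of {-floor((p-1)/2), ..., ceil((p-1)/2)} with sum == 0 (mod p)."""
    lo = -((p - 1) // 2)          # -floor((p-1)/2)
    hi = -(-(p - 1) // 2)         # ceil((p-1)/2)
    return [t for t in combinations(range(lo, hi + 1), 3) if sum(t) % p == 0]

# the count matches floor((p-3)*p/6) + 1 for the range discussed in the post
for p in range(7, 15):
    assert len(zero_sum_triples(p)) == (p - 3) * p // 6 + 1
```

Note the asymmetric range for even p (for p = 8 it is –3 … 4), matching the `Range[-Floor[(p-1)/2], Ceiling[(p-1)/2]]` used in the post's table code.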
Text@Grid[Prepend[Table[With[{triples = Select[ Subsets[Range[-Floor[(p - 1)/2], Ceiling[(p - 1)/2]], {3}], Mod[Total[#], p] == 0 &]}, {p, Length[triples], Row[Row[ Text@Style[ ToString[Abs[#]], {Red, Darker[Green], Blue}[[ If[# == p/2, 2, Sign[#] + 2]]], 25 - p] & /@ #] & /@ triples, Spacer[1]]}], {p, 8, 14}], {" \!\(\* StyleBox[\"p\",\nFontSlant->\"Italic\"]\)\!\(\* StyleBox[\" \",\nFontSlant->\"Italic\"]\)", " lines ", Row[ {" triples with zero sum (mod \!\(\* StyleBox[\"p\",\nFontSlant->\"Italic\"]\)) with \!\(\* StyleBox[\"red\",\nFontColor->RGBColor[1, 0, 0]]\)\!\(\* StyleBox[\" \",\nFontColor->RGBColor[1, 0, 0]]\)\!\(\* StyleBox[\"negative\",\nFontColor->RGBColor[1, 0, 0]]\), \!\(\* StyleBox[\"green\",\nFontColor->RGBColor[0, 1, 0]]\)\!\(\* StyleBox[\" \",\nFontColor->RGBColor[0, 1, 0]]\)\!\(\* StyleBox[\"zero\",\nFontColor->RGBColor[0, 1, 0]]\) and \!\(\* StyleBox[\"blue\",\nFontColor->RGBColor[0, 0, 1]]\)\!\(\* StyleBox[\" \",\nFontColor->RGBColor[0, 0, 1]]\)\!\(\* StyleBox[\"positive\",\nFontColor->RGBColor[0, 0, 1]]\)"}]}], Spacings -> {0, 0}, Frame -> All]
Here’s a clearer graphic for how this works. Pick three different numbers from –8 to 8 that have a sum of zero. You will find that those numbers are on a straight line. The method used to place these numbers will come later.
That’s not the maximum possible number of lines. By moving these points a bit, the triples that have a modulus-17 sum of zero can also be lines. One example is 4 + 6 + 7 = 17.
With[{n = 17}, Quiet@zerosumGraphic[ If[orchardsolutions[[n, 2]] > orchardsolutions[[n, 3]], orchardsolutions[[n, 6]], Quiet@zerotripsymm[orchardsolutions[[n, 4]], Floor[(n - 1)/2]]], n, {260, 210} 2]]
Does this method always give the best solution? No—there are at least four sporadic exceptions. Whether any other sporadic solutions exist is not known.
Grid[Partition[ zerosumGraphic[orchardsolutions[[#, 6]], #, {260, 210}] & /@ {7, 11, 16, 19}, 2]] 
There are also problems with more than three in a row.
Fifteen lines of four points using 15 points is simple enough. RowReduce is used to collect lines, with RootReduce added to make sure everything is in a canonical form.
Module[{pts, lines}, pts = Append[ Join[RootReduce[Table[{Sin[2 Pi n/5], Cos[2 Pi n/5]}, {n, 0, 4}]], RootReduce[ 1/2 (3 - Sqrt[5]) Table[{Sin[2 Pi n/5], Cos[2 Pi n/5]}, {n, 0, 4}]], RootReduce[(1/2 (3 - Sqrt[5]))^2 Table[{Sin[2 Pi n/5], Cos[2 Pi n/5]}, {n, 0, 4}]]], {0, 0}]; lines = Union[Flatten[#, 1]] & /@ Select[SplitBy[ SortBy[Subsets[pts, {2}], RootReduce[RowReduce[Append[#, 1] & /@ #]] &], RootReduce[RowReduce[Append[#, 1] & /@ #]] &], Length[#] > 3 &]; Graphics[{Thick, Line /@ lines, EdgeForm[{Black, Thick}], White, Disk[#, .05] & /@ pts}, ImageSize -> 520]]
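The idea behind the `RowReduce` trick is to give every point pair a canonical key for the line through it, then bucket pairs by key. Here is a hedged sketch of the same idea in Python (my own helper names, using gcd-normalized integer line coefficients instead of row reduction):

```python
from itertools import combinations
from math import gcd

def line_through(p, q):
    """Canonical integer coefficients (A, B, C) of the line A*x + B*y + C = 0."""
    (x1, y1), (x2, y2) = p, q
    A, B, C = y2 - y1, x1 - x2, x2 * y1 - x1 * y2
    g = gcd(gcd(abs(A), abs(B)), abs(C))
    A, B, C = A // g, B // g, C // g
    if A < 0 or (A == 0 and B < 0):   # fix the overall sign so the key is unique
        A, B, C = -A, -B, -C
    return A, B, C

def lines_with_at_least(points, k):
    """Group point pairs by their line; keep lines containing at least k points."""
    buckets = {}
    for p, q in combinations(points, 2):
        buckets.setdefault(line_through(p, q), set()).update([p, q])
    return [pts for pts in buckets.values() if len(pts) >= k]

grid = [(x, y) for x in range(3) for y in range(3)]
# a 3x3 grid has 8 lines of three: 3 rows, 3 columns, 2 diagonals
assert len(lines_with_at_least(grid, 3)) == 8
```

This only handles integer coordinates; the post's `RootReduce`/`RowReduce` combination does the same canonicalization for exact algebraic coordinates.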
Eighteen points in 18 lines of 4 points is harder, since it seems to require 3 points at infinity. When lines are parallel, projective geometers say that the lines intersect at infinity. With 4 points on each line and each point on 4 lines, this is a 4-configuration.
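The "parallel lines meet at infinity" idea is concrete in homogeneous coordinates: a line A x + B y + C = 0 is the triple (A, B, C), two lines meet at the cross product of their triples, and a last coordinate of zero marks a point at infinity. A small sketch (my own helper, not from the post):

```python
def cross(u, v):
    """Cross product of homogeneous triples; the meet of two lines."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# the parallel lines y = 0 and y = 1, written as (A, B, C) with A*x + B*y + C = 0
meet = cross((0, 1, 0), (0, 1, -1))

# last homogeneous coordinate 0 marks a point at infinity (here, the x direction)
assert meet[2] == 0
```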
Module[{config18, linesconfig18, inf}, config18 = {{0, Root[9  141 #1^2 + #1^4 &, 1]}, {1/4 (21  9 Sqrt[5]), Root[9  564 #1^2 + 16 #1^4 &, 4]}, {1/4 (21 + 9 Sqrt[5]), Root[9  564 #1^2 + 16 #1^4 &, 4]}, {0, 2 Sqrt[3]}, {3, Sqrt[ 3]}, {3, Sqrt[3]}, {0, Sqrt[3]}, {3/ 2, (Sqrt[3]/2)}, {(3/2), (Sqrt[3]/2)}, {1/4 (3 + 3 Sqrt[5]), Root[9  564 #1^2 + 16 #1^4 &, 4]}, {1/4 (9 + 3 Sqrt[5]), Root[225  420 #1^2 + 16 #1^4 &, 1]}, {1/2 (6  3 Sqrt[5]), ( Sqrt[3]/2)}, {0, Root[144  564 #1^2 + #1^4 &, 4]}, {1/2 (21 + 9 Sqrt[5]), Root[9  141 #1^2 + #1^4 &, 1]}, {1/2 (21  9 Sqrt[5]), Root[9  141 #1^2 + #1^4 &, 1]}}; linesconfig18 = SplitBy[SortBy[Union[Flatten[First[#], 1]] & /@ (Transpose /@ Select[ SplitBy[ SortBy[{#, RootReduce[RowReduce[Append[#, 1] & /@ #]]} & /@ Subsets[config18, {2}], Last], Last], Length[#] > 1 &]), Length], Length]; inf = Select[ SplitBy[SortBy[linesconfig18[[1]], RootReduce[slope[Take[#, 2]]] &], RootReduce[slope[Take[#, 2]]] &], Length[#] > 3 &]; Graphics[{Thick, Line /@ linesconfig18[[2]], Red, InfiniteLine[Take[#, 2]] & /@ inf[[1]], Green, InfiniteLine[Take[#, 2]] & /@ inf[[2]], Blue, InfiniteLine[Take[#, 2]] & /@ inf[[3]], EdgeForm[Black], White, Disk[#, .7] & /@ config18}, ImageSize > {520, 460}]] 
If you do not like points at infinity, arrange 3 heptagons of 7 points to make a 4-configuration of 21 lines through 21 points. That isn’t the record, since it is possible to make at least 24 lines of 4 with 21 points.
Module[{pts, lines}, 21 linepts = 4 {{0, b}, {0, (b c)/( a  c)}, {2 a, b}, {0, ((b c)/(2 a + c))}, {0, (b c)/( 3 a  c)}, {a, b}, {a, b}, {c, 0}, {(c/3), 0}, {c/3, 0}, {c, 0}, {((3 a c)/(3 a  2 c)), (2 b c)/(3 a  2 c)}, {( a c)/(3 a  2 c), (2 b c)/(3 a  2 c)}, {(3 a c)/(3 a  2 c), ( 2 b c)/(3 a  2 c)}, {(a c)/(5 a  2 c), (2 b c)/( 5 a  2 c)}, {(a c)/(5 a + 2 c), (2 b c)/(5 a  2 c)}, {( a c)/(3 a + 2 c), (2 b c)/( 3 a  2 c)}, {((a c)/(a + 2 c)), ((2 b c)/(a + 2 c))}, {( a c)/(a + 2 c), ((2 b c)/(a + 2 c))}, {((a c)/( 3 a + 2 c)), ((2 b c)/(3 a + 2 c))}, {(a c)/( 3 a + 2 c), ((2 b c)/(3 a + 2 c))}} /. {a > 2, c > 1, b > 1}; lines = Union[Flatten[#, 1]] & /@ Select[SplitBy[ SortBy[Subsets[pts, {2}], RowReduce[Append[#, 1] & /@ #] &], RowReduce[Append[#, 1] & /@ #] &], Length[#] > 3 &]; Graphics[{Line /@ lines, EdgeForm[Black], White, Disk[#, .3] & /@ pts}, ImageSize > 500]] 
The best-known solution for 25 points has 32 lines, but this solution seems weak due to the low contribution made by the last 3 points. Progressively remove the points labeled 25, 24, 23 (near the bottom) to see the best-known solutions that produce 30, 28 and 26 lines.
Module[{pts, lines}, pts = {{0, 1/4}, {0, 3/4}, {1, 1/2}, {1, 1/2}, {1, 1}, {1, 1}, {0, 0}, {0, 3/8}, {(1/3), 1/3}, {1/3, 1/3}, {(1/3), 1/6}, {1/3, 1/ 6}, {(1/5), 2/5}, {1/5, 2/5}, {(1/5), 1/2}, {1/5, 1/ 2}, {1, (1/2)}, {1, (1/2)}, {1, 1}, {1, 1}, {(1/3), 2/ 3}, {1/3, 2/3}, {(1/3), (2/3)}, {1/3, (2/3)}, {9/5, (6/5)}}; lines = SplitBy[SortBy[ (Union[Flatten[#, 1]] & /@ SplitBy[SortBy[Subsets[pts, {2}], RowReduce[Append[#, 1] & /@ #] &], RowReduce[Append[#, 1] & /@ #] &]), Length], Length]; Graphics[{InfiniteLine[Take[#, 2]] & /@ lines[[3]], White, EdgeForm[Black], Table[{Disk[pts[[n]], .04], Black, Style[Text[n, pts[[n]]], 8]}, {n, 1, Length[pts]}] & /@ pts, Black}, ImageSize > {520}]] 
The 27 lines in space are, of course, the lines of the Clebsch surface. There are 12 points of intersection not shown, and some lines have 9 points of intersection.
Module[{lines27, clebschpoints}, lines27 = Transpose /@ Flatten[Join[Table[RotateRight[#, n], {n, 0, 2}] & /@ {{{(1/3), (1/3)}, {1, 1}, {1, 1}}, {{0, 0}, {1, (2/3)}, {(2/3), 1}}, {{1/3, 1/ 3}, {1, (1/3)}, {(1/3), 1}}, {{0, 0}, {4/ 9, (2/9)}, {1, 1}}, {{0, 0}, {1, 1}, {4/9, (2/9)}}}, Permutations[#] & /@ {{{30, 30}, {35  19 Sqrt[5], 25 + 17 Sqrt[5]}, {5 + 3 Sqrt[5], 5  9 Sqrt[5]}}/ 30, {{6, 6}, {3 + 2 Sqrt[5], 6  Sqrt[5]}, {7 + 4 Sqrt[5], 8  5 Sqrt[5]}}/6}], 1]; clebschpoints = Union[RootReduce[Flatten[With[ {sol = Solve[e #[[1, 1]] + (1  e) #[[1, 2]] == f #[[2, 1]] + (1  f) #[[2, 2]]]}, If[Length[sol] > 0, (e #[[1, 1]] + (1  e) #[[1, 2]]) /. sol, Sequence @@ {} ]] & /@ Subsets[lines27, {2}], 1]]]; Graphics3D[{{ Sphere[#, .04] & /@ Select[clebschpoints, Norm[#] < 1 &]}, Tube[#, .02] & /@ lines27, Opacity[.4], ContourPlot3D[ 81 (x^3 + y^3 + z^3)  189 (x^2 y + x^2 z + x y^2 + x z^2 + y^2 z + y z^2) + 54 x y z + 126 (x y + x z + y z)  9 (x^2 + y^2 + z^2)  9 (x + y + z) + 1 == 0, {x, 1, 1}, {y, 1, 1}, {z, 1, 1}, Boxed > False][[1]]}, Boxed > False, SphericalRegion > True, ImageSize > 520, ViewAngle > Pi/8]] 
I’m not sure that’s optimal, since I managed to arrange 149 points in 241 lines of 5 points.
Module[{majorLines, tetrahedral, base, points, lines}, majorLines[pts_] := ((Drop[#1, 1] &) /@ #1 &) /@ Select[(Union[Flatten[#1, 1]] &) /@ SplitBy[SortBy[Subsets[(Append[#1, 1] &) /@ pts, {2}], RowReduce], RowReduce], Length[#1] > 4 &]; tetrahedral[{a_, b_, c_}] := Union[{{a, b, c}, {a, b, c}, {b, c, a}, {b, c, a}, {c, a, b}, {c, a, b}, {c, a, b}, {c, a, b}, {b, c, a}, {b, c, a}, {a, b, c}, {a, b, c}}]; base = {{0, 0, 0}, {180, 180, 180}, {252, 252, 252}, {420, 420, 420}, {1260, 1260, 1260}, {0, 0, 420}, {0, 0, 1260}, {0, 180, 360}, {0, 315, 315}, {0, 360, 180}, {0, 420, 840}, {0, 630, 630}, {0, 840, 420}, {140, 140, 420}, {180, 180, 540}, {252, 252, 756}, {420, 420, 1260}}; points = Union[Flatten[tetrahedral[#] & /@ base, 1]]; lines = majorLines[points]; Graphics3D[{Sphere[#, 50] & /@ points, Tube[Sort[#], 10] & /@ Select[lines, Length[#] == 5 &]}, Boxed > False, ImageSize > {500, 460}]] 
The 3D display is based on the following 2D solution, which has 25 points in 18 lines of 5 points. The numbers are barycentric coordinates. To use point 231, separate the digits (2,3,1), divide by the total (2/6,3/6,1/6) and simplify (1/3,1/2,1/6). If the outer triangle has area 1, the point 231 extended to the outer edges will make triangles of area (1/3,1/2,1/6).
Module[{peggpoints, elkpoints, elklines, linecoords}, peggpoints = Sort[#/Total[#] & /@ Flatten[(Permutations /@ {{0, 0, 1}, {0, 1, 1}, {0, 1, 2}, {0, 4, 5}, {1, 1, 2}, {1, 2, 2}, {1, 2, 3}, {1, 2, 6}, {1, 4, 4}, {2, 2, 3}, {2, 2, 5}, {2, 3, 4}, {2, 3, 5}, {2, 5, 5}, {2, 6, 7}, {4, 5, 6}}), 1]]; elkpoints = Sort[#/Total[#] & /@ Flatten[(Permutations /@ {{1, 1, 1}, {0, 0, 1}, {1, 2, 3}, {1, 1, 2}, {0, 1, 1}, {1, 2, 2}, {0, 1, 2}}), 1]]; elklines = First /@ Select[ SortBy[Tally[BaryLiner[#] & /@ Subsets[elkpoints, {2}]], Last], Last[#] > 4 &]; linecoords = Table[FromBarycentrics[{#[[1]], #[[2]]}, tri] & /@ Select[elkpoints, elklines[[n]].# == 0 &], {n, 1, 18}]; Graphics[{AbsoluteThickness[3], Line /@ linecoords, With[{coord = FromBarycentrics[{#[[1]], #[[2]]}, tri]}, {Black, Disk[coord, .12], White, Disk[coord, .105], Black, Style[Text[StringJoin[ToString /@ (# (Max[Denominator[#]]))], coord], 14, Bold]}] & /@ elkpoints}, ImageSize > {520, 450}]] 
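The "231" labeling scheme can be checked mechanically: normalize the digits to barycentric weights, map to Cartesian, and confirm that the three sub-triangles cut off by the point have areas in the ratio 2 : 3 : 1. A hedged Python sketch (the triangle vertices are arbitrary, and the helper names are my own):

```python
from fractions import Fraction as F

def to_cartesian(bary, tri):
    """Map barycentric weights (m, n, o) to Cartesian coordinates on triangle tri."""
    m, n, o = bary
    (x1, y1), (x2, y2), (x3, y3) = tri
    return (m * x1 + n * x2 + o * x3, m * y1 + n * y2 + o * y3)

def area(a, b, c):
    """Unsigned triangle area via the shoelace formula."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2

tri = [(F(0), F(0)), (F(4), F(0)), (F(0), F(4))]
digits = (2, 3, 1)
bary = [F(d, sum(digits)) for d in digits]      # (1/3, 1/2, 1/6)
p = to_cartesian(bary, tri)
total = area(*tri)

# the sub-triangle opposite vertex i has area proportional to digit i
assert [area(p, tri[1], tri[2]) / total,
        area(p, tri[0], tri[2]) / total,
        area(p, tri[0], tri[1]) / total] == bary
```

Exact rational arithmetic (`Fraction`) keeps the equality test exact, mirroring how the post works with exact barycentric coordinates.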
A further exploration of this is at Extreme Orchards for Gardner. There, I ask if a self-dual configuration exists where the point set is identical to the line set. I managed to find the following 24-point 3-configuration. The numbers represent barycentric triples such as {0,2,–1}, with blue = positive, red = negative and green = zero. In barycentric coordinates, a line {a,b,c} is on point {d,e,f} if the dot product {a,b,c}.{d,e,f}==0. For point {0,2,–1}, the lines {{–1,1,2},{–1,2,4},{0,1,2}} go through that point. Similarly, for line {0,2,–1}, the points {{–1,1,2},{–1,2,4},{0,1,2}} are on that line. The set of 24 points is identical to the set of 24 lines.
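The incidence rule in that paragraph is just a dot product, so the stated example can be verified in a few lines (a Python cross-check of the numbers given in the text):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

point = (0, 2, -1)
lines = [(-1, 1, 2), (-1, 2, 4), (0, 1, 2)]

# a barycentric line (a,b,c) passes through point (d,e,f) iff the dot product is 0
assert all(dot(point, line) == 0 for line in lines)
```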
FromBarycentrics[{m_, n_, o_}, {{x1_, y1_}, {x2_, y2_}, {x3_, y3_}}] := {m*x1 + n*x2 + (1 - m - n)*x3, m*y1 + n*y2 + (1 - m - n)*y3}; tri = Reverse[{{Sqrt[3]/2, -(1/2)}, {0, 1}, {-(Sqrt[3]/2), -(1/2)}}]; With[{full = Union[Flatten[{#, RotateRight[#, 1], RotateLeft[#, 1]} & /@ {{1, 0, 2}, {1, 1, 2}, {1, 2, 0}, {1, 2, 1}, {1, 2, 4}, {1, 4, 2}, {0, 1, 2}, {0, 2, 1}}, 1]]}, Graphics[{EdgeForm[Black], Tooltip[Line[#[[2]]], Style[Row[ Switch[Sign[#], -1, Style[ToString[Abs[#]], Red], 0, Style[ToString[Abs[#]], Darker[Green]], 1, Style[ToString[Abs[#]], Blue]] & /@ #[[1]]], 16, Bold]] & /@ Table[{full[[k]], Sort[FromBarycentrics[#/Total[#], tri] & /@ Select[full, full[[k]].# == 0 &]]}, {k, 1, Length[full]}], White, {Disk[FromBarycentrics[#/Total[#], tri], .15], Black, Style[Text[ Row[Switch[Sign[#], -1, Style[ToString[Abs[#]], Red], 0, Style[ToString[Abs[#]], Darker[Green]], 1, Style[ToString[Abs[#]], Blue]] & /@ #], FromBarycentrics[#/Total[#], tri]], 14, Bold]} & /@ full}, ImageSize -> 520]]
With a longer computer run, I found an order-27, self-dual 4-configuration where the points and lines have the same set of barycentric coordinates.
With[{full = Union[Flatten[{#, RotateRight[#, 1], RotateLeft[#, 1]} & /@ {{2, 1, 4}, {2, 1, 3}, {1, 1, 1}, {1, 2, 0}, {1, 2, 1}, {1, 3, 2}, {1, 4, 2}, {0, 1, 2}, {1, 1, 2}}, 1]]}, Graphics[{EdgeForm[Black], Tooltip[Line[#[[2]]], Style[Row[ Switch[Sign[#], 1, Style[ToString[Abs[#]], Red], 0, Style[ToString[Abs[#]], Darker[Green]], 1, Style[ToString[Abs[#]], Blue]] & /@ #[[1]]], 16, Bold]] & /@ Table[{full[[k]], Sort[FromBarycentrics[#/Total[#], tri] & /@ Select[full, full[[k]].# == 0 &]]}, {k, 1, Length[full]}], White, {Tooltip[Disk[FromBarycentrics[#/Total[#], tri], .08], Style[Row[ Switch[Sign[#], 1, Style[ToString[Abs[#]], Red], 0, Style[ToString[Abs[#]], Darker[Green]], 1, Style[ToString[Abs[#]], Blue]] & /@ #], 16, Bold]]} & /@ full}, ImageSize > 520]] 
And now back to the mathematics of three-in-a-row, frequently known as elliptic curve theory, though I’ll mostly be veering into geometry.
On the cubic curve given by y = x^{3}, all the triples from {–7,–6,…,7} that sum to zero happen to be on a straight line. The Table values are adjusted so that the aspect ratio will be reasonable.
simplecubic = Table[{x/7, x^3/343}, {x, -7, 7}]; Graphics[{Cyan, Line[Sort[#]] & /@ Select[Subsets[simplecubic, {3}], Abs[Det[Append[#, 1] & /@ #]] == 0 &], {Black, Disk[#, .07], White, Disk[#, .06], Black, Style[Text[7 #[[1]], #], 16] } & /@ simplecubic}, ImageSize -> 520]
For example, (2,3,–5) has a zero sum. On the cubic curve, those numbers are at coordinates (2,8), (3,27) and (–5,–125), which are on a line. The triple (–∛2, –∛3, ∛2 + ∛3) also sums to zero and the corresponding points also lie on a straight line, but ignore that: restrict the coordinates to integers. With the curve y = x^{3}, all of the integers can be plotted. Any triple of integers that sums to zero is on a straight line.
TraditionalForm[ Row[{Det[MatrixForm[{{2, 8, 1}, {3, 27, 1}, {-5, -125, 1}}]], " = ", Det[{{2, 8, 1}, {3, 27, 1}, {-5, -125, 1}}]}]]
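The reason this works for every zero-sum triple is that the collinearity determinant for three points on y = x³ factors as (a–b)(b–c)(c–a)(a+b+c), so for distinct values it vanishes exactly when the sum is zero. A quick Python check over the range used in the post:

```python
from itertools import combinations

def det3(m):
    """3x3 determinant by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def collinear_on_cubic(a, b, c):
    """Are (a, a^3), (b, b^3), (c, c^3) collinear?"""
    return det3([[a, a**3, 1], [b, b**3, 1], [c, c**3, 1]]) == 0

# distinct integer triples on y = x^3 are collinear exactly when they sum to zero
for t in combinations(range(-7, 8), 3):
    assert collinear_on_cubic(*t) == (sum(t) == 0)
```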
We can use the concept behind the cubic curve to make a rotationally symmetric zero-sum geometry around 0. Let blue, red and green represent positive, negative and zero values. Start with:
To place the values 3 and 4, variables e and f are needed. The positions of all subsequent points up to infinity are forced.
Note that e and f should not be 0 or 1, since that would cause all subsequent points to overlap on the first five points.
Instead of building around 0, values can instead be reflected in the y = x diagonal to make a mirror-symmetric zero-sum geometry.
Skew symmetry is also possible with the addition of variables (m,n).
The six variables (a,b,c,d,e,f) completely determine as many points as you like with rotational symmetry about (0,0) or mirror symmetry about the line y = x. Adding the variables (m,n) allows for a skew symmetry where the lines through the paired points intersect at (0,0). In the Manipulate, move locator 1 to change (a,b) and locator 2 to change (c,d). Move the ef locator horizontally to change e and vertically to change f. For skew symmetry, move the mn locator to change the placements of the skewed points.
Manipulate[ Module[{ halfpoints, triples, initialpoints, pts2, candidate2}, halfpoints = Ceiling[(numberofpoints  1)/2]; triples = Select[Subsets[Range[halfpoints, halfpoints], {3}], Total[#] == 0 &]; initialpoints = rotational /. Thread[{a, b, c, d, e, f} > Flatten[{ab, cd, ef}]]; If[symmetry == "mirror", initialpoints = mirror /. Thread[{a, b, c, d, e, f} > Flatten[{ab, cd, ef}]]]; If[symmetry == "skew", initialpoints = skew /. Thread[{a, b, c, d, e, f, m, n} > Flatten[{ab, cd, ef, mn}]]]; pts2 = Join[initialpoints, Table[{{0, 0}, {0, 0}}, {46}]]; Do[pts2[[ index]] = (LineIntersectionPoint33[{{pts2[[1, #]], pts2[[index  1, #]]}, {pts2[[2, #]], pts2[[index  2, #]]}}] & /@ {2, 1}), {index, 5, 50}]; If[showcurve, candidate2 = NinePointCubic2[First /@ Take[pts2, 9]], Sequence @@ {}]; Graphics[{ EdgeForm[Black], If[showcurve, ContourPlot[Evaluate[{candidate2 == 0}], {x, 3, 3}, {y, 3, 3}, PlotPoints > 15][[1]], Sequence @@ {}], If[showlines, If[symmetry == "mirror", {Black, Line[pts2[[Abs[#], (3  Sign[#])/2 ]] & /@ #] & /@ Select[triples, Not[MemberQ[#, 0]] &], Green, InfiniteLine[ pts2[[Abs[#], (3  Sign[#])/ 2 ]] & /@ #] & /@ (Drop[#, {2}] & /@ Select[triples, MemberQ[#, 0] &])}, {Black, Line[If[# == 0, {0, 0}, pts2[[Abs[#], (3  Sign[#])/2 ]]] & /@ #] & /@ triples}], Sequence @@ {}], If[extrapoints > 0, Table[{White, Disk[pts2[[n, index]], .03]}, {n, halfpoints + 1, halfpoints + extrapoints}, {index, 1, 2}], Sequence @@ {}], Table[{White, Disk[pts2[[n, index]], .08], {Blue, Red}[[index]], Style[Text[n, pts2[[n, index]]] , 12]}, {n, halfpoints, 1, 1}, {index, 1, 2}], If[symmetry != "mirror", {White, Disk[{0, 0}, .08], Green, Style[Text[0, {0, 0}] , 12]}, Sequence @@ {}], Inset[\!\(\* GraphicsBox[ {RGBColor[1, 1, 0], EdgeForm[{GrayLevel[0], Thickness[Large]}], DiskBox[{0, 0}], {RGBColor[0, 0, 1], StyleBox[InsetBox["\<\"1\"\>", {0.05, 0.05}], StripOnInput>False, FontSize>18, FontWeight>Bold]}}, ImageSize>{24, 24}]\), ab], Inset[\!\(\* GraphicsBox[ {RGBColor[1, 1, 0], 
EdgeForm[{GrayLevel[0], Thickness[Large]}], DiskBox[{0, 0}], {RGBColor[0, 0, 1], StyleBox[InsetBox["\<\"2\"\>", {0.07, 0.05}], StripOnInput>False, FontSize>18, FontWeight>Bold]}}, ImageSize>{24, 24}]\), cd], Inset[\!\(\* GraphicsBox[ {RGBColor[0, 1, 0], EdgeForm[{GrayLevel[0], Thickness[Large]}], DiskBox[{0, 0}], {GrayLevel[0], StyleBox[InsetBox["\<\"ef\"\>", {0, 0}], StripOnInput>False, FontSize>9]}}, ImageSize>{21, 21}]\), ef], If[symmetry == "skew", Inset[\!\(\* GraphicsBox[ {RGBColor[1, 0, 1], EdgeForm[{GrayLevel[0], Thickness[Large]}], DiskBox[{0, 0}], {GrayLevel[0], StyleBox[InsetBox["\<\"mn\"\>", {0, 0}], StripOnInput>False, FontSize>9]}}, ImageSize>{21, 21}]\), mn], Sequence @@ {}]}, ImageSize > {380, 480}, PlotRange > Dynamic[(3/2)^zoom {{2.8, 2.8}  zx/5, {2.5, 2.5}  zy/5}]]], {{ab, {2, 2}}, {2.4, 2.4}, {2.4, 2.4}, ControlType > Locator, Appearance > None}, {{cd, {2, 2}}, {2.4, 2.4}, {2.4, 2.4}, ControlType > Locator, Appearance > None}, {{ef, {.7, .13}}, {2.4, 2.4}, {2.4, 2.4}, ControlType > Locator, Appearance > None}, {{mn, {2.00, 0.5}}, {2.4, 2.4}, {2.4, 2.4}, ControlType > Locator, Appearance > None}, "symmetry", Row[{Control@{{symmetry, "rotational", ""}, {"rotational", "mirror", "skew"}, ControlType > PopupMenu}}], "", "points shown", {{numberofpoints, 15, ""}, 5, 30, 2, ControlType > PopupMenu}, "", "extra points", {{extrapoints, 0, ""}, 0, 20, 1, ControlType > PopupMenu}, "", "move zero", Row[{Control@{{zx, 0, ""}, 10, 10, 1, ControlType > PopupMenu}, " 5", Style["x", Italic]}], Row[{Control@{{zy, 0, ""}, 10, 10, 1, ControlType > PopupMenu}, " 5", Style["y", Italic]}], "", "zoom exponent", {{zoom, 0, ""}, 2, 3, 1, ControlType > PopupMenu}, "", "show these", Row[{Control@{{showlines, True, ""}, {True, False}}, "lines"}], Row[{Control@{{showcurve, False, ""}, {True, False}}, "curve"}], TrackedSymbols :> {ab, cd, ef, mn, zx, zy, symmetry, numberofpoints, extrapoints, zoom}, ControlPlacement > Left, Initialization :> ( Clear[a]; Clear[b]; Clear[c]; 
Clear[d]; Clear[e]; Clear[f]; Clear[m]; Clear[n]; NinePointCubic2[pts3_] := Module[{makeRow2, cubic2, poly2, coeff2, nonzero, candidate}, If[Min[ Total[Abs[RowReduce[#][[3]]]] & /@ Subsets[Append[#, 1] & /@ pts3, {4}]] > 0, makeRow2[{x_, y_}] := {1, x, x^2, x^3, y, y x, y x^2, y^2, y^2 x, y^3}; cubic2[x_, y_][p_] := Det[makeRow2 /@ Join[{{x, y}}, p]]; poly2 = cubic2[x, y][pts3]; coeff2 = Flatten[CoefficientList[poly2, {y, x}]]; nonzero = First[Select[coeff2, Abs[#] > 0 &]]; candidate = Expand[Simplify[ poly2/nonzero]]; If[Length[FactorList[candidate]] > 2, "degenerate", candidate], "degenerate"]]; LineIntersectionPoint33[{{a_, b_}, {c_, d_}}] := ( Det[{a, b}] (c  d)  Det[{c, d}] (a  b))/Det[{a  b, c  d}]; skew = {{{a, b}, {a m, b m}}, {{c, d}, {c n, d n}}, {{a e m  c (1 + e) n, b e m  d (1 + e) n}, {( a e m + c n  c e n)/(e m + n  e n), (b e m + d n  d e n)/( e m + n  e n)}}, {{a f m  ((1 + f) (a e m  c (1 + e) n))/( e (m  n) + n), b f m  ((1 + f) (b e m  d (1 + e) n))/(e (m  n) + n)}, {( c (1 + e) (1 + f) n + a m (e + e f (1 + m  n) + f n))/( 1 + f (1 + e m (m  n) + m n)), ( d (1 + e) (1 + f) n + b m (e + e f (1 + m  n) + f n))/( 1 + f (1 + e m (m  n) + m n))}}}; rotational = {#, #} & /@ {{a, b}, {c, d}, {c (1 + e)  a e, d (1 + e)  b e}, {c (1 + e) (1 + f) + a (e  (1 + e) f), d (1 + e) (1 + f) + b (e  (1 + e) f)}}; mirror = {#, Reverse[#]} & /@ {{a, b}, {c, d}, {d (1  e) + b e, c (1  e) + a e}, {(c (1  e) + a e) (1  f) + b f, (d (1  e) + b e) (1  f) + a f}};), SynchronousInitialization > False, SaveDefinitions > True] 
In the rotationally symmetric construction, point 7 can be derived by finding the intersection of the line through points 2 and 5 with the line through points 3 and 4, since the triples (2, 5, –7) and (3, 4, –7) both sum to zero.
TraditionalForm[ FullSimplify[{h zerosumgeometrysymmetric[[2, 2]] + (1 - h) zerosumgeometrysymmetric[[5, 2]] } /. Solve[h zerosumgeometrysymmetric[[2, 2]] + (1 - h) zerosumgeometrysymmetric[[5, 2]] == j zerosumgeometrysymmetric[[3, 2]] + (1 - j) zerosumgeometrysymmetric[[4, 2]] , {h, j}][[ 1]]][[1]]]
The simple cubic had 15 points, –7 to 7, producing 25 lines. That falls short of the record 31 lines. Is there a way to get 6 more lines? Notice the 6 triples with a sum of ±15, which is 0 (mod 15):
Select[Subsets[Range[-7, 7], {3}], Abs[Total[#]] == 15 &]
We can build up the triangle area matrices for those sets of points. If the determinant is zero, the points are on a straight line.
matrices15 = Append[zerosumgeometrysymmetric[[#, 1]], 1] & /@ # & /@ {{2, 6, 7}, {3, 5, 7}, {4, 5, 6}}; Row[TraditionalForm@Style[MatrixForm[#]] & /@ (matrices15), Spacer[20]] 
Factor each determinant and hope to find a shared factor other than bc–ad, which puts all points on the same line. It turns out the determinants have –e + e^{2} + f – e f + f^{2} – e f^{2} + f^{3} as a shared factor.
Column[FactorList[Numerator[Det[#]]] & /@ matrices15] 
Are there any nice solutions for –e + e^{2} + f – e f + f^{2} – e f^{2} + f^{3} = 0? It turns out that letting e = φ (the golden ratio) allows f = –1.
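This is easy to sanity-check numerically: at f = –1 the polynomial collapses to e² – e – 1, whose positive root is the golden ratio. A small Python check (function name mine):

```python
phi = (1 + 5 ** 0.5) / 2  # golden ratio

def orchard15(e, f):
    """The shared factor found for the 15-point case."""
    return -e + e**2 + f - e*f + f**2 - e*f**2 + f**3

# substituting f = -1 leaves e^2 - e - 1, which vanishes at e = phi
assert abs(orchard15(phi, -1)) < 1e-12
```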
Take[SortBy[Union[ Table[FindInstance[-e + e^2 + f - e f + f^2 - e f^2 + f^3 == 0 && e > 0 && f > ff, {e, f}, Reals], {ff, -2, 2, 1/15}]], LeafCount], 6]
Here’s what happens with base points (a,b) = (1,1), (c,d) = (1,–1) and that value of (e,f).
points15try = RootReduce[zerotripsymm[{1, 1, 1, -1, (1 + Sqrt[5])/2, -1}, 7]]; zerosumGraphic[points15try/5, 15, 1.5 {260, 210}]
The solution’s convex hull is determined by points 4 and 2, so those points can be moved to make the solution more elegant.
RootReduce[({{w, x}, {y, z}} /. Solve[{{{w, x}, {y, z}}.points15try[[2, 1]] == {1, 1}, {{w, x}, {y, z}}.points15try[[4, 1]] == {1, 1}}][[ 1]]).# & /@ {points15try[[1, 1]], points15try[[2, 1]]}] 
The values for (a,b,c,d) do not need to be exact, so we can find the nearest rational values.
nearestRational[#, 20] & /@ Flatten[{{9 - 4 Sqrt[5], 5 - 2 Sqrt[5]}, {1, -1}}]
That leads to an elegant-looking solution for the 15-tree problem. There are 31 lines of 3 points, each a triple that sums to 0 (mod 15).
points15 = RootReduce[zerotripsymm[{1/18, 9/17, 1, -1, (1 + Sqrt[5])/2, -1}, 7]]; zerosumGraphic[points15, 15, 1.5 {260, 210}]
The 14-point version leads to the polynomial equation 2e – 2e^{2} – f + e f + e^{2} f – e f^{2} = 0, which has the nice solution {e → 1/2, f → (–1+√17)/4}. A point at infinity is needed for an even number of points with this method.
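As with the 15-point polynomial, the stated solution can be checked numerically (the function name is mine):

```python
def orchard14(e, f):
    """The shared factor found for the 14-point case."""
    return 2*e - 2*e**2 - f + e*f + e**2*f - e*f**2

f_root = (-1 + 17 ** 0.5) / 4

# at e = 1/2 the polynomial reduces to 1/2 - f/4 - f^2/2, solved by f = (-1+sqrt(17))/4
assert abs(orchard14(0.5, f_root)) < 1e-12
```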
{{{1, 1}, {1, 1}}, {{1, 1}, {1, 1}}, {{1, 0}, {1, 0}}, {{1/2 (3  Sqrt[17]), 1/4 (1  Sqrt[17])}, {1/2 (3 + Sqrt[17]), 1/4 (1 + Sqrt[17])}}, {{1/4 (5  Sqrt[17]), 1/8 (1 + Sqrt[17])}, {1/4 (5 + Sqrt[17]), 1/8 (1  Sqrt[17])}}, {{1/8 (3 + 3 Sqrt[17]), 1/16 (7 + Sqrt[17])}, {1/8 (3  3 Sqrt[17]), 1/16 (7  Sqrt[17])}}} 
The solution on 15 points can be tweaked to give a match for the 16-point, 37-line solution in various ways. The particular choice is not especially meaningful here. The last example is done with skew symmetry, even though it looks the same.
Grid[Partition[{zerosumGraphic[ zerotripsymm[{5  2 Sqrt[5], 9  4 Sqrt[5], 1, 1, 1/2 (1 + Sqrt[5]), 1}, 7], 15, {260, 210}], zerosumGraphic[ zerotripsymm[{5  2 Sqrt[5], 9  4 Sqrt[5], 1, 1, 1/2 (1 + Sqrt[5]), 1}, 7], 16, {260, 210}], zerosumGraphic[ zerotripsymm[{1, 1, 1, 1, 3  Sqrt[5], 1/2 (3  Sqrt[5])}, 7], 16, {260, 210}], zerosumGraphic[ RootReduce[ zerotripskew[{0, 1  Sqrt[5], 3 + Sqrt[5], 3 + Sqrt[5], 1 + Sqrt[5], 1/2 (1 + Sqrt[5]), 1/2 (1 + Sqrt[5]), 1/2 (1 + Sqrt[5])}, 7]], 16, {260, 210}]}, 2]] 
The first solution is a special case of the 15-point solution with an abnormal amount of parallelism, enough to match the sporadic 16-point solution. How did I find it?
Here are coordinates for the positive points up to 4 in the mirror-symmetric and skew-symmetric cases. They quickly get more complicated.
TraditionalForm@ Grid[Prepend[ Transpose[ Prepend[Transpose[First /@ Take[zerosumgeometrymirror, 4]], Range[1, 4]]], {"number", x, y}], Dividers > {{2 > Green}, {2 > Green}}] 
TraditionalForm@ Grid[Prepend[ Transpose[ Prepend[Transpose[ Prepend[First /@ Take[zerosumgeometryskew, 4], {0, 0}]], Range[0, 4]]], {"number", x, y}], Dividers > {{2 > Blue}, {2 > Blue}}] 
Here are coordinates for the positive points up to 7 in the rotationally symmetric case. These are more tractable, so I focused on them.
TraditionalForm@ Grid[Prepend[ Transpose[ Prepend[Transpose[ Prepend[First /@ Take[zerosumgeometrysymmetric, 7], {0, 0}]], Range[0, 7]]], {"number", x, y}], Dividers > {{2 > Red}, {2 > Red}}] 
For 14 and 15 points, the polynomials 2e – 2e^{2} – f + e f + e^{2} f – e f^{2} and –e + e^{2} + f – e f + f^{2} – e f^{2} + f^{3} appeared almost magically to solve the problem. Why did that happen? I have no idea, but it always seems to work. I’ll call these orchard-planting polynomials. If they had been used before to produce elegant solutions for this problem, we would likely have seen those solutions already. Here are the next few orchard-planting polynomials. As a reminder, these are shared factors of the determinants generated by forcing triples modulo p to be lines.
Monitor[TraditionalForm@Grid[Prepend[Table[ With[{subs = Select[Subsets[Range[Floor[n/2], Floor[n/2]], {3}], Mod[ Abs[Total[#]], n ] == 0 && Not[MemberQ[#, (n/2)]] &]}, {n, Length[subs], Select[subs, Min[#] > 0 && Max[#] < 13 && Max[#] < n/2 &], Last[SortBy[ Apply[Intersection, (First[Sort[FullSimplify[{#, #}]]] & /@ First /@ FactorList[Numerator[#]] & /@ Expand[Det[ Append[zerosumgeometrysymmetric[[#, 1]], 1] & /@ #] & /@ Select[subs, Min[#] > 0 && Max[#] < 13 && Max[#] < n/2 &]])], LeafCount]]}], {n, 11, 16}], {"trees", "lines", "triples needing modulus", "orchard planting polynomial"}]], n] 
Here is the major step for the solution of 14 trees. The factor showing up in the numerator of the determinant generated by (3,5,6) happens to be the denominator of point 7 = (3 + 5 + 6)/2.
With[{mat = Append[zerosumgeometrysymmetric[[#, 1]], 1] & /@ {3, 5, 6}}, TraditionalForm[ Row[{Det[MatrixForm[mat]], " = ", Factor[Det[mat]] == 0, "\n compare to ", Expand[Denominator[zerosumgeometrysymmetric[[7, 1, 1]] ]]}]]] 
But I should have expected this. The solution for 18 points is next. The point 9 is at infinity! Therefore, level 9 needs 1/0 to work properly.
zerosumGraphic[zerotripsymm[orchardsolutions[[18, 4]], 8], 18, 2 {260, 210}] 
Here's a contour plot of all the orchard-planting polynomials up to order 28. The numbers give the location of a particularly elegant solution for that number of points.
allorchardpolynomials = Table[orchardsolutions[[ff, 5]] == 0, {ff, 11, 28}]; Graphics[{ContourPlot[ Evaluate[allorchardpolynomials], {e, -3/2, 2}, {f, -3/2, 2}, PlotPoints -> 100][[1]], Red, Table[Style[Text[n, Take[orchardsolutions[[n, 4]], 2]], 20], {n, 11, 28}]}]
Recall from the construction that e and f should not be 0 or 1, since that would cause all subsequent points to overlap on the first five points, causing degeneracy. The curves intersect at these values.
We can also plot the locations where the (e, f) values lead to lines of two points having the same slope. Forcing parallelism leads to hundreds of extra curves. Do you see the lower-right corner where the green curve passes through many black curves? That's the location of the sporadic 16-point solution. It's right there!
slope[{{x1_, y1_}, {x2_, y2_}}] := (y2  y1)/(x2  x1); theslopes = {#  1, FullSimplify[ slope[Prepend[ First /@ Take[zerosumgeometrysymmetric, 11], {0, 0}][[#]]]]} & /@ Subsets[Range[ 10], {2}]; sameslope = {#[[2, 1]], #[[1]]} & /@ (Transpose /@ SplitBy[SortBy[{#[[1]], #[[2, 1]] == Simplify[#[[2, 2]]]} & /@ ({#[[1]], Flatten[#[[2]]]} & /@ SortBy[ Flatten[Transpose[{Table[#[[ 1]], {Length[#[[2]]]}], (List @@@ # & /@ #[[ 2]])}] & /@ Select[{#[[1]], Solve[{#[[2, 1]] == #[[2, 2]], d != (b c)/a , e != 0, e != 1, f != 0, f != 1}]} & /@ Take[SortBy[(Transpose /@ Select[Subsets[theslopes, {2}], Length[Union[Flatten[First /@ #]]] == 4 &]), Total[Flatten[#[[1]]]] &], 150], Length[StringPosition[ToString[FullForm[#[[2]]]], "Complex"]] == 0 && Length[#[[2]]] > 0 &], 1], Last]), Last], Last]); Graphics[{Table[ ContourPlot[ Evaluate[sameslope[[n, 1]]], {e, 3/2, 2}, {f, 3/2, 2}, PlotPoints > 50, ContourStyle > Black][[1]], {n, 1, 162}], Red, Table[ContourPlot[ Evaluate[allorchardpolynomials[[n]]], {e, 3/2, 2}, {f, 3/2, 2}, PlotPoints > 50, ContourStyle > Green][[1]], {n, 1, 18}], Tooltip[Point[#], #] & /@ Tuples[Range[6, 6]/4, {2}] }] 
That's my way to find sporadic solutions. The mirror and skew plots have added levels of messiness sufficient to defy my current ability to analyze them.
Is there an easy way to generate these polynomials? I have no idea. Here are plots of their coefficient arrays.
Column[{Text@ Grid[{Range[11, 22], With[{array = CoefficientList[#, {e, f}]}, With[{rule = Thread[Apply[Range, MinMax[Flatten[array]]] > Join[Reverse[ Table[ RGBColor[1, 1  z/Abs[Min[Flatten[array]]], 1  z/Abs[Min[Flatten[array]]]], {z, 1, Abs[Min[Flatten[array]]]}]], {RGBColor[1, 1, 1]}, Table[ RGBColor[1  z/Abs[Max[Flatten[array]]], 1, 1], {z, 1, Abs[Max[Flatten[array]]]}]]]}, ArrayPlot[array, ColorRules > rule, ImageSize > Reverse[Dimensions[array]] {7, 7}, Frame > False ]]] & /@ (#[[5]] & /@ Take[orchardsolutions, {11, 22}])}, Frame > All], Text@Grid[{Range[23, 28], With[{array = CoefficientList[#, {e, f}]}, With[{rule = Thread[Apply[Range, MinMax[Flatten[array]]] > Join[Reverse[ Table[ RGBColor[1, 1  z/Abs[Min[Flatten[array]]], 1  z/Abs[Min[Flatten[array]]]], {z, 1, Abs[Min[Flatten[array]]]}]], {RGBColor[1, 1, 1]}, Table[ RGBColor[1  z/Abs[Max[Flatten[array]]], 1, 1], {z, 1, Abs[Max[Flatten[array]]]}]]]}, ArrayPlot[array, ColorRules > rule, ImageSize > Reverse[Dimensions[array]] {7, 7}, Frame > False ]]] & /@ (#[[5]] & /@ Take[orchardsolutions, {23, 28}])}, Frame > All]}, Alignment > Center] 
Grid[Partition[Table[Quiet@ zerosumGraphic[ If[orchardsolutions[[n, 2]] > orchardsolutions[[n, 3]], orchardsolutions[[n, 6]], Quiet@zerotripsymm[orchardsolutions[[n, 4]], Floor[(n - 1)/2]]], n, {260, 210}], {n, 9, 28}], 2]]
Looking for unsolved problems of the orchardplanting variety? Here are several I suggest:
And if you'd like to explore more recreational mathematics, check out some of the many entries on the Wolfram Demonstrations Project.
The Wolfram Language is essential to many Bridges attendees’ work. It’s used to explore ideas, puzzle out technical details, design prototypes and produce output that controls production machines. It’s applied to sculpture, graphics, origami, painting, weaving, quilting—even baking.
In the many years I’ve attended the Bridges conferences, I’ve enjoyed hearing about these diverse applications of the Wolfram Language in the arts. Here is a selection of Bridges artists’ work.
George Hart is well known for his insanely tangled sculptures based on polyhedral symmetries. Two of his recent works, SNOBall and Clouds, were puzzled out with the help of the Wolfram Language:
This video includes a Wolfram Language animation that shows how the elements of the Clouds sculpture were transformed to yield the vertically compressed structure.
One of Hart’s earliest Wolfram Language designs was for the Millennium Bookball, a 1998 commission for the Northport Public Library. Sixty wooden books are arranged in icosahedral symmetry, joined by cast bronze rings. Here is the Wolfram Language design for the bookball and a photo of the finished sculpture:
One of my favorite Hart projects was the basis of a paper with Robert Hanson at the 2013 Bridges conference: “Custom 3D-Printed Rollers for Frieze Pattern Cookies.” With a paragraph of Wolfram Language code, George translates images to 3D-printed rollers that emboss the images on, for example, cookie dough:
It’s a brilliant application of the Wolfram Language. I’ve used it myself to make cookie-roller presents and rollers for patterning ceramics. You can download a notebook of Hart’s code. Since Hart wrote this code, we’ve added support for 3D printing to the Wolfram Language. You can now send roller designs directly to a printing service or a local 3D printer using Printout3D.
Christopher Hanusa has made a business of selling 3D-printed objects created exclusively with the Wolfram Language. His designs take inspiration from mathematical concepts—unsurprising given his position as an associate professor of mathematics at Queens College, City University of New York.
Hanusa’s designs include earrings constructed with mesh and region operations:
… a pendant designed with transformed graphics primitives:
… ornaments designed with ParametricPlot3D:
… and a tea light made with ParametricPlot3D, using the RegionFunction option to punch an interesting pattern of perforations into the cylinder:
Hanusa has written about how he creates his designs with the Wolfram Language on his blog, The Mathematical Zorro. You can see all of Hanusa’s creations in his Shapeways shop.
William F. Duffy, an accomplished traditional sculptor, also explores forms derived from parametric equations, cast from large-scale resin 3D prints. Many of his forms result from Wolfram Language explorations.
Here, for example, are some of Duffy’s explorations of a fifth-degree polynomial that describes a Calabi–Yau space, important in string theory:
Duffy plotted one instance of that function in Mathematica, 3D-printed it in resin and made a mold from the print in which the bronze sculpture was cast. On the left is a gypsum cement test cast, and on the right the finished bronze sculpture, patinated with potassium sulfide:
On commission from the Simons Center for Geometry and Physics, Duffy created the object on the left as a bronze-infused stainless steel 3D print. The object on the right was created from the same source file, but printed in nylon:
Duffy continues to explore functions on the complex plane as sources for sculptural structures:
You will be able to see more of Duffy’s work, both traditional and mathematical, on his forthcoming website.
Robert Fathauer uses the Wolfram Language to explore diverse phenomena, including fractal structures with negative curvature that are reminiscent of natural forms. This print of such a form was exhibited in the Bridges 2013 art gallery:
Fathauer realizes the ideas he explores in meticulously handcrafted ceramic forms reminiscent of corals and sponges:
One of Fathauer’s Mathematica-designed ceramic works consisted of 511 cubic elements (!). Here are shots of the Wolfram Language model and its realization, before firing, as a ceramic sculpture:
Unfortunately, in what Fathauer has confirmed was a painful experience, the sculpture exploded in the kiln during firing. But this structure, as well as several other fractal structures designed with the Wolfram Language, is available in Fathauer’s Shapeways shop.
Martin Levin makes consummately crafted models that reveal the structure of our world—the distance, angular and topological relationships that govern the possibilities and impossibilities of 3D space:
What you don’t—or barely—see is where the Wolfram Language has had the biggest impact in his work. The tiny connectors that join the tubular parts are 3D printed from models designed with the Wolfram Language:
Levin is currently designing 3D-printed modules that can be assembled to make a lost-plastic bronze casting of a compound of five tetrahedra:
The finished casting should look something like this (but mirror-reversed):
Henry Segerman explored some of the topics in his engaging book Visualizing Mathematics with 3D Printing with Wolfram Language code. While the forms in the book are explicitly mathematical, many have an undeniable aesthetic appeal. Here are snapshots from his initial explorations of surfaces with interesting topologies…
… which led to these 3D-printed forms in his Shapeways shop:
His beautiful Archimedean Spire…
… was similarly modeled first with Wolfram Language code:
In addition to mathematical models, Segerman collaborates with Robert Fathauer (above) to produce exotic dice, whose geometry begins as Wolfram Language code—much of it originating from the Wolfram MathWorld entry “Isohedron”:
In addition to constructing immersive virtual reality hyperbolic spaces, Elisabetta Matsumoto turns high-powered mathematics into elegant jewelry using the Wolfram Language. This piece, which requires a full screen of mathematical code to describe, riffs on one of the earliest discovered minimal surfaces, Scherk’s second surface:
Continuing the theme of hyperbolic spaces, here’s one of Matsumoto’s Wolfram Language designs, this one in 2D rather than 3D:
You can see Matsumoto’s jewelry designs in her Shapeways shop.
Father and son Koos and Tom Verhoeff have long used the Wolfram Language to explore sculptural forms and understand the intricacies of miter joint geometries and torsion constraints that enable Koos to realize his sculptures. Their work is varied, from tangles to trees to lattices in wood, sheet metal and cast bronze. Here is a representative sample of their work together with the underlying Wolfram Language models, all topics of Bridges conference papers:
Three Families of Mitered Borromean Ring Sculptures
Mitered Fractal Trees: Constructions and Properties
Folded Strips of Rhombuses, and a Plea for the Square Root of 2 : 1 Rhombus
Tom Verhoeff’s YouTube channel has a number of Wolfram Language videos, including one showing how the last of the structures above is developed from a strip of rhombuses.
In 2015, three Verhoeff sculptures were installed in the courtyard of the Mathematikon of Heidelberg University. Each distills one or more mathematical concepts in sculptural form. All were designed with the Wolfram Language:
You can find detailed information about the mathematical concepts in the Mathematikon sculptures in the Bridges 2016 paper “Three Mathematical Sculptures for the Mathematikon.”
Edmund Harriss has published two bestselling thinking person’s coloring books, Patterns of the Universe and Visions of the Universe, in collaboration with Alex Bellos. They’re filled with gorgeous mathematical figures that feed the mind as well as the creative impulse. Edmund created his figures with Mathematica, a tribute to the diversity of phenomena that can be productively explored with the Wolfram Language:
Loe Feijs and Marina Toeters are applying new technology to traditional weaving patterns: puppytooth and houndstooth, or pied-de-poule. With Wolfram Language code, they’ve implemented cellular automata whose patterns tend toward and preserve houndstooth patterns:
By adding random elements to the automata, they generate woven fabric with semi-random patterns that allude to houndstooth:
This video describes their houndstooth work. You can read the details in their Bridges 2017 paper, “A Cellular Automaton for Pied-de-poule (Houndstooth).”
You can hardly find a more direct translation from mathematical function to artistic expression than Caroline Bowen’s layered Plexiglas works. And yet her craftsmanship and aesthetic choices yield compelling works that transcend mere mathematical models.
The two pieces she exhibited in the 2016 Bridges gallery were inspired by examples in the SliceContourPlot3D documentation (!). All of the pieces pictured here were created using contour-plotting functions in Mathematica:
In 2017, Bowen exhibited a similarly layered piece with colors that indicate the real and imaginary parts of the complex-valued function ArcCsch[z^4] + Sec[z^2], as well as the function’s poles and branch cuts:
Paper sculptor Jeannine Mosely designs some of her origami crease patterns with the Wolfram Language. In some cases, as with these tessellations whose crease patterns require the numerical solution of integrals, the Wolfram Language is essential:
Mosely created these “bud” variations with a parametric design encapsulated as a Wolfram Language function:
If you’d like to try folding your own bud, Mosely has provided a template and instructions.
The design and fabrication of Helaman Ferguson’s giant Umbilic Torus SC sculpture was the topic of a Bridges 2012 paper authored with his wife Claire, “Celebrating Mathematics in Stone and Bronze: Umbilic Torus NC vs. SC.”
The paper details the fabrication of the sculpture (below left), an epic project that required building a gantry robot and carving 144 one-ton blocks of sandstone. The surface of the sculpture is textured with a Hilbert curve, a single line that traverses the entire surface, shown here in a photo of an earlier, smaller version of the sculpture (right):
The Hilbert curve is not just surface decoration—it’s also the mark left by the ball-head cutting tool that carved the curved surfaces of the casting molds. The ridges in the surface texture are the peaks left between adjacent sweeps of the cutting tool.
Ferguson used Mathematica to model the Hilbert curve tool path and to generate the G-code that controlled the CNC milling machine that carved the molds:
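Ferguson’s own tool-path code isn’t reproduced here, but the heart of such a program—generating the Hilbert curve itself—is compact. Here is a minimal Python sketch (an illustration, not Ferguson’s Mathematica code) of the standard bit-manipulation mapping from a 1D index to the 2D cell it visits; a real tool path would scale these cells to machine coordinates and wrap them onto the sculpture’s surface:

```python
def hilbert_point(order, d):
    """Map index d along a Hilbert curve of the given order to the
    (x, y) cell it visits on a 2**order x 2**order grid."""
    x = y = 0
    t = d
    s = 1
    while s < 2 ** order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/reflect the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx                      # shift into the right quadrant
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# The order-2 curve visits all 16 cells of a 4x4 grid exactly once,
# and consecutive cells are always grid neighbors.
tool_path = [hilbert_point(2, d) for d in range(16)]
```

Connecting consecutive cell centers gives the continuous single line that the cutting tool follows.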
I too participate in the Bridges conferences, and I use the Wolfram Language nearly every day to explore graphical and sculptural ideas. One of the more satisfying projects I undertook was the basis of a paper I presented at the 2015 Bridges conference, “Algorithmic Quilting,” written in collaboration with Theodore Gray and Nina Paley.
The paper describes an algorithmic method we used to generate a wide variety of single-line fills for quilts. Starting with a distribution of points, we make a graph on the points, extract a spanning tree from it and render a fill by tracing around the tree:
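The pipeline—points, then spanning tree, then a single-line trace—can be sketched in a few dozen lines. This is an illustrative Python version, not the paper’s code: it uses a plain O(n²) Prim’s algorithm for the tree and a depth-first walk that travels each tree edge out and back, which is exactly what makes the fill one closed line:

```python
import math

def mst_edges(points):
    """Prim's algorithm: minimum spanning tree of a 2D point set
    under Euclidean distance. Returns a list of (i, j) index pairs."""
    n = len(points)
    in_tree = [False] * n
    in_tree[0] = True
    edges = []
    for _ in range(n - 1):
        best = None
        for i in range(n):
            if not in_tree[i]:
                continue
            for j in range(n):
                if in_tree[j]:
                    continue
                d = math.hypot(points[i][0] - points[j][0],
                               points[i][1] - points[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        in_tree[j] = True
        edges.append((i, j))
    return edges

def trace_around(edges, root=0):
    """Closed single-line path that 'traces around' the tree:
    a depth-first walk returning along each edge (an Euler tour)."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    path = [root]
    def walk(u, parent):
        for v in adj.get(u, []):
            if v != parent:
                path.append(v)
                walk(v, u)
                path.append(u)     # come back along the same edge
    walk(root, None)
    return path
```

A rendered fill would offset the out-and-back passes slightly so the stitching line surrounds the tree rather than retracing it.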
We tested the algorithm by generating a variety of backgrounds for a quilt based on frames of Eadweard Muybridge’s horse motion studies:
Here’s an animation of the frames in the quilt:
Here are four of us, shown as dots, participating in the 2017 Illinois Marathon:
How did the above animation and the in-depth look at our performance come about? Read on to find out.
Why do I run? Of course, the expected answer is health. But when I go out for a run, I am really not concerned about my longevity. And quite frankly, given the number of times I have almost been hit by a car, running doesn’t seem to be in my best interest. For me, it is simply a good way to maintain some level of sanity. Also, it is location-independent. When I travel, I pack an extra pair of running shoes, and I am set. Running is a great way to scope out a new location. Additionally, runners are a very friendly bunch of people. We greet, we chat, we hate on the weather together. And lastly, have you ever been to a race? If so, then you know that the spectator race signs are hilarious, often politically incorrect and R-rated.
I started running longer distances in 2014. Since then, I have completed eight marathons, one of which was the 2015 Bank of America Chicago Marathon. After completing that race, I wrote a blog post analyzing the runner dataset and looking at various aspects of the race.
Since then, we have shifted focus to the Illinois Marathon here in Champaign. While Wolfram Research is an international company, it also makes sense for us to engage in our local community.
The Illinois Marathon does a great job tying together our twin cities of Champaign and Urbana. Just have a look at the map: starting in close proximity to the State Farm Center, the runners navigate across the UIUC campus, through both downtown areas, various residential neighborhoods and major parks for a spectacular finish on Zuppke Field inside Memorial Stadium.
Since its inception in 2009, the event has doubled the number of runners and races offered, as well as sponsors and partners involved. By attracting a large number of people traveling to Champaign and Urbana for this event, it has quite an economic impact on our community. This is also expressed in the amount of charitable giving raised every year.
As you can imagine, here at Wolfram we were very interested in partnering with the marathon on some kind of data crunching. Over the summer of 2017, we received the full registration dataset to work with. We applied the 10-step process described by Stephen Wolfram in this blog post.
We first import a simple spreadsheet.
raw = Import[
   "/Users/eilas/Desktop/Work/Marathon/ILMarathon2017/Marathon_Results_Modified.csv",
   "Numeric" -> False];
The raw table descriptions look as follows:
header = raw[[1]] 
But it’s more convenient to represent the raw data as key -> value pairs:
fullmarathon = AssociationThread[header -> #] & /@ Rest[raw];
fullmarathon[[1]]
Wherever possible, these data points should be aligned with entities in the Wolfram Language. This not only allows for a consistent representation, but also gives access to all of the data in the Wolfram Knowledgebase for those items if desired later.
Interpreter is a very powerful tool for such purposes. It allows you to parse any arbitrary string as a particular entity type, and is often the first step in trying to align data. As an example, let’s align the given location information.
allLocations2017 = Union[{"CITY", "STATE", "COUNTRY"} /. fullmarathon]; 
Here is a random example.
locationExample = RandomChoice[allLocations2017] 
Interpreter["City"][StringJoin[StringRiffle[locationExample]]] 
In most cases, this works without a hitch. But some location information may not be what the system expects. Participants may have specified suburbs, neighborhoods, unincorporated areas or simply made a typo. This can make an automatic interpretation impossible. Thus, we need to be prepared for other contingencies. From the same dataset, let’s look at this case:
problemExample = {"O Fallon", "IL", "United States"}; 
Interpreter["City"][StringJoin[StringRiffle[problemExample]]] 
We can fall back to a contingency in such a case by making use of the provided postal code 62269.
With[{loc = Interpreter["Location"]["62269"]}, GeoNearest["City", loc]][[1]] 
As you can see, we do know of the city, but the initial interpretation failed due to a missing apostrophe. In comparison, this would have worked just fine:
Interpreter["City"][ StringJoin[StringRiffle[{"O'Fallon", "IL", "United States"}]]] 
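The pattern behind this contingency handling is general: try the most specific interpretation first, then fall back to coarser ones. Here is a hypothetical Python sketch of that pattern—the parser functions and lookup tables are illustrative stand-ins, not the Wolfram Language Interpreter API:

```python
def interpret_with_fallback(record, parsers):
    """Return the first successful interpretation of a record,
    trying each parser in order and treating None as failure."""
    for parse in parsers:
        result = parse(record)
        if result is not None:
            return result
    return None

# Hypothetical stand-ins for "parse as a city" and "nearest city to a ZIP":
known_cities = {"O'Fallon, IL": "O'Fallon"}
zip_to_city = {"62269": "O'Fallon"}

parsers = [
    lambda r: known_cities.get(r.get("city_state")),   # strict city lookup
    lambda r: zip_to_city.get(r.get("zip")),           # postal-code fallback
]

record = {"city_state": "O Fallon, IL", "zip": "62269"}  # missing apostrophe
resolved = interpret_with_fallback(record, parsers)
```

The strict lookup fails on the typo, so the postal-code fallback resolves the record, mirroring the GeoNearest contingency above.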
The major piece of information that runners are interested in is their split times. The Illinois Marathon records the clock and net times at six split distances: the start, 10 kilometers, 10 miles, 13.1 miles (the half-marathon distance), 20 miles and 26.2 miles (the full marathon distance).
random20MTime = RandomChoice["20 MILE NET TIME" /. fullmarathon] 
These are given as a list of three colon-separated numbers, which we want to represent as Wolfram Language Quantity objects.
Quantity[MixedMagnitude[ FromDigits /@ StringSplit[random20MTime, ":"]], MixedUnit[{"Hours", "Minutes", "Seconds"}]] 
As with the Interpreter mentioned before, we also have to be careful in interpreting the recorded times. For the half-marathon split and longer distances, even the fastest runner needs at least an hour, so we know “xx:yy:zz” always refers to “hours:minutes:seconds”. But for the shorter distances of 10 kilometers and 10 miles, it might be “minutes:seconds:milliseconds”.
random10KTime = RandomChoice["10K NET TIME" /. fullmarathon] 
This is then incorrect.
Quantity[MixedMagnitude[ FromDigits /@ StringSplit[random10KTime, ":"]], MixedUnit[{"Hours", "Minutes", "Seconds"}]] 
No runner took more than two days to finish a 10-kilometer distance. Logic must be put in to verify the values before returning the final Quantity objects. This is the correct interpretation:
Quantity[MixedMagnitude[ FromDigits /@ StringSplit[random10KTime, ":"]], MixedUnit[{"Minutes", "Seconds", "Milliseconds"}]] 
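The disambiguation boils down to a small rule: if the split distance guarantees at least an hour of running, read the three fields as hours:minutes:seconds; otherwise read them as minutes:seconds:milliseconds. Here is a Python sketch of that rule (an illustration of the logic, not the code we actually used):

```python
def parse_split(text, at_least_an_hour):
    """Disambiguate an 'a:b:c' race split and return elapsed seconds.
    For half-marathon and longer splits every runner needs at least an
    hour, so the fields are hours:minutes:seconds; for shorter splits
    they are minutes:seconds:milliseconds."""
    a, b, c = (int(part) for part in text.split(":"))
    if at_least_an_hour:
        return a * 3600 + b * 60 + c      # h:m:s  -> seconds
    return a * 60 + b + c / 1000          # m:s:ms -> seconds

# A 20-mile split of "4:32:51" is unambiguously hours:minutes:seconds;
# a 10K split of "49:30:200" is minutes:seconds:milliseconds.
twenty_mile = parse_split("4:32:51", at_least_an_hour=True)
ten_k = parse_split("49:30:200", at_least_an_hour=False)
```

A production version would also sanity-check the result (no 10K should exceed a few hours) before committing to an interpretation.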
Once the data has been cleaned up, it’s just a matter of creating an Association of key -> value pairs. An example piece of data for one runner shows the structure:
We did not just arrange the dataset by runner, but by division as well. The divisions recognized by most marathons are as follows:
{"Female19AndUnder", "Female20To24", "Female25To29", "Female30To34",
 "Female35To39", "Female40To44", "Female45To49", "Female50To54",
 "Female55To59", "Female60To64", "Female65To69", "Female70To74",
 "Male19AndUnder", "Male20To24", "Male25To29", "Male30To34",
 "Male35To39", "Male40To44", "Male45To49", "Male50To54", "Male55To59",
 "Male60To64", "Male65To69", "Male70To74", "Male75To79",
 "Male80AndOver", "FemaleOverall", "FemaleMaster", "MaleOverall",
 "MaleMaster"}
For each of these divisions, we included information about the minimum, maximum and mean running times. Since this marathon is held on a flat course and is thus fast-paced, we also added each division’s Boston Marathon–qualifying standard, and information about the runners’ qualifications.
With the data cleaned up and processed, it’s now simple to construct an EntityStore so that the data can be used in the EntityValue framework in the Wolfram Language. It’s mainly just a matter of attaching metadata to the properties so that they have displayfriendly labels.
EntityStore[{
  "ChristieClinicMarathon2017" -> <|
    "Label" -> "Christie Clinic Marathon 2017 participant",
    "LabelPlural" -> "Christie Clinic Marathon 2017 participants",
    "Entities" -> processed,
    "Properties" -> <|
      "BibNumber" -> <|"Label" -> "bib number"|>,
      "Event" -> <|"Label" -> "event"|>,
      "LastName" -> <|"Label" -> "last name"|>,
      "FirstName" -> <|"Label" -> "first name"|>,
      "Name" -> <|"Label" -> "name"|>,
      "Label" -> <|"Label" -> "label"|>,
      "City" -> <|"Label" -> "city"|>,
      "State" -> <|"Label" -> "state"|>,
      "Country" -> <|"Label" -> "country"|>,
      "ZIP" -> <|"Label" -> "ZIP"|>,
      "ChristieClinic2017Division" -> <|"Label" -> "Christie Clinic 2017 division"|>,
      "Gender" -> <|"Label" -> "gender"|>,
      "PlaceDivision" -> <|"Label" -> "place division"|>,
      "PlaceGender" -> <|"Label" -> "place gender"|>,
      "PlaceOverall" -> <|"Label" -> "place overall"|>,
      "Splits" -> <|"Label" -> "splits"|>|>|>,
  "ChristieClinic2017Division" -> <|
    "Label" -> "Christie Clinic 2017 division",
    "LabelPlural" -> "Christie Clinic 2017 divisions",
    "Entities" -> divTypeEntities,
    "Properties" -> <|
      "Label" -> <|"Label" -> "label"|>,
      "Mean" -> <|"Label" -> "mean net time"|>,
      "Min" -> <|"Label" -> "min net time"|>,
      "Max" -> <|"Label" -> "max net time"|>,
      "BQStandard" -> <|"Label" -> "Boston qualifying standard"|>,
      "BeatBQ" -> <|"Label" -> "beat Boston qualifying standard"|>,
      "NumberBeat" -> <|"Label" -> "count beat Boston qualifying standard"|>,
      "RangeBQ" -> <|"Label" -> "within range Boston qualifying standard"|>,
      "NumberRange" -> <|"Label" -> "count within range Boston qualifying standard"|>,
      "OutsideBQ" -> <|"Label" -> "outside range Boston qualifying standard"|>,
      "NumberOutside" -> <|"Label" -> "count outside range Boston qualifying standard"|>|>|>}]
Beyond populating the entity store, the split times also give us an estimate of a runner’s position along the course as the race progresses, and thus the distribution of all runners throughout the race course. We took this information and plotted the runner density for each minute of an eight-hour race, and combined the frames into a movie.
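The position estimate itself is simple: with net times known at a handful of split distances, a runner’s distance at any moment can be interpolated piecewise-linearly. A Python sketch of the idea (illustrative, not our production code):

```python
from bisect import bisect_right

def distance_at(split_times, split_distances, t):
    """Piecewise-linear estimate of the distance (in miles) a runner has
    covered at elapsed time t (in seconds), given net times recorded at
    known split distances. Clamps outside the recorded range."""
    if t <= split_times[0]:
        return split_distances[0]
    if t >= split_times[-1]:
        return split_distances[-1]
    i = bisect_right(split_times, t) - 1
    frac = (t - split_times[i]) / (split_times[i + 1] - split_times[i])
    return split_distances[i] + frac * (split_distances[i + 1] - split_distances[i])

# Toy example: splits at the start, 10 miles and 20 miles.
times = [0, 5400, 11400]          # seconds at each split
miles = [0.0, 10.0, 20.0]
```

Mapping the interpolated mile marks onto course coordinates for every runner, minute by minute, yields the density frames of the movie.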
It would be interesting to see how a single runner compares to the entire field. Obviously we don’t want to make a movie for each of 1,000+ runners, let alone 500,000 movies for all possible pairs of runners. Instead, we utilized the fact that each runner follows a two-dimensional path in the viewing plane perpendicular to the line going from the viewpoint to the center of the map. We calculated these 2D runner paths and superimposed them on the original movie frames. Since the frames are all Graphics3D expressions in the Wolfram Language before export, this worked like a charm. We created the one movie to run them all.
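The projection step can be sketched as follows: build an orthonormal basis of the viewing plane from the viewpoint-to-center direction and an assumed world “up”, then take dot products. This Python sketch illustrates the geometry only—the actual work was done directly on Graphics3D expressions in the Wolfram Language:

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def project_to_view_plane(point, viewpoint, center, world_up=(0.0, 0.0, 1.0)):
    """Orthographic projection of a 3D point onto the plane through
    `center` perpendicular to the viewpoint-to-center line."""
    forward = normalize(tuple(c - v for v, c in zip(viewpoint, center)))
    right = normalize(cross(forward, world_up))   # horizontal plane axis
    up = cross(right, forward)                    # vertical plane axis
    d = tuple(p - c for p, c in zip(point, center))
    return (dot(d, right), dot(d, up))
```

Applying this to every sample of a runner’s 3D course path yields the 2D path that gets superimposed on the frames.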
Now we need to make the data available to the general public in an easily accessible way. An obvious choice is the Wolfram Cloud. The entity store, the runner position data and the density movie are easily stored in our cloud. And with some magic from my terrific coworkers, we were able to combine it all into this amazing microsite.
By default, the movie is shown. Upon a user submitting a specific bib number, the movie is overlaid with individual runner information. Additionally, we are accessing all information stored about this specific runner and their division.
More information about the development of Wolfram microsites can be found here.
Besides the microsite, there are many interesting computations that can be performed that surround the concept of a marathon. I have explored a few of these below.
To give you an idea of the size of the event, let’s look at a few random numbers associated with the marathon weekend. Luckily, Wolfram|Alpha has something to say about all of these.
One thousand two hundred seventeen runners finished the full marathon in 2017. This equals a total of 31,885.4 miles, which is comparable to 2.4 times the length of the Great Wall of China, or the length of 490,000 soccer fields.
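Those totals are easy to sanity-check. In this quick Python check, the 105-meter pitch length is my own assumption for “the length of a soccer field”:

```python
runners = 1217
marathon_miles = 26.2
total_miles = runners * marathon_miles     # combined distance run: 31,885.4 mi

mile_in_km = 1.609344
pitch_length_km = 0.105                    # assumed 105 m soccer pitch
fields = total_miles * mile_in_km / pitch_length_km   # roughly 490,000 fields
```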
WolframAlpha["31885.4 miles", {{"ComparisonAsLength", 1}, "Content"}] 
WolframAlpha["how many soccer fields stretch 31885.4 miles",
 {{"ApproximateResult", 1}, "Content"}]
The marathon would literally not have ever happened had it not been for Walter Hunt inventing the safety pin back in 1849. About 80,000 of them were used during the weekend to keep bib numbers in place.
WolframAlpha["safety pin", {{"BasicInformation:InventionData", 1}, "Content"}] 
The runners ate 1,600 oranges and 15,000 bananas, and drank 9,600 gallons of water and 1,920 gallons of Gatorade along the race course. Wolfram|Alpha will tell you that 1,600 oranges are enough to fill two bathtubs:
WolframAlpha["How many oranges fit in a bathtub?",
 {{"ApproximateResult", 1}, "Content"}]
… and contain an astounding 20 kilograms of sugar:
WolframAlpha["sugar in 1,600 oranges", {{"Result", 1}, "Content"}] 
And trust me: 20 miles into the race, while questioning all your life choices, a sweet orange slice will fix any problem. But let’s get to the finish line: here the runners finished another 800 pounds of pasta, 1,100 pizzas and another 32,600 bottles of water. The pasta and pizza provided a combined 1.8×10^6 dietary calories:
WolframAlpha["calories in 800 lbs of pasta and 1100 pizzas",
 {{"Result", 1}, "Content"}]
But we are not done yet. The theme of the 2017 Illinois Marathon was the 150th birthday of the University of Illinois. Ever tried to pronounce “sesquicentennial”? Going above and beyond, the race administration decided to provide the runners with 70 birthday sheet cakes—each 18×24 inches. Thanks to the folks working at the Meijer bakery, we came to find out that each such cake contains 21,340 calories, totaling close to 1.5 million calories!
Table[WolframAlpha[ "70*21340 food calories", {{"Comparison", j}, "Content"}], {j, 2}] // Column 
Remember the 15,000 bananas I mentioned just a few moments ago? Turns out that their calorie count is comparable to that of the sheet cakes. That might make for a difficult discussion with a child whether “to sheet cake” or “to banana.”
WolframAlpha["calories in 15,000 bananas", {{"Result", 1}, "Content"}] 
What can one do with all those calories? You did just participate in a race, and should be able to splurge a bit on food. Consider a male runner weighing 159 pounds running the marathon distance at a nine-minute-per-mile pace. He burns roughly 3,300 calories.
WolframAlpha["Calories burned running at pace 9 min/mi for 26.2 miles",
 IncludePods -> "MetabolicProperties", AppearanceElements -> {"Pods"}]
Though not recommended, you could have 32 of the guilt-free beers typically offered after a marathon race, or 17 servings of 2×2-inch sheet cake pieces.
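Here is the arithmetic behind those two numbers as a quick Python check. The roughly 104 Calories per beer is my assumption (about right for a light lager); the 108 servings per cake come from cutting an 18×24-inch sheet into 2×2-inch pieces:

```python
burned = 3339                                  # Calories burned in the race
cake_calories = 21340                          # Calories per 18x24-inch sheet cake
servings_per_cake = (18 * 24) // (2 * 2)       # 108 pieces of 2x2 inches
calories_per_piece = cake_calories / servings_per_cake

cake_pieces = burned / calories_per_piece      # about 17 servings
beers = burned / 104                           # about 32 beers (assumed 104 Cal each)
```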
N[Quantity[3339, "LargeCalories"]/(Quantity[21340, "LargeCalories"]/108)]
Did I mention weather? Weather in Champaign is an unwelcome participant: one who does not pay a race fee, is constantly in everyone’s way, makes up its mind last-minute, does what it wants and unleashes full force. Though 2017 turned out fine, let’s look at WeatherData for the 2016 and 2015 race weekends.
Last year, the rain set in with the start of the race, lasted through the entire event and left town when the race was over. I was drenched before even crossing the starting line.
Table[WolframAlpha["Weather Champaign 4/30/2016",
   {{"WeatherCharts:WeatherData", k}, "Content"}], {k, 2, 3}] // Column
But that wasn’t the worst we had seen: in 2015, a thunderstorm descended on the town while the race was ongoing. Thus, the Illinois Marathon is one of the few marathons that actually had to be canceled mid-race.
As I mentioned at the very beginning, the runners here at Wolfram Research are a tough crowd, and weather won’t deter us. If you feel inspired and would like to see yourself in a future version of the Marathon Viewer, this is the place to start: Illinois Marathon registration.