# The Evolution of Parallel Computing with *Mathematica*

In the eighties I attended a scientific presentation about a rather cumbersome way to parallelize one of the symbolic computation systems of the time, and I quickly realized how much more elegantly I could bring parallelism to *Mathematica*, thanks to its symbolic communication protocol, *MathLink*. This protocol allowed me to exchange not only data but also programs between concurrently running *Mathematica* kernels.

The result was a package, written entirely in *Mathematica*, called *Parallel Computing Toolkit*. At a time when parallel computing meant big expensive machines, FORTRAN, and batch jobs, it was quite satisfying to experiment with different parallel paradigms from an interactive *Mathematica* notebook, with a couple of machines on a local network doing the computations, and to do parallel functional programming and work with symbolic expressions and arbitrary-precision arithmetic in parallel. I got a lot of surprised reactions from people who thought that parallelization was this big complicated thing, requiring supercomputers, large funds, and rather large problems to be worthwhile. The truth is, most problems people solve are easy to parallelize.

In the meantime the landscape of parallel computers has stabilized and evolved into three architectures: multicore machines, managed clusters, and ad hoc networks of PCs. *Mathematica* works the same on all these, but the way one finds resources and launches processes is rather different; *Mathematica* is great for interfacing with existing environments, and with some additional Java code it is now straightforward to use in all three architectures.

In reaction to the widespread availability of multicore machines Wolfram Research decided to include the features of my *Parallel Computing Toolkit* in every copy of *Mathematica*. At this time (for Version 7) we also overhauled the design of the parallel commands.

As a result, *Mathematica* is now aware of the number of processor cores of the computer it runs on, and uses them automatically when needed.
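You can query the detected core count directly; a minimal check (assuming Version 7 or later):

```mathematica
(* number of processor cores Mathematica detects on this machine *)
$ProcessorCount
```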

As soon as you use one of the parallel commands—to evaluate the elements of a table in parallel, say—*Mathematica* launches one extra kernel on each core and distributes the work. To convince ourselves that the computation did in fact happen on these extra kernels, we can just ask for a calculation that tells us where each element was computed.
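One way to see this, as a minimal sketch: each parallel kernel knows its own ID, so tabulating `$KernelID` in parallel reveals which kernel computed which element.

```mathematica
(* evaluate each element on whichever parallel kernel picks it up; *)
(* $KernelID reports the ID of the kernel doing the evaluation *)
ParallelTable[$KernelID, {8}]
```

On a dual-core machine you would typically see the eight elements split between two kernel IDs.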

Each running kernel has a unique ID value that can help in scheduling for more complicated distributed algorithms.
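To list the IDs of all running parallel kernels, you can evaluate `$KernelID` on each of them:

```mathematica
(* ask every parallel kernel for its own ID *)
ParallelEvaluate[$KernelID]
```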

This week’s release of grid*Mathematica* Server is another important step toward hassle-free parallel computing. In the early days one had to collect host names of all available machines in a network and use remote login techniques to launch *Mathematica* on those. Nowadays the grid*Mathematica* installations on your network will advertise themselves, just like your printers, multimedia players, and other shared resources.

The available machines will show up in the control panel, and you can simply select the ones you want to use. Here, my small network consists of two dual-core machines, of which one is available for use.

Now, I have a total of six kernels available.
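A quick way to confirm the count (a sketch, assuming the remote kernels have been configured in the control panel):

```mathematica
(* launch the configured set of kernels, local cores plus *)
(* any selected remote machines, then count them *)
LaunchKernels[];
Length[Kernels[]]
```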

*Mathematica* is also the best tool to analyze the performance of a parallel computation. Here we measure the basic latency of our two remote kernels. The latency is simply the round-trip time for a trivial calculation.
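A minimal sketch of such a measurement, timing a trivial evaluation on each kernel in turn:

```mathematica
(* round-trip time for a trivial evaluation, per kernel; *)
(* AbsoluteTiming returns {seconds, result} *)
Table[First[AbsoluteTiming[ParallelEvaluate[Null, k]]], {k, Kernels[]}]
```

For remote kernels this time is dominated by network latency rather than by the computation itself.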

Not all computations benefit from parallelization. For example, it takes the kernel less time to compute `Sin[1.0]` than it does to send that command to another kernel and receive the result.
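You can verify this directly by comparing the two timings (a sketch; the exact numbers depend on your machine and network):

```mathematica
(* local evaluation: essentially instantaneous *)
AbsoluteTiming[Sin[1.0]]

(* the same evaluation on a parallel kernel: *)
(* the time is dominated by the communication round trip *)
AbsoluteTiming[ParallelEvaluate[Sin[1.0], First[Kernels[]]]]
```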

The new parallel status window uses dynamic updating to display basic performance data after each parallel computation. It can show you at a glance the effect of scheduling for uneven problems. In the first run, we schedule a range of primality tests individually onto two available kernels. Both of these kernels perform a fair share of the work, as can be seen by a snapshot of the status window.

Now we schedule one half of the tests on each kernel up front. One of them is unlucky and gets all the hard cases (the times for primality tests vary wildly), so the other kernel sits mostly idle—not what you want in a parallel computation.
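The two scheduling strategies correspond to the `Method` option of the parallel commands. A sketch of both runs, with an illustrative workload of my own choosing (Mersenne-style primality tests, whose run times vary wildly; the actual numbers tested in the blog may differ):

```mathematica
(* hypothetical workload: primality tests with very uneven costs *)
candidates = 2^Range[1000, 1100] - 1;

(* first run: each test is scheduled individually, *)
(* so kernels pick up new work as soon as they are free *)
ParallelMap[PrimeQ, candidates, Method -> "FinestGrained"]

(* second run: the list is split into one chunk per kernel up front; *)
(* an unlucky kernel can end up with all the hard cases *)
ParallelMap[PrimeQ, candidates, Method -> "CoarsestGrained"]
```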

You can also see that the time spent in the master kernel, which performs all the scheduling and communication, is a bit higher in the first case; this is the price you pay for the finer scheduling, well worth it in this case (but not always).

*Mathematica* for your multicore desktop PC, and grid*Mathematica* Server for all other computers on your network, give you an easy-to-use, powerful, and interactive system for parallel computation. Almost twenty years after first thinking about parallelism in *Mathematica*, my early developments are now a standard part of *Mathematica*, an increasingly comprehensive system for anything you might want to calculate on any computers available.