#
10 Tips for Writing Fast *Mathematica* Code

December 7, 2011 — Jon McLoone, Director, Technical Communication & Strategy

When people tell me that *Mathematica* isn’t fast enough, I usually ask to see the offending code and often find that the problem isn’t a lack in *Mathematica*‘s performance, but sub-optimal use of *Mathematica*. I thought I would share the list of things that I look for first when trying to optimize *Mathematica* code.

**1. Use floating-point numbers if you can, and use them early.**

Of the most common issues that I see when I review slow code is that the programmer has inadvertently asked *Mathematica* to do things more carefully than needed. Unnecessary use of exact arithmetic is the most common case.

In most numerical software, there is no such thing as exact arithmetic. 1/3 is the same thing as 0.33333333333333. That difference can be pretty important when you hit nasty, numerically unstable problems, but in the majority of tasks, floating-point numbers are good enough and, importantly, much faster. In *Mathematica* any number with a decimal point and less than 16 digits of input is automatically treated as a machine float, so always use the decimal point if you want speed ahead of accuracy (e.g. enter a third as 1./3.). Here is a simple example where working with floating-point numbers is nearly 50.6 times faster than doing the computation exactly and then converting the result to a decimal afterward. And in this case it gets the same result.

The same is true for symbolic computation. If you don’t care about the symbolic answer and are not worried about stability, then substitute numerical values as soon as you can. For example, solving this polynomial symbolically before substituting the values in causes *Mathematica* to produce a five-page-long intermediate symbolic expression.

But do the substitution first, and `Solve` will use fast numerical methods.

When working with lists of data, be consistent in your use of reals. It only takes one exact value to cause the whole dataset to have to be held in a more flexible but less efficient form.

**2. Learn about Compile…**

The `Compile` function takes *Mathematica* code and allows you to pre-declare the types (real, complex, etc.) and structures (value, list, matrix, etc.) of input arguments. This takes away some of the flexibility of the *Mathematica* language, but freed from having to worry about “What if the argument was symbolic?” and the like, *Mathematica* can optimize the program and create a byte code to run on its own virtual machine. Not everything can be compiled, and very simple code might not benefit, but complex low-level numerical code can get a really big speedup.

Here is an example:

Using `Compile` instead of `Function` makes the execution over 80 times faster.

But we can go further by giving `Compile` some hints about the parallelizable nature of the code, getting an even better result.

On my dual-core machine I get a result 150 times faster than the original; the benefit would be even greater with more cores.

Be aware though that many *Mathematica* functions like `Table`, `Plot`, `NIntegrate`, and so on automatically compile their arguments, so you won’t see any improvement when passing them compiled versions of your code.

**2.5. …and use Compile to generate C code.**

Furthermore, if your code is compilable, then you can also use the option `CompilationTarget`->“`C`” to generate C code, call your C compiler to compile it to a DLL, and link the DLL back into *Mathematica*, all automatically. There is more overhead in the compilation stage, but the DLL runs directly on your CPU, not on the *Mathematica* virtual machine, so the results can be even faster.

**3. Use built-in functions.**

*Mathematica* has a lot of functions. More than the average person would care to sit down and learn in one go. So it is not surprising that I often see code where someone has implemented some operation without having realized that *Mathematica* already knows how to do it. Not only is it a waste of time re-implementing work that is already done, but our guys are paid to worry about what the best algorithms are for different kinds of input and how to implement them efficiently, so most built-in functions are really fast.

If you find something close-but-not-quite-right, then check the options and optional arguments; often they generalize functions to cover many specialized uses or abstracted applications.

Here is such an example. If I have a list of a million 2×2 matrices that I want to turn into a list of a million flat lists of 4 elements, the conceptually easiest way might be to `Map` the basic `Flatten` operation down the list of them.

But `Flatten` knows how to do this whole task on its own when you specify that levels 2 and 3 of the data structure should be merged and level 1 be left alone. Specifying such details might be comparatively fiddly, but staying within `Flatten` to do the whole flattening job turns out to be nearly 4 times faster than re-implementing that sub-feature yourself.

So rememberâ€”do a search in the Help menu before you implement anything.

**4. Use Wolfram Workbench.**

*Mathematica* can be quite forgiving of some kinds of programming mistakesâ€”it will proceed happily in symbolic mode if you forget to initialize a variable at the right point and doesn’t care about recursion or unexpected data types. That’s great when you just need to get a quick answer, but it will also let you get away with less than optimal solutions without realizing it.

*Workbench* helps in several ways. First it lets you debug and organize large code projects better, and having clean, organized code should make it easier to write good code. But the key feature in this context is the profiler that lets you see which lines of code used up the time, and how many times they were called.

Take this example, a truly horrible way (computationally speaking) to implement `Fibonacci` numbers. If you didn’t think about the consequences of the double recursion, you might be surprised by the 22 seconds it takes to evaluate `fib[35]` (about the same time it takes the built-in function to calculate all 208,987,639 digits of `Fibonacci[1000000000]` [see tip 3]).

Running the code in the profiler reveals the reason. The main rule is invoked 9,227,464 times, and the `fib[1]` value is requested 18,454,929 times.

Being told what your code really does, rather than what you thought it would do, can be a real eye-opener.

**5. Remember values that you will need in the future.**

This is good programming advice in any language. The *Mathematica* construct that you will want to know is this:

It saves the result of calling `f` on any value, so that if it is called again on the same value, *Mathematica* will not need to work it out again. You are trading speed for memory here, so it isn’t appropriate if your function is likely to be called for a huge number of values, but rarely the same ones twice. But if the possible input set is constrained, this can really help. Here it is rescuing the program that I used to illustrate tip 3. Change the first rule to this:

And it becomes immeasurably fast, since `fib[35]` now only requires the main rule to be evaluated 33 times. Looking up previous results prevents the need to repeatedly recurse down to `fib[1]`.

**6. Parallelize.**

An increasing number of *Mathematica* operations will automatically parallelize over local cores (most linear algebra, image processing, and statistics), and, as we have seen, so does `Compile` if manually requested. But for other operations, or if you want to parallelize over remote hardware, you can use the built-in parallel programming constructs.

There is a collection of tools for this, but for very independent tasks, you can get quite a long way with just `ParallelTable`, `ParallelMap`, and `ParallelTry`. Each of these automatically takes care of communication, worker management, and collection of results. There is some overhead for sending the task and retrieving the result, so there is a trade-off of time gained versus time lost. Your *Mathematica* comes with four compute kernels, and you can scale up with grid*Mathematica* if you have access to additional CPU power. Here, `ParallelTable` gives me double the performance, since it is running on my dual-core machine. More CPUs would give a better speedup.

Anything that *Mathematica* can do, it can also do in parallel. For example, you could send a set of parallel tasks to remote hardware, each of which compiles and runs in C or on a GPU.

**6.5. Think about CUDALink and OpenCLLink.**

If you have GPU hardware, there are some really fast things you can do with massive parallelization. Unless one of the built-in CUDA-optimized functions happens to do what you want, you will need to do a little work, but the *CUDALink* and *OpenCLLink* tools automate a lot of the messy details for you.

**7. Use Sow and Reap to accumulate large amounts of data (not AppendTo).**

Because of the flexibility of *Mathematica* data structures, `AppendTo` can’t assume that you will be appending a number, because you might equally append a document or a sound or an image. As a result, `AppendTo` must create a fresh copy of all of the data, restructured to accommodate the appended information. This makes it progressively slower as the data accumulates. (And the construct `data=Append[data,value]` is the same as `AppendTo`.)

Instead use `Sow` and `Reap`. `Sow` throws out the values that you want to accumulate, and `Reap` collects them and builds a data object once at the end. The following are equivalent:

**8. Use Block or With rather than Module.**

`Block`, `With`, and `Module` are all localization constructs with slightly different properties. In my experience, `Block` and `Module` are interchangeable in at least 95% of code that I write, but `Block` is usually faster, and in some cases `With` (effectively `Block` with the variables in a read-only state) is faster still.

**9. Go easy on pattern matching.**

Pattern matching is great. It can make complicated tasks easy to program. But it isn’t always fast, especially the fuzzier patterns like `BlankNullSequence` (usually written as “___”), which can search long and hard through your data for patterns that you, as a programmer, might already know will never be there. If execution speed matters, use tighter patterns, or none at all.

As an example, here is a rather neat way to implement a bubble sort in a single line of code using patterns:

Conceptually neat, but slow compared to this procedural approach that I was taught when I first learned programming:

Of course in this case you should use the built-in function (see tip 3), which will use better sorting algorithms than bubble sort.

**10. Try doing things differently.**

One of *Mathematica*‘s great strengths is that it can tackle the same problem in different ways. It allows you to program the way you think, as opposed to reconceptualizing the problem for the style of the programming language. However, conceptual simplicity is not always the same as computational efficiency. Sometimes the easy-to-understand idea does more work than is necessary.

But another issue is that because special optimizations and smart algorithms are applied automatically in *Mathematica*, it is often hard to predict when something clever is going to happen. For example, here are two ways of calculating factorial, but the second is over 10 times faster.

Why? You might guess that the `Do` loop is slow, or all those assignments to *temp* take time, or that there is something else “wrong” with the first implementation, but the real reason is probably quite unexpected. `Times` knows a clever binary splitting trick that can be used when you have a large number of integer arguments. It is faster to recursively split the arguments into two smaller products, (1*2*…*32767)*(32768*…*65536), rather than working through the arguments from first to last. It still has to do the same number of multiplications, but fewer of them involve very big integers, and so, on average, are quicker to do. There are lots of such pieces of hidden magic in *Mathematica*, and more get added with each release.

Of course the best way here is to use the built-in function (tip 3 again):

*Mathematica* is capable of superb computational performance, and also superb robustness and accuracy, but not always both at the same time. I hope that these tips will help you to balance the sometimes conflicting needs for rapid programming, rapid execution, and accurate results.

Download this post as a Computable Document Format (CDF) file.

*All timings use a Windows 7 64-bit PC with 2.66 GHz Intel Core 2 Duo and 6 GB RAM.*

## 42 Comments

Excellent tutorial on efficient programming…

“Here is a simple example where working with floating-point numbers is nearly 40 times faster than doing the computation exactly and then converting the result to a decimal afterward. And in this case it gets the same result.”

Actually, it is 50.6 times faster.

“Using Compile instead of Function makes the execution over 10 times faster.”

Actually, it is over 80 times faster.

Great post! Thanks

@ C.F.Gauss. Well spotted. I would like to claim that I tinkered with the examples after I wrote the text, but it is possible that I just got the arithmetic wrong! I will have the article corrected. Thanks.

The tips are awesome!!! Thanks dude ;-)

Very useful hints! I don’t use Sow and Reap nearly enough and I probably use a lot of pattern matching unnecessarily. The Flatten hint is going to save me tons of time since my first thought was to use Flatten/@data which I assume is exactly the same as Map[Flatten,data]. A question: are there faster ways to reference parts of lists than, say list[[All,2]] (or Transpose[list][[2]]) to get the second column of a matrix? Also, I do a lot of operations that involve Select[list,(fn[#]&)] – your Times example made me wonder if there aren’t more efficient ways to select sub-parts of lists according to given criteria?

10. (Real, tip 2.5 tipped it over) great tips. I like how tip 9. and 10. complement each other when to use tried and true versus new style.

Very useful information presented in an elegant and compact way. I was wondering if the Compile function can speedup expressions involving special functions as well e.g. Euler gamma function, Gamma[z], generalized hypergeometric function, HypergeometricPFQ.

Once again great post.

Lots of useful tricks to know. But compare to other languages in case of Mathematica is much harder to figure out how to speed up things. For example “Compile”. Is there anyone who knows exactly what can (or should) be compiled and what is not worth thinking about. I have not found any clue in the help system… And this text taken from above “Be aware though that many Mathematica functions like Table, Plot, NIntegrate, and so…” does not help much.

Thank you for this very useful post !

Tip 10 is in some way a bit worrying though: in short, try a different way, who knows if it might not be faster for some obscure reason :-) (or to cite your text : “it is often hard to predict when something clever is going to happen”)

One point to consider is the “first time” problem. For example, your first example calculating determinants…the first time I ran it in the order you did, and got timings of 2.3 and 0.04 seconds. Then I quite Mathematica and reopened the file, running the second one first. There was still a time savings, but running the floating point version first took 0.23 seconds, and the integers took 0.90 seconds. In other words, some of the time savings comes simply from having run the routine before. This is especially clear with the Solve program, which actually took longer to run with floating point numbers than integers — *if* Solve had not yet been run.

I’d be wary when considering Tip 8 (use Block over Module). For short work, that’s probably fine, but for large projects with lots of interacting functions, it can be dangerous.

Module does lexical scoping;

Block does dynamic scoping;

McCloone of course is well aware, but readers may want to look at:

http://en.wikipedia.org/wiki/Scope_%28computer_science%29#Lexical_versus_dynamic_scoping

(scroll right down to the Example heading:

Example

This example compares the consequences of using static scope and dynamic scope. )

1/3 is the same thing as 0.33333333333333

??

Great Article!

Another useful tip is to avoid using Rule expressions for storage when you don’t need them. Compare

Timing[Table[RandomReal[] -> RandomReal[], {10^6}];]

with

Timing[Table[{RandomReal[], RandomReal[]}, {10^6}];]

Also, avoid the built-in date functions wherever possible. They’re two orders of magnitude slower than the date functions in other interpreted languages, and three orders of magnitude slower than the date functions in compiled languages. Never put a date function inside a loop if you can possibly avoid it.

@ Peter Aronssen – This is directly related to tip 1. Using Rule makes this a partly symbolic expression. Pure numeric arrays are faster. However, your example is faster still if you do

RandomReal[1, {10^6, 2}]

Another example of tip 3!

Great compilation! A few comments though.

The Reap/Sow example should probably use [[2,1]] instead of[[2]] to keep the format of data consistent Dimensions@data should be {40001}, not {1,40001}. The reason Reap/Sow is so efficient is because M is generally very good with managing deeply nested expressions, as can be seen with

data={};

(

Do[data={data,RandomReal[x]},{x,0,40000}];

data=Flatten@data;

);//AbsoluteTiming

which doesn’t use Reap/Sow at all, but deep nesting and gives the same output and timing result.

I think the tip about the WB is very weak. The WB has MUCH more applicability and utility than just the profiler. And the example presented is really an example of VERY bad coding. Nobody should program it like that, as is shown in the following tip. Apart from the profiler one could mention syntax highlighting that that M f/e doesn’t provide, and better leveraging with Java and writing/debugging combined M/Java programs. And the text “To illustrate tip 3″ should probably read “to illustrate tip 4″?

And, there are good reasons to write some numerical code in Java or C#, so using JLink and NETLink are additional ways to gain speed in M. With 3 lines of code an external Java or .Net or COM library is loaded. Easy “switch-over” to Java and .Net is another very important M feature!

Hi,

Thank you for your nice information 10 Tips for Writing Fast Mathematica Code. I like it.

Thanks.

Great list, I’d like to add one essential part: Learn about PackedArrays. Use On["Packing"] and Off["Packing"] to find out when M- unpacks data.

Hi,

thanks for the tips. Just a short question on No. 1: Does anybody know why

Det[Table[1./(1. + Abs[i - j]), {i, 1., 150.}, {j, 1.,150.}]] // AbsoluteTiming

runs so much faster than

a = 150.;

Det[Table[1./(1. + Abs[i - j]), {i, 1., 150.}, {j, 1., a}]] // AbsoluteTiming

? On my macbook the first command needs roughly 0.005s, while the second needs 10x longer, i.e. 0.05s (both after a kernel reset). The same decrease occurs by using Table[Table[,..],…] instead of Table[,...,...].

thanks, markus

@ Markus

I suspect that this is because Table automatically calls Compile when called with numeric arguments, but because it has the Attributes HoldAll, it is not recognizing, until too late that you ARE calling it with numeric arguments. Using With speeds things up here…

With[{a = 150.}, Det[Table[ 1./(1. + Abs[i - j]), {i, 1., 150.}, {j, 1., a}]]] // AbsoluteTiming

The “trick” of using With[{a=a},...] works here too.

A great post.

Are there any optimisation techniques available for graphs? (now that Mathematica 8′s graphs are treated as “raw objects”)?

The same is true for symbolic computation. If you donâ€™t care about the symbolic answer and are not worried about stability …

??? I generally look to Mathematica when I want symbolic answers and/or stability. Are you suggesting that Mathematica is not for pure mathematicians any more ?

@ Andrew

I don’t think think that Mathematica was ever just for mathematicians. Back in the early 90′s we did a survey of users and found around 5% described themselves as such. Most customers that I visit seem to be engineers, scientists or (probably because I am based near London) in finance.

Take a look at

http://www.wolfram.com/solutions

for a list some of the most popular areas of application.

very helpful, thanks.

It’s good to get some tips based on what’s “under the hood” of Mathematica rather than just relying on general principles of efficient numeric programming. On a related note, I’d like to see more CUDA-optimized functions in Mathematica. I was quite impressed with the speed improvements using CUDA. Perhaps a Method->”CUDA” option?

In the procedural approach in Tip 9,

Is there any reason to use “While[TrueQ[flag],…]”

instead of simply “While[flag,...]” ?

thx

As the code stands, it is of no value at all. I had in mind not bothering to initialize the flag. But the flag logic has to be the other way round to do that…

(While[! TrueQ[stopFlag],

stopFlag = True;

Do[If[

data[[i]] > data[[i + 1]],

temp = data[[i]];

data[[i]] = data[[i + 1]];

data[[i + 1]] = temp;

stopFlag = False], {i, 1, Length[data] – 1}]];

data)

I forgot to remove the TrueQ after I added the initial value line.

Given that this is about efficiency, then it would it would be very slightly faster to do it they way I published without the TrueQ. Since this version will repeatedly have to evaluate both TrueQ and Not. Both fast functions but not necessary.

Really great tips. I like especially the info about Reap & Sow.

Only a pity that Compile does not support Reap and Sow. Compile`CompilerFunctions[] gives a list of the supported functions.

So I can use either the slow Append and then compile it or the fast Reap and then compiling is impossible :-(

Thanks Jon. Programming some sampling functions today, I thought up another useful exhortation:

“don’t map a function that threads.”

Try the following command:

AbsoluteTiming[Times[1, #] & /@ Range[10000];][[1]]/

AbsoluteTiming[Times[1, Range[10000]];][[1]]

very helpful.. very good

I guess your comparison for the Tip 4 is sligthly misleading. I definitely agree that double recursion should always handled carefully. However, I am pretty confident that the built-in implementation of the Fibonnaci numbers is not based on a recurrence but rather on the closed-form expression involving the golden ratio.

So comparing a double recurrence with a closed form is like comparing apples and oranges. Even your implementation from Tip 5 involving memoization can’t beat the closed-form !

The point of tip 3 is not just that build in functions might be coded more carefully, but that they might use completely different approaches, such as closed forms. As a user, I can’t expect to know about all such methods, which is why it is a safe bet that I should use the built in function, when it exists, for anything that I am not an expert in, or are willing to spend time researching. Perhaps it confused the point of tip 5 to mention it again there, but after tip 1, it is the most widely useful, so I felt like repeating it as much as possible.

Hey, I think your blog might be having browser compatibility issues.

When I look at your blog in Opera, it looks fine but when opening

in Internet Explorer, it has some overlapping.

I just wanted to give you a quick heads up! Other

then that, excellent blog!

Hello Visual Impact,

Where exactly did you see the problem? And what version of IE? We had a developer test it on IE 11 and 8 (PC), but couldn’t find any issues. If you could give us more details we would be happy to look into this more closely.

Thanks!

Great page. Just implementing the floating point suggestion speeded up a small program from an annoying and time wasting several seconds to .05 seconds.

I noticed that ParallelTable does not accept a function that calls ParallelTable. Not hard to see why.

This article is very helpful speeding up my Mathematica code. Thank you.

Hello, I think your blog might be having browser compatibility issues. When I look at your blog site in Safari, it looks fine but when opening in Internet Explorer, it has some overlapping. I just wanted to give you a quick heads up! Other then that, excellent blog!

As a newbie Mathematica user this article was very helpful! Thank you.

So why can I only do about 800 geodistance calculations per second but my compiled Vincenty copied from Wikipedia will do 80,000/sec?

I think filenames over the network is still 30 x slower since somewhere around version 11….

The algorithms in GeoDistance are a lot more precise than Vicenty’s algorithm, which can fail by much in some cases (for example in near-antipodal cases). GeoDistance effectively solves a shooting ODE per computation.

As I mentioned in this blog, speed also depends on the way things are done. For example this performs 10^5 GeoDistance computations in 0.3 seconds:

In[28]:= points = GeoPosition[RandomReal[{-90, 90}, {10^5, 2}]];

In[29]:= GeoDistance[points, Here]; // AbsoluteTiming

Out[29]= {0.296927, Null}

because data is packed, and the computation is performed in a single evaluator call. If we map over the list of points, then there are 10^5 independent computations, that need to handle Quantity units independently, results are not packed, etc, and then the computation is 200 times slower:

In[36]:= GeoDistance[#, Here] & /@ Thread[points]; // AbsoluteTiming

Out[36]= {51.9647, Null}