Wolfram Computation Meets Knowledge

Tracking the World Records

With the 2012 Olympics upon us, and records waiting to be broken, it might be a good time to consider some aspects of track and field. I need to write this now, because once the track part of the Games is underway, I fully intend to spend quality time with a television set. Why do I like track? Well, what school sport might one take up if one is (read: was) scrawny and not very (read: very not) coordinated?

I will focus on men’s track, but the gist of this almost certainly applies to women’s as well. We’ll look at speeds of world records and how they change as the distances get longer. I’ll start with a Demonstration by my Wolfram Research colleague Sy Blinder, “How Fast Can You Run?” The Demonstration shows that speeds follow an interesting pattern, which I explore here. Along the way I will also inadvertently reveal that I know nothing whatever about data modeling.

To underscore the comment about records being broken, I will point out that several of the record times listed in Blinder’s Demonstration are already out of date. Below is a current list, of the form {distance, time} measured in {meters, seconds}. I omitted the less common distances because they might not be indicative of the best possible efforts, even among elite athletes.

dtlist = {{100, 9.58`}, {200, 19.19`}, {400, 43.18`}, {800, 101.01`}, {1500, 206.`}, {1609.344`, 223.13`}, {3000, 440.67`}, {5000, 757.35`}, {10000, 1577.53`}, {42195, 7380}};
distances = dtlist[[All, 1]];

In which We Pretend to Do Serious Computation (or: How the Line Got Two Slopes)

It was observed in the Demonstration that the logarithms (logs) of distance and time appear to have something close to a linear relation. Here is a picture that shows this.

p1 = ListLogLogPlot[dtlist, AxesOrigin -> {40, 1}]

Plot of current world records

OK, that does seem to look liney. Let us see what kind of relationship between distance and time such a line would imply. We have $\log(y) = a \log(x) + b$ (the standard form of a line with slope $a$ and intercept $b$). So what does this say about $y$ as a function of $x$?

f1 = y /. First[Solve[Log[y] == a*Log[x] + b, y]]

E^b x^a

Since $e^b$ is itself just a (positive) constant, we have a model of the form $y = k x^a$ for unknown parameters $k$ and $a$. (Strictly speaking this is a power law; I will loosely call it an exponential relation throughout.) Good so far. Now to fit to this requires the skills of someone who knows about curve fitting. That is not me. I tried, saw that the larger values were skewing the fits, and did the naive thing of fitting instead to the log-log values and then converting to the exponential relation.

Even so, the fit was poor. If you squint, or cheat and read my mind, you will guess that really there are two slightly different lines required, with a mild “kink” where they meet (this will show more clearly in the next plot). It happens somewhere in the middle distances; more about that later. So I will fit to the smaller and larger values separately (though with some overlap), and then show it all together.

I take the liberty of omitting the very first point. The reason is that 100 and 200 meters, at the elite level, are run at the same average speeds. That is to say, the 200m record is typically twice that of the 100m. Currently this holds to within about three hundredths of a second (19.19 against 2 × 9.58 = 19.16). This implies that they will not satisfy a nontrivial exponential relation, and including the 100m may throw things off slightly.

Without further ado, here is the fit to an exponential curve for the shorter distances, using 200m through mile.

y1[x_] = f1 /. FindFit[Log@N@dtlist[[2 ;; 6]], a*x + b, {a, b}, {x}]

0.0371604 x^1.17942

Let’s see what this predicts for some of the shorter distances.

Map[{#, y1[#]} &, distances[[1 ;; 7]]]

{{100, 8.49038}, {200, 19.2295}, {400, 43.5523}, {800, 98.6399}, {1500, 207.032}, {1609.34, 224.946}, {3000, 468.898}}

Comparing to the actual records, we see that this fit is (almost) quite good through the middle distances, then falls way off at the 3k. The “almost” refers to the fact that, for the reasons noted above, the 100m comes in way too fast.
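To put numbers on "quite good" and "way off," here is a minimal sketch that turns the comparison into percent errors. The names records and shortFit are introduced here for illustration only; the data are copied from dtlist and the coefficients from the fit output above.

```mathematica
(* Records as {meters, seconds}, copied from dtlist above *)
records = {{100, 9.58}, {200, 19.19}, {400, 43.18}, {800, 101.01},
   {1500, 206.}, {1609.344, 223.13}, {3000, 440.67}};

(* Short-distance fit; coefficients copied from the output above (hypothetical name) *)
shortFit[x_] := 0.0371604 x^1.17942;

(* {distance, percent error}; negative means the model is faster than the record *)
Map[{#[[1]], 100 (shortFit[#[[1]]] - #[[2]])/#[[2]]} &, records]
```

Going by the predicted values listed above, the 100m comes out more than 10% too fast and the 3k more than 6% too slow, while 200m through the mile stay within about 2.5% of the records.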

Now we move to the longer distances, fitting to distances from 1500m through the marathon.

y2[x_] = f1 /. FindFit[Log@N@dtlist[[5 ;; -1]], a*x + b, {a, b}, {x}]

0.0823983 x^1.07087

Map[{#, y2[#]} &, distances[[4 ;; -1]]]

{{800, 105.867}, {1500, 207.544}, {1609.34, 223.787}, {3000, 435.989}, {5000, 753.438}, {10000, 1582.75}, {42195, 7395.85}}

Again this is not bad most of the way, although the relative error is noticeable for the 800 meter run. Now we show everything together. I will extend both lines to cover the full domain, so that it is easy to see where they fall off the mark.

p2 = LogLogPlot[y1[x], {x, 10, 45000}, ColorFunction -> (Green &)];
p3 = LogLogPlot[y2[x], {x, 10, 45000}, ColorFunction -> (Red &)];
p4 = Show[p1, p2, p3]

Plot of world records of races between 800 m and the marathon

The upshot is that our best “line” has two slopes. (Can a line change its slope? Can a leopard change its spots?)
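Where do the two slopes meet? A rough sketch, assuming the rounded coefficients printed above: setting the two fitted curves equal and solving for the distance gives a closed form. The names k1, a1, k2, a2, and crossing are introduced here for illustration only.

```mathematica
(* Coefficients copied from the two fit outputs above *)
{k1, a1} = {0.0371604, 1.17942};   (* short-distance fit, k1 x^a1 *)
{k2, a2} = {0.0823983, 1.07087};   (* long-distance fit,  k2 x^a2 *)

(* k1 x^a1 == k2 x^a2 rearranges to x == (k2/k1)^(1/(a1 - a2)) *)
crossing = (k2/k1)^(1/(a1 - a2))
```

With these coefficients the crossing lands a little past 1500 meters. That location depends on which points were assigned to each fit (the two fits overlap at the 1500 and the mile), so it should be read as a feature of this particular pair of lines rather than a measurement of where the physiology changes.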

In which Our Hero Becomes a Curve

That showed things quite well, I think. But it is hardly the end of the story; apologies to readers who are already undergoing ennuification. To start with the next point (which will get us toward the kink mentioned above), let us look at a speed plot. The speed units will be meters per second.

speeds = Map[{#[[1]], #[[1]]/#[[2]]} &, N@dtlist]

{{100., 10.4384}, {200., 10.4221}, {400., 9.26355}, {800., 7.92001}, {1500., 7.28155}, {1609.34, 7.21258}, {3000., 6.80782}, {5000., 6.60197}, {10000., 6.33902}, {42195., 5.71748}}

Again we will plot logs against logs.

ListLogLogPlot[speeds, Joined -> True]

Plot of the average human speed in the world record

We see very clearly that the 100m and 200m have the same average speeds (although in actual fact the race splits tell a more complicated story). And again we see a phenomenon that might be a kink or two, indicating a discontinuity in the speed change. So what is going on?

Several years ago, while thinking about world records, I realized that after 200 meters the men’s record pace falls off by about one second per hundred meters for every doubling of the distance. With one exception, which I refer to as “the anomaly.” Between 400 and 800 meters the change is more like two seconds per hundred meters. That is to say, the record 800 was about 16 seconds more than twice as long as the record 400. Is there an explanation for this? I’ll point to a reference later.
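The rule of thumb is easy to check against the current records. The sketch below computes pace in seconds per 100 meters and the pace lost per doubling of distance; the mile stands in for a doubled 800, with a small correction because it is slightly more than 1600 meters. The names records, pace, and doublings are introduced here for illustration only.

```mathematica
(* Records as {meters, seconds} for 200m through the mile, copied from dtlist above *)
records = {{200, 19.19}, {400, 43.18}, {800, 101.01}, {1609.344, 223.13}};

(* Pace in seconds per 100 meters *)
pace = Map[100 #[[2]]/#[[1]] &, records];

(* Number of doublings between consecutive distances *)
doublings = Log2[Rest[records[[All, 1]]]/Most[records[[All, 1]]]];

(* Pace lost per doubling: 200 to 400, 400 to 800, 800 to mile *)
Differences[pace]/doublings
```

This gives roughly 1.2, 1.8, and 1.2 seconds per 100 meters, so the step from 400 to 800 does stand out from its neighbors.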

I remark that these figures apply only to elite male runners. In some cases they may even apply to the same runner. In the 1976 Olympics, for example, Alberto Juantorena’s 800 time was slightly under 16 seconds more than twice his 400; the former was, moreover, a new world record. (There is no reason to remember this bit of trivia, by the way. It’s not as though you’ll ever encounter it on, say, a homework assignment.) For those of us not in that class, such a time difference simply means one is relatively better at the 800. For elite women runners, while I have not checked, I would expect there to be a similar rule of thumb, though with a slightly larger speed change and a kink occurring perhaps elsewhere.

For now let us ignore the anomaly and pretend that the speed loss per doubled distance is a constant. Does this also give an exponential relation? Well, we can find out, since we have Mathematica at our fingertips. The recurrence relation modeling this is f(2x) = 2f(x) + kx for some constant k. We will require a specific data value; I use the 1500 meter event for this.

model2 = f[x] /. First[RSolve[{f[2 x] == 2 f[x] + k x, f[1500] == 206.}, f[x], x]]

0.137333 x - 5.27537 k x + 0.721348 k x Log[x]

Hmm… that hardly looks exponential. For the very good reason that it isn’t. Undaunted, we will do another fit. (Note: This is in fact related to the exponentials above. It follows from the fact that x to a small power and log(x) arise as related integrals.) As x log(x) dominates x, we should specify that our parameter k is positive. Knowing we have trouble with our presumed speed changes prior to 800 meters, I’ll use data points from that distance onward to do the fit.

y3[x_] = model2 /. FindFit[dtlist[[4 ;; -1]], {model2, k >= .001}, {k}, {x}]

0.0550631 x + 0.0112495 x Log[x]
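The shape of this model can also be worked out by hand, which helps in reading the fitted numbers. A sketch, writing $f(x) = x\,g(x)$ (the helper $g$ and the constant $C$ are introduced here only for the derivation) and taking the simplest solution of the recurrence:

```latex
\begin{align*}
f(2x) = 2f(x) + kx
  &\;\Longrightarrow\; g(2x) = g(x) + \tfrac{k}{2} \\
  &\;\Longrightarrow\; g(x) = C + \tfrac{k}{2}\log_2 x = C + \frac{k}{2\ln 2}\,\ln x \\
  &\;\Longrightarrow\; f(x) = C\,x + \frac{k}{2\ln 2}\,x\ln x,
  \qquad \frac{1}{2\ln 2} \approx 0.721348 .
\end{align*}
```

The factor 0.721348 is the one that appears in the RSolve output. Reading the fitted coefficient backward gives $k \approx 0.0112495 \times 2\ln 2 \approx 0.0156$ seconds per meter, or about 1.6 seconds per 100 meters for each doubling of the distance. One way to see the kinship with the earlier fits: for a small exponent $\epsilon$, $x^{1+\epsilon} = x\,e^{\epsilon \ln x} \approx x + \epsilon\,x \ln x$.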

Now let us check this model against the full dataset.

Map[{#, y3[#]} &, distances]

{{100, 10.6869}, {200, 22.9333}, {400, 48.9857}, {800, 104.209}, {1500, 206.}, {1609.34, 222.29}, {3000, 435.393}, {5000, 754.387}, {10000, 1586.75}, {42195, 7378.69}}

This is a poor model below 800 meters. At 800 it is also inaccurate, although I can remember when the predicted value was in fact right around the world record (see earlier remark about 1976 Olympics).

p5 = LogLogPlot[y3[x], {x, 10, 45000}, ColorFunction -> (Orange &)];
p6 = Show[p1, p5]

Plot of the predicted values for world records and the actual world record

Though it is hard to tell visually, this one curve is a better fit than either of the lines alone. The red line comes close, but is further off at the shorter distances. And we only needed one parameter to fit, not two, so in some sense we have a more minimal model.
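Since the plot does not settle the matter, one way to compare the red line and the curve is a single error figure for each. The sketch below uses root-mean-square relative error over the records from 800 meters up; that metric is my own choice for illustration, not something the fits were optimized for, and the names records, longFit, curveFit, and rmsRelativeError are likewise hypothetical.

```mathematica
(* Records from 800m up, as {meters, seconds}, copied from dtlist above *)
records = {{800, 101.01}, {1500, 206.}, {1609.344, 223.13}, {3000, 440.67},
   {5000, 757.35}, {10000, 1577.53}, {42195, 7380}};

(* Long-distance line and one-parameter curve; coefficients copied from the outputs above *)
longFit[x_] := 0.0823983 x^1.07087;
curveFit[x_] := 0.0550631 x + 0.0112495 x Log[x];

(* Root-mean-square relative error of a model over the records *)
rmsRelativeError[model_] :=
  Sqrt[Mean[Map[((model[#[[1]]] - #[[2]])/#[[2]])^2 &, records]]];

{rmsRelativeError[longFit], rmsRelativeError[curveFit]}
```

Working from the predicted values listed earlier, the line comes out just under 2% and the curve a bit over 1%, with the 800 meter run contributing most of the error in both cases.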

Not knowing this was my intended topic, my colleague Michael Trott rather presciently suggested I look into the following reference: “Optimal Pacing for Running 400m and 800m Track Races,” by James Reardon, dated April 2 of this year. It contains a wealth of information and better modeling than I could ever do, and explains some of the physiology that is likely relevant to the anomalous speed drop going from 400 to 800 meters. One important factor is that the anaerobic energy supply can last, for the best runners, almost the entire course of the 400. So the 800 becomes a qualitatively different event; I remark that it is the first distance to be listed as a “run” rather than a “dash.” Is this physiological change responsible for that anomalous jump in speed loss? I do not know enough about it to say, but I find the correlation intriguing.


Jesse Owens won Olympic medals, gold in fact, for two consecutive distances (100m and 200m). Michael Johnson did likewise for 200m and 400m (also both gold). In fact, every consecutive pair has seen at least one runner win medals in both distances (yes, I really knew that without looking it up). Here is the question: For each consecutive pair, name a runner who has won Olympic gold medals at both distances. Okay, I confess there was one pair for which I had to go to Wolfram|Alpha. (Extra credit: Put the steeplechase between the 1500m and 5k. Now see what happens.)




1 comment

  1. Thanks Danny. The anomaly at 400 and 800 meters is fascinating.