Running the Numbers with the Illinois Marathon Viewer
I love to run. A lot. And many of my coworkers do too. You can find us everywhere and at all hours: on roads, in parks, on hills and mountains, and even running up and down parking decks, a flatlander’s version of hills. And if there is a marathon to be run, we’ll be there as well. With all of the internal interest in running marathons, Wolfram Research created this Marathon Viewer as a sponsorship project for the Christie Clinic Illinois Marathon.
Here are four of us, shown as dots, participating in the 2017 Illinois Marathon:
How did the above animation and the in-depth look at our performance come about? Read on to find out.
Background
Why do I run? Of course, the expected answer is health. But when I go out for a run, I am really not concerned about my longevity. And quite frankly, given the number of times I have almost been hit by a car, running doesn’t seem to be in my best interest. For me, it is simply a good way to maintain some level of sanity. Also, it is location-independent. When I travel, I pack an extra pair of running shoes, and I am set. Running is a great way to scope out a new location. Additionally, runners are a very friendly bunch of people. We greet, we chat, we hate on the weather together. And lastly, have you ever been to a race? If so, then you know that the spectator race signs are hilarious, often politically incorrect and R-rated.
I started running longer distances in 2014. Since then, I have completed eight marathons, one of which was the 2015 Bank of America Chicago Marathon. After completing that race, I wrote a blog post analyzing the runner dataset and looking at various aspects of the race.
Since then, we have shifted focus to the Illinois Marathon here in Champaign. While Wolfram Research is an international company, it also makes sense for us to engage in our local community.
The Course
The Illinois Marathon does a great job tying together our twin cities of Champaign and Urbana. Just have a look at the map: starting in close proximity to the State Farm Center, the runners navigate across the UIUC campus, through both downtown areas, various residential neighborhoods and major parks for a spectacular finish on Zuppke Field inside Memorial Stadium.
Since its inception in 2009, the event has doubled the number of runners and races offered, as well as the number of sponsors and partners involved. By attracting a large number of visitors to Champaign and Urbana, it has quite an economic impact on our community. This is also reflected in the amount of charitable giving raised every year.
The Marathon Viewer
As you can imagine, here at Wolfram we were very interested in partnering with the marathon on some kind of data crunching. Over the summer of 2017, we received the full registration dataset to work with. We applied the 10-step process described by Stephen Wolfram in this blog post.
Original Dataset
We first import a simple spreadsheet.
raw = Import["/Users/eilas/Desktop/Work/Marathon/ILMarathon2017/Marathon_Results_Modified.csv", "Numeric" -> False];
The column headers of the raw table look as follows:
header = raw[[1]]
But it’s more convenient to represent the raw data as key->value pairs:
fullmarathon = AssociationThread[header -> #] & /@ Rest[raw];
fullmarathon[[1]]
Interpreting Runner Entries
Wherever possible, these data points should be aligned with entities in the Wolfram Language. This not only allows for a consistent representation, but also gives access to all of the data in the Wolfram Knowledgebase for those items if desired later.
Interpreter is a very powerful tool for such purposes. It allows you to parse any arbitrary string as a particular entity type, and is often the first step in trying to align data. As an example, let’s align the given location information.
allLocations2017 = Union[{"CITY", "STATE", "COUNTRY"} /. fullmarathon];
Here is a random example.
locationExample = RandomChoice[allLocations2017]
Interpreter["City"][StringJoin[StringRiffle[locationExample]]]
In most cases, this works without a hitch. But some location information may not be what the system expects. Participants may have specified suburbs, neighborhoods, unincorporated areas or simply made a typo. This can make an automatic interpretation impossible. Thus, we need to be prepared for other contingencies. From the same dataset, let’s look at this case:
problemExample = {"O Fallon", "IL", "United States"};
Interpreter["City"][StringJoin[StringRiffle[problemExample]]]
We can fall back to a contingency in such a case by making use of the provided postal code 62269.
With[{loc = Interpreter["Location"]["62269"]}, GeoNearest["City", loc]][[1]]
As you can see, we do know of the city, but the initial interpretation failed due to a missing apostrophe. In comparison, this would have worked just fine:
Interpreter["City"][StringJoin[StringRiffle[{"O'Fallon", "IL", "United States"}]]]
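The two steps above, trying the city string first and falling back to the postal code, can be combined into one small helper. This is only a sketch of the idea, not the code behind the viewer: the name interpretCity and its argument layout are made up for illustration, and it assumes the ZIP code is available as a string.

```wolfram
(* Hypothetical helper: interpret {city, state, country}; if that fails,
   fall back to the city nearest to the given postal code. *)
interpretCity[location_List, zip_String] :=
 Module[{city = Interpreter["City"][StringRiffle[location]]},
  If[FailureQ[city],
   (* same fallback as above: locate the ZIP code, then take the nearest city *)
   First[GeoNearest["City", Interpreter["Location"][zip]]],
   city]]

interpretCity[{"O Fallon", "IL", "United States"}, "62269"]
```

Entries that interpret cleanly never touch the fallback, so the ZIP code is only consulted for the problem cases.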
The major piece of information that runners are interested in is their split times. The Illinois Marathon records the clock and net times at six split distances: start, 10 kilometers, 10 miles, 13.1 miles (half-marathon distance), 20 miles and 26.2 miles (full marathon distance).
random20MTime = RandomChoice["20 MILE NET TIME" /. fullmarathon]
These are given as a list of three colon-separated numbers, which we want to represent as Wolfram Language Quantity objects.
Quantity[MixedMagnitude[FromDigits /@ StringSplit[random20MTime, ":"]], MixedUnit[{"Hours", "Minutes", "Seconds"}]]
As with the Interpreter mentioned before, we also have to be careful in interpreting the recorded times. For the half-marathon split and longer distances, even the fastest runner needs at least an hour. Thus, we know “xx: yy: zz” always refers to “hours: minutes: seconds”. But for the shorter distances 10 kilometers and 10 miles, this might be “minutes: seconds: milliseconds”.
random10KTime = RandomChoice["10K NET TIME" /. fullmarathon]
Read as hours, minutes and seconds, this is clearly incorrect:
Quantity[MixedMagnitude[FromDigits /@ StringSplit[random10KTime, ":"]], MixedUnit[{"Hours", "Minutes", "Seconds"}]]
No runner took more than two days to finish a 10-kilometer distance. Logic must be put in to verify the values before returning the final Quantity objects. This is the correct interpretation:
Quantity[MixedMagnitude[FromDigits /@ StringSplit[random10KTime, ":"]], MixedUnit[{"Minutes", "Seconds", "Milliseconds"}]]
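One way to build in that check is a plausibility cutoff. The sketch below is an assumption on my part rather than the exact rule we used: the name parseSplit and the eight-hour default cutoff are hypothetical. A string is read as hours, minutes and seconds unless that reading exceeds the cutoff, in which case it is read as minutes, seconds and milliseconds.

```wolfram
(* Hypothetical helper: pick the unit interpretation that gives a plausible time.
   The 8-hour default cutoff is an assumed value, not taken from the race rules. *)
parseSplit[time_String, cutoffHours_: 8] :=
 Module[{parts = FromDigits /@ StringSplit[time, ":"]},
  If[parts . {3600, 60, 1} <= 3600 cutoffHours,
   (* plausible as hours:minutes:seconds *)
   Quantity[MixedMagnitude[parts], MixedUnit[{"Hours", "Minutes", "Seconds"}]],
   (* otherwise treat it as minutes:seconds:milliseconds *)
   Quantity[MixedMagnitude[parts], MixedUnit[{"Minutes", "Seconds", "Milliseconds"}]]]]

parseSplit["2:41:05"]   (* read as hours, minutes, seconds *)
parseSplit["52:13:00"]  (* 52 hours exceeds the cutoff, so read as minutes, seconds, milliseconds *)
```

The cutoff works here because nobody covers 10 kilometers in under eight minutes, so a genuine minutes-first entry always looks absurd when read as hours.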
Once the data has been cleaned up, it’s just a matter of creating an Association of key->value pairs. An example piece of data for one runner shows the structure:
Interpreting Divisions
We did not just arrange the dataset by runner, but by division as well. The divisions recognized by most marathons are as follows:
{"Female19AndUnder", "Female20To24", "Female25To29", "Female30To34",
 "Female35To39", "Female40To44", "Female45To49", "Female50To54",
 "Female55To59", "Female60To64", "Female65To69", "Female70To74",
 "Male19AndUnder", "Male20To24", "Male25To29", "Male30To34",
 "Male35To39", "Male40To44", "Male45To49", "Male50To54", "Male55To59",
 "Male60To64", "Male65To69", "Male70To74", "Male75To79",
 "Male80AndOver", "FemaleOverall", "FemaleMaster", "MaleOverall",
 "MaleMaster"}
For each of these divisions, we included information about the minimum, maximum and mean running times. Since this marathon is held on a flat course and is thus fast-paced, we also added each division’s Boston Marathon–qualifying standard, and information about the runners’ qualifications.
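To give a feel for what goes into each division summary, here is a minimal sketch with made-up numbers. The finish times, the 185-minute standard and the 10-minute "within range" band are all assumptions for illustration; only the key names mirror the division properties used in the entity store.

```wolfram
(* Illustrative net finish times in minutes; not taken from the race data. *)
netTimes = {178.5, 184.0, 191.2, 203.7, 249.9};
bqStandard = 185;  (* assumed qualifying standard for this hypothetical division *)
margin = 10;       (* assumed width, in minutes, of the "within range" band *)

divisionSummary = <|
  "Min" -> Min[netTimes],
  "Max" -> Max[netTimes],
  "Mean" -> Mean[netTimes],
  "NumberBeat" -> Count[netTimes, t_ /; t <= bqStandard],
  "NumberRange" -> Count[netTimes, t_ /; bqStandard < t <= bqStandard + margin],
  "NumberOutside" -> Count[netTimes, t_ /; t > bqStandard + margin]|>
```

With these numbers, two runners beat the standard, one falls within the band and two land outside it.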
With the data cleaned up and processed, it’s now simple to construct an EntityStore so that the data can be used in the EntityValue framework in the Wolfram Language. It’s mainly just a matter of attaching metadata to the properties so that they have display-friendly labels.
EntityStore[{
  "ChristieClinicMarathon2017" -> <|
    "Label" -> "Christie Clinic Marathon 2017 participant",
    "LabelPlural" -> "Christie Clinic Marathon 2017 participants",
    "Entities" -> processed,
    "Properties" -> <|
      "BibNumber" -> <|"Label" -> "bib number"|>,
      "Event" -> <|"Label" -> "event"|>,
      "LastName" -> <|"Label" -> "last name"|>,
      "FirstName" -> <|"Label" -> "first name"|>,
      "Name" -> <|"Label" -> "name"|>,
      "Label" -> <|"Label" -> "label"|>,
      "City" -> <|"Label" -> "city"|>,
      "State" -> <|"Label" -> "state"|>,
      "Country" -> <|"Label" -> "country"|>,
      "ZIP" -> <|"Label" -> "ZIP"|>,
      "ChristieClinic2017Division" -> <|"Label" -> "Christie Clinic 2017 division"|>,
      "Gender" -> <|"Label" -> "gender"|>,
      "PlaceDivision" -> <|"Label" -> "place division"|>,
      "PlaceGender" -> <|"Label" -> "place gender"|>,
      "PlaceOverall" -> <|"Label" -> "place overall"|>,
      "Splits" -> <|"Label" -> "splits"|>|>|>,
  "ChristieClinic2017Division" -> <|
    "Label" -> "Christie Clinic 2017 division",
    "LabelPlural" -> "Christie Clinic 2017 divisions",
    "Entities" -> divTypeEntities,
    "Properties" -> <|
      "Label" -> <|"Label" -> "label"|>,
      "Mean" -> <|"Label" -> "mean net time"|>,
      "Min" -> <|"Label" -> "min net time"|>,
      "Max" -> <|"Label" -> "max net time"|>,
      "BQStandard" -> <|"Label" -> "Boston qualifying standard"|>,
      "BeatBQ" -> <|"Label" -> "beat Boston qualifying standard"|>,
      "NumberBeat" -> <|"Label" -> "count beat Boston qualifying standard"|>,
      "RangeBQ" -> <|"Label" -> "within range Boston qualifying standard"|>,
      "NumberRange" -> <|"Label" -> "count within range Boston qualifying standard"|>,
      "OutsideBQ" -> <|"Label" -> "outside range Boston qualifying standard"|>,
      "NumberOutside" -> <|"Label" -> "count outside range Boston qualifying standard"|>|>|>}]
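To see how such a store is used once it exists, here is a toy version with a single made-up runner. The type name "ToyMarathon", the entity "Bib1" and its values are invented for this sketch, and it assumes a Wolfram Language version that provides EntityRegister.

```wolfram
(* A toy store with one invented runner, only to show the shape of the data. *)
toyStore = EntityStore["ToyMarathon" -> <|
    "Label" -> "toy marathon participant",
    "Entities" -> <|
      "Bib1" -> <|"Label" -> "Runner One", "BibNumber" -> 1, "PlaceOverall" -> 42|>|>,
    "Properties" -> <|
      "BibNumber" -> <|"Label" -> "bib number"|>,
      "PlaceOverall" -> <|"Label" -> "place overall"|>|>|>];

(* Register the store for this session, then query it like any built-in entity. *)
EntityRegister[toyStore];
Entity["ToyMarathon", "Bib1"]["PlaceOverall"]
```

Once registered, the custom type works with EntityValue and related functions in the same way as the built-in entity types.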
Star in Your Own Movie
In addition to creating the entity store, the split times also give us an estimate of a runner’s position along the course as the race progresses. Thus we know the distribution of all runners throughout the race course. We took this information and plotted the runner density for each minute of an eight-hour race, and combined the frames into a movie.
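Here is a rough sketch of how split times can be turned into a position estimate. It assumes a constant pace between consecutive timing mats, which is a simplification, and the split times below are invented for a hypothetical runner; only the split distances come from the race setup.

```wolfram
(* Split distances in miles: start, 10 km, 10 mi, half, 20 mi, finish. *)
splitMiles = {0, 6.2137, 10, 13.1, 20, 26.2};

(* Hypothetical net times in minutes at those splits. *)
splitMinutes = {0, 55.9, 90.5, 119.0, 184.3, 245.1};

(* Piecewise-linear interpolation: constant pace between splits (an assumption). *)
position = Interpolation[Transpose[{splitMinutes, splitMiles}], InterpolationOrder -> 1];

(* Estimated distance covered one hour into the race. *)
position[60]
```

Evaluating a function like this for every runner at a given minute gives the distribution of the field along the course at that moment.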
It would be interesting to see how a single runner compares to the entire field. Obviously, we don’t want to make a separate movie for each of the 1,000+ runners, let alone 500,000 movies for all possible pairs of runners. Instead, we utilized the fact that each runner follows a two-dimensional path in the viewing plane perpendicular to the line going from the viewpoint to the center of the map. We calculated these 2D runner paths and superimposed them over the original movie frames. Since the frames are all Graphics3D expressions in the Wolfram Language before they are exported, this worked like a charm. We created the one movie to run them all.
Now we need to make the data available to the general public in an easily accessible way. An obvious choice is the Wolfram Cloud. The entity store, the runner position data and the density movie are easily stored in our cloud. And with some magic from my terrific coworkers, we were able to combine it all into this amazing microsite.
By default, the movie is shown. When a user submits a specific bib number, the movie is overlaid with that runner’s individual information. Additionally, the site pulls up all of the information stored about this specific runner and their division.
More information about the development of Wolfram microsites can be found here.
Ask Wolfram|Alpha
Besides the microsite, there are many interesting computations that can be performed around the concept of a marathon. I have explored a few of these below.
To give you an idea of the size of the event, let’s look at a few random numbers associated with the marathon weekend. Luckily, Wolfram|Alpha has something to say about all of these.
One thousand two hundred seventeen runners finished the full marathon in 2017. This equals a total of 31,885.4 miles, which is comparable to 2.4 times the length of the Great Wall of China, or the length of 490,000 soccer fields.
WolframAlpha["31885.4 miles", {{"ComparisonAsLength", 1}, "Content"}]
WolframAlpha["how many soccer fields stretch 31885.4 miles", {{"ApproximateResult", 1}, "Content"}]
The marathon would literally never have happened had it not been for Walter Hunt inventing the safety pin back in 1849. About 80,000 of them were used during the weekend to keep bib numbers in place.
WolframAlpha["safety pin", {{"BasicInformation:InventionData", 1}, "Content"}]
The runners ate 1,600 oranges and 15,000 bananas, and drank 9,600 gallons of water and 1,920 gallons of Gatorade along the race course. Wolfram|Alpha will tell you that 1,600 oranges are enough to fill two bathtubs:
WolframAlpha["How many oranges fit in a bathtub?", {{"ApproximateResult", 1}, "Content"}]
… and contain an astounding 20 kilograms of sugar:
WolframAlpha["sugar in 1,600 oranges", {{"Result", 1}, "Content"}]
And trust me: 20 miles into the race while questioning all your life choices, a sweet orange slice will fix any problem. But let’s get to the finish line: here the runners finished another 800 pounds of pasta, 1,100 pizzas and another 32,600 bottles of water. The pasta and pizza provided a combined 1.8×10⁶ dietary calories:
WolframAlpha["calories in 800 lbs of pasta and 1100 pizzas", {{"Result", 1}, "Content"}]
But we are not done yet. The theme of the 2017 Illinois Marathon was the 150th birthday of the University of Illinois. Ever tried to pronounce “sesquicentennial”? Going above and beyond, the race administration decided to provide the runners with 70 birthday sheet cakes—each 18×24 inches. Thanks to the folks working at the Meijer bakery, we came to find out that each such cake contains 21,340 calories, totaling close to 1.5 million calories!
Table[WolframAlpha["70*21340 food calories", {{"Comparison", j}, "Content"}], {j, 2}] // Column
Remember the 15,000 bananas I mentioned just a few moments ago? Turns out that their calorie count is comparable to that of the sheet cakes. That might make for a difficult discussion with a child about whether “to sheet cake” or “to banana.”
WolframAlpha["calories in 15,000 bananas", {{"Result", 1}, "Content"}]
What can one do with all those calories? You did just participate in a race, and should be able to splurge a bit on food. Consider a male person weighing 159 pounds running a marathon distance at a nine-minutes-per-mile pace. He burns roughly 3,300 calories.
WolframAlpha["Calories burned running at pace 9 min/mi for 26.2 miles", IncludePods -> "MetabolicProperties", AppearanceElements -> {"Pods"}]
Though not recommended, you could have 32 guilt-free beers that are typically offered after a marathon race, or 17 servings of 2×2-inch pieces of sheet cake.
✕
CALORIES PER BEER |
(* free-form input: 3339 food calories/(21340 food calories/108) *)
N[Quantity[3339, "LargeCalories"]/(Quantity[21340, "LargeCalories"]/108)]
Did I mention weather? Weather in Champaign is an unwelcome participant: one who does not pay a race fee, is constantly in everyone’s way, makes up its mind last-minute, does what it wants and unleashes full force. Though 2017 turned out fine, let’s look at WeatherData for the 2016 and 2015 race weekends.
Last year, the rain set in with the start of the race, lasted through the entire event and left town when the race was over. I was drenched before even crossing the starting line.
Table[WolframAlpha["Weather Champaign 4/30/2016", {{"WeatherCharts:WeatherData", k}, "Content"}], {k, 2, 3}] // Column
But that wasn’t the worst we had seen: in 2015, a thunderstorm descended on this town while the race was ongoing. Thus, the Illinois Marathon is one of the few marathons that actually had to get canceled mid-race.
As I mentioned at the very beginning, the runners here at Wolfram Research are a tough crowd, and weather won’t deter us. If you feel inspired and would like to see yourself in a future version of the Marathon Viewer, this is the place to start: Illinois Marathon registration.
If you’d like to work with the code you read here today, you can download this post as a Wolfram Notebook.
Is it possible to use the Marathon Viewer in other countries?
The marathon viewer was custom created to analyze a given dataset. It is event-specific, not location-dependent. So, generally speaking, we could work with any race administration around the world and create a similar site based on other datasets.