Dates Everywhere in Pi(e)! Some Statistical and Numerological Musings about the Occurrences of Dates in the Digits of Pi
In a recent blog post, Stephen Wolfram discussed the unique position of this year’s Pi Day of the Century and gave various examples of the occurrences of dates in the (decimal) digits of pi. In this post, I’ll look at the statistics of the distribution of all possible dates/birthdays from the last 100 years within the (first ten million decimal) digits of pi. We will find that 99.98% of all digits occur in a date, and that one finds millions of dates within the first ten million digits of pi.
Here I will concentrate on dates that can be described with a maximum of six digits. This means I’ll be able to uniquely encode all dates between Saturday, March 14, 2015, and Monday, March 15, 1915, a time range of 36,525 days.
We start with a graphical visualization of the topic at hand to set the mood.
Get all dates for the last 100 years
This year’s Pi Day was, like every year, on March 14.
Since the centurial Pi Day of the twentieth century, 36,525 days had passed.
We generate a list of all the 36,525 dates under consideration.
For later use, I define a function dateNumber that for a given date returns the sequential number of the date, with the first date, Mar 15 1915, numbered 1.
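The post's own code is written in the Wolfram Language and is not shown here. As a rough Python sketch of the same idea, the date list and a dateNumber-style helper might look as follows; the names `all_dates` and `date_number` are illustrative assumptions, not the author's definitions.

```python
from datetime import date, timedelta

FIRST_DAY = date(1915, 3, 15)   # first date under consideration
LAST_DAY = date(2015, 3, 14)    # Pi Day 2015

def all_dates():
    """Return every date from FIRST_DAY to LAST_DAY, inclusive."""
    n_days = (LAST_DAY - FIRST_DAY).days + 1
    return [FIRST_DAY + timedelta(days=i) for i in range(n_days)]

def date_number(d):
    """Sequential number of a date, with FIRST_DAY numbered 1."""
    return (d - FIRST_DAY).days + 1

dates = all_dates()
print(len(dates))               # 36525
print(date_number(LAST_DAY))    # 36525
```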
I allow the months January to September to be written without a leading zero—9 instead of 09 for September—and similarly for days. So, for some dates, multiple digit sequences represent them. The function makeDateTuples generates all tuples of single-digit integers that represent a date. One could use slightly different conventions and minimal changes of the code and always enforce short dates or always enforce zeros. With the optional inclusion of zeros for days and months, I get more possible matches and a richer result, so I will use these in the following. (And, if you prefer a European date format of day-month-year, then some larger adjustments have to be made to the definition of makeDateTuples.)
Some examples with four, two, and one representation:
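A minimal Python sketch of a makeDateTuples-style function, assuming the month-day-year order and optional leading zeros described above. The helper name `forms` is hypothetical, and the author's Wolfram Language implementation may differ in detail.

```python
from datetime import date
from itertools import product

def make_date_tuples(d):
    """All digit tuples (month, day, two-digit year) that represent d.

    Single-digit months and days may be written with or without a
    leading zero, so a date has one, two, or four representations.
    """
    def forms(n):
        # e.g. 9 -> ["9", "09"], 12 -> ["12"]
        return [str(n), f"{n:02d}"] if n < 10 else [str(n)]

    year = f"{d.year % 100:02d}"
    return [tuple(int(c) for c in m + dd + year)
            for m, dd in product(forms(d.month), forms(d.day))]

print(make_date_tuples(date(1945, 1, 1)))    # four representations
print(make_date_tuples(date(1945, 10, 5)))   # two representations
print(make_date_tuples(date(1960, 11, 23)))  # one representation
```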
The next plot shows which days from the last year are representable with four, five, and six digits. The first nine days of the months January to September just need four or five digits to be represented, and the last days of October, November, and December need six.
For a fast (constant time), repeated recognition of a tuple as a date, I define two functions dateQ and dateOf. dateOf gives a normalized form of a date digit sequence. We start with generating pairs of tuples and their date interpretations.
Here are some examples.
Most (77,350) tuples can be uniquely interpreted as dates; some (2,700) have two possible date interpretations.
Here are some of the digit sequences with two date interpretations.
Here are the two date interpretations of the sequence {1,2,1,5,4} as Jan 21 1954 or as Dec 1 1954 recovered by using the function datesOf.
These are the counts for the four-, five-, and six-digit representations of dates.
And these are the counts for the number of definitions set up for the function datesOf.
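The lookup described above can be sketched in Python with a dictionary, which gives the constant-time recognition the text asks for. The names `dates_of` and `is_date` are stand-ins for the post's datesOf and dateQ, and digit strings are used here instead of tuples purely for brevity.

```python
from collections import defaultdict
from datetime import date, timedelta

def digit_strings(d):
    """Digit strings for d, with optional leading zeros on month and day."""
    def forms(n):
        return [str(n), f"{n:02d}"] if n < 10 else [str(n)]
    yy = f"{d.year % 100:02d}"
    return [m + dd + yy for m in forms(d.month) for dd in forms(d.day)]

first = date(1915, 3, 15)
dates = [first + timedelta(days=i) for i in range(36525)]

# Map each digit string to every date it can stand for.
dates_of = defaultdict(list)
for d in dates:
    for s in digit_strings(d):
        dates_of[s].append(d)

def is_date(s):
    """Constant-time test: does the digit string s encode a date?"""
    return s in dates_of

unique = sum(1 for v in dates_of.values() if len(v) == 1)
double = sum(1 for v in dates_of.values() if len(v) == 2)
print(unique, double)        # 77350 2700
print(dates_of["12154"])     # Jan 21 1954 and Dec 1 1954
```

The ambiguous strings all have five digits: a leading 1 can be read either as January followed by a two-digit day or as the first digit of October, November, or December.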
Find all dates in the digits of pi
For all further calculations, I will use the first ten million decimal digits of pi (later I will see that ten million digits are enough to find any date). We allow for an easy substitution of pi by another constant.
Instead of using the full digit sequence as a string, I will use the digit sequence split into (overlapping) tuples. Then I can independently and quickly operate on each tuple. And I index each tuple with the index representing the digit number. For example:
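As an illustration in Python, the indexed overlapping tuples could be produced like this. Counting the leading 3 as position 1 is an assumption of this sketch, and `indexed_tuples` is a hypothetical name.

```python
def indexed_tuples(digits, k):
    """Overlapping k-tuples of a digit string, each paired with the
    1-based position of its first digit."""
    return [(i + 1, tuple(int(c) for c in digits[i:i + k]))
            for i in range(len(digits) - k + 1)]

# First 20 digits of pi, counting the leading 3 as position 1
# (an assumed indexing convention).
PI_DIGITS = "31415926535897932384"

for pos, tup in indexed_tuples(PI_DIGITS, 4)[:3]:
    print(pos, tup)
# 1 (3, 1, 4, 1)
# 2 (1, 4, 1, 5)
# 3 (4, 1, 5, 9)
```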
Using the above-defined dateQ and dateOf functions, I can now quickly find all digit sequences that have a date interpretation.
Here are some of the date interpretations found. Each sublist is of the form {date, startingDigit, digitSequenceRepresentingTheDate}.
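Putting the pieces together, a self-contained Python sketch of the search might look as follows. It scans only the first 51 digits of pi (hard-coded, with the leading 3 at position 1, which is an assumed convention) and returns triples in the same {date, startingDigit, digitSequence} shape; `build_lookup` and `find_dates` are hypothetical names.

```python
from collections import defaultdict
from datetime import date, timedelta

def build_lookup():
    """Map every date digit string (lengths 4 to 6) to its dates."""
    def forms(n):
        return [str(n), f"{n:02d}"] if n < 10 else [str(n)]
    lookup = defaultdict(list)
    first = date(1915, 3, 15)
    for i in range(36525):
        d = first + timedelta(days=i)
        yy = f"{d.year % 100:02d}"
        for m in forms(d.month):
            for dd in forms(d.day):
                lookup[m + dd + yy].append(d)
    return lookup

def find_dates(digits, lookup):
    """Return (date, start_position, digit_string) triples, 1-based."""
    found = []
    for k in (4, 5, 6):
        for i in range(len(digits) - k + 1):
            chunk = digits[i:i + k]
            # .get avoids creating empty entries in the defaultdict
            for d in lookup.get(chunk, ()):
                found.append((d, i + 1, chunk))
    return found

# First 51 digits of pi including the leading 3 (hard-coded for the sketch).
PI_DIGITS = "314159265358979323846264338327950288419716939937510"
hits = find_dates(PI_DIGITS, build_lookup())
print(len(hits))
print(sorted(hits, key=lambda t: t[1])[:5])
```

Each position costs three dictionary lookups, so the scan is linear in the number of digits regardless of how many date strings exist.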
We have found about 8.1 million dates represented as four digits, about 3.8 million dates as five digits, and about 365,000 dates represented as six digits, totaling more than 12 million dates altogether.
Note that I could have used string-processing functions (especially StringPosition) to find the positions of the date sequences. And, of course, I would have obtained the same result.
While the use of StringPosition is a good approach to deal with a single date, dealing with all of the roughly 80,000 date digit sequences would have taken much longer.
We pause a moment and have a look at the counts found for the 4-tuples. Out of the 10,000 possible 4-tuples, the 8,100 used appear each on average (1/10)⁴·10⁷ = 10³ times based on the randomness of the digits of pi. And approximately, I expect a standard deviation of about 1000^½ ≈ 31.6. Some quick calculations and a plot confirm these numbers.
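The estimate can be checked with a few lines of Python. This sketch treats the count of a fixed 4-tuple as binomial, which is only an approximation because overlapping windows are not strictly independent.

```python
import math

n_digits = 10**7   # digits searched
p = 10**-4         # chance that a given position starts a given 4-tuple

mean = n_digits * p
std = math.sqrt(n_digits * p * (1 - p))   # binomial approximation

print(round(mean), round(std, 1))   # 1000 31.6
```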
The histogram of the counts shows the expected bell curve.
And the following graphic shows how often each of the 4-tuples that represent dates were found in the ten million decimal digits. We enumerate the 4-tuples by concatenating the digits; as a result, I see “empty” vertical stripes in the region where no 4-tuples are represented by dates.
Now I continue to process the found date positions. We group the results into sublists of identical dates.
Every date does indeed occur in the first 10 million digits, meaning I have 36,525 different dates found. (We will see later that I did not calculate many more digits than needed.)
Here is what a typical member of dateGroups looks like.
Statistics of all dates
Now let’s do some statistics on the found dates. Here is the number of occurrences of each date in the first ten million digits of pi. Interestingly, and in the first moment maybe unexpectedly, many dates appear hundreds of times. The periodically occurring vertical stripes result from the October-November-December month quarters.
The mean spacing between the occurrences also clearly shows the early occurrence of four-digit dates with average spacings below 10,000, the five-digit dates with spacings around 100,000, and the six-digit dates with spacings around one million.
For easier readability, I format the triples {date, startingPosition, dateDigitSequence} in a customized manner.
The most frequent date in the first ten million digits of pi is Aug 6 1939—it occurs 1,362 times.
Now let’s find the least occurring dates in the first ten million digits of pi. These three dates occur only once in the first ten million digits.
And all of these dates occur only twice in the first ten million digits.
Here is the distribution of the number of the date occurrences. The three peaks corresponding to the six-, five-, and four-digit date representations (from left to right) are clearly distinct. The dates that can be represented only by 6-tuples each occur just a few times, while, as I have already seen above, the dates that have a four-digit representation appear on average about 1,200 times.
We can also accumulate by year and display the date interpretations per year. (The smaller values at the beginning and end come from the truncation of the dates to ensure uniqueness.) The distribution is nearly uniform.
Let’s have a look at the dates with some “neat” date sequences and how often they occur. As the results in dateGroups are sorted by date, I can easily access a given date. When does the date 11-11-11 occur?
And where does the date 1-23-45 occur?
No date starts on its “own” position (meaning there is no example such as January 1, 1945 [1-1-4-5] in position 1145).
But one palindromic case exists: March 3, 1985 (3.3.8.5), which occurs at palindromic position 5833.
A very special date is January 9, 1936: 1.9.3.6 appears at the position of the 1,936th prime, 16,747.
Let’s see what anniversaries happened on this day in history.
While no date appeared at its “own” position, if I slightly relax this condition and search for all dates that overlap with the positions of their own digits, I do find some dates.
And at more than 100 positions within the first ten million digits of pi, I find the famous pi starting sequence 3,1,4,1,5 again.
Within the digits of pi I do not just find birthday dates, but also physical constant days, such as the ħ-day (the reduced Planck constant day), which was celebrated as the centurial instance on October 5, 1945.
Here are the positions of the matching date sequences.
And here is an attempt to visualize the appearance of all dates. In the date-digit plane, I place a point at the beginning of each date interpretation. We use a logarithmic scale for the digit position, and as a result, the number of points is much larger in the upper part of the graphic.
For the dates that appear early on in the digit sequence, the finite extension of the date over the digits can be visualized too. A date extends over four to six digits in the digit sequence. The next graphic shows all digits of all dates that start within the first 10,000 digits.
After coarse-graining, the distribution is quite uniform.
So far I have taken a date and looked at where this date starts in the digit sequence of pi. Now let’s look from the reverse direction: how many dates intersect at a given digit of pi? To find the total counts of dates for each digit, I loop over the dates and accumulate the counts for each digit.
A maximum of 20 dates occur at a given digit.
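One way to do this accumulation, sketched in Python: instead of incrementing every covered digit for every date, this version uses a difference array, a swapped-in technique that keeps the work linear. The function name and the (start, length) input format are assumptions of the sketch.

```python
def coverage(n_digits, occurrences):
    """Number of date interpretations covering each digit position.

    occurrences holds (start, length) pairs with 1-based starts.
    A difference array keeps the work linear in the input size.
    """
    diff = [0] * (n_digits + 2)
    for start, length in occurrences:
        diff[start] += 1            # coverage begins here
        diff[start + length] -= 1   # and ends just before this index
    counts, running = [], 0
    for pos in range(1, n_digits + 1):
        running += diff[pos]
        counts.append(running)
    return counts

# Three dates covering digits 1-4, 1-5, and 3-6 of an 8-digit sequence.
print(coverage(8, [(1, 4), (1, 5), (3, 4)]))   # [2, 2, 3, 3, 2, 1, 0, 0]
```

The maximum of the returned list is the largest number of dates meeting at one digit, and the zeros mark the unused digits.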
Here are two intervals of 200 digits each. We see that most digits are used in a date interpretation.
Above, I noted that I have about 12 million dates in the digit sequence under consideration. The digit sequence that I used was only ten million digits long, and each date needs about five digits. This means the dates need about 60 million digits. It follows that many of the ten million digits must be shared and used on average about six times. Only 2,005 out of the first ten million digits are not used in any of the date interpretations, meaning that 99.98% of all digits are used for date interpretations (not all as starting positions).
And here is the histogram of the distribution of the number of dates present at a certain digit. The back-of-the-envelope number of an average of six dates per digit is clearly visible.
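The back-of-the-envelope numbers can be reproduced directly. All inputs in this Python sketch are the rounded figures quoted in the text.

```python
n_dates = 12_000_000      # approximate number of date interpretations found
digits_per_date = 5       # rough average length of a date
n_digits = 10_000_000     # digits of pi searched
unused = 2_005            # digits not covered by any date

uses_per_digit = n_dates * digits_per_date / n_digits
used_percent = 100 * (1 - unused / n_digits)

print(uses_per_digit)            # 6.0
print(round(used_percent, 2))    # 99.98
```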
The 2,005 positions that are not used are approximately uniformly distributed among the first ten million digits.
If I display the concrete positions of the non-used digits versus their expected average position, I obtain a random walk–like graph.
So, what are the neighboring digits around the unused digits? One hundred sixty-two different five-neighborhoods exist. Looking at them immediately shows why the center digits cannot be part of a date: too many sequences of zeros before, at, or after.
And the largest unused block of digits is the run of six digits from position 8,127,088 to 8,127,093.
At a given digit, dates from various years overlap. The next graphic shows the range from the earliest to the latest year as a function of the digit position.
These are the unused digits together with three left- and three right-neighboring digits.
Because the high coverage may seem unexpected at first, I select a random digit position and select all dates that use this digit.
And here is a visualization of the overlap of the dates.
The most-used digit is the 1 at position 2,645,274: 20 possible date interpretations meet at it.
Here are the digits in its neighborhood and the possible date interpretations.
If I plot the years starting at a given digit for a larger amount of digits (say the first 10,000), then I see the relatively dense coverage of date interpretations in the digits-date plane.
Let’s now build a graph of dates that are “connected”. We’ll consider two dates connected if the two dates share a certain digit of the digit sequence (not necessarily the starting digit of a date).
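A small Python sketch of this connection rule, using plain adjacency sets rather than a graph library. The input format (label, start, length) and the function name are assumptions; the four sample dates are read off the first eleven digits of pi, 3 1 4 1 5 9 2 6 5 3 5, with the leading 3 at position 1.

```python
from collections import defaultdict
from itertools import combinations

def connection_graph(occurrences):
    """Adjacency sets linking dates whose digit ranges share a digit.

    occurrences holds (label, start, length) triples, 1-based starts.
    """
    at_digit = defaultdict(set)
    for label, start, length in occurrences:
        for pos in range(start, start + length):
            at_digit[pos].add(label)
    graph = defaultdict(set)
    for labels in at_digit.values():
        for a, b in combinations(sorted(labels), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

sample = [
    ("3-1-41", 1, 4),    # digits 1-4
    ("3-14-15", 1, 5),   # digits 1-5
    ("5-9-26", 5, 4),    # digits 5-8
    ("6-5-35", 8, 4),    # digits 8-11
]
graph = connection_graph(sample)
print(sorted(graph["3-14-15"]))   # ['3-1-41', '5-9-26']
```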
Here is the same graph for the first 600 digits with communities emphasized.
We continue with calculating the mean distance between two occurrences of the same date.
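For a single date, the mean distance between consecutive occurrences reduces to the span between the first and last occurrence divided by the number of gaps. A minimal Python sketch, with made-up positions used only as sample input:

```python
def mean_spacing(positions):
    """Mean distance between consecutive occurrences of one date.

    Returns None when the date occurs fewer than two times.
    """
    ps = sorted(positions)
    if len(ps) < 2:
        return None
    # The consecutive gaps telescope to last minus first.
    return (ps[-1] - ps[0]) / (len(ps) - 1)

print(mean_spacing([10, 40, 100]))   # 45.0
print(mean_spacing([7]))             # None
```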
The first occurrences of dates
The first occurrences of dates are the most interesting, so let’s extract these. We will work with two versions, one sorted by the date (the list firstOccurrences) they represent, and one sorted by the starting position (the list firstOccurrencesSortedByOccurrence) in the digits of pi.
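As a Python sketch, both lists can be derived from the found triples in one pass. The function name and the sample data are hypothetical; the triples follow the (date, startingPosition, digitSequence) shape used earlier.

```python
from datetime import date

def first_occurrences(hits):
    """Earliest (date, position, digits) triple for each date,
    returned once sorted by date and once sorted by position."""
    best = {}
    for d, pos, digits in hits:
        if d not in best or pos < best[d][1]:
            best[d] = (d, pos, digits)
    by_date = sorted(best.values(), key=lambda t: t[0])
    by_position = sorted(best.values(), key=lambda t: t[1])
    return by_date, by_position

# Made-up sample input, for illustration only.
sample = [
    (date(1941, 3, 1), 1, "3141"),
    (date(1926, 5, 9), 5, "5926"),
    (date(1941, 3, 1), 900, "030141"),
]
by_date, by_position = first_occurrences(sample)
print(by_date[0])       # earliest date: May 9 1926 at position 5
print(by_position[0])   # earliest position: Mar 1 1941 at position 1
```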
Here are the possible date interpretations that start within the first 10 digits of pi.
And here are the other extremes: the dates that appear deepest into the digit expansion.
We see that Wed Nov 23 1960 starts only at position 9,982,546 (= 2 × 7 × 713,039), so by starting with the first ten million digits, I was a bit lucky to catch it. Here is a quick direct check of this record-setting date.
So, who are the lucky (well-known) people associated with this number through their birthday?
And what were the Moon phases on the top dozen out-pi-landish dates?
And while Wed Nov 23 1960 is furthest out in the decimal digit sequence, the last prime date in the list is Oct 22 1995.
In general, less than 10% of all first date appearances are prime.
Often one maps the digits of pi to a direction in the plane and forms a random walk. We do the same based on the date differences between consecutive first appearances of dates. We obtain typical-looking 2D random walk images.
Here are the first-occurring date positions for the last few years. The bursts in October, November, and December of each year are caused by the need for five or six consecutive digits, while January to September can be encoded with fewer digits if I skip the optional zeros.
If I include all dates, I get, of course, a much denser filled graphic.
A logarithmic vertical axis shows that most dates occur between the thousandth and millionth digits.
To get a more intuitive understanding of overall uniformity and local randomness in the digit sequence (and as a result in the dates), I make a Voronoi tessellation of the day-digit plane based on points at the first occurrence of a date. The decreasing density for increasing digits results from the fact that I only take first-date occurrences into account.
Easter Sunday positions are a good date to visualize, as the date varies over the years.
The mean first occurrence depends, of course, on the number of digits needed to encode a date.
The mean occurrence is at 239,083, but due to the outliers at a few million digits, the standard deviation is much larger.
Here are the first occurrences of the “nice” dates that are formed by repetition of a single digit.
The detailed distribution of the number of occurrences of first dates has the largest density within the first few 10,000 digits.
A logarithmic axis shows the distribution much better, but because of the increasing bin sizes, the maximum has to be interpreted with care.
The last distribution is mostly a weighted superposition of the first occurrences of four-, five-, and six-digit sequences.
And here is the cumulative distribution of the dates as a function of the digits’ positions. We see that the first 1% of the ten million digits covers already 60% of all dates.
Slightly more dates start at even positions than at odd positions.
We could do the same with mod 3, mod 4, … . The left image shows the deviation of each congruence class from its average value, and the right image shows the higher congruences, all considered again mod 2.
The actual number of first occurrences per year fluctuates around the mean value.
The mean number of first-date interpretations sorted by month clearly shows the difference between the one-digit months and the two-digit months.
The mean number by day of the month (ranging from 1 to 31) is, on average, a slowly increasing function.
Finally, here are the mean occurrences by weekday. Most first date occurrences happen for dates that are Wednesdays.
Above I observed that most digits participate in a possible date interpretation. Only relatively few digits participate in a first-occurring date interpretation: 121,470.
Some of the position sequences overlap anyway, and I can form network chains of the dates with overlapping digit sequences.
The next graphic shows the increasing gap sizes between consecutive dates.
Distribution of the gap sizes:
Here are pairs of consecutively occurring date-interpretations that have the largest gap between them. The larger gaps were clearly visible in the penultimate graphic.
Dates in other expansions and in other constants
Now, the very special dates are the ones where the concatenated continued fraction (cf) expansion position agrees with the decimal expansion position. By concatenated continued fraction expansion, I mean the digits on the left at each level of the following continued fraction.
This gives the following cf-pi string:
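The original output is not reproduced here. As a rough Python sketch, the leading continued fraction terms of pi can be computed exactly from a 50-decimal rational approximation and then concatenated. Whether the author's string includes the leading 3 is an assumption of this sketch.

```python
from fractions import Fraction

# pi to 50 decimal places as an exact rational; this is accurate
# enough for the first ten continued fraction terms.
PI_50 = Fraction(
    int("314159265358979323846264338327950288419716939937510"), 10**50)

def continued_fraction(x, n_terms):
    """First n_terms continued fraction terms of a positive Fraction."""
    terms = []
    for _ in range(n_terms):
        a = x.numerator // x.denominator   # integer part
        terms.append(a)
        x -= a
        if x == 0:
            break
        x = 1 / x                          # continue with the reciprocal
    return terms

terms = continued_fraction(PI_50, 10)
cf_string = "".join(str(a) for a in terms)
print(terms)       # [3, 7, 15, 1, 292, 1, 1, 1, 2, 1]
print(cf_string)   # 3715129211121
```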
And, interestingly, there is just one such date.
None of the calculations carried out so far were special to the digits in pi. The digits of any other irrational numbers (or even sufficiently long rational numbers) contain date interpretations. Running some overnight searches, it is straightforward to find many numeric expressions that contain the dates of this year (2015). Here they are put together in an interactive demonstration.
We now come to the end of our musings. As a last example, let’s interpret digit positions as seconds after this year’s pi-time at March 14 9:26:53. How long would I have to wait until seeing the digit sequence 3·1·4·1·5 in the decimal expansion of other constants? Can one find a (small) expression such that 3·1·4·1·5 does not occur in the first million digits? (The majority of the elements of the following list ξs are just directly written down random expressions; the last elements were found in a search for expressions that have the digit sequence 3·1·4·1·5 as far out as possible.)
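A minimal Python sketch of such a search, using the square root of 2 as a stand-in constant (my choice for illustration, not one of the post's expressions ξs). It keeps to 4,000 decimals so that the integer-to-string conversion stays below Python's default digit limit; with so few digits the pattern may well be absent, in which case the function returns None.

```python
import math

def sqrt_digits(n, n_decimals):
    """Decimal digits of sqrt(n) after the decimal point, as a string."""
    scaled = math.isqrt(n * 10 ** (2 * n_decimals))   # floor(sqrt(n) * 10^k)
    whole = str(math.isqrt(n))                        # integer part
    return str(scaled)[len(whole):]

def first_position(digits, pattern):
    """1-based position of the first match, or None if absent."""
    i = digits.find(pattern)
    return None if i < 0 else i + 1

digits = sqrt_digits(2, 4000)
print(digits[:10])                        # 4142135623
print(first_position(digits, "31415"))    # waiting time in seconds, or None
```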
Here are two rational numbers whose decimal expansions contain the digit sequence:
And here are two integers with the starting digit sequence of pi.
Using the neat new function TimelinePlot that Brett Champion described in his last blog post, I can easily show how long I would have to wait.
We encourage readers to explore the dates in the digits of pi more, or replace pi with another constant (for instance, Euler’s constant E, to justify the title of this post), and maybe even 10 by another base. The overall, qualitative structures will be the same for almost all irrational numbers. (For a change, try ChampernowneNumber[10].) Will ten million digits be enough to find every date in, say, E? (Where is October 21, 2014?) Which special dates are hidden in other constants? These and many more things are left to explore.
Great post!
The alternative is not a “European date format” but rather a “rest-of-the-world date format”. Only the U.S. uses out-of-scale (i.e. day and month transposed) date sequences. For an international product you might take more care with this in future.
I only VERY quickly scanned this whole article, focusing on a very few random areas of it. It’s really interesting, and want to read it more carefully later! Oddly, one of the FEW areas I focused on had one possible typing error – unless I mis-understood something in that place in the article. It’s where this sentence appears : “And at more than 100 positions within the first ten million digits of pi, I find the famous pi starting sequence 3,1,4,5,9 again.” Isn’t a ‘1’ skipped in the 4th position of pi in the number sequence toward the end of the sentence? Or is there some connection with the rest of the article I’m not making by not reading the whole thing?
Thanks for pointing this out, we have corrected the typo.
You’re welcome! haha – Was so lucky to see that, as I only focused on very few areas of the article while very quickly scanning the rest!