Wolfram Blog
Michael Trott

Dates Everywhere in Pi(e)! Some Statistical and Numerological Musings about the Occurrences of Dates in the Digits of Pi

June 23, 2015 — Michael Trott, Chief Scientist

In a recent blog post, Stephen Wolfram discussed the unique position of this year’s Pi Day of the Century and gave various examples of the occurrences of dates in the (decimal) digits of pi. In this post, I’ll look at the statistics of the distribution of all possible dates/birthdays from the last 100 years within the (first ten million decimal) digits of pi. We will find that 99.98% of all digits occur in a date, and that one finds millions of dates within the first ten million digits of pi.

Here I will concentrate on dates that can be described with a maximum of six digits. This means I’ll be able to uniquely encode all dates between Saturday, March 14, 2015, and Monday, March 15, 1915, a time range of 36,525 days.

We start with a graphical visualization of the topic at hand to set the mood.

Graphic visualization of pi

Get all dates for the last 100 years

This year’s Pi Day was, like every year, on March 14.

This year's pi day

Since the centurial Pi Day of the twentieth century, 36,525 days have passed.

Number of days between centurial pi days
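The day count can be double-checked with a few lines of Python. This is only a sketch in Python rather than the Wolfram Language code used in the post, and the variable names are chosen here for illustration.

```python
from datetime import date

# Pi Day of this century and the centurial Pi Day of the previous one.
pi_day_2015 = date(2015, 3, 14)
pi_day_1915 = date(1915, 3, 14)

# 100 years with 25 leap days (1916, 1920, ..., 2012).
days_between = (pi_day_2015 - pi_day_1915).days
print(days_between)  # 36525
```

The same number follows from 100*365 + 25, since 1900 is not part of the range and every fourth year from 1916 to 2012 is a leap year.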

We generate a list of all the 36,525 dates under consideration.

List of dates under consideration

For later use, I define a function dateNumber that for a given date returns the sequential number of the date, with the first date, Mar 15 1915, numbered 1.

Defining function dateNumber
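As a rough Python counterpart of dateNumber (a sketch, not the post's Wolfram Language definition; the name date_number and the range check are assumptions made here), the sequential number is just the day difference to the first date plus one.

```python
from datetime import date

FIRST_DATE = date(1915, 3, 15)   # numbered 1
LAST_DATE = date(2015, 3, 14)    # numbered 36525

def date_number(d):
    """Return the 1-based sequential number of d within the 100-year range."""
    if not FIRST_DATE <= d <= LAST_DATE:
        raise ValueError("date outside the range under consideration")
    return (d - FIRST_DATE).days + 1

print(date_number(FIRST_DATE), date_number(LAST_DATE))  # 1 36525
```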

I allow the months January to September to be written without a leading zero—9 instead of 09 for September—and similarly for days. So, for some dates, multiple digit sequences represent them. The function makeDateTuples generates all tuples of single-digit integers that represent a date. One could use slightly different conventions and minimal changes of the code and always enforce short dates or always enforce zeros. With the optional inclusion of zeros for days and months, I get more possible matches and a richer result, so I will use these in the following. (And, if you prefer a European date format of day-month-year, then some larger adjustments have to be made to the definition of makeDateTuples.)

Using makeDateTuples to generate tuples
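The convention described above can be sketched in Python as follows. This is an illustrative stand-in for makeDateTuples, not the original code; the helper name forms is hypothetical, and the month-day-year order with a two-digit year is taken from the text.

```python
from datetime import date
from itertools import product

def make_date_tuples(d):
    """All digit tuples (month, day, two-digit year) that represent the date d.

    Months and days below 10 may be written with or without a leading zero.
    """
    def forms(n):
        digits = [int(c) for c in str(n)]
        return [digits, [0] + digits] if n < 10 else [digits]

    year = [int(c) for c in f"{d.year % 100:02d}"]
    return [tuple(m + dd + year)
            for m, dd in product(forms(d.month), forms(d.day))]

print(make_date_tuples(date(1985, 3, 3)))    # four representations
print(make_date_tuples(date(1954, 1, 21)))   # two representations
print(make_date_tuples(date(1960, 11, 23)))  # one representation
```

Always enforcing short dates or always enforcing zeros would amount to letting forms return only one of its two variants.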

Some examples with four, two, and one representation:

Examples of tuples with four, two, and one representation

The next plot shows which days from the last year are representable with four, five, and six digits. The first nine days of the months January to September just need four or five digits to be represented, and the last days of October, November, and December need six.

Which days from last year are representable with four, five, and six digits

For a fast (constant time), repeated recognition of a tuple as a date, I define two functions dateQ and datesOf. datesOf gives a normalized form of a date digit sequence. We start with generating pairs of tuples and their date interpretations.

Generating pairs of tuples and their date interpretations

Here are some examples.

RandomSample of tuplesAndDates

Most (77,350) tuples can be uniquely interpreted as dates; some (2,700) have two possible date interpretations.

Tuples interpreted as dates
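The counts of unique and ambiguous digit sequences can be reproduced with a dictionary lookup. The following Python sketch is an assumption-laden stand-in for the post's datesOf definitions: it uses digit strings instead of digit tuples as keys, and the names date_strings and dates_of are chosen here.

```python
from collections import defaultdict
from datetime import date, timedelta
from itertools import product

FIRST_DATE, LAST_DATE = date(1915, 3, 15), date(2015, 3, 14)

def date_strings(d):
    """Digit strings for d, with optional leading zeros for month and day."""
    def forms(n):
        return [str(n), f"0{n}"] if n < 10 else [str(n)]
    yy = f"{d.year % 100:02d}"
    return [m + dd + yy for m, dd in product(forms(d.month), forms(d.day))]

# Map every digit string to the list of dates it can stand for.
dates_of = defaultdict(list)
d = FIRST_DATE
while d <= LAST_DATE:
    for s in date_strings(d):
        dates_of[s].append(d)
    d += timedelta(days=1)

unique = sum(1 for v in dates_of.values() if len(v) == 1)
ambiguous = sum(1 for v in dates_of.values() if len(v) == 2)
print(unique, ambiguous)      # 77350 2700
print(dates_of["12154"])      # Jan 21 1954 and Dec 1 1954
```

The ambiguous cases are exactly the five-digit strings that start with 10, 11, or 12, where the second digit can belong either to the month or to the day.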

Here are some of the digit sequences with two date interpretations.

Digit sequences with two date interpretations

Here are the two date interpretations of the sequence {1,2,1,5,4} as Jan 21 1954 or as Dec 1 1954 recovered by using the function datesOf.

Two date interpretations of the sequence 1,2,1,5,4

These are the counts for the four-, five-, and six-digit representations of dates.

Counts for the four-, five-, and six-digit representations of dates

And these are the counts for the number of definitions set up for the function datesOf.

Counts for the number of definitions set up for the function datesOf

Find all dates in the digits of pi

For all further calculations, I will use the first ten million decimal digits of pi (later I will see that ten million digits are enough to find any date). We allow for an easy substitution of pi by another constant.

Allowing for an easy substitution of pi by another constant

Instead of using the full digit sequence as a string, I will use the digit sequence split into (overlapping) tuples. Then I can independently and quickly operate on each tuple. And I index each tuple with the index representing the digit number. For example:

Using the digit sequence split into overlapping tuples
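A minimal Python sketch of the overlapping, indexed tuples looks like this. The function name indexed_windows is hypothetical, and counting the leading 3 as digit number 1 is an assumption about the post's convention.

```python
def indexed_windows(digits, width):
    """Pair each overlapping window of the given width with its 1-based start."""
    return [(i + 1, tuple(digits[i:i + width]))
            for i in range(len(digits) - width + 1)]

digits = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
for entry in indexed_windows(digits, 4)[:3]:
    print(entry)
# (1, (3, 1, 4, 1))
# (2, (1, 4, 1, 5))
# (3, (4, 1, 5, 9))
```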

Using the above-defined dateQ and datesOf functions, I can now quickly find all digit sequences that have a date interpretation.

Finding all digit sequences that have a date interpretation
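The whole search can be sketched end to end in Python for a small number of digits. Everything here is an illustrative assumption rather than the post's code: the digits come from Machin's formula with integer arithmetic (only 1,000 digits, not ten million), the leading 3 counts as position 1, and the names pi_digit_string, date_strings, and dates_of are made up for this sketch.

```python
from datetime import date, timedelta
from itertools import product

def pi_digit_string(n):
    """First n decimal digits of pi (leading 3 included), via Machin's formula."""
    guard = 10
    scale = 10 ** (n + guard)

    def arctan_inv(x):
        # arctan(1/x) * scale, summed with integer arithmetic
        total = term = scale // x
        k, sign = 1, -1
        while term:
            term //= x * x
            k += 2
            total += sign * (term // k)
            sign = -sign
        return total

    pi_scaled = 4 * (4 * arctan_inv(5) - arctan_inv(239))
    return str(pi_scaled)[:n]

def date_strings(d):
    def forms(v):
        return [str(v), f"0{v}"] if v < 10 else [str(v)]
    yy = f"{d.year % 100:02d}"
    return [m + dd + yy for m, dd in product(forms(d.month), forms(d.day))]

# Lookup table: digit string -> list of dates it can represent.
dates_of = {}
d = date(1915, 3, 15)
while d <= date(2015, 3, 14):
    for s in date_strings(d):
        dates_of.setdefault(s, []).append(d)
    d += timedelta(days=1)

digits = pi_digit_string(1000)

# Each hit has the form (date, starting_position, digit_string).
hits = [(found, pos + 1, digits[pos:pos + width])
        for pos in range(len(digits))
        for width in (4, 5, 6)
        if pos + width <= len(digits)
        for found in dates_of.get(digits[pos:pos + width], [])]

print(hits[:2])  # Mar 1 1941 as "3141" and Mar 14 2015 as "31415", both at position 1
```

Because every window needs only a dictionary lookup, the cost grows linearly with the number of digits, which is the point of preparing the lookup table first.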

Here are some of the date interpretations found. Each sublist is of the form {date, startingDigit, digitSequenceRepresentingTheDate}.

Sublist with the form date, startingDigit, digitSequenceRepresentingTheDate

We have found about 8.1 million dates represented as four digits, about 3.8 million dates as five digits, and about 365,000 dates represented as six digits, totaling more than 12 million dates altogether.

Dates represented at four, five, and six digits

Note that I could have used string-processing functions (especially StringPosition) to find the positions of the date sequences. And, of course, I would have obtained the same result.

Using string-processing functions to find the positions of the date sequences

While the use of StringPosition is a good approach to deal with a single date, dealing with all 35,000 sequences would have taken much longer.

Time to deal with 35,000 sequences

We pause a moment and have a look at the counts found for the 4-tuples. Out of the 10,000 possible 4-tuples, the 8,100 used appear each on average (1/10)⁴*10⁷=10³ times based on the randomness of the digits of pi. And approximately, I expect a standard deviation of about 1000^½≈31.6. Some quick calculations and a plot confirm these numbers.

Counts for the 4-tuples
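The expected numbers can be checked with a short calculation. The sketch below assumes, as a simplification, that each of the ten million starting positions independently matches a given 4-tuple with probability 10⁻⁴ (a binomial model; overlapping windows are not strictly independent).

```python
import math

n_positions = 10**7          # starting positions (edge effects ignored)
tuples_possible = 10**4      # all 4-digit sequences

expected = n_positions // tuples_possible            # 1000 occurrences per 4-tuple
std_dev = math.sqrt(expected * (1 - 1 / tuples_possible))

print(expected, round(std_dev, 1))  # 1000 31.6
```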

The histogram of the counts shows the expected bell curve.

Histogram showing the expected bell curve

And the following graphic shows how often each of the 4-tuples that represent dates were found in the ten million decimal digits. We enumerate the 4-tuples by concatenating the digits; as a result, I see “empty” vertical stripes in the region where no 4-tuples are represented by dates.

4-tuples that represent dates were found in the ten million decimal digits

Now I continue to process the found date positions. We group the results into sublists of identical dates.

Grouping the results into sublists of identical dates

Every date does indeed occur in the first 10 million digits, meaning I have 36,525 different dates found. (We will see later that I did not calculate many more digits than needed.)

36,525 different dates found in the first 10 million digits

Here is what a typical member of dateGroups looks like.

What a typical member of a dateGroups look like

Statistics of all dates

Now let’s do some statistics on the found dates. Here is the number of occurrences of each date in the first ten million digits of pi. Interestingly, and perhaps unexpectedly at first, many dates appear hundreds of times. The periodically occurring vertical stripes result from the October-November-December month quarters.

Number of occurrences of each date in the first ten million digits of pi

The mean spacing between the occurrences also clearly shows the early occurrence of four-digit dates with average spacings below 10,000, the five-digit dates with spacings around 100,000, and the six-digit dates with spacings around one million.

Mean spacing between the occurrences

For easier readability, I format the triples {date, startingPosition, dateDigitSequence} in a customized manner.

Formatting triples for easier readability

The most frequent date in the first ten million digits of pi is Aug 6 1939—it occurs 1,362 times.

Most frequent date in the first ten million digits

Now let’s find the least occurring dates in the first ten million digits of pi. These three dates occur only once in the first ten million digits.

Least occurring dates in the first ten million digits

And all of these dates occur only twice in the first ten million digits.

Dates that occur only twice in the first ten million digits

Here is the distribution of the number of the date occurrences. The three peaks corresponding to the six-, five-, and four-digit date representations (from left to right) are clearly distinct. The dates that can only be represented by 6-tuples each occur only a very few times, while the dates with a four-digit representation appear, as I have already seen above, on average about 1,200 times.

Distribution of the number of the date occurrences

We can also accumulate by year and display the date interpretations per year (the smaller values at the beginning and end come from the truncation of the dates to ensure uniqueness). The distribution is nearly uniform.

Display the date interpretations per year

Let’s have a look at the dates with some “neat” date sequences and how often they occur. As the results in dateGroups are sorted by date, I can easily access a given date. When does the date 11-11-11 occur?

Dates with date sequences and how often they occur

And where does the date 1-23-45 occur?

Where does the date 1-23-45 occur

No date starts on its “own” position (meaning there is no example such as January 1, 1945 [1-1-4-5] in position 1145).

No date starts on its "own" position

But one palindromic case exists: March 3, 1985 (3.3.8.5) occurs at position 5833, which is its digit sequence in reverse order.

One palindromic case exists

A very special date is January 9, 1936: 1.9.3.6 appears at the position of the 1,936th prime, 16,747.

1.9.3.6 appears at the position of the 1,936th prime

Let’s see what anniversaries happened on this day in history.

Anniversaries on January 9, 1936

While no date appeared at its “own” position, if I slightly relax this condition and search for all dates that overlap with the positions given by their own digits, I do find some dates.

All dates that overlap with its digits' positions

And at more than 100 positions within the first ten million digits of pi, I find the famous pi starting sequence 3,1,4,1,5 again.

Finding pi again within the first ten million digits

Within the digits of pi I do not just find birthday dates, but also physical constant days, such as the ħ-day (the reduced Planck constant day), which was celebrated as the centurial instance on October 5, 1945.

Finding physical constant days within pi

Here are the positions of the matching date sequences.

Positions of the matching date sequences using ListLogLinearPlot

And here is an attempt to visualize the appearance of all dates. In the date-digit plane, I place a point at the beginning of each date interpretation. We use a logarithmic scale for the digit position, and as a result, the number of points is much larger in the upper part of the graphic.

Visualizing the appearance of all dates

For the dates that appear early on in the digit sequence, the finite extension of the date over the digits can be visualized too. A date extends over four to six digits in the digit sequence. The next graphic shows all digits of all dates that start within the first 10,000 digits.

All digits of all dates that start within the first 10,000 digits

After coarse-graining, the distribution is quite uniform.

Distribution is uniform using coarse-graining

So far I have taken a date and looked at where this date starts in the digit sequence of pi. Now let’s look from the reverse direction: how many dates intersect at a given digit of pi? To find the total counts of dates for each digit, I loop over the dates and accumulate the counts for each digit.

Finding the total counts of dates for each digit
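One way to accumulate the counts per digit is a difference array followed by a running sum. This Python sketch is a swapped-in technique, not necessarily how the post loops over the dates; the function name coverage and the toy hits are assumptions for illustration.

```python
from itertools import accumulate

def coverage(n_digits, hits):
    """Count, for every digit position, how many date hits cover it.

    hits is an iterable of (start, width) pairs with 1-based start positions;
    each hit must lie completely inside the first n_digits digits.
    """
    delta = [0] * (n_digits + 1)
    for start, width in hits:
        delta[start - 1] += 1           # the hit begins covering here
        delta[start - 1 + width] -= 1   # and stops covering after its last digit
    return list(accumulate(delta[:n_digits]))

# Toy example: four hits within ten digits.
counts = coverage(10, [(1, 4), (1, 5), (3, 4), (7, 4)])
print(counts)             # [2, 2, 3, 3, 2, 1, 1, 1, 1, 1]
print(counts.count(0))    # number of unused digits: 0
```

The maximum of the resulting list gives the most heavily used digit, and the zeros mark digits that belong to no date interpretation.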

A maximum of 20 dates occur at a given digit.


Here are two intervals of 200 digits each. We see that most digits are used in a date interpretation.

Two intervals of 200 digits each

Above, I noted that I have about 12 million dates in the digit sequence under consideration. The digit sequence that I used was only ten million digits long, and each date needs about five digits. This means the dates need about 60 million digits. It follows that many of the ten million digits must be shared and used on average about six times. Only 2,005 out of the first ten million digits are not used in any of the date interpretations, meaning that 99.98% of all digits are used for date interpretations (not all as starting positions).

2,005 out of the first ten million digits are not used in any of the date interpretations
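The back-of-the-envelope numbers are easy to verify. The inputs in this Python sketch are the rounded figures quoted in the text (about 12 million dates, about five digits per date), not recomputed values.

```python
total_digits = 10_000_000
unused_digits = 2_005

# Share of digits that take part in at least one date interpretation.
share_used = 100 * (total_digits - unused_digits) / total_digits
print(f"{share_used:.2f}%")   # 99.98%

# Rough average number of dates passing through a digit.
dates_found = 12_000_000      # rounded count from the text
digits_per_date = 5           # rounded average length
print(dates_found * digits_per_date / total_digits)  # 6.0
```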

And here is the histogram of the distribution of the number of dates present at a certain digit. The back-of-the-envelope number of an average of six dates per digit is clearly visible.

Histogram of the distribution of the number of dates present at a certain digit

The 2,005 positions that are not used are approximately uniformly distributed among the first ten million digits.

The 2,005 positions that are not used are approximately uniformly distributed

If I display the concrete positions of the non-used digits versus their expected average position, I obtain a random walk–like graph.

Random walk-like graph

So, what are the neighboring digits around the unused digits? One hundred sixty-two different five-neighborhoods exist. Looking at them immediately shows why the center digits cannot be part of a date: too many sequences of zeros before, at, or after.

Neighboring digits around the unused digits

And the largest unused block of digits that appears is the six digits between positions 8,127,088 and 8,127,093.

Largest unused block of digits are the six digits between position 8,127,088 and 8,127,093

At a given digit, dates from various years overlap. The next graphic shows the range from the earliest to the latest year as a function of the digit position.

These are the unused digits together with three left- and three right-neighboring digits.

Unused digits together with three left- and three right-neighboring digits

Because the high coverage may seem unexpected at first, I select a random digit position and select all dates that use this digit.

Random digit position and select all dates that use this digit

And here is a visualization of the overlap of the dates.

Code for visualization of the overlap of the dates
Visualization of the overlap of the dates

The most-used digit is the 1 at position 2,645,274: 20 possible date interpretations meet at it.


Here are the digits in its neighborhood and the possible date interpretations.

Digits in its neighborhood and the possible date interpretations

If I plot the years starting at a given digit for a larger number of digits (say the first 10,000), then I see the relatively dense coverage of date interpretations in the digits-date plane.

Plot of years starting at a given digit for a larger amount of digits

Let’s now build a graph of dates that are “connected”. We’ll consider two dates connected if the two dates share a certain digit of the digit sequence (not necessarily the starting digit of a date).

Graph of dates that are connected

Here is the same graph for the first 600 digits, with communities emphasized.

Graph for the first 600 digits with communities emphasized

We continue with calculating the mean distance between two occurrences of the same date.

Calculating the mean distance between two occurrences of the same date
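For a single date, the mean distance is the average of the gaps between consecutive starting positions. Here is a minimal Python sketch; the function name mean_spacing and the sample positions are hypothetical.

```python
def mean_spacing(positions):
    """Average gap between consecutive occurrences, or None for fewer than two."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else None

# Made-up starting positions of one date.
print(mean_spacing([10, 250, 1000]))  # 495.0
```

Since the gaps telescope, the result equals (last position - first position) divided by the number of occurrences minus one.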

The first occurrences of dates

The first occurrences of dates are the most interesting, so let’s extract these. We will work with two versions, one sorted by the date (the list firstOccurrences) they represent, and one sorted by the starting position (the list firstOccurrencesSortedByOccurrence) in the digits of pi.

Using firstOccurrences and firstOccurrencesSortedByOccurrence
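The two orderings can be sketched in Python as follows. The hits below are made-up triples of the form (date, starting_position, digit_string), and the snake_case names only mirror the post's firstOccurrences and firstOccurrencesSortedByOccurrence; this is not the original code.

```python
from datetime import date

# Hypothetical hits; positions and digit strings are for illustration only.
hits = [
    (date(1985, 3, 3), 120, "3385"),
    (date(1954, 1, 21), 45, "12154"),
    (date(1985, 3, 3), 15, "030385"),
    (date(1954, 1, 21), 300, "012154"),
]

# Keep, for every date, the hit with the smallest starting position.
first = {}
for hit in hits:
    d, pos, _ = hit
    if d not in first or pos < first[d][1]:
        first[d] = hit

first_occurrences = sorted(first.values(), key=lambda h: h[0])
first_occurrences_sorted_by_occurrence = sorted(first.values(), key=lambda h: h[1])

print(first_occurrences)
print(first_occurrences_sorted_by_occurrence)
```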

Here are the possible date interpretations that start within the first 10 digits of pi.

Possible date interpretations that start within the first 10 digits of pi

And here are the other extremes: the dates that appear deepest into the digit expansion.

Dates that appear deepest into the digit expansion

We see that Wed Nov 23 1960 starts only at position 9,982,546 (= 2·7·713039), so by starting with the first ten million digits, I was a bit lucky to catch it. Here is a quick direct check of this record-setting date.

Direct check of this record-setting date

So, who are the lucky (well-known) people associated with this number through their birthday?

People associated with November 23 1960 as their birthday

And what were the Moon phases on the top dozen out-pi-landish dates?

Moon phases on the top dozen out-pi-landish dates

And while Wed Nov 23 1960 is furthest out in the decimal digit sequence, the last prime date in the list is Oct 22 1995.

The last prime date

In general, less than 10% of all first date appearances are prime.

Percentage of first date appearances being prime

Often one maps the digits of pi to a direction in the plane and forms a random walk. We do the same based on the date differences between consecutive first appearances of dates. We obtain typical-looking 2D random walk images.

Date differences between consecutive first appearances of dates

Here are the first-occurring date positions for the last few years. The bursts in October, November, and December of each year are caused by the need for five or six consecutive digits, while January to September can be encoded with fewer digits if I skip the optional zeros.

First-occurring date positions for the last few years

If I include all dates, I get, of course, a much more densely filled graphic.

All date positions for the last few years

A logarithmic vertical axis shows that most dates occur between the thousandth and millionth digits.

Logarithmic vertical axis shows that most dates occur between the thousandth and millionth digits

To get a more intuitive understanding of overall uniformity and local randomness in the digit sequence (and as a result in the dates), I make a Voronoi tessellation of the day-digit plane based on points at the first occurrence of a date. The decreasing density for increasing digits results from the fact that I only take first-date occurrences into account.

Voronoi tessellation of the day-digit plane based on points at the first occurrence of a date

Easter Sunday positions are a good date to visualize, as the date varies over the years.

Visualizing Easter Sunday dates

The mean first occurrence depends, of course, on the number of digits needed to encode a date.

Finding mean first occurrence

The mean occurrence is at 239,083, but due to the outliers at a few million digits, the standard deviation is much larger.

The mean occurrence is at 239,083

Here are the first occurrences of the “nice” dates that are formed by repetition of a single digit.

First occurrences of the nice dates that are formed by repetition of a single digit

The detailed distribution of the number of occurrences of first dates has the largest density within the first few tens of thousands of digits.

Detailed distribution of the number of occurrences of first dates

A logarithmic axis shows the distribution much better, but because of the increasing bin sizes, the maximum has to be interpreted with care.

Logarithmic axis showing the distribution

The last distribution is mostly a weighted superposition of the first occurrences of four-, five-, and six-digit sequences.


And here is the cumulative distribution of the dates as a function of the digits’ positions. We see that the first 1% of the ten million digits already covers 60% of all dates.

Cumulative distribution of the dates as a function of the digits' positions

Slightly more dates start at even positions than at odd positions.

More dates start at even positions than at odd positions

We could do the same with mod 3, mod 4, … . The left image shows the deviation of each congruence class from its average value, and the right image shows the higher congruences, all considered again mod 2.

Deviation of each congruence class from its average value, and the higher congruences

The actual number of first occurrences per year fluctuates around the mean value.

The number of first occurrences per year fluctuates around the mean value

The mean number of first-date interpretations sorted by month clearly shows the difference between the one-digit months and the two-digit months.

The mean number of first-date interpretations sorted by month

The mean number by day of the month (ranging from 1 to 31) is, on average, a slowly increasing function.

The mean number by day of the month

Finally, here are the mean occurrences by weekday. Most first date occurrences happen for dates that are Wednesdays.

The mean occurrences by weekday

Above I observed that most digits participate in a possible date interpretation. Only relatively few digits participate in a first-occurring date interpretation: 121,470.

Few numbers participate in a first-occurring date interpretation

Some of the position sequences overlap anyway, and I can form network chains of the dates with overlapping digit sequences.

Network chains of the dates with overlapping digit sequences

The next graphic shows the increasing gap sizes between consecutive dates.

Increasing gap sizes between consecutive dates

Distribution of the gap sizes:

Distribution of the gap sizes

Here are pairs of consecutively occurring date-interpretations that have the largest gap between them. The larger gaps were clearly visible in the penultimate graphic.

Pairs of consecutively occurring date-interpretations that have the largest gap between them

Dates in other expansions and in other constants

Now, the very special dates are the ones where the concatenated continued fraction (cf) expansion position agrees with the decimal expansion position. By concatenated continued fraction expansion, I mean the digits on the left at each level of the following continued fraction.

Concatenated continued fraction expansion

This gives the following cf-pi string:

Cf-pi string
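The beginning of such a string can be reproduced in Python from a rational approximation of pi. This sketch assumes that the cf-pi string is the plain concatenation of the partial quotients, and it hardcodes 50 decimals of pi, which is far more precision than the first ten terms need; the names PI_50 and continued_fraction_terms are made up here.

```python
from fractions import Fraction

# pi truncated to 50 decimals.
PI_50 = Fraction("3.14159265358979323846264338327950288419716939937510")

def continued_fraction_terms(x, count):
    """First `count` partial quotients of the rational number x (Euclid's algorithm)."""
    terms = []
    num, den = x.numerator, x.denominator
    for _ in range(count):
        if den == 0:
            break
        quotient, remainder = divmod(num, den)
        terms.append(quotient)
        num, den = den, remainder
    return terms

terms = continued_fraction_terms(PI_50, 10)
cf_string = "".join(str(t) for t in terms)
print(terms)      # [3, 7, 15, 1, 292, 1, 1, 1, 2, 1]
print(cf_string)  # 3715129211121
```

A much longer string would need correspondingly more decimals of pi, because later partial quotients become unreliable once the truncation error matters.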

And, interestingly, there is just one such date.

One date in cf-pi string

None of the calculations carried out so far were special to the digits in pi. The digits of any other irrational numbers (or even sufficiently long rational numbers) contain date interpretations. Running some overnight searches, it is straightforward to find many numeric expressions that contain the dates of this year (2015). Here they are put together in an interactive demonstration.

We now come to the end of our musings. As a last example, let’s interpret digit positions as seconds after this year’s pi-time at March 14 9:26:53. How long would I have to wait until seeing the digit sequence 3·1·4·1·5 in the decimal expansion of other constants? Can one find a (small) expression such that 3·1·4·1·5 does not occur in the first million digits? (The majority of the elements of the following list ξs are just directly written down random expressions; the last elements were found in a search for expressions that have the digit sequence 3·1·4·1·5 as far out as possible.)

Digit positions as seconds after this year's pi-time

Here are two rational numbers whose decimal expansions contain the digit sequence:

Two rational numbers whose decimal expansions contain the digit sequence

And here are two integers with the starting digit sequence of pi.

Two integers with the starting digit sequence of pi

Using the neat new function TimelinePlot that Brett Champion described in his last blog post, I can easily show how long I would have to wait.

Using TimelinePlot with pi

We encourage readers to explore the dates in the digits of pi more, or replace pi with another constant (for instance, Euler’s number E, to justify the title of this post), and maybe even 10 by another base. The overall, qualitative structures will be the same for almost all irrational numbers. (For a change, try ChampernowneNumber[10].) Will ten million digits be enough to find every date in, say, E (where is October 21, 2014)? Which special dates are hidden in other constants? These and many more things are left to explore.

Download this post as a Computable Document Format (CDF) file.

Posted in: Mathematics

5 Comments


Lou

Great post!

Posted by Lou    June 24, 2015 at 7:54 am
Mark

The alternative is not a “European date format” but rather a “rest-of-the-world date format”. Only the U.S. uses out-of-scale (i.e. day and month transposed) date sequences. For an international product you might take more care with this in future.

Posted by Mark    June 26, 2015 at 7:37 am
Doug Lohre

I only VERY quickly scanned this whole article, focusing on a very few random areas of it. It’s really interesting, and want to read it more carefully later! Oddly, one of the FEW areas I focused on had one possible typing error – unless I mis-understood something in that place in the article. It’s where this sentence appears : “And at more than 100 positions within the first ten million digits of pi, I find the famous pi starting sequence 3,1,4,5,9 again.” Isn’t a ’1′ skipped in the 4th position of pi in the number sequence toward the end of the sentence? Or is there some connection with the rest of the article I’m not making by not reading the whole thing?

Posted by Doug Lohre    July 7, 2015 at 3:51 pm
    The Wolfram Team

    Thanks for pointing this out, we have corrected the typo.

    Posted by The Wolfram Team    July 8, 2015 at 2:56 pm
      Doug Lohre

      You’re welcome! haha – Was so lucky to see that, as I only focused on very few areas of the article while very quickly scanning the rest!

      Posted by Doug Lohre    July 13, 2015 at 10:51 am

