Spring is here, finally, and everyone around here is tired of snow this year! Some of the hardier flowers are up already, such as daffodils and hyacinths. So, naturally, I started thinking about when I could put in the more delicate annuals, or even my tomatoes. I don’t want them to be bitten by a late frost (we had one the other day!). And in the autumn, we want to know how late we can harvest before a frost might damage the produce.

Well, I could consult *The Old Farmer’s Almanac* for the last frost date, but how accurate is it for my specific locale? What about the variability? Might there be a trend to earlier dates due to global warming? To answer these questions, I need historical temperature data. The Wolfram Language has weather data available, so maybe I could do a little data mining and come up with my own planting chart, and you could do the same for your town.

Let’s begin by defining where I’m located, using free-form input.

And now, let’s get the temperature data. There are several kinds available, but some of them may not be available everywhere. Since we’re interested in temperatures that go below freezing, we’ll use “`MinTemperature`”.

So, what does the temperature data look like?

There is a gap in the data in the late 1940s, but that should not present a problem.

Let’s pick a year at random and look at it in detail, just to get a feel for the data. Using `Manipulate`, we can pick out individual days and read off the minimum temperature.

We can readily observe the annual variation, and also see that it’s not perfectly sinusoidal: the low temperature for the months of January through April seems to be flat. There is also a good deal of day-to-day variation. Using the slider, we find that the last frost date in the spring of 1990 was April 19 and the first frost date in the autumn was October 27.

As we saw above, the minimum temperature data is a time series, actually a `TemporalData` object. Many functions in the Wolfram Language know how to handle these objects automatically, and we will be making use of them rather than converting the data to a list of `{date, temp}` pairs using `Normal`. `TemporalData` objects can also act as functions, so you can get a property of the object with the idiom `temporalDataObject[property]`, where `property` is something like `"FirstDate"` (new in version 10.1; version 10.0 provides `"FirstTime"` instead). A very useful function for time series is `MovingMap`.

Let’s “fold” the data back onto itself on an annual basis and plot it. To do this, we’ll convert the long time series into a list of year-long time series. If there is no data for a given year (remember the gap mentioned above?), then `Missing[]` will be returned by `extractYear`.

To get the individual time series to plot over the same time interval, we can convert the actual dates to the number of seconds from the beginning of that year. A second argument is included in the function so that we can start the year in any month, which is convenient for working with data from a location south of the equator. Note that we’re not doing anything special when February 29 is present in leap years.
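The folding step described above can be illustrated with a minimal sketch. It is written in Python rather than the Wolfram Language and is not the post's own code; the function names `season_year_start`, `seconds_into_year`, and `fold_by_year` are hypothetical, the temperatures are made-up toy values, and a year with no data is simply absent from the result instead of being marked as missing.

```python
from collections import defaultdict
from datetime import date

SECONDS_PER_DAY = 86400

def season_year_start(d, start_month=1):
    """Return the first day of the (possibly shifted) year that contains d."""
    year = d.year if d.month >= start_month else d.year - 1
    return date(year, start_month, 1)

def seconds_into_year(d, start_month=1):
    """Seconds elapsed between the start of the shifted year and date d."""
    return (d - season_year_start(d, start_month)).days * SECONDS_PER_DAY

def fold_by_year(series, start_month=1):
    """Group (date, temp) pairs by year, re-indexing each date to seconds into its year."""
    folded = defaultdict(list)
    for d, temp in series:
        key = season_year_start(d, start_month).year
        folded[key].append((seconds_into_year(d, start_month), temp))
    return dict(folded)

# Toy values for illustration only, not real weather data.
toy_series = [
    (date(1990, 1, 1), -5.0),
    (date(1990, 7, 1), 18.0),
    (date(1991, 1, 1), -3.0),
    (date(1991, 7, 2), 19.5),
]

folded_north = fold_by_year(toy_series)                 # year starts in January
folded_south = fold_by_year(toy_series, start_month=7)  # year starts in July
```

With `start_month=7`, the January 1991 reading lands in the year that began in July 1990, which is the kind of shift that keeps a southern-hemisphere winter in one piece. As in the post, February 29 gets no special treatment.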

This plot shows that there is wider variation in the daily low temperatures during the winter than in the summer, and that the lowest low occurs near the end of January and the highest low occurs near the end of July. The frost-free period is early May to early October.

Now, to get the last frost date each spring and the first frost date each autumn, we need to work with yearly windows using `MovingMap`. We scan the window over each year, take the dates where the minimum temperature is less than zero, and take the last or first occurrence for the spring or autumn, respectively, returning both the date and its lowest temperature. We want to be able to use the function for data from a location south of the equator where spring and autumn are reversed, so the season is specified as “`early`” or “`late`”.
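To make the selection rule concrete, here is a small Python sketch (not the post's Wolfram Language code). It assumes temperatures in degrees Celsius with frost meaning a minimum below 0, and it assumes the yearly window is split at its midpoint so that "early" looks only at the first half and "late" only at the second half; the post does not spell out how the two halves are separated. The function name `frost_date` and the toy temperatures are hypothetical.

```python
from datetime import date, timedelta

def frost_date(days, window_start, season, threshold=0.0):
    """Return (date, min_temp) of the last early-season or first late-season frost.

    days: (date, min_temp) pairs covering one yearly window.
    window_start: first day of that window.
    season: "early" (last frost in the first half) or "late" (first frost in the second half).
    Returns None when the relevant half of the window has no frost.
    """
    midpoint = window_start + timedelta(days=183)  # assumed split between the two seasons
    if season == "early":
        frosts = [(d, t) for d, t in days if d < midpoint and t < threshold]
        return max(frosts) if frosts else None
    if season == "late":
        frosts = [(d, t) for d, t in days if d >= midpoint and t < threshold]
        return min(frosts) if frosts else None
    raise ValueError("season must be 'early' or 'late'")

# Toy readings for illustration only; the temperatures are invented.
toy_1990 = [
    (date(1990, 2, 10), -6.0),
    (date(1990, 4, 19), -0.5),
    (date(1990, 4, 20), 2.0),
    (date(1990, 7, 4), 19.0),
    (date(1990, 10, 26), 1.0),
    (date(1990, 10, 27), -1.0),
    (date(1990, 12, 15), -8.0),
]

last_spring_frost = frost_date(toy_1990, date(1990, 1, 1), "early")
first_autumn_frost = frost_date(toy_1990, date(1990, 1, 1), "late")
```

Because the season is named by its position in the window instead of by "spring" or "autumn", the same function works for a southern-hemisphere window that starts in July.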

We define a function to convert time in seconds since the start of the year to time in days since the start of the year:

Wow, that’s quite a spread for each histogram, more than two months. We can get the earliest and latest dates on an annual basis by using the `yearAbsoluteTime` function when sorting the dates, and then taking the first and last dates in each list.
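The idea of sorting dates by their position within the year, ignoring which year they fall in, can be sketched in a few lines of Python (again, not the post's code). The key function `position_in_year` is a hypothetical stand-in for `yearAbsoluteTime`, it assumes a year that starts in January, and the dates are toy values.

```python
from datetime import date

def position_in_year(d):
    """Sort key that ignores the year and compares only month and day."""
    return (d.month, d.day)

# Toy last-frost dates for illustration only.
toy_last_frosts = [date(2001, 4, 22), date(2002, 5, 3), date(2003, 4, 9)]

ordered = sorted(toy_last_frosts, key=position_in_year)
earliest, latest = ordered[0], ordered[-1]
```

Sorting the raw dates without the key would order them chronologically, so the first and last entries would just be the oldest and newest years.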

Since the latest date we’ve had frost in the spring is May 11, we could naively assume that anytime after that would be OK for planting. But Mother Nature is not that simple, and we cannot be 100% sure of the latest date for a spring frost. We need to work with probabilities: that is, model the dates as a distribution and then pick the 0.95 quantile (the 95th percentile) if we want to be wrong 5% of the time, or the 0.99 quantile if we can tolerate only 1% error. The spring data is roughly symmetrical (the same shape on the left as on the right), so a normal distribution might be a good first estimate for a model. However, our data is too “pointy” in the middle for that, so we’ll use a smooth kernel distribution instead.
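For readers who want to see what a quantile of a kernel-smoothed distribution involves, here is a minimal Python sketch. It is an assumption-laden stand-in, not the Wolfram Language model: it uses a Gaussian kernel with the normal-reference bandwidth 1.06·σ·n^(−1/5), which may differ from the defaults of a smooth kernel distribution in the Wolfram Language, and it finds the quantile by bisection on the smoothed CDF. All function names and the day-of-year sample are hypothetical.

```python
import math
import statistics

def normal_reference_bandwidth(sample):
    """Rule-of-thumb bandwidth for a Gaussian kernel (an assumed choice)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)

def kde_cdf(x, sample, bandwidth):
    """CDF of the kernel estimate: the average of one normal CDF per observation."""
    scale = bandwidth * math.sqrt(2)
    return sum(0.5 * (1 + math.erf((x - s) / scale)) for s in sample) / len(sample)

def kde_quantile(p, sample, bandwidth=None, tol=1e-6):
    """Value x at which the smoothed CDF equals p, found by bisection."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    h = bandwidth if bandwidth is not None else normal_reference_bandwidth(sample)
    lo, hi = min(sample) - 10 * h, max(sample) + 10 * h
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kde_cdf(mid, sample, h) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Toy last-frost days of the year, invented for illustration only.
toy_days = [100, 105, 108, 110, 112, 115, 120]

median_day = kde_quantile(0.5, toy_days)
cautious_day = kde_quantile(0.95, toy_days)  # later date, 5% chance of a later frost
```

For the spring frost the upper quantiles are the cautious ones; for the first autumn frost the same function would be called with a small `p` such as 0.05.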

If I’m willing to be wrong half the time, then I use the dates from the first row where the probability is 0.5; if I want to be wrong only once in twenty years, then I use the row for 0.05 probability. I still don’t know which year I’ll be wrong with this model, but that’s the nature of probabilities.

The *Almanac* predicts April 24 and October 15 for the median (50% probability) last and first frost dates, respectively, which are closer to the 10% probability of our model. Perhaps they have a longer set of data from which they have drawn their conclusions, or they included a two-week buffer for good measure.

How do our last and first frost dates look over time? Is there a trend due to climate change? We can show a moving average and standard deviation superimposed on our data by using `MovingMap`. Here are the spring observations:

And here are the autumn observations:

There seems to be a trend, especially after 1990, to earlier last frost dates in the spring and later first frost dates in the autumn, but we really need much more data to say that with confidence.
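The moving average and standard deviation mentioned above amount to sliding a fixed-size window along the yearly values. A minimal Python sketch of that idea follows; it is not the post's `MovingMap` call, the window of three observations is an arbitrary assumption, and the function name `moving_stats` and the day-of-year values are hypothetical toy data.

```python
import statistics

def moving_stats(values, window):
    """Trailing-window mean and sample standard deviation for each full window."""
    if window < 2:
        raise ValueError("window must cover at least two observations")
    results = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        results.append((statistics.mean(chunk), statistics.stdev(chunk)))
    return results

# Toy last-frost days of the year for five consecutive years (invented values).
toy_frost_days = [110.0, 112.0, 108.0, 106.0, 104.0]

trend = moving_stats(toy_frost_days, window=3)
```

A falling sequence of window means would correspond to earlier last-frost dates, but as the post notes, a short record is not enough to call it a trend with confidence.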

Well, we’ve answered our questions for Trenton, New Jersey. We’ve mined the temperature data from the curated weather data in the Wolfram Language, visualized it, built a model, made predictions with the model, compared the predictions to those in *The Old Farmer’s Almanac*, and looked for temporal trends in the data.

Here are some examples for other cities with a shorter growing season (Calgary, Alberta) or located south of the equator (Christchurch, New Zealand), where spring and autumn are reversed. More sophisticated models for predicting the first and last frost dates could be tried, for example, by using machine learning with the previous two months of low and high temperatures.

So, now it’s your turn. Plug in your location and see when it’s safe to plant. You could also compute the growing season from this data or look for temporal patterns (a `Periodogram` might be revealing). Are there other weather aphorisms you could test with the Wolfram Language and its curated data?


## 8 Comments

Great post! It is also useful for finding out when to change the summer/winter tires. Unfortunately I am having problems with this part of the code:

Quantity[1, "Events"],

I get this error:

Quantity::unkunit: Unable to interpret unit specification Events. >>

I am using Mathematica 10.1

Replacing Quantity[1,"Events"] with a 1 solves the problem. The example in MovingMap’s documentation also uses the “Events” quantity and also fails.

I spoke with one of our developers in the statistics group. His suggestion is to use Quantity[1, IndependentUnit["Events"]]. A new CDF will be posted soon with this change.

I had some difficulties executing the code in this post using version 10.0.2.0…

The first Manipulate command seems to corrupt the original minTemps time series object for some reason which I cannot establish.

The time series properties have “FirstTime”, not “FirstDate” and “LastTime”, not “LastDate”.

Quantity[1,"Events"] yields a Quantity::unkunit error (i.e. unknown unit) on my installation.

Looks like it would be a fun post otherwise. :-(

Thanks for your comment, we are working to fix this ASAP.

For your first observation, the SaveDefinitions -> True option does cause this side effect when the cell with Dynamic content is in focus.

Our statistics developer also told me that “FirstDate” and “LastDate” are new properties in 10.1.0, so that explains why you had difficulties using 10.0.2.

Please see my reply to Gustavo above about the change for Quantity.

I agree with the previous posts. This appears to be a very interesting project, but for the pothole in the “FirstDate”/“LastDate” section.

Great post! It is amazing to see what kind of cool stuff can be done with Mathematica.