We all know Wolfram|Alpha is great for solving calculations and math problems, but not everyone knows about the full breadth of useful data it provides. I entered college as a biology major and was quickly overwhelmed with the amount of information I had to memorize. Class lectures moved at a fast pace, and often my notes had gaps in them where I hadn’t finished writing down what the professor was saying before she moved on. I was up late at night making flashcards for tests and searching desperately through Yahoo! Answers, trying to find information like what exactly the alimentary system does (hint: it “functions in food ingestion and digestion; absorption of water and nutrients; secretion of water, acids, enzymes, buffers and salts; waste excretion; and energy storage”—thanks, Wolfram|Alpha!).

Thinking back on those late-night study sessions, I would have saved a lot of time if I had properly used Wolfram|Alpha as a study tool. Because I was a biology major, many of the areas in which I most frequently sought information were related to scientific fields such as chemistry, but Wolfram|Alpha can be a valuable resource in so many more areas. Here are 15 applications of Wolfram|Alpha in topics beyond mathematics. I hope you will find these to be useful both inside and outside the classroom!

Stoichiometry was for me one of the hardest parts of chemistry classes, and I’m sure I’m not alone. Thankfully, Wolfram|Alpha can also calculate stoichiometric problems—finding the amount of a substance needed to complete a reaction as well as the theoretical yields. To calculate a reaction’s stoichiometry, enter the reaction as well as the amounts you know.

For example, let’s use the reaction B_{2}H_{6} + O_{2} → B_{2}O_{3} + H_{2}O. I have 36.1g B_{2}H_{6} and I need to calculate how many grams of O_{2} would be necessary to burn all 36.1g of B_{2}H_{6}. In this case, I just enter the following:

36.1 g B2H6 + O2 -> B2O3 + H2O

Wolfram|Alpha balances the equation and calculates the answer, which is displayed conveniently in a table. From this, I now know that I would need 3.914 moles of O_{2}, or 125.2 grams, to burn all of the B_{2}H_{6}:
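As a sanity check on those numbers, the same stoichiometry can be done by hand, or in a short Python sketch, using the balanced equation B2H6 + 3 O2 → B2O3 + 3 H2O and approximate atomic masses (this is just the arithmetic, not how Wolfram|Alpha computes it):

```python
# Stoichiometry check: B2H6 + 3 O2 -> B2O3 + 3 H2O
# Approximate molar masses in g/mol
M_B, M_H, M_O = 10.811, 1.008, 15.999

M_B2H6 = 2 * M_B + 6 * M_H   # ~27.67 g/mol
M_O2 = 2 * M_O               # ~32.00 g/mol

mol_B2H6 = 36.1 / M_B2H6     # moles of diborane in 36.1 g
mol_O2 = 3 * mol_B2H6        # 3 mol O2 per mol B2H6
grams_O2 = mol_O2 * M_O2

print(round(mol_O2, 3), round(grams_O2, 1))  # 3.914 125.2
```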

A great place to find basic information about a significant historical figure or event is Wolfram|Alpha. One thing I find especially helpful is information about specific battles or conflicts. Often, when reading about these events, the scale of them seems so large that I get lost and can’t fully comprehend the important details.

However, Wolfram|Alpha provides only the most important information in easy-to-understand charts that compare both sides, so I’m able to get the full picture succinctly, without spending hours combing through long Wikipedia articles.

For example, I queried “Spanish Conquest of Peru”, the decades-long war between the Kingdom of Spain and the Inca Empire in the 16th century.

Wolfram|Alpha first provides information such as the dates and location, then delves into more specific data, such as the two factions in the conflict, the commanders of both factions, the casualties and losses of both sides, and the major outcomes of the battle.

With this, I can easily understand the basics of what happened without getting bogged down by details.

Taxonomical information for all manner of organisms is available in Wolfram|Alpha. You can type in a specific organism to find information about it, or you can try a whole species, family, genus or more.

If I enter in something broad and basic, for example “cnidaria”, I come up with a broad range of information, including biological properties, metabolism, other members of the kingdom and phylum, and more.

However, if I don’t want to go through all that data, I can narrow the scope. For example, if I specifically query “planarian taxonomy”, I get a table with the kingdom, phylum, class, order and family all classified:

The breadth of biological data Wolfram|Alpha has available isn’t limited simply to animals. In addition to taxonomical and basic information, Wolfram|Alpha can generate growth curves for plants for specific locations and compare them to the American national average. For example, if I query “maple tree growth curve in Illinois”, I get a graph of the growth curve of maple trees in Illinois, a graph of the growth curve for the entire USA and a chart detailing the taxonomical information of a maple tree:

Wolfram|Alpha can balance chemical equations as well. For example, I used the query “C4H6O3+H2O -> C2H4O2”. Wolfram|Alpha first shows you the answer, then the structure of the equation. What’s particularly helpful about this is you can toggle the type of structure you want to look at—you can switch from a skeletal structure to a Lewis structure, or display all the atoms.

Scroll down further and you’ll find Wolfram|Alpha displays the chemical names and formulas of the components involved, as well as other relevant information such as their molar masses, densities, melting points and more.

Similarly, converting mass or volume to moles and vice versa with Wolfram|Alpha is really simple. Just type in the amount of a substance in moles or mass along with the unit to which you want it converted (for example, 7 moles of acetic acid to kilograms), and Wolfram|Alpha computes the answer along with other possible unit conversions and corresponding quantities:
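Behind that conversion is just a molar-mass multiplication. Here is a short Python sketch with approximate atomic masses (acetic acid is C2H4O2), included only to show the arithmetic:

```python
# Convert 7 moles of acetic acid (C2H4O2) to kilograms
M_C, M_H, M_O = 12.011, 1.008, 15.999  # approximate atomic masses, g/mol
molar_mass = 2 * M_C + 4 * M_H + 2 * M_O  # ~60.05 g/mol
kilograms = 7 * molar_mass / 1000
print(round(kilograms, 3))  # ~0.42 kg
```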

Wolfram|Alpha has extensive anatomical and medical information available freely at your fingertips. For example, if you query “human heart arteries”, the results interpret this query correctly as the left and right coronary arteries, then display their Latin names, alternate names and a brief description of the function of each artery:

If you scroll further down, models of the morphologies, regional locations and body locations are also displayed:

Beneath that are the physical characteristics, constitutional parts and connections (how the arteries function as part of the circulatory system):

You can also contrast and compare more specific aspects of two different structures—for example, “function of gallbladder vs liver”:

The range of information available isn’t limited to only humans. All creatures great and small are represented in a similar way to the human data. Try the query “horse femur”, for example. The first thing displayed is the hierarchical relationships of the left and right femurs, followed by models of the morphology, regional location and body location:

Finally, even the articulations and homologous structures are shown:

I found that you can even analyze a gene with Wolfram|Alpha—and not just from the human genome. Wolfram|Alpha has data from the fully sequenced genomes of the mouse, the fruit fly and more.

Let’s take a look at the results for the query “human gene gfra1”. First, Wolfram|Alpha interprets the query, then provides the standard and alternate names, and then the location of the gene via a table chart as well as a visual representation of its location on chromosome 10:

Wolfram|Alpha also displays other relevant information such as the reference sequence, other nearby genes, visual representations of gene-splicing structures, protein names and more. It also provides gene ontology characteristics such as functions and processes.

Apart from genes, information on proteins and biomolecules is also available. In the case of proteins, such as hemoglobin, Wolfram|Alpha provides the sequence, molecular weight and models of the ribbon and atomic structures. For biomolecules like serotonin, the chemical names and formula are displayed, as well as a structure diagram:

If you’re working with atomic spectra, Wolfram|Alpha is a great resource to have. If you enter in “atomic spectrum of” plus whatever element you’re working with, Wolfram|Alpha will display a visualization of the atomic spectrum and the visible regions. It also includes the spectral lines. What’s particularly handy is that for the visualization of the atomic spectrum, you can toggle among wavelength, frequency and energy. Next to the toggle is the option to show or hide additional views, like oscillator strength and color.

When not bogged down by schoolwork (which is admittedly rare), I like to read and write, and have found Wolfram|Alpha to be a surprisingly helpful resource. In addition to providing definitions and translations of words, Wolfram|Alpha can give synonyms, antonyms, words that rhyme with a specific word and more.

The main reason I prefer using Wolfram|Alpha to find definitions rather than the more popular dictionary or thesaurus sites is because there’s so much more interesting information provided on Wolfram|Alpha. For example, if I query the word “fugue”, I get not only three different definitions, but also the word’s first known usage in English, its roots, a timeline of its usage, a list of translations into other languages, a list of broader terms with the same or similar meanings and more.

You can also discover synonyms or antonyms of a word with Wolfram|Alpha. If I type in the query “incite synonym”, Wolfram|Alpha generates a list of synonyms, and also provides definitions below.

In addition to synonyms and antonyms, Wolfram|Alpha can also list words that rhyme with a certain word. For example, if I enter in “words that rhyme with sublime”, I get a list of words such as lime, dime, prime, etc. This feature could be especially useful to poets, songwriters and other creative writers.

Another feature writers and readers alike can appreciate is that Wolfram|Alpha can provide relevant quantitative data about the number of words or pages of a document. For example, if I type in “12,000 words”, Wolfram|Alpha can tell me how many pages that document would be; how long it would take to read, write or speak it; and more.

Got a secret message you need to pass on? Wolfram|Alpha can help with that too. Apart from translating words and phrases from one language to another, Wolfram|Alpha can also convert words and phrases into Morse code. Just type what you want to translate and let Wolfram|Alpha do the rest.

When working with kinematics, Wolfram|Alpha is a great tool to have in your figurative belt. If you’re doing a two-body collision problem, for example an elastic collision, Wolfram|Alpha can compute the final speeds of the two bodies as well as generate a useful schematic that visualizes the collision. Simply type in “elastic collision” plus the values for the initial mass and velocities of the two bodies and press Enter. The first thing Wolfram|Alpha does is interpret the input information, displaying it in a chart. It then calculates the result for the two final speeds, then generates a schematic visualizing the collision.
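The result Wolfram|Alpha returns follows from conservation of momentum and kinetic energy. As a sketch of the underlying physics (not Wolfram|Alpha’s actual implementation), here are the standard one-dimensional elastic-collision formulas in Python:

```python
def elastic_collision(m1, v1, m2, v2):
    """Final velocities for a 1-D elastic collision of two bodies."""
    v1f = ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2)
    v2f = ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2)
    return v1f, v2f

# Hypothetical example: a 2 kg body at 3 m/s hits a stationary 1 kg body
v1f, v2f = elastic_collision(2, 3, 1, 0)
print(v1f, v2f)  # 1.0 4.0
```

Momentum (2·3 = 2·1 + 1·4) and kinetic energy (9 = 1 + 8) are both conserved, as they must be.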

In addition to having detailed information on brain anatomy, as in previously shown examples, Wolfram|Alpha also carries a good deal of information on cognitive tasks and related areas such as memory, language, attention, reasoning and perception tests.

You can also find information on individual types of neurons in the nervous system. Along with the location and description of a neuron, Wolfram|Alpha also includes useful data such as its synaptic properties and electrophysiological measurements:

Similarly, you can also look for brain areas by criteria like their functions. For example, querying “brain area associated with memory” brings up a list of associated brain areas, with each result listed linking to a separate page for that specific brain area.

Researching country-specific socioeconomic data is very simple. Type in what you want to know, followed by the country. If you are comparing several countries, separate the names with a comma. For example, try querying “GDP of Mozambique, Namibia, Mali” or like in this example, “unemployment rate Vietnam, Cambodia, Laos”.

The first thing Wolfram|Alpha displays is the interpretation of your input, followed by the direct results:

It then displays other relevant information, such as the unemployment rates for the three countries over time, the long-term unemployment rate (separated by sex), the unemployment rate by education, labor force by education and more:

If you want to check the data associated with a particular region of the US, Wolfram|Alpha can be especially helpful. You can find data like income, for example the highest per capita income by county:

Wolfram|Alpha has a great tool for calculating titrations. It can compute the volume of base, volume of acid, base concentration or acid concentration. All you need to do is plug in the corresponding variables, such as volume or concentration of base, etc. Wolfram|Alpha then computes the answer.
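For a simple monoprotic titration, the underlying relation at the equivalence point is Ca·Va = Cb·Vb. A minimal Python sketch with hypothetical values shows the calculation Wolfram|Alpha is automating:

```python
# Monoprotic acid-base titration at the equivalence point: Ca * Va = Cb * Vb
# Solve for the unknown acid concentration (all values hypothetical)
Vb = 25.0   # mL of base delivered
Cb = 0.10   # base concentration, mol/L
Va = 50.0   # mL of acid titrated
Ca = Cb * Vb / Va
print(Ca)  # 0.05 mol/L
```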

Wolfram|Alpha can be a helpful resource for looking up and comparing equations. For example, if you enter in “particle in a box”, a model fundamental to the field of quantum mechanics, the input results first show an interpretation of your query, followed by all the relevant equations for both classic and quantum mechanics. Wolfram|Alpha also displays a diagram of the potential energy.

Hybridization, or the mixing of atomic orbitals on an atomic center, is pretty hard to get the hang of. Wolfram|Alpha comes through as a great resource in this regard. Simply enter the name of the chemical compound plus “hybridization” and Wolfram|Alpha will display the hybridization on each atom in the molecule, the electron count and a diagram with the orbitals labeled:

Similarly, Wolfram|Alpha also has information about each of the bonds in a compound. If you type in the compound name plus “bonds” or “bond information”, Wolfram|Alpha will provide you with a convenient chart with each of the bonds, their bond count, bond length and energy in kJ/mol (although you can adjust the units if need be):

Calculating thermodynamics problems such as isothermal processes is made simple with Wolfram|Alpha. If you query “isothermal process”, a calculator tool with input tools for initial pressure, initial volume, initial temperature and final volume appears. Input the values corresponding to your needed calculation, press Compute and a table with the results for final pressure, entropy change, heat transferred to the system, final temperature and work done on the system is generated. Wolfram|Alpha also generates graphs of pressure vs. volume and temperature vs. entropy.
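The quantities in that results table all follow from the ideal gas law. Here is a minimal Python sketch of the same isothermal bookkeeping, assuming a reversible process on an ideal gas and using hypothetical input values:

```python
import math

# Reversible isothermal process of an ideal gas (SI units; values hypothetical).
# With T constant: P1*V1 = P2*V2, and n*R*T = P1*V1.
P1, V1, V2, T = 100000.0, 0.01, 0.02, 300.0  # Pa, m^3, m^3, K

P2 = P1 * V1 / V2                                  # final pressure: 50000 Pa
work_by_gas = P1 * V1 * math.log(V2 / V1)          # also the heat absorbed (dU = 0)
work_on_system = -work_by_gas                      # sign convention as in the table
entropy_change = P1 * V1 / T * math.log(V2 / V1)   # n*R*ln(V2/V1)

print(P2, round(work_by_gas, 1), round(entropy_change, 3))
```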

While I found these 15 applications of Wolfram|Alpha technology useful, my list only scratches the surface; Wolfram|Alpha has so much information and functionality to offer on subjects ranging from radioactive decay to classical mythology, to comparing nutrition information in a meal, calculating Bose–Einstein distributions, estimating how many of an object would fill a container, computing the luminosity of a star and many more. Whether you’re a student preparing for an upcoming exam or just a curious individual interested in researching a particular topic (perhaps the physics of running in the rain!), I highly recommend Wolfram|Alpha as an informational resource. It has a lot more to offer than just math-related information and solutions!

The sparse ruler problem has been famously worked on by Paul Erdős, Marcel J. E. Golay, John Leech, Alfréd Rényi, László Rédei and Solomon W. Golomb, among many others. The problem is this: what is the smallest subset of {0, 1, …, n} so that the unsigned pairwise differences of the subset give all values from 1 to n? One way to look at this is to imagine a blank yardstick. At what positions on the yardstick would you add 10 marks, so that you can measure any number of inches up to 36?

Another simple example is {0, 1, 3}, a subset of {0, 1, 2, 3} of size 3, which has differences 1, 2 and 3. The sets of size 2 have only one difference. The minimal subset is not unique; the differences of {0, 2, 3} also give 1, 2 and 3.

Part of what makes the sparse ruler problem so compelling is its embodiment in an object inside every schoolchild’s desk—and its enduring appeal lies in its deceptive simplicity. Read on to see precisely just how complicated rulers, marks and recipes can be.

First, let’s review the rules and terminology used in the sparse ruler problem. A subset S of the set {0, 1, …, n} *covers* {1, …, n} if every value from 1 to n occurs as an unsigned difference of two elements of S.

For example, what is the smallest subset of {0, 1, …, 13} that covers the set {1, …, 13}? The greatest number of differences for a subset of size 5 is (5 choose 2) = 10, which is not enough to get 13 values. But a subset of size 6, with (6 choose 2) = 15 differences, is large enough. In this case, the subset {0, 1, 6, 9, 11, 13} covers {1, …, 13}, and so the size of the smallest covering subset of {0, 1, …, 13} is at most 6.
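The counting argument above is easy to verify by brute force. Here is a Python sketch (for illustration alongside the Wolfram Language code below) that checks both claims:

```python
from itertools import combinations

def covers(subset, n):
    """True if the unsigned pairwise differences of subset include 1..n."""
    diffs = {b - a for a, b in combinations(sorted(subset), 2)}
    return diffs >= set(range(1, n + 1))

n = 13
# No size-5 subset of {0,...,13} covers 1..13: only C(5,2) = 10 differences exist.
assert not any(covers(s, n) for s in combinations(range(n + 1), 5))
# A size-6 subset suffices:
print(covers((0, 1, 6, 9, 11, 13), n))  # True
```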

Here are the differences using only {0, 1, 6, 9, 11, 13}:

{1, 13, 9, 13, 6, 6, 13, 9, 9, 11, 11, 13, 13} - {0, 11, 6, 9, 1, 0, 6, 1, 0, 1, 0, 1, 0}

Of the 15 differences, two are achieved twice: 2 = 11 − 9 = 13 − 11 and 5 = 6 − 1 = 11 − 6. Here is a way to list the pairs explicitly:

Column[SplitBy[ SortBy[Subsets[{0, 1, 6, 9, 11, 13}, {2}], Differences], Differences]]

Let’s try another way to calculate the set of differences:

Union@Abs@ Flatten@Outer[Subtract, {0, 1, 6, 9, 11, 13}, {0, 1, 6, 9, 11, 13}]

Of the subsets that cover {1, …, n}, let M_n be the size of a smallest subset (there may be more than one).

The following table summarizes the values of M_n for n = 1 to 12. Both {0, 1, 3} and {0, 2, 3} of size 3 cover {1, 2, 3}; note that {0, 2, 3} is the mirror image of {0, 1, 3} (each mark x replaced by 3 − x) after sorting:

Text@Grid[Prepend[Table[With[{ruler = SplitToRuler[sparsedata[[n]]]}, {ruler, Row[{"[", n, "]"}], Length[ruler]}], {n, 1, 12}], {"a smallest\n subset\n", "differences [n]", Row[{" the smallest\nsubset size ", Style[Subscript["M", "n"], Italic], "\n"}]}]]

In 1956, John Leech wrote “On the Representation of 1, 2, …, *n* by Differences,” which proved that for large n the number of minimal marks lies between sqrt(2.434 n) and sqrt(3.348 n).

There are a few terms and “rules” to keep in mind when discussing the sparse ruler problem:

- A subset of {0, 1, …, n} containing 0 and n is called an n-length *ruler*, and its elements are called *marks*. (The length can be dropped when it is understood.)
- An n-length ruler is *complete* if the distances between marks cover {1, …, n}.
- A *sparse* ruler (or *minimal complete* ruler) is a length-n complete ruler such that no length-n ruler with fewer marks exists. An example of a sparse ruler is {0, 1, 6, 9, 11, 13}.
- An *optimal* ruler is a sparse ruler where there is no longer sparse ruler with the same number of marks. An example is {0, 1, 2, 6, 9}. No longer sparse ruler with five marks exists.
- A *perfect* ruler is a sparse ruler where a longer sparse ruler with fewer marks *does not* exist. All sparse rulers of length less than 135 are perfect.
- A *nonperfect* ruler is a sparse ruler where a longer sparse ruler with fewer marks *does* exist.

This length-135 sparse ruler is nonperfect:

Length@{0, 1, 2, 3, 4, 5, 6, 65, 68, 71, 74, 81, 88, 95, 102, 109, 116, 123, 127, 131, 135}

This length-138 sparse ruler is optimal:

Length@{0, 1, 2, 3, 7, 14, 21, 28, 43, 58, 73, 88, 103, 111, 119, 127, 135, 136, 137, 138}

Here is an optimal length-50 sparse ruler with 12 marks (i.e. M_50 = 12). The list of positions of the marks is the ruler form:

ruler50 = {0, 1, 3, 6, 13, 20, 27, 34, 41, 45, 49, 50};

This visualizes the marks:

Graphics[{ Thickness[.005], Line[{{#, 1}, {#, 1.5}}] & /@ Range@50, Line[{{#, 1}, {#, 5}}] & /@ ruler50 }, Axes -> {True, False}, Ticks -> {ruler50, None}, ImageSize -> 520]

Let the differences between the marks be the *diff* form. Here is the diff form for `ruler50`:

Differences[ruler50]

In 1963, B. Wichmann wrote “A Note on Restricted Difference Bases,” in which he constructed many sparse rulers. The following code has his original *recipe* and a function to read the recipe:

originalwichmannrecipe = { {1, 1 + r, 1 + 2 r, 3 + 4 r, 2 + 2 r, 1}, {r, 1, r, s, 1 + r, r}};

WichmannRuler[recipe_, {x_, y_}] := Transpose[ Select[Transpose[recipe /. Thread[{r, s} -> {x, y}]], Min[#] > 0 &]]

With that, we can set up a function for Wichmann recipe #1:

Subscript[W, 1][r_, s_] := WichmannRuler[originalwichmannrecipe, {r, s}];

There are thousands of Wichmann recipes. Here’s the second:

WichmannRecipes[[2]]

Here’s a function for Wichmann recipe #2:

Subscript[W, 2][r_, s_] := WichmannRuler[WichmannRecipes[[2]], {r, s}];

Here, r and s in the recipes are replaced by 1 and 5, respectively. These representations are examples of the *split* form of a sparse ruler:

Column[{Subscript[W, 1][1, 5], Subscript[W, 2][1, 5]}]

We can use these functions to convert among the three forms of sparse ruler:

DiffToRuler[diff_] := FoldList[Plus, 0, diff]

DiffToSplit[diff_] := {First /@ Split[diff], Length /@ Split[diff]}

SplitToDiff[split_] := Flatten[Table[#[[1]], {#[[2]]}] & /@ Transpose[split]]

SplitToRuler[split_] := DiffToRuler[SplitToDiff[split]]

RulerToSplit[ruler_] := DiffToSplit[Differences[ruler]]
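For readers following along outside the Wolfram Language, the same three forms can be mirrored in a short Python sketch (`accumulate` plays the role of `FoldList`, `groupby` the role of `Split`):

```python
from itertools import accumulate, groupby

def diff_to_ruler(diff):
    """Cumulative sums starting at 0: diff form -> ruler form."""
    return list(accumulate(diff, initial=0))

def diff_to_split(diff):
    """Run-length encode the diff form: diff form -> split form."""
    runs = [(d, len(list(g))) for d, g in groupby(diff)]
    return [d for d, _ in runs], [c for _, c in runs]

ruler50 = [0, 1, 3, 6, 13, 20, 27, 34, 41, 45, 49, 50]
diff = [b - a for a, b in zip(ruler50, ruler50[1:])]
print(diff)                       # [1, 2, 3, 7, 7, 7, 7, 7, 4, 4, 1]
print(diff_to_split(diff))        # ([1, 2, 3, 7, 4, 1], [1, 1, 1, 5, 2, 1])
print(diff_to_ruler(diff) == ruler50)  # True
```

The split form printed here matches the compact form 1^1 2^1 3^1 7^5 4^2 1^1 shown in the table below.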

Here are the diff forms for both W_1[1, 5] and `ruler50` from above; we can see from their identical outputs that they are in fact the same ruler:

SplitToDiff[Subscript[W, 1][1, 5]]

Differences[ruler50]

The diff form can be used to remake the ruler:

DiffToRuler[%]

Here is the split form again:

Subscript[W, 1][1, 5]

The split form can be written compactly and compared to Wichmann’s recipe with r = 1, s = 5:

TraditionalForm@ Grid[{HoldForm[#1^#2] & @@@ First@*Tally /@ Split@Differences[ruler50], HoldForm[#1^#2] & @@@ Transpose[originalwichmannrecipe]}, Frame -> All]

This Wichmann ruler is one of an infinite list of Wichmann rulers. The two length-57 sparse rulers in the table below show that different recipes and parameters can produce rulers of the same length:

Text@Grid[{{"length", "marks", "recipe", Style["r", Italic], Style["s", Italic]}, {50, 12, "\!\(\*SuperscriptBox[\(1\), \(1\)]\) \!\(\*SuperscriptBox[\(2\), \ \(1\)]\) \!\(\*SuperscriptBox[\(3\), \(1\)]\) \ \!\(\*SuperscriptBox[\(7\), \(5\)]\) \!\(\*SuperscriptBox[\(4\), \ \(2\)]\) \!\(\*SuperscriptBox[\(1\), \(1\)]\)", 1, 5}, {57, 13, "\!\(\*SuperscriptBox[\(1\), \(1\)]\) \!\(\*SuperscriptBox[\(2\), \ \(1\)]\) \!\(\*SuperscriptBox[\(3\), \(1\)]\) \ \!\(\*SuperscriptBox[\(7\), \(6\)]\) \!\(\*SuperscriptBox[\(4\), \ \(2\)]\) \!\(\*SuperscriptBox[\(1\), \(1\)]\)", 1, 6}, {57, 13, "\!\(\*SuperscriptBox[\(1\), \(2\)]\) \!\(\*SuperscriptBox[\(3\), \ \(1\)]\) \!\(\*SuperscriptBox[\(5\), \(2\)]\) \ \!\(\*SuperscriptBox[\(11\), \(2\)]\) \!\(\*SuperscriptBox[\(6\), \(3\ \)]\) \!\(\*SuperscriptBox[\(1\), \(2\)]\)", 2, 2}, {90, 16, "\!\(\*SuperscriptBox[\(1\), \(2\)]\) \!\(\*SuperscriptBox[\(3\), \ \(1\)]\) \!\(\*SuperscriptBox[\(5\), \(2\)]\) \ \!\(\*SuperscriptBox[\(11\), \(5\)]\) \!\(\*SuperscriptBox[\(6\), \(3\ \)]\) \!\(\*SuperscriptBox[\(1\), \(2\)]\)", 2, 5}, {93, 17, "\!\(\*SuperscriptBox[\(1\), \(3\)]\) \!\(\*SuperscriptBox[\(4\), \ \(1\)]\) \!\(\*SuperscriptBox[\(7\), \(3\)]\) \ \!\(\*SuperscriptBox[\(15\), \(2\)]\) \!\(\*SuperscriptBox[\(8\), \(4\ \)]\) \!\(\*SuperscriptBox[\(1\), \(3\)]\)", 3, 2}, {101, 17, "\!\(\*SuperscriptBox[\(1\), \(2\)]\) \!\(\*SuperscriptBox[\(3\), \ \(1\)]\) \!\(\*SuperscriptBox[\(5\), \(2\)]\) \ \!\(\*SuperscriptBox[\(11\), \(6\)]\) \!\(\*SuperscriptBox[\(6\), \(3\ \)]\) \!\(\*SuperscriptBox[\(1\), \(2\)]\)", 2, 6}, {108, 18, "\!\(\*SuperscriptBox[\(1\), \(3\)]\) \!\(\*SuperscriptBox[\(4\), \ \(1\)]\) \!\(\*SuperscriptBox[\(7\), \(3\)]\) \ \!\(\*SuperscriptBox[\(15\), \(3\)]\) \!\(\*SuperscriptBox[\(8\), \(4\ \)]\) \!\(\*SuperscriptBox[\(1\), \(3\)]\)", 3, 3}, {112, 18, "\!\(\*SuperscriptBox[\(1\), \(2\)]\) \!\(\*SuperscriptBox[\(3\), \ \(1\)]\) \!\(\*SuperscriptBox[\(5\), \(2\)]\) \ \!\(\*SuperscriptBox[\(11\), \(7\)]\) 
\!\(\*SuperscriptBox[\(6\), \(3\ \)]\) \!\(\*SuperscriptBox[\(1\), \(2\)]\)", 2, 7}, {123, 19, "\!\(\*SuperscriptBox[\(1\), \(2\)]\) \!\(\*SuperscriptBox[\(3\), \ \(1\)]\) \!\(\*SuperscriptBox[\(5\), \(2\)]\) \ \!\(\*SuperscriptBox[\(11\), \(8\)]\) \!\(\*SuperscriptBox[\(6\), \(3\ \)]\) \!\(\*SuperscriptBox[\(1\), \(2\)]\)", 2, 8}, {123, 19, "\!\(\*SuperscriptBox[\(1\), \(3\)]\) \!\(\*SuperscriptBox[\(4\), \ \(1\)]\) \!\(\*SuperscriptBox[\(7\), \(3\)]\) \ \!\(\*SuperscriptBox[\(15\), \(4\)]\) \!\(\*SuperscriptBox[\(8\), \(4\ \)]\) \!\(\*SuperscriptBox[\(1\), \(3\)]\)", 3, 4}, {138, 20, "\!\(\*SuperscriptBox[\(1\), \(3\)]\) \!\(\*SuperscriptBox[\(4\), \ \(1\)]\) \!\(\*SuperscriptBox[\(7\), \(3\)]\) \ \!\(\*SuperscriptBox[\(15\), \(5\)]\) \!\(\*SuperscriptBox[\(8\), \(4\ \)]\) \!\(\*SuperscriptBox[\(1\), \(3\)]\)", 3, 5}, {153, 21, "\!\(\*SuperscriptBox[\(1\), \(3\)]\) \!\(\*SuperscriptBox[\(4\), \ \(1\)]\) \!\(\*SuperscriptBox[\(7\), \(3\)]\) \ \!\(\*SuperscriptBox[\(15\), \(6\)]\) \!\(\*SuperscriptBox[\(8\), \(4\ \)]\) \!\(\*SuperscriptBox[\(1\), \(3\)]\)", 3, 6}, {168, 22, "\!\(\*SuperscriptBox[\(1\), \(3\)]\) \!\(\*SuperscriptBox[\(4\), \ \(1\)]\) \!\(\*SuperscriptBox[\(7\), \(3\)]\) \ \!\(\*SuperscriptBox[\(15\), \(7\)]\) \!\(\*SuperscriptBox[\(8\), \(4\ \)]\) \!\(\*SuperscriptBox[\(1\), \(3\)]\)", 3, 7}, {183, 23, "\!\(\*SuperscriptBox[\(1\), \(3\)]\) \!\(\*SuperscriptBox[\(4\), \ \(1\)]\) \!\(\*SuperscriptBox[\(7\), \(3\)]\) \ \!\(\*SuperscriptBox[\(15\), \(8\)]\) \!\(\*SuperscriptBox[\(8\), \(4\ \)]\) \!\(\*SuperscriptBox[\(1\), \(3\)]\)", 3, 8}, {198, 24, "\!\(\*SuperscriptBox[\(1\), \(3\)]\) \!\(\*SuperscriptBox[\(4\), \ \(1\)]\) \!\(\*SuperscriptBox[\(7\), \(3\)]\) \ \!\(\*SuperscriptBox[\(15\), \(9\)]\) \!\(\*SuperscriptBox[\(8\), \(4\ \)]\) \!\(\*SuperscriptBox[\(1\), \(3\)]\)", 3, 9}, {213, 25, "\!\(\*SuperscriptBox[\(1\), \(4\)]\) \!\(\*SuperscriptBox[\(5\), \ \(1\)]\) \!\(\*SuperscriptBox[\(9\), \(4\)]\) \ \!\(\*SuperscriptBox[\(19\), \(6\)]\) 
\!\(\*SuperscriptBox[\(10\), \ \(5\)]\) \!\(\*SuperscriptBox[\(1\), \(4\)]\)", 4, 6}}] |

Next is the length-58 optimal ruler showing that M_58 = 13. Using brute force, this is provable. In 2011, Peter Luschny conjectured that the length-58 optimal ruler is the largest optimal ruler that does not use Wichmann’s recipe.

Text@Grid[Transpose[{{"split", "diff", "ruler", "\!\(\* StyleBox[SubscriptBox[\"M\", \"n\"],\nFontSlant->\"Italic\"]\)"}, {#, SplitToDiff[#], SplitToRuler[#], Length[SplitToRuler[#]]} &@sparsedata[[58]]}], Frame -> All]

In 2014, Arch D. Robison wrote “Parallel Computation of Sparse Rulers,” in which months of computer time on 256 Intel cores were spent calculating the 106,535 sparse rulers up to length 213. Part of this run proved the existence of a length-135 nonperfect ruler.

So while we have identified all the sparse rulers up to length 213, we only have candidates beyond length 213. For the rest of this blog post, “conjectured sparse ruler” means a complete ruler with length greater than 213 and the minimal known number of marks. Above length 213, no sparse rulers have been proven minimal. Length 214 has the first conjectured sparse ruler:

Text@Grid[{{"minimal?", "length", "marks", "compact split form"}, {"proven", 213, 25, "\!\(\*SuperscriptBox[\(1\), \(4\)]\) \!\(\*SuperscriptBox[\(5\), \ \(1\)]\) \!\(\*SuperscriptBox[\(9\), \(4\)]\) \ \!\(\*SuperscriptBox[\(19\), \(6\)]\) \!\(\*SuperscriptBox[\(10\), \ \(5\)]\) \!\(\*SuperscriptBox[\(1\), \(4\)]\)"}, {"conjectured", 214, 26, "\!\(\*SuperscriptBox[\(1\), \(5\)]\) \ \!\(\*SuperscriptBox[\(5\), \(1\)]\) \!\(\*SuperscriptBox[\(9\), \ \(4\)]\) \!\(\*SuperscriptBox[\(19\), \(6\)]\) \ \!\(\*SuperscriptBox[\(10\), \(5\)]\) \!\(\*SuperscriptBox[\(1\), \(4\ \)]\)"}}, Frame -> All] |

Robison’s run required 1.5 computer years to verify length 213. Computationally verifying length 214 would require 3 computer years using current methods. Adding a single mark doubles the computational difficulty of verifying minimality with currently known methods.

You may have heard of sparse rulers, Golomb rulers and difference sets. How do these relate to each other?

- In a sparse ruler, all differences must be covered, but they can be repeated.
- In a Golomb ruler, differences can be missed, but none can be repeated.
- In a difference set, all modular distances must be covered, and none can be repeated.
- In 1967, Robinson and Bernstein predicted the best 24-mark Golomb ruler.
- In 1984, Atkinson and Hassenklover made predictions of the best Golomb rulers with under 100 marks.
- In 2014, distributed.net proved that the predicted Golomb rulers with 24 to 27 marks were indeed correct using thousands of years of computer time. The Robison run verified that predicted behavior for optimal rulers was correct up to 25 marks.
- In 2016, Rokicki and Dogon made predictions of the best Golomb rulers with under 40000 marks.

In 2019, I devised a formula that expresses the *excess* E of a complete ruler in terms of the length n and the number of minimal marks M_n; here, Round is the rounding function:

E = M_n − Round[Sqrt[3 n + 9/4]].

For the first 50 lengths, the excess is 0. Then M_51 = 13 while Round[Sqrt[3 × 51 + 9/4]] = 12, so the excess at length 51 is 1.

{12 - Round[Sqrt[3 50 + 9/4]], 13 - Round[Sqrt[3 51 + 9/4]]}

The excess formula produces the exact number of minimal marks for sparse rulers up to length 213, with two lines of code. In the On-Line Encyclopedia of Integer Sequences (OEIS), this list of the number of minimal marks for a sparse ruler is sequence A046693:

A308766[n_] := If[MemberQ[{51, 59, 69, 113, 124, 125, 135, 136, 139, 149, 150, 151, 164, 165, 166, 179, 180, 181, 195, 196, 199, 209, 210, 211}, n], 1, 0]; A046693 = Table[Round[Sqrt[3 n + 9/4]] + A308766[n], {n, 213}]
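For comparison, here is an equivalent Python sketch of the same construction, with the same exception list; the printed values match the mark counts quoted in this post for lengths 50, 51 and 213 (note that Python's `round`, like the Wolfram Language's `Round`, rounds halves to the nearest even integer):

```python
from math import sqrt

# Lengths where the excess is 1 rather than 0 (OEIS A308766, up to 213)
EXCEPTIONS = {51, 59, 69, 113, 124, 125, 135, 136, 139, 149, 150, 151,
              164, 165, 166, 179, 180, 181, 195, 196, 199, 209, 210, 211}

def marks(n):
    """Minimal number of marks for a length-n sparse ruler (valid to n = 213)."""
    return round(sqrt(3 * n + 9 / 4)) + (1 if n in EXCEPTIONS else 0)

print(marks(50), marks(51), marks(213))  # 12 13 25
```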

Based on the sparse rulers and conjectured sparse rulers to length 2020, the excess seems to be a chaotic sequence of 0s and 1s:

ListPlot[Take[rulerexcess, 2020], Joined -> True, AspectRatio -> 1/30, Axes -> False, ImageSize -> 520]

If Luschny’s conjecture is correct, then the lowest possible excess is 0 and all conjectured sparse rulers are minimal.

Without rounding, a plot of the best-known number of minimal marks minus Sqrt[3 n + 9/4] shows some distinct patterns up to length 2020. Some points seem to float above and break the pattern, which makes their minimality questionable:

unroundedexcess = Table[{n, Round[Sqrt[3 n + 9/4]] + rulerexcess[[n]] - Sqrt[3 n + 9/4]}, {n, 1, 2020}]; ListPlot[unroundedexcess, AspectRatio -> 1/4, ImageSize -> {520, 130}]

Here are lengths of currently conjectured sparse rulers that break the pattern:

First /@ Select[unroundedexcess, #[[2]] > 1 &]

Here is a plot of the verified number of minimal marks up to length 213:

ListPlot[A046693, AspectRatio -> 1/4, ImageSize -> 520]

Robison discovered that the sequence is not strictly increasing, as seen by the dips. Where do these dips occur?

Flatten[Position[Differences[A046693], -1]]

How are they spaced?

Differences[%]

In the previous table, the last six listed Wichmann rulers had these lengths:

{138, 153, 168, 183, 198, 213};

These coincide with the positions of the dips:

{136, 151, 166, 181, 196, 211} + 2

We can plot and compare Leech’s bounds for the number of minimal marks to the actual number of minimal marks:

ListPlot[{Table[Sqrt[2.434 n], {n, 1, 213}], Table[A046693[[n]], {n, 1, 213}], Table[Sqrt[3.348 n], {n, 1, 213}]}, AspectRatio -> 1/5]

The furthest values in the lines of dots are almost always lengths of optimal Wichmann rulers, with the last known exception being length 58. We saw that some of the lengths of optimal Wichmann rulers were 138, 153, 168, 183, 198 and 213. Let us call these *Wichmann values*. These lengths (A289761) are given by:

WichmannValues = Table[(n^2 - (Mod[n, 6] - 3)^2)/3 + n, {n, 1, 24}]
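As a quick check, here is the same formula in a Python sketch; the last six values reproduce the Wichmann ruler lengths 138 through 213 from the earlier table:

```python
# Wichmann values (A289761): lengths of certain optimal Wichmann rulers
def wichmann_value(n):
    # (n^2 - (n mod 6 - 3)^2) is always divisible by 3, so // is exact
    return (n * n - (n % 6 - 3) ** 2) // 3 + n

values = [wichmann_value(n) for n in range(1, 25)]
print(values[-6:])  # [138, 153, 168, 183, 198, 213]
```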

Here I arrange the numbers up to 213 so that the bottom of each column is a Wichmann value. Under the blue line is the number of marks associated with the column. This is a numeric representation of the *excess pattern*:

Excess-0 values are gray.

Excess-1 values are bold black.

Grid[Append[ Transpose[Table[PadLeft[Take[Style[If[rulerexcess[[#]] == 1, Style[#, Black, Bold, 16], Style[#, Gray, 14]]] & /@ Range[213], {WichmannValues[[n]] + 1, WichmannValues[[n + 1]]}], 15, ""], {n, 24 - 1}]], Range[3, 25]], Spacings -> {.2, .2}, Dividers -> {False, -2 -> Blue}]

For convenience, I’ll use various terms relating to the excess pattern:

- A *column* is a column in the excess pattern.
- C_m is the column for rulers with m marks.
- The *height* is the placement within a column; Wichmann values have height 0.
- The *rise* is the number of entries in a column; C_25 has 15 entries.
- The *excess coordinate* is the (column, height) pair for a length within the excess pattern.
- The *excess fraction* (a normalized form) is height/rise for a length.
- A *window* is a contiguous set of excess-0 rulers within a column.
- A *mullion* is a contiguous set of excess-1 rulers within a column (a space between windows is a mullion).

A few sample values for rulers in columns 19 and 25:

Text@Grid[ Transpose[ Prepend[Flatten[{#, ExcessCoordinates[#]}] & /@ {114, 116, 120, 122, 200, 202, 204, 206, 208, 210, 212, 213}, {"length", "column", "height", "fraction", "rise"}]], Frame -> All] |

Here is the excess pattern of the best-known excess values for lengths up to 10501. Excess 0 is gray, excess 1 is black. This is a pixel representation of the excess pattern:

ArrayPlot[Transpose[Table[ PadLeft[ First /@ Take[Transpose[{Take[rulerexcess, 10501], Range[10501]}], {WichmannValues[[n]] + 1, WichmannValues[[n + 1]]}], 119, 2], {n, 1, 175}]], ColorRules -> {0 -> LightGray, 1 -> Black, 2 -> White}, PixelConstrained -> 3, Frame -> False] |

The creator of the OEIS, N. J. A. Sloane, describes this pattern as “Dark Satanic Mills on a Cloudy Day.” The solid black part of the pattern with many windows forms the dark mills, and the irregular patches above are the clouds.

We calculate these coordinates for various lengths:

coordinated = Table[ xy = ExcessCoordinates[n]; col = Switch[xy[[2]], 1/ 2, {RGBColor[0, 1, 1], RGBColor[0.5, 0, 0.5]}, 1/ 4, {RGBColor[1, 1, 0], RGBColor[0, 1, 0]}, 3/ 4, {RGBColor[1, 0, 0], RGBColor[0, 0, 1]}, _, {GrayLevel[0.9], GrayLevel[0]} ]; {col[[rulerexcess[[n]] + 1]], xy[[1]], {xy[[1, 1]]/120, 1 - xy[[2]]}, xy[[3]]}, {n, 1, 10501}]; |

Here is the excess pattern of the best-known excess values for lengths up to 10501, this time in color.

Some colors mark exact excess fractions: fraction 1/2 is cyan (excess 0) or purple (excess 1), 1/4 is yellow or green, and 3/4 is red or blue:

Row[{Graphics[{{#[[1]], Rectangle[#[[2]]]} & /@ coordinated}, ImageSize -> {480, 318}, PlotRange -> {{3, 159}, {0, 106}}]}] |

This plots the excess fractions. The excess fractions 1/4, 1/2 and 3/4 occur in each column; they are the colored horizontal lines:

Graphics[{{#[[1]], Point[#[[3]]]} & /@ coordinated}, ImageSize -> {480, 318}] |

Here’s a version of the excess pattern where the excess fraction 1/2 makes a horizontal line. A crushed version of the normalized excess fraction pattern is shown on the right.

Some colors have exact excess fractions, as before:

Row[{Graphics[{{#[[1]], Rectangle[#[[2]] + {0, 53 - Round[#[[4]]/2]}]} & /@ coordinated}, ImageSize -> {480, 318}, PlotRange -> {{3, 159}, {0, 106}}], Graphics[{AbsolutePointSize[.1], {#[[1]], Point[#[[3]]/{10, 1}]} & /@ coordinated}, ImageSize -> {Automatic, 320}]}] |

For the following diagram, on the left are columns *C*_{68} to *C*_{73} with cells representing lengths 1516 to 1797.

On the right is the normalized excess fraction:

farey = Select[FareySequence[24], MemberQ[{1, 2, 3, 4, 6, 8, 12, 24}, Denominator[#]] &]; took = Take[coordinated, 8479]; Row[{Graphics[{EdgeForm[Black], Table[ Tooltip[{coordinated[[k, 1]], Rectangle[coordinated[[k, 2]]]}, {k, coordinated[[k, 2]], sparsedata[[k]]}], {k, 1516, 1797}], Arrowheads[Medium], Arrow[{{73, 46} + {5, 1/2}, {73, 46} + {1/2, 1/2}}], Text[Row[{"Top of column ", Subscript[Style["C", Italic], "73"], " with 73 marks, length 1751"}], {73, 46} + {5, 1/2}, {Left, Center}], Arrow[{{73, 0} + {5, 1/2}, {73, 0} + {1/2, 1/2}}], Text[Row[{"Bottom of column ", Subscript[Style["C", Italic], "73"], " with 73 marks, length 1797"}], {73, 0} + {5, 1/2}, {Left, Center}], Arrow[{{71, 17} + {5, 1/2}, {71, 17} + {1/2, 1/2}}], Text[Row[{"Window in column ", Subscript[Style["C", Italic], "71"], ", 1686 and 1687"}], {71, 17} + {5, 1/2}, {Left, Center}], Line[{{73, 24} + {3, 1/2}, {73, 24} + {5, 1/2}}], Arrow[{{73, 24} + {3, 1/2}, {73, 27} + {1/2, 1/2}}], Arrow[{{73, 24} + {3, 1/2}, {73, 21} + {1/2, 1/2}}], Text[Row[{"Window in column ", Subscript[Style["C", Italic], "73"], ", 1770 to 1776"}], {73, 24} + {5, 1/2}, {Left, Center}], Line[{{73, 12} + {3, 1/2}, {73, 12} + {5, 1/2}}], Arrow[{{73, 12} + {3, 1/2}, {73, 20} + {1/2, 1/2}}], Arrow[{{73, 12} + {3, 1/2}, {73, 2} + {1/2, 1/2}}], Text[Row[{"Mullion in column ", Subscript[Style["C", Italic], "73"], ", 1777 to 1795"}], {73, 12} + {5, 1/2}, {Left, Center}] }, ImageSize -> {380, 420}], Graphics[{AbsolutePointSize[.01], {#[[1]], Point[#[[3]]/{10, 1}]} & /@ coordinated, Style[ Text[Row[{Numerator[#], "/", Denominator[#]}], {-.03, 1 - #}], 10] & /@ farey}, AspectRatio -> 4, ImageSize -> {115, 420}]}, Alignment -> {Bottom, Bottom}] |

In the normalized excess pattern:

- 0 to 1/4 has mostly excess 0
- 1/4 to 1/2 has a chaotic pattern
- 1/2 to 1 has mostly excess 1

Various sequences from OEIS:

A004137: maximal number of edges in a graceful graph on *n* nodes

A046693: minimal number of marks for a sparse ruler of length *n*

A103300: number of perfect rulers with length *n*

A289761: maximum length of an optimal Wichmann ruler with *n* marks

A308766: lengths of sparse rulers with excess 1

A309407: round(sqrt(3*n* + 9/4))

A326499: excess of a length-*n* sparse ruler

You can also check out the “Sparse Rulers” Demonstration, which has thousands of these sparse rulers:

Producing two million sparse rulers required over two thousand Wichmann-like rulers: construction recipes that all work for arbitrarily large values. Substituting *r* and *s* values into a Wichmann recipe is computationally easy:

The excess of a length-*n* sparse ruler with a minimal number of marks *m* is *E* = *m* − round(sqrt(3*n* + 9/4)).
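The excess used throughout is the mark count minus round(sqrt(3 × length + 9/4)) (sequence A309407); as a one-line helper, this is the same expression used later when new rulers update `rulerexcess`:

```mathematica
(* excess of a ruler with the given length and number of marks *)
excess[length_, marks_] := marks - Round[Sqrt[3 length + 9/4]]

excess[213, 25] (* a 25-mark ruler of length 213 has excess 0 *)
```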

**Sparse ruler conjecture: E = 0 or 1 for all sparse rulers.**

Finding sparse rulers with *E* = 0 or 1 for all lengths under 257992 is difficult and likely couldn’t have been done without current-era computers. Finding longer-length sparse rulers turns out to be easy, and could have been done back in 1963 with the following simple proof.

is the split form of Wichmann recipe 1, or .

is , , : or in the diff form.

is an *extension*. A sparse ruler starting with *k* 1s in the diff form can be extended by up to *k* + 1 with an extra mark at the end. This new ruler looks like . The new lengths above are handled by differences , and . Note that is not a sparse ruler since the length cannot be expressed as a difference.
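The mechanics of an extension amount to a single `Append` (the improvement code later in the post uses the same idiom). This sketch shows only the operation; whether the result is still a complete ruler has to be checked for each new length:

```mathematica
(* add one mark n units past the end of a ruler given as mark positions *)
extend[ruler_List, n_Integer] := Append[ruler, Last[ruler] + n]

extend[{0, 1, 2, 3, 7, 11, 15, 17, 18, 19}, 3] (* -> {0, 1, 2, 3, 7, 11, 15, 17, 18, 19, 22} *)
```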

The “Wichmann Columns” Demonstration generates a column in the excess pattern by using only sparse rulers made by the first two Wichmann recipes, *W*_{1} and *W*_{2}, and extensions of these rulers.

Red indicates that a sparse ruler cannot be generated by *W*_{1}, *W*_{2} or by extending them.

Blue indicates a generated sparse ruler with excess 0.

Green indicates a generated sparse ruler with excess 1.

We can see in the following `Manipulate` that a length cannot be covered by this method in the excess pattern column representing sparse rulers with 236 marks. Adjust the slider or hover over a value to get a `Tooltip` with the generated sparse ruler:

Red pixels show where extensions don’t solve in the excess pattern:

pixels = Table[ Reverse[PadRight[Switch[#[[2]], RGBColor[0, 0, 1], 1, RGBColor[0, Rational[2, 3], 0], 2, RGBColor[1, 0, 0], 3] & /@ Reverse[First /@ WichmannColumn[k][[1, 1, 1]]], 600]], {k, 2, 895}]; ArrayPlot[Transpose[Drop[pixels, 363]], PixelConstrained -> 1, Frame -> False, ColorRules -> {0 -> White, 1 -> LightGray, 2 -> Gray, 3 -> Red}] |

Lengths of sparse rulers generated by *W*_{1} and *W*_{2} are given by order-2 polynomials differing by 1. The behavior of values generated by these polynomials is completely predictable, and ultimately produces two weird sequences: `sixsev` and `sixfiv`:

Text@Grid[Prepend[{Subscript[Style["W", Italic], #], WichmannLength[WichmannRecipes[[#]]], WichmannMarks[WichmannRecipes[[#]]]} & /@ {1, 2}, {"recipe", "length", "marks"}], Frame -> All] |

Sequence `sixsev` is an infinite sequence of 6s and 7s. Similarly, the sequence `sixfiv` consists entirely of 6s and 5s:

cutoff = 15; (*raise the cutoff to go farther*) sixsev = Drop[ Flatten[Table[Table[{Table[6, {n}], 7}, {6}], {n, 0, cutoff}]], 1]; sixfiv = Drop[ Flatten[Table[Table[{Table[6, {n}], 5}, {6}], {n, 0, cutoff}]], 2]; |

Column[{Take[sixsev, 80], Take[sixfiv, 80]}] |

What are the *r* and *s* values for the *W*_{1} and *W*_{2} recipes in column 236 with excess 0? What are the Wichmann recipe column zeros (WRCZ)? Code for `WRCZ`, based on `sixsev` and `sixfiv`, is shown in the initialization section in the downloadable notebook. Column 236 in the excess pattern has seven sets of {*r*, *s*} values. The height of a column is roughly (2/3) × column, or 159 in this case. The average possible extension is roughly a quarter of the column height:

WRCZ[236] |

In the excess pattern, each column divides into quarter sections with the same size as the extension lengths of *W*_{1} and *W*_{2}. If we can show that eventually there are at least *four* reasonably spaced *W*_{1} and *W*_{2} zeros in each column, we’re done:

Row[{Graphics[{{#[[1]], Rectangle[#[[2]]]} & /@ coordinated}, ImageSize -> {480, 318}, PlotRange -> {{3, 159}, {0, 106}}]}] |

The last column in the excess pattern without four reasonably spaced *W*_{1} and *W*_{2} zeros is column 880:

WRCZ[880] |

Here are the lengths generated by these pairs:

zero880 = (3 + 8 r + 4 r^2 + 3 s + 4 r s) /. {r -> #[[1]], s -> #[[2]]} & /@ WRCZ[880] |

Notice how the generated lengths for this column are palindromic, a worst-case scenario. Length 257992 isn’t covered by the zeros here, and is out of reach of the last zero in the previous column.

The acceleration of change between values generated by the polynomial is a constant −24. The spacing between zeros is predictable:

Differences[Differences[zero880]] |

Only four reasonably spaced zeros are needed per column. The polynomial inexorably offers more and more zeros. Column 880 is the last column where extensions can fail:

ListPlot[Table[Length[WRCZ[n]], {n, 50, 3000}]] |

Another plot showing that extensions overwhelm the differences:

ListPlot[Table[(WRCZ[k][[1, 1]] + 2) - Max[ Differences[ Union[3 + 8 r + 4 r^2 + 3 s + 4 r s /. Thread[{r, s} -> #] & /@ WRCZ[k]]]] , {k, 50, 2050}], Joined -> True] |

All integer lengths greater than 257992 (corresponding to 880 marks) have excess-0 or excess-1 rulers made by extensions to Wichmann recipe 1.

All integer lengths greater than 119206 (corresponding to 598 marks) have excess-0 or excess-1 rulers made by extensions and double extensions to Wichmann recipe 1. Here’s an example double extension that covers length 257992:

SplitExtensions[ Last[SplitExtensions[ WichmannRuler[WichmannRecipes[[1]], {146, 292}]]]][[6]] |

Dot @@ % |

We can programmatically verify the conjecture with precalculated rulers to length 2020, or to length 257992 with more running time. This tallies the number of 0s and 1s for the excess, up to length 2020:

Tally[Sparseness[SplitToRuler[#]] & /@ Take[sparsedata, 2020]] |

I knew Robison found rulers to length 213, so I wanted to show samples. But except for the counts, all the ruler data was lost. I rebuilt it, but without access to the Intel superclusters. This search started with trying to make an image showing a row of column-presented sparse rulers from length 1 to length 213.

First, here are sparse rulers up to length 36 with the mark positions converted into pixel positions. The gray rows indicate that the sparse ruler for that length is unique:

Row[{Style[ Column[Table[SplitToRuler[sparsedata[[n]]], {n, 1, 36}], Alignment -> Right], 8], ArrayPlot[Table[PadRight[ReplacePart[Table[0, {n + 1}], ({# + 1} & /@ SplitToRuler[sparsedata[[n]]]) -> 1], 37] + If[counts[[n]] == 1, 2, 0], {n, 1, 36}], PixelConstrained -> 11, ColorRules -> {0 -> White, 1 -> Black, 2 -> GrayLevel[.9], 3 -> Black }, Frame -> False]}] |

The following plot is a transpose of the previous plot, extended to a length of 213. Each column represents a sparse ruler, with gray columns indicating uniqueness. These columns line up with the log plot after the next paragraph:

Row[{Spacer[20], ArrayPlot[ Transpose[Reverse /@ Table[PadRight[ReplacePart[Table[0, {n + 1}], ({# + 1} & /@ SplitToRuler[sparsedata[[n]]]) -> 1], 215] + If[counts[[n]] == 1, 2, 0], {n, 1, 213}]], PixelConstrained -> 2, ColorRules -> {0 -> White, 1 -> Black, 2 -> GrayLevel[.9], 3 -> Black }, Frame -> False]}] |

And here is a log plot of the number of distinct sparse rulers of each length up to 213, which shows that there are usually fewer excess-0 (blue) rulers and more excess-1 (brown) rulers. Points on the bottom correspond to unique rulers (and a gray column in the previous image):

ListPlot[Take[#, 2] & /@ # & /@ GatherBy[Transpose[{Range[213], Log /@ Take[counts, 213], Take[rulerexcess, 213]}], Last], ImageSize -> 450] |

Length has 15990 distinct sparse rulers. These counts are sequence A103300. Out of the first 213 lengths, 31 of them have a unique sparse ruler. I suspect many lengths above 213 have unique or hard-to-find minimal representations.

The following log plot shows the number of distinct sparse rulers and conjectured sparse rulers of each length up to 10501, found in the search that produced 2,016,735 sparse rulers and conjectured sparse rulers:

ListPlot[Take[#, 2] & /@ # & /@ GatherBy[Transpose[{Range[10501], Log /@ Take[counts, 10501], Take[rulerexcess, 10501]}], Last]] |

In the downloadable notebook I show many ways to use a sparse ruler to generate new sparse rulers, which can in turn make more sparse rulers. I call this process recursion. Processing shorter-length rulers gave better results and needed less time, so rulers of a length above 4000 were initially not used to produce more rulers. After cracking the particularly hard length 1792, I extended the new ruler processing to length 7000 in hopes of finding an example of length 5657. After checking to 10501, I temporarily stopped the search.

Various regularities and patterns can be seen, but part of the change in pattern is due to arbitrary cutoffs in processing at 4000 and 7000. One curious case is a length with 363 rulers; nearby is a length with 3619 rulers. If an excess-2 sparse ruler exists, the first clue will likely be a length with an unusually high count of examples.

An infinite number of complete rulers can be made using all 2069 Wichmann-like recipes. How well does the catalog of Wichmann recipes work? To find out, I tried the following overnight run:

addedrulers=Table[With[{wich=FindWichmann[hh][[1,1]]}, WichmannRuler[WichmannRecipes[[wich[[1]]]], wich[[3]]]], {hh,10520,17553}] |

How do these 7033 new complete rulers match up with the pattern? About 6448 rulers match the previous pattern well. About 587 rulers appear to be violations:

ArrayPlot[ Transpose[Drop[Table[PadLeft[Take[Take[oldrulerexcess, 17553], {WichmannValues[[n]] + 1, WichmannValues[[n + 1]]}], 151, 6], {n, 1, 227}], 92]], ColorRules -> {0 -> LightGray, 1 -> Black, 6 -> White, 2 -> Green, 3 -> Brown, 4 -> Red, 5 -> Yellow }, PixelConstrained -> 4, Frame -> False] |

Adding an extension is the simplest way to make new complete rulers, so let us try that. This code (which also requires a long running time) finds 586 lengths that can be improved in this simple way:

oldsparsedata=CloudGet["https://wolfr.am/KeKbjOBs"]; rulerexcess = oldrulerexcess; newrulers = First /@ SplitBy[ Sort[{Last[#], Length[#], RulerToSplit[#]} & /@ Complement[Union[SparseCheckImprove /@ Flatten[ Table[With[{ruler = SplitToRuler[#]}, Append[ruler, Last[ruler] + n]], {n, 1, 40}] & /@ Drop[oldsparsedata, 10501], 1]], {False}]], First]; Do[length = newrulers[[index, 1]]; rulerexcess[[length]] = newrulers[[index, 2]] - Round[Sqrt[3 length + 9/4]]; sparsedata[[length]] = newrulers[[index, 3]], {index, 1, Length[newrulers]}]; |

After that trial, a single exception to the sparse ruler conjecture remains in this range, at length 16617. The pattern cleaned up nicely:

ArrayPlot[ Transpose[ Drop[Table[ PadLeft[Take[ReplacePart[Take[rulerexcess, 17553], 16617 -> 2], {WichmannValues[[n]] + 1, WichmannValues[[n + 1]]}], 151, 6], {n, 1, 227}], 92]], ColorRules -> {0 -> LightGray, 1 -> Black, 6 -> White, 2 -> Green, 3 -> Brown, 4 -> Red, 5 -> Yellow }, PixelConstrained -> 4, Frame -> False] |

I did not expect this trial to work so well.

I had to find an example for length 16617. The tools in this notebook gave me an example in a few hours. The sparse ruler conjecture is true to at least length 17553:

sparse16617 = {{1, 75, 1, 75, 149, 74, 42, 1, 19, 1}, {32, 1, 4, 37, 73, 37, 1, 32, 2, 4}}; temp = SplitToRuler[sparse16617]; {Last[temp], Length[temp], Sparseness[temp]} |

Length 16617 is the final difficult value for the conjecture. All lengths from 16618 to 257992 can be solved with the 2069 known Wichmann recipes or extensions.

After the initialization section in the notebook is `ReasonableRuler`, which will find a ruler with excess 0 or 1 for any given positive integer length. Here’s a ruler for length 100000:

ReasonableRuler[100000] |

The function generates example sparse rulers with *E* = 0 or 1 for all lengths up to 257992 within a few minutes.

QED.

The Leech bounds can be improved. The Leech upper and lower bounds drift far away from the best-known values for minimal marks:

ListPlot[{Table[Sqrt[2.434 n] - (Sqrt[3 n + 9/4]), {n, 1, 17553}], Table[rulerexcess[[n]] + Round[Sqrt[3 n + 9/4]] - Sqrt[3 n + 9/4], {n, 1, 17553}], Table[Sqrt[3.348 n] - (Sqrt[3 n + 9/4]), {n, 1, 17553}]}, AspectRatio -> 1/5] |

I know of 11 rulers with the following properties:

- Excess 0 and complete
- Length greater than 213
- Excess fraction ≥ 4/7
- Not built with a Wichmann recipe

highzero = {{{1, 3, 2, 8, 17, 1, 9, 1}, {3, 1, 1, 3, 9, 1, 4, 3}}, {{1, 4, 8, 17, 9, 6, 3, 9, 1}, {4, 1, 2, 9, 1, 1, 1, 3, 3}}, {{1, 2, 7, 9, 1, 9, 17, 8, 5, 1}, {3, 1, 1, 2, 1, 1, 9, 3, 1, 3}}, {{1, 3, 5, 3, 8, 17, 9, 2, 7, 2, 9, 1}, {2, 2, 1, 1, 2, 9, 1, 1, 1, 1, 2, 2}}, {{1, 3, 10, 21, 11, 1, 11, 1}, {4, 2, 4, 10, 1, 1, 4, 4}}, {{1, 2, 9, 11, 1, 11, 21, 10, 6, 1}, {4, 1, 1, 2, 1, 2, 10, 4, 1, 4}}, {{1, 3, 10, 21, 11, 1, 11, 1}, {4, 2, 4, 11, 1, 1, 4, 4}}, {{1, 2, 9, 11, 1, 11, 21, 10, 6, 1}, {4, 1, 1, 2, 1, 2, 11, 4, 1, 4}}, {{1, 3, 4, 12, 25, 13, 1, 13, 1}, {5, 1, 1, 5, 12, 2, 1, 4, 5}}, {{1, 3, 4, 12, 25, 13, 1, 13, 1}, {5, 1, 1, 5, 13, 2, 1, 4, 5}}, {{1, 3, 5, 14, 29, 15, 1, 15, 1}, {6, 1, 1, 6, 14, 3, 1, 4, 6}}}; Text@Grid[Prepend[{Dot @@ #, Row[{ToString[Numerator[#]], "/", ToString[Denominator[#]]}] &@ ExcessCoordinates[Dot @@ #][[2]], #} & /@ highzero, {"length", "excess fraction", "ruler"}], Frame -> All] |

- Peter Luschny’s optimal ruler conjecture is true; length 135 is the last non-Wichmann optimal sparse ruler
- The lower bound for is
- The list of 11 rulers immediately above is complete

I’ve shown how approaching the problem computationally with the Wolfram Language can help not only solve the sparse rulers problem, but also construct a proof for a question that has historically fascinated so many. Make your own mark: to continue exploring, be sure to download this post’s notebook, which features lots of additional code, the connection between sparse rulers and graceful graphs, and a longer discussion of finding sparse rulers. Can any of the current excess values be improved? Are there more excellent Wichmann-like recipes? I would love to know; submit your recipes in the comments or on Wolfram Community!

Many thanks to T. Sirgedas, A. Robison, G. Beck and N. J. A. Sloane for help with this search.

Leech, J. “On the Representation of 1, 2, …, *n* by Differences.” *Journal of the London Mathematical Society* s1–31.2 (1956): 160–169.

Luschny, P. “The Optimal Ruler Conjecture.” *The On-Line Encyclopedia of Integer Sequences*.

Pegg, E. “Sparse Ruler Conjecture.” *Wolfram Community*.

Rédei, L. and A. Rényi. “On the Representation of the Numbers 1, 2, …, *n* by Means of Differences.” *Matematicheskii Sbornik* 24(66), no. 3 (1949): 385–389.

Robison, A. D. “Parallel Computation of Sparse Rulers.” *Intel Developer Zone*.

Rokicki, T. and G. Dogon. “Golomb Rulers: Pushing the Limits.” *cube20.org*.

Wichmann, B. “A Note on Restricted Difference Bases.” *Journal of the London Mathematical Society* s1–38.1 (1963): 465–466. doi:10.1112/jlms/s1-38.1.465.

Wikipedia Contributors. “Sparse Ruler.” *Wikipedia, the Free Encyclopedia*.

In a perfect world, the debate audio and video would be machine readable. But as it stands, we at least have access to the text in transcript form. Before we dive in, it’s helpful to be ready with a bit of preliminary info.

First, let’s retrieve a list of candidate names. Since candidates keep joining and dropping out, I built this list manually over time:

candidateNames = {"Amy Klobuchar", "Andrew Yang", "Bernie Sanders", "Beto O'Rourke", "Bill De Blasio", "Cory Booker", "Elizabeth Warren", "Eric Swalwell", "Jay Inslee", "Joe Biden", "John Delaney", "John Hickenlooper", "Julián Castro", "Kamala Harris", "Kirsten Gillibrand", "Marianne Williamson", "Michael Bennet", "Pete Buttigieg", "Steve Bullock", "Tim Ryan", "Tulsi Gabbard", "Tom Steyer"}; |

We can use `Interpreter` to see which candidates the Wolfram System recognizes—as it turns out, all of them:

candidates = Interpreter["Person"]@candidateNames |

For proper parsing of questions, we’ll also want the names of the moderators (no entities required in this case):

modNames = {"Lester Holt", "Savannah Guthrie", "Chuck Todd", "Rachel Maddow", "Jake Tapper", "Dana Bash", "Don Lemon", "George Stephanopoulos", "Jorge Ramos", "Linsey Davis", "David Muir", "Anderson Cooper", "Erin Burnett", "Marc Lacey", "Andrea Mitchell", "Ashley Parker", "Kristen Welker", "Judy Woodruff", "Yamiche Alcindor", "Amna Nawaz", "Tim Alberta", "Jose Diaz-Balart", "Wolf Blitzer", "Abby Phillip", "Brianne Pfannenstiel", "ANNOUNCER"}; |

Finally, we combine these into a list of everyone who either spoke or was spoken to:

debateNames = Join[candidateNames, modNames]; debatePeople = Join[candidates, modNames]; |

For navigating the transcript, it will also be useful to split into lists of first and last names:

{firstNames, lastNames} = Transpose[{First[#], StringRiffle@Rest@#} & /@ StringSplit[debateNames]]; |

Transcripts can come from any number of outlets; fortunately, the formatting is fairly standard across sources. Importing the transcript is as easy as `Import`; for this analysis, I will use the transcript of the most recent debate (January 14) as transcribed by Rev.com:

d7raw = Import["https://wolfr.am/K7dyQhOh"]; |

You can see on the website that every spoken line begins with the speaker’s name (with a few variations like “E. Warren”), followed by a colon and a timestamp, then the contents of the line. We can use `StringCases` to grab each of those elements (including all name variations) from the transcript:

d7Lines = StringTrim /@ StringCases[ StringReplace[ d7raw, {{ "Amy Klobuchar", "A. Klobuchar", "A Klobuchar", "Amy K.", "Amy K"} -> "Amy Klobuchar", { "Andrew Yang", "A. Yang", "A Yang", "Andrew Y.", "Andrew Y"} -> "Andrew Yang", { "Bernie Sanders", "B. Sanders", "B Sanders", "Bernie S.", "Bernie S"} -> "Bernie Sanders", { "Beto O'Rourke", "B. O'Rourke", "B O'Rourke", "Beto O.", "Beto O"} -> "Beto O'Rourke", { "Bill De Blasio", "B. De Blasio", "B De Blasio", "Bill D.", "Bill D"} -> "Bill De Blasio", { "Cory Booker", "C. Booker", "C Booker", "Cory B.", "Cory B"} -> "Cory Booker", { "Elizabeth Warren", "E. Warren", "E Warren", "Elizabeth W.", "Elizabeth W"} -> "Elizabeth Warren", { "Eric Swalwell", "E. Swalwell", "E Swalwell", "Eric S.", "Eric S"} -> "Eric Swalwell", { "Jay Inslee", "J. Inslee", "J Inslee", "Jay I.", "Jay I"} -> "Jay Inslee", { "Joe Biden", "J. Biden", "J Biden", "Joe B.", "Joe B"} -> "Joe Biden", { "John Delaney", "J. Delaney", "J Delaney", "John D.", "John D"} -> "John Delaney", { "John Hickenlooper", "J. Hickenlooper", "J Hickenlooper", "John H.", "John H"} -> "John Hickenlooper", { "Julian Castro", "J. Castro", "J Castro", "Julian C.", "Julian C"} -> "Julian Castro", { "Kamala Harris", "K. Harris", "K Harris", "Kamala H.", "Kamala H"} -> "Kamala Harris", { "Kirsten Gillibrand", "K. Gillibrand", "K Gillibrand", "Kirsten G.", "Kirsten G"} -> "Kirsten Gillibrand", { "Marianne Williamson", "M. Williamson", "M Williamson", "Marianne W.", "Marianne W"} -> "Marianne Williamson", { "Michael Bennet", "M. Bennet", "M Bennet", "Michael B.", "Michael B"} -> "Michael Bennet", { "Pete Buttigieg", "P. Buttigieg", "P Buttigieg", "Pete B.", "Pete B"} -> "Pete Buttigieg", { "Steve Bullock", "S. Bullock", "S Bullock", "Steve B.", "Steve B"} -> "Steve Bullock", { "Tim Ryan", "T. Ryan", "T Ryan", "Tim R.", "Tim R"} -> "Tim Ryan", { "Tulsi Gabbard", "T. Gabbard", "T Gabbard", "Tulsi G.", "Tulsi G"} -> "Tulsi Gabbard", { "Tom Steyer", "T. Steyer", "T Steyer", "Tom S.", "Tom S"} -> "Tom Steyer", { "Lester Holt", "L. Holt", "L Holt", "Lester H.", "Lester H"} -> "Lester Holt", { "Savannah Guthrie", "S. Guthrie", "S Guthrie", "Savannah G.", "Savannah G"} -> "Savannah Guthrie", { "Chuck Todd", "C. Todd", "C Todd", "Chuck T.", "Chuck T"} -> "Chuck Todd", { "Rachel Maddow", "R. Maddow", "R Maddow", "Rachel M.", "Rachel M"} -> "Rachel Maddow", { "Jake Tapper", "J. Tapper", "J Tapper", "Jake T.", "Jake T"} -> "Jake Tapper", { "Dana Bash", "D. Bash", "D Bash", "Dana B.", "Dana B"} -> "Dana Bash", { "Don Lemon", "D. Lemon", "D Lemon", "Don L.", "Don L"} -> "Don Lemon", { "George Stephanopoulos", "G. Stephanopoulos", "G Stephanopoulos", "George S.", "George S"} -> "George Stephanopoulos", { "Jorge Ramos", "J. Ramos", "J Ramos", "Jorge R.", "Jorge R"} -> "Jorge Ramos", { "Linsey Davis", "L. Davis", "L Davis", "Linsey D.", "Linsey D"} -> "Linsey Davis", { "David Muir", "D. Muir", "D Muir", "David M.", "David M"} -> "David Muir", { "Anderson Cooper", "A. Cooper", "A Cooper", "Anderson C.", "Anderson C"} -> "Anderson Cooper", { "Erin Burnett", "E. Burnett", "E Burnett", "Erin B.", "Erin B"} -> "Erin Burnett", { "Marc Lacey", "M. Lacey", "M Lacey", "Marc L.", "Marc L"} -> "Marc Lacey", { "Andrea Mitchell", "A. Mitchell", "A Mitchell", "Andrea M.", "Andrea M"} -> "Andrea Mitchell", { "Ashley Parker", "A. Parker", "A Parker", "Ashley P.", "Ashley P"} -> "Ashley Parker", { "Kristen Welker", "K. Welker", "K Welker", "Kristen W.", "Kristen W"} -> "Kristen Welker", { "Judy Woodruff", "J. Woodruff", "J Woodruff", "Judy W.", "Judy W"} -> "Judy Woodruff", { "Yamiche Alcindor", "Y. Alcindor", "Y Alcindor", "Yamiche A.", "Yamiche A"} -> "Yamiche Alcindor", { "Amna Nawaz", "A. Nawaz", "A Nawaz", "Amna N.", "Amna N"} -> "Amna Nawaz", { "Tim Alberta", "T. Alberta", "T Alberta", "Tim A.", "Tim A"} -> "Tim Alberta", { "Jose Diaz-Balart", "J. Diaz-Balart", "J Diaz-Balart", "Jose D.", "Jose D"} -> "Jose Diaz-Balart", { "Wolf Blitzer", "W. Blitzer", "W Blitzer", "Wolf B.", "Wolf B", "Moderator 3"} -> "Wolf Blitzer", { "Abby Phillip", "A. Phillip", "A Phillip", "Abby P.", "Abby P", "Moderator 2"} -> "Abby Phillip", { "Brianne Pfannenstiel", "B. Pfannenstiel", "B Pfannenstiel", "Brianne P.", "Brianne P", "Moderator 1"} -> "Brianne Pfannenstiel"}], Shortest[ name : (debateNames) ~~ ": " ~~ time : ("(" ~~ __ ~~ ")") ~~ __ ~~ line__ ~~ "\n" ~~ debateNames] :> {name, time, line}, IgnoreCase -> False, Overlaps -> True]; |

Each entry is a list containing those three elements:

RandomChoice[d7Lines] |

Then we convert this into a chronological list of *speaker→line* rules. Not every transcript I found had timestamps, so I’ve left them out of this data:

d7Data = (#1 /. Thread[debateNames -> debatePeople]) -> #3 & @@@ d7Lines; |

Here’s an entry from that list:

RandomChoice[d7Data] |

Although the steps varied, I was able to build these datasets for each debate transcript. Here is the full data from all seven debates so far:

allDebateData = CloudGet["https://wolfr.am/K7wv2AEd"]; |

The cloud data gives us enough information to start exploring with computation. For example, people often judge the debate by who got the most speaking time. We can roughly estimate this using `WordCount` to see how many words each person spoke:

wordsSpoken[data_, person_] := StringRiffle@Values[Select[data, First@# == person &]] // WordCount |

A `PieChart` of this data from the most recent debate shows a pretty even spread:

PieChart[wordsSpoken[d7Data, #] & /@ candidates, ChartLabels -> candidates] |

We can also dive a bit deeper into the content of the candidates’ words. With `TextCases`, we can get specific people, places and concepts mentioned in each line of text:

entityMentions[data_, entType_String] := Thread[data[[All, 1]] -> Union /@ TextCases[Values@data, entType -> "Interpretation", VerifyInterpretation -> True]] |

We could use this to explore any number of trends, such as which companies were mentioned during a given debate:

Counts[Flatten@Values@entityMentions[d7Data, "Company"]] |

Throughout the debates, several countries have also been discussed; viewing the data with `WordCloud` shows the recent emphasis on Iran and Afghanistan:

WordCloud[ Most@Sort@Counts[Flatten@Values@entityMentions[d7Data, "Country"]]] |

Since audiences seem to love watching politicians talk to and about each other, I thought it would be interesting to look at direct mentions of other candidates. We can use `StringCases` to determine when a candidate is referred to by first or last name, including former President Barack Obama and current President Donald Trump, both of whom are invoked quite often. This function finds such mentions, associating each text mention with a specific person:

peopleMentions[data_] := Module[{ names = { Alternatives["Amy", "Klobuchar"], Alternatives["Andrew", "Yang"], Alternatives["Bernie", "Sanders"], Alternatives["Beto", "O'Rourke"], Alternatives["Bill", "De Blasio"], Alternatives["Cory", "Booker"], Alternatives["Elizabeth", "Warren"], Alternatives["Eric", "Swalwell"], Alternatives["Jay", "Inslee"], Alternatives["Joe", "Biden"], Alternatives["John", "Delaney"], Alternatives["John", "Hickenlooper"], Alternatives["Julian", "Castro"], Alternatives["Kamala", "Harris"], Alternatives["Kirsten", "Gillibrand"], Alternatives["Marianne", "Williamson"], Alternatives["Michael", "Bennet"], Alternatives["Pete", "Buttigieg"], Alternatives["Steve", "Bullock"], Alternatives["Tim", "Ryan"], Alternatives["Tulsi", "Gabbard"], Alternatives["Tom", "Steyer"], Alternatives["Donald", "Trump"], Alternatives["Barack", "Obama"]}, mentions, rules}, mentions = Union /@ StringCases[data // Values, names]; rules = Thread[names -> Join[candidates, { Entity["Person", "DonaldTrump::6vv3q"], Entity["Person", "BarackObama::7yj6w"]}]]; Thread /@ Thread[Keys[data] -> mentions /. rules] // Flatten] |

Although this function could potentially return false positives (e.g. “Warren Buffett” matching “Elizabeth Warren”), I found this to be a rare enough occurrence that I didn’t try to account for it. In a broader conversational setting, this might be a bit more difficult.
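If those collisions ever became common, one simple fix would be to strip known false-positive phrases from the lines before matching. This is a hypothetical tweak, not part of the original analysis, and `falsePositives` is an illustrative list:

```mathematica
(* hypothetical pre-filter: delete phrases that would wrongly match candidate names *)
falsePositives = {"Warren Buffett"};
cleanLines[data_] := StringReplace[Values[data], Thread[falsePositives -> ""]]
```

`peopleMentions` could then run over `cleanLines[data]` instead of `Values[data]`.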

Since the moderators are not crucial for this analysis, here is a function to get a list of only Democratic candidates mentioning each other (or Trump/Obama):

candidateMentions[data_] := DeleteMissing[peopleMentions[data] /. Thread[modNames -> Missing[]], 1, 2] |

When applied across the full dataset, we see that the candidates have mentioned names nearly a thousand times:

allMentions = Flatten[candidateMentions /@ allDebateData]; Length[allMentions] |

Out of curiosity, I had to see who has spoken about the two presidents most throughout the debates, so I made a quick function to count who mentioned a given person the most:

mostMentions[candidate_] := Dataset@TakeLargest[ Counts[Select[allMentions, #[[2]] == candidate &] // Keys], UpTo@5] |

Row[mostMentions /@ {Entity["Person", "DonaldTrump::6vv3q"], Entity["Person", "BarackObama::7yj6w"]}] |

Amy Klobuchar has a lot to say about President Trump, whereas Obama gets a lot of name-drops from his former vice president, Joe Biden. Notably, between the two lists, we see all five of the current frontrunners (excluding Harris and Castro).

Here are all the self-mentions; amusingly, Biden talks about himself the most:

DeleteCases[ Table[KeySelect[mostMentions[c] // Normal, MatchQ[#, c] &], {c, candidates}], <||>] // Dataset |

To narrow the focus even more, here is a function that gives a list of mentions between participants of a specific debate only—removing Trump and Obama, as well as self-references:

debateMentions[data_] := Select[candidateMentions[data], MatchQ[#[[2]], Alternatives @@ Union@Keys[data]] && FreeQ[#[[2]], Alternatives @@ { Entity["Person", "DonaldTrump::6vv3q"], Entity["Person", "BarackObama::7yj6w"]}] && ! SameQ[#[[1]], #[[2]]] &] |

The result is a reduced list of rules that we can use for further exploration of the interactions in each individual debate.

We’ve done a bit of text analysis and basic counting; now it’s time to switch gears to graph theory. If we think of the output from our previous function as a list of one-way connections between candidates (i.e. directed edges pointing from speaker to subject) we can easily display the resulting graph for a given debate:

Graph[debateMentions[d7Data], VertexLabels -> "Name"] |

We can simplify this view by representing repeated mentions with edge weights, styling each edge accordingly. First, we need to convert repeated edges into weighted edges:

weighted = Graph[Union@debateMentions[d7Data], EdgeWeight -> Values@Counts[debateMentions[d7Data]]]; |

To apply styles to the edges, we need to get the weights using `PropertyValue`, use `Rescale` to get values appropriate for the styling functions and then apply the styles using `Property`:

edgeWeightStyledGraph[g_, styleFunc_List, ranges_List] := Graph@Module[ {edges = EdgeList[g], weights, weightsScaled, styles}, weights = Table[PropertyValue[{g, e}, EdgeWeight], {e, edges}]; weightsScaled = Table[Rescale[weights, MinMax[weights], r], {r, ranges}]; styles = Directive @@@ Transpose[ Thread /@ Table[styleFunc[[i]][weightsScaled[[i]]], {i, Length[styleFunc]}]]; Table[Property[ edges[[i]], {EdgeStyle -> styles[[i]], EdgeWeight -> weights[[i]]}], {i, Length[edges]}]] |
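To see what the `Rescale` step does in isolation, here is a toy check with hypothetical weights of my own choosing (not from the debate data): the smallest weight maps to the low end of the styling range, the largest to the high end, and everything else interpolates linearly.

```wolfram
(* map raw edge weights onto a thickness range *)
Rescale[{1, 2, 5}, MinMax[{1, 2, 5}], {.002, .01}]
(* → {0.002, 0.004, 0.01} *)
```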

Here is the same graph, but with edges consolidated and resized based on weights:

Graph[edgeWeightStyledGraph[weighted, {Thickness}, {{.002, .01}}], VertexLabels -> "Name"] |

Combining these steps (and adding some color), we can make a function to display the graph in a compact way:

debateMentionsGraph[data_] := With[ {g = Graph[debateMentions[data] // Union, EdgeWeight -> Values@Counts@debateMentions[data]]}, Graph[edgeWeightStyledGraph[ g, {Thickness, Darker[RGBColor[0, 0.68625, 0.95295], #] &}, {{.002, .01}, {.5, 0}}], ({ GraphLayout -> "LayeredDigraphEmbedding", AspectRatio -> (If[# > 0.7, 0.7, #]& )[VertexCount[#]/20.], VertexLabels -> Placed[ "Name", Center, Framed[#, RoundingRadius -> 5, Background -> Directive[{White, Opacity[0.8]}]]& ], VertexLabelStyle -> Directive[{"Text", Medium}], VertexSize -> Large, VertexStyle -> White}& )[g]]] |

These graphs give a nice visual summary of the exchanges in each debate. Note that the positioning of the candidates is not important here—just the edges and their styles. In the latest debate, you can see a pretty strong emphasis on Sanders and Warren, the two progressive candidates:

g = debateMentionsGraph[d7Data] |

Most candidates have built-in images tied to their entities, so instead of labeling with names, we could also use faces:

facesOverlay[g_] := Graph[g, VertexSize -> 1.1, VertexLabels -> (Table[c -> Placed[ Overlay[{ Image[ RemoveBackground[ ImageApply[# + 1& , (ImageTrim[#, First[ FindFaces[#]], 20]& )[ If[ ImageQ[ c["Image"]], c["Image"], First[ WebImageSearch[ StringJoin[ c["Name"], " headshot"], Method -> "Google", MaxItems -> 1]]]], Masking -> Graphics[ Disk[]]]], ImageSize -> 60], Graphics[{ GrayLevel[0.7], Thickness[0.05], Circle[{0, 0}, 30]}, ImageSize -> 62]}, Alignment -> Center], Center], {c, VertexList[#]}]& )[g]] |

facesOverlay[g] |

Applying this to the first debate makes for an interesting—but crowded—graphic:

facesOverlay[debateMentionsGraph[allDebateData[[1]]]] |

Next, we can explore `DegreeCentrality`: the relative number of edges connected to each vertex. In a directed graph, we can look at either in-degrees (a candidate being spoken about) or out-degrees (a candidate speaking about others). A high in-degree might indicate popularity, whereas a high out-degree would imply assertiveness. So for this exploration, we’ll consider both:

Thread[VertexList[g] -> Transpose@Table[DegreeCentrality[g, i], {i, {"In", "Out"}}]] |
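If the in/out distinction is unfamiliar, a toy directed graph makes it concrete (this three-vertex example is mine, not from the debate data):

```wolfram
(* vertex 2 is pointed at twice (high in-degree); vertex 1 does all the pointing *)
toy = Graph[{1 -> 2, 1 -> 3, 3 -> 2}];
{DegreeCentrality[toy, "In"], DegreeCentrality[toy, "Out"]}
(* → {{0, 2, 1}, {2, 0, 1}} *)
```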

By these numbers alone, we might give a win to Bernie Sanders for assertiveness or Tom Steyer for popularity. But we can take a more visual look by overlaying the data on our graph. First we compute both measures for each vertex:

centralityList[g_] := Table[DegreeCentrality[g, i], {i, {"In", "Out"}}] |

Then we use `Blend` to define a color for each vertex (candidate)—`Blue` for in-degrees (popularity) and `Red` for out-degrees (assertiveness):

colorList[g_] := Blend[{RGBColor[0, 0, 1], RGBColor[1, 0, 0]}, #] & /@ Transpose@centralityList[g] |
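The second argument to `Blend` acts as a pair of weights for the two colors, so a candidate with equal in- and out-degree lands in the middle. A quick sanity check with equal toy weights (my own values):

```wolfram
(* equal in- and out-degree blends to an even purple *)
Blend[{RGBColor[0, 0, 1], RGBColor[1, 0, 0]}, {1, 1}]
```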

Combining this with our existing function, we can make a new function to display the graph with vertex colors based on that blend:

mentionsCentralityGraph[data_] := With[{g = debateMentionsGraph[data]}, Graph[g, {VertexLabelStyle -> Thread[VertexList[g] -> colorList[g]]}]] |

Let’s take another look at the most recent debate:

mentionsCentralityGraph[d7Data] |

The story here is clear: Sanders and Warren (purple) were at the core of numerous exchanges, with additional commentary from Biden and Steyer (who actually mentioned *everyone*). Klobuchar called out others but was barely named, whereas Buttigieg was mentioned but didn’t talk about others. By a “balanced” criterion (high assertiveness *and* popularity scores), Sanders and Warren appear to be the winners.

Let’s look back a bit further at the fifth debate, which aired on November 20:

mentionsCentralityGraph[allDebateData[[5]]] |

One thing is evident: more candidates lead to fewer overall mentions. This makes sense from a logistical standpoint—each person has less time to speak—but I think it also shows the growing emphasis on candidates’ differences as the debates wear on.

Combining all the data, we get an indication of how people tended to interact throughout the debates:

Thread[candidates -> colorList@debateMentionsGraph[Flatten@allDebateData]] |

Again, the data tells a story. Biden, the long-time presumptive frontrunner, receives a lot more attention than he gives. Many of the minor candidates—Bullock, Inslee, Delaney—tend to (or rather, are able to) call out others without getting much return fire.

If there’s one metric most analysts can agree upon (or at least, agree to criticize), it’s the national polls. So if we want some objective measure of who “won” the debate, that might be a good place to start. Much of this section is probably similar to standard analyses you’ll find elsewhere.

For this section, we’ll jump back into text analytics; I found a Wikipedia article rife with up-to-date info in organized tables:

pollData = Import["https://wolfr.am/K7iT0dHR", "Data"]; |

Each row contains the same information—dates administered, sample size, margin of error and percentages for each candidate. Using `StringTrim` to get rid of unnecessary characters, here is a function to parse the data from each list and interpret it with `SemanticImportString`:

parsePollData[list_] := SemanticImportString[ExportString[Select[ StringTrim[#, { Shortest["\[Dash]" | "-" ~~ ___ ~~ DigitCharacter ..], "\[PlusMinus] ", ("(" | "[") ~~ __ ~~ (")" | "]") ~~ WhitespaceCharacter ... }] & /@ list // Quiet, Length[#] == Length[list[[1]]] &], "CSV", CharacterEncoding -> "ASCII"], PadRight[{"String", Interpreter@"Date", "Number", "Percent"}, Length[list[[1]]], "Percent"], HeaderLines -> 1] |

The page includes a lot of extra info—and it keeps growing—but we’re mainly interested in polls between last June and today. Here is a combined list of all entries from that window:

pollDataCombined = Flatten@Table[Normal@parsePollData[p], {p, pollData[[1, 5 ;; 12]]}]; |

Now we have entries for each poll, but all the dates are interpreted as being in 2020:

RandomChoice[pollDataCombined] |

As a final cleaning step, we’ll fix the dates and standardize the key names:

cleanPollData[list_] := Table[KeyMap[StringReplace[StringTrim[#], { ___ ~~ "source" ~~ ___ -> "Source", ___ ~~ "date" ~~ ___ -> "Date", ___ ~~ "size" ~~ ___ -> "SampleSize", ___ ~~ "err" ~~ ___ -> "ErrorMargin", ___ ~~ "undecided" ~~ ___ -> "Undecided"}, IgnoreCase -> True] &, p] // If[#Date > Today, # /. #Date -> (#Date - Quantity[1, "Years"]), #] &, {p, list}] |

pollDataClean = cleanPollData[pollDataCombined]; |

With this dataset, we can view polling numbers for a given candidate or date range. Here is each candidate’s polling history as a `TimeSeries` object:

pollNames = Intersection[candidateNames, Union[Flatten[Keys /@ pollDataClean]]]; |

pollHistory = Table[c -> TimeSeries[ Values /@ DeleteMissing[pollDataClean[[All, {"Date", c}]], 1, 2] ], {c, pollNames}]; |

The data is summed up nicely by a `DateListPlot`:

DateListPlot[ Thread@Legended[ TimeSeriesResample[Values@pollHistory, Quantity[1, "Weeks"]], Last /@ StringSplit /@ pollNames]] |

However, this overview includes more information than we need, and many of the lower-polling candidates are obscured. Let’s zoom in to only look at the leaders, i.e. anyone who has risen above 10% in the polls since the debates began:

leaders = Select[pollHistory, ContainsAny[ Union[# > Quantity[10, "Percent"] & /@ Values[#[[2]]]], {True}] &]; |
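The `ContainsAny[Union[...], {True}]` test can also be written with `AnyTrue`, which reads a bit more directly (an equivalent formulation, not the original code):

```wolfram
(* keep candidates whose polling TimeSeries ever exceeds 10% *)
leaders = Select[pollHistory,
   AnyTrue[Values[#[[2]]], # > Quantity[10, "Percent"] &] &];
```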

By adding vertical lines for each debate date, we can get a rough idea of how the debates have affected polling:

DateListPlot[Legended[ TimeSeriesResample[Values@leaders, Quantity[1, "Weeks"]], { Style["Sanders", 15], Style["Warren", 15], Style["Biden", 15], Style["Harris", 15], Style["Buttigieg", 15]}] // Thread, { ImageSize -> 430, LabelStyle -> {FontSize -> 11}, PlotRange -> {{{2019, 6, 1}, {2020, 2, 5}}, Automatic}, Epilog -> {{ RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 0}, { DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 40.}}], GrayLevel[0], FontSize -> 11, Inset[ Framed["Jan 14", Background -> GrayLevel[1]], { DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 42.}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 40.}}], GrayLevel[0], FontSize -> 11, Inset[ Framed["Dec 19", Background -> GrayLevel[1]], { DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 42.}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 40.}}], GrayLevel[0], FontSize -> 11, Inset[ Framed["Nov 20", Background -> GrayLevel[1]], { DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 42.}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 40.}}], GrayLevel[0], FontSize -> 11, Inset[ Framed["Oct 15", Background -> GrayLevel[1]], { DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 42.}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 40.}}], GrayLevel[0], FontSize -> 
11, Inset[ Framed["Sep 12", Background -> GrayLevel[1]], { DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 42.}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 40.}}], GrayLevel[0], FontSize -> 11, Inset[ Framed["Jul 30", Background -> GrayLevel[1]], { DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 42.}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 40.}}], GrayLevel[0], FontSize -> 11, Inset[ Framed["Jun 26", Background -> GrayLevel[1]], { DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 42.}]}}}] |

Though the rankings haven’t moved much, you can see a few peaks and troughs that likely resulted from debates. For a quick estimate of who might have gained or lost, we can aggregate data from the week before a debate, then compare to polling numbers the week after the same debate:

beforeAfter[date_, data_: pollDataClean] := With[{ before = Select[data, date >= #Date > date - Quantity[1, "Weeks"] &], after = Select[data, date <= #Date < date + Quantity[1, "Weeks"] &]}, DeleteCases[<|Table[c -> N@Mean[Cases[#[[All, c]], _Quantity]], {c, pollNames}] /. Thread[candidateNames -> candidates]|>, _Mean] & /@ {before, after}] |

Subtracting the two values gives an estimate of percentage changes around a given date:

percentChange[date_, data_: pollDataClean] := Subtract @@ Reverse@beforeAfter[date, data] |

From there, we can use the mean polling per candidate to judge whether public opinion shifted. For instance, Kamala Harris’s average went up significantly after the first debate in June:

percentChange[DateObject[{2019, 6, 26}]] |

This should give a broad measure of how debate performance affected polling for each candidate. Here are the numbers for the latest debate—only including those candidates who actually participated:

KeySelect[ percentChange[DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.`]], MemberQ[Union@Keys@d7Data, #] &] |

Of course, none of these numbers *really* mean anything until voting starts. The first candidate selection event of the cycle, the Iowa caucus, was held earlier this week—with Pete Buttigieg appearing to come out on top (pending final results).

Let’s see how this compares with our debate mentions. We’ll need to get a dataset of statewide polling:

stateData = Import["https://wolfr.am/K7juP63d", "Data"]; |

Using the functions from the previous section, we can import the most recent Iowa polls:

iowaPollData = Normal@parsePollData[stateData[[1, 5]]] // cleanPollData; |

As in the previous section, we grab the polling history for each candidate:

iowaNames = Intersection[candidateNames, Union[Flatten[Keys /@ iowaPollData]]]; |

iowaHistory = Table[c -> TimeSeries[ Values /@ DeleteMissing[iowaPollData[[All, {"Date", c}]], 1, 2]], {c, iowaNames}]; |

The polling history shows that the rankings changed drastically after the Iowa debate:

DateListPlot[ Thread@Legended[ TimeSeriesResample[Values@iowaHistory, Quantity[2, "Days"]], Last /@ StringSplit /@ iowaNames], {Epilog -> {{ RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 0}, { DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 32}}], GrayLevel[0], FontSize -> 13, Inset[ Framed["Jan 14", Background -> GrayLevel[1]], { DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 34}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 32}}], GrayLevel[0], FontSize -> 13, Inset[ Framed["Dec 19", Background -> GrayLevel[1]], { DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 34}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 32}}], GrayLevel[0], FontSize -> 13, Inset[ Framed["Nov 20", Background -> GrayLevel[1]], { DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 34}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 32}}], GrayLevel[0], FontSize -> 13, Inset[ Framed["Oct 15", Background -> GrayLevel[1]], { DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 34}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 32}}], GrayLevel[0], FontSize -> 13, Inset[ Framed["Sep 12", Background -> GrayLevel[1]], { DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 34}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 
Opacity[0.5], Line[{{ DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 32}}], GrayLevel[0], FontSize -> 13, Inset[ Framed["Jul 30", Background -> GrayLevel[1]], { DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 34}]}, { RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], Opacity[0.5], Line[{{ DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 0}, { DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 32}}], GrayLevel[0], FontSize -> 13, Inset[ Framed["Jun 26", Background -> GrayLevel[1]], { DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 34}]}}}] |

Sanders and Biden have been battling for the top spot in Iowa, but Buttigieg and Warren have shown potential as well. Sanders had the biggest polling surge in the weeks leading up to the caucus:

percentChange[DateObject[{2020, 1, 20}], iowaPollData] |

Going by the standard analysis, Sanders or Biden should obviously have come out on top. But referring back to the centrality graph from earlier, Buttigieg was actually the most “popular” in that others mentioned him without him ever mentioning anyone else:

facesOverlay[debateMentionsGraph[d7Data]] |

Perhaps ignoring critics was a winning strategy in that case!

Graph theory is a useful lens for visualizing interpersonal exchanges, providing some unique computational insights about how the candidates interact. Combining this with other computations could provide even more interesting information, such as using sentiment analysis to determine candidates’ attitudes toward each other. In the coming months, it will be interesting to find out what kinds of debaters end up winning the most primaries: assertive, popular or mixed. Try it yourself—maybe you can define your own computational measure of a debate winner!

Get full access to the latest Wolfram Language functionality with a Mathematica 12 or Wolfram|One trial.

Today, the world around us is being captured by imaging devices ranging from cell phones and action cameras to microscopes and telescopes. With ever-increasing generation of images, image processing and automatic image analysis are used in a wide range of individual, academic and industry applications.

We are excited to announce Introduction to Image Processing, a free interactive course from Wolfram U, which makes cutting-edge image processing simple with graphical and visual examples that demonstrate how image operations work. It includes 14 video lessons, each lasting 20 minutes or less, and 5 short quizzes, as well as a certificate for finishing all course materials. Topics range from controlling brightness and contrast or cropping and resizing images to advanced techniques including segmentation, image enhancement, feature detection and using machine learning to perform modern image processing—no machine learning knowledge necessary!

This course is targeted at anyone looking to build skills in image processing—no prior knowledge is required, so it works well as a general introduction to image processing concepts and methods.

The course starts with the Course Overview video, which gives a brief summary of each topic covered. The series of lessons, with interactive examples and assessments, helps build a student’s knowledge in each image processing area:

- Importing images
- Adjusting brightness and contrast
- Histogram operations
- Geometric operations
- Segmentation
- Selective coloring
- Image enhancement
- Perspective correction
- Object detection and recognition

Each lesson begins with an example to demonstrate why the topic is important and how it is generally applied. Lessons are backed with interactive expressions to build intuition for the algorithms discussed, as the Wolfram Language provides a range of functionality for modern industrial-strength image processing.

For instance, here is one dynamic example from the Segmentation lesson that illustrates the effect of thresholding and the different built-in methods for image binarization (converting to black and white):
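The interactive graphic itself doesn’t reproduce here, but a minimal sketch in the same spirit might look like this (my own reconstruction, using a built-in test image rather than the course’s):

```wolfram
(* vary the binarization threshold interactively *)
With[{img = ExampleData[{"TestImage", "House"}]},
 Manipulate[Binarize[img, t], {{t, 0.5, "threshold"}, 0, 1}]]
```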

You can use the lesson transcript in each section as a quick reference for the learning goals. Each lesson begins with a brief introductory paragraph and ends with a summary of the topics and Wolfram Language functions covered. You can also copy code from the transcript into your interactive scratch notebook to try it yourself.

Every few lessons, you’ll find a quiz with about five multiple-choice questions. Successful completion of the quizzes helps reinforce concepts and algorithms discussed during lessons, and also counts toward the final course certification.

The Capstone Examples section concludes the course with four complete real-world examples using many of the tools and methods from earlier sections, as well as introducing some new functions specific to each application. After completing this lesson (and the course), you should have a thorough enough understanding of image processing to start your own projects!

Lesson topics and functions covered range from fundamentals like brightness and contrast to advanced features such as facial recognition. For a brief look inside the course, let’s explore a few code samples.

First, here is some code from the Controlling Contrast section that demonstrates how to adjust contrast in an image by rescaling the range of intensity values. This involves first finding the minimum and maximum pixel values in the image:

{minValue, maxValue} = MinMax[CloudGet["https://wolfr.am/JXZ9iuXk"]] |

Then we define a linear function that returns 0 for the minimum value and 1 for the maximum value:

f[p_] = (p - minValue)/(maxValue - minValue); |

Finally, we apply that function across every pixel in the image:

ImageApply[f, CloudGet["https://wolfr.am/JXZ9iuXk"]] |
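Worth noting: the built-in `ImageAdjust` performs this same min–max rescaling in a single step, so the manual version above is mainly pedagogical. A quick equivalent, assuming the same image:

```wolfram
(* rescales pixel values to cover the full 0-to-1 range *)
ImageAdjust[CloudGet["https://wolfr.am/JXZ9iuXk"]]
```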

The same sort of transformation can be applied even when the pixel values do not fall in a narrow range, as they did in the previous example. Here is a function that gives a linear transformation for a certain range of pixel values (0.4 to 0.6) and transforms values below that range to 0 and values above that range to 1:

f[x_] = Piecewise[{{0, x < 0.4}, {(x - 0.4)/(0.6 - 0.4), 0.4 < x < 0.6}, {1, x > 0.6}}]; |
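Evaluating the function at a few sample points (toy inputs of my choosing) confirms the behavior:

```wolfram
(* clamps to 0 below the window, ramps linearly inside, clamps to 1 above *)
{f[0.3], f[0.5], f[0.7]}
```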

The effect of that transformation is to maximize contrast for pixels in the selected range and to turn everything else either black or white. This has a more dramatic effect on the image—making light areas white and dark areas black, with only a narrow range of gray in the middle:

ImageApply[f, CloudGet["https://wolfr.am/JXZ9tJxN"]] |

Here is an interactive interface for adjusting the interval to emphasize different aspects of the image:

With[{img = CloudGet["https://wolfr.am/JXZ9tJxN"]}, Manipulate[{x1, x2} = int; g = Piecewise[{{0, # < x1}, {(# - x1)/(x2 - x1), x1 < # < x2}, {1, # > x2}}] &; GraphicsRow[{Plot[g[x], {x, 0, 1}], ImageApply[g, img]}, ImageSize -> 400], {{int, {0, 1}}, 0, 1, ControlType -> IntervalSlider}]] |

The next example comes from the Clustering section, where we use component measurements to create and apply masks. We start with this image of some vivid orange flowers:

img = CloudGet["https://wolfr.am/JXZ9BCzn"]; |

We can segment the image into five similarly colored areas:

lmat5 = ClusteringComponents[img, 5]; Colorize[lmat5] |

Then we extract each component as a mask:

masks = ComponentMeasurements[lmat5, "Mask"] |

This gives us a list of rules for applying functions to the different parts of the image. Putting it all together, here is a short program that replaces the background with black and adjusts the colors of the flowers while leaving the green parts of the image unchanged:

ImageAdd[ImageApply[RotateLeft, ImageMultiply[img, ImageAdd[Image[2 /. masks], Image[4 /. masks]]]], ImageMultiply[img, ImageAdd[Image[3 /. masks], Image[5 /. masks]]]] |

Finally, here is a high-level example from the Machine Learning for Images section that shows how to locate human faces in an image. This is done by first generating a list of rectangles marking the locations of faces in the image:

faces = FindFaces[CloudGet["https://wolfr.am/JXZ9YxvX"]] |

We can then show those rectangles as highlights on the original image:

HighlightImage[CloudGet["https://wolfr.am/JXZ9YxvX"], faces] |

After watching all the videos and successfully passing the quizzes, you will earn a course completion certificate. We also have additional certification options coming soon; when available, you’ll be able to complete additional exercises to become Wolfram Certified Level I in Image Processing. Check back at the Image Processing course page for updates.

This course is part of our series of completely free and open learning resources; you can get started with just a Wolfram ID and a browser. So head over to the course page now to start doing more with your images!

Take Introduction to Image Processing now to perfect your image processing skills, or check out the Image & Signal Processing page for related Wolfram U events and courses.

Who has not encountered a stink bug? Perhaps the better question is not if, but when. I remember well my first interactions with stink bugs—partly because of their pungent, cilantro-like odor, but also because in my native Catalan language they are called *Bernat pudent* (“stinky Bernat”) and Bernat is my twin brother’s name.

So when I encountered the stink bug again when visiting Champaign, Illinois, for the 2019 Wolfram Technology Conference, it brought up a lot of fond childhood memories. This time, however, two things had changed: the frequency of encounters with the stink bug seemed exponentially greater, and I now had the Wolfram Language to more fully (and computationally) satisfy my curiosity about this reviled insect and its growing impact on our ecosystem. So to get a better picture of the arrival and spread of this invasive bug across the US, I used available observation data and the Wolfram Language to make a map of sightings over the past two decades.

In the US, there are dozens of native stink bug species, but in this post I’m going to focus mainly on a non-native species that has become quite common in North America and fairly unpopular over the last two decades: the brown marmorated stink bug, known by its scientific name, *Halyomorpha halys* (Stål, 1855). To get as comprehensive a dataset as possible, we will be importing stink bug data from three sources. After saving the data files in the same folder as our notebook, we should set the directory path appropriately:

SetDirectory[NotebookDirectory[]]; |

We’ll start by importing data on stink bug distribution from the EPPO Global Database. Using `SemanticImport`, we can specify data interpretations for the desired columns and get a `Dataset` object for easy exploration. In this case, `Interpreter["Country"]` is being passed as a pure function for richer interpretation:

dataEPPO = SemanticImport[ "distribution_HALYHA.csv", <| "country code" -> Interpreter["Country"], "state" -> "String", "Status" -> "String"|>, "HeaderLines" -> 1] |

We can get a quick overview by using `Select` to get national data from around the world (represented by items with a blank “state” property):

nationalData = dataEPPO[Select[StringMatchQ[#state, ""] &]]; |

The data assigns each sighting to one of six distinct status levels—either absent or present, and the quality/intensity of the report:

Column[Union@Normal@nationalData[[All, "Status"]]] |

We can use `GroupBy` to collect the countries by status:

countriesByStatus = KeySort@nationalData[GroupBy[#Status &]][[All, All, "country code"]] // Normal; |

Using `GeoRegionValuePlot`, we can then create a map of stink bug distribution and level of occurrence per country, based on the EPPO data:

GeoRegionValuePlot[ Thread /@ Thread[Values@countriesByStatus -> {RGBColor[ 0.00784313725490196, 0.5098039215686274, 0.9294117647058824], RGBColor[ 0.5411764705882353, 0.7137254901960784, 0.027450980392156862`], RGBColor[ 0.9333333333333333, 0.5098039215686274, 0.9333333333333333], RGBColor[ 0.996078431372549, 0.9882352941176471, 0.03529411764705882], RGBColor[ 0.996078431372549, 0.3607843137254902, 0.027450980392156862`], RGBColor[1, 0, 0]}] // Flatten, { PlotStyle -> Opacity[0.6], ImageSize -> 550, GeoBackground -> "CountryBorders", Epilog -> Inset[ Framed[ SwatchLegend[{ RGBColor[0.00784313725490196, 0.5098039215686274, 0.9294117647058824], RGBColor[0.5411764705882353, 0.7137254901960784, 0.027450980392156862`], RGBColor[0.9333333333333333, 0.5098039215686274, 0.9333333333333333], RGBColor[0.996078431372549, 0.9882352941176471, 0.03529411764705882], RGBColor[0.996078431372549, 0.3607843137254902, 0.027450980392156862`], RGBColor[1, 0, 0]}, { "Absent, intercepted only", "Absent, unreliable record", "Present, few occurrences", "Present, no details", "Present, restricted distribution", "Present, widespread"}, LegendLayout -> "Row"], RoundingRadius -> 5, Background -> Directive[ GrayLevel[1], Opacity[0.9]]], {35, -63}]}] |

Now let’s import the data from iNaturalist and EDDMapS (Early Detection and Distribution Mapping System), both of which contain crowdsourced observation data. We’ll use `SemanticImport` again. Because the locations are listed as latitude/longitude pairs, no custom interpretation is needed:

dataEDDMapS = SemanticImport["EDDMapS_Halys.csv"]; dataiNaturalist = SemanticImport["iNaturalist_US_Stink_bugs.csv"]; |

The iNaturalist data contains observations of many stink bug species; for this exploration, we only need observations of the invasive stink bug, *Halyomorpha halys*:

dataiNaturalist = dataiNaturalist@ Select[#["scientific_name"] == "Halyomorpha halys" &]; |

With our data imported, we’re now ready to create a distribution map.

The main purpose of this visualization is to determine geographic distribution throughout time. I decided the easiest representation for this would be an interactive, color-coded map showing the combined data across several years. First, we assign a color to each year since 1998, roughly when stink bugs were first sighted:

colorYear[year_Integer] := Part[Table[ ColorData[{"SolarColors", "Reverse"}][i], {i, 0, 1, 0.05}], year - 1998]; |

Then we need to extract coordinates. Since the datasets have different formats, we need separate functions for selecting the observations from each. For EDDMapS, we group by observation date, extracting the latitude and longitude from each data point:

EDDMapSCoordinates = Values /@ Normal[dataEDDMapS@GroupBy[#["ObsDate"]["Year"] &]][[All, All, {"Latitude", "Longitude"}]]; |

The process is the same for the iNaturalist dataset, but with different key names:

iNaturalistCoordinates = Values /@ Normal[dataiNaturalist@GroupBy[#["observed_on"]["Year"] &]][[All, All, {"latitude", "longitude"}]]; |

We can use `Merge` and `DeleteMissing` to combine and clean the lists:

combinedCoords = Replace[DeleteMissing[Flatten[#, 1], 2], {} -> Nothing, 3] & /@ Merge[{EDDMapSCoordinates, iNaturalistCoordinates}, Identity]; |

Then we create an array of geodetic positions with `GeoPosition`:

positionsYear = GeoPosition /@ combinedCoords; |

Here’s a styled background for our animation, using `GeoListPlot` to display polygons for each state over a satellite image:

bgImage = GeoListPlot[ EntityList@ EntityClass["AdministrativeDivision", "AllUSStatesPlusDC"], { PlotStyle -> Directive[ EdgeForm[{ GrayLevel[0.85], Thickness[Tiny], Opacity[0.8]}], FaceForm[{ Opacity[0.2], GrayLevel[1]}]], GeoBackground -> GeoStyling["Satellite", Opacity[0.8]], GeoRange -> Entity["Country", "UnitedStates"], PlotLegends -> None, ImageSize -> 550}] |

We also want to include a thumbnail image of the stink bug on the final GIF. The following specimen was trying to overwinter inside my flat in Barcelona (we also have the same invasive stink bug species in Europe):

✕
thumbnail = CloudGet["https://wolfr.am/JNu0LzR8"]; |

Finally, we can combine my thumbnail image, the custom color function and the previous positions-per-year function in `GeoListPlot` (with other styling tweaks enclosed in iconized code) to display successive sightings per year starting in 2001, using `Show` to superimpose the data onto our background:

✕
distributionYear[year_Integer] := Show[bgImage, GeoListPlot[ Table[positionsYear[y], {y, 2001, year}], ({ PlotMarkers -> Point, PlotStyle -> Table[ Directive[ colorYear[y], Opacity[0.5], PointSize[0.0045]], {y, 2001, #}], GeoRange -> Entity["Country", "UnitedStates"], PlotLegends -> None, ImageSize -> 550, Epilog -> { Inset[ Framed[ Style[ ToString[#], colorYear[#], FontSize -> 36, FontFamily -> "Helvetica"], Background -> Directive[White, Opacity[0.2]], FrameStyle -> Transparent, RoundingRadius -> 5], Scaled[{0.5, 0.5}]], Inset[ Column[{ ImageResize[thumbnail, 75], Style["Halyomorpha halys", Italic, Bold, FontSize -> 13, LightGray]}, Alignment -> Right], Scaled[{0.86, 0.23}]]}}& )[year]], GeoRange -> Entity["Country", "UnitedStates"]] |

The resulting map shows the accumulation of data points up to a given year, along with a central label showing the year:

✕
distributionYear[2017] |

Now that we can generate distribution maps of each year, it is quite easy to assemble them into an animated GIF. We’re essentially creating a `ParallelTable` of the different years’ maps, then exporting as an animation (repeating the final frame for a longer viewing time):

✕
Export["Halys_US_Distribution.gif", ParallelTable[ distributionYear[y], {y, Append[Range[2001, 2019], 2019]}], "DisplayDurations" -> 0.6, "AnimationRepetitions" -> Infinity] |

Seeing the result assembled into an animation, it’s immediately apparent that reported stink bug sightings have increased dramatically, spreading across more and more territory as the years go by. Over 20 years, there is seemingly unrestricted growth, likely due to the absence of natural predators like the samurai wasp, which helps control stink bug populations in their native habitat in East Asia. And thanks to the contributions of citizen scientists on platforms like iNaturalist, we are able to use their data to map these sightings more accurately.

When I posted my GIF to Reddit it touched a nerve, with more than 41,000 Reddit users engaging with the post! And circling back to my twin brother Bernat: he had shown the GIF to some friends in Providence, Rhode Island, where he was living back then—and two of them told him that they had already seen the GIF on the front page of Reddit (demonstrating a different kind of “invasive” distribution—internet virality!).

The distribution map has provided an easily comprehensible confirmation of a shared experience: the invasive brown marmorated stink bug has spent the past two decades becoming a near-ubiquitous nuisance. But even more significantly, we can see that they are a serious threat in the “breadbasket” states (like Illinois), where a majority of the country’s food crops are grown. The West Coast, geographically opposite the location of the stink bug’s original arrival, has also developed its own share of problems with the insect. Even California—where orchards and crops are already threatened by fire and drought—isn’t safe. Accounts of an independent West Coast arrival event (confirmed by genetic testing) are clearly borne out by the distribution map data.

We may be irritated by the one, five or even some thousands of stink bugs we encounter ourselves, but seeing 20 years’ worth of an invasive species’ unabated spread across the nation better conveys the full weight of imbalanced biodiversity. I encourage you to sign up for crowdsourced sites like iNaturalist to become a citizen scientist and start contributing your own observations. The more people engage in the conversation, the better our chances of never needing a 30-year retrospective on the stink bug invasion!

Join us—share your own computational explorations or stink bug observations via the Wolfram Community thread that I initially started, or check out my other posts on biodiversity. Every data point is important!

“Brown Marmorated Stink Bug.” National Invasive Species Information Center, USDA. www.invasivespeciesinfo.gov/profile/brown-marmorated-stink-bug.

Rice, K. B., C. J. Bergh, E. J. Bergmann, et al. “Biology, Ecology, and Management of Brown Marmorated Stink Bug (Hemiptera: Pentatomidae).” *Journal of Integrated Pest Management* 5.3 (2014): A1–A13.

Get full access to the latest Wolfram Language functionality with a Wolfram|One trial.

We’ve gathered some of our favorite Wolfram Community posts showing the variety of applications made possible with the Wolfram Language.

The *p*-value is a crucial concept in statistics—and yet, it’s also a commonly misunderstood one. People regularly misstate or misrepresent its meaning, and doing so can have devastating consequences in research and the outcomes therein. Seth Chandler, a Foundation Professor of Law at the University of Houston, clearly defines the notoriously slippery concept in his computational essay, posted to Wolfram Community in a functional cloud notebook. Want to play with Seth’s code yourself? Copy the cloud notebook to your own cloud account!

Claudio Chaib, a technical manager from São Paulo, Brazil, uses data from the Wolfram Knowledgebase to design `VitaminData`. This new function quickly compiles crucial information about vitamins, providing you with chemical and physical data, classification, 2D and 3D visualizations, various properties, food information and nutritional data. In this post, Claudio also ranks foods with the highest amount of different vitamins—bringing all the top-ranking candidates together would make for quite an interesting meal.

Locating borderlines between countries is easy enough with any map—but what about finding a border’s coordinates? Wolfram’s own Mads Bahrami wrote a computational essay showing all the steps needed to find the exact coordinates between any two countries.

Investigating an increase in stink bug sightings, fellow Wolfram employee Jofre Espigule-Pons uses observation data to create a GIF of its spread. His GIF showing the progressive proliferation of the brown marmorated stink bug across the United States sparked many a conversation—ranking second on Reddit’s home page, with more than 41,000 upvotes on r/dataisbeautiful, and close to 4,000 comments in 26 communities across Reddit. Keep an eye out for more in Jofre’s blog post, coming soon!

Silvia Hao, CEO at Glimscape Technology in China, shows how to implement a neural network to compute the Mandelbrot set in a tour de force of beautiful Wolfram Language graphics. Silvia creates a custom function that can more efficiently generate visualizations of Mandelbrot sets at high resolutions.

How prepared are you for your next getaway? Gather and analyze the information you need about prospective Airbnb rentals, using rental reviews and machine learning algorithms. And even if your ideal vacation is staying in, you can use the code from Anton Antonov, co-founder of Accendo Data LLC in Florida, on other machine learning datasets as well—including restaurant and product reviews.

Kotaro Okazaki, a Japanese inventor from Fujitsu Limited, creates a spectacular visualization of the Wolfram Neural Net Repository. The visually stunning, informative diagram represents the breadth, depth and diversity of models available.

Diego Zviovich uses Mathematica for both work and leisure: at his job as a supply chain director at the Coca-Cola Company, as well as for his hobbies, which include performing financial portfolio simulations. We get a glimpse of what this hobby entails in his Community post, where Diego formulates and simulates a simple economic model to explore the gap between the rich and the poor, and demonstrates the emergence of a wealth gap following a Pareto distribution.

While autumn may be associated with pumpkin picking, carving and pie-baking, Jofre Espigule-Pons demonstrates that pumpkins can be used for computational experiments all year round! Venture into pumpkin forensics by determining the appearance of the whole pumpkin from only one slice, find out a pumpkin’s nutritional value or use math to create a 3D pumpkin—all made possible with the power of the Wolfram Language.

If you haven’t yet signed up to be a member of Wolfram Community, now is the perfect time to do so! You can join in on discussions similar to the ones highlighted in this blog, post your own work in groups that cover your interests and browse the complete list of Staff Picks.

Looking to fulfill your New Year’s resolution of learning new data science skills? Join Wolfram U for Wolfram Technology in Action: Data Science, a three-part web series demonstrating a range of data science applications in the Wolfram Language. These 90-minute sessions feature recorded talks from the 2019 Wolfram Technology Conference, along with live presentations by Wolfram staff scientists, application developers, software engineers and Wolfram Language users who apply the technology every day to their business operations and research.

Newcomers to Wolfram technology are welcome, as are longtime users wanting to see the latest functionality in the language.

This series brings the latest in a broad range of data science areas covered at the conference. Each session is designed to highlight a different aspect of doing data science with Wolfram technology.

The series kicks off on January 22 with several data analysis examples using built-in Wolfram Language functionality, including custom-built Twitter analytics, data mining of imaginary maps and the creation of an automated reporting system.

In the second session on January 29, speakers will present data-intensive applications across several computational fields. Topics include image processing for noninvasive cancer screening, running large-scale field campaigns in atmospheric research and structuring diverse data sources for machine learning and analysis.

Finally, the February 5 session will explore analysis and visualization of different kinds of data. Presenters will discuss game AI planning analytics, computational taxonomy and sunny day flooding.

You can join any or all of the webinars to benefit from the series. Signing up once will save your seat for all sessions in the series. When you sign up, you’ll receive an email confirming your registration, as well as reminders for upcoming sessions.

Register now to find out what other Wolfram Language users and experts are accomplishing in data science!

Visit our Wolfram U calendar to sign up for all our special events and instructor-led courses, or check out our Data Science & Statistics page for the latest courses in data science.

Yellowstone National Park has long been known for its active geysers. These geysers are a surface indication of subterranean volcanic activity in the park. In fact, Yellowstone is actually the location of the Yellowstone Caldera, a supervolcano: a volcano with an exceptionally large magma reservoir. The park has had a history of many explosive eruptions over the last two million years or so.

I’ve found that the United States Geological Survey (USGS) maintains data on the various volcanic calderas and related features, which makes it perfect for computational exploration with the Wolfram Language. This data is in the form of SHP files and related data stored as a ZIP archive. Thanks to the detail of this available data, we can use the Wolfram Language and, in particular, `GeoGraphics` to get a better picture of what this data is telling us.

First, I want to examine the area of interest. The following code generates a relief map of the data within 50 miles of the center of the Yellowstone Caldera:

✕
GeoGraphics[GeoBackground -> "ReliefMap", ImageSize -> 600, GeoCenter -> Entity["Volcano", "Yellowstone"], GeoRange -> Quantity[50, "Miles"]] |

Although there are some interesting features visible in the data, it’s not immediately clear where the volcanic calderas are, though some possible candidates are visible as flat areas among the surrounding mountainous terrain.

In order to gain a better understanding of the region, let’s pull in some additional data from the USGS in the form of an SHP file and the corresponding support files stored in the ZIP archive. Ideally, we would be working with the data straight from the Wolfram Data Repository, which makes storing, importing and computing data easy. It may be some time before this data is submitted, due to the complexity of the SHP file format and sheer volume of metadata; so in the meantime, this is a good example of working with data “in the wild,” without the benefit of the standardization found in the Data Repository.

In order to use this data, we need to understand the datum used so it can be properly transformed for representation using `GeoGraphics`:

✕
csi = Import["https://www.wsgs.wyo.gov/gis-files/caldera.zip", "CoordinateSystemInformation"] |

Next, I extract the data and labels:

✕
shpdata = Import["https://www.wsgs.wyo.gov/gis-files/caldera.zip", "Data"]; |

The data has various “properties” that can be extracted:

✕
shpdata[[1, All, 1]] |

For my purposes, I only need the `"Geometry"` and `"LabeledData"` properties:

✕
labeleddata = "LabeledData" /. shpdata[[1]] |

✕
labels = "LTYPE" /. labeleddata; |

✕
lines = "Geometry" /. shpdata[[1]]; |

With the data extracted, it’s now in suitable shape to add to our map.

Let’s see what the data can reveal to us. I can use the `"CoordinateSystemInformation"` we extracted to transform the data via `GeoGridPosition`:

✕
convertedlines = lines /. Line[data_] :> Line[GeoGridPosition[data, "UTMZone12", "NAD831986"]]; |

Now let’s give this some style! I choose some distinct colors to represent the various labels found in the data:

✕
color["Outer Ring Fault"] = White; color["Inner Ring Fault"] = Red; color["Resurgent Dome"] = Yellow; color["Caldera Rim"] = Black; |

Then I combine the colors and labels, and add a tooltip:

✕
coloredtooltip[line_, label_] := {color[label], Tooltip[line, label]} |

For a sense of scale, I include the state outlines for Wyoming, Idaho and Montana, with the data drawn using colored lines:

✕
GeoGraphics[{EdgeForm[Black], GeoStyling[Opacity[.4]], Red, Polygon[Entity[ "AdministrativeDivision", {"Montana", "UnitedStates"}]], Purple, Polygon[Entity[ "AdministrativeDivision", {"Idaho", "UnitedStates"}]], Blue, Polygon[Entity[ "AdministrativeDivision", {"Wyoming", "UnitedStates"}]], Thickness[.003], coloredtooltip @@@ Transpose[{convertedlines, labels}]}, GeoBackground -> "ReliefMap", ImageSize -> 600, GeoCenter -> Entity["Volcano", "Yellowstone"]] |

The extent of the Yellowstone calderas is quite impressive! Now that we have an idea of the larger scope, I want to try and get a bit more detail with a closer look at the area.

A useful surface marker we can use as a reference point is the location of Yellowstone Lake, which we’ll visually distinguish with a white rectangle:

✕
yellowstonelake = Rectangle[GeoPosition[{Latitude[#[[1]]], Longitude[#[[2]]]}], GeoPosition[{Latitude[#[[3]]], Longitude[#[[4]]]}]] &@ Entity["Lake", "YellowstoneLake::5n289"][{"SouthernmostPoint", "WesternmostPoint", "NorthernmostPoint", "EasternmostPoint"}]; |

Zooming in on Yellowstone with a radius of 50 miles will show more detail:

✕
GeoGraphics[{EdgeForm[Black], GeoStyling[Opacity[.4]], Red, Polygon[Entity[ "AdministrativeDivision", {"Montana", "UnitedStates"}]], Purple, Polygon[Entity[ "AdministrativeDivision", {"Idaho", "UnitedStates"}]], Blue, Polygon[Entity[ "AdministrativeDivision", {"Wyoming", "UnitedStates"}]], {EdgeForm[], White, Opacity[.5], yellowstonelake}, Thickness[.007], coloredtooltip @@@ Transpose[{convertedlines, labels}]}, GeoBackground -> "ReliefMap", ImageSize -> 600, GeoCenter -> Entity["Volcano", "Yellowstone"], GeoRange -> Quantity[50, "Miles"]] |

As a result of visually consolidating the available data and its properties, we can better understand the interactions and relationships between them. This map represents a flattened timescale of the Yellowstone calderas’ activity and their surrounding area, shaped over many millennia. We can see that some of the faults and rims of the various calderas correspond to visible surface features. A pair of resurgent domes (the yellow circular features) can be found within the data; these domes are areas where uplift occurred due to follow-up eruptions within older calderas. Through its visualization, the data reveals several calderas in the location of the largest—which we know by name as the Yellowstone Caldera.

Analyzing even this one dataset with the Wolfram Language allows us some perspective into geographical features that appear to be immutable from our point of view, yet are actually full of life and transformation. And fortunately for us, there’s a surfeit of data out there, just waiting to be given form—I look forward to seeing what you can create out of it!

Before I give strict definitions, here is the intuitive version of an integer partition via an example: 14 = 7 + 3 + 3 + 1. However, don’t add up! Just think of the sum 14 as being broken up into the four parts: 7, 3, 3, 1. Now the standard additive question is, how many ways are there of breaking 14 into parts? In other words, how many partitions of 14 are there? As we often say, the Wolfram Language has a function for that:

✕
PartitionsP[14] |
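For readers following along outside the Wolfram system, the count is easy to verify with the classic dynamic-programming recurrence for partition numbers (a Python sketch; `partitions_p` is my own helper name):

```python
def partitions_p(n):
    """Count the integer partitions of n via the standard coin-style DP:
    process parts 1..n, accumulating the number of ways to reach each total."""
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]
```

Calling `partitions_p(14)` reproduces the value returned by `PartitionsP[14]`.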

I’ll explain the pieces of the problem at hand as we go along; consider this a succinct abstract:

Two infinite lower-triangular matrices related to integer partitions are inverses of each other. The matrix ν comes from an additive analogue of the multiplicative Möbius μ function, while γ comes from counting generalized complete partitions; a complete partition of *n* has all possible subsums 1 to *n*.

First I’ll set up the function definitions we will use.

✕
Ferrers@p_ := Framed@Grid[Table["\[FilledCircle]", #] & /@ p] |

✕
ConjugatePartition[l_List] := Module[{i, r = Reverse[l], n = Length[l]}, Table[n + 1 - Position[r, _?(# >= i &), Infinity, 1][[1, 1]], {i, l[[1]]}]] |

✕
DistinctPartitionQ@x_ := Length@x == Length@Union@x |

✕
DistinctPartitions@n_ := Select[IntegerPartitions@n, DistinctPartitionQ] |

✕
PartitionMu@\[Lambda]_ := If[DistinctPartitionQ@\[Lambda], (-1)^Length@\[Lambda], 0] |

✕
DistinctPartitionsByMax[n_, m_] := DistinctPartitionsByMax[n, m] = Select[IntegerPartitions@n, (Sort@# == Union@#) && (Max@# == m) &] |

✕
PartitionsMuByMax[n_, m_] := PartitionsMuByMax[n, m] = Length@DistinctPartitionsByMax[n, m] - 2 Length@Select[DistinctPartitionsByMax[n, m], EvenQ@*Length] |

✕
PartitionsMuByMax@r_ := Table[PartitionsMuByMax[i, j], {i, r}, {j, i}] |

✕
\[Nu]@r_ := PadRight@Table[PartitionsMuByMax[i, j], {i, r}, {j, i}] |

✕
KStepPartitionQ[\[Lambda]_, k_] := MemberQ[Range@k, Last@\[Lambda]] && And @@ Table[\[Lambda][[j]] - k <= Total@Drop[\[Lambda], First@Last@Position[\[Lambda], \[Lambda][[j]]] ], {j, -1 + Length@\[Lambda]}] |

✕
KStepPartitionQ[0, _] := {{0}} |

✕
KStepPartitions[n_, k_] := Select[IntegerPartitions@n, KStepPartitionQ[#, k] &] |

✕
KStep[0, k_] := 1 |

✕
KStep[n_, k_] := KStep[n, k] = Length@KStepPartitions[n, k] |

✕
CompletePartitionQ@p_ := KStepPartitionQ[p, 1] |

✕
CompletePartitions[n_] := KStepPartitions[n, 1] |

✕
Complete[n_] := KStep[n, 1] |

✕
pre\[Gamma]@r_ := Table[KStep[i - 1, j - 1], {i, r}, {j, r}] |

✕
\[Gamma]@r_ := PadRight@Table[KStep[i - j, j - 1], {i, r}, {j, i}] |

✕
StrictCompositions[n_] := Join @@ Permutations /@ IntegerPartitions[n] |

✕
StrictCompositionsByMax[n_, m_] := Total[-(-1)^(Length /@ Select[StrictCompositions@n, Max@# == m &])] |

✕
\[Sigma]@r_ := PadRight@Table[StrictCompositionsByMax[n, m], {n, r}, {m, n}] |

✕
\[Alpha]@r_ := PadRight@Table[1, {i, r}, {j, i}] |

✕
\[Chi]@r_ := PadRight@Table[If[Mod[n, k] == 0, MoebiusMu[n/k], 0], {n, r}, {k, n}] |

Let’s establish the definitions for a multiset and an integer partition:

- A multiset is a collection of elements (like a set) where an element can occur more than once (unlike a set).
- An integer partition of a positive integer n is a multiset of positive integers (called its parts) that sum to n. In math, we write λ ⊢ n.

For example, {3, 1, 1} ⊢ 5.

In Mathematica, we use a list:

✕
{3, 1, 1} // Total |

Since the elements of a multiset and a set are unordered, we can arbitrarily choose to order the parts of a partition from largest to smallest. Here are the integer partitions of 5:

✕
IntegerPartitions[5] |

Here they are again more compactly:

✕
Row /@ IntegerPartitions@5 |

An older alternative definition is along these lines: “A partition is a way of writing an integer *n* as a sum of positive integers where the order of the addends is not significant…. By convention, partitions are normally written from largest to smallest addends… for example, 10 = 3 + 2 + 2 + 2 + 1.”

With such a definition, 3 + 2 + 2 + 2 + 1 has to be frozen, because as an arithmetic expression it is 10 and the parts are gone.

Yet another definition: λ = (λ_{1}, λ_{2}, …, λ_{k}) is a partition of n if the finite sequence is such that λ_{1} ≥ λ_{2} ≥ ⋯ ≥ λ_{k} ≥ 1 and λ_{1} + λ_{2} + ⋯ + λ_{k} = n.

For each part λ_{i} of a partition λ, draw a row of λ_{i} dots, then stack the rows:

✕
Ferrers@{2, 1, 1} |

The conjugate partition of a partition λ is the partition corresponding to the transpose of the Ferrers diagram of λ:

✕
Ferrers@ConjugatePartition@{2, 1, 1} |

So {3, 1} is the conjugate partition of {2, 1, 1}, and vice versa.

A distinct partition has no repeated part. Here are the four distinct partitions of 6:

✕
Row /@ DistinctPartitions@6 |

The remaining partitions of 6 have repeated parts:

✕
Row /@ Complement[IntegerPartitions@6, DistinctPartitions@6] |

This is the sequence counting the number of distinct partitions of n = 1, 2, …, 20:

✕
PartitionsQ@Range[20] |

The numbers of partitions of n = 1 through 6 are 1, 2, 3, 5, 7, 11, but the next number is not 13:

✕
PartitionsP@Range[12] |

The generating function for this sequence is:

✕
Row[{Sum[PartitionsP@n x^n, {n, 12}], " + \[Ellipsis]"}] |

The generating function is equal to the infinite product 1/((1 − x)(1 − x^{2})(1 − x^{3})⋯).

The number of distinct partitions of n = 1, 2, …, 12:

✕
PartitionsQ@Range[12] |

The generating function for this sequence is:

✕
Row[{Sum[PartitionsQ@n x^n, {n, 12}], " + \[Ellipsis]"}] |

It is equal to the infinite product (1 + x)(1 + x^{2})(1 + x^{3})⋯.

A square-free integer is one that is not divisible by a square greater than 1. Here are the square-free numbers up to 100:

✕
Select[Range@100, SquareFreeQ] |

Here are numbers up to 100 that are not square free:

✕
Select[Range@100, Not@*SquareFreeQ] |

In multiplicative number theory, the Möbius μ function is defined on the positive integers as follows:

- If n is not square free, μ(n) = 0.
- If n is square free, then n can be written as the product of k distinct primes, for some positive integer k. In that case, μ(n) = (−1)^{k}.

In other words, μ(n) of a square-free integer n is −1 or +1 according to whether n has an odd or an even number of prime factors. For example, μ(2) = −1, μ(6) = +1 and μ(30) = −1.

The function μ_{P} is the partition analogue of the ordinary Möbius function μ:

✕
Text@Grid[{ {"\[Mu]", "\!\(\*SubscriptBox[\(\[Mu]\), \(P\)]\)"}, {"product", "partition"}, {"prime factors", "parts"}, {"square\[Hyphen]free", "distinct"} }, Alignment -> Left, Dividers -> {{False, True}, {False, True}}] |

The definition of μ_{P}:

- Let μ_{P}(λ) = 0 if the partition λ has a repeated part.
- If the partition λ has distinct parts and k parts in all, μ_{P}(λ) = (−1)^{k}.

Here are the partitions of 6 and the corresponding values of the Möbius partition function μ_{P}:

✕
Grid[{Row@#, PartitionMu@#} & /@ IntegerPartitions@6, Alignment -> {Right, Left}] |
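The same table is straightforward to reproduce in Python. This sketch enumerates partitions recursively and applies the μ_{P} rule (helper names are mine):

```python
def gen_partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in gen_partitions(n - first, first):
            yield (first,) + rest

def partition_mu(p):
    """Partition Moebius function: 0 on a repeated part, else (-1)^(number of parts)."""
    return 0 if len(set(p)) < len(p) else (-1) ** len(p)

mu_values = {p: partition_mu(p) for p in gen_partitions(6)}
```

Of the eleven partitions of 6, only the four distinct ones get a nonzero value.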

The prime example of an infinite lower-triangular matrix is Pascal’s triangle. Imagine that the rows keep going down and the columns keep going to the right. For readability, let’s replace 0s with dots:

✕
MatrixForm[t10 = Table[Binomial[n, k], {n, 0, 9}, {k, 0, 9}], TableAlignments -> Right] /. 0 -> "\[CenterDot]" |

Here is the matrix product of Pascal’s triangle with itself:

✕
MatrixForm[t10.t10, TableAlignments -> Right] /. 0 -> "\[CenterDot]" |

Here is the matrix inverse of Pascal’s triangle:

✕
MatrixForm[Inverse@t10, TableAlignments -> Right] /. 0 -> "\[CenterDot]" |

The Stirling numbers of the first and second kind are another example of a pair of inverse lower-triangular matrices.

A Stirling number of the first kind, s(n, k), counts (up to sign) how many permutations of {1, …, n} have k cycles:

✕
(s1 = Table[StirlingS1[n, k], {n, 8}, {k, 8}]) /. 0 -> "\[CenterDot]" // MatrixForm |

A set partition of a finite set, say S = {1, …, n}, is a set of disjoint nonempty subsets of S whose union is S.

A Stirling number of the second kind, S(n, k), counts how many set partitions of {1, …, n} have k subsets:

✕
(s2 = Table[StirlingS2[n, k], {n, 8}, {k, 8}]) /. 0 -> "\[CenterDot]" // MatrixForm |

The two matrices are inverses of each other:

✕
Row[{MatrixForm@s1, Style[" \[CenterDot] ", 24], MatrixForm@s2, Style[" = ", 24], MatrixForm[s1.s2]}] /. 0 -> "\[CenterDot]" |
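That inverse pair can be double-checked numerically. This Python sketch builds both triangles from their standard recurrences and confirms that the 8×8 truncations multiply to the identity in both orders:

```python
def stirling_matrices(size):
    """Signed Stirling numbers of the first kind and Stirling numbers of the
    second kind, as size-by-size lower-triangular matrices with n, k >= 1."""
    s1 = [[0] * (size + 1) for _ in range(size + 1)]
    s2 = [[0] * (size + 1) for _ in range(size + 1)]
    s1[0][0] = s2[0][0] = 1
    for n in range(1, size + 1):
        for k in range(1, n + 1):
            # Standard recurrences for the two kinds
            s1[n][k] = s1[n - 1][k - 1] - (n - 1) * s1[n - 1][k]
            s2[n][k] = s2[n - 1][k - 1] + k * s2[n - 1][k]
    trim = lambda m: [row[1:] for row in m[1:]]
    return trim(s1), trim(s2)

def mat_mul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]

s1, s2 = stirling_matrices(8)
identity = [[int(i == j) for j in range(8)] for i in range(8)]
```

Here `mat_mul(s1, s2)` and `mat_mul(s2, s1)` both equal `identity`, mirroring the Wolfram Language computation above.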

For square matrices A and B, if A·B = I, then B·A = I. As the Demonstration The Derivative and the Integral as Infinite Matrices shows, there are (very familiar) infinite matrices D and J such that D·J is the identity matrix, but J·D is not.

Even though infinite lower-triangular matrices with 1s on the main diagonal behave well, we only deal with finite r × r truncations.

Define the matrix ν by ν(n, m) = −Σ μ_{P}(λ), where the sum is over partitions λ ⊢ n with maximum part m, 1 ≤ m ≤ n.

Here is an alternative definition.

Let O(n, m) be the partitions of n into an odd number of distinct parts with maximum part m.

Let E(n, m) be the same, except the number of parts should be even.

Let |O(n, m)| be the number of elements in O(n, m), and similarly for |E(n, m)|.

Then ν(n, m) = |O(n, m)| − |E(n, m)|, with ν(n, n) = 1.

Here is ν(10):

✕
\[Nu]@10 /. 0 -> "\[CenterDot]" // MatrixForm |

To verify that , look at the partitions of 10:

✕
Row /@ IntegerPartitions@10 |

The ones with maximum part 5 are:

✕
Select[IntegerPartitions@10, Max@# == 5 &] |

Applying μ_{P} to each of those gives:

✕
PartitionMu /@ % |

Minus the sum is 2, so ν(10, 5) = 2, as claimed.

As Jacobi wrote, “Always invert!” (referring to elliptic integrals). This is the inverse of ν(15):

✕
Inverse[\[Nu]@15] /. 0 -> "\[CenterDot]" // MatrixForm |

What is the sequence in the second column, 1, 1, 1, 2, 2, 4, 5, 8, 10, 16, …?

You can find the sequence by looking it up at the OEIS. That hits A126796: Number of complete partitions of *n*, which is a great start in understanding the matrix ν!

For the matrix γ, let’s look at subpartitions and subsums of a partition. A subpartition of a partition λ is a submultiset of λ. For instance, {3, 1} is a subpartition of {3, 1, 1}. A subsum is the sum of a subpartition. So there are eight (2^{3}) subsums of {3, 1, 1}, corresponding to the eight subpartitions of {3, 1, 1}:

✕
Text@Grid[ Transpose@{Prepend[Subsets@{3, 1, 1}, "subpartition"], Prepend[Total /@ Subsets@{3, 1, 1}, "subsum"]} ] |

Now let’s look at complete partitions. We define a partition of n to be complete if it has all possible subsums 1, 2, …, n.

Here are the five complete partitions of 6:

✕
Row /@ Select[IntegerPartitions@6, CompletePartitionQ] |

And here are the partitions of 6 that are not complete:

✕
Row /@ Select[IntegerPartitions@6, Not@*CompletePartitionQ] |

This is the sequence counting the number of complete partitions of n = 0, 1, …, 10:

✕
Complete /@ Range[0, 10] |
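A quick way to check these counts is to generate each partition’s subsum set directly. This Python sketch (helper names mine) reproduces the small values of the sequence:

```python
def gen_partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in gen_partitions(n - first, first):
            yield (first,) + rest

def is_complete(p):
    """A partition of n is complete if its subsums cover every value 0, 1, ..., n."""
    subsums = {0}
    for part in p:
        subsums |= {s + part for s in subsums}
    return subsums == set(range(sum(p) + 1))

complete6 = [p for p in gen_partitions(6) if is_complete(p)]
counts = [sum(1 for p in gen_partitions(n) if is_complete(p)) for n in range(1, 8)]
```

`complete6` lists the five complete partitions of 6 shown above.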

Consider the partition 7311. We get the subsums 1, 2, 3, 4, 5 easily from 311. But we cannot get 6, so 7311 is not complete.

Qualitatively, if a part is too large relative to the other parts, we cannot get some intermediate subsums. Park’s condition makes this precise.

**Theorem** (Park): A partition λ_{1} ≥ λ_{2} ≥ ⋯ ≥ λ_{ℓ} is complete iff λ_{ℓ} = 1 and for each j, 1 ≤ j < ℓ, λ_{j} ≤ 1 + λ_{j+1} + ⋯ + λ_{ℓ}.

For example, 411 is not complete (no subsum is 3) because 4 > 1 + 1 + 1.

Using Park’s condition, it is easy to check—but only if you want!—that the conjugate of a distinct partition is a complete partition.
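If you do want to check it, brute force also works for small n: conjugate every distinct partition via its Ferrers transpose and test completeness by subsums. A Python sketch (helper names mine):

```python
def gen_partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in gen_partitions(n - first, first):
            yield (first,) + rest

def conjugate(p):
    """Transpose the Ferrers diagram: the i-th conjugate part counts parts >= i."""
    return tuple(sum(1 for part in p if part >= i)
                 for i in range(1, p[0] + 1))

def is_complete(p):
    """A partition of n is complete if its subsums cover every value 0, 1, ..., n."""
    subsums = {0}
    for part in p:
        subsums |= {s + part for s in subsums}
    return subsums == set(range(sum(p) + 1))

# Check the claim for every distinct partition of n = 1, ..., 10
claim_holds = all(is_complete(conjugate(p))
                  for n in range(1, 11)
                  for p in gen_partitions(n)
                  if len(set(p)) == len(p))
```

For instance, the distinct partition {4, 2, 1} conjugates to {3, 2, 1, 1}, which is complete.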

Given a non-negative integer k, define a partition λ_{1} ≥ λ_{2} ≥ ⋯ ≥ λ_{ℓ} to be k-step iff λ_{ℓ} ≤ k and for each j, 1 ≤ j < ℓ, λ_{j} ≤ k + λ_{j+1} + ⋯ + λ_{ℓ}. Define the empty partition to be the only zero-step partition.

Clearly, a one-step partition is a complete partition:

✕
Row /@ CompletePartitions@5 |

Here are the k-step partitions of 5, for k = 1, 2, 3, 4:

✕
Row /@ KStepPartitions[5, 1] |

✕
Row /@ KStepPartitions[5, 2] |

✕
Row /@ KStepPartitions[5, 3] |

✕
Row /@ KStepPartitions[5, 4] |

This is the same as the partitions of 5 with no restrictions:

✕
Row /@ KStepPartitions[5, 5] |

Define KStep(n, k) to be the number of k-step partitions of n:

✕
pre\[Gamma]@10 /. 0 -> "\[CenterDot]" // MatrixForm |

The second column, 1, 1, 1, 2, 2, 4, 5, 8, 10, 16, is the number of complete partitions of n = 0, 1, …, 9.

Define the matrix γ by γ(n, m) = KStep(n − m, m − 1), 1 ≤ m ≤ n. In words, the columns of γ are the numbers of k-step partitions shifted down to form a lower-triangular matrix.

Here is the matrix γ(10):

✕
\[Gamma]@10 /. 0 -> "\[CenterDot]" // MatrixForm |

It matches the inverse of ν(10):

✕
Inverse@\[Nu]@10 /. 0 -> "\[CenterDot]" // MatrixForm |

We can now put everything together. The inverse of the matrix ν matches the matrix γ, which is the main theorem for this blog.

**Theorem**. For each r, ν(r) · γ(r) = I(r), the identity matrix.

This presents the situation when r = 6:

✕
Row[{\[Nu]@6 /. 0 -> "\[CenterDot]" // MatrixForm, Style[" \[CenterDot] ", 16], \[Gamma]@6 /. 0 -> "\[CenterDot]" // MatrixForm, Style[" = ", 16], IdentityMatrix@6 /. 0 -> "\[CenterDot]" // MatrixForm}] |
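As an independent check, here is a Python sketch that rebuilds both matrices from the combinatorial descriptions above (ν(n, m) as the odd-minus-even count of distinct partitions of n with maximum part m, and γ(n, m) = KStep(n − m, m − 1); helper names are mine) and verifies the product for r = 10:

```python
def gen_partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in gen_partitions(n - first, first):
            yield (first,) + rest

def nu(r):
    """nu(n, m): odd-length minus even-length distinct partitions of n with max part m."""
    mat = [[0] * r for _ in range(r)]
    for n in range(1, r + 1):
        for p in gen_partitions(n):
            if len(set(p)) == len(p):                 # distinct parts only
                mat[n - 1][p[0] - 1] += (-1) ** (len(p) + 1)
    return mat

def is_kstep(p, k):
    """Park-style condition: every part is at most k plus the sum of the later parts."""
    tail = 0
    for part in reversed(p):
        if part > k + tail:
            return False
        tail += part
    return True

def kstep(n, k):
    """Number of k-step partitions of n (1 for n = 0, the empty partition)."""
    return sum(1 for p in gen_partitions(n) if is_kstep(p, k))

def gamma(r):
    """gamma(n, m) = kstep(n - m, m - 1), a lower-triangular matrix."""
    return [[kstep(n - m, m - 1) if m <= n else 0
             for m in range(1, r + 1)] for n in range(1, r + 1)]

def mat_mul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]

r = 10
product = mat_mul(nu(r), gamma(r))
identity = [[int(i == j) for j in range(r)] for i in range(r)]
```

The product comes out as the 10×10 identity matrix, and the entry ν(10, 5) = 2 computed earlier also falls out of `nu(10)`.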

Hanna conjectured that

∑_{n ≥ 0} C(n) x^{n} (1 − x)(1 − x^{2})⋯(1 − x^{n+1}) = 1, (1)

where C(n) is the sequence that counts the number of complete partitions of n.

**Proof**: Rewrite the desired identity as

∑_{n ≥ 0} C(n) x^{n} / ((1 − x^{n+2})(1 − x^{n+3})⋯) = 1/((1 − x)(1 − x^{2})⋯) (2)

or

1/((1 − x)(1 − x^{2})⋯) = ∑_{n ≥ 0} ∑_{μ} x^{n} / ((1 − x^{n+2})(1 − x^{n+3})⋯), (3)

where the last sum is over all complete partitions μ of n.

We claim every partition contains a maximal complete subpartition. For example, 7311 has the maximal complete subpartition 311. If the maximal complete subpartition μ of a partition λ is a partition of n, then n + 1 cannot be a part of λ. If it were, we could insert it into μ, contradicting its maximality.

Furthermore, there is no constraint on the parts of λ larger than n + 1, because the fact that n + 1 is missing in λ means that no larger complete subpartition can be produced. Hence x^{n}/((1 − x^{n+2})(1 − x^{n+3})⋯) generates all partitions whose maximal complete subpartition is a given partition of n.

Summing over all n gives (3), and consequently (1).

Identifying coefficients for like powers of x proves that the second column of the inverse of ν counts complete partitions, matching the second column of γ. The straightforward bookkeeping generalization then proves the main theorem for the other columns.

Here is a proof of the main theorem by example. Consider the dot product of row 10 of ν with the second column of γ:

✕
Row[{MatrixForm@{Last[\[Nu]@10]}, " \[CenterDot] ", MatrixForm@\[Gamma][10][[All, 2]]}] |

An entry from the row of ν is the difference between the number of distinct partitions of odd and even length. Here are these partitions:

✕
Table[Row /@ DistinctPartitionsByMax[10, m], {m, 10}] |

Here are the complete partitions counted in the second column of γ:

✕
MatrixForm@Join[{{}, {}}, Table[Row /@ CompletePartitions[i], {i, 6}]] |

Count them up and recall that the sequence for the number of complete partitions starts like this:

✕
Complete /@ Range[6] |

Consider the fifth term in the dot product: 2×2. It comes from all possible pairs (δ, μ) of a distinct partition δ of 10 with maximum part 5 and a complete partition μ of 3.

That is:

(541, 21),

(541, 111),

(532, 21),

(532, 111).

The signs of those pairs are all negative, because the distinct partitions in all four pairs have an odd number of parts. Using β, we will find four other terms in the dot product that have the opposite sign to get cancellation.

Let D be the set of distinct partitions and C be the set of complete partitions.

Define the function β as follows.

Let δ ∈ D and μ ∈ C.

- If the length of δ is even, then β moves the largest part of μ back into δ.
- If the length of δ is odd, then β moves the second-largest part of δ over to μ.

In other words:

- Add the second-largest part of δ to the first part of δ, and adjoin it to μ.
- Drop the largest part of μ from μ and, in δ, subtract it from the largest part and adjoin it to δ.

In the previous example, we had four pairs. Here is how β changes them:

(541, 21) → (91, 421),

(541, 111) → (91, 4111),

(532, 21) → (82, 321),

(532, 111) → (82, 3111).

The resulting pairs are still pairs of a distinct partition of 10 with a corresponding complete partition, so they are counted in the dot product. However, the odd length becomes an even length, giving the cancellation. Also, β reverses itself, so we get complete cancellation.

Formally, the function β changes the parity of the length of the distinct partition and is an involution on the set of pairs. Therefore, the dot product is zero.

A composition of n is a finite sequence of non-negative integers with sum n. So unlike an integer partition, order matters. For example, the two compositions (1, 2) and (2, 1) are different.

Allowing 0 as a part only make sense if the number of parts is specified.

A strict composition of n is a finite sequence of positive integers with sum n. Here are the strict compositions of 4:

✕
Row /@ (C4 = StrictCompositions@4) |

The length of a composition is its number of parts. Here are the lengths of the compositions just shown:

✕
Length /@ C4 |
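For readers following along outside a notebook, the same enumeration is easy to sketch in Python (a hypothetical helper, not the `StrictCompositions` function used above); there are 2^(n−1) strict compositions of n:

```python
def strict_compositions(n):
    """All strict compositions of n: ordered sequences of
    positive integers that sum to n."""
    if n == 0:
        return [[]]
    out = []
    for first in range(1, n + 1):
        for rest in strict_compositions(n - first):
            out.append([first] + rest)
    return out

c4 = strict_compositions(4)
print(len(c4))               # 8 compositions, i.e. 2^(4-1)
print([len(c) for c in c4])  # their lengths
```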

As ν is for distinct partitions, so σ is for strict compositions. Define the matrix σ by letting σ(n, m) be the difference between the number of odd-length and even-length strict compositions of n with maximum part m.

For example, for n = 4 and maximum part m = 2, these are the strict compositions:

✕
Row /@ Select[C4, Max@# == 2 &] |

Three have odd length and one has even length, so the entry σ(4, 2) is their difference, 3 − 1 = 2.

Just like the matrix before, we define the matrix σ by these entries, for n, m = 1, …, 10:

✕
\[Sigma]@10 /. 0 -> "\[CenterDot]" // Grid |

Take the inverse of σ. What are these numbers?

✕
Inverse[\[Sigma]@10] /. 0 -> "\[CenterDot]" // Grid |

Looking up the second column in the OEIS leads to A002321, which is enough to lead to a conjecture. To formulate it, we define two lower-triangular matrices α and χ.

Let α be the lower-triangular matrix of all 1s:

✕
\[Alpha]@10 /. 0 -> "\[CenterDot]" // Grid |

Define the lower-triangular matrix χ as follows:

✕
\[Chi]@10 /. 0 -> "\[CenterDot]" // Grid |

Let’s calculate the matrix product:

✕
\[Alpha][10].\[Chi][10].\[Alpha][10] /. 0 -> "\[CenterDot]" // Grid |

That matrix product matches the inverse of σ:

✕
Inverse[\[Sigma][10]] /. 0 -> "\[CenterDot]" // Grid |

This led me to conjecture that σ⁻¹ = α · χ · α.

Who will prove it?

The relevant OEIS triangles are A134542, A134541, A000012 and A054525.

It is remarkable that the two kinds of partitions should be connected so simply by matrix inversion. It amazes me that strict compositions are related to the multiplicative Möbius function, again via matrix inversion. Are there more such pairs in the universe of additive number theory?

Andrews, G., G. Beck and B. Hopkins. “On a Conjecture of Hanna Connecting Distinct Part and Complete Partitions.” *Annals of Combinatorics*, forthcoming.

Brown, J. L. “Note on Complete Sequences of Integers.” *The American Mathematical Monthly* 68, no. 6 (1961): 557–60.

Hoggatt, V. E. and C. H. King. “Problem E-1424.” *The American Mathematical Monthly* 67 (1960): 593.

MacMahon, P. A. *Combinatory Analysis*, vol. 1. Cambridge: Cambridge University Press, 1915.

OEIS Foundation Inc. (2019), *The On-Line Encyclopedia of Integer Sequences*, oeis.org.

Park, S. K. “Complete Partitions.” *Fibonacci Quarterly* 36, no. 4 (1998): 354–60.

Park, S. K. “The *r*-Complete Partitions.” *Discrete Mathematics* 183 (1998): 293–97.

Schneider, R. “Arithmetic of Partitions and the *q*-Bracket Operator.” *Proceedings of the American Mathematical Society* 145 (2017): 1953–68.

In CCD photometry, we want to be able to determine a measure of the amount of radiation coming from a given star arriving on our CCD detector. Plotted as a function of time, this measurement can reveal important information about the star or star system.

The Wolfram Language is an ideal tool for extracting and analyzing this data because it combines image processing, data reduction, plotting, curve fitting, file import and astronomical data into one package. In this post, I will review some of the basic techniques of stellar aperture CCD photometry using the Wolfram Language.

With a Cepheid variable star, there is a strong correlation between the star’s light curve and the star’s intrinsic brightness, or absolute magnitude. This period-luminosity relationship, first published in 1912, is called Leavitt’s law, after Henrietta Swan Leavitt, based on her work at the Harvard College Observatory.

Leavitt worked as a “human computer” at the observatory, manually extracting observation data from glass plates, for which she was paid $0.30 per hour (eventually).

For Population I Cepheids, Leavitt’s law relates the base-10 logarithm of the star’s period (in days) to the mean absolute magnitude:

✕
absoluteMagnitude[P_] := -2.81*Log10[P] - 1.43 |

✕
absoluteMagnitude[3.0] |

Here is a log-linear plot of the relation:

✕
LogLinearPlot[absoluteMagnitude[p], {p, .1, 100}, { PlotTheme -> "Scientific", GridLines -> Automatic, FrameLabel -> {"Period (days)", "Absolute Magnitude"}, ImageSize -> 800}] |

Since the observed apparent magnitude diminishes the further the star is from Earth, the distance (in parsecs) can be determined from the period and the apparent magnitude:

✕
distanceInParsecs[AbsoluteMagnitude_, ApparentMagnitude_] := Power[10.0, (ApparentMagnitude - AbsoluteMagnitude + 5)/5]; |

✕
distanceInParsecsFromPeriod[P_, ApparentMagnitude_] := Power[10.0, (ApparentMagnitude - absoluteMagnitude[P] + 5)/5]; |
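The same two formulas can be cross-checked in a few lines of Python (a sketch using the constants above, with the standard distance modulus m − M = 5 log10(d) − 5):

```python
import math

def absolute_magnitude(period_days):
    # Leavitt's law for Population I Cepheids, same constants as above
    return -2.81 * math.log10(period_days) - 1.43

def distance_in_parsecs(absolute_mag, apparent_mag):
    # Distance modulus m - M = 5*log10(d) - 5, solved for d
    return 10.0 ** ((apparent_mag - absolute_mag + 5) / 5)

M = absolute_magnitude(3.0)       # about -2.77
d = distance_in_parsecs(M, 10.0)  # about 3580 parsecs
```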

In 1924, Edwin Hubble identified a Cepheid variable in what was then known as the “Andromeda Nebula.”

By extracting its absolute magnitude from the Cepheid’s light curve and comparing it to the observed magnitude, he was able to show the distance to the star was millions of light years and prove that it was part of an entirely separate galaxy—thus vastly expanding the size of the known universe.

We must make some assumptions about the images in order to compare them against each other. We calibrate the image frames by doing the following:

- Subtracting dark frames (removes thermal noise)
- Subtracting flat frames (removes illumination variation)
- Aligning image frames (allows photometric comparison across multiple images)

These steps allow us to compare different stars across a CCD field and get accurate magnitude comparisons. Bias noise associated with reading the CCD array is automatically subtracted out by the main photometry computation.
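The arithmetic behind those calibration steps is simple enough to sketch in Python on toy 2×2 frames (illustrative only; real pipelines work on full CCD arrays and also track bias frames and bad pixels):

```python
def calibrate(raw, dark, flat):
    """Subtract the dark frame, then divide by the flat field
    normalized to its mean value."""
    n = sum(len(row) for row in flat)
    flat_mean = sum(sum(row) for row in flat) / n
    return [[(r - d) * flat_mean / f
             for r, d, f in zip(raw_row, dark_row, flat_row)]
            for raw_row, dark_row, flat_row in zip(raw, dark, flat)]

raw  = [[110.0, 120.0], [130.0, 140.0]]
dark = [[10.0, 10.0], [10.0, 10.0]]
flat = [[1.0, 1.0], [1.0, 1.0]]
# With a uniform flat, calibration reduces to dark subtraction
calibrated = calibrate(raw, dark, flat)
```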

First, divide the area around our star into three regions:

The center disk isolates photons coming from the star. The next ring is actually a gap, an empty space separating the star aperture from the sky annulus. The third and outermost ring represents the sky annulus, which isolates sky background photons.

Simple image processing techniques can split apart these sections of the image around the target star so that we can total up the image values (and thus the photon count) in each section. The way we do this is to generate a disk or an annular graphic in the color white, with everything else being black. This mask image is then multiplied onto the target image (since an image is just an array of numeric data values), which results in an image with everything black (value = 0.0) except for the area where we want to total up the data.
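The masking idea itself is language agnostic; here is a minimal Python sketch of the same disk and annulus masks on a small grid (the Wolfram Language versions used in this post follow):

```python
def disk_mask(size, diameter):
    """1.0 inside a centered disk of the given diameter, 0.0 elsewhere."""
    c = (size - 1) / 2.0
    r = diameter / 2.0
    return [[1.0 if (x - c) ** 2 + (y - c) ** 2 <= r * r else 0.0
             for x in range(size)] for y in range(size)]

def annulus_mask(size, inner_diameter, outer_diameter):
    """1.0 between the inner and outer diameters, 0.0 elsewhere."""
    inner = disk_mask(size, inner_diameter)
    outer = disk_mask(size, outer_diameter)
    return [[o - i for i, o in zip(irow, orow)]
            for irow, orow in zip(inner, outer)]

def apply_mask(image, mask):
    # Pixelwise multiplication zeroes out everything outside the mask
    return [[p * m for p, m in zip(prow, mrow)]
            for prow, mrow in zip(image, mask)]
```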

We can use the Wolfram Language to generate these disk and annular masks to isolate the regions on the raster:

✕
starAperture[im_, starDiameter_] := Module[{id = ImageDimensions[im]}, Rasterize[ Graphics[{Black, Rectangle[{0, 0}, id], White, Disk[{id[[1]]/2, id[[2]]/2}, starDiameter/2]}, PlotRangePadding -> None, ImagePadding -> None], RasterSize -> id]]; |

✕
skyAnnulus[im_, innerDiameter_] := Module[{id = ImageDimensions[im]}, Rasterize[ Graphics[{Black, Rectangle[{0, 0}, id], White, Annulus[{id[[1]]/2, id[[2]]/2}, {innerDiameter/2, id[[1]]/2}]}, PlotRangePadding -> None, ImagePadding -> None], RasterSize -> id]]; |

We can apply this to any image. For this example, in the spirit of Edward Weston, let’s use these masks to isolate parts of an image of some peppers:

✕
peppers = ExampleData[{"TestImage", "Peppers"}] |

We can find the star aperture of the peppers:

✕
ImageMultiply[peppers, starAperture[peppers, 200]] |

And here we find the “sky annulus” of the peppers image:

✕
ImageMultiply[peppers, skyAnnulus[peppers, 300]] |

We can see that the remaining ring of peppers is the “sky annulus,” with the black circle in the center accounting for the star aperture and gap.

To compute the total star brightness, we subtract the sky background. First, we isolate the star of interest from the full CCD frame:

✕
isolateRegion[im_Image, {row_, column_}, radius_] := ImageTake[ im, {row - radius, row + radius}, {column - radius, column + radius}]; |

With the star isolated, we subtract the sky background and compute the total signal coming from the star:

✕
Photometry[im_Image, {row_, column_}, starApertureDiameter_, skyAnnulusInnerDiameter_, skyAnnulusOuterDiameter_] := Module[{baseImage, starImage, skyImage, skyBackgroundPerPixel, starTotal, skyTotal, skyAnnulusArea, starApertureArea}, (*Carve out area just around the star*) baseImage = isolateRegion[im, {row, column}, skyAnnulusOuterDiameter/2.0]; (*Extract the star image*) starImage = ImageMultiply[baseImage, starAperture[baseImage, starApertureDiameter]]; (*Extract the pixels in the sky annulus*) skyImage = ImageMultiply[baseImage, skyAnnulus[baseImage, skyAnnulusInnerDiameter]]; (*Total signal from star*) starTotal = Total@Flatten@ImageData@starImage; (*Total signal from sky annulus*) skyTotal = Total@Flatten@ImageData@skyImage; skyAnnulusArea = Pi*(skyAnnulusOuterDiameter^2 - skyAnnulusInnerDiameter^2)/4.0; (*Compute the per pixel sky signal*) starApertureArea = Pi*(starApertureDiameter^2)/4.0; skyBackgroundPerPixel = skyTotal/skyAnnulusArea; (*Subtract out the total sky background from the star image*) starTotal - starApertureArea*skyBackgroundPerPixel ]; |
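The background-subtraction arithmetic at the heart of that function can be checked independently; this Python sketch mirrors the same area calculation and per-pixel sky scaling:

```python
import math

def sky_subtracted_total(star_total, sky_total, star_aperture_diameter,
                         sky_inner_diameter, sky_outer_diameter):
    """Net star signal: counts in the star aperture minus the per-pixel
    sky background scaled to the aperture's area."""
    sky_area = math.pi * (sky_outer_diameter ** 2
                          - sky_inner_diameter ** 2) / 4.0
    star_area = math.pi * star_aperture_diameter ** 2 / 4.0
    sky_per_pixel = sky_total / sky_area
    return star_total - star_area * sky_per_pixel

# With a sky background of 2.0 counts per pixel, a 6-pixel aperture
# loses 2.0 * (pi * 36 / 4) counts of background
net = sky_subtracted_total(1000.0, 2.0 * math.pi * (20**2 - 8**2) / 4.0,
                           6, 8, 20)
```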

The goal in photometry is to obtain a series of brightness values as a function of time. For this, we need an accurate observation time for each value.

Astronomical image data is traditionally collected in the form of FITS (Flexible Image Transport System) data files, which Mathematica can easily import. Each FITS file, in addition to image data, can contain a wealth of metadata describing the details of the observation associated with the image, including the time and date the image was taken. For convenience, we are going to represent the image time in the form of a Julian date.

A Julian date is a real number representing the number of days that have elapsed since noon Universal Time on January 1, 4713 BC (on the Julian calendar). Many FITS files contain the observation time as a Julian date. Even if a FITS file doesn’t contain an explicit Julian date, we can still derive that value from other time information that is often found in the FITS metadata.
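As a quick illustration of the convention (a Python sketch, independent of the FITS-reading code), a Julian date can be derived from any timezone-aware timestamp via the Unix epoch, which falls at JD 2440587.5:

```python
from datetime import datetime, timezone

def julian_date(dt):
    """Julian date of a timezone-aware datetime, counted in days from
    noon UT on January 1, 4713 BC (the Unix epoch is JD 2440587.5)."""
    return dt.timestamp() / 86400.0 + 2440587.5

# The J2000.0 epoch, noon UT on January 1, 2000, is JD 2451545.0
jd = julian_date(datetime(2000, 1, 1, 12, tzinfo=timezone.utc))
```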

For convenience, let’s use an `Association` to bundle together the FITS image and the metadata into a FITS object:

✕
LoadFITSObject[imagePath_String] := Association[{"Image" -> Image[Import[imagePath, {"Data", 1}], "Real64"], "MetaInformation" -> Import[imagePath, {"FITS", "MetaInformation"}][[1, "GENERAL INFORMATION"]]}]; |

Unfortunately, there are several ways that the observation date can be represented in the metadata of FITS files, so we will need to determine which method we want to use for the remainder of our calculations:

✕
GetJulianDateAlt[fitsObject_] := Module[{date, time, do, jd}, date = StringSplit[ fitsObject[["MetaInformation", "DATE-OBS", 1]], {"-", "/"}]; time = StringSplit[fitsObject[["MetaInformation", "UT", 1]], ":"]; If[time == {"KeyAbsent"}, time = StringSplit[fitsObject[["MetaInformation", "TIME-OBS", 1]], ":"]]; do = DateObject[ToExpression /@ { date[[3]], date[[1]], date[[2]], time[[1]], time[[2]], time[[3]]}]; JulianDate[do] ]; |

✕
GetJulianDate[fitsObject_] := Module[{jd}, jd = fitsObject[["MetaInformation", "JD", 1]]; If[jd == "KeyAbsent", jd = GetJulianDateAlt[fitsObject]]; jd ]; |

Now we can extract brightness as a function of time by using a (non-varying) comparison star of a known magnitude:

✕
PhotometryObs[fitsObject_, star_List, compstar_List, compStarMag_, starDiam_, gapDiam_, skyDiam_] := Module[{imgobj, jd, ms, mcs, diffMag}, (* Get the image from the FITS object *) imgobj = fitsObject[["Image"]]; (* Get the Julian date of the obs time from FITS object *) jd = GetJulianDate[fitsObject]; (* Target star brightness *) ms = Photometry[imgobj, star, starDiam, gapDiam, skyDiam]; (* Reference star brightness *) mcs = Photometry[imgobj, compstar, starDiam, gapDiam, skyDiam]; (* Compute the magnitude difference *) diffMag = -2.5 Log10[ms/mcs]; (* Return the {Julian date, magnitude} pair, offset to the magnitude of the reference star *) {jd, diffMag + compStarMag} ]; |

We can finally put our code to the test by analyzing the light curve of a Cepheid variable in a synthetic star field. For this example, we are using a set of synthetic star fields provided as teaching examples from this website. Select “Guest” at the prompt, and download all the FITS files from the S366 directory into a local folder for processing. You can also download the set here.

First we form a list of the names of the FITS files that contain our data:

✕
fileNames = FileNames[ FileNameJoin[{NotebookDirectory[], "Data", "Cepheids", "*.fts"}]]; |

Then load “bundled” FITS objects by mapping the `LoadFITSObject` function onto this list:

✕
cepheidImages = LoadFITSObject /@ fileNames; |

Thankfully, we have a star chart provided with this set that we can `Import`:

✕
Import[FileNameJoin[{NotebookDirectory[], "Data", "Cepheids", "S366Chart.jpg"}]] |

Load the first image so we can locate the Cepheid on it:

✕
ImageAdjust[cepheidImages[[1, "Image"]]] |

It is easy to locate the row and column of the target star by clicking the image of the star field in the notebook and using editing features to examine the star. Once we have the `{row, column}` coordinates of the star and a reference star on the image, we can extract the photometric data as a function of time, adjusted to the magnitude of the reference star (here taken to be 10.0).

From the examination of the image, we use a star diameter of 6 pixels, a gap outer diameter of 8 pixels and a sky annulus outer diameter of 20 pixels. We can then get the list of apparent magnitudes sorted by their Julian dates by mapping `PhotometryObs` onto the list of FITS objects and then sorting:

✕
cepheidImages[[1]] |

✕
sortedData = SortBy[PhotometryObs[#, {513, 513}, {533, 289}, 10.0, 6, 8, 20] & /@ cepheidImages, First]; |

Our data points are of the form `{ Julian Date, Apparent Magnitude}`. Let’s retrieve the first data point:

✕
sortedData[[1]] |

We can plot our data points to see if there are any visible patterns. Here we plot the Cepheid light curve, starting from the earliest observation:

✕
startJD = sortedData[[1, 1]]; |

✕
offsetSortedData = {#[[1]] - startJD, #[[2]]} & /@ sortedData; |

✕
ListLinePlot[offsetSortedData, PlotTheme -> "Scientific"] |

After establishing our data points, our next valuable step is to analyze the data. Let’s try a simple fit to a `Sin[]` function to establish the period of the Cepheid’s cycle:

✕
model = Association[ FindFit[offsetSortedData, Subscript[M, v] + A Sin[\[Omega] t + \[Phi]], {Subscript[M, v], A, \[Omega], \[Phi]}, t]] |

Here, M_v is the apparent visual magnitude. However, we can do better: `FindFit` isn’t the most accurate way to determine the frequency parameter ω when fitting periodic data. A fast Fourier transform (FFT) is the traditional method for converting time series data to frequency space; however, it requires regularly spaced data points. Photometric data usually isn’t of this type, because observations may be spaced over days, weeks or longer.

The best way to analyze data like this is to use `IrregularPeriodogram`, which is available in the Wolfram Function Repository. I’m not going to go into the math behind `IrregularPeriodogram` here, but if you’re interested, you can refer to the references cited at the end of this article. `IrregularPeriodogram` grew out of the needs of the astronomical community, which frequently needs to analyze data from non-regularly-spaced observations.
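To see the idea behind such periodograms without the Function Repository, here is a minimal Python sketch of the classical periodogram for unevenly spaced, mean-subtracted data in the style of Deeming (1975), cited below (a simplified relative of `IrregularPeriodogram`, not the same implementation):

```python
import math

def deeming_periodogram(w, times, values):
    """P(w) = [(sum y cos wt)^2 + (sum y sin wt)^2] / N for
    mean-subtracted values y at arbitrary observation times t."""
    c = sum(y * math.cos(w * t) for t, y in zip(times, values))
    s = sum(y * math.sin(w * t) for t, y in zip(times, values))
    return (c * c + s * s) / len(times)

# Irregularly spaced samples of a signal with angular frequency 0.4
times = [i + 0.3 * math.sin(7 * i) for i in range(80)]
values = [math.sin(0.4 * t) for t in times]
mean = sum(values) / len(values)
values = [v - mean for v in values]
# The periodogram is largest near w = 0.4
```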

✕
observationTimes = offsetSortedData[[All, 1]]; |

✕
observationValues = offsetSortedData[[All, 2]]; |

`IrregularPeriodogram` needs the data to be zero offset from the mean:

✕
observationValues2 = observationValues - Mean[observationValues]; |

Now we can call `IrregularPeriodogram` from the Function Repository and plot the results:

✕
Plot[ResourceFunction["IrregularPeriodogram"][w, observationTimes, observationValues2], {w, .1, 1}, PlotPoints -> 300, PlotRange -> All, AxesLabel -> {"\[Omega] (radians)", "\!\(\*SubscriptBox[\(P\),\(D\)]\)(\[Omega])"}, ImageSize -> 800] |

This shows a strong peak near 0.4 radians per day, which corresponds to the fundamental angular frequency of the data. Numerically, this frequency can be determined with `NArgMax`:

✕
freq = NArgMax[{ResourceFunction["IrregularPeriodogram"][w, observationTimes, observationValues2], 0 <= w <= 10}, w] |

The period that corresponds to the lowest-order frequency is the period of the variable star in days:

✕
Period = 2 \[Pi]/freq |

Using our distance-period relationship and the apparent visual magnitude of the star, we can then compute the distance in parsecs:

✕
distanceInParsecsFromPeriod[Period, model[Subscript[M, v]]] |

For those who’d like to trek a little further into stellar aperture photometry, you can check out the eclipsing binary star BU Vulpeculae from the same University of Iowa website. I hope you’ve enjoyed this post, and that you’ll find it useful for your own exploration of the stars—see you out there!

Clarke, D. “String/Rope Length Methods Using the Lafler–Kinman Statistic.” *Astronomy & Astrophysics* 386 (2002): 763–74.

Deeming, T. J. “Fourier Analysis with Unequally Spaced Data.” *Astrophysics and Space Science* 36, no. 1 (1975): 137–58.

Lafler, J. and T. D. Kinman. “An RR Lyrae Star Survey with the Lick 20-Inch Astrograph II. The Calculation of RR Lyrae Periods by Electronic Computer.” *Astrophysical Journal Supplement* 11 (1965): 216.

Lomb, N. R. “Least-Squares Frequency Analysis of Unequally Spaced Data.” *Astrophysics and Space Science* 39 (1976): 447–62.

Scargle, J. D. “Studies in Astronomical Time Series Analysis. II. Statistical Aspects of Spectral Analysis of Unevenly Spaced Data.” *The Astrophysical Journal* 263 (1982): 835–53.