Flag Analysis with Mathematica
If I have any problems with Mathematica, it’s that once I’m empowered with the ability to get answers, I can’t help but ask more questions. So it was when a conversation about flags, which are included in Mathematica‘s CountryData, got out of hand. “What are the most popular colors?” I was asked.
Looking at them all together, it’s hard to tell.
What if we look at the textual descriptions and see which words come up most? First let’s pull together a list of color names from another data source.
We can then search for matches in the text. Here I’m taking the prosaic descriptions and breaking them into a list of words, then using set intersection to find those words that are also in our list of colors.
Here are the first ten.
A quick bit of counting and sorting of the full list and we see which are the most popular colors.
So now if I ever set up my own country, I know what are fashionable choices for the flag colors. But surely some of those combinations have already been taken? We don’t want the embarrassment of turning up at the UN with the same flag as someone else. Let’s restrict ourselves to a two-color flag from the most popular seven choices.
Next, let’s eliminate those that already exist.
Finally, let’s sort those according to our popularity score.
This leaves us with red and gold as the most fashionable unique two-color choice for a flag. What about three-color flags? Just change the value in the Subsets command to 3 and you find that it is red, green, blue.
Does the shade matter? Using text analysis is a pretty limited way to look at the colors. Consider my own UK flag and my southern neighbors’ flag for France.
They are both red, white, and blue. But not the SAME red, white, and blue or,
probably, the same amounts of red, white, and blue. If we are going to be
properly analytical, the words “red, white, and blue” are not a good enough description.
What if we use the actual color data from the raster images of the flags and take into account the number of pixels of each? I am, of course, drawn into the new question.
I’m using a somewhat arbitrary way of deciding whether the colors are of the same Norm of the difference of the color vectors < 0.4. There are other ways, and using a smaller tolerance will judge closer colors to be different. The code works by counting the number of pixels of distinctly different color, dividing the count through by the total number of pixels, and then sorting into descending popularity.
Let’s test our function on the Spanish flag.
I can determine from the chart that it is 51.6% red.
We need a function to collect together this data for all the countries and merge values from different flags that are roughly the same color.
Now we simply analyze all the world’s flags and use this function to aggregate the data.
The result is pretty complicated, so I want words and pictures to understand it. First, I need a name to describe the resulting colors.
This new function searches ColorData for the best match, e.g. we find that the combined intensity of 50% red, 20% green, and 70% blue is best described by Mathematica as DeepCobaltViolet.
Now I want a custom version of the BarChart function, with bars in the right colors and labels using this new name function.
Here is the result.
There we have an answer—if you put all 237 flags of the world together, the greatest area covered is red, followed by white and green. There are many significantly different shades of blue in use, with something around midnight blue (labeled Ultramarine on the chart) being the most popular.
But a new question comes to mind! What if we took into account the populations of the countries? There are more people represented by the colors of the Chinese and Indian flags than there are by the flags of Pitcairn Islands and Cocos Keeling Islands.
All we need to do is repeat the step of aggregating the data, but this time multiply each color area by the population of the country in question.
So now we know all about the popularity of individual colors, but I feel a new question coming on. If we average out the colors, do we see a trend across the map?
Let’s start by creating an average color out of a flag. Just take the mean of all the pixels.
Then inspect the result.
The uneven contrasting patchwork is a little surprising. After a little thought, I have a hypothesis: the historical point of flags was to tell your side from your enemy’s in battle, and the enemy was most often your neighbor. The hypothesis would assert that substantial local variance is deliberate. Can we confirm that? Oh no, another question! How to resolve this one? Well, what if we consider the color difference between the averaged flag and the averaged flag of a neighboring country? If I am right, the value will be higher than the difference between that flag and a non-neighboring country.
Of course, we need to average this over all the neighbors and all the non-neighbors to see if there is a general bias.
Let’s test.
This claims that the average difference in color of the averaged Belgian flag and its neighbors France, Germany, Luxembourg, and the Netherlands is 0.45 by my color metric. Slightly less than the 0.46 difference between Belgium and the rest of the world, making my theory wrong. Perhaps the Belgians are particularly peaceful! Let’s finish by running it for the entire world (excluding island nations that have no neighbors) and taking the average.
My theory of warring nations is significantly wrong. There is 10% less average color difference between neighbors. Perhaps my eyes are just too good at seeing color differences?
To get a smoother image, let’s apply a topological moving average. Each color is averaged with its immediate neighbors. However, I think the trends that we see are more influenced by a few well-connected countries with strong colors exerting undue influence on the moving average.
Probably better would be to use a moving average based on distance rather than shared borders. But by now there are lots of questions in my mind… is there a pattern to order of colors, number of colors, or orientation of patterns?
Of course Mathematica can provide the answers, but I’ve decided to stop here as it would also, no doubt, provoke even more questions.