For almost seven years now, millions of people have used the computational powers of Wolfram|Alpha to check their own predictions and explore fascinating topics in math, science, history, linguistics, culture, the arts, and more. Take Brian A. Carr from Jackson, Wyoming, for example. He is a classical scholar and career firefighter with Jackson Hole Fire/EMS. Carr uses Wolfram|Alpha for a variety of interests, “from helping to explain geometrical and mathematical concepts related to my study of early Greek mathematics to investigating the physics of room-and-contents fires.”

Wolfram|Alpha “was a lifesaver in my graduate work,” says Samantha Howard, who is now a physicist in the United States Air Force. Howard, who received her master’s degree in applied physics from the Air Force Institute of Technology, first discovered Wolfram|Alpha during her undergraduate years at the USAF Academy.

“I love Wolfram|Alpha because it helps me solve real-world problems while I’m on the road,” says Andre Koppel, a financial analyst. In addition to finding the natural language interface invaluable for solving complex math questions in his work, Koppel uses Wolfram|Alpha for everyday things like measuring food and drink quantities at restaurants or checking the tides to see if it’s safe to go into coastal waters.

Carr, Howard, and Koppel are only a few representatives of our huge community of users who are working with Wolfram|Alpha every day to explore new horizons. In honor of Mathematics Awareness Month and to encourage mathematics curiosity and exploration, we are offering 20% off subscriptions to Wolfram|Alpha Pro through April 30, 2016. Visit our website, use the promo code MATHMONTH20OFF, and start exploring!


After four hundred years, Shakespeare’s works remain a vivid presence in our culture. He mastered the English language as no one before him had, and he deeply understood the workings of human emotion.

Have you ever explored Shakespeare’s texts from the perspective of a data scientist? Wolfram technologies can provide you with new insights into the semantics and statistical analysis of Shakespeare’s plays and the social networks of their characters.

William Shakespeare (April 26, 1564 (baptized)–April 23, 1616) is considered by many to be the greatest writer of the English language. He wrote 154 sonnets, 38 plays (divided into three main groups: comedy, history, and tragedy), and 4 long narrative poems.

I will start by creating a nice `WordCloud` from one of his most famous tragedies, *Romeo and Juliet*. You can achieve this with just a couple of lines of Wolfram Language code.

First, you need to get the text. One possibility is to import the public-domain HTML versions of the complete works of William Shakespeare from this MIT site:
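A minimal sketch of that import step (the exact URL is an assumption based on the layout of the MIT “Complete Works” site):

```wolfram
(* import the play as plain text; the URL is an assumption *)
text = Import["http://shakespeare.mit.edu/romeo_juliet/full.html", "Plaintext"];
```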

Then make a word cloud from the text, deleting common stopwords like “and” and “the”:
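Assuming `text` holds the imported play, this step is a one-liner:

```wolfram
WordCloud[DeleteStopwords[text]]
```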

As you can see, `DeleteStopwords` does not delete all the Elizabethan stopwords like “thou,” “thee,” “thy,” “hath,” etc. But I can delete them manually with `StringDelete`. And with some minor extra effort, you can improve the word cloud’s style a great deal:
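Something along these lines works (the stopword list here is illustrative, not exhaustive):

```wolfram
(* a hand-picked list of Elizabethan stopwords *)
elizabethanStopwords = {"thou", "thee", "thy", "thine", "hath", "doth", "'tis"};
WordCloud[
 StringDelete[DeleteStopwords[text],
  WordBoundary ~~ (Alternatives @@ elizabethanStopwords) ~~ WordBoundary,
  IgnoreCase -> True]]
```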

Now let’s analyze a tragedy more deeply. Wolfram|Alpha already offers a lot of computed data about Shakespeare’s plays. For example, if you type “Othello” as Wolfram|Alpha input, you will get the following result:

If you want to visualize the interactions among the characters of this tragedy via a social network, you can achieve this with ease using the Wolfram Language. As I did earlier with the word cloud, I need to first import the texts. In this case I want to work with all the acts and scenes from *Othello* separately:

Since I want to import and save the scenes for later use in the same notebook’s folder, I can do the following:

In order to create the `Graph`, I first need all the character names, which will be displayed as vertices. I can gather the names by noticing that each dialog line is preceded by a character name in bold, which in HTML is written like this: `<b>Name</b>`. Thus it is straightforward to get an ordered list containing all character names (“speakers”) from each dialog line using `StringCases`:
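A sketch of that extraction, assuming `scene` holds the raw HTML of one scene:

```wolfram
(* pull out the bolded speaker names from the HTML source *)
speakers = StringCases[scene, "<b>" ~~ Shortest[name__] ~~ "</b>" :> name];
```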

Then, using `Union` and `Flatten`, I can obtain the names of all the characters in the tragedy of *Othello*:
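With `allSpeakers` denoting the per-scene speaker lists gathered above, this is simply:

```wolfram
characters = Union[Flatten[allSpeakers]]
```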

Once I have the vertices, I need to create the edges of the graph. In this case, an edge between two vertices will represent a connection between two characters whose lines are separated by less than two lines within the dialog (similar to the Demonstration by Seth Chandler that analyzes the networks in Shakespeare’s plays). For that purpose, I will use `SequenceCases` to create all the edges, i.e. pairs of speakers whose lines fall within two lines of each other:
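A sketch with `SequenceCases`, here simplified to adjacent pairs of distinct speakers:

```wolfram
edges = SequenceCases[speakers,
  {a_, b_} /; a =!= b :> UndirectedEdge[a, b], Overlaps -> True];
```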

Before creating the graph, I need to delete the edges that are duplicated or equivalent, like OTHELLO↔IAGO and IAGO↔OTHELLO, as well as the self-loops, i.e. edges like IAGO↔IAGO:
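One way to do this, assuming `edges` is the list built above:

```wolfram
(* canonicalize each edge, drop duplicates, then drop self-loops *)
edges = DeleteCases[DeleteDuplicates[Sort /@ edges], UndirectedEdge[a_, a_]];
```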

Finally, you can specify the size of the vertices with the `VertexSize` option. For example, I want the vertices’ sizes to be proportional to the number of lines per character. I can get the number of lines per character with `Counts`:
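A sketch of that step (with `allSpeakers` as above; the rescaling constants are arbitrary choices):

```wolfram
lineCounts = Counts[Flatten[allSpeakers]];
Graph[edges, VertexSize -> Normal[N[Log[lineCounts + 1]]/20],
 VertexLabels -> "Name"]
```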

After this, I can use a logarithmic function to rescale the vertices to a reasonable size. I will also improve the design with `VertexStyle` and `VertexLabels`.

Since the code is getting more cumbersome, I will omit it and show only the result (for those interested in the details of the code, you can find them in the attached notebook). Also note that in the final result I’m excluding the vertex “All” since it is not a real character in the dialog:

So far, so good. Having the social network from a Shakespeare play written more than four hundred years ago is quite cool, but I’m still not 100% satisfied. I would like to visualize when these interactions occur within the dialog itself. One way to achieve this is by representing each main speaker with a different-colored bar:

Note: `linesColor` is a list of colors representing the lines in one scene, and `linesLength` is the list of the lines’ `StringLength` values with a rescaling function. These functions involve some string manipulation, similar to what I did earlier to obtain the character names from the HTML version of the play. If you wish, you can see their construction in the attached notebook:

Additionally, I can mark when a particular character says a particular word—for example, the word “love” (note: the variable `words` is the list of words per line in the scene, created with the new function `TextWords`; see the attached notebook for details):

Now I can combine all of this with the social network graph and have a colorful and compact infographic about a Shakespeare tragedy:

There are so many other interesting things that I would like to explore about Shakespeare’s works and life. But I will finish with a map representing the locations in which his plays are set. I hope you got a glimpse of what is possible to achieve with the Wolfram Language. The only limit is our imagination:

For a few places, the `Interpreter` fails to find a `GeoPosition`, so I used `Cases` to obtain all the successfully interpreted locations:
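A sketch of that filtering step, assuming `locationNames` is the list of location strings extracted from the plays:

```wolfram
interpreted = Interpreter["Location"][locationNames];
(* failed interpretations return Failure objects, so keep only GeoPosition results *)
positions = Cases[interpreted, _GeoPosition];
```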

Finally, I’m using `GeoDisk` to depict the geo positions as disks with a radius proportional to the number of times each location appears in Shakespeare’s plays:
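Schematically, it might look like this (`locationCounts` is a hypothetical list of {position, count} pairs, and the scaling factor is arbitrary):

```wolfram
GeoGraphics[{Red, Opacity[0.5],
  GeoDisk[First[#], Quantity[30 Last[#], "Kilometers"]] & /@ locationCounts}]
```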

Many fellow Wolfram users expressed keen interest in and came up with astonishing approaches to Shakespeare’s corpus analysis on Wolfram Community. We hope this blog will inspire you to join that collaborative effort exploring the mysteries of Shakespeare data.

Download this post as a Computable Document Format (CDF) file.

This marathon is one of the six Abbott World Marathon Majors: the Tokyo, Boston, Virgin Money London, BMW Berlin, Bank of America Chicago, and TCS New York City marathons. If you are looking for things to add to your bucket list, I believe these are great candidates. Given the international appeal, let’s have a look at the runners’ nationalities and their travel paths. Our `GeoGraphics` functionality easily enables us to do so. Clearly many people traveled very far to participate:
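A sketch of such a travel-path map (`runnerCountries` is a hypothetical list of country entities):

```wolfram
chicago = Entity["City", {"Chicago", "Illinois", "UnitedStates"}];
(* draw a geodesic from each runner's home country to Chicago *)
GeoGraphics[{Thin, Red, GeoPath[{#, chicago}, "Geodesic"] & /@ runnerCountries}]
```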

The vast majority, of course, came from the US:

Let’s create a heat map to see the distribution of all US runners. As expected, most of them are from Chicago and the Midwest:
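Such a heat map can be sketched with `GeoHistogram` (assuming `usRunnerLocations` is a list of the runners’ home positions):

```wolfram
GeoHistogram[usRunnerLocations, ColorFunction -> "TemperatureMap",
 GeoRange -> Entity["Country", "UnitedStates"]]
```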

What did the race look like in Chicago? Recreating the map in the Wolfram Language, taking every runner’s running times, and utilizing my coworker’s mad programming skills, we can produce the following movie:

As you can see, the green dot is the winning runner. I am red, and the median is shown in blue. This movie made me realize that while the fastest runner was already approaching the northernmost point of the course, I was still trying to meet up with my running partner! The purple bars indicate the density of runners at any given time along the race course. You might wonder what the gold curve is. That would be the center of gravity of the runners’ distribution.

The dataset also includes age division and placement within age group, gender and placement within gender group, all split times, and overall placement. The split times were taken every 5 km, at the half-marathon distance, and, of course, at the finish line. The following image illustrates the interpolated split times for all participants after deducting the starting time of the winning runner:

The graphic reflects several things about this race: runners were grouped into two waves, A and B, depending on their expected finishing time. This is illustrated by the split around 2,500 seconds at the starting line. Within each wave, runners were then grouped into corrals. Again, faster runners started in earlier corrals. Thus the later a runner started, the slower they were overall. The resulting slower split times show up as a much faster rise of the corresponding lines. It also took 4,503 seconds, a little over 75 minutes, for all runners to get started. In contrast, the last person crossed the finish line 19,949 seconds after the winner of this race. I was neither…

Let’s take a more detailed look at everyone’s start and finish in absolute time. We’re letting the first runner start at 0 seconds by subtracting his time from all participants’. The red dots indicate the mean of the finish time for runners with the exact same starting time:

Again, the two waves are clearly visible. The smaller breaks within each wave indicate the corral changes. But what caught my eye was the handful of people preceding the first wave. Because the dataset provides us with the names of the participants, I was able to drill down and find out whose data I was looking at: it is the “Athletes with Disabilities” (AWD), as the group is named by the Chicago Marathon administration. Checking back with the schedule of events, I was able to confirm that this group started eight minutes ahead of the first wave.

Let’s investigate a bit more and see what we can learn about this group. Of course, the very first person to cross the starting line is part of this group. Everyone else started very closely around him. We can query for the AWD subgroup by looking for everyone who started within a generous 200 seconds of the first person. We find that there were 49 members in this group:

Here is the plot of their start and finish times. It is equivalent to a zoom on the 0-second start line in the above plot:

Due to their physical disabilities, many of these runners were joined by one or two guides who helped them navigate the course. With our `Nearest` functionality, we can try to identify such groups. We just need to gather everyone’s time stamps, convert them to `UnixTime`, and define our `Nearest` function:
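A sketch of that setup (variable names are hypothetical; `splitTimes` holds each runner’s list of split time stamps):

```wolfram
times = Map[UnixTime, splitTimes, {2}];
(* chessboard distance = maximum deviation across all splits *)
nf = Nearest[times -> "Index", DistanceFunction -> ChessboardDistance];
(* indices of all runners whose splits stay within 10 seconds of runner i's *)
group[i_] := nf[times[[i]], {Infinity, 10}]
```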

Let’s find the group of nearest people for all 49 runners by limiting the variations of their time stamps to 10 seconds over the course of the race:

Out of the 49 runners, we find that 35 ran in 15 groups of 2 or more people:

These are the groups we could identify:

I tip my hat to everyone who participated in this race. But I am in awe of people running a marathon with a physical disability. I would like to give them, as well as their guides, a special shoutout!

Did I run with someone? As mentioned above, I sure did. I am lucky to have my next-door neighbor Michael as my running partner. Cursing and whining during a long run is a lot easier if you have someone on your side. Otherwise you just look crazy while mumbling to yourself. Let’s build the `Nearest` function:

Then we can apply it to the entire dataset. Any result of length greater than 1 indicates a running group. We find that 2,784 runners ran in 1,394 groups:

There were 1,329 groups of 2, 62 groups of 3, and 3 groups of 4. The latter were:

By the way, you will not find my and Michael’s names in any of these groups. Why? Because there was nothing in this world that could keep Michael from his tenth attempt to finish the marathon in under four hours—whereas halfway through the race I had to give in to that nagging voice telling me to take a break and walk. Just taking the first half of the race into account, here we are:

We finished only three minutes apart, but that can be a whole lot of time during a marathon. Michael came in just under four hours; I barely missed that time.

Now let’s take a look at how the race progressed split by split. The following histograms show how participants’ split times compared to the mean time at each split distance:

Interestingly, for each split the curve shows a little bump just before the 0 marker, which indicates the mean split time. To find out which runners these might be, we have to consider who the participants are. The vast majority are recreational marathon runners. We hope to stay injury free and maybe achieve a personal record, but our goal is to have a great experience and a rush of endorphins. We are not there to win and collect prize money. But, as Michael did above, one thing that people might attempt is to break the elusive four-hour mark. To beat four hours, a runner—let’s call her “Molly”—has to average 341.517 s/km, or 9 minutes and 9 seconds per mile:

To make sure Molly comes in *under* four hours, let’s assume she runs at a pace five seconds faster per kilometer, 336.517 s/km. By not allowing any change of pace, we are basically turning Molly into a robot. But let’s see where her split times (indicated in red) fall compared to the mean at each of the kilometer markers. Indeed, Molly’s split times match the “hump,” and thus are a representation of all runners trying to finish the marathon in less than four hours:

As can be seen in the above histograms, with each split we plot more bins representing fewer runners, while the variations from the mean steadily increase. Here is another look at the same fact, just from a different angle. Again taking the differences of the runners’ split times to the mean, and then sorting them from smallest to largest, we can see how the differences between the fastest and slowest runners steadily increase over the course of the race:

Again, the group of people trying to finish in under four hours is nicely visible in the small hump to the left of the *y* axis. How many people did make it in under four hours? We could not make this number up: it was exactly 11,111 people, or 29.7% of all participants:

As mentioned above, I could not keep pace with Michael after about halfway through the race. But let’s look at “keeping pace” and how consistently people ran their races. The dataset provides all the information we need to look at everyone’s average pace and absolute variations from it at each split. Adding up those variations per person gives us the following picture:

The maximum of accumulated variations from the average pace is around 10 minutes. I averaged 9 minutes and 16 seconds:

My variations from that average added up to almost three minutes:

In the charts below, we are looking at the distribution of those variations versus a runner’s finishing time. Since a slower runner takes more time between splits and thus automatically accumulates more minutes and more variations, we additionally normalized the pace variation by the corresponding finishing time:

Of course, these pace variations cause people to pass each other. Let’s have a quick look at how often this happened. We counted an amazing 276,121,258 changes of position among the runners. Below is an illustration. Inside the attached notebook, please hover over the data points to see the number of overtakes at a given distance:

To explain the numerous peaks, we should have another look at the race. Every mile or two, aid stations were providing runners with fluids, medical assistance, and other necessities. These aid stations were about two city blocks long, giving runners plenty of opportunities to move through and to avoid crowds. Consider the aid stations on the map:

Also consider their locations along the course by using our new `GeoDistanceList` function:
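Schematically, with `coursePoints` as a hypothetical list of geo positions tracing the course up to each aid station:

```wolfram
(* cumulative distance along the course at each point *)
Accumulate[GeoDistanceList[coursePoints]]
```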

We can nicely match the peaks with the locations of the aid stations. At each of these points, a huge number of runners change their pace, resulting in the jump in overtakes. While taking in fluids, one runner might choose to walk while another just slows down but continues to run. A third runner might not use the station at all and run straight through it. It turns out I am not very gifted when it comes to drinking while running, so I walk whenever necessary.

Interestingly, a `Histogram3D` of time versus distance versus the number of overtakes looks like the city of Chicago itself:

Running a marathon does not just take a good number of months of training, battles with injuries, and bouts of laziness (as well as a good sense of the craziness of this endeavor). It also takes a financial commitment. Race registration and travel costs can add up to an intimidating sum of money. This made me wonder if there is a correlation between travel distance and finishing time, i.e. can I assume that the farther you have to travel and the more money you have to spend on the event, the better you are as a runner? The following plot shows the finishing time versus travel distance to the US. Upon hovering inside the notebook, you can see the runners’ countries, their finishing times, and their overall placement in the race:

Clearly my assumption is incorrect. We do see a small number of runners from Kenya and Ethiopia who traveled thousands of miles and came in first. But we also see runners who traveled all the way from India, New Zealand, Indonesia, Swaziland, and Singapore who finished in more than six or seven hours. The means for these countries are all around six hours.

Let’s see if another assumption can be proven wrong: if the travel expense is not as prohibitive as thought, does the number of runners from a country decrease with increasing travel distance? And could it be true that the more runners a country has in the race, the higher its GDP per capita is? In the notebook, hover over each data point in the charts below to see the country, the number of runners from that country, and the travel distance or GDP per capita:

The data is not as obvious as one might think. More than 28,000 participants came from the US, whereas only a single person came from countries such as Réunion and Mauritius. We do have a number of countries with less wealth and only single-runner representation. But the single-runner representation also holds true for Qatar and Luxembourg—both known for their financial muscle.

I’ll admit that the country of origin might not be as much of a statement about the size of one’s wallet or someone’s performance as I might have thought. What about age?

Marathons seem to appeal mainly to people in their mid-twenties to mid-forties. And, of course, the higher your age, the better your chances of winning your division. But what is interesting to see is that this is not actually a sport favoring the younger athletes. The fastest times were achieved by the 40–44 age division. So I might still have my Olympic years ahead of me!

To add a note of obscurity: have you ever considered if your name is any indication of your performance? Or if there are other runners by your name in this exact race? There are many shared first and last names. If you were a “Cabada” or a “Zac” in this race, you did awfully well:

You may have guessed the most common first name: there were 641 Michaels. The leading last name was, also not very surprising, “Smith” with a count of 157. Of course, these numbers decrease considerably when we look at shared full names:
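Counts like these can be read off with `Counts` (assuming `firstNames` is the list of all first names; `Sort` on an association orders by value):

```wolfram
Take[Sort[Counts[firstNames], Greater], 5]
```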

And the most common full names and their counts are:

The combination of my family watching on the sidelines, including my mother visiting from Germany, the outstanding work of all the volunteers, and the huge crowds of spectators and the entertainment they provided, all made for a memorable race. Plus the weather, which is usually a liability in Illinois, was just impeccable. Both Michael and I had a blast, which I think is visible using `ImageCollage`:

But as it turns out, not just the event itself was fun. This was a great dataset for me to play around with and learn a lot more about the capabilities of the Wolfram Language. I am not a developer, but I greatly enjoyed this opportunity to combine my professional and personal lives. If you are interested in more scientific approaches to the topic of marathon running, you might find this article and this article intriguing.

But most importantly, registration is now open for the 2016 event!

Download this post as a Computable Document Format (CDF) file.


*Handbook of Mathematics*, sixth edition

This guidebook to mathematics by I. N. Bronshtein, K. A. Semendyayev, G. Musiol, and H. Mühlig contains a fundamental working knowledge of mathematics, which is needed as an everyday guide for working scientists and engineers, as well as for students. This newer edition emphasizes those fields of mathematics that have become more important for the formulation and modeling of technical and natural processes, namely numerical mathematics, probability theory and statistics, and information processing. Besides many enhancements and new paragraphs, new sections on geometric and coordinate transformations, quaternions and applications, and Lie groups and Lie algebras have also been included.

*Advanced Calculus Using Mathematica: Notebook Edition*

Keith Stroyan’s latest work is a complete text on calculus of several variables written in Mathematica notebooks. The eText has large, movable figures and interactive programs to illustrate things like “zooming in” to see “local linearity.” In addition to lots of traditional-style exercises, the eText also has sections on computing with Mathematica. Solutions to many exercises are in closed cells of the eText.

*Handbook of Linear Partial Differential Equations for Engineers and Scientists*, second edition

Including nearly 4,000 linear partial differential equations (PDEs) and a database of test problems for numerical and approximate analytical methods for solving linear PDEs and systems of coupled PDEs, Andrei D. Polyanin and Vladimir E. Nazaikinskii have created a comprehensive second edition of their handbook. The book also covers solutions to numerous problems relevant to heat and mass transfer, wave theory, hydrodynamics, aerodynamics, elasticity, acoustics, electrodynamics, diffraction theory, quantum mechanics, chemical engineering sciences, electrical engineering, and other fields.

*Mathematical Science of the Developmental Process* (Japanese)

This book by Takashi Miura uses Mathematica to introduce and explore the developmental process, in which a simple, spherical, fertilized egg becomes a complex adult structure. The process is very difficult to understand, and the mechanism behind it has yet to be elucidated. Since no fundamental equation of this process has been established, we need prototyping processes, which means quick formulation of simple phenomenological models and verification by simulation and analysis.

*Single Variable Calculus with Early Transcendentals*

A comprehensive, mathematically rigorous exposition, this text from Paul Sisson and Tibor Szarvas blends precision and depth with a conversational tone to include the reader in developing the ideas and intuition of calculus. A consistent focus on historical context, theoretical discovery, and extensive exercise sets provide insight into the many applications and inherent beauty of the subject.

If you are not a programmer but you need to analyze data, Sergiy Suchok’s new book will show you how to use Mathematica to take just a few lines of intelligible code to solve huge tasks, from statistical issues to pattern recognition. If you’re a programmer, you will learn how to use the library of algorithms implemented in Mathematica in your programs, as well as how to write algorithm testing procedures. Along with intuitive queries for data processing and using functions for time series analysis, the book highlights the nuances and features of Mathematica, allowing you to build effective analysis systems.

*Mathematica: A Problem-Centered Approach*, second edition

The second edition of Roozbeh Hazrat’s textbook introduces the vast array of features and powerful mathematical functions of Mathematica using a multitude of clearly presented examples and worked-out problems. Based on a computer algebra course taught to undergraduate students of mathematics, science, engineering, and finance, the book also includes chapters on calculus and solving equations, as well as graphics, thus covering all the basic topics in Mathematica. With its strong focus on programming and problem solving, and an emphasis on using numerical problems that do not require any particular background in mathematics, this book is also ideal for self-study and as an introduction for researchers who wish to use Mathematica as a computational tool.

*Estructuras Discretas con Mathematica* (Spanish)

This book by Enrique Vilchez Quesada provides a theoretical and practical overview for students studying discrete structures within the curriculum of computer engineering and computer science. The major contribution of this work, compared to other classical textbooks on this subject, consists of providing practical solutions to real-world problems in the context of computer science by creating different examples and solutions (programs, in most cases) using the renowned commercial software Mathematica.

Looking for more Wolfram technologies books? Don’t forget to visit Wolfram Books to browse by both topic and language!

In 1828, an English corn miller named George Green published a paper in which he developed mathematical methods for solving problems in electricity and magnetism. Green had received very little formal education, yet his paper introduced several profound concepts that are now taught in courses on advanced calculus, physics, and engineering. My aim in writing this post is to give a brief biography of this great genius and provide an introduction to `GreenFunction`, which implements one of his pioneering ideas in Version 10.4 of the Wolfram Language.

George Green was born on July 14, 1793, the only son of a Nottingham baker. His father noticed young George’s keen interest in mathematics, and sent him to a local school run by Robert Goodacre, a well-known science popularizer. George studied at Goodacre Academy between the ages of eight and nine, and then went to work in his father’s bakery. Later he ran a corn mill built by his father in Sneinton, near Nottingham. He is said to have hated his work at the bakery and the corn mill, and regarded it as annoying and tedious. In spite of his onerous duties, George appears to have continued studying mathematics in his spare time, retreating to the top floor of the 16-meter-high mill, shown above, for this purpose. In 1828, he published the results of his rigorous self-study in “An Essay on the Application of Mathematical Analysis to the Theories of Electricity and Magnetism,” one of the most influential mathematical papers of all time.

Green’s paper of 1828 introduced the potential function, which is well known to students of physics. He also proved a form of Green’s theorem from advanced calculus in this paper. Finally, he introduced the notion of a Green’s function that, in one form or another, is familiar to students of engineering, and is the theme for this post. By sheer chance, Sir Edward Bromhead, a founder of the Analytical Society, purchased and read a copy of Green’s paper. With his encouragement, Green entered Gonville and Caius College in Cambridge University at the age of forty, and eventually became a fellow of the college. He continued to publish papers until his untimely death in 1841, possibly due to lung complications arising from his work at the corn mill. Sadly, recognition for his mathematical work had to wait until 1993, when a plaque was dedicated to his memory in Westminster Abbey. Today, the Green’s Mill and Science Centre in Nottingham carries on the work of promoting George Green’s reputation as one of the greatest scientists of his age.

I will now give an introduction to `GreenFunction` using concrete examples from electrical circuits, ordinary differential equations, and partial differential equations.

The basic principle underlying a Green’s function is that, in order to understand the response of a system to arbitrary external forces, it is sufficient to understand the system’s response to an impulsive force of the `DiracDelta` type.

As an illustration of the above principle, consider a circuit that is composed of a resistor *R* and an inductor *L*, and is driven by a time-dependent voltage *v*[*t*], as shown below:

The current *i*[*t*] in the circuit can then be computed by solving the differential equation:

*L* *i*′(*t*) + *R* *i*(*t*) == *v*(*t*)

Let’s assume that the voltage source is a battery supplying a unit voltage. Next, suppose that you close the switch *S* for a fleeting moment at time *t* = *s* and then quickly throw it open again. The current induced in the circuit by this impulsive action can be computed by applying `GreenFunction` to the left-hand side of the above differential equation:

The initial value of the current is assumed to be zero, since the switch was open until time *t* = *s*:

Here is the result given by `GreenFunction` for this example:
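A hedged sketch of that computation (lowercase `l` and `r` stand in for the circuit constants, following the convention of reserving capitals for built-in symbols):

```wolfram
gf = GreenFunction[{l i'[t] + r i[t], i[0] == 0}, i[t], {t, 0, Infinity}, s]
(* (1/l) E^(-(r/l) (t - s)) HeavisideTheta[t - s] *)
```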

The following plot for *s* = 1 shows that the current is 0 for all times *t* < 1, then rises instantaneously to its peak value at *t* = 1, and finally decreases to 0 with the passage of time:

The behavior of the circuit in the above situation is usually called its impulse response, since it represents the response of the circuit to an impulsive voltage.

Next, suppose that you close the switch at time *t* = 0 and leave it closed at all later times. Thus the voltage steps up from its initial value 0 to a constant value 1, and can be modeled using the `HeavisideTheta` function:

The step voltage can be visualized as follows:

You can now compute the current in the circuit by performing the following integral involving the voltage and the Green’s function:
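In code, this convolution looks roughly as follows, writing the RL Green’s function out explicitly:

```wolfram
(* Green's function of the RL circuit, derived above *)
g[t_, s_] := Exp[-(r/l) (t - s)] HeavisideTheta[t - s]/l;
current[t_] = Integrate[HeavisideTheta[s] g[t, s], {s, 0, t},
   Assumptions -> {t > 0, r > 0, l > 0}]
(* the step response: (1 - E^(-(r/l) t))/r *)
```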

The integral computed above is essentially a weighted sum of the Green’s function with the voltage source at all times *s* prior to a given time *t*, and is called a convolution integral.

As the plot below shows, the current for the step voltage source gradually increases from its value 0 at *t* = 0 to a steady-state value:

The behavior of the circuit in the above situation is usually called its step response, since it represents the response of the circuit to a step voltage.

Finally, suppose that the voltage source supplies an alternating voltage—for example:

You can once again compute the current in the circuit by performing a convolution integral of the voltage with the Green’s function, as shown below:

You can also obtain the result using `DSolveValue` as follows:
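For example, assuming a unit-amplitude alternating voltage `Sin[w t]`:

```wolfram
DSolveValue[{l i'[t] + r i[t] == Sin[w t], i[0] == 0}, i[t], t]
```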

As the plot below shows, the current settles down to a steady alternating pattern for large values of the time:

To summarize, the Green’s function encodes all the information that is required to study the response of the circuit to any external voltage. This magical property makes it an indispensable tool for studying a wide variety of physical systems.

The two-step procedure for solving the differential equation associated with a circuit, which I discussed above, can be applied to any linear ordinary differential equation (ODE) with a forcing term on its right-hand side and homogeneous (zero) initial or boundary conditions. For example, suppose you wish to solve the following second-order differential equation:

Assume that the forcing term is given by:

Also, suppose that you are given homogeneous boundary conditions on the interval [0,1]:

As a first step in solving the problem, you compute the Green’s function for the corresponding differential operator (left-hand side) of the equation:

The following plot shows the Green’s function for different values of *y* lying between 0 and 1. Each instance of the function satisfies the zero boundary conditions at both ends of the interval:

You can now compute the solution of the original differential equation with the given forcing term using a convolution integral on the interval [0,1], as shown below:

Here is a plot of the solution, which shows that it satisfies the homogeneous boundary conditions for different values of the parameter *a*:
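Since the notebook's equation and forcing term are not reproduced in this text, here is a sketch of the same two-step procedure in Python for a stand-in textbook problem: −*u*″(*x*) = *f*(*x*) with *u*(0) = *u*(1) = 0, whose Green's function is known in closed form.

```python
def green_bvp(x, y):
    """Green's function of -u''(x) = f(x) with u(0) = u(1) = 0
    (a classic textbook operator, used here as a stand-in for the
    notebook's equation, which is not reproduced in the text)."""
    return x * (1.0 - y) if x <= y else y * (1.0 - x)

def solve(f, x, n=20_000):
    # Step two: convolution of the Green's function with the
    # forcing term over the interval [0, 1] (trapezoidal rule).
    h = 1.0 / n
    vals = [green_bvp(x, k * h) * f(k * h) for k in range(n + 1)]
    return h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])

# For f = 1 the exact solution is u(x) = x(1 - x)/2, and the
# homogeneous boundary values u(0) = u(1) = 0 hold automatically.
print(solve(lambda y: 1.0, 0.5))   # about 0.125
print(solve(lambda y: 1.0, 0.0))   # 0.0
```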

Green’s functions also play an important role in the study of partial differential equations (PDEs). For example, consider the wave equation that describes the propagation of signals with finite speed, and that I discussed in an earlier post. In order to compute the Green’s function for this equation in one spatial dimension, use the wave operator (left-hand side of the wave equation), which is given by:

Here, *x* denotes the spatial coordinate that ranges over (-∞,∞), *t* denotes the time that always ranges over [0,∞), and *u*[*x*,*t*] gives the displacement of the wave at any position and time.

You can now find the Green’s function for the wave operator as follows:

The following plot of the Green’s function shows that it becomes 0 outside a certain triangular region in the *x*-*t* plane, for any choice of *y* and *s* (I have chosen both these values to be 0). This behavior is consistent with the fact that the wave propagates with a finite speed, and hence signals sent at any time can only influence a limited region of space at any later time:
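This causal structure can be checked with a few lines of Python. The sketch below assumes unit wave speed (the post does not state the speed used in the notebook): the Green's function of the 1D wave operator is the constant 1/(2*c*) inside the forward light cone and 0 outside it.

```python
def green_wave(x, t, y=0.0, s=0.0, c=1.0):
    """Green's function of the 1D wave operator u_tt - c^2 u_xx
    (unit wave speed assumed in this sketch): constant 1/(2c)
    inside the forward light cone |x - y| < c (t - s), 0 outside."""
    return 1.0 / (2.0 * c) if c * (t - s) > abs(x - y) else 0.0

# The triangular support described above: a point outside the cone
# feels nothing, a point inside feels the constant value 1/2.
print(green_wave(x=3.0, t=1.0))   # 0.0  (signal has not arrived)
print(green_wave(x=0.5, t=1.0))   # 0.5  (inside the light cone)
```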

The Green’s function obtained above can be used to solve the wave equation with any forcing term, assuming that the initial displacement and velocity of the wave are both zero. For example, suppose that the forcing term is given by:

You can solve the wave equation with this forcing term by evaluating the convolution integral:

The following plot shows the standing wave generated by the solution:

Finally, I note that the same solution can be obtained by using `DSolveValue` with homogeneous initial conditions, as shown below:

Green’s functions of the above type are called fundamental solutions and play an important role in the modern theory of linear partial differential equations. In fact, they provided the motivation for the theory of distributions that was developed by Laurent Schwartz in the late 1940s.

The ideas put forward by George Green in his paper of 1828 are stunning in their depth and simplicity, and reveal a first-rate mind that was far ahead of the times during which he lived. I have found it very inspiring to study the life and work of this great mathematician while implementing `GreenFunction` for Version 10.4 of the Wolfram Language.

Download this post as a Computable Document Format (CDF) file.

]]>Our conference gives developers and colleagues a rare opportunity for face-to-face discussion of the latest developments and features for cloud computing, interactive deployment, mobile devices, and more. Arrive early for pre-conference training opportunities, and come ready to participate in hands-on workshops, nonstop networking opportunities, and the Wolfram Language One-Liner Competition, just to name a few activities.

We are also looking for users to share their own stories and interests! Submit your presentation proposal by July 15 for full consideration. Last year’s lineup included everything from political data science to winning hackathon solutions to programming in the Wolfram Cloud… and literally almost everything in between. Review a sampling of the 2015 talks below, or visit our website for more.

Commanding the Wolfram Cloud—Todd Gayley

Computational Politics: The Wolfram Data Drop Meets Election 2016—Evan Ott

Genealogy with the Wolfram Language—Robert Nachbar

Infusing STEM Education with Coding and Discovery: An Open Platform for Publishing Modern Curriculum—Kyle Keane

Valuation Navigator and the Emergence of Real Estate Valuation 3.0—Shashi Rivankar

Ready to see it all firsthand this year? Our conference draws the broadest, most diverse group of tech folks we’ve ever seen—from enthusiastic high schoolers to commercial executives, professors to retirees, and experts in education, commercial law, physics, optics, math, engineering, and so very much more! Register today for this year’s conference to reserve your spot!

]]>How to LEGO-fy Your Plots and 3D Models, by Sander Huisman

This marvel by Sander Huisman, a postdoc from École Normale Supérieure de Lyon, attracted more than 6,000 views in one day and was trending on Reddit, Hacker News, and other social media channels. Huisman’s code iteratively covers layers with bricks of increasingly smaller sizes, alternating in the horizontal *x* and *y* directions. Read the full post to see how to turn your own plots, 3D scans, and models into brick-shaped masterpieces.

Supreme Court Ideological Data, by Alan Joyce

Wolfram’s own Alan Joyce was inspired by a recent *New York Times* article to use the Wolfram Language to explore Supreme Court ideological data and Martin–Quinn scores. While he leaves you to draw your own political conclusions, his visualizations will help you see the Supreme Court’s decisions in a new way. Get started on your own analysis and join the conversation by grabbing the cleaned-up dataset at the end of his Community post.

Implementing Minecraft in the Wolfram Language, by Boris Faleichik

Fans of Minecraft are going to love this one. With some amazingly compact code, Boris Faleichik, a professor from Belarusian State University and past Wolfram One-Liner Competition winner, shows how the Wolfram Language handles Minecraft’s classic game functionality. Have an idea for an improvement? Visit the post on Community and leave a comment!

Find Your Species Name on Darwin’s Birthday, by Jofre Espigule

To celebrate Darwin’s February 12 birthday, Brainterstellar cofounder Jofre Espigule wrote an app to help you find out if there’s a species that shares your name. It works using the Wolfram Language’s built-in species data. Read the full post to see how Espigule split each scientific name into two words, used the `Nearest` function to find the species name closest to a given name, and deployed his app to the Wolfram Cloud.

Using Mathematica to See the World in a Different Light, by Marco Thiel

Marco Thiel from the University of Aberdeen celebrated the United Nations’ Year of Light global initiative with an article on how the Wolfram Language, its wealth of data, and connected devices can be used to keep the Year of Light alive at your home. Part 1 explores how spectra enable us to “see the world in a different light.”

Internet of Things (IoT): Controlling an RGB LED with the Wolfram Cloud, by Armeen Mahdian

Thirteen-year-old Armeen Mahdian’s first post on Wolfram Community caught our attention too. He shared how the Wolfram Cloud can be used in conjunction with an embedded Linux device to create IoT applications. Read his full post to see how he used a BeagleBone Black (BBB) and its IO ports to control an RGB LED using the cloud. Don’t miss Mahdian’s other post on PWM pins.

Cops and Robbers (and Zombies and Humans), by Brian Weinstein

Brian Weinstein, data analyst and grad student at Columbia, uses the Wolfram Language to create mathematical pursuit-evasion games. In these games, the goal is to determine how many pursuers are required to capture a given number of evaders. The GIFs he created show two fun versions—Cops and Robbers and Zombies and Humans.

Visit Wolfram Community to join in on these and other interesting discussions and browse the complete list of Staff Picks. Or share and test your own code, ideas, and apps with Community’s more than 11,000 members.

]]>

This offer extends into this April, Mathematics Awareness Month, which we’re also kicking off today, Monday, March 14. Founded in 1986 as Mathematics Awareness Week, Mathematics Awareness Month aims to increase the public understanding of and appreciation for mathematics and its applications. In honor of this year’s theme, “The Future of Prediction,” we will be offering 20% off subscriptions to Wolfram|Alpha Pro starting today and ending April 30, 2016. With Pro you’ll be able to freely explore the realm of mathematics, get your “What’s next for math?” questions answered, and see how mathematics can make accurate predictions possible in any related field. Visit our website and use the promo code MATHMONTH20OFF to take advantage of this special discount.

Balance is a requirement for many types of rotating machinery, such as electric motors, pumps, fans, turbines, generators, centrifugal compressors, and propellers. Many people are familiar with balancing from their car wheels. If these systems are not properly balanced, the resulting vibration causes not only reduced efficiency and component fatigue but also disturbances to the surroundings, such as noise. The most common methods for balancing rotating machinery are the influence coefficient method and the modal balancing method; car wheel balancing, for instance, is an application of the influence coefficient method.

Wolfram SystemModeler is used for modeling the rotor, and the Wolfram Language for evaluating the results. The workflow shows how powerful the combination of these two tools is.

A disc with mass *m* is mounted on a shaft with stiffness *k*. The rotor rotates with the angular velocity *Ω*. The disc has an imbalance *u*. The unit for the imbalance is kg·m.

The deflection of the shaft from its rest position will be *u*Ω²/(*k* − *m*Ω²).

A resonance occurs at Ω = √(*k*/*m*). To eliminate the vibration, all you have to do is mount an equal imbalance opposite the existing one. However, in reality it is not that simple. There may be more than one disc. It is often not possible to put the balancing weight at an arbitrary position. You only have certain axial positions, called balancing planes, to work with. In a generator or gas turbine, for instance, you cannot open up the system and mount weights inside a closed compartment. You most likely need to put the balancing weights close to the bearings. A more realistic example is shown in the first film. It consists of a shaft that carries a flywheel and a gear, along with two smaller discs for balancing. Both the discs and the flywheel have an imbalance. Resonance occurs close to 25 seconds into the film. (Note that the deflections have been scaled 10x.)

In film one, it is not possible to place an equal imbalance opposite the existing one in order to balance the rotor. There are two reasons for this. The first is that the imbalance is not known. The second is that it is not possible to put extra masses on the disc and flywheel. This is a very common situation in real applications. Most parts of a rotor are typically not reachable after mounting. Instead, the balancing engineer needs to work with balancing planes that are normally close to the bearings.

In the example, both the discs and flywheel have an imbalance. This is illustrated with a mass on the outer diameter and denoted as *v*_{1} and *v*_{2} in the figure below. Neither the size nor positions of them are known. There are also two balancing planes, *u*_{1} and *u*_{2}. With two balancing planes, it is possible to correct both a static and dynamic imbalance:

Now, the influence coefficient method is rather straightforward. The vibrations are measured in two different locations, *v*_{1} and *v*_{2}. In this example we measure directly on the discs, but it is more realistic to measure on the bearings. The aim here is to show how the deflections of the discs can be reduced. The vibrations can be measured with displacements, velocities, or acceleration. For the basic principle, it doesn’t matter which one of the measuring methods is used, but in reality the accuracy of the results is dependent on the measuring method. For higher frequencies, measuring acceleration is preferred; for lower frequencies, velocities or even displacements may be a better choice.

The imbalances *u*_{1} and *u*_{2} are known weights mounted on a measured position (radius and angle).

Both *u* and *v* are consequently complex variables, which means that they describe both amplitude and phase. Assuming that the system is linear, the vibration of the rotor can then be described as:

*v*_{1} = *r*_{11} *u*_{1}+ *r*_{12} *u*_{2} + *v*_{10}

*v*_{2} = *r*_{21} *u*_{1}+ *r*_{22} *u*_{2} + *v*_{20}

where *v*_{10} and *v*_{20} are the initial vibrations. In matrix form:

Or:

**v** = **R** **u** + **v**_{0},

where **R** is the receptance of the system. The aim of the balancing is, of course, to eliminate or at least minimize the vibrations *v*_{1} and *v*_{2}.

The procedure is as follows.

1) Run the system without the balancing weights, i.e. *u*_{1} = *u*_{2} = 0. The measurement will give *v*_{10} and *v*_{20}.

2) Apply a test weight at one of the balancing planes—for instance, *u*_{1 t}. The size and direction don’t matter. The measurement now gives *v*_{11} and *v*_{21}:

*v*_{11} = *r*_{11} *u*_{1 t} + *v*_{10}

*v*_{21} = *r*_{21} *u*_{1 t} + *v*_{20},

which gives *r*_{11} = (*v*_{11} − *v*_{10})/*u*_{1 t} and *r*_{21} = (*v*_{21} − *v*_{20})/*u*_{1 t}.

3) Remove *u*_{1 t} and apply a new test weight *u*_{2 t} at the second balancing plane. The measurements now give *v*_{12} and *v*_{22}, from which *r*_{12} and *r*_{22} are obtained in the same way.

4) We now know **R** and **v**_{0}, and can solve for the balancing weights that make **v** = 0:

**u** = −**R**^{−1} **v**_{0}

Apply these balancing weights, and the vibration will be zero.
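The four steps can be condensed into a short numerical sketch. The Python illustration below uses made-up values for the receptance matrix and the initial vibration (they are not taken from the SystemModeler model); complex numbers carry both amplitude and phase, as in the text.

```python
# Illustrative (made-up) "true" system: v = R u + v0.
R_true = [[1.0 + 0.3j, 0.2 - 0.1j],
          [0.1 + 0.2j, 0.8 - 0.4j]]
v0 = [0.5 - 0.2j, -0.3 + 0.4j]

def measure(u1, u2):
    # Simulated measurement of the linear model v = R u + v0
    return [R_true[i][0] * u1 + R_true[i][1] * u2 + v0[i] for i in (0, 1)]

# Steps 1-3: one run without weights and one run per test weight
# give the two columns of R.
u_t = 0.05 + 0.0j                      # test imbalance (arbitrary size)
v_run0 = measure(0, 0)
col1 = [(a - b) / u_t for a, b in zip(measure(u_t, 0), v_run0)]
col2 = [(a - b) / u_t for a, b in zip(measure(0, u_t), v_run0)]

# Step 4: solve R u = -v0 (2x2 inverse written out explicitly).
det = col1[0] * col2[1] - col2[0] * col1[1]
u1 = (-v_run0[0] * col2[1] + v_run0[1] * col2[0]) / det
u2 = (-v_run0[1] * col1[0] + v_run0[0] * col1[1]) / det

# With these balancing weights the residual vibration vanishes.
print(measure(u1, u2))
```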

The SystemModeler model is built up from standard components and an Euler–Bernoulli beam. The Euler–Bernoulli beam theory does not take into account shear deformation and rotational inertia effects, making it suitable for describing the behavior of long beams. This is typically the case when the length of the shaft is at least three times its diameter. For shorter beams, the Timoshenko beam theory is more accurate. The difference in this case is one or a few percent for eigenfrequencies, and less for deflections. In this application, we use 16 beam elements. We need a component for external damping close to the mass, since we have used “pinned” supports. In reality, external damping will include the bearings:

The deflection (*v*_{1} and *v*_{2}) of the flywheel (disc 1) and a gear (disc 2) during a startup from 0 to 40 Hz can be seen in the plot below. A resonance close to 25 Hz (= 1500 rpm) is noted. The high vibration could also be seen in film one:

The aim now is to reduce this vibration as much as possible with the influence coefficient method, and to do this we combine the Wolfram Language and SystemModeler.

Initialize the link between the Wolfram Language and SystemModeler. Set up the working directory and choose the correct model:

Run three different simulations for 40 seconds: the first one without any added test imbalance; the second with 2·0.05 kg·m at imbalance plane #1; and the third with 2·0.05 kg·m at imbalance plane #2. In the last two cases, the imbalance is applied at 0°:

Evaluate the phase and deflections for these simulations:

The receptance of the system can now be calculated, and after that, the optimum balancing weights and positions can be calculated:

Finally, run the model with optimal balancing weights:

The rotor used in this example is nonsymmetric in the axial direction, i.e. the masses and the shaft diameters are different. If the system is symmetric, the deflection at 30 Hz after balancing will be less than 0.1%, compared to the deflection before balancing. It is easy to check this by changing the values in the model. Due to the axial asymmetry in this example, does it matter at which speed the balancing is performed? Normally a balancing is preferably performed close to the running speed. With the Wolfram Language, it’s very easy to study this. Simply plot the balancing weights and phases during the run-up:

As can be seen, it is hard to find balancing weights when the rotor rotates close to its resonance speed. It would also be wise to wait a while before measuring once a speed is reached; in reality, you need to wait until the temperature, etc., has stabilized. But in this blog, we will skip the waiting time. The optimally balanced rotor’s reduction of the vibration amplitude for disc 1 will be:

And for disc 2 will be:

Plot the result:

From the above figure it can be noted that for this simple case, the vibration will be reduced to around 2%–4% of the original vibration during the run-up. The initial noise before five seconds can be ignored. The system has not yet stabilized, and the total deflection is very low.

Below, the difference can be seen more clearly. The two curves with the highest vibration are from before the balancing:

The main reason for balancing at the shaft’s operational running speed is that real systems are normally more complex than this one. For instance, systems with nonlinear supports, more than two bearings, or bent shafts all need to have a stable rotor and (when applicable) a stable oil temperature, and these conditions are only met at running speed. If the speed varies, a best compromise is needed. Exactly how to optimize depends on the application, but with Mathematica a statistical evaluation will be straightforward no matter what approach is chosen.

The behavior after the balancing is shown in the following video. (Note that the deflections have been scaled 10x.)

SystemModeler is a powerful tool for studying advanced problems in rotating machinery. Combined with the Wolfram Language, it gives tremendous opportunities to work with and analyze your models and results. I have shown how rotating machinery can be balanced, and how the balancing speed affects the results. The model can easily be extended to encompass everything from nonlinearities to stochastic sensor noise.

To learn more about what affects the balancing results, I recommend playing around with a model like this one. For instance, will the vibration reduce even further if the “balanced rotor” is balanced once again? If we had used the balancing weights from, say, 5 Hz, what would the vibration at 40 Hz become? How does signal noise affect the results? There are many more or less intelligent balancing methods besides the influence coefficient method and modal balancing method. Try one of those and learn how it works.

Download this post as a Computable Document Format (CDF) file.

There is a vast amount of literature on the appearance of the golden ratio in nature, in physiology and psychology, and in human artifacts (see this page on the golden ratio; these articles on the golden ratio in art, in nature, and in the human body; and this paper on the structure of the creative process in science and art). In the past thirty years, there has been increasing skepticism about the prevalence of the golden ratio in these domains. Earlier studies have been revisited or redone. See, for example, Foutakis, Markowsky on Greek temples, Foster et al., Holland, Benjafield, and Svobodova et al. for human physiology.

In my last blog, I analyzed the aspect ratios of more than one million old and new paintings. Based on psychological experiments from the second half of the nineteenth century, especially by Fechner in the 1870s, one would expect many paintings to have a height-to-width ratio equal to the golden ratio or its inverse. But the large sets of paintings analyzed did not confirm such a conjecture.

While we did not find the expected prevalence of the golden ratio in external measurements of paintings, maybe looking “inside” will show signs of the golden ratio (or its inverse)?

In today’s blog, we will analyze collections of paintings, photographs, and magazine covers that feature human faces. We will also analyze where human faces appear in a few selected movies.

The literature on art history and the aesthetics of photography puts forward a theory of dividing the canvas into thirds, horizontally and vertically. And when human faces are portrayed, two concrete rules for the position of the eyeline are often mentioned:

- the rule of thirds: the eyeline should be 2/3 (≈0.67) from the bottom
- the golden ratio rule: the eyeline should be at 1/(*golden ratio*) (≈0.62) from the bottom

The rule of thirds is often abbreviated as ROT. In 1998 Frascari and Ghirardini—in the spirit of Adolf Zeising, the father of the so-called golden numberism—coined the term “ϕaithful” (making clever use of the Greek symbol ϕ that is used to denote the golden ratio) to label the unrestricted belief in the primacy of the golden ratio. Some consider the rule of thirds an approximation of the golden ratio rule; “ROT on steroids” and similar phrases are used. Various photograph-related websites contain a lot of discussion about the relation of these two rules. For early uses of the rule of thirds, see Nafisi. For the more modern use starting in the eighteenth century, see this history of the rule of thirds. For a recent human-judgment-based evaluation of the rule of thirds in paintings and photographs, see Amirshahi et al.

Because we cannot determine which rule is more common from first principles, let’s again look at some data. At what height, measured from the bottom, are the eyes in paintings showing human faces?

Let’s start with paintings. As with the previous blog, we will use a few different data sources. We will look at four painting collections: Wikimedia, the Smithsonian, Britain’s Your Paintings, and Saatchi.

If we want to analyze the positions of faces within a painting, we must first locate the faces. The function `FindFaces` comes in handy. While typically used for photographs, it works pretty well on (representational) paintings too. Here are a few randomly selected paintings of people from Wikimedia. First, the images are imported and the faces located and highlighted by a yellow, translucent rectangle. We see potentially different amounts of horizontal space around a face, but the vertical extension is pretty uniform from the chin to the bottom of the forehead hairs.

A more detailed look reveals that the eyeline is approximately at 60% of the height of the selected face area. (Note that this is approximately 1/ϕ). To demonstrate the correctness of the 60%-of-the-face-height rule for some randomly selected images from Wikipedia, we show the resulting eyeline in red and the two lines ±5% above and below.

Independent of gender and haircut, the 60% height seems to be a good approximation for the eyeline. Of course, not all faces that we encounter in paintings and photographs are perfectly straightened. For tilting heads, we note both eyes will not be on a horizontal line. But as an average, the 60% rule works well.

Overall, we see that the eyeline can be located to within a few percent of the vertical height of the face rectangle. For a typical ratio of face height to painting/photograph height, the resulting error in the estimated eyeline height should be ≤2% in most collections. Plus or minus 2% is small enough that, for a large enough painting/photograph collection, we can discriminate the golden ratio height 1/ϕ from the rule-of-thirds height 2/3: on the range [0,1], the distance between 1/ϕ and 2/3 is about 5%. (We leave using a specialized eye-detection method to determine the vertical position of the eyes for a later blog.)
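In code, the 60% rule and the gap between the two competing rules look like this. This is a Python sketch for illustration; face boxes are assumed to be given as a top offset and height in pixel coordinates (an assumption of this sketch, not the notebook's actual `FindFaces` output format).

```python
def eyeline_height(face_top, face_height, image_height):
    """Estimated eyeline position, measured from the image bottom as a
    fraction of the image height. The eyeline sits at roughly 60% of
    the face rectangle's height (the empirical rule from the text);
    coordinates run top-down, as in pixel coordinates."""
    eyeline_y = face_top + 0.4 * face_height   # 60% up = 40% down from the top
    return (image_height - eyeline_y) / image_height

# A face box starting 100 px from the top, 200 px tall, in a 600 px image:
print(round(eyeline_height(100, 200, 600), 3))   # 0.7

# The two competing rules differ by roughly 5% of the image height:
phi = (1 + 5 ** 0.5) / 2
print(round(2 / 3 - 1 / phi, 4))                 # 0.0486
```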

We start with images of paintings from Wikimedia.

Using the 0.6 factor for the eyeline heights, we get the following distribution of the faces identified. About 12,000 faces were found in 8,000 images. The blue curve shows the probability density of the position of the eyelines of all faces, and the red curve the faces whose bounding rectangles occupy more than 1/12 of the total area of the painting. (While somewhat arbitrary, here and in the following, we will use 1/12 as the relative face rectangle area, above which a face will be considered to be a larger part of the whole image.) We see a clear single maximum at 2/3 from the bottom, as predicted by the ROT. (The two black vertical lines are at 2/3 and 1/ϕ).

Because we determine the faces from potentially cropped images rather than ruler-based measurements on the actual paintings, we get some potential errors in our data. As analyzed in the last blog, these effects seem to average out and introduce final errors well under 1% for over 10,000 paintings.

Here are two heat maps: one for all faces, and the other for larger faces only. We place face-enclosing rectangles over each other, and the color indicates the fraction of all faces at a given position. One sees that human faces appear as frequently in the left half as in the right half. To allow comparisons of the face positions of paintings with different aspect ratios, the widths and heights of all paintings were rescaled to fit into a square. The centers of the faces fall nicely into the [2/3,1/ϕ] range. (The Wolfram Language code to generate the PDF and heat map plots is given below.)

Here is a short animation showing how the peak of the face distributions forms as more and more paintings are laid over each other.

Repeating the Wikimedia analysis with 4,000 portrait paintings from the portrait collection of the Smithsonian yields a similar result. This time, because we selected portrait paintings from the very beginning, the blue curve already shows a more localized peak.

The British Your Paintings website has a much larger collection of paintings. We find 58,000 paintings with a total of 76,000 faces.

The mean and standard deviation for all eyeline heights are 0.64±0.19, and the median is 0.69.

In the eyeline position/relative face size plane, we obtain the following distribution showing that larger faces are, on average, positioned lower. Even for very small relative face sizes, the most common eyeline height is between 1/ϕ and 2/3.

The last image also begs for a plot of the PDF of the relative size of the faces in a painting. The mean area of a face rectangle is 3.9% of the whole painting area, with a standard deviation of 5.5%.

Here is the corresponding cumulative distribution of all eyeline positions of faces larger than a given relative size. The two planes in the *yz* plane are at 1/ϕ and 2/3.

Did the fraction of paintings obeying the ROT or the ϕ rule change over time? Looking at the data, the answer is no. For instance, here is the distribution of the eyeline heights for all nineteenth- and twentieth-century paintings from our dataset. (There are some claims that even Stone Age paintings already took the ROT into account.)

As paintings often contain more than one person, we repeat the analysis with the paintings that just have a single face. Now we see a broader maximum that spans the range from 1/ϕ to 2/3.

Looking at the binned rather than the smoothed data in the range of the global maximum, we see two well-resolved maxima: one according to the ROT and one according to the golden ratio.

Now that we have gone through all the work to locate the faces, we might as well do something with them. For instance, we could superimpose them. And as a result, here is the average face from 11,000 large faces from nineteenth-century British paintings. The superimposed images of tens of thousands of faces also gives us some confidence in the robustness and quality of the face extraction process.

Given a face from a nineteenth-century painting, which (famous) living person looks similar? Using `Classify`["`NotablePerson`",…], we can quickly find some unexpected facial similarities of living celebrities to people shown in older British paintings. The function `findSimilarNotablePerson` takes as the argument the abbreviated URL of a page from the Your Paintings website, imports the painting, extracts the face, and then finds the most similar notable person from the built-in database.

Here is a Demonstration that shows a few more similar pairs (please see the attached notebook to look through the different pairings).

Now let us look at some more modern paintings. We find 15,000 modern portraits at Saatchi. Faces in modern portraits can look quite abstract, but `FindFaces` still is able to locate a fair number of them. Here are some concrete examples.

And here is an array of 144 randomly selected faces in modern art paintings. From a distance, one recognizes human faces, but deviations due to stylistic differences become less visible.

If we again superimpose all faces, we get a quite normal-looking human face. Compared to the nineteenth-century British paintings, the average face has more feminine characteristics (e.g. a softer jawline and fuller lips). The fact that the average face looks quite “normal” is surprising given the 12×12 matrix of faces above.

If we add not just all color values but also random positive and negative weights, we get much more modern-art-like average faces.

Now concerning the main question of this blog: what are the face positions in these modern portraits? It turns out that they again follow the golden ratio much more frequently than the ROT. About 30% more paintings have the eyeline at 1/ϕ±1% compared to 2/3±1%.
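A comparison of this kind reduces to counting eyeline heights near each target value. Here is a minimal Python sketch; the sample list below is made up for illustration and is not data from the Saatchi collection.

```python
def fraction_near(heights, target, tol=0.01):
    """Fraction of eyeline heights within +/- tol of a target value,
    mirroring the +/-1% comparison described in the text."""
    hits = sum(1 for h in heights if abs(h - target) <= tol)
    return hits / len(heights)

# Illustrative (made-up) sample: most eyelines near 1/phi, some near 2/3.
phi_inv = 2 / (1 + 5 ** 0.5)
sample = [0.615, 0.62, 0.625, 0.618, 0.66, 0.67, 0.70, 0.55]
print(fraction_near(sample, phi_inv))   # 0.5
print(fraction_near(sample, 2 / 3))     # 0.25
```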

The mean and standard deviation for all eyeline heights are 0.60±0.16, and the median is 0.62. This distribution is clearly lower-centered and narrower.

And if we plot the PDF of the eyeline height versus the relative face size, we clearly see a sweet spot at eyeline height 2/3 and relative face area 1/5. Smaller faces with relative size of about 5% occur higher, at eyeline height about 3/4.

And here is again the corresponding 3D graphic, which shows that the 1/ϕ eyeline height for larger relative faces is quite pronounced.

We should check with another data source to confirm that more modern paintings have a more ϕaithful eyeline. The site Fine Art America offers thousands of modern paintings of celebrities. Here is the average of 5,000 such celebrity paintings (equal amounts politicians, actors and actresses, musicians, and athletes). Again we clearly see the maximum of the PDF at 1/ϕ rather than at 2/3.

For individual celebrities, the distribution might be different. Here is a small piece of code that uses some functions defined in the last section to analyze portrait paintings of individual persons.

Here are some examples. (We used about 150 paintings per person.)

Perhaps unexpectedly, Jimi Hendrix is nearly perfectly ϕaithful, while Mick Jagger seems perfectly ROTen. Obama and Jesus obey nearly exactly the rule of thirds in its classic form.

Now, for comparison to the eyeline positions in paintings, let us look at some sets of photographs and determine the positions of the faces in these. Let’s start with professional portrait photographs. The Getty Image collection is a premier collection of good photographs. In contrast to the paintings, the maximum for large faces is much closer to 2/3 (ROT) than to 1/ϕ for a random selection of 200,000 portrait photographs.

And here is again the distribution in the eyeline height/relative face size plane. For very large relative face sizes, the most common eyeline height even drops below 1/ϕ.

And here is the corresponding heat map arising from overlaying 300,000 head rectangles.

So what about other photographs, those aesthetically less perfect than Getty Images? The Shutterstock website has many photos. Selecting photos with subjects of various tags, we quite robustly (meaning independent of the concrete tags) see the maximum of the eyeline height PDF near 2/3. This time, we display the results for portraits showing groups of identically tagged people.

These are the eyeline height distributions and the average faces of 100,000 male and female portraits. (The relatively narrow peak in the twin-peak structure of the distribution between 0.5 and 0.55 comes from photos that are close-up headshots that don’t show the entire face.)

Restricting the photograph selection even more, e.g. to over 10,000 photographs of people tagged with *nerd* or *beard*, again shows ROTen-ness.

The next two rows show photos tagged with *happy* or *sad*.

All of the last six tag types (male, female, nerd, beard, happy, sad) of photographs show a remarkable robustness of the position of the eyeline maximum. It is always in the interval [1/ϕ,2/3], with a trend toward 2/3 (ROT).

But where are the babies (the baby eyeline, to be precise)? The two peaks are now even more pronounced, with the first peak even bigger than the second—the reason being that many more baby pictures are just close-ups of the baby’s whole face.

Next we’ll have a look at the eyeline height PDFs for two professional photographers: Peggy Sirota and Mario Testino. Because both artists often photograph models, the whole human body will be in the photograph, which shifts the eyeline height well above 2/3. (We will come back to this phenomenon later.)

After looking at professionally made photos, we should, of course, also have a look at the pinnacle of modern amateur portraiture: the selfie. (For a nice summary of the history of the selfie, see Saltz; for a detailed study of the nearly three-orders-of-magnitude increase in selfie popularity over the last three years, see Souza et al.) Using some of the service connections, e.g. the “`Flickr`” connection, we can immediately download a sample of selfies. Here are five selfies from the last week of September taken around the Eiffel Tower. Not all images tagged as “selfies” are just close-ups of faces.
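A sketch of the retrieval step via the service framework. The request name `"ImageSearch"` and its parameter names follow the documented Flickr connection, but treat the exact names as assumptions; the call also requires Flickr API authentication.

```mathematica
(* connect to Flickr (prompts for authentication on first use) *)
flickr = ServiceConnect["Flickr"];

(* sketch: fetch a few images tagged "selfie" near the Eiffel Tower;
   parameter names are assumptions based on the documented connection *)
selfies = ServiceExecute[flickr, "ImageSearch",
   {"Keywords" -> "selfie",
    "Location" -> GeoPosition[{48.8584, 2.2945}],
    "Elements" -> "Images", MaxItems -> 5}];
```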

Every day, more than 100,000 selfies are added to Instagram (one can easily browse them here)—this is a perfect source for selfies. Here are the eyeline height distributions for 100,000 selfie thumbnails.

Compared with the professional photographs, we see that the maximum of the eyeline height distributions is clearly above 2/3 for photos that contain a face larger than 1/12 of the total photo. So the next time you take a selfie, position your face a bit lower in the picture to better obey the ROT and ϕ. (Systematic deviations of selfies from established photographic aesthetic principles have already been observed by Bruno et al.)

The eyeline height in a selfie changes much less with the total face area as compared to professional photographs.

And again, the corresponding heat map.

Not unexpectedly, the camera-to-face distance in a selfie is bounded by about one meter, the length of a human arm or a typical telescopic selfie stick, so faces in selfies cannot become arbitrarily small. As a result, selfies with very small faces are scarcer than photographs or paintings with small faces.

What does the average selfie face look like? The left image is the average over all faces, the middle image the average over all male faces, and the right image the average over all female faces. (Genders were heuristically determined by matching user names against the genders associated with given names.) The average selfie looks female because a larger number of selfies are of female faces, a fact also found in the recent study by Manovich et al.
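The gender heuristic just described can be sketched as follows, with a tiny stand-in table; the actual analysis would use a large name-frequency dataset, and substring matching is deliberately crude (e.g. "liam" also matches inside "william").

```mathematica
(* stand-in name->gender table; a real analysis would load a large
   name-frequency dataset instead *)
nameGender = <|"emma" -> "Female", "olivia" -> "Female",
   "liam" -> "Male", "noah" -> "Male"|>;

(* heuristic: does the user name contain a known given name? *)
guessGender[userName_String] :=
  Module[{hits},
   hits = Select[Keys[nameGender],
     StringContainsQ[ToLowerCase[userName], #] &];
   If[hits === {}, Missing["Unknown"], nameGender[First[hits]]]];

guessGender["emma_paris_92"]  (* -> "Female" *)
```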

Now, it could be that the relative height of the eyeline depends on the concrete person portrayed. We give the full code in case the reader wants to experiment with people not investigated here. We measure eyeline heights in images from the Getty website tagged with the keywords specified in the function `positionSummary`.
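A hedged reconstruction of what `positionSummary` might look like (not the original code): `getImages` is a hypothetical retrieval helper, the eyeline is placed 2/3 up each face box as an approximation, and face crops are converted to RGB before averaging so the pixel arrays are compatible.

```mathematica
(* hedged reconstruction of positionSummary: returns the eyeline height
   PDF and the average face for images matching the given keywords *)
positionSummary[keywords_String, n_: 600] :=
  Module[{imgs, data},
   imgs = getImages[keywords, n];  (* hypothetical retrieval helper *)
   data = Flatten[Function[img,
       With[{h = Last[ImageDimensions[img]]},
        Cases[FindFaces[img],
         Rectangle[{xmin_, ymin_}, {xmax_, ymax_}] :>
          {(ymin + 2/3 (ymax - ymin))/h,
           ImageResize[ColorConvert[
             ImageTrim[img, {{xmin, ymin}, {xmax, ymax}}], "RGB"],
            {128, 128}]}]]] /@ imgs, 1];
   {Histogram[data[[All, 1]], {0.05}, "PDF"],
    Image[Mean[ImageData /@ data[[All, 2]]]]}];
```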

Now it takes just a minute to get the average eyeline height of people seen in the news, each based on analyzing 600 portrait shots of Lady Gaga, Taylor Swift, Brad Pitt, and Donald Trump. Lady Gaga’s eyeline is, on average, clearly higher, quite similar to typical selfie positions. On the other hand, Taylor Swift’s eyeline is peaked at the modern painting-like maximum at 1/ϕ.

Many more types of photographs could be analyzed. But we end here and leave further exploration and more playtime to the reader.

Many LinkedIn profile pages have photographs of the page owners. These photographs are another data source for our eyeline height investigations. Taking 25,000 male and 25,000 female profile photos, we obtain the following results. Because the vast majority of LinkedIn photographs are close-up shots, the curve for faces occupying more than 1/12 of the whole area is quite similar to the curve of all faces, and so we show only the distribution of all faces. This time, the yellow curve shows all faces that occupy between 10% and 30% of the total area.

Here are the eyeline height PDF, the bivariate PDF, and the average face for 10,000 male members from LinkedIn. Based on the frequency of male first names in the US, Bing image searches restricted to the LinkedIn domain were carried out, and the images found were collected.

Interestingly, the global maximum of the eyeline height distribution occurs clearly below 1/ϕ, the opposite effect compared to the selfies analyzed above. The center graph shows the distribution of the eyeline height as a function of the face area. The global maximum appears at a face area of 1/5 and at an eyeline height quite close to 1/ϕ. This means the low global maximum is mostly caused by photographs whose face rectangles occupy more than 30% of the total area. The most typical LinkedIn photograph has a face rectangle covering 1/5 of the total area, with the eyeline height at 1/ϕ.

The corresponding distribution over all female US first names is quite similar to the corresponding curve for males. But for faces that occupy a larger fraction of the image, the female distribution is visibly different. The average eyeline height of these photos of women on LinkedIn is a few percent smaller than the corresponding male curve.

With the large number of members on LinkedIn, it even becomes feasible to look at the eyeline height distribution for individual names. We carry out facial profiling for three names: Josh, Raj, and Mei. Taking 2,500 photos for each name, we obtain the following distributions and average faces.

The distributions agree quite well with the corresponding gender distributions above.

After observing the remarkable peak of the eyeline height PDF at 1/ϕ, I wondered which of my Wolfram Research or Wolfram|Alpha coworkers obey the ϕaithful rule. And indeed, I found that more of my male coworkers than female coworkers have the 1/ϕ height. Not unexpectedly, our design director is among the ϕaithful. The next input imports photos from the LinkedIn pages of other Wolfram employees and draws a red line at height 1/ϕ.

Let us compare the peak distribution with the one from the current members of Congress. We import photos of all members of Congress.

Here are some example photos.

Similar to the LinkedIn profile photos, the maximum of the eyeline PDF is slightly lower than 2/3. We also show the face of the averaged member of Congress.

After having analyzed the face positions in amateur and professional photographs, a natural next area for exploration is magazine covers: their photographs are carefully made, selected, and placed. *TIME* magazine maintains a special website for its 4,800 covers, spanning over ninety years of published issues. (For a quick view of all covers, see Manovich’s cover analysis from a few years ago.)

It is straightforward to download the covers, and then find and extract the faces.
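A sketch of the download-and-extract step, assuming `coverURLs` is a list of cover-image URLs already scraped from the covers site (the scraping step, which depends on the site's structure, is not reproduced here); the eyeline is again placed 2/3 up the face box as an approximation.

```mathematica
(* sketch: import each cover and collect the relative eyeline heights
   of all detected faces; failed downloads are skipped *)
coverEyelines[coverURLs_List] :=
  Flatten[Function[url,
     Module[{img = Quiet[Import[url]], h},
      If[ImageQ[img],
       h = Last[ImageDimensions[img]];
       Cases[FindFaces[img],
        Rectangle[{_, ymin_}, {_, ymax_}] :> (ymin + 2/3 (ymax - ymin))/h],
       {}]]] /@ coverURLs];
```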

These are the two resulting distributions for the eyelines.

The maximum occurs at a height smaller than 1/2. This is mostly caused by the title “*TIME*” on top of the cover. Newer editions have partial overlaps between the magazine title and the image. The following plot shows the yearly average of the eyeline height over time. Since the 1980s, there has been a trend for higher eyeline positions on the cover.

If we calculate the PDFs of the eyeline positions of all issues from the last twenty-five years, we see quite a different distribution with a bimodal structure. One of the peaks is nearly exactly at 1/ϕ.

And here are the average faces per decade. We see also that the covers of the first two decades were in black and white.

For a second example, we will look at the German magazine *SPIEGEL*. It is again straightforward to download all the covers, locate the faces, and extract the eyelines.

Again, because of the title text “*SPIEGEL*” on top of the cover, the maximum of the PDF of the eyeline height on the cover occurs at relatively low heights (≈0.56).

A heat map of the face positions shows this clearly.

Taking into account both that the magazine title “*SPIEGEL*” is typically 13% of the cover height and that there is whitespace at the bottom, the renormalized peak of the eyeline height is nearly exactly at 1/ϕ.

For a third, not-so-politically-oriented magazine, we chose the biweekly *Rolling Stone*. They too have a collection of their covers (through 2013) online. The eyeline height distribution is again bimodal, with the largest peak at 1/ϕ. So *Rolling Stone* is a ϕaithful magazine.

By year, the average eyeline height shows some regularities within an eight-year period.

The cumulative mean of the eyeline heights is very near to 1/ϕ, and the average through 2013 deviates by only 0.4% from 1/ϕ.

To parallel the earlier two magazines, here are the averaged faces by decade.

Comic covers are another fairly large source of images to analyze. The Comic Book Database has a large collection of comic book covers. Here we restrict ourselves to Marvel Comics and DC Comics, totaling about 72,000 covers. Because comics are not photographs, recognizing faces is now a harder job. But even so, we successfully extract about 90,000 faces.

Here are our typical characterizations (eyeline height PDF, face position heat map, average face) for Marvel Comics.

And the same for DC Comics.

All three characteristics show remarkable consistency between the two comic publishers.

Many more collections of faces can now be investigated for the eyeline positions. It is straightforward to write a small crawler function that starts with a given website and extracts images and links to pages with more images. (This is just a straightforward implementation. Many optimizations, such as parallel retrieval, could be implemented to improve this function.)
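A minimal breadth-first version of such a crawler can be sketched as follows (a sketch, not a production crawler: it follows only links sharing the start URL's prefix, and a real version would add parallel retrieval, rate limiting, and robots.txt handling).

```mathematica
(* minimal breadth-first image crawler sketch *)
crawlImages[startURL_String, maxPages_Integer] :=
  Module[{queue = {startURL}, visited = {}, images = {}, url, imgs, links},
   While[queue =!= {} && Length[visited] < maxPages,
    url = First[queue]; queue = Rest[queue];
    If[! MemberQ[visited, url],
     AppendTo[visited, url];
     imgs = Quiet[Import[url, "Images"]];        (* images on this page *)
     If[ListQ[imgs], images = Join[images, imgs]];
     links = Quiet[Import[url, "Hyperlinks"]];   (* links to follow *)
     If[ListQ[links],
      queue = Join[queue, Select[links, StringStartsQ[#, startURL] &]]]]];
   (* keep only images whose smaller dimension exceeds 200 pixels *)
   Select[images, Min[ImageDimensions[#]] > 200 &]];
```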

For example, here is the resulting average data for all images (larger than 200 pixels) from *The New York Times* website from February 8, 2016. The eyeline PDF maximum is between 2/3 and 1/ϕ.

And here, from the weekly German newspaper *Die Zeit*. This time, the eyeline maximum is clearly at 2/3 for larger faces.

Here is a snapshot of 1,000 images from CNN.

The eyeline heights in fashion magazines show a totally different distribution. Here are the results of 1,000 images from *Vogue*. Because many images on the site show the stylishly dressed models from head to toe, the head is small and the eyeline very high in the images. As a result, we get the strong, narrow peak of the blue curve.

*GQ Magazine* also shows a global eyeline height peak at 2/3 for large faces.

The maximum of the eyeline in the magazine *People* is again at 2/3 for large faces.

And here are the results for *Ebony* magazine. This time, the large face eyeline height has a peak at 1/ϕ.

Using a bodybuilding magazine, as with the *Vogue* images, we see a very high eyeline, again because whole-body images are often shown. The average face looks different from the previous averages.

We obtain a softer-looking face with an eyeline maximum greater than 2/3 from *Allure* magazine.

And goths from the *Gothic Beauty* magazine are on average ROTen, but large goths are more ϕaithful.

The magazine *20/20* specializes in glasses. Not unexpectedly, the average face shows pronounced sunglasses and an eyeline height greater than 2/3.

Movie posters are a good-sized source of a wide variety of drawn and photographed images. The site Movie Posters has 35,000 posters going back to the 1920s.

More interesting is a plot of the mean over time. Before the 1980s, eyelines were more in the center of posters. Since then, the average eyeline position is more in the interval [1/ϕ,2/3].

The shift in average eyeline height in movie posters is even more clearly visible in the corresponding face heat maps.

Here is the average face from all movie posters from the last five years.

In the last blog post, we ended with plots of the evolution of the average movie aspect ratio, so this time we will also end by analyzing some movies. The Internet Archive has a collection of 20,000 movies available for download. We will look at the face positions in two well-known classics: Buster Keaton’s *The General* from 1926 and Fritz Lang’s *Metropolis* from 1927. We start with *The General*. The average of all faces (without taking size into account) is at 2/3, and the large faces clearly appear lower.

Not every frame of a movie contains faces, so it is natural to ask if the mean (windowed) eyeline height changes as the movie progresses. Here is a different kind of heat map that shows the mean eyeline height over time. The colors indicate the number of frames that contain identified faces.
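The windowed mean can be sketched as follows, under two assumptions: that frames are importable via `Import[file, {"Frames", …}]` with a `"FrameCount"` element, and that the eyeline sits about 2/3 up each `FindFaces` bounding rectangle (the same approximation used throughout).

```mathematica
(* per-frame relative eyeline heights (may be {} if no face is found) *)
frameEyelines[img_Image] :=
  With[{h = Last[ImageDimensions[img]]},
   Cases[FindFaces[img],
    Rectangle[{_, ymin_}, {_, ymax_}] :> (ymin + 2/3 (ymax - ymin))/h]];

(* sketch: sample n frames evenly across the movie and return the
   moving average of the mean eyeline height over time; frames without
   detected faces are dropped *)
movieEyelineMeans[file_String, n_: 300, window_: 10] :=
  Module[{count, frames, heights},
   count = Import[file, "FrameCount"];
   frames = Import[file, {"Frames", Round /@ Subdivide[1, count, n - 1]}];
   heights = Select[Mean /@ (frameEyelines /@ frames), NumericQ];
   MovingAverage[heights, window]];
```

Plotting the result with `ListLinePlot` gives the time profile described above; coloring by the count of face-bearing frames per window yields the heat-map variant.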

Because the main character in the film moves a lot, the heat map of the face position now has much more structure compared to the above heat maps of photographs and paintings.

Fritz Lang’s *Metropolis*, although made only one year after *The General*, was shot in quite a different style. Just by quickly zooming through the movie, one observes that the majority of faces appear at a much larger height. This impression is confirmed by the actual data about the eyeline positions.

The PDF of all eyeline positions shows that especially large faces appear high in the frames.

We compare with a modern TV series production: episode nine of season nine of *The Big Bang Theory*, “The Platonic Permutation”. Most faces appear above the 2/3 height.

But the PDF of the eyeline position of larger faces peaks very near to 2/3, and the average face shows characteristic facial features of the show’s main characters.

Or, for a very recent example, here is the PDF for episode one of Amazon’s *The Man in the High Castle*. The peak of the eyeline of larger faces is nearer to 1/ϕ than to 2/3.

We end with a third TV series example, episode eight of season six of *The Walking Dead*. For larger faces, we see a well-pronounced bimodal eyeline height distribution, with the two maxima at 1/ϕ and 2/3.

In this second part of our explorations of the golden ratio in the visual arts, we looked at the height of the eyeline of human faces and the face position. Using the function `FindFaces` and approximate rules for determining the eyeline height in faces, we computed averages of more than a million faces and eyeline heights.

The maxima of the eyeline height distributions for photographs and paintings are predominantly in the range 0.6 to 0.67. Older paintings and modern photographs have maxima near 2/3, as the rule of thirds predicts (demands). Interestingly, modern art portraits show the eyeline height PDF peak at 1/ϕ for large faces. (We used >1/12 of the total area to define “large” faces.) The peak eyeline position in selfies is about 0.7, higher than in paintings and many professional photographs. The magazine covers we analyzed, especially those of the past few decades, seem to have a peak of the eyeline position PDF at 1/ϕ. Similarly, the photos from various newspaper sites show a peak at 1/ϕ. For LinkedIn photos, clear gender differences in the eyeline height positions were found; men turned out to be more ϕaithful. And the analyzed movies show that faces, especially smaller ones, quite often appear significantly above the 2/3 height. But modern TV series show peaks at either 1/ϕ or 2/3, or even both simultaneously.

Download this post as a Computable Document Format (CDF) file.
