# Musical Archaeology with Mathematica

June 23, 2009 — Jon McLoone, International Business & Strategic Development

While tidying up after my kids once again, I found myself staring at the toy shown below and thinking of a conversation that I had had with an archaeologist Mathematica user a few days before. He had been interested in image processing of aerial photographs, but it occurred to me that image processing would also allow reconstruction of the musical secrets of this precious artifact that I had just uncovered in the remains of a lost toy civilization.

Well, this should be fun for 5–10 minutes. The toy is a music box, where you crank the handle to turn the drum that has pins on it to pluck the prongs to the left. Can I discover the tune, without having to move the parts?

First, let’s crop the image so that only the pins are visible and rotate it so that it can be read from left to right, like music, rather than from right to left, which is the direction the drum turns.

There is a lot of texture, so I will blur it next, to get rid of that distracting detail.

Now I’ll apply a Sobel filter to edge-detect the blobs, but I’ll do this only in the vertical direction so as to ignore the reflected light changes caused by the curvature of the drum.

Next I will convert to black or white and erode the blobs to make them smaller, and to eliminate the various small spurious blobs that are artifacts of the texture.

Now for the only bit of programming—I need a function to identify the center of blobs by finding the average of the coordinates of all the pixels in each blob.

All that’s left is to get the blob coordinates from the cleaned image using this new function and sort them into playing order. This will generate a list of {x,y} coordinates, where the x value tells us about the time since the start of the tune, and y tells us the pitch—the essence of a musical score.

The score should form the same pattern as the image.

Now I want to convert that score into sounds so that I can hear it. Here we hit a big piece of guesswork. Observationally I can see that the prongs get longer toward the top of the image (or bottom after our rotation) and so will be lower in pitch. But how much? To make matters worse, what you can’t see from the image is that the undersides of the prongs have been milled so that they are not the same thickness—essentially they have been individually engineered to particular notes. I could photograph the undersides, or better still, 3D scan them, and do some serious analysis, but this is supposed to be around 10 minutes of fun, so I am just going to assume that frequency is inversely proportional to distance. The scaling parameters were set by trial and error to give the following function:

OK, it sounds terrible, but you can just about make out that the tune is the start of “Happy Birthday to You.” If this were a real mystery, I would be pretty pleased with that result from 10–15 lines of code.

I will leave it as an exercise to the readers with better senses of pitch to work out what that ToSound function should really be.

Posted in: Image Processing