# Benedict Cumberbatch Can Charm Humans, but Can He Fool a Computer?

November 26, 2014 — Rita Crook, Marketing Projects Manager

The Imitation Game, a movie portraying the life of Alan Turing, was released this week, and we’ve been looking forward to it. (Turing would have celebrated his 100th birthday on Mathematica’s 23rd birthday; read our blog post.) Turing machines were one of the focal points of the movie, and in 2007 we launched a prize to determine whether the 2,3 Turing machine was universal.

So of course, Cumberbatch’s promotional video, in which he impersonates other beloved actors, reached us as well. That got me wondering: could Mathematica’s machine learning capabilities recognize his voice, or could he fool a computer too?

I personally can’t stop myself from chuckling uncontrollably while watching his impressions; however, I wanted to look beyond the entertainment factor.

So I started wondering: Is he actually good at doing these impressions? Or are we all just charmed by his persona?

Is my psyche just being fooled by the meta-language, perhaps? If we take the data of pure voices, does he actually cut the mustard in matching these?

To determine the answer 10 years ago, we would have needed to stroll the streets and play 300 people audio snippets from the James Bond movies, The Shining, and Batman, along with Cumberbatch’s impressions of them, and then survey whether those people were fooled.

But no need, if you have your Mathematica handy!

With Mathematica’s machine learning capabilities, it’s possible to classify sample voice snippets easily, which means we can determine whether Benedict’s impressions would be able to fool a computer. So I set myself the challenge of building a decent enough database of voice samples, took snippets from each of Benedict’s impression attempts, and let Mathematica do its magic.

We built a path to each person’s snippet database, which Mathematica exported for analysis.

We imported all of the real voices.
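The code for these two steps is not shown here. A minimal sketch of how they might look, assuming a hypothetical folder layout with one subfolder of WAV files per actor (the names baseDir, actors, snippetFiles, and realVoices are illustrative and not taken from the original notebook):

```wolfram
(* Assumption: snippets live in <working directory>/snippets/<actor name>/*.wav *)
baseDir = FileNameJoin[{Directory[], "snippets"}];

(* The nine real voices used for training, as listed in the post *)
actors = {"Alan Rickman", "Christopher Walken", "Jack Nicholson",
   "John Malkovich", "Michael Caine", "Owen Wilson", "Sean Connery",
   "Tom Hiddleston", "Benedict Cumberbatch"};

(* All WAV files in one actor's folder *)
snippetFiles[actor_String] :=
  FileNames["*.wav", FileNameJoin[{baseDir, actor}]];

(* Association of actor -> list of Sound objects;
   the "Sound" element asks Import for a Sound expression *)
realVoices = AssociationMap[
   Function[actor, Import[#, "Sound"] & /@ snippetFiles[actor]],
   actors];
```

Keeping the data as an association of actor name to snippets means it can be handed to a classifier without further reshaping.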

The classifier was trained simply by providing the associated real voices to Classify; in the interest of speed, a pre-trained ClassifierFunction was loaded from cfActorWDX.wdx.
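As a rough sketch of that training and loading step (the helper names trainActorClassifier, saveActorClassifier, and loadActorClassifier are hypothetical, and the input is assumed to be an association of actor name to a list of Sound snippets):

```wolfram
(* Train on data of the form <|"actor name" -> {sound1, sound2, ...}, ...|> *)
trainActorClassifier[voices_Association] := Classify[voices];

(* Save a trained classifier so it need not be retrained each time;
   WDX stores arbitrary Wolfram Language expressions *)
saveActorClassifier[cf_, file_String] := Export[file, cf, "WDX"];

(* Reload a previously saved classifier *)
loadActorClassifier[file_String] := Import[file, "WDX"];

(* Usage, assuming cfActorWDX.wdx is in the working directory:
   cfActor = loadActorClassifier["cfActorWDX.wdx"] *)
```

Training on several hundred audio snippets can take a while, which is presumably why loading a saved ClassifierFunction was preferred.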

My audio database needed to include snippets of Benedict’s own voice, snippets of the impersonated actors’ own voices, and the impressions from Cumberbatch. The sources for the training were the following: Alan Rickman, Christopher Walken, Jack Nicholson, John Malkovich, Michael Caine, Owen Wilson, Sean Connery, Tom Hiddleston, and Benedict Cumberbatch. I used a total of 560 snippets, but of course, the more data used, the more reliable the results. The snippets needed to be as “clean” as possible (no laughter, music, chatter, etc. in the background).

All snippets needed to be exactly the same length (3.00 seconds), which we ensured with a function written in the Wolfram Language.
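One way such a function might be written is sketched below. It is not the function used for the post; fixedLengthSnippets is a hypothetical name, and the sketch assumes the audio is held as a Sound containing a SampledSoundList:

```wolfram
(* Split a Sound into consecutive n-second pieces of identical length *)
fixedLengthSnippets[
   Sound[SampledSoundList[data_List, rate_], opts___], n_?NumberQ] :=
  Module[{k = Round[n rate], channels},
   (* Treat a flat sample list as a single channel *)
   channels = If[VectorQ[data], {data}, data];
   (* Partition drops the incomplete remainder at the end, so every
      piece has exactly k samples per channel *)
   Sound[SampledSoundList[#, rate], opts] & /@
    Transpose[Partition[#, k] & /@ channels]];

(* Usage on a synthetic 10-second test tone sampled at 8000 Hz *)
tone = Sound[SampledSoundList[
    Table[Sin[2 Pi 440. t], {t, 0, 10, 1/8000}], 8000]];
Length[fixedLengthSnippets[tone, 3]]
(* 3: only full 3-second pieces are kept *)
```

Cutting by sample count rather than by time stamps guarantees that every snippet has the same number of samples, so length cannot become an accidental feature.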

Some of the audio files weren’t single-channel, so during the export stage we needed to keep the number of channels from acting as an additional feature and skewing our results.
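The post does not say how the channels were reconciled. One plausible approach, shown here purely as an assumption, is to average the channels into a single one before export (toMono is a hypothetical helper name):

```wolfram
(* Mix a multi-channel Sound down to one channel by averaging
   the channels sample by sample; mono input passes through unchanged *)
toMono[Sound[SampledSoundList[data_List, rate_], opts___]] :=
  Sound[SampledSoundList[
    If[VectorQ[data], data, Mean[data]], rate], opts];

(* Usage on a tiny hand-made two-channel example *)
stereo = Sound[SampledSoundList[
    {{0.2, 0.4, 0.6}, {0.0, 0.2, 0.2}}, 8000]];
First[First[toMono[stereo]]]  (* the per-sample averages of the two channels *)
```

Applying this to every file before training would make all snippets single-channel, so the classifier sees only the voice and not the recording format.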

Thanks go to Martin Hadley and Jon McLoone for the code.

Drum-roll… time for the verdict!

I have to break everyone’s heart now, and I’m not sure I want to be the one to do it… so I will “blame” Mathematica, because machine learning could indeed mostly tell the difference between the actors’ real voices and the impressions (bar two).

As the results below reveal, Mathematica provides 97–100% confidence on the impressions tested:

For most impressions, there is a very small reported probability of any classification other than Benedict Cumberbatch or Alan Rickman.
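For readers who want to reproduce this kind of result, a minimal sketch of querying a trained classifier follows. The name impressionReport is hypothetical, and the snippet is assumed to be a Sound prepared in the same way as the training data:

```wolfram
(* Report the predicted actor and the probability assigned to each actor *)
impressionReport[cf_ClassifierFunction, snippet_] :=
  <|"Decision" -> cf[snippet],
    "Probabilities" -> cf[snippet, "Probabilities"]|>;

(* Usage, assuming cfActor is a trained ClassifierFunction and
   impressionSnippet is one 3-second Sound of an impression:
   impressionReport[cfActor, impressionSnippet] *)
```

The "Probabilities" property returns an association of class to probability, which is where confidence figures like those quoted above would come from.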

It might be worth noting that Rickman, Connery, and Wilson all have a slow rhythm to their speech, with many pauses (especially noticeable in the snippets I used), which could have confused the algorithm.

Now it’s time to be grown up about this, and not hold it against Benedict. He is still a beloved charmer, after all.

My admiration for him lives on, and I look forward to seeing him in The Imitation Game!

Download the accompanying code for this blog post as a Computable Document Format (CDF) file.


 Great post, I was definitely charmed by his persona. Posted by Jane    November 26, 2014 at 10:52 am
 now THIS is truly beyond fantastic :) Posted by Mikey    November 26, 2014 at 12:24 pm
 Fascinating work! Would love to see this done with someone who actually does/did impressions for a living. Your Rich Littles, your Frank Gorshins, your Andre-Philippe Gagnons. That would be impressive! Thanks Posted by Paul Thomson    November 26, 2014 at 1:06 pm
 Hi Paul, thank you! We haven’t done it with other professional impressionists, however we’d love to challenge our users and for them to post their results in the Wolfram Community. They would definitely interest me… Posted by Rita Crook    December 2, 2014 at 7:00 am
 Might have to go and watch the film now! This Cumberbatch fella seems decent. Posted by Riccardo    November 27, 2014 at 5:40 am
 How did you get rid of the clock ticking noise? Or did you? Could an algorithm not be set on finding that single “chime” in each sample to ID Cumberbatch? Posted by Luke Stanley    November 28, 2014 at 9:17 pm
 Hi Luke, We didn’t attempt to filter out the ticking noise or generally clean the sound files, other than making sure that the snippets I used of the impressions, were cut so the end results were as clean as possible – which I think makes the feature detection of Classify even more impressive. If we were more rigorous, Martin, who wrote most of the code itself, suggests that we could have looked into doing this http://mathematica.stackexchange.com/a/15266/1952 Posted by Rita Crook    December 2, 2014 at 7:08 am
 Hi Luke, On top of Rita’s comments, if you’re interested in using the Wolfram Language for audio processing I’d recommend looking into Mariusz’s excellent talk here: http://library.wolfram.com/infocenter/Conferences/8563/. Posted by Martin John Hadley    December 2, 2014 at 7:14 am
 Now what would be truly impressive is: If Mathematica could tell why BC has such wonderful hair in all his flicks, and his hair looks like a total dork in real life. Posted by YouRang    January 22, 2015 at 7:45 pm
 This is a fantastic use for Mathematica. I love it. And am very impressed. Posted by Jennifer    January 22, 2015 at 7:53 pm
 I am charmed by your wonderful idea to use Mathematica for such a cool set of voice recognition tests! Great idea. And excellent execution. My hat is off to YOU! Posted by David D-VA    January 23, 2015 at 10:50 am
 Resurrecting an old post, but it appears that the code for soundPartition isn’t listed in either the post or the attached CDF article. Do you mind pointing me to it? Or was soundPartition renamed to soundTake? Posted by David Koslicki    September 7, 2015 at 4:07 pm
 Only a year late…

    soundTake[Sound[SampledSoundList[data_List, rate_], opts2___], {low_?NumberQ, high_?NumberQ}] :=
     Block[{lov = Floor[low*rate + 1], hiv = Floor[high*rate + 1]},
      Sound[SampledSoundList[Take[#, {lov, hiv}] & /@ data, rate], opts2]]

    soundDuration[Sound[SampledSoundList[data_, rate_], opts___]] := Length[First[data]]/rate;

    randomSoundTake[source_Sound, n_?NumberQ] :=
     Block[{duration = soundDuration[source], start},
      start = RandomReal[{0, duration - n}];
      soundTake[source, {start, start + n}]]

    soundPartition[source_Sound, n_?NumberQ] :=
     Block[{duration = soundDuration[source]},
      Table[soundTake[source, {i, i + n}], {i, 0, n Floor[duration/n] - n, n}]]

 Posted by Martin Hadley    November 26, 2015 at 10:45 am