Benedict Cumberbatch Can Charm Humans, but Can He Fool a Computer?
The Imitation Game, a movie portraying the life of Alan Turing (who would have celebrated his 100th birthday on Mathematica’s 23rd birthday; read our blog post), was released this week, and we’ve been looking forward to it. Turing machines were one of the focal points of the movie, and we launched a prize in 2007 to determine whether the 2,3 Turing machine was universal.
So of course, Cumberbatch’s promotional video, in which he impersonates other beloved actors, reached us as well. That got me wondering: could Mathematica’s machine learning capabilities recognize his voice, or could he fool a computer too?
I personally can’t stop myself from chuckling uncontrollably while watching his impressions. However, I wanted to look beyond the entertainment factor.
So I started wondering: Is he actually good at doing these impressions? Or are we all just charmed by his persona?
Is my psyche just being fooled by the meta-language, perhaps? If we take the data of pure voices, does he actually cut the mustard in matching these?
In order to determine the answer, 10 years ago we would have needed to stroll the streets and play 300 people audio snippets from the James Bond movies, The Shining, and Batman, along with Cumberbatch’s impression snippets, and then survey whether those people were fooled.
But no need, if you have your Mathematica handy!
With Mathematica’s machine learning capabilities, it’s possible to classify sample voice snippets easily, which means we can determine whether Benedict’s impressions would be able to fool a computer. So I set myself the challenge of building a decent enough database of voice samples, took snippets from each of Benedict’s impression attempts, and let Mathematica do its magic.
We built a path to each person’s snippet database, which Mathematica exported for analysis:
We imported all of the real voices:
The classifier was trained simply by providing the associated real voices to Classify; in the interest of speed, a pre-trained ClassifierFunction was loaded from cfActorWDX.wdx:
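As a rough illustration of that workflow (not the code used for this post), here is a minimal sketch of training with Classify and then saving and reloading the result as a .wdx file. The names training, cfToy, cfLoaded, and file are hypothetical, and the two-number feature vectors are a toy stand-in for real voice snippets.

```wolfram
(* Toy stand-in: each "snippet" is a 2-number feature vector, not real audio *)
training = <|
   "Actor A" -> {{1.0, 0.2}, {1.1, 0.1}, {0.9, 0.3}, {1.2, 0.2}},
   "Actor B" -> {{3.0, 1.9}, {3.1, 2.1}, {2.9, 2.0}, {3.2, 1.8}}|>;

(* Classify accepts an association of class -> list of examples *)
cfToy = Classify[training];

(* Save the trained ClassifierFunction and load it back later, *)
(* so the training step does not have to be repeated *)
file = FileNameJoin[{$TemporaryDirectory, "cfToy.wdx"}];
Export[file, cfToy];
cfLoaded = Import[file];

(* The reloaded classifier is applied like a function *)
cfLoaded[{1.05, 0.15}]
```

With real data, the lists on the right-hand side would hold the imported snippets for each actor instead of numeric vectors.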
My audio database needed to include snippets of Benedict’s own voice, snippets of the impersonated actors’ own voices, and the impressions from Cumberbatch. The sources for the training were the following: Alan Rickman, Christopher Walken, Jack Nicholson, John Malkovich, Michael Caine, Owen Wilson, Sean Connery, Tom Hiddleston, and Benedict Cumberbatch. I used a total of 560 snippets, but of course, the more data used, the more reliable the results. The snippets needed to be as “clean” as possible (no laughter, music, chatter, etc. in the background).
The snippets all needed to be exactly the same length (3.00 seconds), which we ensured by using this function in the Wolfram Language:
Some files weren’t single-channel audio, so during the export stage we needed to remove the number of channels as a possible additional feature in order to optimize our results:
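One possible way to do this, sketched here as an assumption rather than the method actually used for the post, is to average the channels of a multi-channel Sound into a single channel. The helper name toMono and the synthetic stereo test signal are hypothetical.

```wolfram
(* Average all channels of a multi-channel Sound into one channel. *)
(* Mean of a matrix averages column-wise, i.e. across channels per sample. *)
toMono[Sound[SampledSoundList[data_?MatrixQ, rate_], opts___]] :=
  Sound[SampledSoundList[Mean[data], rate], opts]

(* Anything else (for example, a flat single-channel sample list) is left unchanged *)
toMono[s_Sound] := s

(* Synthetic one-second stereo test signal at 8000 samples per second *)
stereo = Sound[SampledSoundList[{
     Table[Sin[2 Pi 440 t], {t, 0., 1., 1/8000.}],
     Table[Sin[2 Pi 660 t], {t, 0., 1., 1/8000.}]}, 8000]];

mono = toMono[stereo];
```

Averaging keeps the sample rate and duration unchanged, so the snippets still have the same length afterward.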
Thanks go to Martin Hadley and Jon McLoone for the code.
Drum-roll… time for the verdict!
I have to break everyone’s heart now, and I’m not sure I want to be the one to do it… so I will “blame” Mathematica, because machine learning could indeed mostly tell the difference between the actors’ real voices and the impressions (bar two).
As the results below reveal, Mathematica provides 97–100% confidence on the impressions tested:
For most impressions, there is a very small reported probability of any classification other than Benedict Cumberbatch or Alan Rickman.
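For readers who want to see where such numbers come from, a ClassifierFunction can report class probabilities as well as a single answer. The sketch below uses a hypothetical toy classifier (cf) trained on plain numbers instead of voice snippets, so the values it returns have nothing to do with the results of this post.

```wolfram
(* Toy classifier on one-dimensional numeric examples (stand-in for voice data) *)
cf = Classify[<|
    "Actor A" -> {1.0, 1.2, 0.9, 1.1, 1.3},
    "Actor B" -> {3.0, 3.2, 2.9, 3.1, 2.8}|>];

sample = 1.4;

(* Most likely class *)
cf[sample]

(* Probability assigned to every class, as an association *)
cf[sample, "Probabilities"]

(* Only the two most probable classes, with their probabilities *)
cf[sample, "TopProbabilities" -> 2]
```

Applied to an impression snippet, the "Probabilities" property is what would show how much weight the classifier gives to the impersonated actor versus Cumberbatch himself.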
It might be worth noting that Rickman, Connery, and Wilson all have a slow rhythm to their speech, with many pauses (especially noticeable in the snippets I used), which could have confused the algorithm.
Now it’s time to be grown up about this, and not hold it against Benedict. He is still a beloved charmer, after all.
My admiration for him lives on, and I look forward to seeing him in The Imitation Game!
Download the accompanying code for this blog post as a Computable Document Format (CDF) file.
Great post, I was definitely charmed by his persona.
now THIS is truly beyond fantastic :)
Fascinating work!
Would love to see this done with someone who actually does/did impressions for a living. Your Rich Littles, your Frank Gorshins, your André-Philippe Gagnons.
That would be impressive!
Thanks
Might have to go and watch the film now! This Cumberbatch fella seems decent.
How did you get rid of the clock ticking noise? Or did you? Could an algorithm not be set on finding that single “chime” in each sample to ID Cumberbatch?
Now what would be truly impressive is: If Mathematica could tell why BC has such wonderful hair in all his flicks, and his hair looks like a total dork in real life.
This is a fantastic use for Mathematica. I love it. And am very impressed.
I am charmed by your wonderful idea to use Mathematica for such a cool set of voice recognition tests! Great idea. And excellent execution. My hat is off to YOU!
Resurrecting an old post, but it appears that the code for soundPartition isn’t listed in either the post or the attached CDF article. Do you mind pointing me to it? Or was soundPartition renamed to soundTake?
Only a year late…
soundTake[
Sound[SampledSoundList[data_List, rate_], opts2___], {low_?NumberQ,
high_?NumberQ}] := Block[{
lov = Floor[low*rate + 1],
(* end at Floor[high*rate] so a take that stops exactly at the end of the clip stays in range *)
hiv = Floor[high*rate]},
Sound[SampledSoundList[Take[#, {lov, hiv}] & /@ data, rate],
opts2]]
soundDuration[Sound[SampledSoundList[data_, rate_], opts___]] :=
Length[First[data]]/rate;
randomSoundTake[source_Sound, n_?NumberQ] :=
Block[{duration = soundDuration[source], start},
start = RandomReal[{0, duration - n}];
soundTake[source, {start, start + n}]]
soundPartition[source_Sound, n_?NumberQ] :=
Block[{duration = soundDuration[source]},
Table[soundTake[source, {i , i + n}], {i, 0,
n Floor[duration/n] - n, n}]]