Analyzing Shakespeare’s Texts on the 400th Anniversary of His Death

April 21, 2016

Putting some color in Shakespeare’s tragedies with the Wolfram Language

After four hundred years, Shakespeare’s works are still very much present in our culture. He mastered the English language as no writer had before him, and he deeply understood human emotions.

Have you ever explored Shakespeare’s texts from the perspective of a data scientist? Wolfram technologies can provide you with new insights into the semantics and statistical analysis of Shakespeare’s plays and the social networks of their characters.

William Shakespeare (baptized April 26, 1564; died April 23, 1616) is considered by many to be the greatest writer of the English language. He wrote 154 sonnets, 38 plays (divided into three main groups: comedy, history, and tragedy), and 4 long narrative poems.

Shakespeare's works

I will start by creating a word cloud from one of his famous tragedies, Romeo and Juliet. You can achieve this with just a couple of lines of Wolfram Language code.

First, you need to get the text. One possibility is to import the public-domain HTML versions of the complete works of William Shakespeare from this MIT site:

Then make a word cloud from the text, deleting common stopwords like “and” and “the”:
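A minimal sketch of these two steps follows. The URL is my assumption about where the full play lives on the MIT site (the post does not reproduce the address), and importing it as plain text is one choice among several:

```wolfram
(* Assumed location of the full play on the MIT site; adjust if it differs *)
url = "http://shakespeare.mit.edu/romeo_juliet/full.html";

(* Import the page as plain text, which drops the HTML markup *)
text = Import[url, "Plaintext"];

(* Remove common English stopwords, then build the word cloud *)
WordCloud[DeleteStopwords[text]]
```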

Romeo and Juliet word cloud

As you can see, DeleteStopwords does not delete all the Elizabethan stopwords like “thou,” “thee,” “thy,” “hath,” etc. But I can delete them manually with StringDelete. And with some minor extra effort, you can improve the word cloud’s style a great deal:
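One way this manual cleanup might look is sketched below. The stopword list, the sample sentence, and the styling option are illustrative assumptions, not the exact choices behind the image that follows:

```wolfram
(* Hypothetical list of archaic stopwords; extend it as needed *)
elizabethanStopwords = {"thou", "thee", "thy", "hath"};

(* Short sample standing in for the imported play text *)
sample = "Deny thy father and refuse thy name";

(* Delete each archaic word only where it appears as a whole word *)
cleaned = StringDelete[
   sample,
   WordBoundary ~~ (Alternatives @@ elizabethanStopwords) ~~ WordBoundary,
   IgnoreCase -> True];

(* Remove the modern stopwords as before; the color scheme is for illustration only *)
WordCloud[DeleteStopwords[cleaned], ColorFunction -> "SolarColors"]
```

The WordBoundary anchors keep the pattern from deleting the same letters inside longer words.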

Improving the style of a word cloud

Now let’s analyze a tragedy more deeply. Wolfram|Alpha already offers a lot of computed data about Shakespeare’s plays. For example, if you type “Othello” as Wolfram|Alpha input, you will get the following result:

Information on Othello in Wolfram|Alpha

If you want to visualize the interactions among the characters of this tragedy via a social network, you can achieve this with ease using the Wolfram Language. As I did earlier with the word cloud, I need to first import the texts. In this case I want to work with all the acts and scenes from Othello separately:

Since I want to import and save the scenes for later use in the same notebook’s folder, I can do the following:
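The following is a sketch of that import-and-save step under stated assumptions: the per-scene URL pattern, the scene counts per act, and the file name othelloScenes.wl are all mine, and NotebookDirectory requires that the notebook has already been saved.

```wolfram
(* Assumed number of scenes in each of the five acts of Othello *)
scenesPerAct = {3, 3, 4, 3, 2};

(* All {act, scene} pairs, flattened into a single list *)
sceneIds = Flatten[
   Table[{act, scene}, {act, Length[scenesPerAct]}, {scene, scenesPerAct[[act]]}],
   1];

(* Assumed URL pattern for the per-scene pages on the MIT site *)
sceneURL[{act_, scene_}] :=
  "http://shakespeare.mit.edu/othello/othello." <> ToString[act] <> "." <>
   ToString[scene] <> ".html";

(* Import the raw HTML source so that the <b>...</b> markup is preserved *)
scenes = Import[sceneURL[#], "Source"] & /@ sceneIds;

(* Save next to the notebook; the file name is hypothetical *)
file = FileNameJoin[{NotebookDirectory[], "othelloScenes.wl"}];
Put[scenes, file];

(* In a later session, reload without hitting the network again *)
scenes = Get[file];
```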

Saving the scenes for later use in the same notebook's folder

In order to create the Graph, I first need all the character names, which will be displayed as vertices. I can gather the names by noticing that each dialog line is preceded by a character name in bold, which in HTML is written like this: <b>Name</b>. Thus it is straightforward to get an ordered list containing all character names (“speakers”) from each dialog line using StringCases:
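As a small illustration of the StringCases step, here is a sketch on a toy HTML fragment that imitates the markup described above. The fragment is made up for the example and is not the real page source:

```wolfram
(* Toy fragment imitating the <b>Name</b> markup; not the real page source *)
sceneHTML = "<b>IAGO</b> I am not what I am. <b>RODERIGO</b> What a full fortune! <b>IAGO</b> Call up her father.";

(* Take the shortest run of characters between <b> and </b> and keep only the name *)
speakers = StringCases[sceneHTML, "<b>" ~~ Shortest[name__] ~~ "</b>" :> name]
(* {"IAGO", "RODERIGO", "IAGO"} *)
```

Mapping the same pattern over every imported scene would give one ordered list of speakers per scene.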

Then, using Union and Flatten, I can obtain the names of all the characters in the tragedy of Othello:
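A sketch with toy data, assuming lines holds one list of speakers per scene (the two short lists below are invented for the example):

```wolfram
(* Toy data: one ordered list of speakers per scene *)
lines = {{"IAGO", "RODERIGO", "IAGO"}, {"OTHELLO", "IAGO", "OTHELLO"}};

(* Flatten merges the scenes; Union sorts the names and removes duplicates *)
characters = Union[Flatten[lines]]
(* {"IAGO", "OTHELLO", "RODERIGO"} *)
```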

Using Union and Flatten to obtain the character names in Othello

Once I have the vertices, I need to create the edges of the graph. In this case, an edge between two vertices represents a connection between two characters whose lines are separated by fewer than two lines within the dialog (similar to the Demonstration by Seth Chandler that analyzes the networks in Shakespeare’s plays). For that purpose, I will use SequenceCases to create all the edges, i.e. pairs of speakers whose lines are fewer than two lines apart:
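One possible reading of this rule is sketched below: two speakers are linked when their lines are adjacent or have exactly one line between them. That interpretation and the toy speaker list are my assumptions.

```wolfram
(* Toy ordered list of speakers for one scene *)
speakers = {"IAGO", "RODERIGO", "BRABANTIO", "IAGO"};

(* Speakers of adjacent lines; Overlaps -> True lets consecutive pairs share an element *)
adjacent = SequenceCases[speakers, {a_, b_} :> UndirectedEdge[a, b], Overlaps -> True];

(* Speakers whose lines have exactly one other line between them *)
skipOne = SequenceCases[speakers, {a_, _, b_} :> UndirectedEdge[a, b], Overlaps -> True];

edges = Join[adjacent, skipOne]
```

For this four-line toy scene the result has three adjacent pairs and two skip-one pairs.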

Using SequenceCases to create all the edges

Before creating the graph, I need to delete the edges that are duplicated or equivalent, like OTHELLO↔IAGO and IAGO↔OTHELLO, and the edges that connect a character to itself, such as IAGO↔IAGO:
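A sketch of one way to do this cleanup, using a toy edge list. Sorting the endpoints is my choice of technique here and may differ from what the notebook does:

```wolfram
(* Toy edge list with a reversed duplicate and a self-loop *)
edges = {
   UndirectedEdge["OTHELLO", "IAGO"],
   UndirectedEdge["IAGO", "OTHELLO"],
   UndirectedEdge["IAGO", "IAGO"],
   UndirectedEdge["IAGO", "CASSIO"]};

(* Drop edges whose two endpoints are the same character *)
noLoops = DeleteCases[edges, UndirectedEdge[a_, a_]];

(* Sort the endpoints so that a<->b and b<->a become identical, then remove duplicates *)
edgesReduced = DeleteDuplicates[Sort /@ noLoops]
(* {UndirectedEdge["IAGO", "OTHELLO"], UndirectedEdge["CASSIO", "IAGO"]} *)

Graph[edgesReduced, VertexLabels -> "Name"]
```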

Deleting duplicate edges or equivalents

Finally, you can specify the size of the vertices with the VertexSize option. For example, I want the vertices’ sizes to be proportional to the number of lines per character. I can get the number of lines per character with Counts:
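With toy data again (the speaker lists are invented), the counting step can be sketched as:

```wolfram
(* Toy data: one ordered list of speakers per scene *)
lines = {{"IAGO", "RODERIGO", "IAGO"}, {"OTHELLO", "IAGO", "OTHELLO"}};

(* Counts returns an association from each speaker to the number of lines spoken *)
lineCounts = Counts[Flatten[lines]]
(* <|"IAGO" -> 3, "RODERIGO" -> 1, "OTHELLO" -> 2|> *)
```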

Lines per character using Counts

After this, I can use a logarithmic function to rescale the vertices to a reasonable size. I will also improve the design with VertexStyle and VertexLabels.
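A reduced sketch of the rescaling idea follows. The Log[1.4 + count/max] form mirrors the snippet the blog team later shared in the comments; the toy counts, edges, and color are assumptions for illustration.

```wolfram
(* Toy line counts per character *)
lineCounts = <|"IAGO" -> 3, "RODERIGO" -> 1, "OTHELLO" -> 2|>;

(* Normalize by the largest count, shift by 1.4 so the logarithm stays positive,
   and convert the association into a list of rules for VertexSize *)
vertexSizes = Normal[Log[1.4 + lineCounts/Max[lineCounts]]];

Graph[
  {UndirectedEdge["IAGO", "RODERIGO"], UndirectedEdge["IAGO", "OTHELLO"]},
  VertexSize -> vertexSizes,
  VertexLabels -> "Name",
  VertexStyle -> {"OTHELLO" -> Orange}]
```

The logarithm compresses the gap between the most talkative characters and the minor ones, so small vertices stay visible.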

Since the code is getting more cumbersome, I will omit it and show only the result (for those interested in the details of the code, you can find them in the attached notebook). Also note that in the final result I’m excluding the vertex “All” since it is not a real character in the dialog:

Interactions among characters in Othello

So far, so good. Having the social network from a Shakespeare play written more than four hundred years ago is quite cool, but I’m still not 100% satisfied. I would like to visualize when these interactions occur within the dialog itself. One way to achieve this is by representing each main speaker with a different-colored bar:

Representing each main character with a different-colored bar

Note: linesColor is a list of colors representing the lines in one scene, and linesLength is the list of the lines’ StringLength values after applying a rescaling function. Building these lists involves some text manipulation, as I did earlier to obtain the character names from the HTML version of the play. If you wish, you can see their construction in the attached notebook.
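To give a rough idea of how such lists could be built and drawn, here is a sketch on toy data. The palette, the rescaling to the longest line, and the use of BarChart are my assumptions and not necessarily what the notebook does:

```wolfram
(* Toy scene: the speaker and the text of each line *)
speakers = {"IAGO", "RODERIGO", "IAGO"};
lineTexts = {"I am not what I am.", "What a full fortune!", "Call up her father."};

(* Hypothetical palette; speakers without an entry fall back to Gray *)
palette = <|"IAGO" -> Blue, "RODERIGO" -> Green|>;
linesColor = Lookup[palette, speakers, Gray];

(* Rescale string lengths so that the longest line has height 1 *)
lengths = StringLength /@ lineTexts;
linesLength = Rescale[lengths, {0, Max[lengths]}];

(* One bar per line, colored by speaker *)
BarChart[linesLength, ChartStyle -> linesColor, BarSpacing -> 0]
```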

Additionally, I can mark when a particular character says a particular word—for example, the word “love” (note: the variable words is the list of words per line in the scene, created with the new function TextWords; see the attached notebook for details):
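A sketch of how the lines containing a given word might be located, using toy lines in place of a real scene:

```wolfram
(* Toy lines standing in for one scene *)
sceneLines = {"I do love thee", "Perdition catch my soul", "But I do love thee"};

(* Lowercase each line and split it into words *)
words = TextWords /@ ToLowerCase[sceneLines];

(* Indices of the lines whose word list contains "love" *)
loveLines = Flatten[Position[MemberQ[#, "love"] & /@ words, True]]
(* {1, 3} *)
```

These indices can then be used to place a marker on the corresponding bars.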

Marking when a particular character says a particular word

Now I can combine all of this with the social network graph and have a colorful and compact infographic about a Shakespeare tragedy:

Dialog lines with the word love

There are so many other interesting things that I would like to explore about Shakespeare’s works and life. But I will finish with a map representing the locations at which his plays occur. I hope you got a glimpse of what is possible to achieve with the Wolfram Language. The only limit is our imagination:

Mapping the locations at which Shakespeare's plays occurred

For a few places, the Interpreter fails to find a GeoPosition, so I used Cases to obtain all the successfully interpreted locations:

Finally, I’m using GeoDisk to depict geopositions as disks with a radius proportional to the number of times each location appears in Shakespeare’s plays:
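The last two steps can be sketched as follows. The location counts are placeholders invented for the example (not the data behind the map), and the 50-kilometer scale factor is arbitrary:

```wolfram
(* Placeholder counts for illustration only; not the post's data *)
locationCounts = <|"Venice" -> 2, "Verona" -> 2, "Rome" -> 4|>;

(* Try to interpret each name as a location; failures come back as Failure objects *)
positions = Interpreter["Location"] /@ Keys[locationCounts];

(* Keep only the {position, count} pairs that were interpreted successfully *)
located = Cases[
   Transpose[{positions, Values[locationCounts]}],
   {_GeoPosition, _}];

(* Disk radius proportional to the count; the scale factor is arbitrary *)
GeoGraphics[{Red, Opacity[0.5],
  GeoDisk[#1, Quantity[50 #2, "Kilometers"]] & @@@ located}]
```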

Map of locations where Shakespeare's plays occur

Many fellow Wolfram users have expressed keen interest in Shakespeare’s corpus and come up with astonishing approaches to analyzing it on Wolfram Community. We hope this blog post will inspire you to join that collaborative effort to explore the mysteries of Shakespeare data.

Download this post as a Computable Document Format (CDF) file.

Jofre Espigulé-Pons, Machine Learning

Comments

10 comments

This is excellent work! I’ve done some analysis of Romeo and Juliet myself using Jon Bosak’s excellent Shakespeare 2.00 dataset (http://xml.coverpages.org/bosakShakespeare200.html). It has all of the Bard’s plays marked up in XML, with each line separated and the speaker identified. It makes computational analysis much easier. I’d strongly suggest anyone interested in text analysis check it out and play around with it.

Jesse Friedman

April 21, 2016 at 3:13 pm
That’s utterly charming.

Michael Stern

April 26, 2016 at 8:53 am
This is literally tech-savvy!

Paul

May 9, 2016 at 1:52 pm
Just been referred to this by the Wolfram U X-plorations webinar and after spending ages I still cannot see the code that Ruben requested and you said is in the CDF. Sorry to be a pain but can you explain exactly where or how this code can be seen? Many thanks.

Linda

August 15, 2020 at 4:44 am
- Hello Linda,
  
  It’s about 3/4ths of the way through the CDF. Evaluating line by line may help instead of evaluating the whole notebook. (You can download the file at the end of the blog post.)
  
  – Wolfram Blog Team
  
  admin
  
  August 17, 2020 at 9:22 am
Thanks team but this does not help at all. 3/4ths of the way through the CDF, it reads:

Since the code is getting more cumbersome, I will omit it and show only the result (for those interested in the details of the code, you can find them in the attached notebook).

Just as in the blog post above, there is no code visible in the CDF to evaluate line by line, which is why I asked ‘can you explain exactly where or how this code can be seen?’ What am I missing? If the ‘code’ is actually in the CDF, why can neither I nor Ruben see it? Sorry, but I am still completely mystified and hope you can help me see the code you say is there. Many thanks.

Linda

August 18, 2020 at 4:43 am
- Linda,
  
  You’re correct, the snippet of code was removed as it’s quite large. I’ve posted it below to provide clarity.
  
  vertexSizes = Normal[Log[1.4 + Counts[Flatten[lines]]/Max[Counts[Flatten[lines]]]]] /.
    {("All" -> _) -> Nothing, ("Herald" -> _) -> Nothing};
  
  Graph[(edgesReduced /. "All" \[UndirectedEdge] _ -> Nothing),
    VertexSize -> vertexSizes,
    VertexLabels -> {
      "BIANCA" -> Placed[Style["Bianca", Bold, FontSize -> 16], {2.3, -0.4}],
      "BRABANTIO" -> Placed[Style["Brabantio", Bold, FontSize -> 22], {2.1, -0.8}],
      "CASSIO" -> Placed[Style["Cassio", Bold, FontSize -> 22], {1, -0.8}],
      "Clown" -> Placed[Style["Clown", Bold, FontSize -> 16], {2.3, -0.4}],
      "DESDEMONA" -> Placed[Style["Desdemona", Bold, FontSize -> 20], {0.5, -0.5}],
      "EMILIA" -> Placed[Style["Emilia", Bold, FontSize -> 22], {2.2, -0.4}],
      "GRATIANO" -> Placed[Style["Gratiano", Bold, FontSize -> 16], {2.3, -0.6}],
      "IAGO" -> Placed[Style["Iago", Bold, FontSize -> 22], {1.7, 0}],
      "LODOVICO" -> Placed[Style["Lodovico", Bold, FontSize -> 22], {1.2, -0.9}],
      "MONTANO" -> Placed[Style["Montano", Bold, FontSize -> 16], {1.6, -0.6}],
      "OTHELLO" -> Placed[Style["Othello", Bold, FontSize -> 28], {0.6, 1.5}],
      "RODERIGO" -> Placed[Style["Roderigo", Bold, FontSize -> 22], {.2, -1}],
      "DUKE OF VENICE" -> Placed[Style["Duke of Venice", Bold, FontSize -> 16], Above],
      "Fourth Gentleman" -> Placed[Style["Fourth Gentleman", Bold, FontSize -> 14], {-1., 1.7}]
    },
    VertexLabelStyle -> Directive[Bold, FontSize -> 14],
    VertexStyle -> {
      "OTHELLO" -> RGBColor[1, 0.84, 0, 0.75],
      "BRABANTIO" -> RGBColor[0.79, 0.38, 0, 0.63],
      "DESDEMONA" -> RGBColor[0.73, 0.09, 0.89, 0.65],
      "LODOVICO" -> RGBColor[0.28026441037696703`, 0.715, 0.62, 0.88],
      "IAGO" -> RGBColor[0.363898, 0.71, 0.91, 0.85],
      "RODERIGO" -> RGBColor[0.571589, 0.79, 0., 0.71],
      "CASSIO" -> RGBColor[0.14, 0.15, 0.81, 0.65],
      "EMILIA" -> RGBColor[1., 0.29, 0.76, 0.68]
    },
    EdgeStyle -> Gray,
    GraphStyle -> "BasicGray",
    ImageSize -> 1200]
  
  – Wolfram Blog Team
  
  admin
  
  August 18, 2020 at 11:19 am
Lovely, thanks, it works

Linda

August 18, 2020 at 5:23 pm
Hi Ruben. Sorry to hear that you’re having difficulties. The code actually is in the CDF file available for download at the end of the blog post. If there is something specific that you are looking for in addition to that code, please let me know. I don’t have any additional code, though.

Wolfram Blog

May 30, 2018 at 1:22 pm
