Wolfram Computation Meets Knowledge

Tracking a Descent to Savagery with the Wolfram Language: Plotting Sentiment Analysis in Lord of the Flies

Computation is no longer the preserve of science and engineering, so I thought I would share a simple computational literary analysis that I did with my daughter.

Shell Lord of the Flies

Hannah’s favorite book is Lord of the Flies by William Golding, and as part of a project she was doing, she wanted to find some quantitative information to support her critique.

Spoiler alert: for those who don’t know it, the book tells the story of a group of schoolboys shipwrecked on an island. Written as a reaction to The Coral Island, an optimistic and uplifting book with a similar initial premise, Lord of the Flies instead relates the boys’ descent into savagery once they are separated from societal influence.

The principle data that Hannah asked for was a timeline of the appearance of the characters. This is a pretty straightforward bit of counting. For a given character name, I can search the text for the positions it appears in, and while I am at it, label the data with Legended so that it looks nicer when plotted.

nameData[name_, label_] :=    Legended[First /@ StringPosition[$lotf, name, IgnoreCase -> True]/    StringLength[$lotf], label]; nameData[name_] := nameData[name, name];

The variable $lotf contains the text of the book (there is some discussion later about how to get that). By dividing the string position by the length of the book, I am rescaling the range to 0–1 to make alignment with later work easier. Now I simply create a Histogram of the data. I used a SmoothHistogram, as it looks nicer. The smoothing parameter of 0.06 is rather subjective, but gave this rather pleasingly smooth overview without squashing all the details.
characters = SmoothHistogram[{    nameData["Ralph"],    nameData["Jack"],    nameData["Beast", "The Beast"]}, 0.06, PlotStyle -> Thickness[0.01],   PlotRange -> {{0, 1}, All}, PlotLabel -> "Character appearances",   Ticks -> None]

Already we can see some of the narrative arc of the book. The protagonist, Ralph, makes an early appearance, closely followed by the antagonist, Jack. The nonexistent Beast appears as a minor character early in the book as the boys explore the island before becoming a major feature in the middle of the book, preceding Jack’s rise as Jack exploits fear of the Beast to take power. Ralph becomes significant again toward the end as the conflict between he and Jack reaches its peak.

But most of Hannah’s critique was about meaning, not plot, so we started talking about the tone of the book. To quantify this, we can use a simple machine learning classifier on the sentences and then do basic statistics on the result.

By breaking the text into sentences and then using the built-in sentiment analyzer, we can hunt out the sentence most likely to be a positive one.

MaximalBy[TextSentences[$lotf],   Classify["Sentiment", #, {"Probability", "Positive"}] &]

The classifier returns only "Positive", "Negative" and "Neutral" classes, so if we map those to numbers we can take a moving average to produce an average sentiment vector with a window of 500 sentences.

sentimentData =    Legended[MeanFilter[      ReplaceAll[       Classify["Sentiment", TextSentences[$lotf]] , {"Positive" -> 1,         "Negative" -> -1, "Neutral" | Indeterminate -> 0}], 500]*20,     "Sentiment"];

Putting that together with the character occurrences allows some interesting insights.

Lord of the Flies high-res

We can see that there is an early negative tone as the boys are shipwrecked, which quickly becomes positive as they explore the island and their newfound freedom. The tone becomes more neutral as concerns rise about the Beast, and turn negative as Jack rises to power. There is a brief period of positivity as Ralph returns to prominence before the book dives into bleak territory as everything goes bad (especially for Piggy).

Digital humanities is a growing field, and I think the relative ease with which the Wolfram Language can be applied to text analysis and other kinds of data science should help computation make useful contributions to many fields that were once considered entirely subjective.

Appendix: Notes on Data Preparation

Because I wanted to avoid needing to OCR my hard copy of the book, I used a digital copy. However, the data was corrupted by some page headers, artifacts of navigation hyperlinks and an index page. So, here is some rather dirty string pattern work to strip out those extraneous words and numbers to produce a clean string containing only narrative:

$lotf = StringReplace[    StringDelete[     StringReplace[      StringDrop[       StringDelete[        Import["http://<redacted>.txt"], {"Page " ~~ Shortest[___] ~~           " of 290\n\nGo Back\n\nFull Screen\n\nClose\n\nQuit",                           "Home Page\n\nTitle Page\n\nContents\n\n!!\n\n\"\"\n\n!\n\n\ \""}], 425],       "\n" -> " "], {"Home Page  " ~~ Shortest[___] ~~        "  Title Page  Contents", "!!  \"\"  !  \"    ", "\f"}],     "  " -> " "];

Appendix 2: Text Labels

This is the code for labeling the plot with key moments in the book:

sentenceLabel[{sentence_String, label_}] :=   Block[{pos =      Once[First[FirstPosition[TextSentences[$lotf], sentence]]]},   Callout[{pos/Length[TextSentences[$lotf]], sentimentData[[1, pos]]},     label, Appearance -> "Balloon"]]

eventLabels = ListPlot[sentenceLabel /@     {      {"As they watched, a flash of fire appeared at the root of one \ wisp, and then the smoke thickened.", "The fire"},            {"Only the beast lay still, a few yards from the sea.",        "Simon dies"},      {"Samneric were savages like the rest; Piggy was dead, and the \ conch smashed to powder.", "Piggy dies"},      {"\[OpenCurlyDoubleQuote]I\[CloseCurlyQuote]m chief then.\ \[CloseCurlyDoubleQuote]", "Ralph leads"},      {"We aren\[CloseCurlyQuote]t enough to keep the fire burning.\ \[CloseCurlyDoubleQuote]", "Ralph usurped"}}];

Appendix 3: Conch Shell Word Cloud Image

It doesn’t provide much insight, but the conch word cloud at the top of the article was generated with this code:

img = Import["http://pngimg.com/uploads/conch/conch_PNG18242.png"]; ImageMultiply[  Rasterize[   WordCloud[DeleteStopwords[$lotf], AlphaChannel[img], MaxItems -> 50,     WordSelectionFunction -> (StringLength[#] >= 4 &)],    ImageSize -> ImageDimensions[img]], ImageAdjust[img, {-0.7, 1.3}]]

Reference—conch shell: Source image from pngimg.com Creative Commons 4.0 BY-NC.

Comments

Join the discussion

!Please enter your comment (at least 5 characters).

!Please enter your name.

!Please enter a valid email address.

7 comments

  1. As usual with everything written by Jon, this is excellent! Is it possible to provide a Link to the corresponding Notebook or CDF file for this analysis?

    Michael

    Reply
  2. Great that you made a plotting sentiment analysis of Lord of the flies! My nice loves to book too, bet she would be interested to see your statistics.

    Reply
  3. Rather splendid! This is so interesting. Thanks for posting and sharing. As an English prof. I am fascinated. And I hope Hannah got a good grade on her project.

    Reply
  4. Your analysis is very interesting. I really want to learn about it! Could you provide the corresponding Notebook or CDF file please?

    Reply