Wolfram Blog

Computation + Literature in High School: Doctoral-Level Digital Humanities

November 20, 2018 — Brian Wood, Lead Technical Marketing Writer, Document and Media Systems

Thanks to the Wolfram Language, English teacher Peter Nilsson is empowering his students with computational methods in literature, history, geography and a range of other non-STEM fields. Working with a group of other teachers at Deerfield Academy, he developed Distant Reading: an innovative course for introducing high-level digital humanities concepts to high-school students. Throughout the course, students learn in-demand coding skills and data science techniques while also finding creative ways to apply computational thinking to real-world topics that interest them.

In this video, Nilsson describes how the built-in knowledge, broad subject coverage and intuitive coding workflow of the Wolfram Language were crucial to the success of his course:

Modernizing the Humanities with Computation

Nilsson’s ultimate goal with the course is to encourage computational exploration in his students by showing them applications relevant to their lives and interests. He notes that professionals in the humanities have increasingly turned toward computational methods for their research, but that many students entering the field lack both the coding skills and the conceptual understanding needed to get started. With the Wolfram Language, he is able to expose students to both in a way they find intuitive and easy to follow.

To introduce fundamental concepts, he shows students a pre-built Wolfram Notebook exploration of John Milton’s Areopagitica featuring a range of text analysis functions from the Wolfram Language. First he retrieves the full text from Project Gutenberg using Import:



He then demonstrates basic strategies for cleaning the text, using StringPosition and StringTake to find and eliminate anything that isn’t part of the actual work (i.e. supplementary content before and after the text):


StringPosition[textRaw, {"A SPEECH FOR", "End of the Project"}]
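
As a rough sketch of the trimming step (not the code from Nilsson’s notebook), the positions found by StringPosition can be passed to StringTake. The short string below is a made-up stand-in for the downloaded file, and the names start, stop and text are illustrative:

```wolfram
(* made-up stand-in for the raw download: header, work, footer *)
textRaw = "Header notes. A SPEECH FOR the liberty of unlicensed printing. Books are not absolutely dead things. End of the Project footer.";

(* StringPosition gives {first, last} character positions for each match *)
start = StringPosition[textRaw, "A SPEECH FOR"][[1, 1]];
stop = StringPosition[textRaw, "End of the Project"][[1, 1]] - 1;

(* keep everything from the opening marker up to just before the closing marker *)
text = StringTrim[StringTake[textRaw, {start, stop}]]
```

Looking up each marker separately keeps the sketch independent of the order in which matches are returned.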



To quickly show the difference, he makes a WordCloud of the most common words before and after the cleanup process:
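
A minimal sketch of that comparison, assuming the raw and cleaned versions are held in two strings. The strings here are short made-up stand-ins, so the clouds will be far sparser than those built from the full text:

```wolfram
(* made-up stand-ins for the raw and cleaned strings *)
textRaw = "Project header. Project license. Books are not absolutely dead things; books preserve the living intellect that bred them. Project footer.";
text = "Books are not absolutely dead things; books preserve the living intellect that bred them.";

(* WordCloud accepts a string directly and sizes each word by how often it occurs *)
WordCloud /@ {textRaw, text}
```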



From here, Nilsson demonstrates some common text analyses and visualizations used in the digital humanities, such as making a Histogram of where the word “books” occurs throughout the piece:
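
One way to build such a histogram, sketched on a made-up sample string rather than the full text, is to collect the starting position of each match and bin those positions:

```wolfram
(* made-up stand-in for the cleaned text *)
text = "Books are not dead things. Good books live on. A reader may close many books.";

(* starting character position of every occurrence of "books", ignoring case *)
positions = StringPosition[text, "books", IgnoreCase -> True][[All, 1]];

(* bin the positions to see where in the text the word clusters *)
Histogram[positions]
```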



Or computing the average number of words per sentence with WordCount and TextSentences:
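
A sketch of that calculation on a made-up two-sentence sample (the variable text is illustrative and stands in for the cleaned work):

```wolfram
(* made-up stand-in for the cleaned text *)
text = "Books are not dead things. They contain a potency of life.";

(* total word count divided by the number of sentences, as a decimal *)
N[WordCount[text]/Length[TextSentences[text]]]
```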



Or finding how many unique words are used in the piece with TextWords:
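
A sketch of the vocabulary count, again on a made-up sample. Lowercasing first is an assumption of this sketch, made so that “Books” and “books” count as one word:

```wolfram
(* made-up stand-in for the cleaned text *)
text = "Books are not dead things. Good books live on.";

(* split the lowercased text into words, then count the distinct ones *)
words = TextWords[ToLowerCase[text]];
Length[DeleteDuplicates[words]]
```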



He also discusses additional exploration outside the text itself, such as using WordFrequencyData to find the historical frequency of words (or n-grams) in typical published English text:
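
A minimal sketch of such a lookup, assuming the word of interest is “books”. The call fetches data from Wolfram’s servers, so it needs a network connection:

```wolfram
(* historical frequency of "books" in published English, as a time series *)
freq = WordFrequencyData["books", "TimeSeries"];

(* plot frequency against publication year *)
DateListPlot[freq, PlotLabel -> "books"]
```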



Building this example in a Wolfram Notebook allows Nilsson to mix live code, text, images and results in a highly structured document. And after presenting to the class, he can pass his notebook along to students to try themselves. Even students with no programming experience learn the Wolfram Language quickly, starting their own explorations after just a few days. Throughout the course, Nilsson encourages students to apply the concepts in different ways and try additional methods. “The challenge,” he says, “is getting them to think, ‘Oh, I can count this.’”

Doctoral-Level Research in a High-School Course

Once students are acquainted with the language and the methods, they start formulating research ideas. Nilsson says he is consistently impressed with the ingenuity of their projects, which span a broad range of humanities topics and datasets. For example, here is an analysis comparing phonetic distribution (phoneme counts) between two rap artists’ works:

[Image: Analysis comparing phonetic distribution]

Students take advantage of the range of visualization types in the Wolfram Language to discover patterns they wouldn’t otherwise have noticed, such as this comparison of social networks in the Bible (using Graph plots):

[Image: Comparison of social networks in the Bible]

Nilsson points out how much easier it is for students to do these high-level analyses in the digital age. “What took monks and scholars months and years to accumulate, we can now do in five minutes,” he says. He cites a classic analysis that has been recreated in his class, tracking geographic references in War and Peace with a GeoSmoothHistogram:


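(* Split the novel into its books, find country names in each, and interpret them as geographic entities *)
loc = Interpreter["Country"] /@ TextCases[Rest@StringSplit[ResourceData["War and Peace"], "BOOK"], "Country"];

(* Draw one smoothed density map per book and animate through them *)
ListAnimate[GeoSmoothHistogram[#, GeoRange -> {{-40, 80}, {-20, 120}}] & /@ loc]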

[Image: War and Peace]
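
To see what each piece of that pipeline contributes, here is a reduced sketch on a single made-up sentence instead of the whole novel. The variable names are illustrative, and the entity lookup and map both need a network connection:

```wolfram
(* a made-up sentence standing in for one book of the novel *)
sample = "The army left France, crossed Austria and marched on Russia, while Russia waited.";

(* TextCases returns the matched country names as plain strings *)
names = TextCases[sample, "Country"];

(* Interpreter turns each string into a country entity that geographic functions accept *)
countries = Interpreter["Country"][names];

(* smoothed density map of the referenced countries *)
GeoSmoothHistogram[countries]
```

Skipping the Interpreter step and passing the strings straight to GeoSmoothHistogram is what produces the "not a valid dataset" message quoted in the comments.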

When sharing his activities with colleagues in higher education, he says many have been impressed with the depth he’s able to achieve. Some have compared his students’ projects to doctoral-level work—and that’s in a one-semester high-school course. But, he says, “You don’t have to be a doctoral student to do these really interesting analyses. You just have to know how to ask a good question.”

Reflecting on and Improving Student Writing

Nilsson also has his students analyze their own writing, measuring and charting key factors over time—from simple concepts like word length and vocabulary size to more advanced properties like sentence complexity. He sees it as an opportunity for them to examine the progression of their writing, empowering them to improve and adapt over time.

Many of these exercises go beyond the realm of simple text analysis, borrowing concepts from fields like network science and matrix algebra. Fortunately, the Wolfram Language makes it easy to represent textual data in different ways. For instance, TextStructure generates structural forms based on the grammar of a natural-language text excerpt. Requesting the "ConstituentGraphs" form gives a graph of the phrase structure in each sentence:
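
A minimal sketch of that call on a made-up sentence. The form name "ConstituentGraphs" is the one assumed here, so check it against the TextStructure documentation for your version:

```wolfram
(* made-up sample sentence *)
sentence = "Books are not absolutely dead things.";

(* returns a list with one phrase-structure graph per sentence *)
graphs = TextStructure[sentence, "ConstituentGraphs"]
```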





AdjacencyMatrix gives a matrix representation of connectivity within the graph for easier visual inspection and computation:
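
As a sketch, the graph below is a small hand-built stand-in for a constituent graph (its vertex names are illustrative, not TextStructure output), which keeps the example self-contained:

```wolfram
(* hand-built stand-in for the constituent graph of "Books endure" *)
g = Graph[{"Sentence" -> "Noun Phrase", "Sentence" -> "Verb Phrase",
    "Noun Phrase" -> "Books", "Verb Phrase" -> "endure"},
   VertexLabels -> "Name"];

(* AdjacencyMatrix returns a sparse array: entry {i, j} is 1 when vertex i links to vertex j *)
m = AdjacencyMatrix[g];

(* view the same matrix as numbers and as a shaded grid *)
{MatrixForm[Normal[m]], MatrixPlot[m]}
```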



Closeness centrality is a measure of how closely connected a node is to all others in a network. Since each constituent graph represents a network of related words, sentences with a low average closeness centrality can be thought of as simpler. Applying ClosenessCentrality (and Mean) to each graph gives a base measure of how complex each sentence is:
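
A sketch of the measurement on two hand-built stand-in graphs, one for a two-word sentence and one for a longer one. The helper name meanCloseness and the choice to treat edges as undirected are assumptions of this sketch, not details confirmed by the course materials:

```wolfram
(* hand-built stand-ins for constituent graphs of a short and a longer sentence *)
g1 = Graph[{"S" -> "NP", "S" -> "VP", "NP" -> "Books", "VP" -> "endure"}];
g2 = Graph[{"S" -> "NP", "S" -> "VP", "NP" -> "Good", "NP" -> "books",
    "VP" -> "outlive", "VP" -> "NP2", "NP2" -> "their", "NP2" -> "authors"}];

(* treat edges as undirected so every vertex can reach every other,
   then average the closeness centrality over all vertices *)
meanCloseness[g_] := Mean[ClosenessCentrality[UndirectedGraph[g]]];

meanCloseness /@ {g1, g2}
```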



Using these and other analytical techniques, students produce in-depth research reports based on their findings. Here is a snapshot of one paper from a student who used these strategies to examine sentence complexity in his own writing:

[Image: Using Closeness Centrality]

Besides giving students the opportunity to analyze their high-school writing, Nilsson says this exercise also gives upcoming graduates a solid foundation for research analytics that will be useful in their college careers.

The Right Tool for the Job

Overall, the Wolfram Language has provided Nilsson with the perfect system for research and education in the digital humanities. Since adopting it into his curriculum, he has been able to make real improvements in student understanding and outcomes that he couldn’t have achieved otherwise. He notes that when he attempted similar explorations with Excel, MATLAB, R and other systems, none provided the Wolfram Language’s combination of power, usability and built-in knowledge. By wrapping everything into one coherent system, he says, the Wolfram Language gives him “a really potent tool for doing all kinds of analyses that are much more difficult in any other context.”

Leave a Comment




An interesting example of the analysis of texts using Mathematica’s high level functionality. Is there a notebook available for this blog?


Posted by Michael    November 21, 2018 at 1:37 am

Great post. Unfortunately, I ran into difficulties when trying to recreate the GeoSmoothHistogram command.
Mathematica returned a bunch of error messages such as:

GeoSmoothHistogram::ldata: {Austria,Austria,Russia,Malta,Russia,Russia,Russia,Russia,France,Russia,Austria,Germany,Austria,Russia,Russia,Russia,Austria,Sweden,Italy} is not a valid dataset or list of datasets.

Any suggestions how to get it to work?

Posted by Ruben Garcia    November 26, 2018 at 11:09 pm

    Hi Ruben…thanks so much for pointing this out! It looks like an important piece in the previous line was accidentally left out. TextCases returns a list of String objects by default, so they need to be interpreted as geographic entities before passing them into GeoSmoothHistogram. This can be done by mapping Interpreter over the result:

    loc = Interpreter["Country"] /@ TextCases[Rest@StringSplit[ResourceData["War and Peace"], "BOOK"], "Country"];

    Thank you again for pointing this out!

    Posted by Wolfram Blog    November 30, 2018 at 1:56 pm
