Wolfram Computation Meets Knowledge

Democratic Presidential Debate Analysis Using the Wolfram Language

When 20 presidential candidates duke it out on the debate stage, who wins? Americans have been watching a crowded and contentious primary season for the 2020 Democratic nomination for president. After the debates, everyone’s talking about who got the most talk time or attention, which exchanges were most exciting or some other measure of who “won” the night—and who might ultimately clinch a victory at the caucuses. So I decided I’d do a little exploration of the debates using the entity framework, text analytics and graph capabilities of the Wolfram Language and see if I could come up with my own measure of a “win” for a debate, based on which candidate was most central to the conversation.

Democratic Presidential Debate Analysis Using the Wolfram Language

Text Import: What Was Said?

In a perfect world, the debate audio and video would be machine readable. But as it stands, we at least have access to the text in transcript form. Before we dive in, it’s helpful to be ready with a bit of preliminary info.

First, let’s retrieve a list of candidate names. Since candidates keep joining and dropping out, I built this list manually over time:

Engage with the code in this post by downloading the Wolfram Notebook
candidateNames
&#10005

candidateNames = {"Amy Klobuchar", "Andrew Yang", "Bernie Sanders",
   "Beto O'Rourke", "Bill De Blasio", "Cory Booker",
   "Elizabeth Warren", "Eric Swalwell", "Jay Inslee", "Joe Biden",
   "John Delaney", "John Hickenlooper", "Julián Castro",
   "Kamala Harris", "Kirsten Gillibrand", "Marianne Williamson",
   "Michael Bennet", "Pete Buttigieg", "Steve Bullock", "Tim Ryan",
   "Tulsi Gabbard", "Tom Steyer"};

We can use Interpreter to see which candidates the Wolfram System recognizes—as it turns out, all of them:

candidates = Interpreter
&#10005

candidates = Interpreter["Person"]@candidateNames

For proper parsing of questions, we’ll also want the names of the moderators (no entities required in this case):

modNames
&#10005

modNames = {"Lester Holt", "Savannah Guthrie", "Chuck Todd",
   "Rachel Maddow", "Jake Tapper", "Dana Bash", "Don Lemon",
   "George Stephanopoulos", "Jorge Ramos", "Linsey Davis",
   "David Muir", "Anderson Cooper", "Erin Burnett", "Marc Lacey",
   "Andrea Mitchell", "Ashley Parker", "Kristen Welker",
   "Judy Woodruff", "Yamiche Alcindor", "Amna Nawaz", "Tim Alberta",
   "Jose Diaz-Balart", "Wolf Blitzer", "Abby Phillip",
   "Brianne Pfannenstiel", "ANNOUNCER"};

Finally, we combine these into a list of everyone who either spoke or was spoken to:

debateNames = Join
&#10005

debateNames = Join[candidateNames, modNames];
debatePeople = Join[candidates, modNames];

For navigating the transcript, it will also be useful to split into lists of first and last names:

{firstNames, lastNames} = Transpose
&#10005

{firstNames, lastNames} =
  Transpose[{First[#], StringRiffle@Rest@#} & /@
    StringSplit[debateNames]];

Transcripts can come from any number of outlets; fortunately, the formatting is fairly standard across sources. Importing the transcript is as easy as Import; for this analysis, I will use the transcript of the most recent debate (January 14) as transcribed by Rev.com:

d7raw = Import
&#10005

d7raw = Import["https://wolfr.am/K7dyQhOh"];

You can see on the website that every spoken line begins with the speaker’s name—(with a few variations like “E. Warren”),—followed by a colon and a timestamp, then the contents of the line. We can use StringCases to grab each of those elements (including all name variations, iconized for brevity) from the transcript:

d7Lines = StringTrim /@ StringCases
&#10005

d7Lines = StringTrim /@ StringCases[
    StringReplace[
     d7raw, {{
       "Amy Klobuchar", "A. Klobuchar", "A Klobuchar", "Amy K.",
        "Amy K"} -> "Amy Klobuchar", {
       "Andrew Yang", "A. Yang", "A Yang", "Andrew Y.",
        "Andrew Y"} -> "Andrew Yang", {
       "Bernie Sanders", "B. Sanders", "B Sanders", "Bernie S.",
        "Bernie S"} -> "Bernie Sanders", {
       "Beto O'Rourke", "B. O'Rourke", "B O'Rourke", "Beto O.",
        "Beto O"} -> "Beto O'Rourke", {
       "Bill De Blasio", "B. De Blasio", "B De Blasio", "Bill D.",
        "Bill D"} -> "Bill De Blasio", {
       "Cory Booker", "C. Booker", "C Booker", "Cory B.",
        "Cory B"} -> "Cory Booker", {
       "Elizabeth Warren", "E. Warren", "E Warren", "Elizabeth W.",
        "Elizabeth W"} -> "Elizabeth Warren", {
       "Eric Swalwell", "E. Swalwell", "E Swalwell", "Eric S.",
        "Eric S"} -> "Eric Swalwell", {
       "Jay Inslee", "J. Inslee", "J Inslee", "Jay I.",
        "Jay I"} -> "Jay Inslee", {
       "Joe Biden", "J. Biden", "J Biden", "Joe B.",
        "Joe B"} -> "Joe Biden", {
       "John Delaney", "J. Delaney", "J Delaney", "John D.",
        "John D"} -> "John Delaney", {
       "John Hickenlooper", "J. Hickenlooper", "J Hickenlooper",
        "John H.", "John H"} -> "John Hickenlooper", {
       "Julian Castro", "J. Castro", "J Castro", "Julian C.",
        "Julian C"} -> "Julian Castro", {
       "Kamala Harris", "K. Harris", "K Harris", "Kamala H.",
        "Kamala H"} -> "Kamala Harris", {
       "Kirsten Gillibrand", "K. Gillibrand", "K Gillibrand",
        "Kirsten G.", "Kirsten G"} -> "Kirsten Gillibrand", {
       "Marianne Williamson", "M. Williamson", "M Williamson",
        "Marianne W.", "Marianne W"} -> "Marianne Williamson", {
       "Michael Bennet", "M. Bennet", "M Bennet", "Michael B.",
        "Michael B"} -> "Michael Bennet", {
       "Pete Buttigieg", "P. Buttigieg", "P Buttigieg", "Pete B.",
        "Pete B"} -> "Pete Buttigieg", {
       "Steve Bullock", "S. Bullock", "S Bullock", "Steve B.",
        "Steve B"} -> "Steve Bullock", {
       "Tim Ryan", "T. Ryan", "T Ryan", "Tim R.",
        "Tim R"} -> "Tim Ryan", {
       "Tulsi Gabbard", "T. Gabbard", "T Gabbard", "Tulsi G.",
        "Tulsi G"} -> "Tulsi Gabbard", {
       "Tom Steyer", "T. Steyer", "T Steyer", "Tom S.",
        "Tom S"} -> "Tom Steyer", {
       "Lester Holt", "L. Holt", "L Holt", "Lester H.",
        "Lester H"} -> "Lester Holt", {
       "Savannah Guthrie", "S. Guthrie", "S Guthrie", "Savannah G.",
        "Savannah G"} -> "Savannah Guthrie", {
       "Chuck Todd", "C. Todd", "C Todd", "Chuck T.",
        "Chuck T"} -> "Chuck Todd", {
       "Rachel Maddow", "R. Maddow", "R Maddow", "Rachel M.",
        "Rachel M"} -> "Rachel Maddow", {
       "Jake Tapper", "J. Tapper", "J Tapper", "Jake T.",
        "Jake T"} -> "Jake Tapper", {
       "Dana Bash", "D. Bash", "D Bash", "Dana B.",
        "Dana B"} -> "Dana Bash", {
       "Don Lemon", "D. Lemon", "D Lemon", "Don L.",
        "Don L"} -> "Don Lemon", {
       "George Stephanopoulos", "G. Stephanopoulos",
        "G Stephanopoulos", "George S.",
        "George S"} -> "George Stephanopoulos", {
       "Jorge Ramos", "J. Ramos", "J Ramos", "Jorge R.",
        "Jorge R"} -> "Jorge Ramos", {
       "Linsey Davis", "L. Davis", "L Davis", "Linsey D.",
        "Linsey D"} -> "Linsey Davis", {
       "David Muir", "D. Muir", "D Muir", "David M.",
        "David M"} -> "David Muir", {
       "Anderson Cooper", "A. Cooper", "A Cooper", "Anderson C.",
        "Anderson C"} -> "Anderson Cooper", {
       "Erin Burnett", "E. Burnett", "E Burnett", "Erin B.",
        "Erin B"} -> "Erin Burnett", {
       "Marc Lacey", "M. Lacey", "M Lacey", "Marc L.",
        "Marc L"} -> "Marc Lacey", {
       "Andrea Mitchell", "A. Mitchell", "A Mitchell", "Andrea M.",
        "Andrea M"} -> "Andrea Mitchell", {
       "Ashley Parker", "A. Parker", "A Parker", "Ashley P.",
        "Ashley P"} -> "Ashley Parker", {
       "Kristen Welker", "K. Welker", "K Welker", "Kristen W.",
        "Kristen W"} -> "Kristen Welker", {
       "Judy Woodruff", "J. Woodruff", "J Woodruff", "Judy W.",
        "Judy W"} -> "Judy Woodruff", {
       "Yamiche Alcindor", "Y. Alcindor", "Y Alcindor", "Yamiche A.",
        "Yamiche A"} -> "Yamiche Alcindor", {
       "Amna Nawaz", "A. Nawaz", "A Nawaz", "Amna N.",
        "Amna N"} -> "Amna Nawaz", {
       "Tim Alberta", "T. Alberta", "T Alberta", "Tim A.",
        "Tim A"} -> "Tim Alberta", {
       "Jose Diaz-Balart", "J. Diaz-Balart", "J Diaz-Balart",
        "Jose D.", "Jose D"} -> "Jose Diaz-Balart", {
       "Wolf Blitzer", "W. Blitzer", "W Blitzer", "Wolf B.", "Wolf B",
         "Moderator 3"} -> "Wolf Blitzer", {
       "Abby Phillip", "A. Phillip", "A Phillip", "Abby P.", "Abby P",
         "Moderator 2"} -> "Abby Phillip", {
       "Brianne Pfannenstiel", "B. Pfannenstiel", "B Pfannenstiel",
        "Brianne P.", "Brianne P",
        "Moderator 1"} -> "Brianne Pfannenstiel"}],
    Shortest[
      name : (debateNames) ~~ ": " ~~

       time : ("(" ~~ __ ~~ ")") ~~ __ ~~
       line__ ~~ "\n" ~~
       debateNames] :> {name, time, line},
    IgnoreCase -> False, Overlaps -> True];

Each entry is a list containing those three elements:

RandomChoice
&#10005

RandomChoice[d7Lines]

Then we convert this into a chronological list of speaker→line rules. Not every transcript I found had timestamps, so I’ve left them out of this data:

d7Data = (#1 /. Thread
&#10005

d7Data = (#1 /. Thread[debateNames -> debatePeople]) -> #3 & @@@
   d7Lines;

Here’s an entry from that list:

RandomChoice
&#10005

RandomChoice[d7Data]

Although the steps varied, I was able to build these datasets for each debate transcript. Here is the full data from all seven debates so far:

allDebateData = CloudGet
&#10005

allDebateData = CloudGet["https://wolfr.am/K7wv2AEd"];

Analysis: Who Spoke about What, and How Much?

The cloud data gives us enough information to start exploring with computation. For example, people often judge the debate by who got the most speaking time. We can roughly estimate this using WordCount to see how many words each person spoke:

wordsSpoken
&#10005

wordsSpoken[data_, person_] :=
 StringRiffle@Values[Select[data, First@# == person &]] // WordCount

A PieChart of this data from the most recent debate shows a pretty even spread:

PieChart
&#10005

PieChart[wordsSpoken[d7Data, #] & /@ candidates,
 ChartLabels -> candidates]

We can also dive a bit deeper into the content of the candidates’ words. With TextCases, we can get specific people, places and concepts mentioned in each line of text:

entityMentions
&#10005

entityMentions[data_, entType_String] :=
 Thread[data[[All, 1]] -> Union /@ TextCases[Values@data,
     entType -> "Interpretation", VerifyInterpretation -> True]]

We could use this to explore any number of trends, such as which companies were mentioned during a given debate:

Counts
&#10005

Counts[Flatten@Values@entityMentions[d7Data, "Company"]]

Throughout the debates, several countries have also been discussed; viewing the data with WordCloud shows the recent emphasis on Iran and Afghanistan:

WordCloud
&#10005

WordCloud[
 Most@Sort@Counts[Flatten@Values@entityMentions[d7Data, "Country"]]]

Since audiences seem to love watching politicians talk to and about each other, I thought it would be interesting to look at direct mentions of other candidates. We can use StringCases to determine when a candidate is referred to by first or last name, including former President Barack Obama and current President Donald Trump, both of whom are invoked quite often. This function finds such mentions, associating each text mention with a specific person:

peopleMentions
&#10005

peopleMentions[data_] := Module[{
   names = {
Alternatives["Amy", "Klobuchar"],
Alternatives["Andrew", "Yang"],
Alternatives["Bernie", "Sanders"],
Alternatives["Beto", "O'Rourke"],
Alternatives["Bill", "De Blasio"],
Alternatives["Cory", "Booker"],
Alternatives["Elizabeth", "Warren"],
Alternatives["Eric", "Swalwell"],
Alternatives["Jay", "Inslee"],
Alternatives["Joe", "Biden"],
Alternatives["John", "Delaney"],
Alternatives["John", "Hickenlooper"],
Alternatives["Julian", "Castro"],
Alternatives["Kamala", "Harris"],
Alternatives["Kirsten", "Gillibrand"],
Alternatives["Marianne", "Williamson"],
Alternatives["Michael", "Bennet"],
Alternatives["Pete", "Buttigieg"],
Alternatives["Steve", "Bullock"],
Alternatives["Tim", "Ryan"],
Alternatives["Tulsi", "Gabbard"],
Alternatives["Tom", "Steyer"],
Alternatives["Donald", "Trump"],
Alternatives["Barack", "Obama"]}, mentions, rules},
  mentions = Union /@ StringCases[data // Values, names];
  rules = Thread[names -> Join[candidates, {
Entity["Person", "DonaldTrump::6vv3q"],
Entity["Person", "BarackObama::7yj6w"]}]];
  Thread /@ Thread[Keys[data] -> mentions /. rules] // Flatten]

Although this function could potentially return false positives (e.g. “Warren Buffet” matching “Elizabeth Warren”), I found this to be a rare enough occurrence that I didn’t try to account for it. In a broader conversational piece, this might be a bit more difficult.

Since the moderators are not crucial for this analysis, here is a function to get a list of only Democratic candidates mentioning each other (or Trump/Obama):

candidateMentions
&#10005

candidateMentions[data_] :=
 DeleteMissing[peopleMentions[data] /. Thread[modNames -> Missing[]],
  1, 2]

When applied across the full dataset, we see that the candidates have mentioned names nearly a thousand times:

allMentions = Flatten
&#10005

allMentions = Flatten[candidateMentions /@ allDebateData];
Length[allMentions]

Out of curiosity, I had to see who has spoken about the two presidents most throughout the debates, so I made a quick function to count who mentioned a given person the most:

mostMentions
&#10005

mostMentions[candidate_] :=
 Dataset@TakeLargest[
   Counts[Select[allMentions, #[[2]] == candidate &] // Keys], UpTo@5]
Row
&#10005

Row[mostMentions /@ {Entity["Person", "DonaldTrump::6vv3q"],
   Entity["Person", "BarackObama::7yj6w"]}]

Amy Klobuchar has a lot to say about President Trump, whereas Obama gets a lot of name-drops from his former vice president, Joe Biden. Notably, between the two lists, we see all five of the current frontrunners (excluding Harris and Castro).

Here are all the self-mentions; amusingly, Biden talks about himself the most:

DeleteCases
&#10005

DeleteCases[
  Table[KeySelect[mostMentions[c] // Normal, MatchQ[#, c] &], {c,
    candidates}], <||>] // Dataset

To narrow the focus even more, here is a function that gives a list of mentions between participants of a specific debate only—removing Trump and Obama, as well as self-references:

debateMentions
&#10005

debateMentions[data_] := Select[candidateMentions[data],
  MatchQ[#[[2]], Alternatives @@ Union@Keys[data]] &&

    FreeQ[#[[2]], Alternatives @@ {
Entity["Person", "DonaldTrump::6vv3q"],
Entity["Person", "BarackObama::7yj6w"]}] &&
    !
     SameQ[#[[1]], #[[2]]] &]

The result is a reduced list of rules that we can use for further exploration of the interactions in each individual debate.

Graphs: Who Was the Center of Attention?

We’ve done a bit of text analysis and basic counting; now it’s time to switch gears to graph theory. If we think of the output from our previous function as a list of one-way connections between candidates (i.e. directed edges pointing from speaker to subject) we can easily display the resulting graph for a given debate:

Graph
&#10005

Graph[debateMentions[d7Data], VertexLabels -> "Name"]

We can simplify this view by representing repeated mentions with edge weights, styling each edge accordingly. First, we need to convert repeated edges into weighted edges:

weighted = Graph
&#10005

weighted = Graph[Union@debateMentions[d7Data],
   EdgeWeight -> Values@Counts[debateMentions[d7Data]]];

To apply styles to the edges, we need to get the weights using PropertyValue, use Rescale to get values appropriate for the styling functions and then apply the styles using Property:

edgeWeightStyledGraph
&#10005

edgeWeightStyledGraph[g_, styleFunc_List, ranges_List] := Graph@Module[
   {edges = EdgeList[g], weights, weightsScaled, styles},
   weights = Table[PropertyValue[{g, e}, EdgeWeight], {e, edges}];
   weightsScaled =
    Table[Rescale[weights, MinMax[weights], r], {r, ranges}];
   styles = Directive @@@ Transpose[
      Thread /@
       Table[styleFunc[[i]][weightsScaled[[i]]], {i,
         Length[styleFunc]}]];
   Table[Property[
     edges[[i]], {EdgeStyle -> styles[[i]],
      EdgeWeight -> weights[[i]]}], {i, Length[edges]}]]

Here is the same graph, but with edges consolidated and resized based on weights:

Graph
&#10005

Graph[edgeWeightStyledGraph[weighted, {Thickness}, {{.002, .01}}],
 VertexLabels -> "Name"]

Combining these steps (and adding some color), we can make a function to display the graph in a compact way:

debateMentionsGraph
&#10005

debateMentionsGraph[data_] := With[
  {g = Graph[debateMentions[data] // Union,
     EdgeWeight -> Values@Counts@debateMentions[data]]},
  Graph[edgeWeightStyledGraph[
    g, {Thickness, Darker[RGBColor[0, 0.68625, 0.95295], #] &},
    {{.002, .01}, {.5, 0}}], ({
    GraphLayout -> "LayeredDigraphEmbedding",
     AspectRatio -> (If[# > 0.7, 0.7, #]& )[VertexCount[#]/20.],
     VertexLabels -> Placed[
      "Name", Center,
       Framed[#, RoundingRadius -> 5,
        Background -> Directive[{White,
Opacity[0.8]}]]& ], VertexLabelStyle -> Directive[{"Text", Medium}],
     VertexSize -> Large, VertexStyle -> White}& )[g]]]

These graphs give a nice visual summary of the exchanges in each debate. Note that the positioning of the candidates is not important here—just the edges and their styles. In the latest debate, you can see a pretty strong emphasis on Sanders and Warren, the two progressive candidates:

g = debateMentionsGraph
&#10005

g = debateMentionsGraph[d7Data]

Most candidates have built-in images tied to their entities, so instead of labeling with names, we could also use faces:

facesOverlay
&#10005

facesOverlay[g_] :=
 Graph[g, VertexSize -> 1.1, VertexLabels -> (Table[c -> Placed[
Overlay[{
Image[
RemoveBackground[
ImageApply[# + 1& ,
(ImageTrim[#,
First[
FindFaces[#]], 20]& )[
If[
ImageQ[
c["Image"]],
c["Image"],
First[
WebImageSearch[
StringJoin[
c["Name"], " headshot"], Method -> "Google", MaxItems -> 1]]]],
            Masking -> Graphics[
Disk[]]]], ImageSize -> 60],
Graphics[{
GrayLevel[0.7],
Thickness[0.05],
Circle[{0, 0}, 30]}, ImageSize -> 62]}, Alignment -> Center],
       Center], {c,
VertexList[#]}]& )[g]]
facesOverlay
&#10005

facesOverlay[g]

Applying this to the first debate makes for an interesting—but crowded—graphic:

facesOverlay
&#10005

facesOverlay[debateMentionsGraph[allDebateData[[1]]]]

Next, we can explore DegreeCentrality: the relative number of edges connected to each vertex. In a directed graph, we can look at either in-degrees (a candidate being spoken about) or out-degrees (a candidate speaking about others). A high in-degree might indicate popularity, whereas a high out-degree would imply assertiveness. So for this exploration, we’ll consider both:

Thread
&#10005

Thread[VertexList[g] ->
  Transpose@Table[DegreeCentrality[g, i], {i, {"In", "Out"}}]]

By these numbers alone, we might give a win to Bernie Sanders for assertiveness or Tom Steyer for popularity. But we can take a more visual look by overlaying the data on our graph. First we compute both measures for each vertex:

centralityList
&#10005

centralityList[g_] :=
 Table[DegreeCentrality[g, i], {i, {"In", "Out"}}]

Then we use Blend to define a color for each vertex (candidate)—Blue for in-degrees (popularity) and Red for out-degrees (assertiveness):

colorList
&#10005

colorList[g_] :=
 Blend[{RGBColor[0, 0, 1], RGBColor[1, 0, 0]}, #] & /@
  Transpose@centralityList[g]

Combining this with our existing function, we can make a new function to display the graph with vertex colors based on that blend:

mentionsCentralityGraph
&#10005

mentionsCentralityGraph[data_] := With[{g = debateMentionsGraph[data]},
  Graph[g, {VertexLabelStyle ->
     Thread[VertexList[g] -> colorList[g]]}]]

Let’s take another look at the most recent debate:

mentionsCentralityGraph
&#10005

mentionsCentralityGraph[d7Data]

The story here is clear: Sanders and Warren (purple) were at the core of numerous exchanges, with additional commentary from Biden and Steyer (who actually mentioned everyone). Klobuchar called out others but was barely named, whereas Buttegieg was mentioned but didn’t talk about others. By a “balanced” criterion (high assertiveness and popularity scores), Sanders and Warren appear to be the winners.

Let’s look back a bit further at the fifth debate, which aired on November 20:

mentionsCentralityGraph
&#10005

mentionsCentralityGraph[allDebateData[[5]]]

One thing is evident: more candidates lead to fewer overall mentions. This makes sense from a logistical standpoint—each person has less time to speak—but I think it also shows the growing emphasis on candidates’ differences as the debates wear on.

Combining all the data, we get an indication of how people tended to interact throughout the debates:

Thread
&#10005

Thread[candidates ->
  colorList@debateMentionsGraph[Flatten@allDebateData]]

Again, the data tells a story. Biden, the long-time presumptive frontrunner, receives a lot more attention than he gives. Many of the minor candidates—Bullock, Inslee, Delaney—tend to (or rather, are able to) call out others without getting much return fire.

Polling: Who Gained the Most Voters?

If there’s one metric most analysts can agree upon (or at least, agree to criticize), it’s the national polls. So if we want some objective measure of who “won” the debate, that might be a good place to start. Much of this section is probably similar to standard analyses you’ll find elsewhere.

For this section, we’ll jump back into text analytics; I found a Wikipedia article rife with up-to-date info in organized tables:

pollData = Import
&#10005

pollData = Import["https://wolfr.am/K7iT0dHR", "Data"];

Each row contains the same information—dates administered, sample size, margin of error and percentages for each candidate. Using StringTrim to get rid of unnecessary characters, here is a function to parse the data from each list and interpret it with SemanticImportString:

parsePollData
&#10005

parsePollData[list_] := SemanticImportString[ExportString[Select[
    StringTrim[#, {
         Shortest["\[Dash]" | "-" ~~ ___ ~~ DigitCharacter ..],
         "\[PlusMinus] ",
         ("(" | "[") ~~ __ ~~ (")" | "]") ~~
          WhitespaceCharacter ...
         }] & /@ list // Quiet,
    Length[#] == Length[list[[1]]] &], "CSV",
   CharacterEncoding -> "ASCII"],
  PadRight[{"String", Interpreter@"Date", "Number", "Percent"},
   Length[list[[1]]], "Percent"], HeaderLines -> 1]

The page includes a lot of extra info—and it keeps growing—but we’re mainly interested in polls between last June and today. Here is a combined list of all entries from that window:

pollDataCombined = Flatten@Table
&#10005

pollDataCombined =
  Flatten@Table[Normal@parsePollData[p], {p, pollData[[1, 5 ;; 12]]}];

Now we have entries for each poll, but all the dates are interpreted as being in 2020:

RandomChoice
&#10005

RandomChoice[pollDataCombined]

As a final cleaning step, we’ll fix the dates and standardize the key names:

cleanPollData
&#10005

cleanPollData[list_] := Table[KeyMap[StringReplace[StringTrim[#], {
       ___ ~~ "source" ~~ ___ -> "Source",
       ___ ~~ "date" ~~ ___ -> "Date",
       ___ ~~ "size" ~~ ___ -> "SampleSize",
       ___ ~~ "err" ~~ ___ -> "ErrorMargin",
       ___ ~~ "undecided" ~~ ___ -> "Undecided"},
      IgnoreCase -> True] &, p] //
   If[#Date >
      Today, # /. #Date -> (#Date - Quantity[1, "Years"]), #] &, {p,
   list}]
pollDataClean = cleanPollData
&#10005

pollDataClean = cleanPollData[pollDataCombined];

With this dataset, we can view polling numbers for a given candidate or date range. Here is each candidate’s polling history as a TimeSeries object:

pollNames = Intersection
&#10005

pollNames =
  Intersection[candidateNames, Union[Flatten[Keys /@ pollDataClean]]];
pollHistory = Table
&#10005

pollHistory = Table[c -> TimeSeries[
     Values /@ DeleteMissing[pollDataClean[[All, {"Date", c}]], 1, 2]
     ], {c, pollNames}];

The data is summed up nicely by a DateListPlot:

DateListPlot
&#10005

DateListPlot[
 Thread@Legended[
   TimeSeriesResample[Values@pollHistory, Quantity[1, "Weeks"]],
   Last /@ StringSplit /@ pollNames]]

However, this overview includes more information than we need, and many of the lower-polling candidates are obscured. Let’s zoom in to only look at the leaders, i.e. anyone who has risen above 10% in the polls since the debates began:

leaders = Select
&#10005

leaders =
  Select[pollHistory,
   ContainsAny[
     Union[# > Quantity[10, "Percent"] & /@
       Values[#[[2]]]], {True}] &];

By adding vertical lines for each debate date, we can get a rough idea of how the debates have affected polling:

DateListPlot
&#10005

DateListPlot[Legended[
   TimeSeriesResample[Values@leaders, Quantity[1, "Weeks"]], {
Style["Sanders", 15],
Style["Warren", 15],
Style["Biden", 15],
Style["Harris", 15],
Style["Buttigieg", 15]}] // Thread, {
 ImageSize -> 430, LabelStyle -> {FontSize -> 11},
  PlotRange -> {{{2019, 6, 1}, {2020, 2, 5}}, Automatic},
  Epilog -> {{
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 40.}}],
GrayLevel[0], FontSize -> 11,
Inset[
Framed["Jan 14", Background -> GrayLevel[1]], {
DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 42.}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 40.}}],
GrayLevel[0], FontSize -> 11,
Inset[
Framed["Dec 19", Background -> GrayLevel[1]], {
DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 42.}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 40.}}],
GrayLevel[0], FontSize -> 11,
Inset[
Framed["Nov 20", Background -> GrayLevel[1]], {
DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 42.}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 40.}}],
GrayLevel[0], FontSize -> 11,
Inset[
Framed["Oct 15", Background -> GrayLevel[1]], {
DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 42.}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 40.}}],
GrayLevel[0], FontSize -> 11,
Inset[
Framed["Sep 12", Background -> GrayLevel[1]], {
DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 42.}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 40.}}],
GrayLevel[0], FontSize -> 11,
Inset[
Framed["Jul 30", Background -> GrayLevel[1]], {
DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 42.}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 40.}}],
GrayLevel[0], FontSize -> 11,
Inset[
Framed["Jun 26", Background -> GrayLevel[1]], {
DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 42.}]}}}]

Though the rankings haven’t moved much, you can see a few peaks and troughs that likely resulted from debates. For a quick estimate of who might have gained or lost, we can aggregate data from the week before a debate, then compare to polling numbers the week after the same debate:

beforeAfter
&#10005

beforeAfter[date_, data_: pollDataClean] := With[{
   before =
    Select[data, date >= #Date > date - Quantity[1, "Weeks"] &],
   after =
    Select[data, date <= #Date < date + Quantity[1, "Weeks"] &]},
  DeleteCases[<|Table[c -> N@Mean[Cases[#[[All, c]], _Quantity]], {c,
         pollNames}] /.
       Thread[candidateNames -> candidates]|>, _Mean] & /@ {before,
    after}]

Subtracting the two values gives an estimate of percentage changes around a given date:

percentChange
&#10005

percentChange[date_, data_: pollDataClean] :=
 Subtract @@ Reverse@beforeAfter[date, data]

From there, we can use the mean polling per candidate to judge whether public opinion shifted. For instance, Kamala Harris’s average went up significantly after the first debate in June:

percentChange
&#10005

percentChange[DateObject[{2019, 6, 26}]]

This should give a broad measure of how debate performance affected polling for each candidate. Here are the numbers for the latest debate—only including those candidates who actually participated:

KeySelect
&#10005

KeySelect[
 percentChange[DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.`]],
 MemberQ[Union@Keys@d7Data, #] &]

Iowa Caucus and Beyond

Of course, none of these numbers really mean anything until voting starts. The first candidate selection event of the cycle, the Iowa caucus, was held earlier this week—with Pete Buttegieg appearing to come out on top (pending final results).

Let’s see how this compares with our debate mentions. We’ll need to get a dataset of statewide polling:

stateData = Import
&#10005

stateData = Import["https://wolfr.am/K7juP63d", "Data"];

Using the functions from the previous section, we can import the most recent Iowa polls:

iowaPollData = Normal@parsePollData
&#10005

iowaPollData =
  Normal@parsePollData[stateData[[1, 5]]] // cleanPollData;

As in the previous section, we grab the polling history for each candidate:

iowaNames = Intersection
&#10005

iowaNames =
  Intersection[candidateNames, Union[Flatten[Keys /@ iowaPollData]]];
iowaHistory = Table
&#10005

iowaHistory = Table[c ->
    TimeSeries[
     Values /@
      DeleteMissing[iowaPollData[[All, {"Date", c}]], 1, 2]], {c,
    iowaNames}];

The polling history shows that the rankings changed drastically after the Iowa debate:

DateListPlot
&#10005

DateListPlot[
 Thread@Legended[
   TimeSeriesResample[Values@iowaHistory, Quantity[2, "Days"]],
   Last /@ StringSplit /@ iowaNames], {Epilog -> {{
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 32}}],
GrayLevel[0], FontSize -> 13,
Inset[
Framed["Jan 14", Background -> GrayLevel[1]], {
DateObject[{2020, 1, 14}, "Day", "Gregorian", -6.], 34}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 32}}],
GrayLevel[0], FontSize -> 13,
Inset[
Framed["Dec 19", Background -> GrayLevel[1]], {
DateObject[{2019, 12, 19}, "Day", "Gregorian", -6.], 34}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 32}}],
GrayLevel[0], FontSize -> 13,
Inset[
Framed["Nov 20", Background -> GrayLevel[1]], {
DateObject[{2019, 11, 20}, "Day", "Gregorian", -6.], 34}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 32}}],
GrayLevel[0], FontSize -> 13,
Inset[
Framed["Oct 15", Background -> GrayLevel[1]], {
DateObject[{2019, 10, 15}, "Day", "Gregorian", -6.], 34}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 32}}],
GrayLevel[0], FontSize -> 13,
Inset[
Framed["Sep 12", Background -> GrayLevel[1]], {
DateObject[{2019, 9, 12}, "Day", "Gregorian", -6.], 34}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 32}}],
GrayLevel[0], FontSize -> 13,
Inset[
Framed["Jul 30", Background -> GrayLevel[1]], {
DateObject[{2019, 7, 30}, "Day", "Gregorian", -6.], 34}]}, {
RGBColor[0.6666666666666666, 0.6666666666666666, 0.6666666666666666], 

Opacity[0.5],
Line[{{
DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 0}, {
DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 32}}],
GrayLevel[0], FontSize -> 13,
Inset[
Framed["Jun 26", Background -> GrayLevel[1]], {
DateObject[{2019, 6, 26}, "Day", "Gregorian", -6.], 34}]}}}]

Sanders and Biden have been battling for the top spot in Iowa, but Buttigieg and Warren came in with some potential too. Sanders had the biggest polling surge in the weeks leading up to the caucus:

percentChange
&#10005

percentChange[DateObject[{2020, 1, 20}], iowaPollData]

Going by the standard analysis, Sanders or Biden should obviously have come out on top. But referring back to the centrality graph from earlier, Buttigieg was actually the most “popular” in that others mentioned him without him ever mentioning anyone else:

facesOverlay
&#10005

facesOverlay[debateMentionsGraph[d7Data]]

Perhaps ignoring critics was a winning strategy in that case!

Graph theory is a useful lens for visualizing interpersonal exchanges, providing some unique computational insights about how the candidates interact. Combining this with other computations could provide even more interesting information, such as using sentiment analysis to determine candidates’ attitudes toward each other. In the coming months, it will be interesting to find out what kinds of debaters end up winning the most primaries: assertive, popular or mixed. Try it yourself—maybe you can define your own computational measure of a debate winner!

Get full access to the latest Wolfram Language functionality with a Mathematica 12 or Wolfram|One trial.

Comments

Join the discussion

!Please enter your comment (at least 5 characters).

!Please enter your name.

!Please enter a valid email address.

2 comments

  1. Great analysis!

    It might be a bit better if there were also titles on the individual plots, to hook the reader’s attention a bit more. Right now, I scroll through and just see a bunch of faces and arrows and am not sure what the differences between one pic and the next might be and must weigh whether I want to slow my pathological scrolling long enough to read the context of each section to figure it out!

    Reply
    • Thank you for reading and commenting, bernard! Your suggestion is a good one. In this case, I think I was so preoccupied with styling the graphs and showing the workflow that I neglected to add overall plot labels. I will certainly keep it in mind for next time!

      Reply