Wolfram Computation Meets Knowledge

Squid Game Dominates Global TV Scene per Wikipedia Trends

The new Korean TV series Squid Game has taken the world by storm and become a global breakout phenomenon. Netflix, its distributor, announced, “Squid Game has officially reached 111 million fans—making it our biggest series launch ever!” It’s topped the charts in 90 different countries, including the United States, and has been called “a word-of-mouth global sensation” that’s flooded the news and social media.

Squid Game’s viewership growth was so rapid—and massive—that soon after its release, the show was projected to make almost $900 million from a roughly $20 million production budget, according to Bloomberg. The same report also mentions that Netflix does not reveal the details of its various data metrics.

So, how can one objectively assess such explosive popularity, especially relative to other acclaimed and trending shows?

Wikipedia as Data

Pageviews of Wikipedia articles are an interesting, independent public data source that can serve as a sensible proxy to detect popularity trends. Another typical alternative is the number of relevant search engine queries. But those are easily mixed in with other similar subjects’ searches, especially across different languages.

Wikipedia Pageview statistics is a recent project with data going back only to July 1, 2015, but that is enough to cover the recent success of Squid Game that I am trying to quantify. Wikipedia data is also very focused: it has dedicated pages for TV shows and separate pages per language, and the encyclopedic quality of the content attracts, and helps to evaluate, a truly engaged readership, something that is generally very hard to come by.

200 top Wikipedia groups

To assess the “explosive global popularity” (or “virality” here), I will use a simple metric: the highest number of pageviews generated by a TV show in a single day, summed over nine different languages of the top language demographics of Wikipedia users. There are alternative metrics, but this one was the most interesting to me.

Reflecting on Squid Game’s cultural background helps to understand its breakthrough to audiences around the world. I recommend this New York Times article by Jin Yu Young, who reports from Seoul and delivers an insider’s outlook. As she points out, South Korea’s historically powerful economic growth in the postwar era made it one of the richest countries in Asia, but wealth disparity has grown too and undermined national cultural unity.

“Squid Game lays bare the irony between the social pressure to succeed in South Korea and the difficulty of doing just that,” says Shin Yeeun in the article. “It’s really difficult for people in their 20s to find a full-time job these days.” She’s a 2020 college graduate who suffered from a difficult, prolonged job search, aggravated by the COVID pandemic.

Wealth disparity is one of the key themes of Squid Game, shown through a distinctly South Korean daily grind. At the same time, it closely echoes the worries of the average global viewer, which helped make the show an international sensation.

Controversy, dystopian artistic visuals, violence and elements of sports and reality TV sharpen the show’s dramatic edge. And you know something is widely discussed when it’s the subject of a Saturday Night Live sketch and a New Yorker cartoon.

Collecting the Data

Identifying the most relevant data is the key first step. There are no truly consistent and open rankings for TV series, and even major players such as Netflix and Nielsen, which has a long history of producing broadcast TV ratings, might disagree about their data and analytics. Through careful examination of popular media and reported rankings, I handpicked a test sample of 10 recent shows:

  • Squid Game (Netflix)
  • Game of Thrones (HBO)
  • Stranger Things (Netflix)
  • Money Heist (Antena 3, Netflix)
  • The Mandalorian (Disney+)
  • The Witcher (Netflix)
  • The Handmaid’s Tale (Hulu)
  • Sex Education (Netflix)
  • Bridgerton (Netflix)
  • The Queen’s Gambit (Netflix)

They were reported as very popular by their distributing networks and various news outlets. Some of these series belong to networks other than Netflix, e.g. HBO’s Game of Thrones, and some were originally made in a language other than English, e.g. the Spanish Money Heist.

Each string below is the ending of the Wikipedia hyperlink to the dedicated article about the corresponding TV show:

titles = {"Squid_Game", "Game_of_Thrones", "Stranger_Things",  "Money_Heist", "The_Mandalorian", "The_Witcher_(TV_series)",  "The_Handmaid

Verifying global success means one should represent at least the major global demographic groups. Wikipedia has editions in multiple languages, where articles corresponding to the same subject are not necessarily exact translations but are written from scratch by user communities speaking a particular language. According to Wikipedia, its top eight editions with the largest user communities are written in the following languages (plus I’ve added Korean, as I’ll explain):

languages = {"English", "Spanish", "French", "German", "Chinese",  "Russian", "Portuguese", "Italian", "Korean"};

The Korean language, which has a much lower rank at 20, was added as the ninth language, so all the original languages of all shows (English, Spanish, Korean) are represented. The WikipediaData function in the Wolfram Language is a flexible, user-friendly interface for the Wikipedia API. Utilizing parallel processing on all my CPU cores, I get the time series for all our selected shows and languages:

views = Association

One needs to be very careful with the data from crowdsourcing projects such as Wikipedia. Examining the Squid Game time series, I noticed a suspiciously short line for the Chinese language. One could imagine that Asian pages would be created earlier, as Asia is the original market for the show. A thorough audit of the history of the Chinese Wikipedia page shows a disruptive change in the page URL that led to the loss of direct data access. I show the significant contribution of the lost data in black in the following plot:

DateListPlot

I recovered that data manually. Defining a function for merging time series:

timeSeriesMerge

… I can add it to my data:

views
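
The definition of timeSeriesMerge is not reproduced in this text. As a rough illustration of the idea only, here is a minimal sketch that assumes each series is stored as a list of {date, count} pairs and that counts falling on the same day should be added together. The name mergeSketch and the sample numbers are hypothetical, and the original function may well work differently:

```wolfram
(* Hypothetical stand-in, not the author's timeSeriesMerge. *)
(* Each series is assumed to be a list of {date, count} pairs. *)
mergeSketch[series_List] :=
  KeyValueMap[List,
    KeySort[Merge[(Association[Rule @@@ #] &) /@ series, Total]]]

(* Made-up sample data: two partial series for the same page *)
oldURL = {{"2021-09-17", 120}, {"2021-09-18", 300}};
newURL = {{"2021-09-18", 50}, {"2021-09-19", 900}};

(* Days present in both series are summed; the rest are kept as they are *)
mergeSketch[{oldURL, newURL}]
```

Merge with Total sums the counts that share a date key, which is also what is needed when adding up the same show across several language editions.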

Data Visualization

Visualizing the data requires some design considerations and further processing. First, it is useful to see how different language demographics contribute to the total pageviews we collected (albeit only for the nine languages). Not surprisingly, the English Wikipedia page leads in every chart. English is often considered an international language of communication, and many people around the world use it as an important second language. English Wikipedia pages are also often the most detailed and developed. So, besides reading an article in their native language, international readers also check its English version for more information. Hence, the English page alone could serve as a proxy for “global” readership, but adding more languages with significant shares of readers improves the proxy.

Removing the superfluous parts of the strings makes the plot labels more readable:

labels = StringDelete
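
The exact patterns passed to StringDelete are not shown in this text. A minimal sketch of this kind of cleanup, assuming the goal is to drop a disambiguation suffix such as "_(TV_series)" and turn underscores into spaces, could look like the following (sampleTitles and sampleLabels are illustrative names, not the original variables):

```wolfram
(* Illustrative only: the patterns used in the original are not shown. *)
sampleTitles = {"Squid_Game", "The_Witcher_(TV_series)"};

(* Delete the disambiguation suffix, then replace underscores with spaces *)
sampleLabels =
  StringReplace[StringDelete[sampleTitles, "_(TV_series)"], "_" -> " "]
```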

For each show, I visualize percentages of pageviews corresponding to different languages with the help of a PieChart. Note that these percentages are only for the data of top lingual Wikipedia demographics that I accessed. There are Wikipedia pageviews for more languages, but they are relatively small:

Grid
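
To make the percentage computation concrete, here is a small sketch on made-up numbers. It assumes the total pageviews per language for one show are already available as an association. The counts below are invented and the variable names are hypothetical:

```wolfram
(* Made-up total pageviews per language for one show; not real data *)
sampleTotals = <|"English" -> 600, "Spanish" -> 250, "Korean" -> 150|>;

(* Share of each language as a percentage of the sampled total *)
shares = Map[N[100 #/Total[sampleTotals]] &, sampleTotals]

(* One pie chart per show, with sectors labeled by language *)
PieChart[Values[shares], ChartLabels -> Keys[shares]]
```

The percentages are relative to the sampled languages only, so they sum to 100 even though other language editions exist.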

Merging all time series of a particular show across all nine languages can be done with the help of the timeSeriesMerge function I defined earlier:

totals = timeSeriesMerge

A single day’s maximum pageviews—the highest peaks—can be sorted to rank the TV series, at least according to the Wikipedia data available since July 1, 2015. Squid Game prevails, beating its closest rivals Game of Thrones, Stranger Things and Money Heist:

BarChart
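
The ranking step can be sketched as follows, assuming each show's merged daily totals are stored as a list of {date, count} pairs. The show names and counts are invented for illustration, and the original code may be organized differently:

```wolfram
(* Made-up merged daily totals per show; not real data *)
sampleSeries = <|
   "Show A" -> {{"2021-09-17", 100}, {"2021-09-18", 700}},
   "Show B" -> {{"2019-05-19", 400}, {"2019-05-20", 450}}
   |>;

(* Highest single-day count per show, sorted from largest to smallest *)
peakRanking = Reverse[Sort[Map[Max[#[[All, 2]]] &, sampleSeries]]]

BarChart[Values[peakRanking], ChartLabels -> Keys[peakRanking]]
```

Because the maximum is taken after summing over languages, the metric rewards a single day on which readers in many languages looked up the show at once.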

Game of Thrones premiered on HBO in the US on April 17, 2011, and its Wikipedia page, as well as some of the other shows’ pages, was created before Wikipedia Pageview statistics was launched. Hence, we cannot guarantee that none of these shows had a higher peak before July 2015. Nevertheless, it is quite probable that all the highest peaks occurred after 2015, as all these shows are very recent and their popularity likely grew much more slowly than that of Squid Game.

For instance, consider Game of Thrones’s complete time series for the entire available time span:

DateListPlot

The peaks grow as seasons progress. Unlike Netflix’s approach of releasing all season episodes at once, one cannot binge-watch HBO series during an active season, as episodes are released weekly. So perhaps the highest viral peak for Game of Thrones was achieved when the show had its grand finale, and the critics, viewers and media were abuzz discussing it. The following functions find the highest peak in the time series together with its date of occurrence:

peak
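
The definition of peak is not reproduced in this text. As a hedged sketch of the same idea, assuming a series is a list of {date, count} pairs, the pair with the largest count can be picked out with MaximalBy. The helper name peakSketch and the data are hypothetical:

```wolfram
(* Hypothetical helper, not the author's definition. *)
(* Returns the {date, count} pair with the largest count. *)
peakSketch[series_List] := First[MaximalBy[series, Last]]

(* Made-up sample series *)
peakSketch[{{"2019-05-19", 400}, {"2019-05-20", 450}, {"2019-05-21", 300}}]
```

If several days tie for the maximum, this version returns the first one in the list.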

This can be used to create a labeled diagram of peak points to use in the main plot:

peaksPLOT = DateListPlot

Finally, I created the data visualization at the beginning of this article:

Labeled

A Genuine Blockbuster

Squid Game is indeed a global megahit as Netflix and the media proclaim it to be, at least with respect to the demographic groups engaged enough to read about the show on Wikipedia. I can also conclude that Wikipedia Pageviews data is a great source of information. One could consider different metrics or even different data, but the general Wikipedia trends seen here align well with other reported rankings and the general public’s reception of the sampled shows.

Have you already seen Squid Game, or are you planning to see it? Let me know what you think about it in the comments!

I would like to express special thanks to Jesse Friedman, who provided valuable advice in identifying data for this project.

Visit Wolfram Community or the Wolfram Function Repository to embark on your own computational adventures!

Comments

3 comments

  1. Great analysis Vitaliy!

  2. Should also mention Facebook and History of Japanese invasions, since these topics are implicit to the visual language, see also: https://en.m.wikipedia.org/wiki/Japanese_invasions_of_Korea_(1592–1598)
