New in the Wolfram Language: WikipediaData
March 20, 2015 — Alan Joyce, Director, Content Development
Since the inception of Wolfram|Alpha, Wikipedia has held a special place in its development pipeline. We usually use it not as a primary source for data, but rather as an essential resource for improving our natural language understanding, particularly for mining the common and colloquial ways people refer to entities and concepts in various domains.
We’ve developed a lot of internal tools over the years to help us analyze and extract information from Wikipedia, and now we’ve also added a Wikipedia “integrated service” to the latest version of the Wolfram Language, making it easy for anyone to incorporate Wikipedia content into Wolfram Language workflows.
You can simply grab the text of an article, of course, and feed it into some of the Wolfram Language’s new functions for text processing and visualization:
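As a minimal sketch of that idea, the following fetches an article's plaintext, strips common stopwords, and draws a word cloud. The article title "Computer" is an arbitrary choice for illustration, and the result depends on the live article text at the time of the call.

```wolfram
(* Fetch the plaintext of a Wikipedia article; "Computer" is just an illustrative title *)
text = WikipediaData["Computer"];

(* Remove common stopwords ("the", "of", ...) and visualize the remaining word frequencies *)
WordCloud[DeleteStopwords[text]]
```

Because `WikipediaData` returns an ordinary string here, the same `text` can be passed to any other string or text-processing function.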
Or if you don’t have a specific article in mind, you can search by title or content:
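A search along these lines might look like the sketch below, which assumes the companion function `WikipediaSearch` is available in your version. The keywords are illustrative only, and the titles returned will vary as Wikipedia changes.

```wolfram
(* Articles whose titles match the keywords *)
WikipediaSearch["Title" -> "prime number"]

(* Articles whose body text matches the keywords *)
WikipediaSearch["Content" -> "twin prime conjecture"]
```

Each call returns a list of article titles, any of which can then be passed to `WikipediaData`.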
You can even use Wolfram Language entities directly in WikipediaData to, say, get equivalent page titles in any of the dozens of available Wikipedia language versions:
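For instance, a sketch using the built-in planet entity for Mars could look like this. The property name "TitleTranslations" is an assumption here and should be checked against the `WikipediaData` documentation for your version.

```wolfram
(* An entity can stand in for an article title *)
mars = Entity["Planet", "Mars"];

(* Titles of the corresponding article in other language editions;
   the "TitleTranslations" property name is assumed, not confirmed by this post *)
WikipediaData[mars, "TitleTranslations"]
```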
One of my favorite functions allows you to explore article links out from (or pointing in toward) any given article or category—either in the form of a simple list of titles, or as a list of rules that can be used with the Wolfram Language’s powerful functions for graph visualization. In fact, with just a few lines of code, you can create a beautiful and interesting visualization of the shared links between any set of Wikipedia articles:
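One possible way to build such a visualization is sketched below. It is not necessarily the approach used for the original figure: it takes the "LinksList" property for each article, keeps only the titles that every article links to, and draws a directed edge from each article to each shared title. The two article titles and the cap of 25 shared links are arbitrary choices for illustration.

```wolfram
(* Two illustrative article titles; any set of titles could be substituted *)
articles = {"Isaac Newton", "Gottfried Wilhelm Leibniz"};

(* Outgoing links of each article, as an association: title -> list of linked titles *)
links = AssociationMap[WikipediaData[#, "LinksList"] &, articles];

(* Titles linked from every article in the set, capped at 25 to keep the plot readable *)
shared = Intersection @@ Values[links];
shared = Take[shared, Min[25, Length[shared]]];

(* One directed edge from each article to each shared link *)
edges = Flatten[Table[DirectedEdge[a, t], {a, articles}, {t, shared}]];

Graph[edges, VertexLabels -> "Name"]
```

Swapping "LinksList" for "BacklinksList" would instead show the pages that point in toward every article in the set.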
There’s a lot of useful functionality here, and we’ve really only scratched the surface. Watch for many more integrated services to follow throughout the coming year.
Version 10.1 of the Wolfram Language is now supported in Mathematica and rolling out in all other Wolfram products.