WOLFRAM

New in the Wolfram Language: WikipediaData

Since the inception of Wolfram|Alpha, Wikipedia has held a special place in its development pipeline. We usually use it not as a primary source for data, but rather as an essential resource for improving our natural language understanding, particularly for mining the common and colloquial ways people refer to entities and concepts in various domains.

We’ve developed a lot of internal tools over the years to help us analyze and extract information from Wikipedia, and now we’ve also added a Wikipedia “integrated service” to the latest version of the Wolfram Language, making it incredibly easy for anyone to incorporate Wikipedia content into Wolfram Language workflows.

You can simply grab the text of an article, of course, and feed it into some of the Wolfram Language’s new functions for text processing and visualization:

[Image: getting the text sentences of an article with WikipediaData]
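A minimal sketch of this step follows. It is not the code from the original screenshot: the article title "Cat" is only an example, and it assumes the documented "ArticlePlaintext" property of WikipediaData before splitting the result with TextSentences.

```wolfram
(* Plain text of an example article; the title "Cat" is only an illustration *)
text = WikipediaData["Cat", "ArticlePlaintext"];

(* Split the article into a list of sentences *)
sentences = TextSentences[text];

(* Inspect the first few sentences, guarding against very short articles *)
Take[sentences, Min[5, Length[sentences]]]
```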

[Image: word cloud of an article retrieved with WikipediaData]
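A comparable word cloud might be produced as sketched below. The article title is an arbitrary example, and removing stopwords with DeleteStopwords before building the cloud is an assumption about how to keep filler words from dominating it, not necessarily what the original example did.

```wolfram
(* Calling WikipediaData with just a title returns the article's plain text *)
text = WikipediaData["Cat"];

(* Drop common English stopwords, then let WordCloud size each
   remaining word by how often it occurs in the text *)
cloud = WordCloud[DeleteStopwords[text]]
```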

Or if you don’t have a specific article in mind, you can search by title or content:

[Image: WikipediaSearch by content or by title]
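Both kinds of search can be sketched as follows, assuming the plain-string form (title search) and the "Content" rule form described on the WikipediaSearch reference page. The keywords are only examples.

```wolfram
(* A plain string searches article titles for the keywords *)
titleHits = WikipediaSearch["cellular automaton"];

(* The "Content" rule searches the body text of articles instead *)
contentHits = WikipediaSearch["Content" -> "cellular automaton"]
```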

You can even use Wolfram Language entities directly in WikipediaData to, say, get equivalent page titles in any of the dozens of available Wikipedia language versions:

[Image: using entities in WikipediaData]
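A hedged sketch of an entity-based query is shown below. The country entity is an arbitrary example, and the property name "TitleTranslationRules" is an assumption: check the WikipediaData documentation for the exact property that returns equivalent titles in other language editions.

```wolfram
(* An entity with a Wikipedia article can stand in for a page title;
   the country used here is only an example *)
france = Entity["Country", "France"];

(* ASSUMPTION: "TitleTranslationRules" is taken to return
   language -> page title rules; verify the property name *)
translations = WikipediaData[france, "TitleTranslationRules"];

(* Normal turns an Association into rules and leaves a rule list unchanged;
   each rule then becomes one {language, title} row of the table *)
Grid[List @@@ Normal[translations]]
```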

One of my favorite functions lets you explore the links going out from (or pointing in toward) any given article or category, either as a simple list of titles or as a list of rules that can be used with the Wolfram Language’s powerful functions for graph visualization. In fact, with just a few lines of code, you can create a beautiful and interesting visualization of the links shared among any set of Wikipedia articles:

[Image: graph of the links shared among the articles in a given article or category]
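One way to build such a visualization is sketched below. It is not the original code, and sharedLinkGraph is a hypothetical helper name. It fetches each article’s outgoing links with the "LinksList" property, keeps only the link targets that at least two of the articles have in common, and draws an edge from each article to every shared target it links to.

```wolfram
(* sharedLinkGraph is a hypothetical helper, not a built-in function *)
sharedLinkGraph[articles_List] := Module[{links, shared, edges},
  (* Outgoing link titles of each article, duplicates removed so that
     every article counts at most once per target *)
  links = AssociationMap[Union[WikipediaData[#, "LinksList"]] &, articles];
  (* Link targets that appear in the link lists of at least two articles *)
  shared = Select[Tally[Flatten[Values[links]]], Last[#] >= 2 &][[All, 1]];
  (* One directed edge from each article to every shared target it links to *)
  edges = Flatten[KeyValueMap[Thread[#1 -> Intersection[#2, shared]] &, links]];
  Graph[edges, VertexLabels -> "Name"]
]

(* Example articles, chosen only for illustration *)
g = sharedLinkGraph[{"Cat", "Dog", "Horse"}]
```

Working with plain title lists keeps the set logic simple. Assuming the "BacklinksList" property behaves analogously, substituting it for "LinksList" would compare incoming links instead.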

There’s a lot of useful functionality here, and we’ve really only scratched the surface. Watch for many more integrated services to follow throughout the coming year.

Version 10.1 of the Wolfram Language is now supported in Mathematica and rolling out in all other Wolfram products.

Comments

12 comments

  1. This is great, as well as other previews of new functionality. Is there any timetable for when we might see this?
  2. I cannot find the documentation for “WikipediaData”, and it does not yet work in Mathematica 10.0.2 or the Programming Cloud. Could you tell me where to find the new version and/or when it will be live?
  3. The Arabic entry in the table is completely wrong: the characters are not connected and are not written from right to left, as they should be.
  4. Love it. I imagine this will be fun to play with.
  5. Thanks for your comment. The features in this blog post are part of our 10.1 update, which was recently released.