
Let’s Tango: Computational Musicology Using Wikidata, MusicBrainz and the Wolfram Language

Imagine you could import any website to obtain meaningful data for further processing, like creating a diagram, highlighting places on a map or integrating with other data sources. What if you could query data on the web knowing only one simple query language? That’s the vision of the semantic web. The semantic web is based on standards like the Resource Description Framework (RDF) and SPARQL (a query language for RDF). The upcoming release of Version 12 of the Wolfram Language introduces experimental support for interacting with the semantic web: you will be able to Import and Export a variety of RDF data formats as well as query remote SPARQL endpoints and in-memory data using either a query string or a symbolic representation of SPARQL.

Computational Musicology: Using Wikidata and MusicBrainz


I’m going to introduce the representation of RDF data and SPARQL queries using two famous open-data repositories: Wikidata and MusicBrainz. Both provide their data as RDF, either as part of their websites or—in the case of Wikidata—on a SPARQL endpoint. RDF has a graph-based data model (with various concrete syntaxes), and its associated query language is SPARQL, whose basic building block is the graph pattern. We’ll begin by introducing SPARQL using “query strings,” which are usually easier to type than symbolic SPARQL (but that might depend on your taste). Once that’s set up, we’ll use the Wolfram Language’s unique symbolic representation that is most useful when writing programs that generate SPARQL queries that depend on user input or the result of a computation.

Wikidata

Wikidata is a general-purpose, free-to-use-and-edit database, powering, for instance, infoboxes on Wikipedia. Wikidata stores “claims,” which are not necessarily “facts”: these are values that someone has claimed or that are generally known to be true, ideally together with references (articles, books, etc.) and optionally with qualifiers (the date a value was valid, the measurement method, etc.). Claims are associated with items. Each item is identified with a Q-number: for instance, Q1 identifies “the universe.” Properties are also items (so they can have associated claims too); they are identified with P-numbers: for instance, P31 identifies the property “instance of.” For each new item or property, Wikidata creates a new identifier. This identifier is meant to be stable: if two items are created and it later turns out that they are about the same concept, they are “merged”—one identifier becomes the “main” identifier, and the other becomes a redirect. A Wikidata item can be edited from its item page; data can be queried using Wikidata’s query service or SPARQL endpoint.

Find Composers

To get familiar with Wikidata, go to its home page and enter “La cumparsita” into the search field to reach the data page describing that famous tango song:


WebImage["https://www.wikidata.org/wiki/Q765883"]

There you can find its composer, lyricist and even audio that you can play directly in your browser. Note the gray comment (Q765883) in the title, which is also part of the URL: this identifier can later be used to query data associated with this item. By clicking the link of one of the properties (say, “instance of”—not the value to its right), you can find out the identifier (P-number) of this property.

How about retrieving some of that data?

In the old days you would probably look at the page source and try to write a scraper for that website. But Wikidata offers a SPARQL endpoint, so let’s use that:


Needs["GraphStore`"]


SPARQLExecute["https://query.wikidata.org/sparql", "
 select * where {
   wd:Q765883 wdt:P86 ?composer .
 }
 "]

GraphStore is new in the upcoming release of Version 12 of Mathematica. It provides the Wolfram Language’s experimental functionality for interacting with the semantic web. Once it is loaded, you can execute SPARQL queries against any SPARQL endpoint.

A basic SPARQL query starts with a query form (SELECT in this case), then lists the variables that you are interested in (* stands for all variables), followed by a graph pattern (the WHERE clause).

A basic graph pattern consists of triple patterns of the form "subject predicate object .". In our first example, subject is an Internationalized Resource Identifier (IRI) that identifies the song “La cumparsita” (here using a shorthand notation prefix:local—we will discuss this later). predicate is the IRI for the property “composer” (again using short notation). object is a variable (variables start with a question mark “?”).

When tasked to execute the query, the SPARQL endpoint tries to instantiate the variables with actual values so that the graph pattern becomes a subgraph of the queried graph. It then returns possible values for variables. Those are represented as lists of associations in the Wolfram Language.
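For orientation, such a result has roughly this shape in the Wolfram Language (a sketch only; the actual solution appears in the notebook output):

{<|"composer" -> IRI["http://www.wikidata.org/entity/Q937502"]|>}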

Our first query resulted in one solution, which is an IRI. Inside the IRI object, click the “>>” to be directed to a data page describing this item. There, you can find that the name of the composer is “Gerardo Matos Rodríguez.” How would you retrieve this label using a query?

SPARQLExecute["https://query.wikidata.org/sparql"
&#10005

SPARQLExecute["https://query.wikidata.org/sparql", "
 select * where {
   wd:Q937502 rdfs:label ?label .
 } limit 5
 "]

(Here we just copied and pasted the Q-number into the query. More automation will be shown later in this blog when we introduce “symbolic SPARQL.”)

Here wd:Q937502 is short for the composer IRI of the previous result. rdfs is a conventionally used prefix for the RDF Schema namespace IRI. You could have written out the complete IRIs like this:

"select * where {
&#10005

"
 select * where {
   < http://www.wikidata.org/entity/Q937502 > \
< http://www.w3.org/2000/01/rdf-schema#label > ?label .
 } limit 5
 ";

It is beyond the scope of this blog post to explain the whole Wikidata data model. Suffice it to say that Wikidata “entities” are usually referred to as wd:Qnnn and properties as wdt:Pnnn, where nnn stands for an integer, wd for the entity namespace and wdt for the “direct” property namespace (using wd instead of wdt gives you the statement node rather than the “main” value; from the statement node you can continue to look up qualifiers and references for a value). For more information, look at the example queries and the list of prefixes.

Now combine the queries to get the label in one go. Also, add a filter to only get Spanish labels:

SPARQLExecute["https://query.wikidata.org/sparql"
&#10005

SPARQLExecute["https://query.wikidata.org/sparql", "
 select * where {
   wd:Q765883 wdt:P86 ?composer .
   ?composer rdfs:label ?composerLabel .
   filter (lang(?composerLabel) = \"es\")
 }
 "]

Genres

Looking again at the data page for “La cumparsita,” we can find that the genre (P136) is tango (Q14390274):

SPARQLExecute["https://query.wikidata.org/sparql",
&#10005

SPARQLExecute["https://query.wikidata.org/sparql", "
 select * where {
   wd:Q765883 wdt:P136 ?genre .
 }
 "]

Now let’s do the reverse: find items that are listed under the “tango” genre. To be sure we get “songs,” let’s also include the statement that the item is an “instance of” (P31) a “song” (Q7366):

SPARQLExecute["https://query.wikidata.org/sparql",
&#10005

SPARQLExecute["https://query.wikidata.org/sparql", "
 select * where {
   ?tango wdt:P31 wd:Q7366 .
   ?tango wdt:P136 wd:Q14390274 .
   ?tango rdfs:label ?tangoLabel .
   filter(lang(?tangoLabel) = \"es\")
 } limit 5
 "]

Here we limit the number of results to five. We can also count the number of tangos using an aggregate:


SPARQLExecute["https://query.wikidata.org/sparql", "
 select (count(?tango) as ?tangoCount) where {
   ?tango wdt:P31 wd:Q7366 .      # song
   ?tango wdt:P136 wd:Q14390274 . # tango
 }
 "]

Are those all of them? Wikidata also knows the genre “Argentine tango” (Q25116):


SPARQLExecute["https://query.wikidata.org/sparql", "
 select (count(?tango) as ?tangoCount) where {
   ?tango wdt:P31 wd:Q7366 .   # song
   ?tango wdt:P136 wd:Q25116 . # Argentine tango
 }
 "]

Let’s combine the queries to get songs that are either “tango” or “Argentine tango” using a UNION query:


SPARQLExecute["https://query.wikidata.org/sparql", "
 select (count(?tango) as ?tangoCount) where {
   ?tango wdt:P31 wd:Q7366 .        # song
   {?tango wdt:P136 wd:Q14390274 .} # tango
   UNION
   {?tango wdt:P136 wd:Q25116 .}    # Argentine tango
 }
 "]

There is another way that those queries could be combined: “Argentine tango” is a “subclass of” (P279) “tango.” Using a property path—a form of triple pattern that allows a regular expression over predicates in the predicate position—we can find songs that are “tango,” in a subclass of “tango,” in a subclass of a subclass of “tango” and so on:


SPARQLExecute["https://query.wikidata.org/sparql", "
 select (count(?tango) as ?tangoCount) where {
   ?tango wdt:P31 wd:Q7366 .                  # song
   ?tango wdt:P136 / wdt:P279* wd:Q14390274 . # tango or a sub (sub ...) genre
 }
 "]

The construct “instance of” followed by zero or more “subclass of” predicates can in general be used to retrieve both explicit and implicit instances of a class.
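As a minimal sketch of that general pattern, we can reuse the “song” class (Q7366) from above and count everything that is an instance of “song” or of one of its subclasses (this counts every such item on Wikidata, so it may take a moment):

SPARQLExecute["https://query.wikidata.org/sparql", "
 select (count(?item) as ?count) where {
   ?item wdt:P31 / wdt:P279* wd:Q7366 . # an instance of 'song' or of one of its subclasses
 }
 "]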

Combine Composer and Genre Queries

Let’s combine the previous queries to get tangos together with their composers. To create a nice, human-readable table, we’ll need the labels for both songs and composers. Specifying label subqueries and filtering each of them might quickly become cumbersome. Luckily, Wikidata provides a “label service” that looks up the labels for us:


tangosComposers = 
  SPARQLExecute["https://query.wikidata.org/sparql", "
   select ?tangoLabel ?composerLabel where {
     ?tango wdt:P31 wd:Q7366 .                  # song
     ?tango wdt:P136 / wdt:P279* wd:Q14390274 . # tango or a sub (sub ...) genre
     ?tango wdt:P86 ?composer .
     service wikibase:label {bd:serviceParam wikibase:language \"es\" .}
   }
   "];

Make a Dataset (stripping the RDFString wrapper—after all, we already know that we requested labels in Spanish):


Dataset[tangosComposers /. RDFString[s_, _] :> s]

MusicBrainz

MusicBrainz is a music encyclopedia whose main data can be used freely; registered users can contribute to it, with a review and voting process ensuring quality.

Let’s see what data looks like in MusicBrainz. The site has a page about “La cumparsita”:


WebImage["https://musicbrainz.org/work/17ff494f-7698-33ab-b841-\
b1ce5e01423a"]

There you can find a long list of artists that performed this “work” (to use MusicBrainz terminology), sometimes with recording dates. It also lists the composer, and on the right there is a link to Wikidata.

The Wikidata link brings us to the page that we looked at in the first section of this post. There (on Wikidata), under identifiers, you can find a property MusicBrainz work ID (P435). If you click the value, you come back to where you started—MusicBrainz. With those links, you can be absolutely certain that the contributors of both projects agree that both pages are about the same “resource” and that it is not just a name coincidence.

On the MusicBrainz page, look at the composer: it’s “Gerardo Matos Rodríguez,” just as on Wikidata. To confirm that it is the same person, click the link to reach the “artist” page. There again is a link to Wikidata, item Q937502. That means that the composer information about “La cumparsita” is consistent between Wikidata and MusicBrainz.

Embedded RDF

MusicBrainz does not (to my knowledge) have a SPARQL endpoint. We could write a scraper or use the API. But looking at the page source, we see JSON-LD, one of the serialization formats of RDF.

Writing a scraper usually means looking at the page source, finding where exactly the data is located and writing a function that extracts it. Even minor changes in the page can make this scraper function invalid, so the process might have to be repeated every time you want to extract the data. Also, it is unlikely that a scraper written for one website will work for a different one. A good API solves the first problem by providing a deterministic result in some well-known format (say "JSON") and stability over time. But still, before getting any data you’ll likely have to study the documentation of the respective API.

Let’s see whether instead of writing a scraper or studying the API documentation, we can simply request “linked data.” We do this by including in the request an “accept” header of “application/ld+json”. That will cause MusicBrainz to return data in the JSON-LD format. Then we tell URLExecute to apply the "JSONLD" importer to the received data:


workData = URLExecute[
  HTTPRequest[
   "https://musicbrainz.org/work/17ff494f-7698-33ab-b841-b1ce5e01423a",
   <|"Headers" -> {"accept" -> "application/ld+json"}|>
   ],
  "JSONLD"
  ]

What we get is an RDFStore, a symbolic representation of RDF data. The resource that this store describes is the song “La cumparsita,” identified by this IRI. To look into it, we can simply write a SPARQLQuery. Let’s first check which properties MusicBrainz used to describe its resources:


workData // SPARQLQuery["
  select distinct ?p where {
    ?s ?p ?o .
  }
  "]

Note that SPARQLQuery[…] is an operator: its argument is a query string or a symbolic specification of the query. To evaluate it, apply it to an RDFStore (in this case using postfix notation).
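Equivalently, the operator can be applied in prefix form; this sketch returns the same result as the postfix version above:

SPARQLQuery["
  select distinct ?p where {
    ?s ?p ?o .
  }
  "][workData]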

Look up the composer:


workData // SPARQLQuery["
  select ?composer where {
    ?work <http://schema.org/composer> ?composer .
  }
  "]

Alternatively, you can look up the composer with the name. Let’s also define a prefix to shorten the query:


workData // SPARQLQuery["
  prefix schema: <http://schema.org/>
  select ?composer ?composerName where {
    ?work schema:composer ?composer .
    ?composer schema:name ?composerName .
  }
  "]

By the way, where was the query evaluated? Version 12 of the Wolfram Language will have an experimental SPARQL query evaluator, supporting version 1.1 of the SPARQL standard. That means that this query was evaluated locally, in your session.
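To see local evaluation in isolation, here is a minimal sketch that queries a small, hand-built RDFStore (the example.org IRIs and the literal are invented purely for illustration):

RDFStore[{
   RDFTriple[IRI["http://example.org/LaCumparsita"], 
    IRI["http://example.org/composer"], "Gerardo Matos Rodríguez"]
   }] // SPARQLQuery["
  select ?composerName where {
    ?work <http://example.org/composer> ?composerName .
  }
  "]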

Symbolic Queries

Maybe now it’s time to digress for a moment and consider symbolic SPARQL queries. So far we have written SPARQL query strings and either sent them to a SPARQL endpoint or evaluated them locally. If you are writing a data-driven program or a function that involves querying RDF data, then you’ll have to construct queries programmatically. If you want to construct the query string manually, you’ll quickly run into a variety of issues: How do you serialize literals? Do you need to escape certain values? How do you organize your code and all the instances of StringJoin used so that others can still understand it?

To help you programmatically construct SPARQL queries, Version 12 of the Wolfram Language will provide a symbolic representation of SPARQL queries. Here is the previous query in symbolic form:


schema[s_] := IRI["http://schema.org/" <> s];


workData // SPARQLSelect[{
    RDFTriple[
     SPARQLVariable["work"], schema["composer"], 
     SPARQLVariable["composer"]
     ],
    RDFTriple[
     SPARQLVariable["composer"], schema["name"], 
     SPARQLVariable["composerName"]
     ]
    } -> {"composer", "composerName"}]

SPARQLSelect corresponds to a SPARQL SELECT query. Other query forms are SPARQLAsk and SPARQLConstruct, which represent queries that evaluate to a Boolean or an RDFStore, respectively.

A SPARQLSelect query contains either a graph pattern, which here is a list of triple patterns (also known as a “basic graph pattern”), or, as in this case, a rule whose right-hand side lists the variables to be included in the result.
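For example, here is a sketch of SPARQLAsk applied to the same store, checking whether any composer triple is present at all (it should return True):

workData // SPARQLAsk[
   RDFTriple[SPARQLVariable["work"], schema["composer"], 
    SPARQLVariable["composer"]]]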

Combine Data from Wikidata and MusicBrainz

Consistency

Now we have all the tools to check the consistency of “composer” statements between Wikidata and MusicBrainz.

Given a Wikidata ID, we should first obtain the MusicBrainz artist ID of the composer from Wikidata. Then we should get the MusicBrainz work ID (again from Wikidata). Using the latter, we can then find the MusicBrainz artist ID of the composer from MusicBrainz.

Here is an overview:

(Diagram: overview of the consistency check, from the Wikidata item to the two MusicBrainz artist IDs being compared.)

And here is a function that implements this diagram. A message is printed whenever data is missing or incorrect:


checkWorkConsistency[work_IRI] := Module[
   {composer1, mbWork, composer2},
   composer1 = fixMBID[getMBComposerUsingWD[work]];
   If[MissingQ[composer1],
    Print["No composer specified in Wikidata."];
    Return[];
    ];
   mbWork = fixMBID[getMBWorkUsingWD[work]];
   If[MissingQ[mbWork],
    Print["No MusicBrainz work ID specified in Wikidata."];
    Return[];
    ];
   composer2 = getMBComposerUsingMB[mbWork];
   If[MissingQ[composer2],
    Print["No composer specified in MusicBrainz."];
    Return[];
    ];
   If[composer1 =!= composer2,
    Print["MusicBrainz composer ", composer2, 
     " is different from Wikidata composer ", composer1, "."];
    Return[];
    ];
   Style["consistent", Green]
   ];

The Wikidata queries: first we introduce two utilities to create “direct” (wdt) and “normalized” (wdtn) property IRIs. The former get you from the item directly to the value (ignoring qualifiers and references) and the latter produce, for certain properties, an IRI (instead of a “bare” string):


wdt[s_] := IRI["http://www.wikidata.org/prop/direct/" <> s];
wdtn[s_] := 
  IRI["http://www.wikidata.org/prop/direct-normalized/" <> s];

Now view the actual queries:


getMBComposerUsingWD[work_IRI] := SPARQLExecute[
    "https://query.wikidata.org/sparql",
    SPARQLSelect[{
      RDFTriple[work, wdt["P86"], RDFBlankNode["composer"]],
      RDFTriple[RDFBlankNode["composer"], wdtn["P434"], 
       SPARQLVariable["MBComposer"]]
      }]
    ] // Query[1, "MBComposer"];


getMBWorkUsingWD[work_IRI] := SPARQLExecute[
    "https://query.wikidata.org/sparql",
    SPARQLSelect[{
      RDFTriple[work, wdtn["P435"], SPARQLVariable["MBWork"]]
      }]
    ] // Query[1, "MBWork"];

In Wikidata, IRIs are generated from an ID using the property “formatter URI for RDF resource” (P1921). Wikidata has an open bug that prevents changes to this formatter property from being immediately reflected in query results. Until this is fixed, we’ll use this function to construct the correct MusicBrainz IRI:


fixMBID[IRI[i_]] := IRI[StringReplace[i, {
     "http:" -> "https:",
     "/" ~~ id : Repeated[_, {36}] ~~ "/" ~~ type__ :> 
      "/" <> type <> "/" <> id
     }]];
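For illustration, assuming the endpoint currently returns MusicBrainz IRIs of the form http://musicbrainz.org/{id}/{type} (the shape the pattern above targets), the function rewrites them like this:

fixMBID[IRI["http://musicbrainz.org/17ff494f-7698-33ab-b841-b1ce5e01423a/work"]]
(* IRI["https://musicbrainz.org/work/17ff494f-7698-33ab-b841-b1ce5e01423a"] *)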

The MusicBrainz queries: first we introduce a utility that requests “linked data,” as we’ve done before.


getRDFStore[IRI[i_String]] := URLExecute[
   HTTPRequest[
    i,
    <|"Headers" -> {"accept" -> "application/ld+json"}|>
    ],
   "JSONLD"
   ];

Now view the actual query:


getMBComposerUsingMB[work_IRI] := getRDFStore[work] // SPARQLSelect[{
      RDFTriple[work, schema["composer"], SPARQLVariable["MBComposer"]]
      }] // Query[1, "MBComposer"];

Check whether “La cumparsita” has consistent “composer” information in Wikidata and MusicBrainz:


checkWorkConsistency[IRI["http://www.wikidata.org/entity/Q765883"]]

Listen to Music

Enough theory. Let’s actually listen to some music.

Given a MusicBrainz work IRI, Import and listen to a recording of “La cumparsita”:


getRecording[work_IRI] :=
 getRDFStore[work] //
    SPARQLSelect[
      RDFTriple[work, schema["sameAs"], SPARQLVariable["id"]]] //
   FirstCase[<|_ -> i : IRI[_?(StringContainsQ["wikidata"])]|> :> fixWDID[i]] //
  SPARQLExecute[
     "https://query.wikidata.org/sparql",
     SPARQLSelect[RDFTriple[#, wdt["P51"], SPARQLVariable["audio"]]]
     ] & //
 FirstCase[<|_ -> audio_|> :> Import[First[audio]]] (* import the audio file the IRI points to *)

There is one little issue: MusicBrainz uses the (meant-for-humans) “/wiki/” IRI to identify an item, although this is not strictly correct. Item identifiers, according to the Wikidata data model, have an “/entity/” in their IRI. This “/entity/” IRI identifies a concept and, following best practice, redirects depending on the purpose: If that “concept IRI” is entered into a browser, a human-readable description is returned. If a machine requests the information (a machine identifies itself as such by including an appropriate “accept” header in the request), a machine-readable description is returned. Also, a SPARQL query expects the concept IRI. I have filed a ticket, but until this is fixed, we’ll use this small utility as a workaround:


fixWDID[IRI[i_String]] := 
  IRI[StringReplace[
    i, {"https:" :> "http:", "/wiki/" :> "/entity/"}]];

Let’s get the song that we’ve been talking about this whole time. In your notebook, click the Play button to, well, play the song:


getRecording[
 IRI["https://musicbrainz.org/work/17ff494f-7698-33ab-b841-b1ce5e01423a"]]

Is the Data Completely Consistent?

Now we’d like to know if all the data is consistent between MusicBrainz and Wikidata. DBTune provides a SPARQL endpoint by “wrapping” the MusicBrainz relational data, but the list of properties does not seem to contain “composer”:


SPARQLExecute["http://dbtune.org/musicbrainz/sparql", 
   "select distinct ?p where {[] ?p []}"] // 
  Query[All, 1, 1 /* (StringSplit[#, "/" | "#"] &) /* Last] // Short


AnyTrue[%, StringContainsQ["composer", IgnoreCase -> True]]

If there were a MusicBrainz SPARQL endpoint, we could first retrieve information from Wikidata:


{work, composer} = SPARQLExecute["https://query.wikidata.org/sparql", "
     select ?mbwork ?mbcomposer where {
       ?song wdt:P31 wd:Q7366 .
       ?song wdt:P136 / wdt:P279* wd:Q14390274. # tango
       ?song wdtn:P435 ?mbwork.
       ?song wdt:P86 / wdtn:P434 ?mbcomposer.
     }
     "] // Values // Transpose;


work[[;; 3]]


composer[[;; 3]]

We could then compare those work-composer pairs in a single query—using SPARQLValues to send inline data—and return the works for which the MusicBrainz and Wikidata composers differ (commented out until a MusicBrainz SPARQL endpoint becomes available):


(*SPARQLExecute[
  "hypothetical MusicBrainz SPARQL endpoint",
  SPARQLSelect[{
     SPARQLValues[
      {"work", "composerFromWD"},
      Thread[{work, composer}]
      ],
     RDFTriple[SPARQLVariable["work"], schema["composer"], 
      SPARQLVariable["composerFromMB"]]
     } /; SPARQLVariable["composerFromWD"] != SPARQLVariable["composerFromMB"]]
  ]*)

Without a SPARQL endpoint, we’ll have to retrieve composer information work by work from MusicBrainz.

This is the total number of works with composer information in Wikidata (not restricted to any genre):


SPARQLExecute["https://query.wikidata.org/sparql", "
 select (count(*) as ?c) where {
   ?song wdt:P31 wd:Q7366 .
   ?song wdtn:P435 ?mbwork.
   ?song wdt:P86 / wdtn:P434 ?mbcomposer.
 }
 "]

This number is likely going to grow in the future. So as not to send too many requests to MusicBrainz (and to learn something new), we’ll content ourselves with checking a “random” subset of the data. How do we get random values using SPARQL? The first idea might be to “ORDER BY” a random number like this, then LIMIT to the desired number of results:


SPARQLExecute["https://query.wikidata.org/sparql", "
 select ?song where {
   ?song wdt:P31 wd:Q7366 .
   ?song wdtn:P435 ?mbwork.
   ?song wdt:P86 / wdtn:P434 ?mbcomposer.
 }
 order by rand()
 limit 2
 "]

However, apparently this SPARQL endpoint is cheating a bit: when the query is evaluated multiple times, the result stays the same. This could be caused by a cache, but trying to trick it by, say, renaming some variables or waiting a bit does not produce different results. To convince yourself that the endpoint is indeed cheating, write a test query that returns two samples of the “random” number. They should differ, but they are the same:


SPARQLExecute["https://query.wikidata.org/sparql", "
 select ?r {
   ?s ?p ?o .
   bind (rand() as ?r)
 } limit 2
 "]

Instead, we’ll use a different approach: ordering by the MD5 hash of the song IRI concatenated with a random string. Putting everything together, we arrive at this function, which gives a number of random work-composer pairs for a given genre:


wd[s_] := IRI["http://www.wikidata.org/entity/" <> s];


randomComposers[genre_IRI, n_Integer?Positive] := SPARQLExecute[
   "https://query.wikidata.org/sparql",
   SPARQLSelect[
    {
      RDFTriple[SPARQLVariable["song"], wdt["P31"], wd["Q7366"]],
      SPARQLPropertyPath[
       SPARQLVariable["song"], {wdt["P136"], wdt["P279"] ...},
       genre
       ],
      RDFTriple[SPARQLVariable["song"], wdtn["P435"], 
       SPARQLVariable["mbwork"]],
      SPARQLPropertyPath[
       SPARQLVariable["song"],
       {wdt["P86"], wdtn["P434"]},
       SPARQLVariable["mbcomposer"]
       ]
      } -> {SPARQLVariable["mbwork"], SPARQLVariable["mbcomposer"]},
    "OrderBy" -> SPARQLEvaluation["md5"][
      SPARQLEvaluation["concat"][
       SPARQLEvaluation["str"][SPARQLVariable["song"]],
       ToString[RandomInteger[10^10]]
       ]
      ],
    "Limit" -> n
    ]
   ];
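For reference, here is a sketch of the query string that this symbolic form roughly corresponds to, with n = 10; the literal 1234567890 stands in for the random integer generated above:

SPARQLExecute["https://query.wikidata.org/sparql", "
 select ?mbwork ?mbcomposer where {
   ?song wdt:P31 wd:Q7366 .                  # song
   ?song wdt:P136 / wdt:P279* wd:Q14390274 . # tango or a sub (sub ...) genre
   ?song wdtn:P435 ?mbwork .
   ?song wdt:P86 / wdtn:P434 ?mbcomposer .
 }
 order by md5(concat(str(?song), \"1234567890\"))
 limit 10
 "]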

Here are 10 examples:


data = randomComposers[
    IRI["http://www.wikidata.org/entity/Q14390274"], 10] // 
   Map[Map[fixMBID]];

List the last parts of the work and composer IRIs:


Dataset[data][All, All, First /* (StringSplit[#, "-"] &) /* Last]

Note how in the Wolfram Language (since Version 10), you can compose operators using /* (RightComposition), which is convenient if you like reading from left to right: the operators are applied in the order in which they appear.
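As a tiny, self-contained illustration of /* (unrelated to the data above), the left function is applied first:

(StringSplit[#, "-"] & /* Last)["right-composition"]
(* "composition" *)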

Now you can find the works in MusicBrainz that either list a different composer or fail for some other reason (if you want to observe the progress, wrap Echo around #mbwork):


data // Select[Function[
   Pause[1];(* be nice to MusicBrainz *)
   Quiet[Check[
     getRDFStore[#mbwork] // SPARQLAsk[
       RDFTriple[#mbwork, schema["composer"], 
         SPARQLVariable["composer"]] /; 
        SPARQLVariable["composer"] != #mbcomposer
       ],
     True
     ]]
   ]]

This batch turned out to be free of problems, though while writing and reevaluating this post, inconsistencies occasionally showed up; I allowed myself to investigate and fix one of them (Wikidata listed a nonexistent MusicBrainz work ID, which I replaced with the correct one). Hopefully some of the many volunteer editors of Wikidata and MusicBrainz will find this kind of analysis helpful for finding and fixing inconsistencies. Feel free to try your favorite genre.

Where Do Genres Come From?

Finally, let’s see where composers were born. Get locations, given a genre:


composerLocations[genre_IRI] := SPARQLExecute[
   "https://query.wikidata.org/sparql",
   SPARQLSelect[
    {
      RDFTriple[SPARQLVariable["song"], wdt["P31"], wd["Q7366"]],
      SPARQLPropertyPath[
       SPARQLVariable["song"], {wdt["P136"], wdt["P279"] ...},
       genre
       ],
      SPARQLPropertyPath[
       SPARQLVariable["song"],
       {wdt["P86"], wdt["P19"], wdt["P625"]},
       SPARQLVariable["location"]
       ]
      } -> SPARQLVariable["location"]
    ]
   ] // Query[All, "location"]


tangoLoc = 
  composerLocations[IRI["http://www.wikidata.org/entity/Q14390274"]];

Each location is represented as a Point containing a GeoPosition:


tangoLoc[[;; 3]]

Create a GeoHistogram:


GeoHistogram[First /@ tangoLoc]

Compare the “origins” of tango, grunge and electronica using GeoGraphics:


GeoGraphics[{
   {RGBColor[0.92, 0.13, 0.25], 
    composerLocations[IRI["http://www.wikidata.org/entity/Q14390274"]]}, (* tango *)
   {RGBColor[0.57, 0.51, 0.22], 
    composerLocations[IRI["http://www.wikidata.org/entity/Q11365"]]},    (* grunge *)
   {RGBColor[0.02, 0.7, 1.], 
    composerLocations[IRI["http://www.wikidata.org/entity/Q9778"]]}      (* electronica *)
   } /. Point[p_] :> GeoDisk[p, Quantity[350, "Kilometers"]]]

Getting the Most Out of SPARQL and RDF

This was a quick overview of RDF and its query language SPARQL, and how the Wolfram Language will help you to work with RDF data and construct SPARQL queries.

We have seen two prominent examples of websites that provide RDF data, Wikidata and MusicBrainz. But there are many more—for example, websites that embed RDF in their page sources so search engines can better understand their contents, or SPARQL endpoints that allow complicated queries to be answered. Wikidata knows about some of them; to close, let’s ask it about other SPARQL endpoints:


SPARQLExecute["https://query.wikidata.org/sparql", "
   select * where {
     [] wdt:P5305 ?endpoint .
   }
   "] // Query[All, "endpoint"] // Short

This is the first release of semantic web technology in the Wolfram Language. We are curious about your feedback: How useful is symbolic SPARQL to you? What features are you missing? What are you doing with it? And even though we have not yet released the first version, we are already working on future features—for instance, support for the Web Ontology Language (OWL), simplified access to data associated with IRIs and a deeper integration with the Entity framework. Feel free to comment on this blog or reach out to us with your questions and feedback.
