Hands-on Session 01

16/04/2012, Lyon, France

For this hands-on session, we have a number of objectives:

Existing Queries

We had two scenarios, which we addressed with SPARQL queries:

Albums from July 2007

1.) To get an overview of the dataset, we used the following query to get a list of all classes:

SELECT DISTINCT ?class
WHERE {
	?s a ?class .
}

The result should look something like this:

--------------------------------------------------------
| class                                                |
========================================================
| <http://purl.org/ontology/mo/Playlist>               |
| <http://purl.org/ontology/mo/Signal>                 |
| <http://purl.org/ontology/mo/Lyrics>                 |
| <http://purl.org/ontology/mo/Track>                  |
| <http://www.w3.org/2006/time#Interval>               |
| <http://purl.org/ontology/mo/Torrent>                |
| <http://purl.org/ontology/mo/ED2K>                   |
| <http://purl.org/ontology/mo/Record>                 |
| <http://www.holygoat.co.uk/owl/redwood/0.1/tags/Tag> |
| <http://purl.org/ontology/mo/MusicArtist>            |
| <http://xmlns.com/foaf/0.1/Document>                 |
--------------------------------------------------------
2.) To further improve our understanding of the data, we can list all properties used for a particular class, in this case mo:MusicArtist:
PREFIX mo: <http://purl.org/ontology/mo/>

SELECT DISTINCT ?pred
WHERE {
	?artist a mo:MusicArtist ;
		?pred ?obj .
}

Result:

-----------------------------------------------------
| pred                                              |
=====================================================
| <http://xmlns.com/foaf/0.1/name>                  |
| <http://xmlns.com/foaf/0.1/made>                  |
| <http://xmlns.com/foaf/0.1/img>                   |
| <http://xmlns.com/foaf/0.1/based_near>            |
| <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> |
| <http://xmlns.com/foaf/0.1/homepage>              |
| mo:biography                                      |
-----------------------------------------------------

3.) As an improvement of the previous query, we can also list the kinds of resources these properties link to. This introduces the OPTIONAL keyword, since not all objects will have a class assigned:

PREFIX mo: <http://purl.org/ontology/mo/>

SELECT DISTINCT ?pred ?class
WHERE {
	?artist a mo:MusicArtist ;
		?pred ?obj .
	OPTIONAL {
		?obj a ?class .
	}
}

Result:

-----------------------------------------------------------------
| pred                                              | class     |
=================================================================
| <http://xmlns.com/foaf/0.1/name>                  |           |
| <http://xmlns.com/foaf/0.1/made>                  | mo:Record |
| <http://xmlns.com/foaf/0.1/img>                   |           |
| <http://xmlns.com/foaf/0.1/based_near>            |           |
| <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> |           |
| <http://xmlns.com/foaf/0.1/homepage>              |           |
| mo:biography                                      |           |
-----------------------------------------------------------------

4.) We can do the same for mo:Records. We can also use ORDER BY to order the results:

PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX hg: <http://www.holygoat.co.uk/owl/redwood/0.1/tags/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dce: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?pred ?class
WHERE {
	?record a mo:Record ;
		?pred ?obj .
	OPTIONAL {
		?obj a ?class .
	}
}
ORDER BY ASC(?pred) DESC(?class)

Result:

----------------------------------------------------------------------
| pred                                              | class          |
======================================================================
| dce:date                                          |                |
| dce:description                                   |                |
| dce:title                                         |                |
| mo:available_as                                   | mo:ED2K        |
| mo:available_as                                   | mo:Playlist    |
| mo:available_as                                   | mo:Torrent     |
| mo:image                                          |                |
| mo:track                                          | mo:Track       |
| hg:taggedWithTag                                  | hg:Tag         |
| <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> |                |
| foaf:maker                                        | mo:MusicArtist |
----------------------------------------------------------------------

5.) To work with dates, SPARQL expects them to be in the correct format (xsd:dateTime). Since the date format in our source data is slightly different, we need to change it from something like "2007-06-15 17:13:58" to "2007-06-15T17:13:58"^^xsd:dateTime. We are using CONSTRUCT, BIND and some functions to achieve this:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dce: <http://purl.org/dc/elements/1.1/>

CONSTRUCT {
	?s2 dce:date ?date_fixed .
}
WHERE {
	?s2 dce:date ?date .
	BIND (STRDT(REPLACE(?date, " ", "T"), xsd:dateTime) AS ?date_fixed)
}

Since we are using CONSTRUCT, the output is an RDF file which looks something like this:

@prefix dce:     <http://purl.org/dc/elements/1.1/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

<http://dbtune.org/jamendo/record/6015>
      dce:date      "2007-06-15T17:13:58"^^xsd:dateTime .

<http://dbtune.org/jamendo/record/6343>
      dce:date      "2007-07-05T19:32:23"^^xsd:dateTime .

<http://dbtune.org/jamendo/record/12863>
      dce:date      "2007-11-19T11:46:34"^^xsd:dateTime .

<http://dbtune.org/jamendo/record/3993>
      dce:date      "2007-02-01T18:08:28"^^xsd:dateTime .

6.) How could we get a subset of the source data without any dates? We could use this to complement the output of the previous query, and create a complete, fixed dataset. We use CONSTRUCT again, but apply a FILTER to remove all dates:

PREFIX dce: <http://purl.org/dc/elements/1.1/>

CONSTRUCT {
	?s ?p ?o .
}
WHERE {
	?s ?p ?o .
	FILTER (?p != dce:date)
}

7.) Now we are ready for the final query, giving us all albums with a release data of July 2007. We are again using a FILTER, as well as some special functions related to dates (YEAR and MONTH).

PREFIX mo:  <http://purl.org/ontology/mo/>
PREFIX dce: <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?album ?title ?release_date
WHERE {
	?album a mo:Record ;
		dce:date ?release_date ;
		dce:title ?title .
	FILTER (YEAR(?release_date) = 2007 && MONTH(?release_date) = 7)
}
ORDER BY ?release_date

Result:

-----------------------------------------------------------------------------------------------------------
| album                                  | title                     | release_date                       |
===========================================================================================================
| <http://dbtune.org/jamendo/record/6310>| "Demo 2005"^^xsd:string   | "2007-07-04T16:24:23"^^xsd:dateTime|
| <http://dbtune.org/jamendo/record/6320>| "Cinema"^^xsd:string      | "2007-07-04T16:39:11"^^xsd:dateTime|
| <http://dbtune.org/jamendo/record/6312>| "ECLECTEK II"^^xsd:string | "2007-07-04T17:02:09"^^xsd:dateTime|
| <http://dbtune.org/jamendo/record/6302>| "Unknow user"^^xsd:string | "2007-07-04T18:06:18"^^xsd:dateTime|

Artists in France

1.) The following query gives us the artist’s locations, as specified in the Jamendo dataset:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo:   <http://purl.org/ontology/mo/>

SELECT DISTINCT ?location
WHERE {
	?artist a mo:MusicArtist ;
		foaf:based_near ?location .
}

Result:

--------------------------------------
| location                           |
======================================
| <http://sws.geonames.org/2661602/> |
| <http://sws.geonames.org/2997861/> |
| <http://sws.geonames.org/3013767/> |
| <http://sws.geonames.org/3013726/> |
| <http://sws.geonames.org/2802361/> |
| <http://sws.geonames.org/3026644/> |
| <http://sws.geonames.org/3017382/> |
| <http://sws.geonames.org/3865483/> |
| <http://sws.geonames.org/2968815/> |

2.) Problem: we don’t know what these URIs refer to - we need to involve a second dataset (the Geonames dataset). We can use the SERVICE keyword to query a remote SPARQL endpoint. For Geonames, there are several third-party endpoints available, e.g.:

We start exploring the dataset like we did before, only this time it is external data. The query gives us properties used for places in Geonames:

SELECT DISTINCT ?pred
WHERE {
	SERVICE <http://factforge.net/sparql>
	{
		<http://sws.geonames.org/2661602/> ?pred ?obj .
	}
}

Result:

-------------------------------------------------------------------------------------------------------------------------------
| pred                                                                                                                        |
===============================================================================================================================
| <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>                                                                           |
| <http://www.w3.org/2004/02/skos/core#subject>                                                                               |
| <http://purl.org/dc/elements/1.1/subject>                                                                                   |
| <http://www.geonames.org/ontology#alternateName>                                                                            |
| <http://www.geonames.org/ontology#name>                                                                                     |
| <http://www.geonames.org/ontology#childrenFeatures>                                                                         |
| <http://www.geonames.org/ontology#featureClass>                                                                             |
| <http://www.geonames.org/ontology#featureCode>                                                                              |
| <http://www.geonames.org/ontology#inCountry>                                                                                |
| <http://www.geonames.org/ontology#locationMap>                                                                              |
| <http://www.geonames.org/ontology#parentFeature>                                                                            |
| <http://sw.opencyc.org/concept/Mx8Ngh4rY_nOYBt-QdiS5seAT9e7DQ-raHR0cDovL3d3dy53My5vcmcvMjAwMy8wMS9nZW8vd2dzODRfcG9zI2xhdA>  |
| <http://www.w3.org/2003/01/geo/wgs84_pos#lat>                                                                               |
| <http://sw.opencyc.org/concept/Mx8Ngh4rY_nOYBt-QdiS5seAT9e7DQ-saHR0cDovL3d3dy53My5vcmcvMjAwMy8wMS9nZW8vd2dzODRfcG9zI2xvbmc> |
| <http://www.w3.org/2003/01/geo/wgs84_pos#long>                                                                              |
| <http://www.ontotext.com/preferredLabel>                                                                                    |
| <http://www.ontotext.com/textSnippet>                                                                                       |
| <http://factforge.net/preferredLabel>                                                                                       |
| <http://factforge.net/textSnippet>                                                                                          |
-------------------------------------------------------------------------------------------------------------------------------

3.) Now we combine local and remote data to get a list of all artists (from Jamendo) with the name their location (from Geonames). Note we need to LIMIT the result, otherwise the query might time out. That can be a typical problem with federated queries.

PREFIX foaf: 		<http://xmlns.com/foaf/0.1/>
PREFIX geonames: 	<http://www.geonames.org/ontology#>
PREFIX mo:   		<http://purl.org/ontology/mo/>

SELECT DISTINCT ?artist ?location_name
WHERE {
	?artist a mo:MusicArtist ;
		foaf:based_near ?location .
		
	SERVICE <http://factforge.net/sparql>
	{
		?location geonames:name ?location_name .
	}
		
}
LIMIT 50

Result:

-------------------------------------------------------------------------------------
| artist                                    | location_name                         |
=====================================================================================
| <http://dbtune.org/jamendo/artist/5496>   | "Kanton Basel-Stadt"                  |
| <http://dbtune.org/jamendo/artist/256>    | "Département de la Loire-Atlantique"  |
| <http://dbtune.org/jamendo/artist/337049> | "Département de la Haute-Garonne"     |
| <http://dbtune.org/jamendo/artist/8035>   | "Département des Hautes-Pyrénées"     |
| <http://dbtune.org/jamendo/artist/6260>   | "Kingdom of Belgium"                  |
| <http://dbtune.org/jamendo/artist/2471>   | "Département de la Charente-Maritime" |
| <http://dbtune.org/jamendo/artist/3572>   | "Republic of France"                  |

4.) We are interested in France - but which resource URI identifies France?

PREFIX geonames: <http://www.geonames.org/ontology#>

SELECT DISTINCT ?country ?name
WHERE {
	SERVICE <http://factforge.net/sparql>
	{
		SELECT DISTINCT ?country ?name
		WHERE {
			?country geonames:featureCode geonames:A.PCLI ;
				geonames:name ?name .
			FILTER (regex(?name, "france", "i"))
		}
	}
}
ORDER BY ?name

Result:

---------------------------------------------------------------------
| country                                    | name                 |
=====================================================================
| <http://dbpedia.org/resource/Belle_France> | "France"             |
| <http://sws.geonames.org/3017382/>         | "France"             |
| <http://dbpedia.org/resource/Belle_France> | "Republic of France" |
| <http://sws.geonames.org/3017382/>         | "Republic of France" |
---------------------------------------------------------------------

5.) Now we can use this URI (http://sws.geonames.org/3017382/) to query our Jamendo dataset for artists in France:

PREFIX foaf: 		<http://xmlns.com/foaf/0.1/>
PREFIX geonames: 	<http://www.geonames.org/ontology#>
PREFIX mo:   		<http://purl.org/ontology/mo/>

SELECT DISTINCT ?artist ?name
WHERE {
	?artist a mo:MusicArtist ;
		foaf:based_near <http://sws.geonames.org/3017382/> ;
		foaf:name ?name .
		
}
ORDER BY ?name

Result:

----------------------------------------------------------------------------------------------------------
| artist                                    | name                                                       |
==========================================================================================================
| <http://dbtune.org/jamendo/artist/3972>   | "#2 Orchestra"^^<http://www.w3.org/2001/XMLSchema#string>  |
| <http://dbtune.org/jamendo/artist/5865>   | "(own+line)"^^<http://www.w3.org/2001/XMLSchema#string>    |
| <http://dbtune.org/jamendo/artist/4779>   | "7BZH"^^<http://www.w3.org/2001/XMLSchema#string>          |
| <http://dbtune.org/jamendo/artist/3297>   | "AMarie.C"^^<http://www.w3.org/2001/XMLSchema#string>      |
| <http://dbtune.org/jamendo/artist/754>    | "Adam Lwo"^^<http://www.w3.org/2001/XMLSchema#string>      |
| <http://dbtune.org/jamendo/artist/4825>   | "Alteranti"^^<http://www.w3.org/2001/XMLSchema#string>     |

6.) The previous query was great, but only gives us half of the answer. We only get those artists which are explicitly located in France. But what about those which are defined as being located in Paris or Lyon? To answer this, we need to ask Geonames about places which have a geonames:parentCountry of France (i.e., places which are located in France).

Note: this does not work on the Factforge endpoint. We need to use the Kasabi endpoint, which, however, requires us to use a valid API key. That is only possible in the latest version of arq, as built from source. :( The API key shown here is for the latc-tutorial user. You should ideally create your own account if you want to use Kasabi.

PREFIX foaf: 		<http://xmlns.com/foaf/0.1/>
PREFIX geonames: 	<http://www.geonames.org/ontology#>
PREFIX mo:   		<http://purl.org/ontology/mo/>

SELECT DISTINCT ?artist ?name ?place ?place_name
WHERE {
	?artist a mo:MusicArtist ;
		foaf:based_near ?place ;
		foaf:name ?name .
	
	SERVICE <http://api.kasabi.com/dataset/geonames/apis/sparql?apikey=07d5e23c8d7bf5ae576f39c64e822ea6baffea98>
	{
		?place geonames:parentCountry <http://sws.geonames.org/3017382/> ;
			geonames:name ?place_name .
	}
		
}
ORDER BY ?name
LIMIT 10

Result:

This works in theory, but it’s so slow that it’s actually impractical.

7.) To make our approach workable, we extract the relevant portion of the Geonames data and store it locally:

PREFIX geonames: 	<http://www.geonames.org/ontology#>

CONSTRUCT {
	?place geonames:parentCountry <http://sws.geonames.org/3017382/> ;
		geonames:name ?name .
}
WHERE {
	SERVICE <http://api.kasabi.com/dataset/geonames/apis/sparql?apikey=07d5e23c8d7bf5ae576f39c64e822ea6baffea98>
	{
		?place geonames:parentCountry <http://sws.geonames.org/3017382/> ;
			geonames:name ?name .
	}		
}

Result:

@prefix geonames:  <http://www.geonames.org/ontology#> .

<http://sws.geonames.org/3003977/>
      geonames:name  "Le Grand Merimont" ;
      geonames:parentCountry
                    <http://sws.geonames.org/3017382/> .

<http://sws.geonames.org/2994812/>
      geonames:name  "Mean-Ruz" ;
      geonames:parentCountry
                    <http://sws.geonames.org/3017382/> .

<http://sws.geonames.org/7535264/>
      geonames:name  "Golf du Prieuré" ;
      geonames:parentCountry
                    <http://sws.geonames.org/3017382/> .

<http://sws.geonames.org/3030256/>
      geonames:name  "Bois de Bretisel" ;
      geonames:parentCountry
                    <http://sws.geonames.org/3017382/> .

8.) Now we have all the pieces of the puzzles. We can query the our existing Jamendo data together with the Geonames slice created in the previous query. To load more than one datasets, we use the FROM keyword, pointing to the URI of the dataset file (local or remote).

PREFIX foaf: 		<http://xmlns.com/foaf/0.1/>
PREFIX geonames: 	<http://www.geonames.org/ontology#>
PREFIX mo:   		<http://purl.org/ontology/mo/>

SELECT DISTINCT ?artist_name ?place_name
FROM <../data/jamendo-rdf/jamendo.nt>
FROM <../data/in_france.nt>
WHERE {
 	?artist a mo:MusicArtist ;
 		foaf:based_near ?place ;
 		foaf:name ?artist_name .
	?place geonames:parentCountry <http://sws.geonames.org/3017382/> ;
		geonames:name ?place_name .
}
ORDER BY ?artist_name

Result:

-------------------------------------------------------------------------------------------------------------------------------------
| artist_name                                                                           | place_name                                |
=====================================================================================================================================
| "#2 Orchestra"^^<http://www.w3.org/2001/XMLSchema#string>                             | "Republic of France"                      |
| "#Blockout"^^<http://www.w3.org/2001/XMLSchema#string>                                | "Département des Yvelines"                |
| "#Dance 75#"^^<http://www.w3.org/2001/XMLSchema#string>                               | "Paris"                                   |
| "#NarNaoud#"^^<http://www.w3.org/2001/XMLSchema#string>                               | "Département de la Gironde"               |
| "#ZedMeta#"^^<http://www.w3.org/2001/XMLSchema#string>                                | "Paris"                                   |
| "#Zorglups#"^^<http://www.w3.org/2001/XMLSchema#string>                               | "Département de l'Essonne"                |
| "&ND"^^<http://www.w3.org/2001/XMLSchema#string>                                      | "Paris"                                   |
| "(own+line)"^^<http://www.w3.org/2001/XMLSchema#string>                               | "Republic of France"                      |
| "* Q u i r y *"^^<http://www.w3.org/2001/XMLSchema#string>                            | "Paris"                                   |
| "-;~°§)[ k.ROCKSHIRE ](§°~;-"^^<http://www.w3.org/2001/XMLSchema#string>              | "Département des Bouches-du-Rhône"        |

Create your own Queries

Now you can try to create variations of the queries above to answer some slightly different questions:

Solutions are here.