Example

The following example demonstrates the publication and consumption process for the Community Research and Development Information Service (CORDIS).

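The example makes use of the following tools:

  • Neologism
  • D2R Server and the D2RQ Mapping Language
  • CKAN
  • Silk Link Discovery Framework
  • RelFinder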

Prerequisites:

  • The CORDIS data set is originally available in a relational database.

Outline of the step-by-step publication process:

  1. Model the ontology for CORDIS using Neologism.
  2. Publish the data set as Linked Data using D2R Server. Map the database schema to RDF using the D2RQ Mapping Language.
  3. Find Linked Data sets that thematically overlap with CORDIS and that can be linked to, e.g. on CKAN.
  4. Use the Silk Link Discovery Framework to interlink the data sets and publish the links along with the data set.

Each publication step will be described in detail. The Linked Data version of CORDIS can be consumed in different ways. We will show how to set up RelFinder to explore relations between entities in the CORDIS Linked Data set.

Prerequisites

The CORDIS data set is originally available in a relational database with the following database schema:

Neologism: Modeling the CORDIS ontology

Set up Neologism as described here.

Log in to your Neologism instance.

Create a new vocabulary for CORDIS (Create content > Vocabulary).

Provide at least the vocabulary ID, title, namespace URI, authors and description of the vocabulary.

For CORDIS provide e.g. the following parameters:

  • Vocabulary ID: cordis
  • Title: Community Research and Development Information Service (CORDIS)
  • Description: The CORDIS ontology represents the Community Research and Development Information Service (CORDIS) data set published by the European Union. The data set contains information on EU programmes and projects.
  • Namespace URI: http://www4.wiwiss.fu-berlin.de/cordis/resource/cordis/

Then save the vocabulary.

Create classes for each concept in the data set (Create new class button in vocabulary view).

For CORDIS these are e.g.:

  • Project
  • Programme
  • Organization
  • Person

When adding a class, provide a URI and a label. You can also add a description of the class and choose its super classes.

For the class Project provide the following parameters:

  • URI: Project (the namespace is already given by the vocabulary namespace)
  • Label: project
  • Comment: A project is a collaborative enterprise involving several project partners.

Then add properties to your classes (Create new property button in vocabulary view). These are class relations as well as class attributes.

An example of a property is the relation between projects and programmes. To model it, define the programme property with the domain cordis:Project and the range cordis:Programme.

Make sure to reuse vocabularies whenever possible. For links to websites containing more information on a concept, use foaf:page instead of defining your own website property.

Once the concepts of the database (or data set) and their properties are modelled, you can see them in the overview diagram:

You can then save the modelled vocabulary as N3 or RDF/XML (N3 or RDF/XML icons in vocabulary view).

Part of the CORDIS vocabulary in N3:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix vann: <http://purl.org/vocab/vann/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix cordis: <http://www4.wiwiss.fu-berlin.de/cordis/resource/cordis/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://vocab.deri.ie/cordis> a owl:Ontology;
    dc:title "Community Research and Development Information Service (CORDIS)";
    vann:preferredNamespaceUri "http://www4.wiwiss.fu-berlin.de/cordis/resource/cordis/";
    vann:preferredNamespacePrefix "cordis";
    dc:creator <http://vocab.deri.ie/cordis#anjjen> .

<http://vocab.deri.ie/cordis#anjjen> a foaf:Person;
    foaf:name "Anja Jentzsch";
    foaf:mbox <mailto:mail@anjajentzsch.de> .

<http://vocab.deri.ie/cordis#FU%20Berlin> a foaf:Organization;
    foaf:member <http://vocab.deri.ie/cordis#anjjen>;
    foaf:name "FU Berlin";
    foaf:homepage <http://www.fu-berlin.de/> .

cordis:Project a rdfs:Class, owl:Class;
    rdfs:isDefinedBy <http://vocab.deri.ie/cordis>;
    rdfs:label "Project" .

cordis:Programme a rdfs:Class, owl:Class;
    rdfs:isDefinedBy <http://vocab.deri.ie/cordis>;
    rdfs:label "Programme" .

cordis:programme a rdf:Property;
    rdfs:isDefinedBy <http://vocab.deri.ie/cordis>;
    rdfs:label "programme";
    rdfs:domain cordis:Project;
    rdfs:range cordis:Programme .

...

D2R Server: Publishing the CORDIS data set as Linked Data

Set up D2R Server as described here until you reach step 3.

Write a D2RQ mapping file to map your database to the previously defined ontology.

The following excerpt from the CORDIS D2RQ map defines prefixes, the server and database properties as well as the mappings for the cordis:Project and cordis:Programme classes and the ontology property cordis:programme which connects a project with a programme.

@prefix map: <file:/C:/Cordis/cordis.n3#> .
@prefix cordis: <http://www4.wiwiss.fu-berlin.de/cordis/resource/cordis/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix d2r: <http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/config.rdf#> .
@prefix vocabClass: <http://www4.wiwiss.fu-berlin.de/cordis/vocab/resource/class/> .
@prefix vocabProperty: <http://www4.wiwiss.fu-berlin.de/cordis/vocab/resource/property/> .

map:Server a d2r:Server;
    rdfs:label "D2R Server publishing the CORDIS data set";
    d2r:baseURI <http://www4.wiwiss.fu-berlin.de/cordis/>;
    d2r:port 2038;
    .

map:database a d2rq:Database;
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:jdbcDSN "jdbc:mysql://127.0.0.1/cordis?autoReconnect=true";
    d2rq:username "d2r";
    .

map:Projects a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:class cordis:Project;
    d2rq:uriPattern "Project/@@projects.projectsref@@";
    d2rq:classDefinitionLabel "project"@en;
    .

map:title a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:Projects;
    d2rq:column "projects.title";
    d2rq:property rdfs:label;
    .

...

map:project2programme a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:Projects;
    d2rq:property cordis:programme;
    d2rq:propertyDefinitionLabel "Programme Acronym"@en;
    d2rq:uriPattern "Programme/@@projects.programmeacronym|urlify@@";
    d2rq:join "programmes.acronym = projects.programmeacronym"
    .

map:Programmes a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:class cordis:Programme;
    d2rq:uriPattern "Programme/@@programmes.acronym|urlify@@";
    d2rq:classDefinitionLabel "Programme"@en;
    .

...
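
To make the d2rq:uriPattern entries in the excerpt above more concrete, the following Python sketch shows how such a pattern could be expanded into a resource URI for a single database row. It is only an illustration, not D2RQ code: the helper names are hypothetical, the urlify behaviour (spaces become underscores, everything else is percent-encoded) is an approximation, and the "resource/" path segment is inferred from the CORDIS URIs that appear later in this example.

```python
import re
from urllib.parse import quote

# d2r:baseURI from the mapping, plus the "resource/" segment seen in published CORDIS URIs
RESOURCE_BASE = "http://www4.wiwiss.fu-berlin.de/cordis/resource/"


def urlify(value):
    """Assumed approximation of D2RQ's urlify: underscores for spaces, then URL encoding."""
    return quote(str(value).replace(" ", "_"), safe="")


def expand_uri_pattern(pattern, row, base=RESOURCE_BASE):
    """Replace each @@table.column@@ or @@table.column|urlify@@ placeholder with the row value."""
    def substitute(match):
        column, function = match.group(1), match.group(2)
        value = str(row[column])
        return urlify(value) if function == "urlify" else value

    return base + re.sub(r"@@([^@|]+)(?:\|(\w+))?@@", substitute, pattern)


# Project row with the reference number 71242
print(expand_uri_pattern("Project/@@projects.projectsref@@",
                         {"projects.projectsref": 71242}))

# "FP7 ICT" is a made-up acronym value, used only to show the handling of spaces
print(expand_uri_pattern("Programme/@@programmes.acronym|urlify@@",
                         {"programmes.acronym": "FP7 ICT"}))
```

Because map:project2programme and map:Programmes use the same pattern, the object of cordis:programme and the URI of the programme resource coincide for a given acronym.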

Save the D2RQ file as cordis.n3 to your D2R Server directory.

Then test the mapping by starting the server from inside this directory:

d2r-server.bat cordis.n3

You should see the server running at http://localhost:2038 now.
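
Besides opening the start page in a browser, you can check the mapping by sending a SPARQL query to the server. The following Python sketch assumes that the local server exposes its endpoint at /sparql (as the public CORDIS server does) and that it returns JSON results when asked via the Accept header. Both points may depend on your D2R Server version, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: local counterpart of the public .../cordis/sparql endpoint
ENDPOINT = "http://localhost:2038/sparql"

# Lists a few projects together with their title and programme
QUERY = """
PREFIX cordis: <http://www4.wiwiss.fu-berlin.de/cordis/resource/cordis/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?project ?title ?programme WHERE {
  ?project a cordis:Project ;
           rdfs:label ?title ;
           cordis:programme ?programme .
} LIMIT 5
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the result bindings (the server must be running)."""
    with urlopen(build_request(query, endpoint), timeout=30) as response:
        return json.load(response)["results"]["bindings"]
```

With the server running, run_query(QUERY) should return one binding per project. An empty list suggests that the class map or the property bridges do not match any rows.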

If the mapping is complete, you can set up D2R Server as a service. On Windows use the install-service script:

install-service cordis cordis.n3

Find the example D2R Server publishing the CORDIS data set as Linked Data online.

CKAN: Find thematically overlapping Linked Data sets

Even if you already know Linked Data sets that CORDIS can be interlinked with, you should still explore the data sets registered on CKAN.

As an example we choose DBpedia as a link target. Since DBpedia covers a lot of domains, we can interlink many entities in CORDIS and DBpedia. Suitable entity types for interlinking are e.g.: EU projects, EU programmes, organizations, countries, persons.
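
The search for candidate data sets can also be scripted. The following Python sketch assumes a CKAN instance that offers the Action API (version 3) with its package_search action. The base URL and the search terms are placeholders, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, query, rows=10):
    """Build a package_search URL for the given CKAN instance."""
    params = urlencode({"q": query, "rows": rows})
    return base_url.rstrip("/") + "/api/3/action/package_search?" + params


def dataset_titles(response):
    """Extract (name, title) pairs from a decoded package_search response."""
    return [(d["name"], d.get("title", "")) for d in response["result"]["results"]]


def search(base_url, query, rows=10):
    """Run the search against a live CKAN instance."""
    with urlopen(search_url(base_url, query, rows), timeout=30) as response:
        return dataset_titles(json.load(response))


# Placeholder host; replace it with the CKAN instance you want to search
print(search_url("https://ckan.example.org", "research projects", rows=5))
```

The returned names and titles only give a first impression. Whether a data set is a suitable link target still has to be checked by looking at its entity types.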

Silk Link Discovery Framework: Interlinking the CORDIS Linked Data set

Set up Silk Single Machine as described here.

As an example we write a Silk link specification file for EU projects in CORDIS and DBpedia. In both data sets we restrict the entities to EU projects with the <RestrictTo> element. We then require the label or acronym to match as well as either the website or the EU project reference number. The <Filter> element sets the minimum similarity that two data items need in order to generate a link between them to 95%. Only links between items with a similarity of 98% or higher are written to the resulting link set, while links below 98% are written to a separate file for verification.

The following listing is the resulting Silk link specification:

<?xml version="1.0" encoding="utf-8" ?>
<Silk>
  <Prefixes>	
...
  </Prefixes>

  <DataSources>
    <DataSource id="dbpedia" type="sparqlEndpoint">
      <Param name="endpointURI" value="http://dbpedia.org/sparql" />
      <Param name="graph" value="http://dbpedia.org" />
    </DataSource>

    <DataSource id="cordis" type="sparqlEndpoint">
      <Param name="endpointURI" value="http://www4.wiwiss.fu-berlin.de/cordis/sparql" />
    </DataSource>
  </DataSources>

  <Interlinks>
    <Interlink id="projects">
      <LinkType>owl:sameAs</LinkType>

      <SourceDataset dataSource="dbpedia" var="a">
        <RestrictTo>
          ?a dbpedia-prop:wikiPageUsesTemplate &lt;http://dbpedia.org/resource/Template:Research-Project&gt;
        </RestrictTo>
      </SourceDataset>

      <TargetDataset dataSource="cordis" var="b">
        <RestrictTo>
          ?b rdf:type cordis:Project
        </RestrictTo>
      </TargetDataset>

      <LinkCondition>
        <Aggregate type="average">
          <Aggregate type="max">
            <Compare metric="levenshtein">
              <TransformInput function="lowerCase">
                <Input path="?a/rdfs:label" />
              </TransformInput>
              <TransformInput function="lowerCase">
                <Input path="?b/rdfs:label" />
              </TransformInput>
            </Compare>
            <Compare metric="levenshtein">
              <TransformInput function="lowerCase">
                <TransformInput function="replace">
                  <TransformInput function="stripUriPrefix">
                    <Input path="?a\dbpedia-prop:redirect" />
                  </TransformInput>
                  <Param name="search" value="_" />
                  <Param name="replace" value=" " />
                </TransformInput>
              </TransformInput>
              <TransformInput function="lowerCase">
                <Input path="?b/rdfs:label" />
              </TransformInput>
            </Compare>
            <Compare metric="levenshtein">
              <TransformInput function="lowerCase">
                <Input path="?a/dbpedia-prop:title" />
              </TransformInput>
              <TransformInput function="lowerCase">
                <Input path="?b/rdfs:label" />
              </TransformInput>
            </Compare>
            <Compare metric="levenshtein">
              <TransformInput function="lowerCase">
                <Input path="?a/rdfs:label" />
              </TransformInput>
              <TransformInput function="lowerCase">
                <Input path="?b/cordis:acronym" />
              </TransformInput>
            </Compare>
            <Compare metric="levenshtein">
              <TransformInput function="lowerCase">
                <TransformInput function="replace">
                  <TransformInput function="stripUriPrefix">
                    <Input path="?a\dbpedia-prop:redirect" />
                  </TransformInput>
                  <Param name="search" value="_" />
                  <Param name="replace" value=" " />
                </TransformInput>
              </TransformInput>
              <TransformInput function="lowerCase">
                <Input path="?b/cordis:acronym" />
              </TransformInput>
            </Compare>
            <Compare metric="levenshtein">
              <TransformInput function="lowerCase">
                <Input path="?a/dbpedia-prop:title" />
              </TransformInput>
              <TransformInput function="lowerCase">
                <Input path="?b/cordis:acronym" />
              </TransformInput>
            </Compare>
          </Aggregate>
          <Aggregate type="max">
            <Compare metric="equality">
              <TransformInput function="stripPostfix">
                <Input path="?a/dbpedia-prop:website" />
                <Param name="postfix" value="/" />
              </TransformInput>
              <TransformInput function="stripPostfix">
                <Input path="?b/foaf:page" />
                <Param name="postfix" value="/" />
              </TransformInput>
            </Compare>
            <Compare metric="equality">
              <TransformInput function="regexReplace">
                <Input path="?a/dbpedia-prop:projectreference" />
                <Param name="regex" value="^([^-])*-*0*([0-9]*)\s*$" />
                <Param name="replace" value="$2" />
              </TransformInput>
              <TransformInput function="regexReplace">
                <Input path="?b/cordis:reference" />
                <Param name="regex" value="^0*" />
                <Param name="replace" value="" />
              </TransformInput>
            </Compare>
            <Compare metric="equality">
              <Input path="?a/dbpedia-prop:projectreference" />
              <Input path="?b/cordis:reference" />
            </Compare>
          </Aggregate>
        </Aggregate>
      </LinkCondition>

      <Filter threshold="0.95" />

      <Outputs>
        <Output maxConfidence="0.98" type="file" >
          <Param name="file" value="cordis_dbpedia_projects_verify_links.xml"/>
          <Param name="format" value="alignment"/>
        </Output>
        <Output minConfidence="0.98" type="file">
          <Param name="file" value="cordis_dbpedia_projects_links.xml"/>
          <Param name="format" value="ntriples"/>
        </Output>
      </Outputs>
    </Interlink>
  </Interlinks>
</Silk>
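
The logic of the link condition above can be sketched in Python as follows. This is an illustration of how the aggregation reads, not Silk's implementation. The scaling of the Levenshtein distance to a similarity between 0 and 1 and the handling of a value of exactly 0.98 are assumptions, and all function names are hypothetical.

```python
def levenshtein(s, t):
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            substitution = previous[j - 1] + (cs != ct)
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """Compare lower-cased strings; 1.0 means identical (the scaling is an assumption)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def link_confidence(name_scores, identifier_scores):
    """Outer 'average' aggregate over the two inner 'max' aggregates."""
    return (max(name_scores) + max(identifier_scores)) / 2


def route(confidence, threshold=0.95, accept=0.98):
    """Mirror the Filter threshold and the two Output elements."""
    if confidence < threshold:
        return "discard"
    return "links" if confidence >= accept else "verify"


# Names agree after lower-casing, and one identifier comparison (website or reference) matches
score = link_confidence([name_similarity("SOA4All", "soa4all")], [0.0, 1.0])
print(score, route(score))
```

Because the outer aggregate is an average, a perfect name match alone only reaches 0.5 and is discarded. A link is generated only when one of the identifier comparisons matches as well.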

Save the link specification as silk_cordis_dbpedia.xml. Then go to the directory where you put the Silk Single Machine jar file and run:

java -DconfigFile=silk_cordis_dbpedia.xml -jar silk.jar

The resulting 17 links are written into the cordis_dbpedia_projects_links.xml file:

<http://dbpedia.org/resource/DAIDALOS>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/71242> .
<http://dbpedia.org/resource/SeCSE>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/72081> .
<http://dbpedia.org/resource/IMAQUANIM>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/75943> .
<http://dbpedia.org/resource/SALERO>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/79378> .
<http://dbpedia.org/resource/Stasis_%28EU_project%29>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/79477> .
<http://dbpedia.org/resource/BEinGRID>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/79512> .
<http://dbpedia.org/resource/SUPER>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/79373> .
<http://dbpedia.org/resource/AssessGrid>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/79340> .
<http://dbpedia.org/resource/EMANICS>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/80625> .
<http://dbpedia.org/resource/DAIDALOS>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/80687> .
<http://dbpedia.org/resource/UNIC_%E2%80%93_UNIversal_satellite_home_Connection>
  <http://www.w3.org/2002/07/owl#sameAs>  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/80627> .
<http://dbpedia.org/resource/RESERVOIR>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/85304> .
<http://dbpedia.org/resource/TREAT-NMD>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/84926> .
<http://dbpedia.org/resource/SOA4All>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/85536> .
<http://dbpedia.org/resource/SecureChange>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/89030> .
<http://dbpedia.org/resource/ONTORULE>  <http://www.w3.org/2002/07/owl#sameAs> 
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/89260> .
<http://dbpedia.org/resource/PRoVisG>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/89375> .

Import the links into a database table (e.g. cordis_dbpedia_projects) by separating source and target into different columns, and add the corresponding property mapping to your D2RQ map:

map:ProjectDBpediaLink a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:Projects;
    d2rq:property owl:sameAs;
    d2rq:propertyDefinitionLabel "DBpedia link"@en;
    d2rq:join "projects.projectsref = cordis_dbpedia_projects.project_id";
    d2rq:uriColumn "cordis_dbpedia_projects.dbpedia_url";
    .

Your D2R Server for CORDIS now also serves the links for EU projects in DBpedia as Linked Data.
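
The import of the generated links into a database table can be sketched in Python as follows. The table and column names are taken from the property bridge above. Everything else is an assumption for illustration: the sketch uses an in-memory SQLite database as a stand-in for the MySQL database, stores the project reference as text, and uses hypothetical function names.

```python
import re
import sqlite3

# One statement: <subject> <predicate> <object> . (may be wrapped across lines)
TRIPLE = re.compile(r"<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.")
PROJECT_PREFIX = "http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/"


def parse_links(ntriples_text):
    """Yield (project_id, dbpedia_url) for every owl:sameAs statement."""
    for source, predicate, target in TRIPLE.findall(ntriples_text):
        if predicate.endswith("#sameAs") and target.startswith(PROJECT_PREFIX):
            yield target[len(PROJECT_PREFIX):], source


def import_links(connection, ntriples_text):
    """Create the link table if needed, insert all links, and return their number."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS cordis_dbpedia_projects "
        "(project_id TEXT, dbpedia_url TEXT)")
    rows = list(parse_links(ntriples_text))
    connection.executemany(
        "INSERT INTO cordis_dbpedia_projects (project_id, dbpedia_url) "
        "VALUES (?, ?)", rows)
    connection.commit()
    return len(rows)


# First link from the generated file, used as sample input
sample = """<http://dbpedia.org/resource/DAIDALOS>  <http://www.w3.org/2002/07/owl#sameAs>
  <http://www4.wiwiss.fu-berlin.de/cordis/resource/Project/71242> ."""
connection = sqlite3.connect(":memory:")
print(import_links(connection, sample))
```

For the real import, read the contents of cordis_dbpedia_projects_links.xml instead of the sample string and connect to the database that D2R Server uses.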

RelFinder: Explore the CORDIS Linked Data set

For exploring the CORDIS Linked Data set we use RelFinder. Instead of using the showcase version as described here, we set up a separate RelFinder version especially for CORDIS. The instructions on setting up the RelFinder are online at http://relfinder.dbpedia.org/integrating.html.

Try your RelFinder version by pointing your web browser to the URL where you set up RelFinder. Enter two or more entity names into the RelFinder between form. Then define the maximum path length between these entities and click the Find Relations button.

You can find the RelFinder for CORDIS online at: http://www4.wiwiss.fu-berlin.de/cordis/relfinder/RelFinder.swf.