Richard Cyganiak
Chris Bizer

Pubby can be used to add Linked Data interfaces to SPARQL endpoints.

Much Semantic Web data lives inside triple stores and can be accessed only by sending SPARQL queries to a SPARQL endpoint. It is hard to connect information in these stores with other external data sources.

Linked Data is a style of publishing data on the Semantic Web that makes it easy to interlink, discover and consume that data. It allows a wide variety of existing RDF browsers (e.g. Disco, Tabulator, OpenLink Browser), RDF crawlers (e.g. SWSE, Swoogle), and query agents (e.g. SemWeb Client Library, SWIC) to access the data.

Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server. It is implemented as a Java web application.

News

Features

Screenshot of a Pubby web page displaying information from the DBpedia SPARQL endpoint

How It Works

Many triple stores and other SPARQL endpoints can be accessed only by SPARQL client applications that use the SPARQL protocol. They cannot be accessed by the growing variety of Linked Data clients. Pubby is designed to provide a Linked Data interface to those RDF data sources.

Pubby architecture diagram

In RDF, resources are identified by URIs. The URIs used in most SPARQL datasets are not dereferenceable, meaning they cannot be accessed in a Semantic Web browser: they either return 404 Not Found errors or use non-dereferenceable URI schemes, as in the fictional URI tag:dbpedia.org,2007:Berlin.

When setting up a Pubby server for a SPARQL endpoint, you will configure a mapping that translates those URIs to dereferenceable URIs handled by Pubby. If your server is running at http://myserver.org:8080/pubby/, then the Berlin URI above might be mapped to http://myserver.org:8080/pubby/Berlin.
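The mapping described above amounts to swapping one URI prefix for another. The following is a minimal Python sketch of that idea, not Pubby's actual Java implementation. The function names are hypothetical, and the dataset prefix tag:dbpedia.org,2007: is an assumption derived from the fictional Berlin URI.

```python
def to_web_uri(dataset_uri: str, dataset_base: str, web_base: str) -> str:
    """Translate a dataset URI into the dereferenceable URI served by Pubby."""
    if not dataset_uri.startswith(dataset_base):
        raise ValueError(f"{dataset_uri!r} is outside the dataset prefix")
    return web_base + dataset_uri[len(dataset_base):]


def to_dataset_uri(web_uri: str, dataset_base: str, web_base: str) -> str:
    """Reverse mapping: recover the original URI to ask the endpoint about."""
    if not web_uri.startswith(web_base):
        raise ValueError(f"{web_uri!r} is not served from {web_base!r}")
    return dataset_base + web_uri[len(web_base):]


DATASET_BASE = "tag:dbpedia.org,2007:"        # assumed prefix of the fictional URI
WEB_BASE = "http://myserver.org:8080/pubby/"  # server root used in the text

print(to_web_uri("tag:dbpedia.org,2007:Berlin", DATASET_BASE, WEB_BASE))
# prints http://myserver.org:8080/pubby/Berlin
```

The reverse function matters as much as the forward one: when a request for a mapped URI arrives, the server has to recover the original URI before it can query the SPARQL endpoint.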

Pubby will handle requests to the mapped URIs by connecting to the SPARQL endpoint, asking it for information about the original URI, and passing back the results to the client. It also handles various details of the HTTP interaction, such as the 303 redirect required by Web Architecture, and content negotiation between HTML, RDF/XML and Turtle descriptions of the same resource.
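To make the HTTP side concrete, here is a rough Python sketch of content negotiation followed by a 303 redirect. It is an illustration under assumptions, not Pubby's code: the helper names are hypothetical, the Accept parsing is simplified (wildcards such as */* fall back to HTML), and the page/ and data/ path segments are assumed names for the HTML and RDF documents.

```python
SUPPORTED = ["text/html", "application/rdf+xml", "text/turtle"]


def negotiate(accept_header: str, supported=SUPPORTED) -> str:
    """Return the supported media type with the highest q-value.

    Falls back to the first supported type (HTML) when nothing matches.
    """
    best_type, best_q = supported[0], 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0], 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # Strict '>' keeps the earlier entry when two types tie.
        if media_type in supported and q > best_q:
            best_type, best_q = media_type, q
    return best_type


def redirect(web_base: str, local_name: str, accept_header: str):
    """Return (status, Location) for a request to a mapped resource URI."""
    media_type = negotiate(accept_header)
    # Assumed layout: HTML under page/, RDF serializations under data/.
    segment = "page/" if media_type == "text/html" else "data/"
    return 303, web_base + segment + local_name


print(redirect("http://myserver.org:8080/pubby/", "Berlin", "text/turtle"))
# prints (303, 'http://myserver.org:8080/pubby/data/Berlin')
```

The redirect separates the URI that identifies the resource from the URIs of the documents that describe it, which is why the resource URI itself never returns a 200 response in this sketch.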

Download and Installation

  1. Download Pubby. Current version: v0.3.3 (alpha), released 2011-01-26.
  2. If you haven't already, download and install a servlet container. Pubby has been tested with Tomcat and Jetty. I will assume your server is set up to run at http://myserver/.
  3. Unzip the Pubby distribution and copy the webapp directory into the servlet container's webapps folder. If Pubby is the only web application you want to run in the container, then rename the webapp directory to root (Tomcat expects the upper-case name ROOT). Otherwise, rename it to something like mydataset. This will change the Pubby root to http://myserver/mydataset/.
  4. Modify the configuration file to suit your needs. It is located within Pubby's webapp directory, at /WEB-INF/config.ttl. See the next section for a list of supported configuration directives.

Configuration

The Pubby configuration file uses Turtle syntax. It typically starts with some boilerplate prefix declarations, followed by a server configuration section, and one or more dataset configuration sections:

<> a conf:Configuration;
    conf:option1 value1;
    conf:option2 value2;
    (...)
    conf:dataset [
        conf:option1 value1;
        conf:option2 value2;
    ];
    .

There is an example configuration file.

Note that punctuation is significant, e.g. URIs are always enclosed in angle brackets, while literal values are enclosed in quotes. All directives are optional unless otherwise noted.
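As a way to see the punctuation rules in action, the Python sketch below renders a minimal configuration containing only the directives marked as required in the sections that follow, plus a project name. The function name is hypothetical, the conf: namespace URI is an assumption that should be checked against the example configuration file, and the input values are placeholders.

```python
# Assumed namespace for the conf: prefix; verify against the example file.
CONF_NS = "http://richard.cyganiak.de/2007/pubby/config.rdf#"


def render_config(project_name: str, web_base: str,
                  sparql_endpoint: str, dataset_base: str) -> str:
    """Render a minimal config.ttl. URIs go in <...>, literals in "..."."""
    # Note: project_name is inserted as-is; quotes inside it are not escaped.
    return (
        f"@prefix conf: <{CONF_NS}> .\n"
        "\n"
        "<> a conf:Configuration;\n"
        f'    conf:projectName "{project_name}";\n'
        f"    conf:webBase <{web_base}>;\n"
        "    conf:dataset [\n"
        f"        conf:sparqlEndpoint <{sparql_endpoint}>;\n"
        f"        conf:datasetBase <{dataset_base}>;\n"
        "    ];\n"
        "    .\n"
    )


config_text = render_config(
    "My Dataset",                    # placeholder project name
    "http://myserver/mydataset/",    # web base from the installation steps
    "http://example.org/sparql",     # hypothetical endpoint URL
    "http://example.org/",           # hypothetical dataset prefix
)
print(config_text)
```

Writing the result to /WEB-INF/config.ttl inside the webapp directory would then correspond to step 4 of the installation.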

Server Configuration Section

Below is a list of all supported directives for the server configuration section.

conf:projectName "Project Name";

The name of the project, for display in page titles.

conf:projectHomepage <project_homepage_url.html>;

A project homepage or similar URL, for linking in page titles.

conf:webBase <server_base_uri>;

Required. The root URL where the Pubby web application is installed, e.g. http://myserver/mydataset/.

conf:labelProperty ex:property1, ex:property2, ...;

The values of these RDF properties, if present in the dataset, will be used as labels and page titles for resources. Defaults to rdfs:label, dc:title, foaf:name.

conf:commentProperty ex:property1, ex:property2, ...;

The values of these RDF properties, if present in the dataset, will be used as a short textual description of the item. Defaults to rdfs:comment, dc:description.

conf:imageProperty ex:property1, ex:property2, ...;

The values of these RDF properties, if present in the dataset, will be used as an image URL to show a depiction of the item. Defaults to foaf:depiction.

conf:usePrefixesFrom <file.rdf>;

Links to an RDF document whose prefix declarations will be used in output. Defaults to the empty URL, which means the prefixes from the configuration file will be used.

conf:defaultLanguage "en";

If labels and comments in multiple languages are present (using different language tags on RDF literals), then this language will be preferred. Defaults to "en".
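The language preference can be pictured with a small Python sketch. This is only an illustration: the function name is hypothetical, labels are modelled as (text, language tag) pairs, and falling back to the first available label when the preferred language is missing is an assumption, since the text does not say what happens in that case.

```python
from typing import List, Optional, Tuple


def pick_label(labels: List[Tuple[str, Optional[str]]],
               default_language: str = "en") -> Optional[str]:
    """Choose one label from (text, language_tag) pairs."""
    for text, lang in labels:
        if lang == default_language:
            return text
    # Assumed fallback: use the first label if the preferred language is absent.
    return labels[0][0] if labels else None


labels = [("Berlin (Stadt)", "de"), ("Berlin (city)", "en")]
print(pick_label(labels))        # prints Berlin (city)
print(pick_label(labels, "de"))  # prints Berlin (Stadt)
```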

conf:indexResource <dataset_uri>;

The URI of a resource whose description will be displayed as the home page of the Pubby installation. Note that you have to specify a dataset URI, not a mapped web URI.

conf:dataset [ ... ];

Required. Introduces a dataset configuration section. There can be one or more dataset sections.

Dataset Configuration Section

Below is a list of all supported directives for the dataset configuration section.

conf:sparqlEndpoint <sparql_endpoint_url>;

Required. The URL of the SPARQL endpoint whose data we want to expose.

conf:sparqlDefaultGraph <sparql_default_graph_name>;

If the data of interest is not located in the SPARQL dataset's default graph, but within a named graph, then its name must be specified here.

conf:datasetBase <dataset_uri_prefix>;

Required. The common URI prefix of the resource identifiers in the SPARQL dataset; only resources with this prefix will be mapped and made available by Pubby.

conf:datasetURIPattern "regular expression";

If present, only dataset URIs matching this Java-style regular expression will be mapped and made available by Pubby. The regular expression must match everything after the datasetBase part of the URI.

conf:datasetBase <http://example.org/>;
conf:datasetURIPattern "(users|documents)/.*";

This example configuration will publish the dataset URI http://example.org/users/alice, but not http://example.org/invoices/5395842 because the URI part invoices/5395842 does not match the regular expression.
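The filtering rule in this example can be reproduced in a few lines of Python. This is a sketch with a hypothetical function name, not Pubby's implementation. It uses Python's re module rather than Java's regex engine; the two agree for a simple pattern like this one but differ in some advanced syntax.

```python
import re
from typing import Optional


def is_published(dataset_uri: str, dataset_base: str,
                 uri_pattern: Optional[str] = None) -> bool:
    """Check whether a dataset URI would be mapped, per the rules in the text."""
    if not dataset_uri.startswith(dataset_base):
        return False  # outside the datasetBase prefix
    if uri_pattern is None:
        return True   # no pattern configured: every URI under the prefix is mapped
    local_part = dataset_uri[len(dataset_base):]
    # fullmatch: the pattern must cover everything after datasetBase.
    return re.fullmatch(uri_pattern, local_part) is not None


BASE = "http://example.org/"
PATTERN = "(users|documents)/.*"

print(is_published("http://example.org/users/alice", BASE, PATTERN))       # prints True
print(is_published("http://example.org/invoices/5395842", BASE, PATTERN))  # prints False
```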

conf:addSameAsStatements "true"/"false";

If set to "true", an owl:sameAs statement of the form <web_uri> owl:sameAs <dataset_uri> will be present in Linked Data output.
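For illustration, the statement can be written out as a single N-Triples line. The Python helper below is hypothetical and the two URIs are placeholders; only the shape of the triple comes from the description above.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_statement(web_uri: str, dataset_uri: str) -> str:
    """Format '<web_uri> owl:sameAs <dataset_uri>' as one N-Triples line."""
    return f"<{web_uri}> <{OWL_SAME_AS}> <{dataset_uri}> ."


print(same_as_statement("http://myserver/mydataset/users/alice",
                        "http://example.org/users/alice"))
```

The statement tells Linked Data clients that the mapped web URI and the original dataset URI identify the same resource.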

conf:loadRDF <data1.rdf>, <data2.rdf>, ...;

Load one or more RDF documents from the Web or the file system and use them as the data source. The SPARQL endpoint configured above will be ignored. Allows using Pubby as an RDF server for publishing static RDF files.

conf:rdfDocumentMetadata [ statement1; statement2; ...; ];

All statements inside a conf:rdfDocumentMetadata block will be added as document metadata to the RDF documents published for this dataset. This feature can be used for instance to add licensing information to your published documents.

conf:rdfDocumentMetadata [
    dc:publisher <http://richard.cyganiak.de/foaf.rdf#cygri>;
];

conf:metadataTemplate "metadata.ttl";

Refers to a metadata template that is used by the metadata extension. The file is expected in the ./WEB-INF/templates/ directory.

conf:webResourcePrefix "uri_prefix/";

If present, this string will be prefixed to the mapped web URIs. This is useful if you have to avoid potential name clashes with URIs already used by the server itself. For example, if the dataset includes a URI http://mydataset/page, and the dataset prefix is http://mydataset/, then there would be a clash after mapping because Pubby reserves the mapped URI http://myserver/mydataset/page for its own use. In this case, you may specify a prefix like "resource/", which will result in a mapped URI of http://myserver/mydataset/resource/page.
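The effect of the prefix on the mapping can be sketched in Python as follows. The function name is hypothetical and the logic is a simplified reading of the description above, not Pubby's code.

```python
def to_web_uri(dataset_uri: str, dataset_base: str, web_base: str,
               web_resource_prefix: str = "") -> str:
    """Map a dataset URI to a web URI, inserting the optional resource prefix."""
    if not dataset_uri.startswith(dataset_base):
        raise ValueError(f"{dataset_uri!r} is outside the dataset prefix")
    local_part = dataset_uri[len(dataset_base):]
    return web_base + web_resource_prefix + local_part


DATASET_BASE = "http://mydataset/"
WEB_BASE = "http://myserver/mydataset/"

# Without a prefix the result collides with the URI Pubby reserves for itself.
print(to_web_uri("http://mydataset/page", DATASET_BASE, WEB_BASE))
# prints http://myserver/mydataset/page

# With the prefix the clash disappears.
print(to_web_uri("http://mydataset/page", DATASET_BASE, WEB_BASE, "resource/"))
# prints http://myserver/mydataset/resource/page
```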

conf:fixUnescapedCharacters "abc";

Only needed if URIs containing special characters cause problems when Pubby runs behind an Apache proxy.

conf:redirectRDFRequestsToEndpoint "true"/"false";

Instead of serving RDF documents, Pubby will redirect requests for RDF to DESCRIBE query results on the SPARQL server. This reduces Pubby's job to serving HTML descriptions of resources. All features that affect the RDF output will have no effect, e.g. URI rewriting and adding of owl:sameAs statements won't work. This is useful to improve performance in cases where the SPARQL dataset has been designed with Pubby publication in mind.
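A redirect target of this kind could be built as in the Python sketch below. The exact URL Pubby generates is not documented here, so this is an assumption: it follows the SPARQL protocol's query and default-graph-uri request parameters, and the function name and endpoint URL are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def describe_url(sparql_endpoint: str, dataset_uri: str,
                 default_graph: Optional[str] = None) -> str:
    """Build a SPARQL protocol GET URL that runs DESCRIBE on one resource."""
    params = {"query": f"DESCRIBE <{dataset_uri}>"}
    if default_graph is not None:
        params["default-graph-uri"] = default_graph
    # Assumes the endpoint URL has no query string of its own.
    return sparql_endpoint + "?" + urlencode(params)


print(describe_url("http://example.org/sparql", "http://example.org/users/alice"))
# prints http://example.org/sparql?query=DESCRIBE+%3Chttp%3A%2F%2Fexample.org%2Fusers%2Falice%3E
```

Because the query names the original dataset URI, the RDF returned by the endpoint uses dataset URIs rather than mapped web URIs, which is consistent with URI rewriting having no effect in this mode.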

Limitations

Support and feedback

Please email richard@cyganiak.de.

Source code and development

Pubby is open source (Apache License, Version 2.0). Pubby is hosted on GitHub. The official version of the source code is available from the cygri/pubby repository.

Acknowledgements

This project has received contributions from Olaf Hartig and Boris Villazón-Terrazas.