The WIQA Engine: API Overview

Chris Bizer
Richard Cyganiak
$Id: index.html,v 1.6 2006/10/12 14:07:03 bizer Exp $


This document provides an application programmer's overview of the WIQA Engine. It is targeted at developers who want to integrate its filtering capabilities into their own applications. This is a high-level overview with links to the Javadoc API documentation. A separate document describes the engine's internal architecture and implementation.

1. What is it?

The WIQA engine is a library for filtering Semantic Web information. The input is a set of RDF graphs, each named with a URI. The output is an RDF graph that contains only those triples from the source graphs that match a given policy. Policies are defined using the WIQA-PL policy language. In addition to filtering, the engine can generate explanation objects that tell a user why a particular triple was accepted into the output graph.
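The filtering idea can be sketched in plain Java. This is only a conceptual model and does not use the WIQA API: the FilterSketch class, its Triple type, the example graph names and the lambda standing in for a policy are all assumptions made for illustration. A real policy is written in WIQA-PL, not as a Java predicate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Conceptual stand-in only: none of these types belong to the WIQA API.
class FilterSketch {

    // A minimal triple; real RDF nodes are replaced by plain strings here.
    static final class Triple {
        final String subject, predicate, object;
        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
        @Override public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    // Keeps every triple for which the policy accepts the (graph name, triple) pair.
    static List<Triple> filter(Map<String, List<Triple>> namedGraphs,
                               BiPredicate<String, Triple> policy) {
        List<Triple> accepted = new ArrayList<>();
        for (Map.Entry<String, List<Triple>> graph : namedGraphs.entrySet()) {
            for (Triple t : graph.getValue()) {
                if (policy.test(graph.getKey(), t)) {
                    accepted.add(t);
                }
            }
        }
        return accepted;
    }

    // Builds two hypothetical named graphs and accepts only triples from the first.
    static List<Triple> demo() {
        Map<String, List<Triple>> source = new LinkedHashMap<>();
        List<Triple> trustedGraph = new ArrayList<>();
        trustedGraph.add(new Triple("ex:acme", "ex:rating", "\"AAA\""));
        List<Triple> unknownGraph = new ArrayList<>();
        unknownGraph.add(new Triple("ex:acme", "ex:rating", "\"C\""));
        source.put("http://example.org/graphs/trusted", trustedGraph);
        source.put("http://example.org/graphs/unknown", unknownGraph);
        return filter(source, (graphName, t) -> graphName.endsWith("/trusted"));
    }
}
```

The point of the sketch is that acceptance depends on the graph a triple came from as well as on the triple itself, which is why the input graphs must be named.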

The WIQA engine's API is the package de.fuberlin.wiwiss.wiqa. There are various subpackages that users of the engine normally don't have to interact with.

This diagram shows the main components of the engine's public API.

[Figure: UML diagram showing the WIQA engine's main components]

2. Installation

Download the WIQA archive from the website. Place wiqa.jar and all the jar files from the /lib folder onto the classpath.

WIQA ships with these libraries:

We didn't test WIQA with other versions of these libraries.

3. Providing input graphs

Input graphs must be provided to the engine as an NG4J NamedGraphSet or as an ARQ Dataset. NG4J is the Named Graphs API for Jena and is developed by the same team as WIQA. ARQ is the SPARQL engine included with HP Labs' Jena framework. All of these components are included in the WIQA download.

The following example loads a set of named graphs from a TRIG file:

NamedGraphSet source = new NamedGraphSetImpl();
source.read("file:source_graphs.trig", "TRIG");

NG4J's NamedGraphSet also provides facilities for loading RDF graphs from the web.

4. Managing policies

WIQA policies define which information should be accepted by the engine and which should be rejected. Policies are written in the WIQA-PL policy language, which is loosely based on the SPARQL query language.

Policies are stored in policy files. You can create one file per policy or keep your entire policy suite in a single file. Policies are loaded into WIQA using the PolicyParser class, which offers methods for both approaches:

Policy policy = PolicyParser.parsePolicyFromFile("accept_everything.wiqa");
List policies = PolicyParser.parseSuiteFromFile("financial_policies.wiqa");

The PolicyParser also has methods for reading from strings or readers.

5. Setting up the engine

With an input graph set and a policy in place, we can set up the engine. The engine is hidden behind the AcceptedGraph class. An AcceptedGraph is a Jena Graph that shows just the triples matching a policy.

AcceptedGraphs are created through the AcceptedGraphFactory class.

AcceptedGraphFactory engine = new AcceptedGraphFactory(source);
AcceptedGraph graph = engine.createAcceptedGraph(policy);

The engine also has a facility for passing user defined variables into policies:

engine.setContextVariable("USER", Node.createURI(""));

The URI <> will then be available as the value of the variable ?USER within all WIQA-PL policies.

6. Querying the filtered graph

AcceptedGraph is an implementation of Jena's Graph interface and can be queried using that interface's methods. Many developers prefer to work with Jena's Model API instead. To wrap the graph into a model, use Jena's ModelFactory:

Model filteredModel = ModelFactory.createModelForGraph(graph);

Many components of the Jena framework accept Models as input.

Note that we didn't provide access to the explanation feature through Jena's Model API. You have to retain the AcceptedGraph object to work with explanations.

7. Generating explanations

To generate an explanation for why a triple was accepted, use the AcceptedGraph's explain(Triple) method. This returns an Explanation object. Note that this only works for triples that have been previously returned by the graph's find method.

Explanations are cached inside the AcceptedGraph. This will consume memory over time. The clearCachedExplanations method discards all cached explanations. After calling the method, the cache is empty and no explanations are available. The next call to the find method will start to fill the cache again.
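The cache lifecycle described above can be modelled with a small stand-in class. This is a sketch of the behaviour only, not the AcceptedGraph implementation: the class name ExplanationCacheSketch, the use of strings for triples and explanations, and the choice to return null for an uncached triple are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the caching behaviour only; this is not the AcceptedGraph class.
class ExplanationCacheSketch {
    private final List<String> acceptedTriples;
    private final Map<String, String> cache = new HashMap<>();

    ExplanationCacheSketch(List<String> acceptedTriples) {
        this.acceptedTriples = new ArrayList<>(acceptedTriples);
    }

    // Like find: returns the accepted triples and records an explanation for each.
    List<String> find() {
        for (String triple : acceptedTriples) {
            cache.put(triple, "accepted: " + triple);
        }
        return new ArrayList<>(acceptedTriples);
    }

    // Like explain: answers only for triples that find has already returned.
    // Returning null for an uncached triple is an assumption of this sketch.
    String explain(String triple) {
        return cache.get(triple);
    }

    // Like clearCachedExplanations: frees the memory held by the cache.
    void clearCachedExplanations() {
        cache.clear();
    }
}
```

A long-running application might call the clearing method after each batch of queries, accepting that it has to run find again before asking for further explanations.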

7.1. The Explanation object

Explanations are structured as a tree of text fragments. The Explanation object represents the root of the tree and has no associated text fragment; the other nodes are ExplanationParts and all have associated text fragments. The text fragment consists of a list of RDF Nodes. Most of these are string literals, but some may be URIs or blank nodes that represent some entity involved in the explanation. Applications may make the URIs clickable or retrieve an appropriate label from the RDF data.
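To make the tree structure concrete, the following sketch renders such a tree as indented text. It uses its own stand-in Part type with string labels in place of RDF Nodes; the real Explanation and ExplanationPart classes may expose a different API, so the names and methods here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in tree; the real Explanation and ExplanationPart classes may differ.
class ExplanationTreeSketch {

    // One node of the tree: a text fragment (list of node labels) plus child parts.
    static final class Part {
        final List<String> fragment;
        final List<Part> children = new ArrayList<>();
        Part(String... fragment) {
            this.fragment = Arrays.asList(fragment);
        }
        Part add(Part child) {
            children.add(child);
            return this;
        }
    }

    // Renders the parts depth-first, indenting each level by two spaces.
    static String render(List<Part> topLevelParts) {
        StringBuilder out = new StringBuilder();
        for (Part part : topLevelParts) {
            render(part, 0, out);
        }
        return out.toString();
    }

    private static void render(Part part, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        // URIs are wrapped in angle brackets so a UI could turn them into links.
        for (String node : part.fragment) {
            out.append(node.startsWith("http://") ? "<" + node + ">" : node);
        }
        out.append('\n');
        for (Part child : part.children) {
            render(child, depth + 1, out);
        }
    }
}
```

An application with a richer user interface would replace the angle-bracket step with a label lookup or a hyperlink, as suggested above.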

The Explanation class also has a toRDF() method. It converts the object to an RDF representation that uses the EXPL vocabulary.

7.2. Graph explanations

In addition to the textual explanation described above, some policies will also have the ability to generate graph explanations. These are simply RDF graphs that contain information about the filtering process in a vocabulary chosen by the policy. To retrieve a triple's graph explanation, use the AcceptedGraph's explainAsGraph(Triple) method. Again, the triple must have been a query result, and graph explanations are cached too.

8. Extending the WIQA engine

Like SPARQL, the WIQA-PL language can be extended by adding custom functions. These can range from very simple functions like those already found in SPARQL, to complex trust metrics that include external data sources into the evaluation process.

WIQA is based on the ARQ query engine. WIQA extensions use two of ARQ's extension points: ARQ custom functions and ARQ custom extensions. The difference is explained below. More information is available in the ARQ documentation.

Custom functions and extensions are implemented as a Java class and must be named with a URI.

Custom functions and extensions must be registered with ARQ's FunctionRegistry and ExtensionRegistry, respectively, before they can be used in WIQA policies:

FunctionRegistry.get().put("", FooFunction.class);
ExtensionRegistry.get().put("", BarExtension.class);

Note: Functions and extensions must be registered before any policy using them is parsed.

Functions and extensions have access to the unfiltered source dataset and can use it to include additional information in the evaluation process. Within the Function or Extension implementation, use this code to retrieve the dataset:

DatasetGraph source = getContext().getDataset();

Functions and extensions can generate explanations. After creating an appropriate Explanation instance, this code can be used to hand it back to the engine:

getContext().getContext().put(WIQAQueryEngine.EXPLANATION, explanation);

8.1. Functions

The input of an ARQ function is a single variable binding, and a number of arguments which are RDF nodes. The output is a boolean value. If it is false, the binding, and the triple represented by its ?SUBJ, ?PRED and ?OBJ variables, will not be accepted. A typical example in the context of WIQA might be a function whose single argument represents a person or organisation, and the function returns true if and only if that agent is found in a non-RDF list of trusted information sources.

To create a new function, ARQ's com.hp.hpl.jena.query.function.Function interface must be implemented. For convenience, ARQ provides several abstract implementations to choose from. WIQA also provides one, the ExplainableFunction, which has convenience methods getDataset() and returnExplanation(ExplanationPart) that wrap the code provided above.

Subclasses of ExplainableFunction must implement the exec method:

public abstract NodeValue exec(List args);

args is a list of ARQ NodeValue objects representing the arguments passed to the function. The expected return value is usually NodeValue.TRUE or NodeValue.FALSE.
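The trusted-sources example mentioned earlier can be sketched without the ARQ types. The class below only models the decision logic: it does not extend ExplainableFunction, it takes strings instead of NodeValue objects, and it returns a Java boolean where a real function would return NodeValue.TRUE or NodeValue.FALSE. The class name and constructor are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in for the decision logic only; it does not extend ExplainableFunction.
class TrustedSourceSketch {
    private final Set<String> trusted;

    // The trusted list would typically come from a non-RDF source such as a text file.
    TrustedSourceSketch(String... trustedAgents) {
        this.trusted = new HashSet<>(Arrays.asList(trustedAgents));
    }

    // Mirrors the shape of exec(List args): one argument, boolean-style result.
    boolean exec(List<String> args) {
        if (args.size() != 1) {
            throw new IllegalArgumentException("expected exactly one argument");
        }
        return trusted.contains(args.get(0));
    }
}
```

In an actual implementation the same check would sit inside exec, with the argument read from the NodeValue list and the result mapped to the corresponding NodeValue constant.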

8.2. Extensions

The input of an ARQ extension is a stream of variable bindings and a number of arguments, which are RDF nodes. The output is a modified version of the input stream. This flexibility can be used for simple modifications, like rejecting some of the bindings, or for complex operations that completely change the result stream. A typical example in the context of WIQA might be an extension that ranks the bindings according to some metric, e.g. the trustworthiness of the ?source variable, and accepts only the top five bindings. This couldn't be done as a function because the algorithm has to look at several bindings at once.

To create a new extension, ARQ's com.hp.hpl.jena.query.extension.Extension interface must be implemented. For convenience, ARQ provides several abstract implementations to choose from. WIQA also provides one, the ExplainableExtension, which has convenience methods getDataset() and returnExplanation(ExplanationPart) that wrap the code provided above.

Subclasses of ExplainableExtension must implement the exec and finish methods:

public abstract QueryIterator exec(List args, Binding binding);
public abstract QueryIterator finish(List args);

The exec method is called once for every input binding; finish is called after the last input binding. args is a list of ARQ NodeValue objects representing the arguments passed to the extension. Variables and expressions in the call are already evaluated against the input binding. The expected return values are iterators of output bindings. An extension can choose when it returns its output bindings: it might return some in each exec call, or all of them in the finish call.

The ARQ classes QueryIterPlainWrapper, QueryIterSingleton, and QueryIterNullIterator might be useful as return values.
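The top-five ranking example shows why the exec/finish split matters: nothing can be emitted until every binding has been seen. The sketch below models that protocol with plain Java collections. It does not extend ExplainableExtension, bindings are represented as maps from variable name to value instead of ARQ Binding objects, and the trust scores are a made-up metric; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Stand-in for the exec/finish protocol; it does not extend ExplainableExtension.
class TopFiveSketch {
    private final Map<String, Double> trustScores; // hypothetical metric per source
    private final List<Map<String, String>> buffer = new ArrayList<>();

    TopFiveSketch(Map<String, Double> trustScores) {
        this.trustScores = trustScores;
    }

    // Called once per input binding: buffer it and emit nothing yet.
    Iterator<Map<String, String>> exec(Map<String, String> binding) {
        buffer.add(binding);
        return Collections.<Map<String, String>>emptyIterator();
    }

    // Called after the last binding: rank by trust score and emit the best five.
    Iterator<Map<String, String>> finish() {
        List<Map<String, String>> ranked = new ArrayList<>(buffer);
        ranked.sort(Comparator.comparingDouble(
                (Map<String, String> b) -> score(b)).reversed());
        return ranked.subList(0, Math.min(5, ranked.size())).iterator();
    }

    // Sources without a known score are treated as least trustworthy.
    private double score(Map<String, String> binding) {
        Double s = trustScores.get(binding.get("source"));
        return s == null ? 0.0 : s;
    }
}
```

In a real extension, the empty iterator returned from exec and the final ranked iterator would be expressed with ARQ's QueryIterator classes.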

9. Change log

$Log: index.html,v $
Revision 1.6  2006/10/12 14:07:03  bizer
Editorial improvements; links to other parts of the documentation

Revision 1.5  2006/06/12 21:07:03  cyganiak
Editorial improvements; more information about implementing Function and Extension

Revision 1.4  2006/05/16 22:07:06  cyganiak
Added note about registration/parsing order

Revision 1.3  2006/04/26 21:37:55  cyganiak
Small documentation improvements

Revision 1.2  2006/04/19 15:45:21  cyganiak
Added overview UML diagram to API doc

Revision 1.1  2006/04/15 15:50:23  cyganiak
First version of API overview documentation