Chris Bizer, Freie Universität Berlin
Richard Cyganiak, Freie Universität Berlin
Oliver Maresch, Technische Universität Berlin
Tobias Gauss, Freie Universität Berlin

The WIQA - Web Information Quality Assessment Framework

Information providers on the Web have different levels of knowledge, different views of the world and different intentions. Thus, provided information may be wrong, biased, inconsistent or outdated. Before information from the Web is used to accomplish a specific task, its quality should be assessed according to task-specific criteria. The WIQA - Information Quality Assessment Framework is a set of software components that empowers information consumers to employ a wide range of different information quality assessment policies to filter information from the Web.



Table of Contents

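  1. Information Quality on the Web
  2. Overview of the Framework
  3. The WIQA-PL Information Quality Assessment Policy Language
  4. The Semantic Web Publishing Vocabulary
  5. Using the WIQA Framework
  6. Download
  7. Feedback
  8. References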


1. Information Quality on the Web

Information providers on the Web have different levels of knowledge, different views of the world and different intentions. Thus, provided information may be wrong, biased, inconsistent or outdated. Before information from the Web is used to accomplish a specific task, its quality should be assessed according to task-specific criteria [Bizer2007][BiOl04][Nau02].

In everyday life, we use a wide range of different policies to assess the quality of information: we might accept information from a friend on restaurants, but distrust him on computers; regard scientific papers as relevant only if they have been published within the last two years; or believe foreign news only when it is reported by several independent sources. Which policy is chosen depends on the specific task at hand, our subjective preferences and the availability of information quality-related meta-information, such as ratings or background information about the information provider.

The goal of the WIQA framework is to empower users of web-based information systems to employ a similarly wide range of information filtering policies to those they use in the offline world.

Quality-based information filtering policies evaluate multiple information quality dimensions [Nau02][Wang96], such as accuracy, timeliness, relevancy, interpretability or believability. Afterwards, they aggregate the assessment results into an overall decision whether to accept or reject information.
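The idea of checking several quality dimensions and aggregating the results into one accept/reject decision can be sketched in plain Java. This is an illustration only: the class QualityPolicySketch, the fields of Item and the thresholds are hypothetical and not part of the WIQA framework. The two checks mirror the everyday policies mentioned above (recency and confirmation by several independent sources).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch only: none of these types belong to the WIQA framework.
final class QualityPolicySketch {

    // Quality-related meta-information about one piece of information.
    static final class Item {
        final int ageInYears;
        final int independentSources;

        Item(int ageInYears, int independentSources) {
            this.ageInYears = ageInYears;
            this.independentSources = independentSources;
        }
    }

    // A policy is a list of checks, one per quality dimension.
    // The aggregation rule used here is a simple conjunction: all checks must pass.
    static boolean accept(Item item, List<Predicate<Item>> checks) {
        for (Predicate<Item> check : checks) {
            if (!check.test(item)) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical policy: recent (timeliness) and reported by
    // at least three independent sources (believability).
    static List<Predicate<Item>> newsPolicy() {
        List<Predicate<Item>> checks = new ArrayList<>();
        checks.add(item -> item.ageInYears <= 2);
        checks.add(item -> item.independentSources >= 3);
        return checks;
    }

    public static void main(String[] args) {
        Item fresh = new Item(0, 4);
        Item stale = new Item(5, 4);
        System.out.println(accept(fresh, newsPolicy())); // prints true
        System.out.println(accept(stale, newsPolicy())); // prints false
    }
}
```

A different task would simply use a different list of checks, which is the sense in which the choice of policy is task-specific.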

Quality-based information filtering policies may rely on a wide range of different assessment metrics. These metrics can be classified into three categories according to the type of information that is used as quality indicator: content-based, context-based and rating-based metrics.

A detailed description of the WIQA framework and the ideas behind the framework is found in [Bizer2007].

 

2. Overview of the Framework

The WIQA - Information Quality Assessment Framework is a set of software components for filtering information from the Web using a wide range of different filtering policies.

The framework has been designed to fulfill the following requirements:

The figure below gives an overview of the components of the WIQA framework and illustrates how applications interact with the framework.


The WIQA framework consists of the NG4J - Named Graphs API for Jena and the WIQA - Filtering and Explanation Engine.

Filtering Process

One example of an application that uses the WIQA framework is the WIQA Browser. The browser demonstrates how information quality filtering capabilities can be integrated into the Firefox Web browser. The browser enables users to extract structured information from web pages. Extracted information from different web pages is stored in a local repository and can be browsed, sorted and searched together. The content of the local repository can be filtered using WIQA-PL filtering policies. In order to help users understand filtering decisions, there is an "Oh, yeah?"-button [Berners-Lee97] next to each piece of information, which opens a window explaining why the information satisfies the selected policy.

The WIQA Framework is based on earlier work in the TriQL.P Project.

 


3. The WIQA-PL Information Quality Assessment Policy Language

Filtering policies are expressed using the WIQA-PL policy language. A WIQA-PL policy defines which information is accepted by the WIQA - Filtering and Explanation Engine. WIQA-PL policies can combine different content-, context- and rating-based assessment metrics. The basic idea of the language is to represent policies as a set of graph patterns which are matched against the graph set to be filtered. As information quality assessment often requires domain-specific assessment metrics, WIQA-PL provides an extension mechanism that enables domain-specific assessment metrics to be included in policies. WIQA-PL policies may contain explanation templates, which are used by the WIQA framework to generate natural language as well as RDF explanations about filtering decisions.

The WIQA-PL Language Specification describes the WIQA-PL language constructs and explains how the language is used to formulate information filtering policies.

The EBNF grammar of the WIQA-PL policy language is available here. The grammar is based on the grammar of the SPARQL query language in order to make it easier for people who already know SPARQL to learn WIQA-PL.

The following example shows the WIQA-PL policy "Accept only information that has been asserted by analysts who have received at least 3 positive ratings."

 
    NAME "Asserted by analysts with at least 3 positive ratings."
    DESCRIPTION "Accept only information that has been asserted by
                 analysts who have received at least 3 positive ratings."
    PATTERNS {

        GRAPH fd:GraphFromAggregator
        { ?GRAPH swp:assertedBy ?warrant .
          ?warrant swp:authority ?authority .
          EXPL "it was asserted by " ?authority " and " . }

        GRAPH ?graph2
        { ?authority rdf:type fin:Analyst . }

        GRAPH fd:GraphFromAggregator
        { ?graph2 swp:assertedBy ?warrant2 .
          ?warrant2 swp:authority ?authority2 .
          EXPL ?authority2 " claims that " ?authority
               " is an analyst." . }

        GRAPH ANY
        { ?rater fin:positiveRating ?authority .
          FILTER (wiqa:count(?rater) > 2) .
          EXPL ?authority "has received positive ratings from" . }

        GRAPH fd:BackgroundInformation
        { ?rater fin:affiliation ?company .
          EXPL ?rater "who works for" ?company . }
    }
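
The effect of the FILTER (wiqa:count(?rater) > 2) clause in the policy above can be approximated in plain Java. The sketch below is not the engine's implementation: it assumes that the count is taken over distinct ?rater bindings per ?authority, it represents each fin:positiveRating statement as a pair of strings instead of matching graph patterns over named graphs, and the class name and the ex: identifiers are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: not part of the WIQA engine.
final class RatingCountSketch {

    // Each rating is a pair {rater, authority}, standing in for a
    // "?rater fin:positiveRating ?authority" statement.
    static Set<String> authoritiesWithAtLeast(String[][] ratings, int minRaters) {
        // Collect the distinct raters of every authority.
        Map<String, Set<String>> ratersByAuthority = new HashMap<>();
        for (String[] rating : ratings) {
            ratersByAuthority
                .computeIfAbsent(rating[1], key -> new HashSet<>())
                .add(rating[0]);
        }
        // Keep only authorities that reach the required number of raters.
        Set<String> accepted = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : ratersByAuthority.entrySet()) {
            if (entry.getValue().size() >= minRaters) {
                accepted.add(entry.getKey());
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        String[][] ratings = {
            {"ex:alice", "ex:analystA"},
            {"ex:bob", "ex:analystA"},
            {"ex:carol", "ex:analystA"},
            {"ex:alice", "ex:analystB"},
            {"ex:alice", "ex:analystB"} // repeated rating, counted once
        };
        // "at least 3" corresponds to the policy's "count > 2"
        System.out.println(authoritiesWithAtLeast(ratings, 3)); // prints [ex:analystA]
    }
}
```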

 

More example policies are contained in the WIQA Financial Example Policy Suite.


4. The Semantic Web Publishing Vocabulary

The Semantic Web Publishing Vocabulary (SWP) [CaBiHaSt05] provides terms for expressing different degrees of commitment towards information and for representing digital signatures. The vocabulary may be used for publishing signed information on the Web using the Named Graphs data model. The vocabulary can also be used for representing provenance meta-information about information from the Web.

Linking information to authorities and optionally assuring these links with digital signatures gives information consumers a secure basis for using filtering policies which rely on information provenance. Signing RDF graphs requires specific canonicalization and digest algorithms. The SWP vocabulary is the first RDF vocabulary that provides terms to identify these algorithms and to describe the combination of algorithms that is used to calculate a signature. This enables the SWP vocabulary to represent serialization-independent signatures and makes it possible to verify signatures even after information from different sources is combined and is serialized using a different serialization syntax.
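
Why a canonicalization step makes a signature independent of the serialization can be illustrated with the standard java.security API. The sketch below is a simplification under stated assumptions: it does not use the algorithms identified by the SWP vocabulary, its "canonical form" is merely the sorted, de-duplicated list of statement lines, and it only works for graphs without blank nodes. The class name and the example statements are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: not the SWP canonicalization or digest algorithms.
final class GraphSignatureSketch {

    // Naive canonical form: sorted, de-duplicated statement lines joined by newlines.
    // Assumes ground statements only; blank nodes need a real canonicalization algorithm.
    static byte[] canonicalize(List<String> statements) {
        TreeSet<String> sorted = new TreeSet<>(statements);
        return String.join("\n", sorted).getBytes(StandardCharsets.UTF_8);
    }

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Digest and sign the canonical form (SHA-256 digest, RSA signature).
    static byte[] sign(List<String> statements, KeyPair keys) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(keys.getPrivate());
            signer.update(canonicalize(statements));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean verify(List<String> statements, byte[] signature, KeyPair keys) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(keys.getPublic());
            verifier.update(canonicalize(statements));
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        List<String> asPublished = Arrays.asList(
            "<http://example.org/a> <http://example.org/p> \"1\" .",
            "<http://example.org/b> <http://example.org/p> \"2\" .");
        // The same statements in a different order, as after re-serialization.
        List<String> reordered = Arrays.asList(asPublished.get(1), asPublished.get(0));

        byte[] signature = sign(asPublished, keys);
        System.out.println(verify(reordered, signature, keys)); // prints true
    }
}
```

Because both parties sign and verify the canonical form rather than the bytes of a particular file, the order in which a serializer happens to write the statements does not affect the result.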

Detailed information about the vocabulary is found in the Semantic Web Publishing Vocabulary (SWP) - User Manual.

The vocabulary is defined by the SWP-2 Vocabulary RDFS Schema.

The two examples below show how the SWP vocabulary is used to assert two named graphs and to assure this assertion using a digital signature. Both examples use the TriG syntax for Named Graphs.

 


5. Using the WIQA Framework

This section describes how applications use the WIQA - Filtering and Explanation Engine to filter information using WIQA-PL policies. Please refer to the NG4J website for information about using NG4J.

5.1. Public Interface

The UML diagram below gives an overview of the public interface of the WIQA engine.

Policy and PolicyParser

WIQA-PL policies are represented as instances of the class Policy. The class PolicyParser provides a collection of static methods for parsing WIQA-PL policies from files and strings into policy objects.

AcceptedGraph and AcceptedGraphFactory

The class AcceptedGraph is the main interface of the WIQA - Filtering and Explanation Engine. An AcceptedGraph represents a filtered view on a set of named graphs. Only statements matching a WIQA-PL policy are in the graph. The method find() is used to extract information from the graph. The method returns an iterator over all accepted triples that match a given triple pattern. AcceptedGraphs are created using an AcceptedGraphFactory. The method createAcceptedGraph() of the factory returns an AcceptedGraph for a given Policy. The method setContextVariable() is used to set WIQA-PL context variables which may be used within policies afterwards.
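
The relationship between factory, context variables, policy and filtered view can be sketched with simplified stand-in types. The nested classes below reuse the names AcceptedGraph and AcceptedGraphFactory and the method names find(), createAcceptedGraph() and setContextVariable() from the description above, but their signatures, the Statement class and the lambda-based Policy are assumptions made for this illustration and not the actual WIQA API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Stand-in types for illustration only; the signatures are assumptions,
// not the actual WIQA classes of the same names.
final class AcceptedGraphSketch {

    // A triple together with the authority that asserted it.
    static final class Statement {
        final String subject, predicate, object, assertedBy;

        Statement(String subject, String predicate, String object, String assertedBy) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.assertedBy = assertedBy;
        }
    }

    // A policy decides whether a statement is accepted, given the context variables.
    interface Policy {
        boolean accepts(Statement statement, Map<String, String> context);
    }

    // Filtered view: find() only returns statements that satisfy the policy.
    static final class AcceptedGraph {
        private final List<Statement> source;
        private final Policy policy;
        private final Map<String, String> context;

        AcceptedGraph(List<Statement> source, Policy policy, Map<String, String> context) {
            this.source = source;
            this.policy = policy;
            this.context = context;
        }

        // A null argument acts as a wildcard for that position.
        Iterator<Statement> find(String subject, String predicate, String object) {
            List<Statement> hits = new ArrayList<>();
            for (Statement s : source) {
                boolean matches = (subject == null || subject.equals(s.subject))
                        && (predicate == null || predicate.equals(s.predicate))
                        && (object == null || object.equals(s.object));
                if (matches && policy.accepts(s, context)) {
                    hits.add(s);
                }
            }
            return hits.iterator();
        }
    }

    static final class AcceptedGraphFactory {
        private final List<Statement> source;
        private final Map<String, String> context = new HashMap<>();

        AcceptedGraphFactory(List<Statement> source) {
            this.source = source;
        }

        void setContextVariable(String name, String value) {
            context.put(name, value);
        }

        AcceptedGraph createAcceptedGraph(Policy policy) {
            return new AcceptedGraph(source, policy, new HashMap<>(context));
        }
    }

    static List<Statement> exampleData() {
        return Arrays.asList(
            new Statement("ex:RC", "foaf:name", "Richard Cyganiak", "ex:alice"),
            new Statement("ex:RC", "foaf:mbox", "mailto:rc@example.org", "ex:mallory"),
            new Statement("ex:CB", "foaf:name", "Chris Bizer", "ex:alice"));
    }

    public static void main(String[] args) {
        AcceptedGraphFactory factory = new AcceptedGraphFactory(exampleData());
        factory.setContextVariable("USER", "ex:alice");

        // Hypothetical policy: accept only what the current user asserted.
        Policy policy = (s, ctx) -> s.assertedBy.equals(ctx.get("USER"));
        AcceptedGraph graph = factory.createAcceptedGraph(policy);

        Iterator<Statement> it = graph.find("ex:RC", null, null);
        while (it.hasNext()) {
            Statement s = it.next();
            System.out.println(s.subject + " " + s.predicate + " " + s.object);
        }
    }
}
```

In this sketch the filtering happens lazily inside find(), so the same underlying data can be viewed through several policies at once without being copied.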

Retrieving Explanations

AcceptedGraphs can generate textual explanations and RDF explanations why a triple was accepted into the graph. A textual Explanation is created by calling the method explain() of an AcceptedGraph. The method takes an accepted triple as argument. An Explanation consists of a collection of ExplanationParts. The method parts() returns the collection of ExplanationParts. An ExplanationPart represents a text fragment and has zero or more children which are also ExplanationParts. The text fragment is represented as a list of RDF nodes. Most of these are literals, but some may be URI references or blank nodes which refer to some entity involved in the explanation. The WIQA framework provides an ExplanationToHTMLRenderer which can be used to generate basic HTML representations of explanations.
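
The tree structure of an explanation, and one way such a tree could be rendered to HTML, can be sketched as follows. The ExplanationPart class below is a simplified stand-in: it stores its text fragment as a list of strings instead of RDF nodes, and the nested-list output format is an assumption that is not taken from the ExplanationToHTMLRenderer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types for illustration only; not the WIQA explanation classes.
final class ExplanationSketch {

    // One text fragment (a list of node labels) plus zero or more child parts.
    static final class ExplanationPart {
        final List<String> fragments;
        final List<ExplanationPart> children = new ArrayList<>();

        ExplanationPart(String... fragments) {
            this.fragments = Arrays.asList(fragments);
        }

        ExplanationPart add(ExplanationPart child) {
            children.add(child);
            return this;
        }
    }

    // Renders the parts as a nested HTML list, recursing into the children.
    static String toHtml(List<ExplanationPart> parts) {
        StringBuilder html = new StringBuilder("<ul>");
        for (ExplanationPart part : parts) {
            html.append("<li>").append(escape(String.join("", part.fragments)));
            if (!part.children.isEmpty()) {
                html.append(toHtml(part.children));
            }
            html.append("</li>");
        }
        return html.append("</ul>").toString();
    }

    // Minimal escaping of characters that are special in HTML text.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        ExplanationPart top = new ExplanationPart("it was asserted by ", "ex:analystA")
            .add(new ExplanationPart("ex:aggregator", " claims that ",
                                     "ex:analystA", " is an analyst."));
        System.out.println(toHtml(Arrays.asList(top)));
    }
}
```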

Implementing and Registering Extension Functions

The WIQA-PL policy language can be extended with domain-specific extension functions (see WIQA-PL Language Specification). Extension functions are implemented as plug-ins for the WIQA engine. At the implementation level, two different types of extension functions are distinguished: basic functions and extensions. The input of a basic function is a single matching solution and a number of arguments which are RDF nodes. The output is a boolean value. The input of an extension is a stream of matching solutions and a number of arguments which are RDF nodes. The output is a modified version of the input stream. Basic functions have to extend the abstract class ExplainableFunction. Extensions have to extend the abstract class ExplainableExtension. Basic functions and extensions must be registered with the FunctionRegistry and ExtensionRegistry, respectively, before they can be used in WIQA-PL policies.
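
The difference between the two kinds of extension functions can be sketched with stand-in types. In this illustration a matching solution is a map from variable names to strings, the two kinds are modelled as interfaces rather than as the abstract classes ExplainableFunction and ExplainableExtension, and the registries are plain maps. The function names ex:boundTo and ex:limit, like all signatures shown, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in types for illustration only; not the WIQA plug-in API.
final class ExtensionFunctionSketch {

    // Basic function: one matching solution plus arguments in, boolean out.
    interface BasicFunction {
        boolean evaluate(Map<String, String> solution, List<String> args);
    }

    // Extension: a sequence of matching solutions plus arguments in,
    // a modified sequence of solutions out.
    interface Extension {
        List<Map<String, String>> apply(List<Map<String, String>> solutions, List<String> args);
    }

    // Plain maps standing in for the two registries.
    static final Map<String, BasicFunction> FUNCTION_REGISTRY = new HashMap<>();
    static final Map<String, Extension> EXTENSION_REGISTRY = new HashMap<>();

    static void registerExamples() {
        // True if the variable named by args[0] is bound to the value args[1].
        FUNCTION_REGISTRY.put("ex:boundTo",
            (solution, args) -> args.get(1).equals(solution.get(args.get(0))));
        // Keeps at most the first N solutions, where N is args[0].
        EXTENSION_REGISTRY.put("ex:limit", (solutions, args) -> {
            int n = Math.min(Integer.parseInt(args.get(0)), solutions.size());
            return new ArrayList<>(solutions.subList(0, n));
        });
    }

    static Map<String, String> solution(String variable, String value) {
        return Collections.singletonMap(variable, value);
    }

    public static void main(String[] args) {
        registerExamples();
        List<Map<String, String>> solutions = Arrays.asList(
            solution("rater", "ex:alice"),
            solution("rater", "ex:bob"),
            solution("rater", "ex:carol"));

        boolean first = FUNCTION_REGISTRY.get("ex:boundTo")
            .evaluate(solutions.get(0), Arrays.asList("rater", "ex:alice"));
        List<Map<String, String>> limited = EXTENSION_REGISTRY.get("ex:limit")
            .apply(solutions, Arrays.asList("2"));

        System.out.println(first);          // prints true
        System.out.println(limited.size()); // prints 2
    }
}
```

A basic function can only accept or reject the solution it is given, whereas an extension sees all solutions at once and can therefore implement aggregate behaviour such as counting or limiting.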

For more information about the WIQA engine, please refer to:

 

5.2. Code Example

The code example below illustrates the usage of the WIQA - Filtering and Explanation Engine.

Lines 10-11 create a new NamedGraphSet and load a TriX file into the graph set. Lines 14-17 load a policy suite and select a policy from the suite. Lines 19-22 create an AcceptedGraphFactory and assign a value to the context variable ?USER. Line 25 creates an AcceptedGraph by applying the selected policy to the graph set. Lines 28-30 create an iterator over all triples in the accepted graph that have the subject http://richard.cyganiak.de/foaf.rdf#RC. Line 37 creates an explanation of why the first triple satisfies the policy. This explanation is rendered to HTML and written to System.out in lines 40-42.

 
1. import java.util.Iterator;
2. import com.hp.hpl.jena.graph.Node;
3. import com.hp.hpl.jena.graph.Triple;
4. import de.fuberlin.wiwiss.ng4j.*;
5.
6. // Create a new graphset
7. NamedGraphSet graphset = new NamedGraphSetImpl();
8.
9. // Create a new NamedGraph in the NamedGraphSet
10. NamedGraph graph =
11.     graphset.createGraph("http://example.org/persons/123");
12.
13. // Add information to the NamedGraph
14. graph.add(new Triple(
15.     Node.createURI("http://richard.cyganiak.de/foaf.rdf#RC"),
16.     Node.createURI("http://xmlns.com/foaf/0.1/name"),
17.     Node.createLiteral("Richard Cyganiak", null, null)));
18.
19. // Create a quad
20. Quad quad = new Quad(
21.     Node.createURI("http://www.bizer.de/InformationAboutRichard"),
22.     Node.createURI("http://richard.cyganiak.de/foaf.rdf#RC"),
23.     Node.createURI("http://xmlns.com/foaf/0.1/mbox"),
24.     Node.createURI("mailto:richard@cyganiak.de"));
25.
26. // Add the quad to the graphset. This will create a new NamedGraph
27. // in the graphset.
28. graphset.addQuad(quad);
29.
30. // Find information about Richard across all graphs in the graphset
31. Iterator it = graphset.findQuads(
32.     Node.ANY,
33.     Node.createURI("http://richard.cyganiak.de/foaf.rdf#RC"),
34.     Node.ANY,
35.     Node.ANY);
36.
37. // Output the results of findQuads()
38. while (it.hasNext()) {
39.     Quad q = (Quad) it.next();
40.     System.out.println("Source: " + q.getGraphName());
41.     System.out.println("Statement: " + q.getTriple());
42. }
43.
44. // Serialize the graphset to System.out, using the TriX syntax
45. graphset.write(System.out, "TRIX", null);

 

For testing the framework, you can use the Financial Example Graph Set, which contains financial news, analyst reports and postings from investment-related discussion forums, together with the Financial Example Policy Suite.


6. Download
 

The WIQA - Filtering and Explanation Engine and the Named Graphs API for Jena can be downloaded from the SourceForge NG4J Website.

The latest version of the code can be browsed online at http://ng4j.cvs.sourceforge.net/ng4j/WIQA

Version      Comment                                Release Date
NG4J V0.5    NG4J maintenance release.              2006-10-09
WIQA V0.1    Initial release of the WIQA Engine.    2006-08-10

List of our other open source projects @ Freie Universität Berlin


7. Feedback

We are very interested in hearing your opinion about the WIQA Framework. Please send comments to:

Chris Bizer
chris@bizer.de
www.bizer.de


8. References

[Bizer2007]
Quality-Based Information Filtering in the Context of Web-Based Systems, Christian Bizer, 2007, http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/DisertationChrisBizer.pdf
[Named-Graphs]
Named Graphs Website, http://www.w3.org/2004/03/trix/
[TriG]
The TriG Syntax, Christian Bizer, 2004. http://www.wiwiss.fu-berlin.de/suhl/bizer/TriG/
[RDF-SYNTAX]
RDF/XML Syntax Specification (Revised), Beckett D. (Editor), W3C Recommendation, 10 February 2004. This version is http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/. The latest version is http://www.w3.org/TR/rdf-syntax-grammar/.
[Nau02]
Felix Naumann: Quality-Driven Query Answering for Integrated Information Systems, Springer, 2002. http://www.springerlink.com/content/bfevlgwdjcl2/
[Wang96]
Richard Wang and Diane Strong. Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems, 12(4):5–33, 1996. http://citeseer.ist.psu.edu/context/381786/0
[JIB06]
Audun Jøsang, Roslan Ismail, and Colin Boyd. A Survey of Trust and Reputation Systems for Online Service Provision, 2006, http://sky.fit.qut.edu.au/%7Ejosang/papers/JIB2006-DSS.pdf
[BuSaKh01]
Peter Buneman, Sanjeev Khanna, Wang-Chiew Tan: Why and Where: A Characterization of Data Provenance. http://db.cis.upenn.edu/DL/whywhere.pdf
[GoHePa03]
J. Golbeck, J. Hendler, B. Parsia: Trust Networks on the Semantic Web. http://mindswap.org/papers/Trust.pdf
[RiAgDo03]
M. Richardson, R. Agrawal, P. Domingos: Trust Management for the Semantic Web. http://www.cs.washington.edu/homes/mattr/doc/iswc2003/iswc2003.pdf
[Berners-Lee97]
Tim Berners-Lee: Cleaning up the User Interface, Section - The "Oh, yeah?"-Button. http://www.w3.org/DesignIssues/UI.html
[BiOl04]
Christian Bizer, Radoslaw Oldakowski: Using Context- and Content-Based Trust Policies on the Semantic Web. WWW2004, New York, May 2004. http://www.wiwiss.fu-berlin.de/suhl/bizer/SWTSGuide/p747-bizer.pdf
[CaBiHaSt05]
Jeremy Carroll, Christian Bizer, Patrick Hayes, Patrick Stickler: Named Graphs. Journal of Web Semantics, Vol. 3, Issue 4, p. 247-267, 2005. http://www.websemanticsjournal.org/ps/pub/2005-23
[Marchiori04]
Massimo Marchiori: W5: The Five W's of the World Wide Web. http://www.w3.org/People/Massimo/papers/2004/w5_04.pdf
[CuWi99]
Yingwei Cui, Jennifer Widom: Practical Lineage Tracing in Data Warehouses. http://citeseer.nj.nec.com/cache/papers/cs/1704/http:zSzzSzwww-db.stanford.eduzSzpubzSzpaperszSztrace.pdf/cui99practical.pdf
[McGuinnessDaSilva03]
D. L. McGuinness, P. Pinheiro da Silva: Infrastructure for Web Explanations. ISWC 2003. http://www.cs.toronto.edu/semanticweb/resource/reference/iswc03bestpapers/iswc03-infrastructure-web-explanations.pdf

More references to resources about the trust and security issues arising from the Semantic Web are found in the Semantic Web Trust and Security Resource Guide.


$Id: index.html,v 1.2 2006/12/15 13:24:00 bizer Exp $