Linked Data has grown rapidly since 2007. Hundreds of interlinked datasets compose a knowledge space that currently consists of more than 31 billion RDF triples. Because this information comes from disparate sources with no central curation or control, quality problems often emerge. These problems come in different flavors, including redundant (duplicate) triples; conflicting, inaccurate, untrustworthy or outdated information; inconsistencies; invalidities; and others. It has been reported that data quality problems cost businesses several billions of dollars each year. Identifying low-quality datasets and devising ways to improve data quality are therefore important tasks.
The Linked Open Data Quality Assessment (LODQA) benchmark consists of five input data sets about Brazilian cities, extracted from the English, Portuguese, Spanish, German and French editions of Wikipedia, together with a gold standard data set obtained from an official source (ibge.gov.br). The input data sets are potentially incomplete, outdated or incorrect. The objective of the benchmark is to compare quality assessment and repair methods for obtaining an integrated view over those data sets that is more complete, concise and consistent than the information from any individual source alone. Systems are challenged to assess the quality of the input data sets and to decide on the correct values based on that assessment. A benchmark driver is provided that compares the obtained values with the gold standard and produces evaluation scores.
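To make the task concrete, the following is a minimal sketch of the fuse-then-evaluate workflow described above. It is not the benchmark driver and does not reproduce its scoring: the city identifiers, the numeric values, the majority-vote fusion rule, and the two metrics (`completeness`, `accuracy`) are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical values of a single numeric property per city, keyed by
# Wikipedia edition. City identifiers and numbers are invented.
sources = {
    "en": {"city_a": 1000, "city_b": 500},
    "pt": {"city_a": 1000, "city_b": 520, "city_c": 70},
    "es": {"city_a": 990},
    "de": {"city_b": 520},
    "fr": {"city_a": 1000, "city_c": 75},
}

# Hypothetical gold standard for the same property.
gold = {"city_a": 1000, "city_b": 520, "city_c": 75}


def fuse_by_majority(sources):
    """Keep, for each city, the value reported by the most sources.

    Ties are resolved in favor of the value seen first, following the
    iteration order of `sources`. This is one simple fusion rule among
    many; a real system would weigh sources by assessed quality.
    """
    fused = {}
    cities = {city: None for values in sources.values() for city in values}
    for city in cities:
        candidates = [vals[city] for vals in sources.values() if city in vals]
        fused[city] = Counter(candidates).most_common(1)[0][0]
    return fused


def evaluate(fused, gold):
    """Return (completeness, accuracy) of the fused view against gold.

    completeness: share of gold cities for which a value was produced.
    accuracy: share of those produced values that equal the gold value.
    """
    covered = [city for city in gold if city in fused]
    completeness = len(covered) / len(gold)
    correct = sum(1 for city in covered if fused[city] == gold[city])
    accuracy = correct / len(covered) if covered else 0.0
    return completeness, accuracy


fused = fuse_by_majority(sources)
completeness, accuracy = evaluate(fused, gold)
print(fused)
print(completeness, accuracy)
```

In this toy data every gold city receives a value, so completeness is 1.0, while the tie for `city_c` is resolved to the first value seen and disagrees with the gold standard, so two of three fused values are correct. A quality-aware method would aim to break such ties using its assessment of each source.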
Please send comments and feedback about the benchmark to Chris Bizer, Pablo N. Mendes and María Poveda Villalón.
This work was supported by the European Commission's Seventh Framework Programme FP7/2007-2013 (PlanetData, Grant 257641) and the Spanish mobility and internationalization program for PhD studies (Orden EDU/2719/2011. Ministerio de Educación).
This work is licensed under a Creative Commons Attribution 3.0 Unported License.