DBpedia is the Semantic Web version of Wikipedia and enables you to ask sophisticated queries against Wikipedia data. The flickr wrappr extends DBpedia with RDF links to photos posted on flickr. For each of the 1.95 million DBpedia concepts, the wrappr generates a collection of flickr photos that depict the concept.
Contents
- Introduction
- Some Examples
- Usage
- Architecture
- Technical Notes
- Linkage with DBpedia.org
- Geo-only queries
- Sourcecode
- Feedback
News
- 2009/04/27: flickr™ wrappr now only returns images licensed under CC-BY, CC-NC, CC-NC-SA, CC-SA licenses, that is, images that may be used in derivative works. This option has been active for a while and has now been made configurable using the ONLY_CC_DERIV constant.
- 2008/01/24: Switched to xsd:float datatype for geocoordinates; resolved querying issues.
- 2007/10/04: Added foaf:page links to Flickr photo pages; resolved encoding issues that impacted results.
- 2007/09/14: Sourcecode released; error handling improved.
- 2007/09/11: Added support for geo-only queries.
- 2007/09/10: Initial release.
1. Introduction
flickr is one of the world's largest public photo repositories. It allows users to label their photos and apply a range of tag types, including geographical coordinates (geotags). To keep things simple for the photographer, there are no explicit rules governing naming, level of detail, or relevance. As a result, labels may be provided in many languages, and accuracy as well as relevance vary greatly. Additionally, ambiguities arise when photos are labeled using only generic terms (e.g. capitol), which prevents semantic matching algorithms from deriving exact statements about them (e.g. "this is a picture of the Washington Capitol") in the absence of additional clues.
Geotags, for their part, are often applied in batches; for example, a user might place all of their Berlin vacation pictures on a single spot. To prevent misinterpretation in these cases, Flickr stores the zoom level at which a picture was geotagged as an accuracy measure and considers it during the search process. For example, geographical searches using the Flickr API return only images tagged at street-level view by default. However, this cannot prevent situations where users associate all their vacation photos with a specific landmark at street level.
The solution lies in combining search by topic and search by geographic location. Wikipedia provides an enormous collection of semi-structured content, from which the DBpedia.org project extracts structured information. As a result, DBpedia.org is able to provide multilingual labels and geographic location for many topics. By matching labels and geographic locations from both worlds, we can obtain highly relevant photos for many topics (that is, if they have an article in Wikipedia).
Photo collections are interlinked from DBpedia using dbpedia:hasPhotoCollection RDF links. You can use Semantic Web browsers like Marbles, Tabulator or DISCO to follow these links from a DBpedia concept to the photo collection depicting it.
2. Some Examples
- White House
- http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/White_House (View with Marbles)
- vs. Flickr search with quotes
- Includes many unrelated pictures
- vs. Flickr search without quotes
- Even more unrelated pictures
- vs. Flickr geo search (50 meters radius; default accuracy = almost street level)
- Includes many pictures that were incorrectly geo-tagged
- Brandenburg Gate
- http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Brandenburg_Gate (View with Marbles)
- vs. Flickr search with quotes
- Good results
- vs. Flickr geo search (50 meters radius; maximum accuracy = street level)
- Includes a large collection of pictures taken all over Berlin (incorrect geotags)
3. Usage
The service is accessible at http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Entry
where Entry denotes the identifier of any article in the English Wikipedia, which is the last segment of the article's URL (e.g. Berlin in http://en.wikipedia.org/wiki/Berlin). For redirected articles (designated by the line Redirected from ... beneath the article title), the target article's identifier must be provided.
Results are returned in RDF using foaf:depiction predicates. Using HTTP content negotiation, HTML clients are provided with a preview, while RDF browsers are delivered plain RDF/XML data. This behaviour may be overridden by appending the parameter ?format=rdf or ?format=xhtml.
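The lookup URL and the two ways of selecting a representation can be sketched as follows. This is a minimal illustration, not part of the wrappr itself: the helper names are hypothetical, replacing spaces with underscores and percent-encoding the identifier are assumptions, and application/rdf+xml is assumed to be the media type an RDF client would send in its Accept header. The request is only prepared, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Base URL taken from the Usage section above.
BASE = "http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/"


def photo_collection_url(entry, fmt=None):
    """Build the lookup URL for a Wikipedia article identifier.

    fmt may be None (rely on content negotiation), "rdf" or "xhtml".
    """
    if fmt not in (None, "rdf", "xhtml"):
        raise ValueError("fmt must be None, 'rdf' or 'xhtml'")
    # Assumption: identifiers use underscores instead of spaces.
    url = BASE + quote(entry.replace(" ", "_"))
    return url if fmt is None else f"{url}?format={fmt}"


def rdf_request(entry):
    """Prepare (but do not send) a request asking for RDF/XML via the Accept header."""
    return Request(
        photo_collection_url(entry),
        headers={"Accept": "application/rdf+xml"},
    )
```

For example, photo_collection_url("Brandenburg Gate", "rdf") yields the Brandenburg_Gate URL from the examples above with ?format=rdf appended.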
4. Architecture
The flickr™ wrappr is implemented as a small PHP script (250 lines of code). Whenever the script gets a lookup call for a Wikipedia entry, it queries DBpedia.org's SPARQL endpoint for related multilingual labels and geo-coordinates. This information is provided to the Flickr Search API, and results are generated in RDF using foaf:depiction predicates.
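The two ends of this pipeline can be sketched in Python (the actual script is PHP and is not reproduced here). The sketch assumes that labels are exposed as rdfs:label and coordinates as geo:lat / geo:long from the W3C WGS84 vocabulary, and that the DBpedia resource is the subject of the foaf:depiction statements; the function names and the example photo URL are hypothetical.

```python
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
FOAF_DEPICTION = "http://xmlns.com/foaf/0.1/depiction"


def build_lookup_query(entry):
    """Return a SPARQL query for the labels and (optional) coordinates of an entry.

    Assumption: the property names below are illustrative; the wrappr's
    real query may use different properties.
    """
    resource = DBPEDIA_RESOURCE + entry
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?label ?lat ?long WHERE {{
  <{resource}> rdfs:label ?label .
  OPTIONAL {{ <{resource}> geo:lat ?lat ; geo:long ?long . }}
}}
""".strip()


def depiction_triples(entry, photo_urls):
    """Serialise photo URLs as N-Triples lines using foaf:depiction."""
    subject = DBPEDIA_RESOURCE + entry
    return [f"<{subject}> <{FOAF_DEPICTION}> <{url}> ." for url in photo_urls]
```

The OPTIONAL block reflects that the lookup should still return labels for entries without geo-coordinates.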
The wrappr implements content type negotiation in accordance with Linking Open Data Recommendations.
5. Technical Notes
- All searches are performed with exact term matches (search terms in quotes) in order to provide only highly relevant results.
- When Wikipedia does not list geo-coordinates, or when we don't recognize them, we omit the geographic restriction during the Flickr search.
- Search is currently performed within a 1 km radius of the coordinates found in Wikipedia. This leads to poor results when the requested term is a region (city, country etc.).
- We return at most 30 images per request, in line with our interpretation of the Flickr TOS.
- We don't perform searches in additional languages once at least 15 pictures have been found.
- As an alternative to geo-coordinates, the name of the relevant city could be used as a search parameter, which would achieve similar results with significantly more photos. However, requiring both the label and the geo-coordinates to match seems to be a good indicator of precision. This approach would also be more complex: Wikipedia currently lacks an established mechanism for specifying a resource's city, so it would have to be derived from the geo-coordinates in an additional step. Furthermore, the city name may require translation as well, multiplying the number of searches performed per query.
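The search rules listed above can be summarised in a short sketch. It is an illustration under stated assumptions rather than the wrappr's actual code: the search callable is a hypothetical stand-in for the Flickr Search API, labels are assumed to arrive in the order in which languages should be tried, and duplicate photos across languages are assumed to be dropped.

```python
MAX_RESULTS = 30      # images returned per request
ENOUGH_RESULTS = 15   # no further languages are searched beyond this
RADIUS_KM = 1         # radius around the coordinates found in Wikipedia


def collect_photos(labels, coords, search):
    """Collect photos for multilingual labels.

    labels: label strings, one per language.
    coords: (latitude, longitude) tuple, or None when no coordinates are
            known, in which case the geographic restriction is omitted.
    search: hypothetical callable (text, coords, radius_km) -> list of photos.
    """
    photos = []
    seen = set()
    for label in labels:
        if len(photos) >= ENOUGH_RESULTS:
            break  # enough pictures found, skip the remaining languages
        # Exact term match: the label is wrapped in quotes.
        for photo in search(f'"{label}"', coords, RADIUS_KM):
            if photo not in seen:
                seen.add(photo)
                photos.append(photo)
    return photos[:MAX_RESULTS]
```

Passing the search function in as a parameter keeps the sketch independent of any particular Flickr client library.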
6. Linkage with DBpedia.org
A separate tool creates http://dbpedia.org/property/hasPhotoCollection links for all articles of the English Wikipedia. These will soon be published and imported into the public DBpedia.org endpoint, enabling clients to obtain current photos for a given entry on demand.
7. Geo-only queries
7.1 Introduction
The flickr wrappr also supports pure geographic queries. In this case, no reconciliation with DBpedia.org is performed; the results correspond to the Flickr world map.
This service is accessible at http://www4.wiwiss.fu-berlin.de/flickrwrappr/location/latitude/longitude/radius where
- latitude and longitude are decimal degrees according to the World Geodetic System 1984 (WGS 84)
- radius is the search range from the specified point in meters (max: 100,000)
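A geo-only request URL following this pattern could be built as shown below. This is a minimal sketch: the function name is hypothetical, the latitude and longitude range checks are the usual WGS 84 bounds rather than something the service documents, and rejecting a non-positive radius is an assumption.

```python
GEO_BASE = "http://www4.wiwiss.fu-berlin.de/flickrwrappr/location/"
MAX_RADIUS_M = 100_000  # maximum radius stated above, in meters


def location_url(latitude, longitude, radius):
    """Build a geo-only query URL from decimal WGS 84 coordinates and a radius in meters."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if not 0 < radius <= MAX_RADIUS_M:
        raise ValueError(f"radius must be between 1 and {MAX_RADIUS_M} meters")
    return f"{GEO_BASE}{latitude}/{longitude}/{radius}"
```

For instance, location_url(52.5, 13.4, 500) (arbitrary example coordinates) produces a URL ending in /location/52.5/13.4/500.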
7.2 Examples
- Photos taken around Free University of Berlin:
- Photos taken around Central Park, New York:
7.3 Notes
- RDF-aware browsers are redirected to RDF representations at http://www4.wiwiss.fu-berlin.de/flickrwrappr/data/photosDepictingLocation/latitude/longitude/radius; all others are redirected to HTML representations at http://www4.wiwiss.fu-berlin.de/flickrwrappr/page/photosDepictingLocation/latitude/longitude/radius.
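The redirect decision described in this note can be sketched as a small function over the Accept header. It is a simplified illustration, not the wrappr's implementation: the function name is hypothetical, only application/rdf+xml is treated as an RDF media type, and quality values (q=...) are ignored.

```python
DATA_BASE = "http://www4.wiwiss.fu-berlin.de/flickrwrappr/data/photosDepictingLocation/"
PAGE_BASE = "http://www4.wiwiss.fu-berlin.de/flickrwrappr/page/photosDepictingLocation/"

# Assumption: media types that mark a client as RDF-aware.
RDF_TYPES = ("application/rdf+xml",)


def redirect_target(accept_header, latitude, longitude, radius):
    """Pick the RDF or HTML representation URL based on the Accept header."""
    # Strip parameters such as ";q=0.9" and normalise case.
    accepted = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    base = DATA_BASE if any(t in RDF_TYPES for t in accepted) else PAGE_BASE
    return f"{base}{latitude}/{longitude}/{radius}"
```

A production implementation would normally also weigh q-values, so that a client preferring HTML but tolerating RDF still receives the HTML page.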
8. Sourcecode
The sourcecode is available in the DBpedia.org SVN repository at https://dbpedia.svn.sourceforge.net/svnroot/dbpedia/related_apps/flickrwrappr.
9. Feedback
We are very interested in hearing your opinion about this service. Please send comments to Chris Bizer and Christian Becker.
Further information about our work in the area of the Semantic Web/Web-of-Data can be found at
List of our other open source projects @ Freie Universität Berlin
