Liberating Tag Collections

April 05, 2010

The goal of the semantic web is to construct a web of semantically annotated information that allows machines to find, combine, and process information. The vision for the semantic web requires ontologies that define relevant terms and their relations. While the semantic web has made only slow progress, another form of annotation has taken over the (social) web through sites such as Flickr. Tagging is a lightweight form of semantic annotation in which simple terms (keywords) are associated with an artifact. Tags are chosen on the fly by users and don’t have to follow a predefined ontology. As a result, tag systems (folksonomies) can develop quickly with emerging terminologies. Furthermore, tags do not impose a Linnaean hierarchy on terms. On the other hand, tagging systems suffer from ambiguity, synonyms, and mismatches in classification level (one user tags a photo “animal”, another “dog”, a third “poodle”). Despite these problems, tagging systems seem to be very successful on the social web. (Are there studies that indicate that tagging systems do indeed help users find stuff more effectively?)

This analysis forms the introduction to the paper on TagFusion, which notes that a major downside of the tag collections of popular social media sites is that they don’t provide the interoperability envisioned for the (semantic) web. That is, tags assigned to artifacts on one site cannot be linked to tags on other sites. The paper proposes to address this problem by means of a meta-tagging facility that collects tags from various sites and links them, allowing various forms of data mining to be applied.
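To make the idea of a meta-tagging facility concrete, here is a minimal sketch of what collecting and linking tags could look like. It is not TagFusion’s design, which I have not seen implemented; it assumes tags arrive as (site, artifact, tag) triples and links them across sites by a crude string normalization. The function names, the normalization rule, and the sample data are all made up for illustration. Co-occurrence counting stands in for “various forms of data mining”.

```python
from collections import defaultdict
from itertools import combinations


def normalize(tag: str) -> str:
    """Crude normalization (an assumption): lowercase and keep only letters/digits,
    so "Semantic Web" and "semantic-web" map to the same key."""
    return "".join(ch for ch in tag.lower().strip() if ch.isalnum())


def link_tags(assignments):
    """Group (site, artifact, tag) triples by normalized tag.

    Returns a dict from normalized tag to the set of (site, raw tag) pairs
    sharing it, i.e. the cross-site links."""
    links = defaultdict(set)
    for site, _artifact, tag in assignments:
        links[normalize(tag)].add((site, tag))
    return dict(links)


def cooccurrence(assignments):
    """Count how often two normalized tags are attached to the same artifact.

    Artifact ids are only unique per site, so artifacts are keyed by (site, id)."""
    tags_per_artifact = defaultdict(set)
    for site, artifact, tag in assignments:
        tags_per_artifact[(site, artifact)].add(normalize(tag))
    counts = defaultdict(int)
    for tags in tags_per_artifact.values():
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical sample data from two sites.
assignments = [
    ("flickr", "photo-1", "Semantic Web"),
    ("flickr", "photo-1", "rdf"),
    ("delicious", "bookmark-7", "semantic-web"),
    ("delicious", "bookmark-7", "RDF"),
]
links = link_tags(assignments)
co = cooccurrence(assignments)
```

Here `links["semanticweb"]` connects the Flickr tag “Semantic Web” with the Delicious tag “semantic-web”, and `co[("rdf", "semanticweb")]` is 2. String normalization only catches spelling variants. It does nothing about the ambiguity, synonym, and classification-level problems mentioned above, which is where a real linking facility would have to do the hard work.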

While the idea is worthy, it is not clear whether TagFusion will help. First, it seems to be an idea rather than a working system; the project page does not link to an implementation. More importantly, as the paper notes, citing Tim Berners-Lee, the web is more a social construction than a technological one. The architecture of TagFusion seems to require sites with tag collections to publish their tags to the TagFusion system, so such an enterprise can only work if those sites cooperate (e.g. by providing webhooks). To achieve better interoperability, it would be useful if more sites provided an API that allows arbitrary sites to hook into their content. (It is not yet clear to me what such an API would look like, though. But I am thinking about an API for various aspects of researchr, including tagging; suggestions are welcome.)
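As one possible starting point for the cooperation that TagFusion depends on, a site could push one small JSON event per tag assignment to a webhook, and the aggregator could fold these events into its own view of the site’s tags. The sketch below assumes a hypothetical event schema with the fields `site`, `artifact`, `tag`, and `action`. It is neither TagFusion’s protocol nor an existing researchr API; the names `TagEvent`, `parse_tag_event`, and `apply_event` are illustrative only.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("site", "artifact", "tag", "action")


@dataclass(frozen=True)
class TagEvent:
    """One tag assignment pushed by a participating site (hypothetical schema)."""
    site: str
    artifact: str  # URL of the tagged artifact, so other sites can link to it
    tag: str
    action: str    # "add" or "remove"


def parse_tag_event(body: str) -> TagEvent:
    """Decode and validate a webhook body; raise ValueError on malformed input."""
    data = json.loads(body)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("event must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["action"] not in ("add", "remove"):
        raise ValueError(f"unknown action: {data['action']!r}")
    return TagEvent(**{f: str(data[f]) for f in REQUIRED_FIELDS})


def apply_event(store: dict, event: TagEvent) -> None:
    """Keep the aggregator's mapping (site, artifact) -> set of tags up to date."""
    tags = store.setdefault((event.site, event.artifact), set())
    if event.action == "add":
        tags.add(event.tag)
    else:
        tags.discard(event.tag)
```

A push-style design like this keeps the aggregator’s copy current without polling. The alternative the paragraph hints at, a pull-style API that any site can query, would expose the same kind of (site, artifact, tag) records but let consumers fetch them on their own schedule.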

From a researchr review of “TagFusion: A System for Integration and Leveraging of Collaborative Tags” by Milan Stankovic and Jelena Jovanović.