Semantic Web? What’s in it for me?
There’s no doubt about it: the Semantic Web is the hottest thing in the on-line industry at the moment. It’s all over the web, on the speaker circuits, in multitudes of product labs. On-line publishers are being told again and again that they need to get their content into RDF triplets and create linked data. One of the questions they should be asking is: why?
This article assumes some knowledge of RDF, although it does not go into technical details. There are many good sites which introduce RDF, including RDF: about, which I recommend reading.
Okay, so some of the reasons why are obvious. Tim Berners-Lee’s vision of linked data tying the WWW together has inarguable and massive benefits. The potential for applications utilizing knowledge gleaned from RDF triplets is mind-boggling. One of the points Dame Wendy Hall made at last month’s ePublishing forum was that if publishers felt, with hindsight, that they had missed out by not getting on board with the World Wide Web as sharply as they could have, then now was their opportunity to make up for that. “Don’t miss the boat twice” was the message; start thinking about the Semantic Web now.
And, for me, that sentiment (start thinking about the Semantic Web now) is even more pertinent than, perhaps, it was intended. In the mid-nineties, when the modern Web was taking off, putting content on-line was a risky, uncertain business for publishers. There may have been some publishers who jumped in early and reaped the rewards, some who were burned and some who joined the party late but, knowing what we do now, no publisher would have hesitated. So now the Semantic Web is the big, new thing: largely unknown and poorly understood (aren’t all new concepts?). But unlike the boom of the WWW, the scale of which was never predicted, even by TBL, we now do have some concept of the magnitude of what the Semantic Web could achieve. Certainly there is enough hype about it now that I, at least, can’t imagine the Semantic Web (in some form) not taking off.
So, more than just looking at the augmentation of the Web with linked data as another opportunity not to miss the boat, we should be planning what we are going to do with this data. I can see the uses (visualisation?) of RDF triplets falling, broadly speaking, into two (not mutually exclusive) categories:
- Representations of specific facts
- Representations of generic facts
Currently there are a number of examples of interfaces for interacting with linked data available on the web. RKBExplorer is one of the best. There are also numerous examples of geo-data mapping applications, etc. These are representations of specific facts. That is, we have a question in mind and are displaying the answer(s). Take, for example, a set of triplets which link articles to their author, in the form:
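As a rough sketch of the idea, a triplet linking an article to its author can be modelled in plain Python as a (subject, predicate, object) tuple. Every URI below is invented for illustration, and the `objects` helper is a hypothetical stand-in for a real RDF query, not part of any library:

```python
# Hypothetical URIs, made up purely for this illustration.
ARTICLE = "http://example.org/articles/semantic-web"
OTHER_ARTICLE = "http://example.org/articles/another-post"
AUTHOR = "http://example.org/people/chris"
WRITTEN_BY = "http://example.org/terms/writtenBy"

# Each triplet is (subject, predicate, object).
triplets = [
    (ARTICLE, WRITTEN_BY, AUTHOR),
    (OTHER_ARTICLE, WRITTEN_BY, AUTHOR),
]


def objects(triplets, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triplets if s == subject and p == predicate]


# "Who wrote this article?"
authors = objects(triplets, ARTICLE, WRITTEN_BY)
print(authors)
```

A real knowledge base would hold these statements in an RDF serialisation rather than a Python list, but the shape of the data is the same.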
Using this information, a piece of software can now ask the question: who wrote this article? And it would get back the correct answer: me. Now, in reality, this would be an extremely oversimplified knowledge base; a more likely setup would include a foaf:Person and possibly a blank node (bnode) referencing some Dublin Core metadata (don’t worry about the terminology). Then the scope of available questions widens dramatically. Where do the colleagues of the person who wrote this article live? Where can I find a photo of the author? Because the data complies with these standard ontologies, software can make pretty accurate assumptions about which questions are valid to ask.
In the same vein, whole new possibilities become achievable in terms of mash-ups. Say I’m writing a review of a new novel. If I can assume that Amazon and all the other big online vendors are producing RDF documents describing their stock, I can simply query for the ISBN, which I know is stored as dc:identifier, and return all the prices, which I can assume (for the purpose of an example) are commerce:Price. In short, RDF is a great way of managing distributed data, which is something you’ll hear a lot of if you dig into the subject.
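To make the mash-up concrete, here is a minimal sketch in plain Python. It assumes each vendor publishes triplets using dc:identifier for the ISBN; the commerce:Price URI, the vendor URIs, the ISBN and the prices are all invented for the example, and `prices_for_isbn` is a hypothetical helper rather than a real API:

```python
DC_IDENTIFIER = "http://purl.org/dc/elements/1.1/identifier"
COMMERCE_PRICE = "http://example.org/commerce/Price"  # hypothetical vocabulary

ISBN = "isbn:0000000000"  # placeholder, not a real ISBN

# Two independent vendors, each describing their own stock (made-up data).
vendor_a = [
    ("http://vendor-a.example/item/1", DC_IDENTIFIER, ISBN),
    ("http://vendor-a.example/item/1", COMMERCE_PRICE, "7.99"),
    ("http://vendor-a.example/item/2", DC_IDENTIFIER, "isbn:1111111111"),
    ("http://vendor-a.example/item/2", COMMERCE_PRICE, "12.00"),
]
vendor_b = [
    ("http://vendor-b.example/books/42", DC_IDENTIFIER, ISBN),
    ("http://vendor-b.example/books/42", COMMERCE_PRICE, "6.50"),
]


def prices_for_isbn(graphs, isbn):
    """Merge the vendors' triplets and return (item, price) pairs for one ISBN."""
    merged = [t for graph in graphs for t in graph]
    # Items whose dc:identifier matches the ISBN we are reviewing.
    items = {s for s, p, o in merged if p == DC_IDENTIFIER and o == isbn}
    return sorted((s, o) for s, p, o in merged if s in items and p == COMMERCE_PRICE)


prices = prices_for_isbn([vendor_a, vendor_b], ISBN)
for item, price in prices:
    print(item, price)
```

The point of the sketch is that the two vendors never coordinated with each other; sharing a vocabulary is enough for their data to be merged and queried together.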
But even when applications utilize complex webs of linked data in this way, they are still only asking predefined questions. “I know how to display a latitude and longitude on a map, so I’ll find out those details.” “If a foaf:Person has a picture, I’ll display it by their posts.”
The second category of uses I described for RDF triplets was the representation of generic facts. This is something I haven’t seen done yet (with the possible exception of SPARQL, which is not appropriate for this discussion) but it seems, to me at least, to be an obvious next step. Let me explain…
The beauty of the RDF approach, beyond any other, is that it allows the document owner to describe any fact while computers are still able to extract some kind of meaning from it. This is where the predicate of the triplet comes in and why using a URI is so important. It goes without saying that if a well-used standard exists for describing any component of the triplet then it should be used, but if one doesn’t exist you can still describe the fact. I could create my own URI which describes the predicate ‘ate for lunch’, if I so pleased. And then I could publish the fact that I, Chris, ate for lunch beans-on-toast and, in theory, an application with no prior relation to me could understand what I meant (at least to some degree). The application in question would, possibly, not understand what “ate for lunch” means but it could point its user to the URI I created and, hence, explain the fact to them.
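One way to picture this is the sketch below, in plain Python. The ‘ate for lunch’ URI and the person URI are invented, and `describe` is a hypothetical function showing one possible fallback, not how any existing application behaves. Predicates the application recognises get a readable label; anything else is shown alongside its URI so the user can look up what it means:

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
ATE_FOR_LUNCH = "http://example.org/chris/terms/ateForLunch"  # hypothetical URI
CHRIS = "http://example.org/people/chris"  # hypothetical URI

# Predicates this imaginary application already understands.
KNOWN_LABELS = {FOAF_NAME: "is named"}


def describe(triplet, known_labels=KNOWN_LABELS):
    """Render a triplet as text, pointing to the predicate URI when it is unknown."""
    subject, predicate, obj = triplet
    label = known_labels.get(predicate)
    if label is not None:
        return f"{subject} {label} {obj}"
    return (
        f"{subject} <{predicate}> {obj} "
        f"(see {predicate} for what this relationship means)"
    )


facts = [
    (CHRIS, FOAF_NAME, "Chris"),
    (CHRIS, ATE_FOR_LUNCH, "beans-on-toast"),
]
lines = [describe(fact) for fact in facts]
for line in lines:
    print(line)
```

The application never needed to be told about lunch in advance; the URI carries enough for it to hand the question on to the user.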
Finding new ways to represent these generic facts has to be on the horizon of anyone interested in pushing the Semantic Web into the mainstream. It may be through widgets and apps, it may require a new generation of browser, but it should happen. I have no doubt that the kinds of mash-ups and queries that I described as representations of specific facts are achieved much, much more easily using RDF channels for data but, essentially, we could already represent those kinds of links between data. I could build a database of all the authors who write on my site and produce a Google Maps integration to show you where they live. However, I could never, without a unified system of triplets, even conceive of displaying arbitrary facts to accompany an article unless someone had manually written them. Certainly, one could not display those facts dynamically; it would be impractical. But, as the RDF standard becomes more popular, allowing applications (widgets, etc.) and search portals to do just that is very much a realistic prospect.
If we, as online publishers, are going to jump, two-footed, into the Semantic Web (which I firmly believe we should) we should also be thinking about our goals and reasons for doing so. No publisher’s target is to help search engines answer a searcher’s query without visiting their site, or to contribute to the building of a knowledge base of unparalleled proportions. No. The goal has to be the same as it always was: to improve the user’s experience and to drive web traffic. So, sure, don’t get left behind, get content into RDF format, but why stop there? This is the time to be thinking about how to get ahead of the curve, how to use this data. Certainly I am…