Semantic Web? What’s in it for me?

There’s no doubt about it: the Semantic Web is the hottest thing in the online industry at the moment. It’s all over the web, on the speaker circuit, in multitudes of product labs. Online publishers are being told again and again that they need to get their content into RDF triples and create linked data. One of the questions they should be asking is: why?

This article assumes some knowledge of RDF, although it does not go into technical detail. There are many good sites which introduce RDF, including RDF: about, which I recommend reading.

Okay, so some of the reasons why are obvious. Tim Berners-Lee’s vision of linked data tying the WWW together has inarguable and massive benefits. The potential for applications utilizing knowledge gleaned from RDF triples is mind-boggling. One of the points Dame Wendy Hall made at last month’s ePublishing forum was that publishers who feel, with hindsight, that they missed out by not getting on board with the World Wide Web quickly enough now have an opportunity to make up for it. Don’t miss the boat twice was the message; start thinking about the Semantic Web now.

And, for me, that sentiment – start thinking about the Semantic Web now – is even more pertinent than perhaps it was intended to be. In the mid-nineties, when the modern Web was taking off, putting content online was a risky, uncertain business for publishers. Some publishers jumped in early and reaped the rewards, some were burned, and some joined the party late; but knowing what we do now, no publisher would have hesitated. So now the Semantic Web is the big, new thing: largely unknown and poorly understood (aren’t all new concepts?). But unlike the boom of the WWW – the scale of which was never predicted, even by TBL – we do now have some concept of the magnitude of what the Semantic Web could achieve. Certainly there is enough hype about it now that I, at least, can’t imagine the Semantic Web (in some form) not taking off.

So, more than just looking at the augmentation of the Web with linked data as another boat not to miss, we should be planning what we are going to do with this data. I can see the uses (visualisations?) of RDF triples falling, broadly speaking, into two (non-mutually-exclusive) categories:

  1. Representations of specific facts
  2. Representations of generic facts

Currently there are a number of examples of interfaces for interacting with linked data available on the web; RKBExplorer is one of the best. There are also numerous examples of geo-data mapping applications, and so on. These are representations of specific facts: we have a question in mind and are displaying the answer(s). Take, for example, a set of triples which link articles to their authors, in the form:

<chris> <authored> <this post>

Using this information a piece of software can now ask the question: who wrote this article? And it would get back the correct answer: me. Now, in reality, this would be an extremely oversimplified knowledge base; a more likely setup would include a foaf:Person and possibly a bnode referencing some Dublin Core metadata (don’t worry about the terminology). Then the scope of available questions widens dramatically. Where do the colleagues of the person who wrote this article live? Where can I find a photo of the author? By complying with these standard ontologies, software can make pretty accurate assumptions about valid questions to ask.
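That kind of question-answering can be sketched in a few lines of plain Python, using tuples as a stand-in for a real RDF store (in practice you’d use an RDF library and SPARQL; all the URIs below are invented for illustration):

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple.
# Every URI here is made up for the example.
triples = [
    ("http://example.org/post/42", "http://purl.org/dc/terms/creator",
     "http://example.org/people/chris"),
    ("http://example.org/people/chris", "http://xmlns.com/foaf/0.1/name",
     "Chris"),
    ("http://example.org/people/chris", "http://xmlns.com/foaf/0.1/img",
     "http://example.org/chris.jpg"),
]

def match(s=None, p=None, o=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Who wrote this article?
author = match(s="http://example.org/post/42",
               p="http://purl.org/dc/terms/creator")[0][2]

# Where can I find a photo of the author?
photo = match(s=author, p="http://xmlns.com/foaf/0.1/img")[0][2]
```

The point is the pattern: because the predicates come from shared vocabularies (Dublin Core, FOAF), software that has never seen this particular data can still know which questions make sense to ask of it.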

In the same vein, whole new possibilities become achievable in terms of mash-ups. Say I’m writing a review of a new novel. If I can assume that Amazon and all the other big online vendors are producing RDF documents describing their stock, I can simply query for the book’s ISBN – which I know is stored as dc:identifier – and return all the prices, which I can assume (for the purpose of an example) are stored as commerce:price. In short, RDF is a great way of managing distributed data – which is something you’ll hear a lot if you dig into the subject.
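That mash-up can be sketched the same way. Here each vendor’s RDF document is modelled as a list of triples; commerce:price is, as in the text, an invented predicate, and the vendor names and ISBN are made up:

```python
# Each vendor publishes its own graph of (subject, predicate, object)
# triples; predicates are abbreviated and invented for the example.
vendor_graphs = {
    "amazon": [
        ("book/1", "dc:identifier", "978-0-00-000000-2"),
        ("book/1", "commerce:price", "7.99"),
    ],
    "other_shop": [
        ("itemX", "dc:identifier", "978-0-00-000000-2"),
        ("itemX", "commerce:price", "6.49"),
    ],
}

def prices_for_isbn(isbn):
    """Per vendor, find the price of whichever item carries this ISBN."""
    results = {}
    for vendor, graph in vendor_graphs.items():
        # Subjects whose dc:identifier matches the ISBN we care about.
        items = {s for s, p, o in graph if p == "dc:identifier" and o == isbn}
        for s, p, o in graph:
            if s in items and p == "commerce:price":
                results[vendor] = o
    return results
```

Note that the vendors never agreed on item identifiers (`book/1` vs `itemX`); the shared predicates alone are enough to join the data – which is the distributed-data argument in miniature.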

But even applications utilizing complex webs of linked data in this way are still only asking predefined questions. “I know how to display a latitude and longitude on a map, so I’ll find out those details.” “If a foaf:Person has a picture, I’ll display it by their posts.”

The second category of uses I described for RDF triples was the representation of generic facts. This is something I haven’t seen done yet (with the possible exception of SPARQL, which is beyond the scope of this discussion) but it seems, to me at least, an obvious next step. Let me explain…

The beauty of the RDF approach – beyond any other – is that it allows the document owner to describe any fact, with computers still able to extract some kind of meaning from it. This is where the predicate of the triple comes in, and why using a URI is so important. It goes without saying that if a well-used standard exists for describing any component of the triple then it should be used, but if one doesn’t exist you can still describe the fact. I could create my own URI which describes the predicate ‘ate for lunch’, if I so pleased. And then I could publish the fact that I, Chris, ate beans-on-toast for lunch and, in theory, an application with no prior relation to me could understand what I meant (at least to some degree). The application in question would possibly not understand what “ate for lunch” means, but it could point its user to the URI I created and, hence, explain the fact to them.
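A minimal sketch of that idea: a home-made predicate URI, and a consuming application that falls back to showing the raw URI when it has no built-in label for the relationship (the domain and vocabulary URI below are, of course, invented):

```python
# A home-made predicate. An application that has never seen it can still
# show its user the URI, which, when dereferenced, explains the relation.
ATE_FOR_LUNCH = "http://chris.example.org/vocab#ateForLunch"

fact = ("http://chris.example.org/me", ATE_FOR_LUNCH, "beans on toast")

def describe(triple):
    """Render a triple for a human, exposing the predicate URI when we
    have no friendly label for it."""
    known_labels = {"http://xmlns.com/foaf/0.1/knows": "knows"}
    s, p, o = triple
    label = known_labels.get(p, f"<{p}>")  # fall back to the raw URI
    return f"{s} {label} {o}"
```

The application understands nothing about lunch, but it degrades gracefully: the fact is still displayable, and the URI gives the user a route to the meaning.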

Finding new ways to represent these generic facts has to be on the horizon of anyone interested in pushing the Semantic Web into the mainstream. It may be through widgets and apps, it may require a new generation of browser, but it should happen. I have no doubt that the kinds of mash-ups and queries that I described as representations of specific facts are achieved much, much more easily using RDF channels for data but, essentially, we could already represent those kinds of links between data. I could build a database of all the authors who write on my site and produce a Google Maps integration to show you where they live. However, without a unified system of triples, I could never even conceive of displaying arbitrary facts to accompany an article unless someone had manually written them. Certainly, one could not display those facts dynamically; it would be impractical. But, as the RDF standard becomes more popular, allowing applications (widgets, etc.) and search portals to do just that is very much a realistic prospect.
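What “displaying arbitrary facts dynamically” might look like, in the same toy-triple style: instead of asking a predefined question, a widget simply gathers every fact whose subject is the article, follows the object links a hop, and renders whatever turns up (predicates and identifiers invented for illustration):

```python
# A widget that asks no predefined question: it collects every fact
# about an article, following object links a limited number of hops.
triples = [
    ("post/7", "dc:creator", "people/chris"),
    ("people/chris", "foaf:based_near", "Brighton"),
    ("people/chris", "ex:ateForLunch", "beans on toast"),
]

def facts_about(subject, depth=1):
    """Gather all facts about a subject, following objects `depth` hops."""
    found = [t for t in triples if t[0] == subject]
    if depth > 0:
        for _, _, obj in list(found):
            found += facts_about(obj, depth - 1)
    return found
```

Which facts come back depends entirely on what has been published about the article and its author; nothing here was written for this particular page, which is exactly the property the specific-fact mash-ups lack.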

If we, as online publishers, are going to jump, two-footed, into the Semantic Web (and I firmly believe we should), we should also be thinking about our goals and reasons for doing so. No publisher’s target is to help search engines answer a searcher’s query without a visit to their site, or to contribute to building a knowledge base of unparalleled proportions. No. The goal has to be the same as it always was: to improve the user’s experience and to drive web traffic. So, sure, don’t get left behind; get content into RDF format. But why stop there? This is the time to be thinking about how to get ahead of the curve, how to use this data. Certainly I am…

7 Responses to “Semantic Web? What’s in it for me?”

  1. Martin Brousseau says:

    Excellent thought Chris! This is exactly the kind of question we first need to ask ourselves in order to promote the real value of linked data and the overall vision formulated 20 years ago by Tim Berners-Lee.
    Standards and technologies behind this vision are becoming mainstream but the business value of it is still fuzzy, like any new “gadget”. Here at Nstein, we have a role to play in this, to clearly show what this value is, where it comes from and how to implement it.
    I think we should have a wide-angle view of all the possibilities these standards and technologies offer in the online industry. Also, I strongly think that – as of today – the Semantic Web is largely about consuming linked data sources, and that linked data sources need to be semantically enriched with text-mining and text-analytics solutions in order to annotate unstructured content with structured and rich semantic metadata. Among these Semantic Web technologies, TME5, Nstein’s text-mining engine, will be a key player!
    In the near future, one of the challenges will be to keep these linked data sources accurate and up to date in a timely fashion. This is where the immediate value of any linked data will come from. Here again, we have some ideas. The best “maintenance” tool will be the winner. 😉

  2. Diane Burley says:

    Nice exposition, Chris! It seems to me that speciality publishers have more to gain than most, since they can create URIs to describe activities germane to their subject – allowing them to be the “authority file” of that world, so to speak, which is the role they used to play until the democratization of the Web. Since no one knows when the Semantic Web will catch fire – it is merely smoldering in small circles – it may be possible for publishers to wait. However, can they afford to play catch-up twice? And that answer is definitively no.

  3. […] In his recent exemplary article, my colleague Chris Scott posted the question ‘Semantic Web ? … and whilst I don’t intend to retread what he describes in great detail, there is much in there that will help us here, as we’re beginning to make the journey towards the world of ‘Linked Data’. […]

  4. You are right to worry a bit about why we want to be on the Semantic Web, although I must say the “West coast Internet” strategy – which has led to a lot of cool things, not to mention wealth – is often “jump first, figure out the monetization later”. What mainstream publishers are failing to grasp is that the Internet means you are in new businesses. Google has a killer search engine as a means to collect detailed data about people, their relationships to each other, and their content interests. This is also why they run YouTube at a net loss but really don’t seem terribly upset about it. They understand that what I view on YouTube gives them lots of possible monetization opportunities, directly AND indirectly.

    Publishers have got to realize that the content they deliver is only a small fraction of what they need to monetize on the Web. You need to understand the relationship of the content to other content and the people consuming it. As it stands, most publishers have such a rudimentary understanding of the content and their audience that they can’t even consider alternate mechanisms of monetization.

    Google built up a lot of data around search, figured out how to use that data to do things like suggest good content for me… but then also use that data to monetize advertising. How many publishers have even tried to build a framework for understanding content consumption? Without that they rely on third parties to do their advertising.

    Funny you talk about the Semantic Web as “the hottest thing.” I wonder if this time it is for real? I am a bit older than you and recall that this was all the rage in the late 90s with the XML community: http://www.xml.com/pub/a/2001/03/07/buildingsw.html. But this time around the level of interest may be a bit higher…

  5. chris says:

    @Christopher Hill
    It’s the hottest in terms of what I hear at conferences, tutorials, etc. Even more so than the social media buzz, I would say. But sure, it’s nothing new; didn’t the use of triples as facts start back in the ’80s, in Pascal?

  6. Guy Valerio says:

    Good post Chris.

    @Martinbrousseau
    > “The best “maintenance” tool will be the winner. 😉 ”

    Great point.

    Guy

  7. […] video based content. And, of course, all of the data from a particular clip can integrate into the Semantic Web seamlessly. RDF links and TME generated relations could easily be used to automate the association […]
