JSON-LD vs RDF/JSON (Alternative Serialization)

I’d like to say that I am rarely opinionated about technology but that would be a barefaced lie. It is true, however, that I am rarely so opinionated that I motivate myself to write about it. This, unfortunately, is one of those occasions.

On January the 16th the JSON-LD specification became a W3C Recommendation. There was also an "Alternative Serialization" of RDF into JSON, referred to as RDF/JSON. This format is not to become a Recommendation:

The RDF Working Group has decided not to push this document through the W3C Recommendation Track. [from http://www.w3.org/TR/rdf-json/]

I think that decision was a mistake, particularly from the perspective of an RDF user. I have no real right to complain, as I was not involved with the working group and was not a member of the mailing list; I can whine now, though.

Differences

In a nutshell:

RDF/JSON (the Alternative Syntax) is very simple. It maps directly onto Notation 3 and the core RDF concepts.

JSON-LD is not specific to RDF, and it has lots of features designed to make it more readable to a human, which also make it more verbose.

Using the syntaxes

I am making an assumption that, in choosing JSON as a format, the user of either syntax is likely to be a JavaScript developer. So, I put together a minimal example of processing RDF written in both syntaxes. In both cases I have mapped the following Turtle into JSON in the tersest way I could; specifically, I did not use the JSON-LD "@context" attribute, in order to give JSON-LD the fairest chance. The Turtle is a single triple:
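<http://example.org/about> <http://purl.org/dc/terms/title> "Anna's Homepage"@en .

Then, for each JSON object, I wrote a simple JavaScript program to convert the RDF back into Turtle. Here are the two programs for comparison: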

Using the “Alternative Syntax”:

function printLiteralTriple(subject, predicate, value, language) {
  document.write("&lt;" + subject + "&gt; &lt;" + predicate + "&gt; \"" + value + "\"@" + language + " .");
  document.write("<br/>");
}

var jsonAlt = {
  "http://example.org/about" : {
      "http://purl.org/dc/terms/title" : [ { "value" : "Anna's Homepage", 
                                             "type" : "literal", 
                                             "lang" : "en" } ] 
  }
}

// every top-level key is a subject; each subject maps predicates to arrays of object values
for (var subject in jsonAlt) {
  for (var predicate in jsonAlt[subject]) {
    for (var objectIndex in jsonAlt[subject][predicate]) {
      var object = jsonAlt[subject][predicate][objectIndex];
      printLiteralTriple(subject, predicate, object.value, object.lang);
    }
  }
}

See the Pen zoFHC by chrismichaelscott (@chrismichaelscott) on CodePen.

Note that the iteration is easy to understand as the keys hold useful content – a common construct in JavaScript – and are homogeneous (i.e. the top level keys are always subjects).

Using JSON-LD:

function printLiteralTriple(subject, predicate, value, language) {
  document.write("&lt;" + subject + "&gt; &lt;" + predicate + "&gt; \"" + value + "\"@" + language + " .");
  document.write("<br/>");
}

var jsonLD = {
  "@id": "http://example.org/about",
  "http://purl.org/dc/terms/title": {
    "@value": "Anna's Homepage",
    "@type": "literal", 
    "@language" : "en"
  }
}

for (var key in jsonLD) {
  var subject; // note: assumes "@id" is encountered before any predicate keys

  if (key == "@id") {
    subject = jsonLD[key];
  } else { // NOTE: there are other cases to check for here, such as @context
    var predicate = key;
    var object = jsonLD[key];

    if (object instanceof Array) {
      for (var objectIndex in object) {
        printLiteralTriple(subject, predicate, object[objectIndex]["@value"], object[objectIndex]["@language"]);
      }
    } else {
      printLiteralTriple(subject, predicate, object["@value"], object["@language"]);
    }
  }
}

See the Pen JaeIG by chrismichaelscott (@chrismichaelscott) on CodePen.

Because the keys in JSON-LD are not homogeneous (i.e. they may be keywords, like “@id”, or they may be predicates) the developer cannot simply iterate over the object.

Computer readable versus human readable

A fully featured JSON-LD document may be more complex than the example above. One can use the "@context" key to provide typing information and prefixes (as they would be in Turtle). That feature can arguably make the document easier for a human to read, but is that really the objective? Look what happens to the JavaScript parser when we include a "@context" – and consider that this is a minimal parser that doesn't support any case other than a literal-valued triple.

var jsonLD = {
  "@context": {
    "title": {
      "@id": "http://purl.org/dc/terms/title",
      "@type": "@value",
      "@language": "en"
    }
  },
  "@id": "http://example.org/about",
  "title": "Anna's Homepage"
}

function printLiteralTriple(subject, predicate, value, language) {
  document.write("&lt;" + subject + "&gt; &lt;" + predicate + "&gt; \"" + value + "\"@" + language + " .");
  document.write("<br/>");
}

var contexts = {};

// first resolve contexts
for (var key in jsonLD) {
  if (key == "@context") {
    if (jsonLD["@context"] instanceof Array) {
      for (var contextIndex in jsonLD["@context"]) {
        for (var context in jsonLD["@context"][contextIndex]) {
          contexts[context] = jsonLD["@context"][contextIndex][context];
        }
      }
    } else {
      for (var context in jsonLD["@context"]) {
        contexts[context] = jsonLD["@context"][context];
      }
    }
  }
}

// now walk the document itself, resolving terms against the contexts map
for (var key in jsonLD) {
  var subject; // as before, assumes "@id" appears before any predicate keys

  if (key == "@id") {
    subject = jsonLD[key];
  } else if (key != "@context") {
    var predicate = key;
    var object = jsonLD[key];
    var objectLanguage;

    if (typeof(contexts[predicate]) !== "undefined") {
      if (typeof(contexts[predicate]["@language"]) !== "undefined") {
        objectLanguage = contexts[predicate]["@language"];
      }
      predicate = contexts[predicate]["@id"];
    }

    if (object instanceof Array) {
      for (var objectIndex in object) {
        var objectValue;
        if (typeof(object[objectIndex]) == "string") {
          objectValue = object[objectIndex];
        } else {
          if (typeof(objectLanguage) == "undefined") {
            objectLanguage = object[objectIndex]["@language"];
          }
          objectValue = object[objectIndex]["@value"];
        }
        printLiteralTriple(subject, predicate, objectValue, objectLanguage);
      }
    } else {
      var objectValue;
      if (typeof(object) == "string") {
        objectValue = object;
      } else {
        if (typeof(objectLanguage) == "undefined") {
          objectLanguage = object["@language"];
        }
        objectValue = object["@value"];
      }
      printLiteralTriple(subject, predicate, objectValue, objectLanguage);
    }
  }
}

See the Pen jiarn by chrismichaelscott (@chrismichaelscott) on CodePen.

Not great, is it? I really hope the working group reconsider. For human-readable RDF use Turtle: it's great at that. For RDF to be consumed by object-oriented languages, use RDF/JSON, the "Alternative Serialization". 🙁

I’m on the Semantic Web! Pt. 2

Ok. So in the last post I was talking about how I created an RDF graph to describe myself and found that I'd entered into a huge rambling geekfest about the design of the FOAF vocabulary. So, I decided to cut all of that out and post it here separately. For context, it follows directly from having created this RDF document. If you're interested, read on…

From that experience of creating my own RDF graph, I had only one hiccup: using the FOAF vocabulary, while it is relatively simple to define a group (such as the company which employs you) and list its members (in that case, staff), it seems impossible to do it the other way around. Essentially, you cannot say "I work for OpenText" but can say "OpenText employs me". I do understand why this is, though: it is fairly standard for predicates to assume a has relationship, not an is one (#me has foaf:name Chris, not is foaf:name Chris), and standards are essential for Linked Data to work.

You may think that the problem described above sounds pretty irrelevant (you may be right: read on), so let me run through my thought process:

Imagine two graphs, one describing me and one describing OpenText. In the OpenText graph there is a list of employees (as there is on Freebase) which includes a reference to my graph. You could, then, search (for the purpose of an example) for the weblogs of OpenText employees fairly successfully. If, however, you were using my graph, you couldn't find a list of my colleagues because I couldn't add "Chris is employed by OpenText" to the graph and, hence, the two could not be connected.

Someone obviously agreed with that assessment, as I discovered the RoleVocab vocabulary on the FOAF wiki. I used that vocab in my personal profile document to assert that "Chris has a role in the organization OpenText".

With hindsight, I think that might have been a mistake, though. My frame of mind was trapped in the resource – the me. Perhaps I should have been thinking about the whole RDF graph. Why couldn't I include a separate resource about OpenText which only included my employment? Well, because the domain of the foaf:member property is foaf:Group and the foaf:Organization type is a direct subclass of foaf:Agent. Essentially, the FOAF vocabulary is saying that you can only be a member of a group, not of an organization. Personally, I think that the most semantically correct way around this issue would be to make foaf:Organization a subclass of foaf:Group or, failing that, to add foaf:Organization as a second rdfs:domain of foaf:member… I may make the suggestion.
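To make the first option concrete, here is the change I have in mind, sketched as an RDF/JSON-style JavaScript object (this is only my proposal, not anything in the published FOAF spec):

var proposedFoafTweak = {
  // one extra triple: foaf:Organization rdfs:subClassOf foaf:Group
  "http://xmlns.com/foaf/0.1/Organization": {
    "http://www.w3.org/2000/01/rdf-schema#subClassOf": [
      { "value": "http://xmlns.com/foaf/0.1/Group", "type": "uri" }
    ]
  }
};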

In the meantime, I've also added an OWL Object Property to the top of my RDF document which describes the predicate "employee", as in "OpenText has employee Chris".
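In RDF/JSON terms, the declaration amounts to something like this (the property URI below is illustrative, not the exact one in my document):

var employeeProperty = {
  // the custom "employee" predicate declared as an owl:ObjectProperty
  "http://chrisscott.me/vocab#employee": {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
      { "value": "http://www.w3.org/2002/07/owl#ObjectProperty", "type": "uri" }
    ]
  }
};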

So: apologies for the geeky and rambling post and please let me know your thoughts on the whole “Group has member Person”/”Person participates in Group” conundrum.

I’m on the Semantic Web!

That's right. About a fortnight ago I decided it was about time to practice what I preach (well, specifically what I was due to preach at last week's excellent ePublishing Innovation Forum) and get myself onto the Semantic Web. For those new to the concept of the Semantic Web, I'm talking about creating an RDF graph which includes a resource describing me.

So, without further ado, here I am:

http://chrisscott.me/about/card#me

The document at the end of that link is a FOAF Personal Profile Document. As you can see, the URI above includes the fragment "me". This is a fairly important part of the Linked Data concept: it satisfies one of the axioms, that the URI is dereferenceable, whilst also identifying a resource, "me", which can be used to link the graph to others. So, if you are curious, take a look at my personal profile and check out the "me" resource – it's pretty simple but a good starting point.
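To illustrate the distinction, here is a heavily cut-down sketch as an RDF/JSON-style JavaScript object (my real card contains much more than this): the document URI and the "me" fragment are two different resources, tied together by foaf:primaryTopic.

var card = {
  // the profile document itself...
  "http://chrisscott.me/about/card": {
    "http://xmlns.com/foaf/0.1/primaryTopic": [
      { "value": "http://chrisscott.me/about/card#me", "type": "uri" }
    ]
  },
  // ...and the person it describes
  "http://chrisscott.me/about/card#me": {
    "http://xmlns.com/foaf/0.1/name": [
      { "value": "Chris Scott", "type": "literal" }
    ]
  }
};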

So, how did I go about creating my personal profile on the Semantic Web? Well, I started with a step I urge everyone to take: I signed up to the Opera community. You can do the same here. Once you've done that, you can go to your profile and click on the "FOAF" link on the right-hand side of the footer:

My profile page in the Opera community.

That’s the quickest and easiest way to get yourself represented on the Semantic Web but for me Opera do not give you enough control. For example, I cannot use the foaf:weblog predicate to point to this blog, only the one which Opera host for me (that said, they do support the rdfs:seeAlso predicate so my private personal profile is referenced by my Opera one). For that reason, I took the XML generated for my Opera community profile, tweaked it a bit and uploaded it onto this domain.

Give it a go! I’d love to hear how people get on…

NB: I ended up going on a bit in the draft of this post about the FOAF vocab design and got a bit technical, so I've separated that content off into this post.

Semantic Web? What’s in it for me?

There's no doubt about it: the Semantic Web is the hottest thing in the on-line industry at the moment. It's all over the web, on the speaker circuits, in multitudes of product labs. On-line publishers are being told again and again that they need to get their content into RDF triples and create linked data. One of the questions they should be asking is: why?

This article assumes some knowledge of RDF, although it does not go into technical details. There are many good sites which introduce RDF, including RDF: about, which I recommend reading.

Okay, so some of the reasons why are obvious. Tim Berners-Lee's vision of linked data tying the WWW together has inarguable and massive benefits. The potential for applications utilizing knowledge gleaned from RDF triples is mind-boggling. One of the points Dame Wendy Hall made at last month's ePublishing forum was that if publishers felt they had missed out by not getting on board with the World Wide Web as sharply as, with hindsight, they would have liked, then now was their opportunity to make up for that. Don't miss the boat twice was the message; start thinking about the Semantic Web now.

And, for me, that sentiment – start thinking about the Semantic Web now – is even more pertinent than, perhaps, it was intended. In the mid-nineties, when the modern Web was taking off, putting content on-line was a risky, uncertain business for publishers. There may have been some publishers who jumped in early and reaped the rewards, some were burned and some joined the party late, but knowing what we do now no publisher would have hesitated. So now the Semantic Web is the big, new thing; largely unknown and poorly understood (aren't all new concepts?). But unlike the boom of the WWW – the scale of which was never predicted, even by TBL – we now do have some concept of the magnitude of what the Semantic Web could achieve. Certainly there is enough hype about it, now, that I, at least, can't imagine the Semantic Web (in some form) not taking off.

So more than just looking at the augmentation of the Web with linked data as another opportunity not to miss the boat, we should be planning what we are going to do with this data. I can see the uses (visualisation?) of RDF triples falling, broadly speaking, into two (not mutually exclusive) categories:

  1. Representations of specific facts
  2. Representations of generic facts

Currently there are a number of examples of interfaces for interacting with linked data available on the web. RKBExplorer is one of the best. There are also numerous examples of geo-data mapping applications, etc. These are representations of specific facts. That is, we have a question in mind and are displaying the answer(s). Take, for example, a set of triples which link articles to their author, in the form:

chris → authored → this post

Using this information a piece of software can now ask the question: who wrote this article? And it would get back the correct answer: me. Now, in reality, this would be an extremely over-simplified knowledge base; a more likely set-up would include a foaf:Person and possibly a bnode referencing some Dublin Core meta-data (don't worry about the terminology). Then the scope of available questions widens dramatically. Where do the colleagues of the person who wrote this article live? Where can I find a photo of the author? By complying with these standard ontologies, software can make pretty accurate assumptions about valid questions to ask.
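As a rough sketch of that question in code, assuming the article data were published in RDF/JSON form and used dcterms:creator (the article URI and the choice of predicate here are mine, purely for illustration):

var articleData = {
  "http://example.org/articles/whats-in-it-for-me": {
    "http://purl.org/dc/terms/creator": [
      { "value": "http://chrisscott.me/about/card#me", "type": "uri" }
    ]
  }
};

// "who wrote this article?" becomes a simple lookup
for (var article in articleData) {
  var authors = articleData[article]["http://purl.org/dc/terms/creator"] || [];
  for (var i = 0; i < authors.length; i++) {
    document.write(article + " was written by " + authors[i].value + "<br/>");
  }
}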

In the same vein, whole new possibilities become achievable in terms of mash-ups. Say I'm writing a review of a new novel. If I can assume that Amazon and all the other big online vendors are producing RDF documents describing their stock, I can simply query for the ISBN, which I know is stored as dc:identifier, and return all prices, which I can assume (for the purpose of an example) are commerce:Price. In short, RDF is a great way of managing distributed data – which is something you'll hear a lot of if you dig into the subject.
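A sketch of that mash-up, again assuming RDF/JSON documents from each vendor, with made-up URIs standing in for the ISBN, the vendor data and the "commerce:Price" predicate:

var isbn = "9780000000000"; // hypothetical ISBN of the novel being reviewed
var vendorDocuments = []; // RDF/JSON objects fetched from each vendor's site

for (var v = 0; v < vendorDocuments.length; v++) {
  var doc = vendorDocuments[v];
  for (var subject in doc) {
    var ids = doc[subject]["http://purl.org/dc/terms/identifier"] || [];
    var prices = doc[subject]["http://example.org/commerce#Price"] || [];
    for (var i = 0; i < ids.length; i++) {
      if (ids[i].value == isbn && prices.length > 0) {
        document.write(subject + " costs " + prices[0].value + "<br/>");
      }
    }
  }
}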

But even with applications utilizing complex webs of linked data in this way they are still only asking predefined questions. “I know how to display a latitude and longitude on a map so I’ll find out those details”. “If a foaf:Person has a picture I’ll display it by their posts”.

The second category of uses I described for RDF triples was the representation of generic facts. This is something I haven't seen done yet (with the possible exception of SPARQL, which is not appropriate for this discussion) but it seems to me, at least, to be an obvious next step. Let me explain…

The beauty of the RDF approach – beyond any other – is that it allows the document owner to describe any fact with computers still able to extract some kind of meaning from it. This is where the predicate of the triple comes in and why using a URI is so important. It goes without saying that if a well-used standard exists for describing any component of the triple then it should be used, but if one doesn't exist you can still describe the fact. I could create my own URI which describes the predicate "ate for lunch", if I so pleased. And then I could publish the fact that I, Chris, ate for lunch beans-on-toast and, in theory, an application with no prior relation to me could understand what I meant (at least to some degree). The application in question would, possibly, not understand what "ate for lunch" means but it could point its user to the URI I created and, hence, explain the fact to them.
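For instance, in RDF/JSON-style JavaScript (the predicate URI below is invented on the spot, which is exactly the point):

var lunchFact = {
  "http://chrisscott.me/about/card#me": {
    "http://chrisscott.me/vocab#ateForLunch": [
      { "value": "Beans on toast", "type": "literal", "lang": "en" }
    ]
  }
};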

Finding new ways to represent these generic facts has to be on the horizon of anyone interested in pushing the Semantic Web into the mainstream. It may be through widgets and apps, it may require a new generation of browser, but it should happen. I have no doubt that the kind of mash-ups and queries that I described as representations of specific facts are achieved much, much more easily using RDF channels for data but, essentially, we could already represent those kinds of links between data. I could build a database of all the authors who write on my site and produce a Google Maps integration to show you where they live. However, I could never – without a unified system of triples – even conceive of displaying arbitrary facts to accompany an article unless someone had manually written them. Certainly, one could not display those facts dynamically; it would be impractical. But, as the RDF standard becomes more popular, allowing applications (widgets, etc.) and search portals to do just that is very much a realistic prospect.

If we, as online publishers, are going to jump, two-footed, into the Semantic Web (which I firmly believe we should) we should also be thinking about our goals and reasons for doing so. No publisher's goal is to help search engines answer a searcher's query without a visit to their site, or to contribute to the building of a knowledge base of unparalleled proportions. No. The goal has to be the same as it always was: to improve the user's experience and to drive web traffic. So, sure, don't get left behind, get content into RDF format, but why stop there? This is the time to be thinking about how to get ahead of the curve, how to use this data. Certainly I am…

About me

Contrary to the massive "Chris Scott" at the top of the page, I'm not a (complete) ego-maniac. I just liked the font and couldn't think of anything more interesting to say.

I'm a passionate developer and entrepreneur. My company Factmint provides an elastic RDF triplestore and a suite of Data Visualization tools, so I largely talk about those things.
