The future of video on the web

I’m getting rather excited about video media online. We’re on the cusp of a revolution in the way we produce and consume the medium.

I was working on a project recently which involved video content. It struck me that, although we have come on no end in terms of our ability to distribute video over the web in the last half decade, video content still has huge holes in the orthodox functionalities of more established media.

Most obviously, there is the dependency upon external codecs (i.e. not native to the browser). The solution to which, in the most case, is a Flash player. There are numerous Flash players available freely and cheaply on the web; they can usually play most of the common video types and depend only upon a single plugin, Flash. YouTube is probably the best known example of using Flash to play videos.

This approach creates problems all of it’s own, though:

  • Flash players still have a dependency upon a browser plugin.
  • The binary video – the original file – is not transparently available in the way that images and text are.
  • Flash does not always cohere with de facto web standards: you cannot apply CSS to Flash, it does not respect z-indexes of objects (ever seen a drop-down menu disappear underneath a Flash component?).
  • It does not have a full set of properties directly accessible for the content it wraps, as a other elements in a pages DOM do.

Don’t get me wrong, Flash has it’s place in the modern web. It is a fantastic platform for RIAs and rich, animated and interactive components of web sites. However as far as video presentation goes it is, essentially, a hack.

These drawbacks for video (and, in fact, audio) presentation, manipulation and playback have not gone unnoticed. One of the most important changes for HTML5 – first drafted back in January 2008 – is the handling of these mediums with the <video> and <audio> tags, now supported in both Gecko and Webkit.

The initial specifications for HTML5 recommended the lossy Ogg codecs for audio and video:

“User agents should support Ogg Theora video and Ogg Vorbis audio, as well as the Ogg container format”

The reasoning behind this drive for a single format seems obvious enough. Going-it-alone doesn’t really work as far as web standards are concerned (does it IE?). There were, however, some objections as to the choice of codec, namely from Apple and Nokia. The details of the complaints are not really relevant to this article but can be read in more detail on the Wikipedia page, Ogg controversy. At the end of the day it doesn’t really matter which format is used as long as it is consistent with the requirements of the W3C specifications; for this article I am going to assume that the Ogg codecs and container will be standard.

So, now that we have browsers (Firefor 3.5, Safari 3.1) which support the <video> tab and have native Ogg Coder/Decoders (At least FireFox) all of the deficiencies of video we discussed earlier become inconsequential. If video works as part of the HTML then it will behave as such. CSS, for example, will operate on a video element in exactly the same way as it would for an image element, z-index and all. The DOM tree for the page will include the video with all of its properties as expected. And, crucially, events and Javascript hooks allow web developers with no special skills (such as ActionScript) to control the behaviour of videos. have provided a nice example of using video with CSS. If you are running FireFox 3.5 or later you can check it out by clicking on the image. have provided a nice example of using video with CSS. If you are running FireFox 3.5 or later you can check it out by clicking on the image.

But there is another – for me more interesting – feature of Ogg video (and, presumably, its alternatives): metadata. Now, metadata in video is nothing new, for sure, but having access to a video’s metadata as described above will lead to a whole new level of video media integration in webpages. The Ogg container, for example, supports a CMML (Continuous Media Markup Language) codec and, in a developmental state, Ogg Skeleton for storing metadata withing the Ogg container. Both of these formats facilitate timed metadata. In CMML one could define a clip in a video – say from 23 seconds into the movie up to 41 seconds in – and add a description, including keywords, etc, to that clip specifically. I will resist the temptation to go into a description of how Javascript listeners could be used to access that data but, in essence, the accessibility of the information to the web page containing it would allow a programmer to accomplish fantastic features with trivial techniques.

The most obvious example has to be for search. Being able to display a video from a specific point (where the preceding data may not be relevant) is not out of scope of the Flash based players but would be much easier to accomplish.

If we squeeze our imaginations a bit further, though, I think there is great potential for highly dynamic, potentially interactive sites to be based around video as the primary content. When demonstrating front-end templates for Nstein’s WCM I always pay particular attention to in-line, Wikipedia style, links which we create in a block of text using data derived from the TME (Text Mining Engine); in-line for text equates, with timed metadata, to in-flow for video. In the past video has, by and large, been limited to a supporting medium, a two minute clip to illustrate a point from the main article. With timed metadata this could be a thing of the past.

Imagine this: you have just searched for a particular term and been taken to a video of a lecture on the subject playing from 20 minutes through – the section relevant to your query. As the video is playing data is displayed alongside it, images relevant to the topic, definitions of terms, and as the video moves into new clips, with new timed meta data, the surrounding, supporting resources are changed to reflect – in-flow.

An example of using CSS3 with the video element from Mozilla.

An example of using CSS3 with the video element from Mozilla.

As people appear in films and episodes links could be offered to the character’s bio and the author’s home page. Travel programs could sit next to a mapping application (GoogleMaps, etc) showing the location of the presenter at the current time. There are huge opportunities with this kind of dynamic accompanying data to enrich video based content. And, of course, all of the data from a particular clip can integrate into the Semantic Web seamlessly. RDF links and TME generated relations could easily be used to automate the association of content to a particular clip of a video.

The downside? Well the biggest one as far as I can see is the time-frame. Most publishers are continuing to commit to, and develop, black box style video players due to the fact that no one – a few geeks, such as myself, excluded – use cutting edge browsers. But when HTML5 gets some momentum behind it from a web developer/consumer point of view the horizons for video will be burst open wide.

Leave a Reply

Your e-mail address will not be published. Required fields are marked *