Limiting account misuse

Account sharing for B2B publishers – part 2

So, you’ve identified that account sharing is a problem for your business. What do you do about it?

There are two distinct strategies here:

  1. You can actively make it harder to share accounts; or
  2. You can passively track the users who share their accounts, then use that data to sell more seats to the customer.

Active prevention

Prevention of account sharing also splits into two:

  1. Using authentication techniques that only the real account holder can log in with; or
  2. Limiting devices or sessions.

Authentication techniques

A quick crash course in authentication… When someone authenticates (logs in) they are presenting you with:

  • Something only they know (like a password, PIN, etc.);
  • Something they are (i.e. biometric data like a fingerprint); or
  • Something they possess (such as a phone, which can receive a token by SMS).
    • I would also include something they have access to (like an email inbox) in the last category.

The de facto method for logging in to a Web site (although that's slowly changing) is a password. If you don't want someone to access your online banking you set a strong password and keep it secret. But here's the problem: if you do want to let someone else access your account (i.e. account sharing) then you can just set a weak password and tell them what it is. The responsibility for keeping others out sits with the user, not the vendor.

That problem is pretty hard to get around in the “something they know” family, because knowledge can be shared. If you want to actively prevent account sharing you need to leverage one of the other two systems.

Side note: this post is not particularly discussing 2-factor authentication (2FA) or multi-factor authentication (MFA) but, in a nutshell, those systems are asking for multiple authentication techniques. They do not necessarily state which techniques but it’s often a password + a token by SMS.

Authenticating someone using "something they are" is definitely an area to watch keenly. If you primarily deliver content through native apps it's pretty easy to leverage in-device biometrics today. On the Web it's harder, as you are targeting a multitude of devices with different biometric capabilities. That said, standards are being created for password-less authentication, in particular the Credential Management API, which allows you to use fingerprint sensors on macOS and Android (if the device supports them). If you want to invest early, definitely look into these techniques and offer them to users who have a supported device. The big challenge is that you'll probably need to support passwords too, for the foreseeable future, so a user who wants to share their account can simply elect to use a password instead of their fingerprint.

So what about "something they possess"? There is a bunch of low-hanging fruit here. It's often impractical to send a user a physical device (like an RSA key), so I'll focus on things they already own, like SIM cards (which receive SMS messages), email inboxes and social media accounts. To leverage any of these authentication schemes you need to send a token over that channel that only the true owner can pick up; for social media this is codified as OAuth 2.0 (it has a similar effect, at least) but for the other two you'll need an Identity Management tool that supports them.

The flow goes like this:

  • To log in, the user enters their email address or phone number
  • A single-use token is sent to them by email or SMS
  • They enter the token on-site or in-app to authenticate
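
As a rough sketch of the server side of that flow (the function names and in-memory stores are illustrative stand-ins, not any particular identity-management product's API):

```python
import hashlib
import secrets
import time

# In-memory store standing in for a database; identifier -> (token_hash, expiry)
TOKEN_TTL_SECONDS = 600
pending_tokens = {}

def start_login(identifier):
    """Generate a single-use token for the user's email address or phone number.
    In a real system the token is sent by email/SMS, never returned to the caller."""
    token = f"{secrets.randbelow(1_000_000):06d}"  # 6-digit code
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    pending_tokens[identifier] = (token_hash, time.time() + TOKEN_TTL_SECONDS)
    return token

def complete_login(identifier, submitted):
    """Verify the submitted token. It is consumed (popped) whether or not it
    matches, so it can only be used once."""
    record = pending_tokens.pop(identifier, None)
    if record is None:
        return False
    token_hash, expiry = record
    if time.time() > expiry:
        return False
    return hashlib.sha256(submitted.encode()).hexdigest() == token_hash
```

Note that only a hash of the token is stored, so a leaked database doesn't leak live login codes, and the token is single-use by construction.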

Only the owner of the inbox or device has access to that token, so only they can log in. The user can still abuse this system by forwarding the token but, crucially, they need to be an active participant in every single log-in. Unlike password-based account sharing, where the user tells their peer the password once and the peer then has access indefinitely, they will now need to forward the token every time the piggyback user logs in.

If you use a "something you own" scheme, you can push account sharing further down the long tail by imposing a limit on concurrent sessions. If a user is only allowed to be logged in from one device at a time and two people are trying to use your service, each will need to log in every session – always by passing a token from one to the other. That is such a jarring experience (and so clearly tied to their illicit behaviour) that there's a real drive either to not share or, if the value is high enough, to buy another licence.

Of course you can use this system with any means of delivery that the user would feel uncomfortable sharing. They would share a password – they wouldn’t give someone access to their inbox, their SMSs or their Facebook account.

Limiting devices

You can limit the number of devices that a customer can use to access your service either by setting a maximum number of concurrent sessions or by restricting the user to named devices.

Named devices are perfect for high-value content. Typically, device management is used only with native apps, not on the Web. Limitations of the Web mean that a device's "name" can only be pinned to a session, local storage or a device fingerprint. Sessions and local storage (unlike on a native app) are highly transient and becoming more so: cookies can be deleted, "incognito" sessions are used, and corporations routinely clear data from their employees' browsers.

How about pinning the “name” to a device fingerprint? That is an interesting approach but not perfect. The system works as follows:

  • A user authenticates using any scheme (let’s say a password)
  • In the background, a hash is created of static information we know about the device, like the operating system (OS), the browser, the screen size and maybe fonts and browser extensions
  • They are asked to name their device (“Work laptop”)
  • An identity management system stores the hash and name against the user
  • Subsequently, if the user logs in from a device with the same hash, they are considered to be using the “Work laptop”
  • If the user (or a piggyback user) logs in from a device with a different hash then they are asked to manage their devices.
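
A minimal sketch of the hashing and lookup steps might look like this (the attribute set and function names are illustrative assumptions; real fingerprinting libraries use many more signals):

```python
import hashlib

# user -> {fingerprint_hash: device_name}; stands in for the IAM store
named_devices = {}

def device_fingerprint(os_name, browser, screen, fonts, extensions):
    """Hash the static attributes we can read about the device."""
    parts = [os_name, browser, screen,
             ",".join(sorted(fonts)), ",".join(sorted(extensions))]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()

def name_device(user, fp, name):
    """Store the hash and a human-friendly name against the user."""
    named_devices.setdefault(user, {})[fp] = name

def check_device(user, fp):
    """Return the device name for a known hash, or None,
    in which case the user is asked to manage their devices."""
    return named_devices.get(user, {}).get(fp)
```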

There is a real problem with this approach, however: static data is so generic that it's not very useful. Consider the OS. If a visitor uses a Mac then that part of the hash is the same as for every other Mac user. And you can't really use data such as the browser version, because the "device" will change every time someone's browser is updated: it's not static data.

Building an effective fingerprint is hard, to the point that it cannot really work standalone. But this might be a useful technique to deploy in conjunction with other tactics, such as “something you have” authentication.

For the Web, the simplest way to limit devices is a concurrent session limit. This is similar to the flow for named devices but it's completely opaque to the user. If you allow 2 concurrent sessions, and the user logs in from a third browser, then the oldest session is deleted. The user cannot choose which session is deleted, because they are unnamed. Again, this is not going to stop account misuse but it will require bad actors to log in more frequently, so it could be coupled with a password-less authentication scheme.
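
The eviction logic is simple to sketch; assuming an in-memory session store and a limit of 2 (both illustrative), it might look like:

```python
from collections import OrderedDict

MAX_SESSIONS = 2
# user -> ordered set of session ids, oldest first
sessions = {}

def create_session(user, session_id):
    """Register a new session, silently evicting the oldest once the
    concurrent-session limit is exceeded."""
    user_sessions = sessions.setdefault(user, OrderedDict())
    user_sessions[session_id] = True
    while len(user_sessions) > MAX_SESSIONS:
        user_sessions.popitem(last=False)  # drop the oldest session

def is_active(user, session_id):
    return session_id in sessions.get(user, {})
```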

Passive Tracking

It’s definitely useful leverage, when trying to negotiate an increase at renewal time, if you can categorically prove that the customer has bought 10 seats but 30 of their employees use the service. Be aware that you need users to abuse their accounts to get that benefit, so don’t rush to block account sharing.

There are a number of metrics you can track to get an indicator of account sharing. All publishers of high-value B2B content absolutely need to report on concurrent sessions – that’s the low-hanging fruit – but you can also look at unique IP addresses per user and other dimensions that are likely to be small for a real user. Building these kinds of reports is usually pretty straightforward (and was covered in the previous post) but know that they will need to be based upon server log data or your IAM’s reporting if you want to see IP addresses.
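
As a rough illustration, a unique-IPs-per-user report over parsed server log data can be very short (the threshold of 5 is an arbitrary starting point to tune, not a recommendation):

```python
from collections import defaultdict

def sharing_indicators(log_entries, ip_threshold=5):
    """Given (user, ip) pairs parsed from server logs, flag accounts
    seen from an unusually large number of distinct IP addresses."""
    ips_per_user = defaultdict(set)
    for user, ip in log_entries:
        ips_per_user[user].add(ip)
    return {user: len(ips) for user, ips in ips_per_user.items()
            if len(ips) >= ip_threshold}
```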

The pay off

At Zephr, we have seen B2B publishers tackle the issue of account sharing head on. It’s really not that hard to go from it being trivially easy for a user to abuse their licence to it being quite difficult and explicitly fraudulent. We have seen huge revenue uplifts attributed to these drives and provide a bunch of tools for the publisher to leverage.

This is never going to be one-size-fits-all but being conscious of the cost of the problem is essential. And know that there are things you can do to deter this behaviour.

The cost of account sharing

Account sharing for B2B publishers – part 1

Account sharing is incredibly prevalent across the whole digital subscriptions space. Maybe you let a mate of yours use one of your Netflix family profiles. Perhaps there’s a SaaS tool you find really useful and you shared your login to it with a colleague because it was easier than getting a corporate licence. “Password123” anyone? Whatever the transgression, few of us can honestly deny we misuse our subscription accounts from time-to-time.

There are three parties in this situation. For this post I’ll refer to them as:

  • “the service” – which is the vendor offering the subscription;
  • “the subscriber” – who is the user that actually pays the subscription; and,
  • “the piggyback user” – who is using the subscriber’s account to access the service but never actually paid for the right to do so.

To begin to quantify the problem, somewhere between 30% and 40% of all subscribers to music streaming services share their login with at least one person, according to a survey of over 12,000 people carried out in 2017. That’s huge.

So, what’s the impact of account misuse? For B2Cs account sharing can be seen (if you squint) as a good thing. It’s word-of-mouth sales. The piggyback user is building the service into their habits and routines and, at some point, will probably need to actually sign up; if they don’t they weren’t getting value out of the service anyway.

That advantage doesn’t necessarily carry over to B2B, however. In general, a B2B subscription is for a substantially higher fee and the customer base is much smaller. A smaller market means that the word-of-mouth growth from piggyback users won’t have the scale to become viral (there are some obvious exceptions which are widely horizontal, like Slack). This is compounded because the cost of each piggyback user – the loss of potential revenue – is much greater.

I asked Tony Skeggs, CTIO of Private Equity International, for his view:

“[The cost of account misuse] is not an insignificant amount. Enough to justify setting up a claims team to track client behaviour and … claim back revenue.”

I’m sure that’s a position many B2B publishers share.

Now, every business is different and it’s dangerous to generalize (particularly with domain-focused B2B publishers) so there are two questions you should be asking yourself: “how big is the problem for me?” and “what can be done about it?”. The first question is the focus of this post, the second will be covered in part 2.

How big is the problem?

As with most difficult questions, this is multi-faceted. You need to know, at least:

  • How many users are regularly sharing their account?
  • What proportion of the piggyback users would otherwise convert into customers?
  • What would it have cost to identify the piggyback users as net-new leads?

The last point, in particular, draws attention to the balance here: account sharing is not bad in an absolute sense, it’s just likely to be responsible for an overall net loss for most B2B publishers.

How many users are regularly sharing their account?

To the first question: it is tricky to accurately estimate the number of users who share accounts. However, you can see tell-tale signs of account sharing if you have a decent BI tool in place or some access to server logs. This really needs a statistician to analyse – and is beyond the scope of this post – but if you want to properly understand how to identify account misuse you should read up on the Gaussian Mixture Model and similar clustering techniques.

In a nutshell, you will look for patterns in accounts that have an unusually high number of concurrent sessions or access the service from more IP addresses or geographies than normal users.
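
As a crude, illustrative stand-in for that analysis, you could flag accounts whose metric sits well above the population mean. A real analysis would fit something like a Gaussian Mixture Model (e.g. scikit-learn's GaussianMixture) rather than this simple z-score:

```python
import statistics

def flag_outliers(metric_by_user, z_cutoff=2.0):
    """Flag accounts whose metric (e.g. peak concurrent sessions, or
    distinct IPs per month) is more than z_cutoff standard deviations
    above the mean. Simplified stand-in for proper clustering."""
    values = list(metric_by_user.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values) or 1.0  # guard against zero spread
    return sorted(user for user, value in metric_by_user.items()
                  if (value - mean) / stdev > z_cutoff)
```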

What proportion of piggyback users could be converted to customers?

This question is also non-trivial. You do know these people actually consume your product, though, so it’s a fairly safe bet that they would convert at least as well as an MQL – and you probably have that ratio already.

If you want to get a better feel for this you are going to have to run some experiments. You’ll need to be creative and this will take investment. One tactic, for example, is to identify the 1,000 accounts which use the most IP addresses, then target them with a promotion such as “invite a colleague and they get 1 month free”. If you can track the sign-ups from that campaign you have a list of leads that are likely to have previously been piggyback users – then you just need to sell to them and track your success!

What would it have cost to identify the piggyback users as net-new leads?

This should be a fairly easy calculation, as if they weren’t piggyback users they would be marketing targets. What’s your marketing spend per MQL? This should be a known stat.

Overall cost of account sharing

Once you have answered the questions above you can have a good stab at calculating the cost of account sharing.

The lost potential revenue is the approximate number of piggyback users x the conversion rate x your ARPU.

The cost of realizing that revenue is your cost per MQL x the approximate number of piggyback users.

The difference is your potential return.
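
Putting those three lines together, the back-of-envelope model is just arithmetic (the function and input names here are illustrative; every input is an estimate from the questions above):

```python
def sharing_cost_model(piggyback_users, conversion_rate, arpu, cost_per_mql):
    """Estimate the potential return from converting piggyback users."""
    lost_revenue = piggyback_users * conversion_rate * arpu
    acquisition_cost = piggyback_users * cost_per_mql
    return {
        "lost_revenue": lost_revenue,
        "acquisition_cost": acquisition_cost,
        "potential_return": lost_revenue - acquisition_cost,
    }

# e.g. 100 piggyback users, 10% conversion, $5,000 ARPU, $150 per MQL
result = sharing_cost_model(100, 0.10, 5000.0, 150.0)
```

With those (made-up) inputs the model gives $50,000 of lost potential revenue against $15,000 of acquisition cost, i.e. a $35,000 potential return.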

However, depending upon the capabilities of your tech stack, there might be more investment needed to get there. You would need to bake that into your model. The capabilities needed are explored in the next part but some solutions (like Zephr) provide many of these out-of-the-box, so you may not need additional investment.


Working out the business case for tackling account sharing might be difficult for you to take on, or it might be so obvious you don’t even need to back it with data. The chances are that all B2B publishers (and all subscription businesses) suffer due to misuse of accounts to some degree or another.

There are a few steps you can take to limit account sharing and I’ll explore those in the next post in this series.

Steve Krug’s Ironic Law of Usability

Get rid of half of the words on each page, then get rid of half of what’s left – Steve Krug’s Third Law of Usability

So that’s “Remove three quarters of the words”, then?

Why I would hire a social media expert

Just a quick one.

I came across this post a couple of weeks ago (and just got round to following it up):

Basically, @petershankman is very much against the “Social Media Expert”. His argument is based upon social media being just one component of a good marketing strategy – and not so alien or complex as to need a dedicated expert on your payroll:

Social media is just another facet of marketing and customer service. Say it with me. Repeat it until you know it by heart.

While I do see his point of view – and do agree that social media should be considered part of a broader marketing effort (even if it’s the main part) – I think he (and a staggering number of commenters) have missed the point. It does sound like something a social media expert would say, but… they don’t get it.

The point is this: social media isn’t just used as a soapbox to shout from. Unlike many other marketing channels, social media makes it incredibly easy to listen.

Let’s compare it to another “facet of marketing”, email campaigns. What do we learn when we send out a marketing email? Maybe we get an inbound lead – if we’re lucky. Then, we learn that the particular prospect who responded really liked our message. However, we’ve learned very little about all those who didn’t. And aren’t they the more important ones to listen to?

True, you could argue that potential customers might email you to tell you about their requirements and expectations and the preconceptions they have about you. I’m sure it happens. Not often, though.

Using social media, on the other hand, one just needs to do a quick search on Twitter and they can see potentially huge numbers of opinions expressed about them. Combine that information with some clever analytics and a marketing organization has a huge amount of usable and very valuable data to hand.

You could still argue: you don’t need an expert to search on Twitter, which is all Peter asserted. True, you don’t, but that’s only the most obvious use of social media outside of the normal scope of marketing channels. What about automating reactive messaging (as lots of big brands now do already)? What about feeding positive trends, being discussed about your competitors, to your R&D team? What about personalizing how a visitor sees a Web site based upon the publicly available information about their likes and dislikes (as idio does)?

What if you don’t know how you could use social media to improve your business strategy? Well then, I’d say, you need to hire a social media expert.

Native apps – a necessary evil?

I recently saw a tweet quoting TBL, talking at Profiting From The New Web (which I’m very sad to have missed!), which went along the lines of: don’t develop apps, use open standards.

It’s a very interesting instruction. One I used to agree with…

Purely from a technical point of view it is difficult to argue with the logic. Being able to develop once and still create Web applications which work well on many devices – with different hardware features, screen sizes, etc – is very possible using the latest iterations of Web standards.

HTML5 allows developers to produce rich and interactive graphics and animations in their pages using the Canvas element. One can handle streaming media (fairly) effectively using it. Persistent client-side storage is even available, which can be used for offline applications, amongst other things. The very brilliant, and now widely supported, CSS level 3 specifications make it really easy to accommodate different devices and screen sizes by using Media Queries – to see them in action, play with your browser window size (or use your phone/tablet) on any responsive site.

All of those technologies are very interesting in their own right but for the purpose of this post I won’t delve into the mucky details. If you are interested, or feel you need to brush up, I recommend following the excellent @DesignerDepot on Twitter.

The important point here, and I assume (which may be a dangerous thing!) the key point for the Open-Standards-over-native-apps argument, is that Web sites built using modern Web standards – running on a modern browser – have the potential to be just as feature rich as any native app.

That may not be quite true: to the best of my knowledge HTML/JavaScript does not support device-specific features such as compasses and native buttons. So in that regard, apps have a slight edge. That said, I don’t think there are any huge gaps in the functionality available through the standard Web technologies. Certainly there are no insurmountable ones.

I do see one big problem for the Web standards supporters, though. One big difference; one thing that Apple – and the others – offered which so far Web applications have failed to match. Micro-payments.

I’m sure there may be many people who disagree with that statement, but hear me out.

Firstly, the importance of micro-payments. Whatever the difference in the functional capabilities of Web applications and native apps, Apple’s App Store essentially created an industry – or at least a sub-industry. It was probably the first to market its applications as apps and certainly the most successful. 10 million apps have been downloaded and the App Store’s revenue in 2010 reached $5.2 billion. The median revenue per third-party app is $8,700 – according to Wikipedia (and source) – and there are more than 300,000 apps available, equating to a $2.6 billion market for the app developers themselves. No doubt, then, this is big business. And (as big a fan as I am of FOSS) that kind of potential revenue has been the main driver in the success of the app movement, driven many software innovations in the name of competition and, most importantly, created an environment for entrepreneurs and developers to benefit financially from their creations. I would say – in my opinion unquestioningly – that the success of the App Store and its contemporaries is primarily down to a no-fuss micro-payment system where users feel safe and comfortable and not under pressure when parting with $3.64 (on average) of their hard-earned cash.

You may argue that these micro-payment systems already exist on the Web; Amazon’s payment system – including 1-Click – and PayPal could be seen as successful examples, amongst many others, no doubt. However, the argument which I’m contesting is not that Web-based systems are a better alternative to native apps but that open standards are… Amazon is no more open than Apple or Android. Google Apps – while very much Web based – doesn’t use a payment system which is compatible with any of its competitors or described in any specification (W3C or otherwise). In fact the W3C did look into this very issue in the 1990s, although due to lack of uptake of micro-payments (how wrong they were) the working group was closed.

Anyway, to conclude this (now rambling) post… it’s all well and good to plug open standards but, until the big issue of payments is resolved with an open standard, a healthy applications market – and hence the competitive pressures which have pushed the boundaries of mobile software to what it is today – could not exist without closed systems and APIs.

Publishing from a Content Hub


Working as part of a sales team, one of the questions that I’m asked again and again – by my management as well as the Marketing department – is “who are your biggest competitors?” For a Web content management system or text analytic tool (Nstein’s WCM and TME respectively), that’s a fairly easy question to answer. In the DAM space, however, because of Nstein’s particular focus upon the Publishing industry the answer is less clear.

A simplified example of a publishing workflow.

Content Hub workflow

With assets stored in a central repository all systems and processes have direct access to them.

In fact, over the last couple of years Nstein has been positioning its DAM offering as a strategic centre-point for publishing workflows – Content Hub seems to be the prevailing (if slightly uninspired) label for this kind of system. Essentially, a Content Hub is a DAM with integration points so that all assets which come into the wider system (the company, publication, etc) are ingested straight into it; all content which is created internally is written directly into it; and then, all systems which utilize, display, edit or distribute content do so from the Hub directly. This is not a new model – it is sometimes referred to as a single version of the truth – however it often represents significant change and significant challenges in environments which have naturally developed around a (fairly) linear workflow. Magazines, in particular, as well as any breaking news publications, tend to have an A-to-B style workflow which involves filtering incoming media, bringing it together as a publication of some description and then publishing it out. By repositioning the processes and applications along such a workflow around a central Hub, dependencies and bottlenecks are broken down and assets, and access to them, become standardized. As a symptom of this shift, efficiency improves, asset re-use is encouraged and assets, their rights and usage information are better tracked. And by creating packages of content, independent of both source and output channel, features can be efficiently published on multiple channels (such as Print and Web) and new properties can be created cheaply with lower risk.

So, coming back to the original question, the DAM space doesn’t present that many competitors for Nstein (although there are, of course, a few) as few DAM systems have the out-of-the-box capabilities required by the vertical – handling extended metadata, transforming images, re-encoding video, printing contact sheets, managing page content, &c. In fact, the biggest competition in these cases comes squarely from Print Editorial System vendors who would, like us, endorse a Content Hub approach except with their CMS at the centre of the publishing universe.

In some ways both sets of vendors – DAM and Editorial System – are using the same arguments. One version of the truth, certainly. Single workflow and security. To some extent the multiple-channel publishing argument would also be used by both, certainly most Print Editorial Systems come with some option to publish a Web site as well.

These two approaches to the same Content Hub strategy raise a couple of key questions: what is the difference between the two solutions and how do those differences affect the buyer?

The former question is the simplest to answer: a DAM-based Hub disassociates itself from the editing and creation of products whereas an Editorial System is strongly tied in to the production process. Take the creation of a newspaper, for example. The collaborative effort needed to construct a modern edition in an efficient and reliable manner relies heavily upon Editorial Systems to manage the agglomeration of the content and design in real time. The question is: should that System be the hub or a spoke?

How do these differences affect the buyer? What are the relative merits of the approaches? These questions are the ones which are being debated and rely upon strategic visions that the publisher may just not share. However, from my point of view, here are the main points.

On the plus side for the Editorial Systems, as they are so connected to the production process, they can offer advanced and specific functionalities, tying in closely with DTP tools and offering collaborative working features which a DAM cannot compete with.

That strength, however, is also the biggest weakness for the Editorial Systems. By abstracting themselves from the production process the DAMs become far more agile. We can look at a fairly simple example of this in publishing the same content to both print and the Web, a process which should, by now, be a commodity. At its simplest this task should work smoothly in any Print Editorial System; text and images from a print feature are transformed into Web pages and published online. What happens, though, when other media is introduced? Most Print Editorial Systems that I have seen struggle to (or cannot) display and edit video. Maybe they can store it, but the advanced features available for print content are gone, as are many simple features such as previewing and usage tracking. Now in many cases, the Print Editorial System may be coupled with a Web CMS (potentially from the same vendor) which does feature better handling of video, but in that scenario there are now two production points. That means compromised security, more staff training, more convoluted audit trails. Then take audio, Flash, or any other format of content that the publisher may use – online or elsewhere – and the problem is magnified.

One solution for the Editorial Systems would be to develop the extra functionality required to handle these formats to the same level as the print content which they are familiar with. The obvious problem with that is the effort and resources required to build and maintain such a suite. So by steering clear of the production process the DAM-based systems can handle content in a channel-ambiguous fashion.

Particularly when one looks at the creativity in digital media these days, the strength of agility should be clear. There are the obvious ones: Facebook apps, QR codes, iPad channels, etc. There are also some less well adopted mediums.

In October 2008 Hearst released a special edition Esquire (sponsored by Ford) featuring an e-ink, animated front cover. Bauer last week released an issue of Grazia featuring Florence (and the Machine) dancing in an augmented reality world activated by pointing your webcam/iPhone at the cover. This was pretty disappointing in comparison with many other AR examples (such as the great GE ones) because the real page was not displayed – more on that in a future post. While neither of those examples was particularly well implemented, they definitely show signs of what could become mainstream technologies in the future. The question of adding the functionality to manage the production of publications including these kinds of technologies into Editorial Systems is a far-fetched one. Not only is the investment significant and the road to maturity slow, but if a technology ultimately fails to gain mainstream accessibility the investment becomes a wasted one. For that reason companies that rely upon an Editorial System at the core of their business have to wait until new technologies reach general acceptance to embrace them, and lose the ability to stay ahead of the curve – at least without excessive risk. In those cases, as with more mundane ones, the channel-ambiguous and content-ambiguous DAM systems project their flexibility directly on to the publications which use them.

That’s not to say that there are not downsides to using the DAM as the Hub. In particular, collaborative working cannot be handled to the depth that the Editorial Systems manage without their level of detail and understanding of the specifics. And in both cases there are overlaps in functionality; most Editorial Systems have some kind of repository, for example, and many top tier DAM systems integrate well with DTP tools.

Inevitably, those two questions drive towards the ultimate conclusion of the debate: “Which would make a better Content Hub, an Editorial System or a DAM?” I won’t attempt to answer that directly as I’m obviously biased towards the solution I sell and know the most about, but I will encourage debate from those who have an opinion…

The future of video on the web

I’m getting rather excited about video media online. We’re on the cusp of a revolution in the way we produce and consume the medium.

I was working on a project recently which involved video content. It struck me that, although we have come on no end in terms of our ability to distribute video over the web in the last half decade, video content still has huge holes in the orthodox functionalities of more established media.

Most obviously, there is the dependency upon external codecs (i.e. not native to the browser). The solution to which, in the most case, is a Flash player. There are numerous Flash players available freely and cheaply on the web; they can usually play most of the common video types and depend only upon a single plugin, Flash. YouTube is probably the best known example of using Flash to play videos.

This approach creates problems all of its own, though:

  • Flash players still have a dependency upon a browser plugin.
  • The binary video – the original file – is not transparently available in the way that images and text are.
  • Flash does not always cohere with de facto web standards: you cannot apply CSS to Flash, and it does not respect the z-index of other objects (ever seen a drop-down menu disappear underneath a Flash component?).
  • It does not have a full set of properties directly accessible for the content it wraps, as other elements in a page’s DOM do.

Don’t get me wrong, Flash has its place in the modern web. It is a fantastic platform for RIAs and rich, animated and interactive components of web sites. However, as far as video presentation goes it is, essentially, a hack.

These drawbacks for video (and, in fact, audio) presentation, manipulation and playback have not gone unnoticed. One of the most important changes in HTML5 – first drafted back in January 2008 – is the handling of these media with the <video> and <audio> tags, now supported in both Gecko and WebKit.

The initial specifications for HTML5 recommended the lossy Ogg codecs for audio and video:

“User agents should support Ogg Theora video and Ogg Vorbis audio, as well as the Ogg container format”

The reasoning behind this drive for a single format seems obvious enough. Going it alone doesn’t really work as far as web standards are concerned (does it, IE?). There were, however, some objections to the choice of codec, namely from Apple and Nokia. The details of the complaints are not really relevant to this article but can be read in more detail on the Wikipedia page, Ogg controversy. At the end of the day it doesn’t really matter which format is used as long as it is consistent with the requirements of the W3C specifications; for this article I am going to assume that the Ogg codecs and container will be standard.

So, now that we have browsers (Firefox 3.5, Safari 3.1) which support the <video> tag and have native Ogg codecs (at least Firefox does), all of the deficiencies of video we discussed earlier become inconsequential. If video works as part of the HTML then it will behave as such. CSS, for example, will operate on a video element in exactly the same way as it would for an image element, z-index and all. The DOM tree for the page will include the video with all of its properties as expected. And, crucially, events and Javascript hooks allow web developers with no special skills (such as ActionScript) to control the behaviour of videos. Mozilla have provided a nice example of using video with CSS. If you are running Firefox 3.5 or later you can check it out by clicking on the image.
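To make that concrete, here is a minimal sketch of the kind of markup this enables (the file name lecture.ogv and the styling are my own illustration, not taken from any spec or site):

```html
<!-- A native video element, styled and scripted like any other DOM
     node: no plugin, no ActionScript. "lecture.ogv" is a placeholder
     for an Ogg Theora file. -->
<video id="lecture" src="lecture.ogv" controls
       style="width: 480px; border: 2px solid #ccc;"></video>

<script type="text/javascript">
  var video = document.getElementById("lecture");

  // Ordinary DOM events and properties -- no plugin API required.
  video.addEventListener("loadedmetadata", function () {
    video.currentTime = 20; // seek to 20 seconds in
  }, false);
</script>
```

Notice that the seek is just a property assignment; the same element responds to CSS and stacking like any image would.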

But there is another – for me more interesting – feature of Ogg video (and, presumably, its alternatives): metadata. Now, metadata in video is nothing new, for sure, but having access to a video’s metadata as described above will lead to a whole new level of video media integration in webpages. The Ogg container, for example, supports a CMML (Continuous Media Markup Language) codec and, in a developmental state, Ogg Skeleton for storing metadata within the Ogg container. Both of these formats facilitate timed metadata. In CMML one could define a clip in a video – say from 23 seconds into the movie up to 41 seconds in – and add a description, including keywords, etc., to that clip specifically. I will resist the temptation to go into a description of how Javascript listeners could be used to access that data but, in essence, the accessibility of the information to the web page containing it would allow a programmer to accomplish fantastic features with trivial techniques.
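To sketch how that might look from script – and this is a hedged illustration, the clip objects below are my own invented shape, not CMML’s actual serialisation – resolving the active clip is a simple lookup:

```javascript
// Hypothetical clip metadata, as it might be extracted from a CMML
// track: each clip covers a time range and carries a description.
var clips = [
  { start: 0,  end: 23, description: "Introduction",   keywords: ["welcome"] },
  { start: 23, end: 41, description: "Ogg container",  keywords: ["ogg", "codec"] },
  { start: 41, end: 90, description: "Timed metadata", keywords: ["cmml"] }
];

// Return the clip active at a given play-head time, or null.
function activeClip(clips, currentTime) {
  for (var i = 0; i < clips.length; i++) {
    if (currentTime >= clips[i].start && currentTime < clips[i].end) {
      return clips[i];
    }
  }
  return null;
}

// In a page this would hang off the video element's "timeupdate" event:
//   video.addEventListener("timeupdate", function () {
//     var clip = activeClip(clips, video.currentTime);
//     if (clip) { sidebar.textContent = clip.description; }
//   }, false);
```

The listener wiring in the comment is the whole trick: as the play-head crosses a clip boundary the page can react immediately.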

The most obvious example has to be search. Displaying a video from a specific point (where the preceding material may not be relevant) is not beyond the scope of Flash-based players, but it would be much easier to accomplish natively.

If we squeeze our imaginations a bit further, though, I think there is great potential for highly dynamic, potentially interactive sites to be based around video as the primary content. When demonstrating front-end templates for Nstein’s WCM I always pay particular attention to in-line, Wikipedia-style links, which we create in a block of text using data derived from the TME (Text Mining Engine); in-line for text equates, with timed metadata, to in-flow for video. In the past video has, by and large, been limited to a supporting medium – a two-minute clip to illustrate a point from the main article. With timed metadata this could be a thing of the past.

Imagine this: you have just searched for a particular term and been taken to a video of a lecture on the subject, playing from 20 minutes through – the section relevant to your query. As the video plays, data is displayed alongside it: images relevant to the topic, definitions of terms. And as the video moves into new clips, with new timed metadata, the surrounding, supporting resources change to reflect them – in-flow.

An example of using CSS3 with the video element from Mozilla.

As people appear in films and episodes, links could be offered to the character’s bio and the actor’s home page. Travel programmes could sit next to a mapping application (GoogleMaps, etc.) showing the location of the presenter at the current time. There are huge opportunities with this kind of dynamic accompanying data to enrich video-based content. And, of course, all of the data from a particular clip can integrate into the Semantic Web seamlessly. RDF links and TME-generated relations could easily be used to automate the association of content to a particular clip of a video.

The downside? Well, the biggest one as far as I can see is the time-frame. Most publishers are continuing to commit to, and develop, black-box-style video players, because no one – a few geeks, such as myself, excluded – uses cutting-edge browsers. But when HTML5 gets some momentum behind it from a web developer/consumer point of view, the horizons for video will be burst wide open.

Brand: the new pretender

Content is king, is it? Well maybe. There’s no getting away from the fact that good quality content drives traffic. But in the struggling publishing industry, with waning advertising revenues, we might have to conclude that the current approach to web publishing is just not working.

That’s not to say there aren’t exceptions. Julian Sambles (@juliansambles), head of audience development at the Telegraph Media Group, talked at the recent ePublishing forum about his success in terms of SEO and bringing new audiences to the Telegraph site. No doubt other publishers have had similar successes. However, there are problems associated with that kind of drive for SEO – not least because it is a very expensive process in a climate where large budgets are scarce. But I have more fundamental reservations about focusing heavily on search-engine-optimised content.

Firstly, there is the issue of editorial integrity. If content were truly king then its quality would be the single most important factor in growing (and keeping) an on-line audience. For a lot of publishers content isn’t king, though – search is. In that scenario a publisher is not controlling how its content is consumed, or in what order. They will, undoubtedly, find that their political and social stances are watered down as well, as traffic heads more to soft news and opinion. In circumstances like these the focus actually moves away from the content and towards how the content is structured – the role of the publisher gets closer to that of an aggregator.

The next problem with relying on search engines to supply one’s on-line audience is inherent: the consumer is researching, not discovering (@matt_hero‘s search trilogy is, loosely, relevant here). I seriously doubt Google is inundated with searches for the word “news”. Perhaps terms like “football results” are more common, but still not that frequent. If a visitor arrives at a site from a search engine it is fairly safe to assume they fall into one of two categories:

  1. They’ve already read the news elsewhere, first.
  2. An aggregator has presented them with summaries and the content suppliers only get a hit (and, hence, revenue) for the stories they are really interested in.

Of course, if that visitor then stays on the site – or even bookmarks it – then great. Search engine optimisation does create new users, and they can become regular visitors. The problem is that, without a strong brand, the proportion of stray surfers landing on a content producer’s site who are converted into frequent readers is much smaller.

The prevailing opinion these days is that the fickleness of consumers comfortable with search is inescapable; that hitting the top spot on Google is overwhelmingly the best way to drive traffic. I just can’t believe that. Certainly that sentiment doesn’t apply to me. I’m quite modern in my consumption of the news: I almost never buy a physical paper any more. But that doesn’t mean I don’t appreciate the editorial “package”, as Drew Broomhall (@drewbroomhall), search editor for the Times, described the journey a (print) newspaper reader is guided through. Every morning I embark on such a journey, led (very rigidly) by the BBC’s mobile site. And, while monetizing mobile content is harder than on traditional web pages, that builds a very strong brand loyalty for me. If I read any news at work, or explore in more depth a story I read that morning, it’s always on the BBC news site.

So I would argue that the reader’s experience – the editorial journey – is far from a thing of the past and, in fact, is as important now as it ever was for print media. There is no need to limit that experience to mobile channels, either. There is a wealth of frameworks available for producing widgets and apps on all kinds of platforms. Another talk at the ePublishing forum, by Jonathan Allen (@jc1000000), explored in more depth how to take advantage of these output channels. iGoogle widgets, iPhone apps and Facebook applications are all great examples.

This approach not only allows publishers more of the editorial control which they had in producing print media (and lost to the search engine) but also creates a better user experience. Focused distribution channels for on-the-rails feeds can give a consumer the feeling that a publisher is doing something for them. With news being such a commodity in the on-line world these channels add real value for the audience. And if there is value for the audience, they will promote that content themselves. Creating, for example, a widget for an iGoogle user’s homepage, which displays featured articles, engages them (and presents a link back to the original content) before they have even done a search.

We see this kind of selected-content approach commonly in the form of RSS feeds (although too often as “latest”, not greatest). Widgets and apps aren’t really doing anything different; rather, they are making the stream more accessible, more user friendly. There’s another attraction to widgets and apps over RSS feeds, though – a point from Jonathan’s talk which almost makes these channels a no-brainer – they really help to boost the main document’s search engine ranking. So, contrary to being an alternative to SEO, widgets help drive traffic both ways.

You can take this one step further and allow the audience to define their own paths through content. As semantic understanding becomes more and more achievable, through tools such as Nstein‘s Text Mining Engine (TME) and the dawning of an RDF-based semantic web, publishers will be able to offer dynamic widgets with content ordered by an editorial team and filtered by a user. The iGoogle widget described above could easily be filtered for a Formula One fan, based upon data from the TME, to create a custom feed of stories they are interested in. Or, if a consumer enjoys the “package”, they can take the unfiltered list.
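As a hedged sketch of the filtering side (the story and interest shapes below are invented for illustration – this is not the TME’s real output format), the widget only needs to intersect a reader’s declared interests with each story’s extracted concepts, while preserving the editors’ ordering:

```javascript
// Hypothetical shape for an editorially ordered package of stories,
// each annotated with concept tags by a text-mining engine.
var packagedFeed = [
  { title: "Button wins in Monaco",      tags: ["formula one", "sport"] },
  { title: "Markets rally on stimulus",  tags: ["finance"] },
  { title: "Hamilton penalised in pits", tags: ["formula one", "sport"] }
];

// Filter the package by a reader's interests; the editorial ordering
// (the "journey") is preserved because filter keeps relative order.
function filterFeed(feed, interests) {
  return feed.filter(function (story) {
    return story.tags.some(function (tag) {
      return interests.indexOf(tag) !== -1;
    });
  });
}
```

A reader who declares no interests could simply be handed the unfiltered package.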

No silver bullet for publishers struggling in the migration to the web, for sure, but thinking about how content is offered as a package is a strong, and often underused, way of strengthening a brand and driving traffic. As always, IMHO…

Open Source v traditional Software (ding, ding, ding)

At the tail end of last month I spent two days attending talks at the yearly Internet World exhibition. I always enjoy listening to speakers and the quality was, by and large, very good. On the final day CMS Watch (@cmswatch) hosted a panel discussion in the Content Management theatre entitled “Open Source v Traditional Software”. It was a strange title, I thought, as the line between open and closed source becomes more and more vague for many vendors. This blending was, however, represented in the panel, which included Stephen Morgan (@stephen_morgan) of Squiz – a commercial open source vendor.

On the whole the panel was very good and the debate interesting. The open source contingent argued eloquently for the pros of spreading knowledge throughout the community and for the response times to bug fixes compared with the release cycles of proprietary software. One of Stephen’s responses when asked for reasons to go with an open source system, however, struck me as – at best – ill-conceived.

Stephen had argued that as a customer of a closed source software retailer you are entirely at their mercy in terms of functional changes. The assertion was that when you – as a customer – have access to source code you can modify it to suit your needs. Conversely, he claimed that changes to a closed source solution could only be requested, might never happen, and would be subject to a lengthy release cycle even if they were implemented.

Now, I’m sorry, but that is just not the case, as I told the panel once the discussion was opened to the audience. The software I work with, Nstein’s WCM, features an expansive and well-designed extension framework to do just what Stephen was referring to. In fact, I went further and put the polemic to the panel that hacking core source code is obviously not desirable and severely hinders an application’s upgrade path. Stephen countered with the fact that changes made to the code-base can be submitted to Squiz (or almost any other open source software maintainer, for that matter) and may be committed into the core application.

Before I start a holy war here (and a succession of flames in this site’s comments) I would like to state my position on open source: I love it. I love the concept. I love free software. I love the freedom to modify and distribute software. Basically, I get it. I’m a huge fan of Linux and, at the end of the day, a PHP programmer. Just yesterday, I spent my Saturday contributing PHPTs (that’s PHP tests, for non-geeks) with the PHP London user group. I really do dig open source. Also, for the record, I thought Stephen Morgan represented his brand and community very well and I enjoyed his commentary; this is not meant to be a personal attack 😉 .

In fact, this post is not criticizing open source software at all. The discussion here, as far as I am concerned, is about best practices. Okay, sure, one can modify the source code of an open source project and that change may be incorporated into the software. May be incorporated; probably won’t be. And with closed source software that option is not available – you have less choice. But that is, I think, a good thing.

At least the prelude to a good thing. Software evolves, like all technology, and the beautiful simplicity of Darwinian evolution applies: it’s survival of the fittest. If we, at Nstein, were to compete with open source CMS projects with a solution which was not customisable, which had no mechanism for modification, we would have died out. The fact is we make a vast amount of customisation possible – we’ve had to. Because we don’t encourage customers to delve into the core source code (it’s a PHP app, so they can if they really want) we’ve had to employ other methods: extensible object models built around best practices derived from industry experience; plug-in frameworks; generic extension frameworks. If one of our customers cannot extend or change something they need to, the chances are that another client will at some point want that same, absent flexibility. So, through good design practices, we have constructed a system which clients can (and do) modify, yet when they decide to upgrade to the next point release it is a trivial process.
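The pattern behind all of those methods is easy to sketch in a few lines of JavaScript (purely illustrative – this is not Nstein’s actual API, and the hook name is invented): the core exposes named extension points, customisations register against them, and nobody touches core source.

```javascript
// Minimal extension-point registry: customisations live entirely
// outside the core, so a point-release upgrade never clobbers them.
var hooks = {};

function register(hookName, fn) {
  (hooks[hookName] = hooks[hookName] || []).push(fn);
}

// Core code threads a value through every registered extension in turn.
function applyHooks(hookName, value) {
  return (hooks[hookName] || []).reduce(function (acc, fn) {
    return fn(acc);
  }, value);
}

// A client customisation, registered without modifying core code:
register("format_headline", function (headline) {
  return headline.toUpperCase();
});
```

The core simply calls applyHooks at well-chosen points; if no client has registered anything, the value passes through unchanged.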

Now, I’m not saying that open source software is poorly designed. I’m writing this piece now on WordPress – a fantastic example of an open source project – which features an extremely rich and well-documented plug-in framework. The sheer number of plug-ins and themes available for WordPress is a testament to the system. And, as with Nstein’s software, when I upgrade WordPress all of my extensions still work (at least 95% of the time).

I doubt anyone would disagree with the merits of a plug-in based system. My interest, however, is in this question: how much of a temptation is there to hack open source software? I know I’ve done it in the past. I’ve heard a number of times that Drupal upgrades are nigh on impossible due to the nature of the inevitable customisations a Web content management system requires. I’m not in a position to answer that question authoritatively, and I won’t attempt to. I would like to stir the debate up, though. So, thoughts, please…

Creating compelling content in the Web 5.0 world

Whoa, there. Web 5.0?

Okay, so I made up Web 5.0. Actually, I detest the numbered generations we’ve applied to the web. The main problem I have with these terms is that they imply a linear progression. They suggest that we are going to abandon the interactive web, Web 2.0, for the semantic web, Web 3.0. Obviously we aren’t. I doubt anyone would even suggest it. Web developers will continue to use both. Hence Web 5.0 (do the maths).

I’m going to drop the term now – it was just a joke. The modern World Wide Web is, in fact, much more than just the three so-called generations – although clearly they are very important. I can identify three main concepts (not technologies) which are facilitating the current evolution of the web:

  • Interactivity (2.0)
  • Semantic understanding (3.0)
  • Commoditization (the Cloud)

Nothing ground breaking there. And we, as users, are certainly seeing more and more of these big three in our daily use of the web.

Interactivity is fairly obvious. I think the biggest revolution in interactive content came about as Wikipedia took off. Undoubtedly the most expansive (centralized) base of knowledge the world has ever seen – and written by volunteers, members of the public. It really is a staggering collaborative achievement. Then there’s blogging, micro-blogging, social networking, professional networking, content discovery (digg, etc), pretty much anything you might want to contribute, you can.

Semantic understanding is a little trickier to see. That’s hardly surprising as it is so much newer and far less understood. Believe the hype, though. The semantic web is coming and it will change everything (everything web-related, that is). If you don’t believe me, try googling for “net income IBM”. You should see something like this:

Google results using RDF info

That top result is special. It’s special because it’s the answer; it’s what you were looking for. No need to trawl through ten irrelevant pages to find the data – it’s just there. Google managed to display this data because IBM published it as part of an RDF document. If you search for the same information about Amazon – who don’t – no such luck. (That particular example was given by Ellis Mannoia in a great Web 3.0 talk at Internet World this week – so thanks Ellis.)

That leaves us with commoditization. Specifically, the commoditization of functionality from a developer’s point of view. This concept is largely, although not exclusively, linked to the Cloud. The term “the Cloud” is used broadly to describe services made available over the internet. GMail, for example, is email functionality in the cloud. Users don’t need to install anything to use GMail (bar a web client); they just use it when they want, from any computer. Many of the Cloud services out there are available as APIs, and that leads to the commoditization of functionality. Say I want to add a mapping application to my web site to show my audience where I am. A few years ago that would have been a significant amount of development work. These days it’s trivial – you just make a call to the GoogleMaps API. And so mapping functionality becomes a commodity.
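Just how trivial is worth seeing. The hedged sketch below uses Google Maps JavaScript API v3-style calls; YOUR_KEY is a placeholder for a real API key, and the coordinates are my own example. A handful of lines replaces what was once weeks of development:

```html
<div id="map" style="width: 400px; height: 300px;"></div>
<script>
  // Called by the Maps API once its script has loaded.
  function initMap() {
    var here = { lat: 51.5074, lng: -0.1278 }; // London, say
    var map = new google.maps.Map(document.getElementById("map"), {
      center: here,
      zoom: 10
    });
    new google.maps.Marker({ position: here, map: map });
  }
</script>
<script async
  src="https://maps.googleapis.com/maps/api/js?key=YOUR_KEY&callback=initMap">
</script>
```

That is the commodity in action: the hard parts (tiles, projection, interaction) all live behind the API.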

The point of this post, however, is that these are not mutually exclusive concepts. There is no reason why you cannot combine semantic understanding with Cloud computing, or UGC, or both. Quite the opposite: combining the three should be the goal.

There are problems, however. Utilizing Cloud computing requires a certain amount of adherence to standards – fitting in to an API. And semantic understanding (and metadata in general) takes time to accrue. In general those two constraints don’t work well with Web 2.0 functionality.

Let me give an example: If a user contributes a comment to an article they probably won’t take the time to add the meta data required for semantic understanding to be achieved. In the same way if they don’t give their location you can’t show them as a pin on GoogleMaps.

However, semantic understanding is (IMHO) more than just the use of RDF documents. Tools like Nstein’s Text Mining Engine can be used to create a semantic footprint describing a piece of text. I’ve talked, in previous posts, about using the data gleaned by the TME in imaginative and experimental ways. Take the example above. If a user were to post a comment about a talk they attended, the TME could extract not only the concepts of the comment but also data like the location of the subject. That semantic understanding can be used to programmatically call the GoogleMaps API to add a new pin to your map.
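The glue code for that is almost embarrassingly small. A hedged sketch (the entity shape below is invented for illustration – the TME’s real output format will differ, and the coordinates are my own example):

```javascript
// Hypothetical entities extracted from a user comment by a
// text-mining engine: concepts plus a geocoded location.
var extracted = [
  { type: "concept",  text: "web standards" },
  { type: "location", text: "London", lat: 51.5074, lng: -0.1278 }
];

// Turn any extracted locations into pin descriptors ready to hand
// to a mapping API (e.g. as marker positions).
function pinsFromEntities(entities) {
  return entities
    .filter(function (e) { return e.type === "location"; })
    .map(function (e) {
      return { title: e.text, position: { lat: e.lat, lng: e.lng } };
    });
}
```

Each descriptor maps straight onto a marker in whichever mapping API you are calling; the user never had to add a scrap of metadata themselves.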

And there you have it: semantic understanding of interactive content used to harness the power of Cloud computing. One of the most important benefits of the TME, for me, is the flexibility it affords you. If you know you can get access to that kind of information it opens up all kinds of possibilities. Exploring some of these possibilities has to be the focus for making a brand stand out against the plethora of content suppliers and aggregators available; for improving the user’s experience and gaining their loyalty.

So it’s time to stop thinking about Web 2.0 or Web 3.0 and start thinking about the technology and techniques available and how they can be used to the greatest effect.

About me

I’m an entrepreneur and technologist. I’m passionate about building SaaS products that provide real value and solve hard problems, yet are easy to pick up and scale massively.

I’m the technical co-founder of a venture-backed start-up, Zephr. We have built the world’s most exciting CDN, which delivers dynamic content to billions of our customers’ visitors, executing live, real-time decisions for every page.