Asimov’s 4th law: A robot will not tweet.
Well, that might be a bit extreme. At the very least, if robots do tweet, they should put in a bit more effort.
Perhaps I need to explain my problem here. My complaint concerns automatic tweets, which are popular with bloggers and online publishers in general. They are extremely impersonal, often unhelpful snippets drawing the audience’s attention to a new article or blog entry. Here’s an example:
[news] Pepsi drinkers join the dots: Anyone buying a Pepsi Max soft drink over the next few w.. http://tinyurl.com/5qu3w3
Ok, so it’s pretty obvious what’s wrong with this tweet. The article the Guardian Media is trying to promote is about a campaign by Pepsi which uses QR codes on the side of their cans – not that you’d have known from the tweet.
The problem is that they’ve used a witty headline rather than a descriptive one. In itself that is fine. Like many online publishers, however, the Guardian have opted against manually tweeting and have (presumably) integrated their CMS with Twitter. More specifically, the tweet is a concatenation of the article’s title and the beginning of its text. It just so happens that neither of those blocks of text mentions QR codes.
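To make the mechanics concrete, here is a rough sketch of how a title-plus-opening-text tweet might be assembled. This is my guess at the general approach, not the Guardian’s actual code: the function name `naive_tweet`, the 140-character limit, the two-dot truncation marker and the placeholder body text are all assumptions for illustration.

```python
def naive_tweet(title, body, url, prefix="[news]", limit=140):
    """Concatenate title and opening text, truncating to leave room for the URL."""
    text = f"{prefix} {title}: {body}"
    room = limit - len(url) - 1  # reserve one space plus the URL
    if len(text) > room:
        text = text[: room - 2] + ".."  # mark the cut with two dots
    return f"{text} {url}"


# Placeholder body text (hypothetical, not the real article).
body = (
    "An opening sentence that runs on for longer than a tweet allows "
    "and never gets round to naming the real subject of the story."
)
tweet = naive_tweet("Pepsi drinkers join the dots", body, "http://tinyurl.com/5qu3w3")
print(tweet)
```

Nothing in this process looks at what the article is about, so whether the subject makes it into the tweet is down to luck.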
There is a lot to be said for automation, though. It’s not just that this system saves the author of the article or blog time. It also ensures consistency – all articles get posted. And, to be fair, most of the time these posts are okay…
…not always though. Personally, I’ve stopped following the Guardian Media (and Scientific American) on Twitter because these badly formed tweets annoy me way too much. Take the article above, for example. A human author might tweet something like this:
Pepsi launch campaign using QR codes on cans. Drinkers get access to secret content through phone browser.
That sums up the article much better, with 33 characters spare for the URL. I’d be far more likely to read the article having read that tweet, as I think QR codes are interesting (I’m a bit of a geek) and appreciate imaginative marketing.
So what’s the answer? Is there a way to achieve the normalization and efficiency of an automated system while being a good Twitterer? Well yes, I think there is.
I’ve been playing with the workflow engine in Nstein’s WCM and have written a nifty little Twitter-bot. Its secret is its ability to understand content. Nstein also produce a text mining engine (TME) which is built into the WCM right down to the core. This means that semantic data about an article is always easily accessible. I’ve used this automatically extracted metadata in two ways for my bot.
Firstly, I’ve made use of the TME’s concept and entity extraction features to create hash-tags. For those who don’t know, a hash-tag is a piece of metadata associated with a tweet. Hash-tags are prefixed with a hash (#) character and are generally alphanumeric. A lot of automated tweets now use hash-tags, with varying degrees of success. @northamptonrfc (the rugby team I support), for example, tags all tweets with “#rugby”. Well I never. The correct use of hash-tags (IMHO) is to:
- Add relevant metadata which gives a tweet more meaning.
- Create a trend to follow (essentially a thread across all Twitter users).
In order to meet those criteria the tag needs to be meaningful. It stands to reason. In the Pepsi example above, two tags spring to mind: “#pepsi” and “#qrcode”. Including two spaces, that makes an extra 15 characters which can (relatively) easily be fitted in before the TinyURL. Nstein’s TME would, undoubtedly, have picked these concepts out.
“QR Code” is what the TME refers to as a complex concept, that is, a phrase. “Pepsi” is an entity, specifically an organisation name. A simple regex can transform these strings into hash-tags. Using this technique the bot immediately adds a great deal of meaning to the tweet.
The second way in which I’ve used the metadata extracted by the TME is through NSummarizer. This cartridge takes a document, splits it into sentence components, rates each component on its relevance to the article and returns the best scoring one(s) as a brief summary of the document. This is a really useful tool for getting around the issue of having a first sentence which is not (particularly) descriptive of the article as a whole.
So, does it work? Well, I’ve used this blog post as a test. Here’s the resultant tweet:
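I don’t know how NSummarizer rates sentences internally, so the sketch below swaps in a much cruder stand-in: it scores each sentence by the average document-wide frequency of its words and returns the top scorer(s). The names `summarize` and `STOPWORDS`, the tiny stopword list and the sample text are all assumptions for illustration, but the split, rate and pick-the-best shape is the same.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]


def summarize(text, count=1):
    """Return the `count` highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:count]))


# Made-up sample text, not a real article.
sample = (
    "Fizzy drinks are popular. The campaign prints QR codes on cans. "
    "Scanning the QR codes on the cans unlocks campaign content. Summer is coming."
)
print(summarize(sample))
```

Even this naive scoring skips an uninformative opening sentence in favour of one that mentions the words the rest of the text keeps coming back to.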
I’ve made use of the TME’s concept and entity extraction features to create hash-tags. #tweet #nsteinswcm http://tinyurl.com/d3ozzn
Personally, I count that as a success.