Posts Tagged ‘Open Linked Data’

Open Data Strategies and News Media – update

July 24, 2010

Last year, I had several posts around Open Data strategies – focusing specifically on News Media organizations. I’d like to provide an update. This post is actually a compilation of several e-mails, so hopefully it comes together in some coherent manner.

Data-driven Journalism

The collection of posts began with a question: should data-driven journalism be considered a future strategic capability of news media organizations?

The question was prompted by a post from Zero Hedge, Another Massively Interactive European Chart, which referenced an interactive chart published by The Economist. It reminded me again of the power of infographics to “enlighten and explain”.

For additional articles on data-driven journalism, see the following:

The Bigger Picture – Open Data

I then briefly explored the importance of Open Data, a capability that would offer strong material for data-driven journalism. I provided the following links:

Also of interest is The Guardian’s strong advocacy for opening up public data sources, in part to put to the service of journalism.

Linked Data – Technological foundation for Open Data on the Web

The following e-mail provided some context for the W3C’s Linked Data initiative. In particular, it provided links to thoughts from Martin Belam, the Chief Information Architect at The Guardian, on how Linked Data will affect the future of News organizations. These links are provided below:

There’s also a very interesting presentation from the News Linked Data Summit in February 2010, titled News Media Metadata – The Current Landscape. It would be nice to have the video to go with it, but there’s some great content in the slide deck.

On the topic of semantics, here are ReadWriteWeb’s archived articles from SemTech 2010, if anyone is interested. Facebook and Google both had a strong presence at this year’s SemTech conference.

Government and Community Open Data Initiatives

A third e-mail discussed some of the current moves by various levels of government – from countries to municipalities – to freely open up their data to the public.

Here’s an interesting link announcing the pending formal UK government launch of their Open Data initiative, prompted by Tim Berners-Lee. And here’s The Guardian’s announcement of the launch the following day, with a video clip of Sir Tim himself. As The Guardian’s Martin Belam comments in a post days after the announcement, “We now know that, whatever the outcome of the next election, we are only going to see more Government and state gathered data published, not less. So how, as the news industry, are we going to respond to this, and what does the digital news media look like in a world with a high level of semantic state data available?”

The UK Government is a pioneer here for sure, but it’s a trend that many are already promoting in Canada. This represents a real opportunity, IMO, for journalism – as Belam strongly advocates – to help people make sense of government data, to illuminate the broader patterns and relevance to people’s lives, and to host discussion around important “topics that matter”. Note the list of Canadian municipalities in this wiki page that are moving ahead full steam with Open Data initiatives. See the following articles for Toronto, Ottawa, Vancouver, Edmonton, and Calgary. And here’s a recent Forrester blog post on the topic.

And that’s about that. 🙂



Post-relational Data Representations

October 11, 2009

In a previous blog post, I made the comment:

I’ll be posting further on the Semantic Web in the coming weeks, and I’ll explore both how graph-like data representation differs from traditional relational modeling, and the benefits such a representation provides over more traditional data modeling approaches.

This post briefly elaborates on this topic by exploring two examples of post-relational data representations.

Key-Value Data Stores

ReadWriteWeb had an interesting article from February 2009 titled
Is the Relational Database Doomed? If I understand it correctly, the core problem is indexing vast numbers of items by a key – for example, documents on the Web.

This data management strategy is the norm for massively scalable indexing requirements. The ReadWriteWeb article discusses key-value data stores in the context of Cloud Computing.
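To make the key-value idea concrete, here is a minimal sketch (plain Python, purely illustrative): values are opaque blobs addressed by a single key, with no schema and no joins. Real stores of the kind the article discusses distribute this same access pattern across many machines.

```python
# A toy in-memory key-value store illustrating the access pattern:
# values are opaque blobs addressed by a single key -- no schema, no joins.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # Lookups go straight to the key -- no query planner involved.
        return self._data.get(key, default)

# Indexing documents by URL, as a web-scale store might:
store = KeyValueStore()
store.put("http://example.org/page1", "<html>...document body...</html>")
print(store.get("http://example.org/page1"))
```

The simplicity of the lookup is exactly why the model scales so well: each key can live on any machine, located by hashing the key.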

*** Update 1 (11/09)
Interesting comment (comment #2) in the ReadWriteWeb article. Here it is:

There is also a new crop of databases called “graph databases” gaining traction (with a model based on nodes, relationships and properties), one of them being the open-source neo4j.

Using graphs to structure information is very powerful and intuitive.

Exactly! See the section below. Also check out the remaining Comments associated with the ReadWriteWeb article. Fantastic discussion!
*** End Update 1

Graph-like Data Representations

Graph-based data representations (for example, RDF) provide an interesting contrast to traditional relational data modeling approaches, and are critical to the vision of the Semantic Web. Here are some of the key differences between graph-based and relational data representations:

  • “Triple” as the key data construct – Graph-like data representations (for example, RDF) treat metadata and data in exactly the same way. Both metadata and data are expressed as a “triple” – a subject-predicate-object relation. The entire graph is nothing but a collection of these “triple” statements.
  • Triples are composed to build the Graph – The database concept of a “join” is accomplished through the flexible “composition” of triple statements. This would appear to be a much more flexible way to “compose” semantic structures dynamically, across multiple disparate data sources with different data representations/semantics.
  • Metadata IS Data – Data and metadata about “concepts” are both expressed in the same manner – as triple statements. This provides an extremely flexible and scalable approach to knowledge representation (i.e. data + semantics), because a new data element or metadata dimension can be added by simply adding another triple to the data store.
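The points above can be sketched in a few lines of plain Python (not a real RDF library – just an illustration of the model): the graph is a set of triples, schema-level and instance-level facts use the same construct, and a “join” is just the composition of triple patterns.

```python
# A graph is just a set of (subject, predicate, object) triples.
# Note that a schema-level fact ("author is a property") and
# instance-level facts use exactly the same construct.
graph = {
    ("author", "rdf:type", "rdf:Property"),     # metadata as a triple
    ("article1", "author", "alice"),            # data as a triple
    ("alice", "worksFor", "TheGuardian"),       # data as a triple
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [(ts, tp, to) for (ts, tp, to) in graph
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# A "join" by composing two patterns: who wrote articles, and where do they work?
for (_, _, person) in match(graph, p="author"):
    for (_, _, org) in match(graph, s=person, p="worksFor"):
        print(person, "works for", org)  # alice works for TheGuardian
```

Adding a new “dimension” really is just one more triple – no schema migration, no ALTER TABLE.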

The above being said, relational data representations still have some important advantages where the data schema is well-known and relatively static. They also tend to be a good choice for transactional systems where the schema for key entities is, again, well-known and non-volatile. However, the schemas are more brittle and less malleable and composable compared to graph-based data representations.

*** Update 2 (11/09)
Here’s another key point about graphs, in contrast to tree-like data representations such as XML, from the Semantic Web Programming book (p. 72):

Graphs do not have roots. Some other representations, for example XML, are tree based. In an XML document, the root element of the tree has a special significance because all the other elements are oriented with respect to the document root. When trying to merge two trees, it can be difficult to determine what the root node should be because the structure of the tree is so important to the overall significance of the data. In an RDF graph, by contrast, no single resource is of any inherent significance as compared to any other.

*** End Update 2
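The rootlessness described in that quote is what makes merging graphs so easy: because a graph is just a set of triples, merging two independently produced graphs is set union, with no root element to reconcile. A quick sketch (hypothetical facts, for illustration only):

```python
# Two independently produced graphs describing overlapping resources.
graph_a = {
    ("ex:london", "ex:population", "8900000"),
    ("ex:london", "ex:country", "ex:uk"),
}
graph_b = {
    ("ex:london", "ex:mayor", "ex:someone"),
    ("ex:uk", "ex:capital", "ex:london"),
}

# Merging is plain set union -- no root to argue about,
# in contrast to merging two XML trees.
merged = graph_a | graph_b
print(len(merged))  # 4 -- all facts coexist in one graph
```

Contrast this with XML, where you would first have to decide which document’s root “wins” and where the other tree gets grafted in.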

Linked Data Initiative and RDF

A powerful example of the impact of RDF and graph-like data representations is the W3C’s Linked Data initiative. Linked Data, spearheaded by the Web’s inventor Tim Berners-Lee, is an effort to put data on the Web using URIs and RDF. I’ve blogged about Linked Data in a previous post.

BTW, I love this quote from Tim Berners-Lee in response to a question on how Linked Data relates to the Semantic Web:

“Linked Data is the Semantic Web done as it should be. It is the Web done as it should be.”

The above quote was taken from this article from 2008.

Triple Stores – Key-Value Data Stores for the Semantic Web

Interestingly, a Triple Store is a Key-Value data store purpose-built to manage RDF Triples. Basically, a Triple Store is the Semantic Web’s version of an RDBMS.
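A rough sketch of how that could work (an assumption about implementation strategy, not any particular product): index the same triple under multiple keys – by subject, by predicate, by object – so that any simple pattern query becomes a key-value lookup.

```python
from collections import defaultdict

class TripleStore:
    """Toy triple store: each triple is indexed three ways, so lookups
    by subject, predicate, or object are all single key-value lookups."""
    def __init__(self):
        self.by_s = defaultdict(set)  # subject   -> triples
        self.by_p = defaultdict(set)  # predicate -> triples
        self.by_o = defaultdict(set)  # object    -> triples

    def add(self, s, p, o):
        triple = (s, p, o)
        self.by_s[s].add(triple)
        self.by_p[p].add(triple)
        self.by_o[o].add(triple)

ts = TripleStore()
ts.add("article1", "author", "alice")
ts.add("article2", "author", "bob")
print(ts.by_p["author"])  # both triples, retrieved via one key lookup
```

The trade-off is storage (each triple is written several times) in exchange for making every access pattern as cheap as a key-value get – which is the whole point of the comparison to an RDBMS.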

In conclusion …

I’ll be commenting more on graph-based data representations in the coming weeks, as well as foundational Semantic Web standards such as RDF and OWL.


Programming the Semantic Web

September 19, 2009

The Semantic Web, and related technologies, looms large on the horizon. Its business impact will be felt by all companies that in some way organize information – that is, everyone.

I have blogged before about Tim Berners-Lee’s Open Data initiative, which leverages Semantic Web technologies. This post is for programmers who are looking to understand how to develop software that leverages Semantic Web technologies.

Programming the SemWeb

There are a number of Semantic Web books out on the market that discuss Semantic Web technologies from a Researcher’s, or academic’s, point-of-view. My favorite is Semantic Web for the Working Ontologist.

However, until recently, there have been relatively few books that discuss the Semantic Web from a software development point-of-view.

The first books I came across to address the topic were not exactly Semantic Web focused, but instead were on the related topic of Collective Intelligence – see Programming Collective Intelligence and Collective Intelligence in Action. These books focus on developing capabilities such as Product Rating Engines (like you find in Amazon), Clustering algorithms, and Web Search algorithms.

Recently, however, a couple of books have appeared on the market focusing specifically on developing software using Semantic Web technologies (e.g. RDF, OWL, SPARQL, etc.) – specifically Semantic Web Programming and Programming the Semantic Web. One of the authors, Toby Segaran, also has an interesting recent book called Beautiful Data: The Stories Behind Elegant Data Solutions. I believe this book focuses on the “data” side of the Semantic Web. The book’s in the mail, so I’ll find out soon enough.

In Summary …

The Semantic Web, as mentioned above, looms large on the horizon of the future Web. If you’re a developer, now’s a great time to begin experimenting with the technologies.


The Guardian’s Open API strategy

September 18, 2009

The Guardian, IMO, has a very forward-looking strategy around Open Data. Please see my previous related post on this topic.

This post is going to explore some of the core underpinnings of the Guardian’s Open Data strategy.

The Guardian’s Open Platform Strategy

In March of this year The Guardian officially launched its Open Platform strategy. It’s a very forward-looking strategy IMO, and has been generally applauded.

Here’s a link explaining what the Guardian’s Open Platform is all about. Effectively, it opens up the Guardian’s content “to the world”, and to developers, as a platform upon which to build applications and services – applications of this kind are often called “mashups”.

The Content API and the Data Store

There are two key components to The Guardian’s Open Platform: (i) the Content API, and (ii) the Data Store.

The Content API is a mechanism for programmatically accessing Guardian content. You can query the Guardian’s content database for articles and get them back in formats that are geared toward integration with other internet applications.
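To give a flavour of what querying such a content API programmatically might look like, here is a hedged sketch: the endpoint URL, parameter names, and API-key requirement below are illustrative assumptions, not the Guardian’s documented API – consult their Open Platform documentation for the real thing.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE = "https://content.example-newsapi.com/search"
params = {"q": "open data", "format": "json", "api-key": "YOUR_KEY"}
url = BASE + "?" + urlencode(params)
print(url)

# Fetching and decoding the results would then look something like:
#
#   import json
#   from urllib.request import urlopen
#   with urlopen(url) as resp:
#       results = json.load(resp)
```

The key point is the style of integration: a plain HTTP query returning machine-readable results, ready to be mashed up with other data sources.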

The Data Store is a VERY cool product. It is a collection of important and high quality data sets curated by Guardian journalists. You can find useful data here, download it, and integrate it with other internet applications.

The Data Store and Database-driven Journalism

The Guardian’s Data Store is a brilliant enabler of database-driven journalism. Adrian Holovaty of Everyblock is probably the leading proponent of this movement, and I’m sure he’d be a big fan of The Guardian’s Data Store.

For a wonderful example of the power of The Guardian’s Data Store, and the mashup-friendly services that the product enables, check out this wonderful blog post by The Guardian’s Martin Belam describing the Data Store’s role in a scandal that arose in Great Britain this summer around MP expenses, and his discussion of the contrasting “open” and “closed” models of 21st-century journalism. It’s a great read.

All for now.


Linked Data and the future of Journalism

September 17, 2009

So I have a passionate interest in Tim Berners-Lee and the W3C’s Linked Data initiative, and have blogged about the topic before.

While I was checking up on Martin Belam’s latest posts, these two popped up:

  1. Linked Data and the future of journalism – part 1, and
  2. Linked Data and the future of journalism – part 2

This may not be everyone’s cup of tea, but Linked Data and the Semantic Web are going to be increasingly hot topics over the next several years IMO.


Open Linked Data

August 23, 2009

Introduction to Open Linked Data

A very interesting groundswell is forming around the desire to open up data to the web, and to make it available for all to link to and share. This movement goes by various names, including Open Data, Linked Data, and Open Linked Data.

Open Linked Data leverages technologies inherent in the Semantic Web – specifically RDF.

Here are some interesting articles on the topic:

To get a sense of how data is being opened up and linked in the Web of Data, here is a visual from the W3C from March 2009:


OK, so how does this relate to Journalism?

This past week, MSNBC announced that it had acquired Everyblock. To learn more about Everyblock, and its strategic position in the HyperLocal space, visit its site here. The Guardian is also moving fast in the data-driven journalism space – see here, and here, and here.

Lamenting the newspaper industry’s failure to act on the strategic importance of Everyblock, Alan Mutter has this to say.

What Everyblock and The Guardian are fast embracing is what is sometimes referred to as Database Journalism. For further insight into and discussion of the role of Database Journalism, see:

Other Open Data initiatives

The Open Data movement is also pressing Governments to open up their data. See:

All for now,