Open Linked Data

Open Data Strategies and News Media – update

July 24, 2010 glennas

Last year, I wrote several posts on Open Data strategies, focusing specifically on News Media organizations. I’d like to provide an update. This post is actually compiled from a collection of e-mails, so hopefully it comes together in a coherent manner.

Data-driven Journalism

The collection of posts began with a question: should data-driven journalism be considered a future strategic capability of news media organizations?

The question was prompted by a post from zero hedge: Another Massively Interactive European Chart, which referenced an interactive chart published by the Economist. It reminded me again of the power of Info-graphics to “enlighten and explain”.

For additional articles on data-driven journalism, see the following:

A fundamental way newspaper sites need to change
Journalism Needs Data in 21st Century
MPs expenses, The Telegraph, The Guardian, and the ‘open’ and ‘closed’ models of 21st century journalism – a particularly interesting example of data-driven journalism (and open APIs) from the Guardian

The Bigger Picture – Open Data

I then briefly explored the importance of Open Data, a capability that would offer strong material for data-driven journalism. I provided the following links:

Also of interest is The Guardian’s strong advocacy for opening up public data sources, in part to put to the service of journalism.

Linked Data – Technological foundation for Open Data on the Web

The following e-mail provided some context for the W3C’s Linked Data initiative. In particular, it provided links to thoughts from Martin Belam, the Chief Information Architect at The Guardian, on how Linked Data will affect the future of News organizations. These links are provided below:

What is the value of Linked Data to the News industry? – February 2010
Podcast with Martin Belam on Linked Data and News – February 2010
A history of linked data at the BBC – February 2010

There’s also a very interesting presentation from the News Linked Data Summit in February 2010, titled News Media Metadata – The Current Landscape. It would be nice to have the video to go with this presentation, but there is some great content in the slide deck.

On the topic of semantics, here are ReadWriteWeb’s archived articles from SemTech 2010 if anyone is interested. Facebook and Google both had a strong presence at this year’s SemTech conference.

Government and Community Open Data Initiatives

A third e-mail followed, discussing some of the current movements by various levels of government – from countries to municipalities – to freely open up their data to the public.

Here’s an interesting link announcing the pending formal UK government launch of their Open Data initiative, prompted by Tim Berners-Lee. And here’s The Guardian’s announcement of the launch the following day, with a video clip of Sir Tim himself. As The Guardian’s Martin Belam comments in a post days after the announcement, “We now know that, whatever the outcome of the next election, we are only going to see more Government and state gathered data published, not less. So how, as the news industry, are we going to respond to this, and what does the digital news media look like in a world with a high level of semantic state data available?”

The UK Government is a pioneer here for sure, but it’s a trend that many are already promoting in Canada. This represents a real opportunity for journalism, IMO, and one that Belam strongly advocates: helping people make sense of government data, illuminating the broader patterns and their relevance to people’s lives, and hosting discussion around important “topics that matter”. Note the list of Canadian municipalities in this wiki page that are moving ahead full steam with Open Data initiatives. See the following articles for Toronto, Ottawa, Vancouver, Edmonton, and Calgary. And here’s a recent Forrester blog post on the topic.

And that’s about that. 🙂

glenn

Categories: Future of News Media, Open Data, Semantic Web Tags: Data-driven Journalism, Martin Belam, News Media, Open Linked Data, The Guardian

Semantics (and Metadata) at the New York Times

October 17, 2009 glennas 2 comments

***** Nov 10 2009 Update:
I have uploaded a summary doc of the NY Times presentation. Please click the following link to access: Semantics at The New York Times – notes – SemTech 2009
*****

Yet another great presentation from the SemTech 2009 conference this past June in San Jose. This presentation is on Semantics at the New York Times.

Here is a slide presentation that the New York Times delivered at a different conference, but it’s very similar to the one delivered at SemTech.

The (Long) History of Metadata at the New York Times

The presentation starts out by exploring the history of metadata at the New York Times, from the beginnings of their Morgue archive, which was created at the newspaper’s inception in, if you can believe it, 1851. The so-called Morgue was not a collection of corpses (thank goodness), but rather a collection of newspaper clippings and photos.

No subject was too big or small to be indexed in the Morgue. As the Times VP of Digital Production Rob Larson states in the presentation, in 1907 the Times’ Managing Editor Carr Van Anda invested in the Morgue to add staff and rigor of organization to the files, and a Tagging system grew up around this effort.

At its zenith a few decades ago, the Morgue had a staff of 24, creating 600 new clip folders per week and cutting up 36 editions of the final New York City edition of the Times, as well as copies of other prominent newspapers.

Within its main operation on the third floor, there were more than 4,000 cabinet drawers of newspaper clippings, containing 1,126,000 named individuals (including animals, etc), 65,000 subject headings, 300,000 ships and planes, 500,000 places, and 500,000 corporations. (Wow!)

The Morgue is only one form of tagging system used at the Times – others include the New York Times Index and the NYTimes.com website.

So what is the Tagging workflow at the New York Times?

Here are a few slides from the presentation. The first depicts the tagging workflow at the New York Times, and which roles apply metadata at each step in the workflow.

Tagging at the NY Times

This visual, however, oversimplifies the complexity of applying metadata in the editorial workflow. Here’s a very-hard-to-read workflow diagram of the stages at which metadata is applied at the NY Times, which suggests the overall complexity of the end-to-end workflow across both Print and Online channels.

Tagging Workflow at the Times

Why Tag?

Another core visual is shown below, which summarizes the motivation for tagging – that is, the various use cases for metadata-tagged content at the Times.

Tagging - Use Cases

Rob Larson specifically addresses the importance of metadata for generating NY Times Topic Pages, 4 examples of which are provided below:

Topic Pages - NY Times

The Future

Next the presenters address the future of metadata (and now the talk turns more to “semantics”) at the NY Times.

What near-term plans does the Times have for evolving their metadata management practice? See the slide below:

Metadata Opportunities

Next up, the presenters discuss the New York Times’ various Open Data initiatives, and the APIs the Times is making available to the public to access and build applications on top of its data.

New York Times and Linked Data

Finally, the New York Times announced at SemTech the next phase of their Open Data strategy, which is to prepare their Corpus to be exposed to the Linked Data Cloud.

Interesting stuff.

glenn

Categories: Future of Newspapers, Semantic Web Tags: Future of Newspapers, Metadata, New York Times, News Media, Open Linked Data, Semantic Web

Post-relational Data Representations

October 11, 2009 glennas 1 comment

In a previous blog post, I made the comment:

I’ll be posting further on the Semantic Web in the coming weeks, and I’ll explore both how graph-like data representation differs from traditional relational modeling, and the benefits such a representation provides over more traditional data modeling approaches.

This post briefly elaborates on this topic by exploring two examples of post-relational data representations.

Key-Value Data Stores

The ReadWriteWeb had an interesting article from February 2009 titled Is the Relational Database Doomed? If I understand this correctly, the issue here is basically managing vast numbers of items indexed by a key – for example, documents on the Web.

This data management strategy is the norm for massively scalable indexing requirements. The ReadWriteWeb article discusses key-value data stores in the context of Cloud Computing.
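
To make the idea concrete, here is a minimal sketch of a key-value store in Python. It is only an illustration under my own assumptions: the class and function names (KeyValueStore, shard_for) and the example.org keys are made up, and hashing the key to pick a server is just one common way such stores are spread across machines, not something the ReadWriteWeb article prescribes.

```python
import hashlib
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory store: one opaque value per key, no schema and no joins."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        # Writing the same key again simply overwrites the previous value.
        self._items[key] = value

    def get(self, key: str) -> Optional[str]:
        # Lookup is by exact key only; the store never inspects the value.
        return self._items.get(key)


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard number for a key (hypothetical helper, hash-based)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


store = KeyValueStore()
store.put("http://example.org/a", "<html>page A</html>")
store.put("http://example.org/b", "<html>page B</html>")

page_a = store.get("http://example.org/a")
missing = store.get("http://example.org/zzz")
shard = shard_for("http://example.org/a", 4)
```

Because every operation touches exactly one key, the items can be split across many machines without the cross-table coordination a relational join would need, which is roughly why this model scales so easily.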

*** Update 1 (11/09)
Interesting comment (comment #2) in the ReadWriteWeb article. Here it is:

There is also a new crop of databases called “graph databases” gaining traction (with a model based on nodes, relationships and properties), one of them being the open-source neo4j (http://neo4j.org).

Using graphs to structure information is very powerful and intuitive.

Exactly! See the section below. Also check out the remaining Comments associated with the ReadWriteWeb article. Fantastic discussion!
*** End Update 1

Graph-like Data Representations

Graph-based data representations (for example, RDF) provide an interesting contrast to traditional relational data modeling approaches, and are critical to the vision of the Semantic Web. Here are some of the key differences between graph-based and relational data representations:

“Triple” as the key data construct – Graph-like data representations (for example, RDF) treat metadata and data in exactly the same way. Both metadata and data are expressed as a “triple” – a subject-predicate-object relation. The entire graph is nothing but a collection of these “triple” statements.
Triples are composed to build the Graph – The database concept of a “join” is accomplished through the flexible “composition” of triple statements. This would appear to be a much more flexible way to “compose” semantic structures dynamically, across multiple disparate data sources with different data representations/semantics.
Metadata IS Data – Data and Metadata about “concepts” are both expressed in the same manner – as Triple statements. This provides an extremely flexible and scalable form of knowledge representation (i.e. data + semantics), because a new data element or metadata dimension can be added by simply adding another triple to the data store.

The above being said, relational data representations still have some important advantages where the data schema is well-known and relatively static. They also tend to be a good choice for transactional systems where the schema for key entities is, again, well-known and non-volatile. However, relational schemas are more brittle, and less malleable and composable, than graph-based data representations.
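
The three points about triples can be sketched in a few lines of Python. This is a toy model, not an RDF library: a graph is just a set of (subject, predicate, object) tuples, and the "ex:" names and helper functions below are invented for illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

graph: Set[Triple] = {
    # Data about an article and its author.
    ("ex:article1", "ex:title", "Open Data Strategies"),
    ("ex:article1", "ex:author", "ex:alice"),
    ("ex:alice", "ex:name", "Alice"),
    # Metadata about the "author" property, in exactly the same shape.
    ("ex:author", "rdf:type", "rdf:Property"),
    ("ex:author", "rdfs:label", "author"),
}


def objects(g: Set[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object of the triples matching subject and predicate."""
    return sorted(o for s, p, o in g if s == subject and p == predicate)


def author_names(g: Set[Triple], article: str) -> List[str]:
    """A "join" by composition: article -> author -> name."""
    names: List[str] = []
    for author in objects(g, article, "ex:author"):
        names.extend(objects(g, author, "ex:name"))
    return names


# Adding a new dimension needs no schema change, just one more triple.
graph.add(("ex:article1", "ex:topic", "ex:LinkedData"))

names = author_names(graph, "ex:article1")
label = objects(graph, "ex:author", "rdfs:label")
```

Note that the "join" in author_names is nothing more than following one triple to the next, and that the description of the ex:author property lives in the same set as the data that uses it.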

*** Update 2 (11/09)
Here’s another key point about graphs, in contrast to tree-like data representations such as XML, from the Semantic Web Programming book (p. 72):

Graphs do not have roots. Some other representations, for example XML, are tree based. In an XML document, the root element of the tree has a special significance because all the other elements are oriented with respect to the document root. When trying to merge two trees, it can be difficult to determine what the root node should be because the structure of the tree is so important to the overall significance of the data. In an RDF graph, by contrast, no single resource is of any inherent significance as compared to any other.

*** End Update 2
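
The point about merging is easy to see if a graph is modelled as a plain set of triples (my own simplification, with made-up "ex:" names): merging is just set union, and there is no root to argue about.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def merge(*graphs: FrozenSet[Triple]) -> FrozenSet[Triple]:
    """Merge any number of triple sets; duplicates collapse automatically."""
    merged: FrozenSet[Triple] = frozenset()
    for g in graphs:
        merged = merged | g
    return merged


graph_a = frozenset({
    ("ex:guardian", "ex:publishes", "ex:dataStore"),
})
graph_b = frozenset({
    ("ex:guardian", "ex:publishes", "ex:dataStore"),  # also present in graph_a
    ("ex:dataStore", "ex:contains", "ex:mpExpenses"),
})

merged_ab = merge(graph_a, graph_b)
merged_ba = merge(graph_b, graph_a)
```

The order of the arguments makes no difference to the result, which is exactly what you cannot say about grafting one XML tree onto another.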

Linked Data Initiative and RDF

A powerful example of the impact of RDF and graph-like data representations is the W3C’s Linked Data initiative. The initiative, spearheaded by the Web’s founder Tim Berners-Lee, aims to put data on the Web using URIs and RDF. I’ve blogged about Linked Data in a previous post.
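
As a rough illustration of "URIs and RDF", here is a small Python sketch that writes triples out in the N-Triples text format, where each line is one statement and URIs go in angle brackets. The example.org URIs and the function names are my own inventions; the title predicate is the Dublin Core "title" term. The escaping covers only backslashes, quotes, and line breaks, so treat it as a simplification rather than a full serializer.

```python
def escape_literal(text: str) -> str:
    """Escape the characters handled by this sketch for a quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples_line(subject: str, predicate: str, obj: str, obj_is_uri: bool) -> str:
    """Format one triple as an N-Triples line; subject and predicate are URIs."""
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        obj_term = '"' + escape_literal(obj) + '"'
    return f"<{subject}> <{predicate}> {obj_term} ."


# A literal-valued statement: the article has a title.
title_line = to_ntriples_line(
    "http://example.org/article/1",
    "http://purl.org/dc/terms/title",
    'Linked "Data"',
    obj_is_uri=False,
)

# A URI-valued statement: the article links to another resource.
link_line = to_ntriples_line(
    "http://example.org/article/1",
    "http://example.org/vocab/about",
    "http://example.org/topic/linked-data",
    obj_is_uri=True,
)
```

The second statement is the "linked" part: because the object is itself a URI, anyone else on the Web can publish further statements about that same resource.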

BTW, I love this quote from Tim Berners-Lee in response to a question on how Linked Data relates to the Semantic Web:

“Linked Data is the Semantic Web done as it should be. It is the Web done as it should be.”

The above quote was taken from this article from 2008.

Triple Stores – Key-Value Data Stores for the Semantic Web

Interestingly, a Triple Store is a Key-Value data store purpose-built to manage RDF Triples. Basically, a Triple Store is the Semantic Web’s version of an RDBMS.
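
One way to picture a Triple Store in key-value terms is as a couple of nested lookup tables keyed by different parts of the triple. The Python sketch below is only that picture: the TripleStore class, its two indexes, and the "ex:" names are assumptions of mine, and real triple stores use far more elaborate indexing and query engines.

```python
from collections import defaultdict
from typing import List


class TripleStore:
    """Toy triple store: each triple is filed under two key-based indexes."""

    def __init__(self) -> None:
        # subject -> predicate -> set of objects
        self._spo = defaultdict(lambda: defaultdict(set))
        # predicate -> object -> set of subjects
        self._pos = defaultdict(lambda: defaultdict(set))

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._spo[subject][predicate].add(obj)
        self._pos[predicate][obj].add(subject)

    def objects(self, subject: str, predicate: str) -> List[str]:
        """Answer "what does this subject have for this predicate?" by key lookup."""
        return sorted(self._spo.get(subject, {}).get(predicate, ()))

    def subjects(self, predicate: str, obj: str) -> List[str]:
        """Answer "which subjects have this predicate and object?" by key lookup."""
        return sorted(self._pos.get(predicate, {}).get(obj, ()))


store = TripleStore()
store.add("ex:article1", "ex:tag", "ex:OpenData")
store.add("ex:article2", "ex:tag", "ex:OpenData")
store.add("ex:article1", "ex:tag", "ex:Journalism")

tags = store.objects("ex:article1", "ex:tag")
tagged = store.subjects("ex:tag", "ex:OpenData")
nothing = store.objects("ex:article3", "ex:tag")
```

Storing each triple under more than one key ordering is what lets the same data answer questions from either direction without scanning everything.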

In conclusion …

I’ll be commenting more on graph-based data representations in the coming weeks, as well as foundational Semantic Web standards such as RDF and OWL.

glenn

Categories: Semantic Web Tags: Open Linked Data, RDF, Semantic Web, Triple

Programming the Semantic Web

September 19, 2009 glennas

The Semantic Web, and related technologies, loom large on the horizon. Their business impact will be felt by all companies that in some way organize information – that is, everyone.

I have blogged before about Tim Berners-Lee’s Open Data initiative, which leverages Semantic Web technologies. This post is for Programmers who are looking to understand how to develop software that leverages Semantic Web technologies.

Programming the SemWeb

There are a number of Semantic Web books out on the market that discuss Semantic Web technologies from a Researcher’s, or academic’s, point-of-view. My favorite is Semantic Web for the Working Ontologist.

However, until recently, there have been relatively few books that discuss the Semantic Web from a software development point-of-view.

The first books I came across to address the topic were not exactly Semantic Web focused, but instead were on the related topic of Collective Intelligence – see Programming Collective Intelligence and Collective Intelligence in Action. These books focus on developing capabilities such as Product Rating Engines (like you find in Amazon), Clustering algorithms, and Web Search algorithms.

Recently, however, a couple of books have appeared on the market focusing specifically on developing software using Semantic Web technologies (e.g. RDF, OWL, SPARQL, etc.) – specifically Semantic Web Programming and Programming the Semantic Web. One of the authors of these books, Toby Segaran, also has an interesting recent book called Beautiful Data: The Stories Behind Elegant Data Solutions. I believe this book focuses on the “data” side of the Semantic Web. The book’s “in the mail”, so I’ll find out soon enough.

In Summary …

The Semantic Web, as mentioned above, looms large on the horizon of the future Web. If you’re a developer, now’s a great time to begin experimenting with the technologies.

glenn

Categories: Semantic Web Tags: Collective Intelligence, Open Linked Data, Semantic Web, Toby Segaran

The Guardian’s Open API strategy

September 18, 2009 glennas 1 comment

The Guardian, IMO, has a very forward-looking strategy around Open Data. Please see my previous related post on this topic.

This post is going to explore some of the core underpinnings of the Guardian’s Open Data strategy.

The Guardian’s Open Platform Strategy

In March of this year The Guardian officially launched its Open Platform strategy. It’s a very forward-looking strategy IMO, and has been generally applauded.

Here’s a link explaining what the Guardian’s Open Platform is all about. Effectively, it opens up the Guardian’s content “to the world”, and to developers, as a platform upon which to develop applications and services. An application built this way is commonly called a “mashup”.

The Content API and the Data Store

There are two key components to The Guardian’s Open Platform: (i) the Content API, and (ii) the Data Store.

The Content API is a mechanism for programmatically accessing Guardian content. You can query the Guardian’s content database for articles and get them back in formats that are geared toward integration with other internet applications.
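
To give a feel for what "programmatically accessing" content means, here is a small Python sketch that builds a search request URL for a content API of this general kind. Everything specific in it is hypothetical: the example.org endpoint, the parameter names (q, format, api-key), and the placeholder key are stand-ins of mine, not the Guardian’s documented interface, so check the Open Platform documentation for the real details.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, api_key: str,
                     response_format: str = "json") -> str:
    """Build a search URL for a hypothetical content API endpoint."""
    params = {
        "q": query,                 # free-text search terms (assumed name)
        "format": response_format,  # machine-readable response format (assumed name)
        "api-key": api_key,         # developer key (assumed name)
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and key; a real client would fetch this URL over HTTP
# and parse the structured response into its own application.
url = build_search_url(
    "https://example.org/content-api/search",
    "mp expenses",
    "YOUR_KEY",
)
```

The mashup idea follows from this: once articles come back as structured data rather than web pages, they can be combined with maps, charts, or other data sets in a third-party application.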

The Data Store is a VERY cool product. It is a collection of important and high quality data sets curated by Guardian journalists. You can find useful data here, download it, and integrate it with other internet applications.

The Data Store and Database-driven Journalism

The Guardian’s Data Store is a brilliant enabler of database-driven journalism. Adrian Holovaty of Everyblock is probably the leading proponent of this movement, and I’m sure he’d be a big fan of The Guardian’s Data Store.

For a wonderful example of the power of The Guardian’s Data Store, and the mashup-friendly services that the product enables, check out this wonderful blog post by The Guardian’s Martin Belam describing the Data Store’s role in a scandal that arose in Great Britain this summer around MP expenses, and his discussion of the contrasting “open” and “closed” models of 21st-century journalism. It’s a great read.

All for now.

glenn

Categories: Future of Journalism, Open Data Tags: Data-driven Journalism, Martin Belam, News Media, Open API, Open Linked Data, The Guardian

Linked Data and the future of Journalism

September 17, 2009 glennas 1 comment

So I have a passionate interest in Tim Berners-Lee and the W3C’s Linked Data initiative, and have blogged about the topic before.

While I was checking up on Martin Belam’s latest posts, these two popped up:

This may not be everyone’s cup of tea, but Linked Data and the Semantic Web are going to be increasingly hot topics over the next several years IMO.

glenn

Categories: Future of Journalism, Semantic Web Tags: Future of Journalism, Martin Belam, Open Linked Data

August 23, 2009 glennas 1 comment

Introduction to Open Linked Data

A very interesting groundswell is forming around the desire for opening up data to the web, and making it available for all to link to and share. This movement goes by various names including Open Data, Linked Data, and Open Linked Data.

Open Linked Data leverages technologies inherent in the Semantic Web – specifically RDF.

Here are some interesting articles on the topic:

Tim Berners-Lee on the Next Web – Tim Berners-Lee TED talk, February 2009
Open Data is the future of Web Discovery – Doug Sherrets, July 2009
Interview with Tim Berners-Lee: Part I Linked Data – ReadWriteWeb, July 2009
Linked Data is Blooming: Why you should care – ReadWriteWeb, May 2009
Linked Data: Principles and State of the Art – Bizer, Heath, and Berners-Lee, April 2008
Tom Coates: Web of Data – ReadWriteWeb, February 2008
Giant Global Graph – Tim Berners-Lee, November 2007

To get a sense of how data is being opened to and linked in the Web of Data, here is a visual from the W3C from March 2009:

lod-datasets_2009-03-05

OK, so how does this relate to Journalism?

This past week, MSNBC announced that they had acquired Everyblock. To learn more about Everyblock, and their strategic position in the HyperLocal space, visit their site here. The Guardian is also moving fast in the data-driven journalism space – see here, and here, and here.

Lamenting the Newspapers’ failure to act on the strategic importance of Everyblock, Alan Mutter has this to say.

What Everyblock and the Guardian are fast engaging in is what is sometimes referred to as Database Journalism. For further insight and discussion into the role of Database Journalism, see:

@ Future of Journalism: Adrian Holovaty’s vision for data-friendly journalists – The Guardian, June 2008
The Mashup Man – American Journalism Review, Winter 2006/2007
Journalism needs Data in the 21st century – ReadWriteWeb, August 2009

Other Open Data initiatives

The Open Data movement is also pressing Governments to open up their data. See:

Opening up Government Data – BBC interview with Tim Berners-Lee, June 2009
Data SF – City of San Francisco Open Data initiative
DC Data Catalog – Washington DC Open Data

All for now,
Glenn

Categories: Linked Data Tags: Open Linked Data, Semantic Web, Tim Berners-Lee
