Semantics (and Metadata) at the New York Times
***** Nov 10 2009 Update:
I have uploaded a summary document of the NY Times presentation. You can access it here: Semantics at The New York Times – notes – SemTech 2009
*****
Here is yet another great presentation from the SemTech 2009 conference, held this past June in San Jose. This one covers semantics at the New York Times.
Here is a slide presentation that the New York Times delivered at a different conference; it is very similar to the one it delivered at SemTech.
The (Long) History of Metadata at the New York Times
The presentation opens by exploring the history of metadata at the New York Times, starting with the Morgue archive, which was created at the newspaper’s inception in (if you can believe it) 1851. The so-called Morgue was not a collection of corpses (thank goodness), but rather a collection of newspaper clippings and photos.
No subject was too big or too small to be indexed in the Morgue. As Rob Larson, the Times’ VP of Digital Production, explains in the presentation, in 1907 Managing Editor Carr Van Anda invested in the Morgue, adding staff and more rigorous organization to the files, and a tagging system grew up around this effort.
At its zenith a few decades ago, the Morgue had a staff of 24 people who created 600 new clip folders per week, cutting up 36 copies of the final New York City edition of the Times as well as copies of other prominent newspapers.
Its main operation, on the third floor, held more than 4,000 cabinet drawers of newspaper clippings covering 1,126,000 named individuals (including animals, etc.), 65,000 subject headings, 300,000 ships and planes, 500,000 places, and 500,000 corporations. (Wow!)
The Morgue is only one of the tagging systems used at the Times; others include the New York Times Index and the NYTimes.com website.
So what is the Tagging workflow at the New York Times?
Here are a few slides from the presentation. The first depicts the tagging workflow at the New York Times and which roles apply metadata at each step of that workflow.

This visual, however, oversimplifies how metadata is actually applied in the editorial workflow. Here’s a very hard-to-read diagram of the stages at which metadata is applied at the NY Times, which conveys the overall complexity of the end-to-end workflow across both the Print and Online channels.

Why Tag?
Another core visual, shown below, summarizes the motivation for tagging: the various use cases for metadata-tagged content at the Times.

Rob Larson specifically addresses the importance of metadata for generating NY Times Topic Pages; four examples are provided below:

The Future
Next, the presenters address the future of metadata at the NY Times, and here the talk turns more toward “semantics.”
What near-term plans does the Times have for evolving its metadata management practice? See the slide below:

Next up, the presenters discuss the New York Times’ various Open Data initiatives and the APIs the Times is making available to the public for accessing its data and building applications on top of it.
New York Times and Linked Data
Finally, the New York Times announced at SemTech the next phase of its Open Data strategy: preparing its corpus to be exposed to the Linked Data Cloud.
Interesting stuff.
glenn