Semantics (and Metadata) at the New York Times
***** Nov 10 2009 Update:
I have uploaded a summary doc of the NY Times presentation. Please click the following link to access: Semantics at The New York Times – notes – SemTech 2009
*****
Here is yet another great presentation from the SemTech 2009 conference, held this past June in San Jose. This one is on Semantics at the New York Times.
Here is a slide presentation that the New York Times delivered at a different conference, but it’s very similar to the one delivered at SemTech.
The (Long) History of Metadata at the New York Times
The presentation starts out exploring the history of metadata at the New York Times, beginning with their Morgue archive, which was created at the newspaper’s inception in, if you can believe it, 1851. The so-called Morgue was not a collection of corpses (thank goodness), but rather a collection of newspaper clippings and photos.
No subject was too big or small to be indexed in the Morgue. As the Times VP of Digital Production Rob Larson states in the presentation, in 1907 the Times’ Managing Editor Carr Van Anda invested in the Morgue to add staff and rigor of organization to the files, and a Tagging system grew up around this effort.
At its zenith a few decades ago, the Morgue had a staff of 24, who created 600 new clip folders per week and cut up 36 editions of the final New York City edition of the Times, as well as copies of other prominent newspapers.
Within its main operation on the third floor, there were more than 4,000 cabinet drawers of newspaper clippings, covering 1,126,000 named individuals (including animals, etc.), 65,000 subject headings, 300,000 ships and planes, 500,000 places, and 500,000 corporations. (Wow!)
The Morgue is only one form of tagging system used at the Times – others include the New York Times Index and the NYTimes.com website.
So what is the Tagging workflow at the New York Times?
Here are a few slides from the presentation. The first slide depicts the tagging workflow at the New York Times, and which roles apply metadata at which step in the workflow.
This visual, however, oversimplifies how metadata is actually applied in the editorial workflow. Here’s a very-hard-to-read diagram of the stages at which metadata is applied at the NY Times, which suggests the overall complexity of the end-to-end workflow for both Print and Online channels.
Why Tag?
Another core visual is shown below, which summarizes the motivation for tagging, that is, the various use cases for metadata-tagged content at the Times.
Rob Larson specifically addresses the importance of metadata for generating NY Times Topic Pages, four examples of which are provided below:
The Future
Next the presenters address the future of metadata (and now the talk turns more to “semantics”) at the NY Times.
What near-term plans does the Times have for evolving their metadata management practice? See the slide below:
Next up, the presenters discuss the New York Times’ various Open Data initiatives, and the APIs the Times is making available to the public to access and build applications on top of its data.
New York Times and Linked Data
Finally, the New York Times announced at SemTech the next phase of their Open Data strategy, which is to prepare their Corpus to be exposed to the Linked Data Cloud.
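To make "exposing tags as Linked Data" a bit more concrete, here is a minimal Python sketch of how a single tag might be written out as RDF triples in N-Triples form. This is my own illustration, not the Times' actual schema: the example.org URIs, the tag label, and the `tag_to_ntriples` helper are all hypothetical, and the choice of the SKOS and OWL vocabularies is an assumption on my part.

```python
# Illustrative sketch only: the URIs, the tag, and this helper are hypothetical
# and are not taken from the New York Times' actual data or APIs.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def tag_to_ntriples(tag_uri, label, same_as=()):
    """Serialize one tag as a list of N-Triples lines."""
    # Escape backslashes and double quotes so the literal stays valid.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        # The tag is modeled as a SKOS concept (an assumption).
        f"<{tag_uri}> <{RDF_TYPE}> <{SKOS}Concept> .",
        # Its human-readable heading, tagged as English.
        f'<{tag_uri}> <{SKOS}prefLabel> "{escaped}"@en .',
    ]
    # Links to the same entity in other datasets are what connect
    # the tag to the wider Linked Data Cloud.
    for other in same_as:
        lines.append(f"<{tag_uri}> <{OWL_SAME_AS}> <{other}> .")
    return lines


triples = tag_to_ntriples(
    "http://example.org/tags/carr-van-anda",
    "Van Anda, Carr",
    same_as=["http://example.org/other-dataset/Carr_Van_Anda"],
)
print("\n".join(triples))
```

The point of the sameAs-style links is that a tag stops being just a string in one newsroom's index and becomes an identifier that other datasets can point to and be pointed at from.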
Interesting stuff.
glenn