Metadata | End of Business as Usual

Metadata for a Web of Data – Scott Davis on RDFa and Microformats

April 3, 2011 glennas

A very informative and entertaining presentation by Scott Davis of ThirstyHead on using RDFa and Microformats to build services that are OF the Web, not just ON the Web. Presentation slides can be found here.

glenn

Categories: Semantic Markup Tags: Metadata, Microformats, RDFa, Scott Davis, Semantic Markup

Power of Metadata to enable Business Transformation

July 24, 2010 glennas

Introduction

I’m currently reading a business-oriented book on the power of the Semantic Web to enable new business models called Pull: The Power of the Semantic Web to Transform Your Business, by David Siegel, published in December 2009.

I’ve read several excellent technology-focused books on the Semantic Web, including Semantic Web Programming and Programming the Semantic Web. But this is the first book I’ve seen that specifically looks at the Semantic Web, and structured Metadata, from the vantage point of enabling Business Transformation and the development of new Business Models.

BTW, David Siegel recently delivered the keynote at SemTech 2010.

Metadata enables Smart Objects

I’m currently about 30 pages into the 250-ish page book, but several key messages have already been presented. The one that most immediately grabbed my attention is the notion of “smart objects”. [BTW, Siegel doesn’t explicitly use this term (at least he hasn’t thus far in the book), but it is a notion that underlies much of his message.]

The key idea here is that objects (products and content) have a unique ID and associated metadata, such that they effectively “know about themselves”. They know what their meaning is and how to describe themselves, they know where they’ve been, they know what state they are in, and so forth. Obviously, these are not exactly the objects as we encounter them in the everyday world. Rather, we are speaking about an electronic representation of the object that has some smarts associated with it.

In the first two chapters, Siegel provides several examples from the Shipping and Retail industries. In the shipping industry, he talks about smart “packages”. Quoting Siegel:

Using a new universal tracking number and open standards for messaging creates package-level autonomy: the package itself will send a message to the customer to make sure he or she is there to receive it.

The basic idea here is that an electronic representation of the package exists, and that representation is effectively “smart”: it knows about itself, and it can respond to events that are of interest to it. As well, the “vocabulary” that describes the data and events associated with packages is formalized as an industry standard, so that packages can easily cross process boundaries between the different companies that operate in different parts of the supply chain. This is the “package-level autonomy” that Siegel mentions above.
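To make the “smart package” idea a little more concrete, here is a minimal sketch, not anything Siegel prescribes. It assumes a hypothetical shared event vocabulary (`picked_up`, `in_transit`, `out_for_delivery`, `delivered`) and a hypothetical `notify` callback standing in for whatever messaging channel a carrier would use. The point is that the object carries its own ID, metadata, and history, and decides for itself when to message the customer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical industry-standard event vocabulary shared by every carrier.
EVENT_VOCABULARY = {"picked_up", "in_transit", "out_for_delivery", "delivered"}


@dataclass
class SmartPackage:
    tracking_id: str                  # universal tracking number
    metadata: dict                    # self-description: contents, recipient, ...
    notify: Callable[[str], None]     # hypothetical channel for messaging the customer
    history: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def state(self) -> str:
        # The package "knows what state it is in": its most recent event.
        return self.history[-1][0] if self.history else "created"

    def handle(self, event: str, location: str) -> None:
        # Only accept events from the shared vocabulary, so any company
        # in the supply chain can interpret the package's history.
        if event not in EVENT_VOCABULARY:
            raise ValueError(f"unknown event: {event}")
        self.history.append((event, location))
        # The package itself messages the customer before delivery.
        if event == "out_for_delivery":
            self.notify(f"Package {self.tracking_id} arrives today; please be "
                        f"available at {self.metadata['recipient_address']}.")


messages: List[str] = []
pkg = SmartPackage("TRK-0001", {"recipient_address": "12 Elm St"}, messages.append)
pkg.handle("picked_up", "Depot A")
pkg.handle("out_for_delivery", "Depot B")
```

Restricting events to a shared vocabulary is what lets the package cross company boundaries: every participant reads the same history the same way.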

Another example is “smart products” in Retail. Here, Siegel provides examples of a Smart Cart and Smart Products. The Smart Cart knows what products you’ve put into it, and can take actions on the items currently in the cart, whether it’s adding up the total, applying coupons, or providing information at checkout.

Smart Products carry bar codes and RFID tags as universal identifiers and tracking tags. Scanning these tags can return product descriptions and competitive pricing information, and can track the product across the various stages of its production and delivery lifecycle.

Smart Containers manage and leverage Smart Objects

It’s not, however, just the objects themselves that are smart. It is also the “containers” of these objects, whether the container is a Shopping Cart, a Shipment, a Carton of items, a Pallet of goods, or a Truck. Smart Containers know precisely the nature of the goods or products they contain, and what state they are in.
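A smart container can be sketched as an object that computes over the metadata of what it holds. The example below is a hypothetical Smart Cart, assuming each product carries a SKU, a price in integer cents, and a category, and assuming coupons are expressed as a percent off per category. None of these field names come from Siegel.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Product:
    sku: str            # universal identifier, e.g. the bar-code number
    name: str
    price_cents: int    # integer cents avoid floating-point rounding
    category: str


@dataclass
class SmartCart:
    items: List[Product] = field(default_factory=list)

    def add(self, product: Product) -> None:
        self.items.append(product)

    def contents_by_category(self) -> Dict[str, int]:
        # The container knows exactly what it holds.
        counts: Dict[str, int] = {}
        for p in self.items:
            counts[p.category] = counts.get(p.category, 0) + 1
        return counts

    def total_cents(self, coupons: Optional[Dict[str, int]] = None) -> int:
        # coupons: hypothetical mapping of category -> percent off, applied
        # per item using that item's own metadata (rounded down per item).
        coupons = coupons or {}
        total = 0
        for p in self.items:
            percent_off = coupons.get(p.category, 0)
            total += p.price_cents * (100 - percent_off) // 100
        return total


cart = SmartCart()
cart.add(Product("0001", "Coffee", 899, "grocery"))
cart.add(Product("0002", "Mug", 1200, "housewares"))
print(cart.total_cents({"grocery": 10}))  # 809 + 1200 = 2009
```

The cart holds no pricing logic of its own beyond summing: everything it needs comes from the items’ metadata.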

Applying the “Smart Object” concept to the Media Industry

Can these concepts be applied to the media industry? I think they can. The media industry has its own version of objects and containers of those objects. Our objects are most importantly Content (Articles, Photos, Videos, etc.) and the discussions and conversations around that content. Our packages are the containers for this content: Publications, Websites, Web pages, and Stories that aggregate multiple types of content.

So, like the smart objects above, our objects (our Content) need to be “smart” or “intelligent”. They need to know about themselves; they need to be self-describing. And our “containers” and media products need to be able to take advantage of that intelligence: to understand what content is most relevant to our audiences, and make sure that content is available and discoverable by our users when they want it, where they want it, and in the form they want to consume it.

On the web, of course, one of the most important rationales for “smart content” is to make it easily discoverable by Search Engines – SEO-friendly, as they say. In this sense, a Search Results page is like a “dynamic container” that is constructed on-the-fly according to a Query that specifies relevance criteria (metadata) that express the intention of a user/consumer (machine or human) at that particular moment.
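The “search results page as a dynamic container” idea can be sketched as a function that assembles a container on the fly from a metadata query. This is a hypothetical illustration: the content items and field names are made up, and a real search engine would also rank results by relevance rather than simply filtering them.

```python
# Hypothetical content objects, each carrying its own metadata.
ARTICLES = [
    {"id": "a1", "type": "article", "topics": {"climate change", "policy"}, "year": 2009},
    {"id": "v1", "type": "video", "topics": {"climate change"}, "year": 2010},
    {"id": "a2", "type": "article", "topics": {"elections"}, "year": 2010},
]


def build_container(items, query):
    """Assemble a results 'container' on the fly from a metadata query.

    `query` maps a metadata field to the required value; for set-valued
    fields (like topics) the required value must be a member of the set.
    """
    def matches(item):
        for key, wanted in query.items():
            value = item.get(key)
            if isinstance(value, set):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    return [item["id"] for item in items if matches(item)]


print(build_container(ARTICLES, {"topics": "climate change"}))                   # ['a1', 'v1']
print(build_container(ARTICLES, {"topics": "climate change", "type": "video"}))  # ['v1']
```

The container has no fixed membership: change the query (the user’s intention at that moment) and a different container is built from the same self-describing content.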

We also need to be able to learn from the behavior and media consumption patterns of our users, and calibrate our content delivery to their preferences and behaviors.
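One very simple way to “learn” from consumption patterns, offered only as an assumed illustration rather than anything from the book, is to count the topic tags of content a user has already consumed and rank new content by how well its tags match that profile. The tag names and item IDs below are hypothetical.

```python
from collections import Counter


def build_profile(consumed):
    """Count how often each topic tag appears in content the user consumed."""
    profile = Counter()
    for item in consumed:
        profile.update(item["topics"])
    return profile


def rank(candidates, profile):
    """Order candidate content by the summed profile weight of its topics."""
    # Counter returns 0 for unseen tags, so unfamiliar topics score nothing.
    return sorted(candidates,
                  key=lambda item: sum(profile[t] for t in item["topics"]),
                  reverse=True)


history = [{"topics": ["climate", "energy"]}, {"topics": ["climate"]}]
candidates = [
    {"id": "sports-1", "topics": ["hockey"]},
    {"id": "energy-1", "topics": ["energy"]},
    {"id": "climate-1", "topics": ["climate", "policy"]},
]
profile = build_profile(history)
print([c["id"] for c in rank(candidates, profile)])  # ['climate-1', 'energy-1', 'sports-1']
```

This only works because the content is tagged: the behavior signal is expressed in the same metadata vocabulary that describes the content.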

In Summary

Well that’s it really. Just wanted to:

Introduce the notion of “smart” or “intelligent” objects and containers, powered by metadata
Suggest the power of these intelligent objects to transform existing, and enable new, business models, and
Suggest that the Media industries have their own versions of smart objects and containers – their content, and the platforms, products, and delivery channels that showcase their content.

glenn

Categories: Business Transformation, Future of News Media, Metadata, Semantic Web Tags: Business Innovation, Business Transformation, David Siegel, Future of News Media, Metadata, News Media, Pull, Semantic Web, Smart Containers, Smart Objects

Understanding Classification and Taxonomies – Building Enterprise Taxonomies

February 28, 2010 glennas

Currently reading a very nice book on designing Classification systems and Taxonomies titled Building Enterprise Taxonomies, authored by Darin L. Stewart (Director of Web Strategies and Research Information Services for Oregon Health and Science University), published in 2008.

The title of the book, IMO, is a bit of a misnomer. This is not so much a book about designing Taxonomies for Enterprises, as it is an elegant, easy-to-digest framework for classifying knowledge and designing Knowledge Representation, Search, and Discovery environments. To prove the point, here are the chapter titles (with my comments appended):

Findability – on Search and Information Discovery
Metadata – including an overview of Dublin Core
Taxonomy – an overview of Classification systems, and the role of Controlled Vocabularies
Preparations – references what Stewart calls the Taxonomy Development Lifecycle
Terms
Structure – a deeper exploration of the task of Categorization
Ontology – exploring the Semantic Web
Folksonomy – community-generated Classification in an era of the Social Web

Pretty great stuff eh?

Stewart also makes reference to a very nice research article from 1999 by Barbara H. Kwasnik: Role of Classification in Knowledge Representation and Discovery. It’s a nice piece.

All for now.

glenn

Categories: Classification, Metadata Tags: Classification, Darin L. Stewart, Metadata, Taxonomy

What is RDFa? – Mark Birbeck

January 31, 2010 glennas

In a previous post, I referenced an excellent talk that Mark Birbeck gave at Google in 2009, as well as a couple of excellent introductory articles he wrote on RDFa.

I was re-watching Birbeck’s Google TechTalk on RDFa, and really liked his brief explanation of what RDFa actually is. So I thought I’d quote Birbeck from his talk:

I’m using RDFa as a bit of a shorthand, because I’m saying really “embedded metadata”. I’m saying any way of actually putting information into the HTML page, rather than the traditional semantic web approach of having a “separate channel”. By separate channel, I’m saying you might have had an RDF-XML document, or even an RSS feed you could regard as a kind of semantic channel of information. But a channel of information that’s kind of distinct from the web page.

Whereas what we’ve done with RDFa, and what the people behind Microformats were doing, basically the same goal, was actually make the HTML page the carrier of the metadata. And sometimes it’s carrying metadata about other things, and sometimes it’s carrying metadata about itself. So really, when I say RDFa (throughout this talk) I’m generally meaning those kind of solutions that allow you to embed metadata.

The reason I’m favoring RDFa is because its very specific goal was to align itself with RDF, so it’s actually much more precise than Microformats, but the idea is the same that you embed information [in the HTML page].

So that’s the purpose of RDFa according to Birbeck. As far as what RDFa actually is:

As for what it is, it’s a W3C standard now. It’s something we’ve been working on for four or so years – which I guess is quick for the W3C, we’ve been working on it for quite a long time, and it recently became a standard.

And it’s very much about defining the syntax of how you embed information. It’s not really about saying what the vocabularies should be. Whereas Microformats is very much more about the vocabularies.

And a good example of the flexibility of what that brings is when Google did its Rich Snippets, it just came out with its own vocabulary. It got a lot of stick for it from the Semantic Web community, or some there. But the point is that you were able to just come out with your own vocabulary, because RDFa is about the syntax and the structure, rather than the actual terms.

So it’s very much in the spirit of the Web in the sense that it allows people to define their own vocabularies or reuse existing vocabularies, and put them into their documents however they see fit.

So RDFa is a standard, and its goal is embedding metadata in pages.

That’s a very nice explanation, I must say. Please view the entirety of Birbeck’s talk for deeper insight into the mechanics of RDFa.
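To illustrate Birbeck’s point that the HTML page itself becomes the carrier of the metadata, here is a deliberately simplified sketch using Python’s standard `html.parser`. It handles only a small subset of RDFa (`about`, `property`, `content`, and element text) and skips prefix/CURIE expansion, `rel`, `resource`, `typeof`, and datatypes; a real application should use a conforming RDFa processor. The listing URI and the `dc:`/`ex:` property names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class SimpleRDFaExtractor(HTMLParser):
    """Collect (subject, property, value) triples from a small RDFa subset.

    The subject is the nearest enclosing `about` (or `base`); the value is
    the `content` attribute if present, otherwise the element's text.
    Assumes every non-void element is properly closed.
    """

    def __init__(self, base: str = ""):
        super().__init__()
        # One frame per open element: (subject, property collecting text or None, text buffer)
        self.stack = [(base, None, [])]
        self.triples = []

    def _open(self, attrs):
        a = dict(attrs)
        subject = a.get("about") or self.stack[-1][0]
        prop = a.get("property")
        if prop is not None and a.get("content") is not None:
            self.triples.append((subject, prop, a["content"]))
            prop = None  # value already captured; don't collect text
        return subject, prop

    def handle_starttag(self, tag, attrs):
        subject, prop = self._open(attrs)
        if tag not in VOID:
            self.stack.append((subject, prop, []))

    def handle_startendtag(self, tag, attrs):
        self._open(attrs)  # self-closing tag: nothing stays open

    def handle_data(self, data):
        # Text counts toward every open element that is collecting a value.
        for _, prop, buf in self.stack:
            if prop is not None:
                buf.append(data)

    def handle_endtag(self, tag):
        if tag in VOID or len(self.stack) == 1:
            return  # ignore stray end tags; never pop the base frame
        subject, prop, buf = self.stack.pop()
        if prop is not None:
            self.triples.append((subject, prop, "".join(buf).strip()))


page = """
<div about="http://example.org/listing/42">
  <h1 property="dc:title">Joe's Diner</h1>
  <span property="dc:date" content="2010-01-31">Jan 31</span>
  <meta property="ex:rating" content="4.5">
</div>
"""
extractor = SimpleRDFaExtractor()
extractor.feed(page)
extractor.close()
for triple in extractor.triples:
    print(triple)
```

Note how the same page serves human readers (“Jan 31”) and machines (`2010-01-31`), which is the separation of syntax from vocabulary that Birbeck describes: the extractor neither knows nor cares what `dc:` or `ex:` mean.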

glenn

Categories: Semantic Markup, Semantic Web Tags: Google TechTalk, Mark Birbeck, Metadata, RDFa, Semantic Markup, Semantic Web

Dublin Core Metadata Initiative (DCMI) – Learning Resources

January 31, 2010 glennas

A nice set of learning resources for the Dublin Core Metadata Initiative at the DCMI’s Metadata Training Resources page. In particular, there’s a series of links to presentations delivered by Makx Dekkers and Thomas Baker in December 2009 in Florence, Italy. For ease of access, here are the links:

History, objectives and approaches of the Dublin Core Metadata Initiative – Makx Dekkers
DCMI and the metadata landscape – Makx Dekkers
Basics of Dublin Core Metadata – Thomas Baker
Data Integration and Structured Search – Thomas Baker
The “metadata record” and DCMI Abstract Model – Thomas Baker
Web-enabled vocabularies – Thomas Baker
Linking legacy data – Thomas Baker
Outcomes of DC-2009 – Makx Dekkers

Just reading the Basics of Dublin Core Metadata presentation now, and for someone who’s relatively new to Dublin Core, it’s both fascinating and very well presented. Just a couple of quick slide visuals to illustrate. First, the Dublin Core vocabulary circa 2000:

A very nice, clean, well-factored representation. And then there’s the important migration from 2003-2007 of the Dublin Core to RDF and the Semantic Web:

And then there are the structured search scenarios that Dublin Core seeks to enable today:

And so the story progresses. Lots of semantic gold in them thar presentations. 🙂
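As a small, hypothetical illustration of what the move to RDF means in practice: a flat Dublin Core record becomes a set of statements whose properties are Web URIs. The sketch below uses the original fifteen-element namespace (`http://purl.org/dc/elements/1.1/`), treats every value as a plain string literal, and escapes only the characters most likely to appear. The resource URI is made up for the example.

```python
DC = "http://purl.org/dc/elements/1.1/"  # the original 15-element Dublin Core namespace


def escape_literal(value: str) -> str:
    # N-Triples string escaping for backslash, quote, and line breaks.
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def dc_record_to_ntriples(resource_uri: str, record: dict) -> str:
    """Serialize a flat Dublin Core record (element name -> string) as N-Triples."""
    return "\n".join(
        f'<{resource_uri}> <{DC}{element}> "{escape_literal(value)}" .'
        for element, value in record.items()
    )


record = {
    "title": "Basics of Dublin Core Metadata",
    "creator": "Thomas Baker",
    "date": "2009-12",
}
print(dc_record_to_ntriples("http://example.org/slides/dc-basics", record))
```

Once each property is a URI, records from different systems can be merged and queried together, which is what makes the structured search scenarios possible.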

glenn

Categories: Metadata, Semantic Web Tags: Dublin Core, Metadata, Semantic Web

Enterprise Metadata – thoughts

December 13, 2009 glennas

The company I work for is about to embark on an Enterprise Metadata initiative. So I thought I’d write an introductory blog post on what metadata is, and how I think of metadata within the Enterprise. So here goes …

What is metadata?

So here’s the Wikipedia page on metadata. Blah, blah, blah. To me, saying metadata is “data about data” is about as useful as saying information is information about information. I mean, it’s a bit recursive, don’t you think?

I view metadata as descriptive information about a “thing”, where the “thing” is anything that can be represented as a concept. Depending on the context, this “thing” could be a person, a topic, a piece of content, an ad, a real-world entity like a truck or a house, or even an abstract concept like “love” or “beauty”. Any piece of information, or “semantics”, describing the underlying entity can be viewed as metadata.

The descriptive information can be information about the thing itself (for example the name and address of a business), or the relational context of the thing (for example, as we will see below, related content or persons associated with that thing).

What is “enterprise” metadata?

Enterprise metadata, therefore, is information that describes the core concepts within an enterprise (customers, ads, content, business units, you name it) and the web of things that are related to them.

Representing Metadata

So how is metadata represented in information systems? Well, it can be represented in many ways. It can be represented as fields in a table (as in a relational database), as tags, as categories, as Properties associated with an Object in code, or even hard-coded into a variable (bad, bad programming!). Metadata can be represented as “structured” data (as in a relational database or markup language), or as “semi-structured” or “unstructured” data (as in a Word document).

Representing metadata as a Graph

However, the most powerful way of representing metadata that is highly relational and subject to change is with a graph. Here, we don’t mean “graph” in the sense of a visual representation of data in Excel, but rather in the mathematical sense of the term: a network of nodes and links (or edges). In social networking, this is how people are related to one another … in terms of a Social Graph.

With graph-based data representations, there’s essentially no difference between data and metadata. What is viewed as data from one perspective, can be viewed as metadata from another perspective. For more on this topic, see my previous posts here and here.
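A tiny sketch of why the data/metadata distinction dissolves in a graph, assuming a store of (subject, predicate, object) edges with hypothetical identifiers. The review is data when you look at the review itself, and metadata when you look at the listing it is about.

```python
# A tiny triple store: every fact is a (subject, predicate, object) edge.
# All identifiers are hypothetical.
TRIPLES = [
    ("listing:joes-diner", "name", "Joe's Diner"),
    ("listing:joes-diner", "locatedIn", "place:downtown"),
    ("review:17", "about", "listing:joes-diner"),
    ("review:17", "writtenBy", "person:ann"),
    ("person:ann", "follows", "listing:joes-diner"),
]


def outgoing(node):
    """Facts that describe `node` directly (its metadata, from its perspective)."""
    return [(p, o) for s, p, o in TRIPLES if s == node]


def incoming(node):
    """Other nodes that point at `node`: its relational context."""
    return [(s, p) for s, p, o in TRIPLES if o == node]


print(outgoing("listing:joes-diner"))  # the listing's own descriptive data
print(incoming("listing:joes-diner"))  # review:17 and person:ann describe it too
print(outgoing("review:17"))           # ...and review:17 has metadata of its own
```

Nothing in the store marks any edge as “data” or “metadata”; the label depends only on which node you start from.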

Listings Metadata – an example

With this introduction, let’s consider what metadata might be associated with a Business Listing. So traditionally, when one thinks of a Business Listing, one might think of something that looks like this:

In the example above, you would probably say that the metadata associated with this Listing is the name of the business, the location of the business, the phone number, etc.

However, what if the content associated with a business listing was displayed on an entire page, like this:

Here we see the entire page is full of metadata – or associated content – about the business listing. We have the business listing itself, but we also have all sorts of additional information/content associated with the listing: comments, editorial reviews, a map showing where the business is located, perhaps pricing information about the business’ products, and even a video supplied by the proprietor. We may also have descriptive tags associated with the listing or business, as well as people who “subscribe” or “follow” the listing, and want to be notified of updates to the listing.

Now the listing is less a block of descriptive data, as in the initial example, and more like a “concept”, with all sorts of associated content and metadata: some basic textual information, a whack of related content, and even the social context of community members who are interested in the listing or who have contributed content.

This “web of related content” can be represented as a graph (as discussed above), in which the listing is a node linked to each related piece of content, person, and tag.
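Once the listing is a node in a graph, “everything related to this listing” becomes a graph traversal. The sketch below is a hypothetical breadth-first walk that collects every node within a given number of hops; the node names are made up, and treating links as undirected is an assumption for the example.

```python
from collections import deque

# Hypothetical links between a listing and its related content and people.
EDGES = [
    ("listing:42", "review:1"),
    ("listing:42", "video:7"),
    ("listing:42", "map:42"),
    ("review:1", "person:ann"),
    ("person:ann", "listing:99"),
]


def related(start, max_hops):
    """Breadth-first walk of the content graph, treating links as undirected.

    Returns {node: hop distance} for every node within max_hops of start.
    """
    neighbours = {}
    for a, b in EDGES:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # don't expand beyond the hop limit
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


print(related("listing:42", 1))  # the listing's immediate content
print(related("listing:42", 2))  # adds the reviewer, i.e. social context
```

The hop limit is the design knob: one hop gives the listing’s own content, two hops pull in the people around it, and further hops reach other listings those people care about.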

In Summary …

And that is really how I view metadata. It’s the immediate information that describes or characterizes a “thing”. But it’s also the web of contextual information that is associated with the thing – related content, user-generated content, social context, and so forth.

Thoughts? Comments?

glenn

Categories: Metadata Tags: Graph-based Data Representation, Metadata

Semantics (and Metadata) at the New York Times

October 17, 2009 glennas

***** Nov 10 2009 Update:
I have uploaded a summary doc of the NY Times presentation. Please click the following link to access: Semantics at The New York Times – notes – SemTech 2009
*****

Yet another great presentation from the SemTech 2009 conference this past June in San Jose. This presentation is on Semantics at the New York Times.

Here is a slide presentation that the New York Times delivered at a different conference, but it’s very similar to the one delivered at SemTech.

The (Long) History of Metadata at the New York Times

The presentation starts out exploring the history of metadata at the New York Times, from the beginnings of its Morgue archive, which was created at the newspaper’s inception in, if you can believe it, 1851. The so-called Morgue was not a collection of corpses (thank goodness), but rather a collection of newspaper clippings and photos.

No subject was too big or small to be indexed in the Morgue. As the Times VP of Digital Production Rob Larson states in the presentation, in 1907 the Times’ Managing Editor Carr Van Anda invested in the Morgue to add staff and rigor of organization to the files, and a Tagging system grew up around this effort.

At its zenith a few decades ago, the Morgue had a staff of 24 people, creating 600 new clip folders per week, cutting up 36 editions of the final New York City edition of the Times, as well as copies of other prominent newspapers.

Within its main operation on the third floor, there were more than 4,000 cabinet drawers of newspaper clippings, containing 1,126,000 named individuals (including animals, etc), 65,000 subject headings, 300,000 ships and planes, 500,000 places, and 500,000 corporations. (Wow!)

The Morgue is only one form of tagging system used at the Times – others include the New York Times Index and the NYTimes.com website.

So what is the Tagging workflow at the New York Times?

Here are a few slides from the presentation. The first slide depicts the tagging workflow at the New York Times, and which roles apply metadata at which step in the workflow.

Tagging at the NY Times

This visual, however, oversimplifies how metadata is actually applied in the editorial workflow. Here’s a very-hard-to-read workflow diagram of the stages at which metadata is applied at the NY Times, which suggests the overall complexity of the end-to-end workflow for both Print and Online channels.
Tagging Workflow at the Times

Why Tag?

Another core visual is shown below, which summarizes the motivation for tagging – that is the various use cases for metadata-tagged content at the Times.

Tagging - Use Cases

Rob Larson specifically addresses the importance of metadata for generating NY Times Topic Pages, 4 examples of which are provided below:

Topic Pages - NY Times
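Topic pages like these can, in principle, be generated directly from tags by inverting the article-to-tag relationship. The sketch below is a hypothetical illustration of that idea, not the Times’ actual system; the article IDs, dates, and tags are made up.

```python
from collections import defaultdict

# Hypothetical tagged articles: (id, publication date as ISO string, tags).
TAGGED_ARTICLES = [
    ("art-1", "2009-06-01", {"Climate Change", "Energy"}),
    ("art-2", "2009-06-15", {"Climate Change"}),
    ("art-3", "2009-06-10", {"Baseball"}),
]


def build_topic_pages(articles):
    """Invert article tags into topic pages, each listing article ids newest first."""
    pages = defaultdict(list)
    for art_id, date, tags in articles:
        for tag in tags:
            pages[tag].append((date, art_id))
    # ISO dates sort correctly as strings, so reverse order means newest first.
    return {tag: [art_id for _, art_id in sorted(entries, reverse=True)]
            for tag, entries in pages.items()}


pages = build_topic_pages(TAGGED_ARTICLES)
print(pages["Climate Change"])  # ['art-2', 'art-1']
```

Because the page is derived from the tags, a newly tagged story shows up on every relevant topic page without anyone placing it there by hand.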

The Future

Next the presenters address the future of metadata (and now the talk turns more to “semantics”) at the NY Times.

What near-term plans does the Times have for evolving their metadata management practice? See the slide below:

Metadata Opportunities

Next up, the presenters discuss the New York Times’ various Open Data initiatives, and the APIs the Times is making available to the public to access and build applications on top of its data.

New York Times and Linked Data

Finally, the New York Times announced at SemTech the next phase of their Open Data strategy, which is to prepare their Corpus to be exposed to the Linked Data Cloud.
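One common Linked Data technique for connecting a corpus to the wider cloud is to publish `owl:sameAs` links from your own identifiers to equivalent external ones. The sketch below assumes that technique; the presentation does not say which mechanism the Times will use. The in-house URIs are hypothetical, and the DBpedia URIs are illustrative targets only.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical in-house subject URIs mapped to assumed external equivalents.
LINKS = {
    "http://data.example.com/subject/climate-change":
        "http://dbpedia.org/resource/Climate_change",
    "http://data.example.com/subject/new-york-city":
        "http://dbpedia.org/resource/New_York_City",
}


def same_as_triples(links):
    """One N-Triples owl:sameAs statement per (in-house URI, external URI) pair."""
    return [f"<{local}> <{OWL_SAME_AS}> <{external}> ."
            for local, external in links.items()]


for line in same_as_triples(LINKS):
    print(line)
```

The links are what turn a private subject index into part of the Linked Data cloud: anyone who already knows the external URI can follow it back to the publisher’s content.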

Interesting stuff.

glenn

Categories: Future of Newspapers, Semantic Web Tags: Future of Newspapers, Metadata, New York Times, News Media, Open Linked Data, Semantic Web

"People, Places, Subjects" – BBC Topic and Guardian keyword pages

September 17, 2009 glennas

More great content from the Guardian’s Information Architect Martin Belam. In this series of posts, he explores the metadata and taxonomy strategies at the BBC and Guardian. Here are the posts:

Here’s the BBC’s presentation of a Topic – in this case, Climate Change. Note how videos, news, and blogs are aggregated for a particular topic. At first glance, the page has a nice look and feel. But the approach is somewhat brittle. The stories under the Topics don’t seem to be aggregated search results, but rather stories placed into the Topics as “one-off” stories.

By contrast, here is the Guardian’s Climate Change topic page. Note that the Guardian has also implemented a Taxonomy around its topics, with Climate Change being a sub-topic of the broader Environment topic.
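A taxonomy like the Guardian’s can be sketched as a “broader topic” mapping, which lets a page for a broad topic roll up content tagged with any of its narrower topics. This is a hypothetical illustration: only the Climate Change to Environment relationship comes from the Guardian example; “Carbon emissions” is made up.

```python
# Hypothetical slice of a topic taxonomy: topic -> broader (parent) topic.
BROADER = {
    "Environment": None,
    "Climate change": "Environment",
    "Carbon emissions": "Climate change",
}


def path_to_root(topic):
    """Return the chain from a topic up to the root of the taxonomy."""
    path = []
    while topic is not None:
        if topic in path:
            raise ValueError(f"cycle in taxonomy at {topic}")
        path.append(topic)
        topic = BROADER[topic]
    return path


def narrower(topic):
    """Direct sub-topics of `topic`, sorted alphabetically."""
    return sorted(child for child, parent in BROADER.items() if parent == topic)


print(path_to_root("Carbon emissions"))  # ['Carbon emissions', 'Climate change', 'Environment']
print(narrower("Environment"))           # ['Climate change']
```

With the hierarchy explicit, an article tagged only “Carbon emissions” can still appear on the Environment page, something a set of hand-placed, one-off topic stories cannot do.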

*** Update #1 ***
Kind of interesting exploring this whole “Topic” theme. Here’s the New York Times Topics page, which looks to be a pretty standard “sections” based approach that a Newspaper might be expected to take.

OTOH, here are all the topic pages about people, places, organizations, and subjects that start with the letter “A”. Clearly, there’s some keyword indexing going on here … although I’m not crazy about the presentation.
****************

More later,
glenn

Categories: Classification, Digital Design, Information Architecture, Metadata Tags: Digital Design, Information Architecture, Martin Belam, Metadata, The Guardian
