A very informative and entertaining presentation by Scott Davis of ThirstyHead on using RDFa and Microformats to build services that are OF the Web, not just ON the Web. Presentation slides can be found here.
I’m currently reading Pull: The Power of the Semantic Web to Transform Your Business, by David Siegel, published in December 2009. It is a business-oriented book on the power of the Semantic Web to enable new business models.
I’ve read several excellent technology-focused books on the Semantic Web, including Semantic Web Programming and Programming the Semantic Web. But this is the first book I’ve seen that specifically looks at the Semantic Web, and structured Metadata, from the vantage point of enabling Business Transformation and the development of new Business Models.
BTW, David Siegel recently delivered the keynote at SemTech 2010.
Metadata enables Smart Objects
I’m currently about 30 pages into the 250-ish page book, but several key messages have already been presented. The one that most immediately grabbed my attention is the notion of “smart objects”. [BTW, Siegel doesn’t explicitly use this term (at least he hasn’t thus far in the book), but it is a notion that underlies much of his message.]
The key idea here is that objects – products and content – have a unique ID and associated metadata, such that they effectively “know about themselves”. They know what their meaning is and how to describe themselves, they know where they’ve been, they know what state they are in, and so forth. Obviously, these are not exactly the objects as we encounter them in the everyday world. Rather, we are speaking about an electronic representation of the object that has some smarts associated with it.
In the first two chapters, Siegel provides several examples from the Shipping and Retail industries. In the shipping industry, he talks about smart “packages”. Quoting Siegel:
Using a new universal tracking number and open standards for messaging creates package-level autonomy: the package itself will send a message to the customer to make sure he or she is there to receive it.
The basic idea here is that an electronic representation of a package exists that is basically “smart”. It knows about itself, and it can respond to events that are of interest to it. In addition, the “vocabulary” that describes the data and events associated with packages is formalized as an industry standard, so that packages can easily cross process boundaries between companies that operate in different parts of the supply chain. This is the “package-level autonomy” that Siegel mentions above.
Another example is “smart products” in Retail. Here, Siegel provides examples of a Smart Cart and Smart Products. The Smart Cart knows what products you’ve put into it, and can take actions on the items currently in the cart – whether it’s adding up the total, applying coupons, or providing information at checkout.
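To make the “smart package” idea a little more concrete, here is a minimal Python sketch. It is only my own illustration, not anything from Siegel’s book: the class name, the event names, and the tracking number format are all made up, and a real system would use an agreed industry vocabulary rather than ad hoc strings.

```python
from dataclasses import dataclass, field


@dataclass
class SmartPackage:
    """Electronic stand-in for a physical package (all names are hypothetical)."""
    tracking_id: str                              # the "universal tracking number"
    recipient: str
    state: str = "created"                        # what state the package is in
    history: list = field(default_factory=list)   # where the package has been
    outbox: list = field(default_factory=list)    # messages the package has "sent"

    def handle(self, event: str, location: str) -> None:
        """Record an event and react to the ones this package cares about."""
        self.history.append((event, location))
        self.state = event
        if event == "out_for_delivery":
            # The package itself asks the customer to be there to receive it.
            self.outbox.append(
                f"To {self.recipient}: package {self.tracking_id} arrives today. "
                "Will someone be there to receive it?"
            )


pkg = SmartPackage(tracking_id="PKG-0001", recipient="alice@example.com")
pkg.handle("picked_up", "Portland")
pkg.handle("out_for_delivery", "Seattle")
print(pkg.state)
print(pkg.outbox[0])
```

The point of the sketch is that the ID, the state, the history, and the reaction to events all travel with the object itself, rather than living in one carrier’s private system.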
Smart Products are tagged with bar codes and RFID codes as universal identifiers and tracking tags, whereby scanning these tags can provide product description information, competitive pricing information, and can be used to track the transport of product across various stages of its production and delivery lifecycle.
Smart Containers manage and leverage Smart Objects
It’s not, however, just the objects themselves that are smart. It is also the “containers” of these objects – whether the container is a Shopping Cart, a Shipment, a Carton of items, a Pallet of goods, or a Truck. Smart Containers know precisely the nature of the goods or products they contain, and what state they are in.
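Here is a rough sketch of a smart container, using the Smart Cart from the Retail example. Again, this is just an illustration under my own assumptions: the `Product` and `SmartCart` names, the idea of keying coupons by product ID, and the prices are all invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Product:
    product_id: str      # stands in for a bar code or RFID tag value (hypothetical)
    name: str
    price_cents: int     # integer cents avoid floating-point rounding


@dataclass
class SmartCart:
    items: list = field(default_factory=list)

    def add(self, product: Product) -> None:
        self.items.append(product)

    def manifest(self) -> list:
        """The cart knows exactly what it contains."""
        return [p.name for p in self.items]

    def total_cents(self, coupons: Optional[dict] = None) -> int:
        """Sum item prices, subtracting any coupon value keyed by product_id."""
        coupons = coupons or {}
        return sum(
            max(p.price_cents - coupons.get(p.product_id, 0), 0)
            for p in self.items
        )


cart = SmartCart()
cart.add(Product("0001", "Milk", 349))
cart.add(Product("0002", "Bread", 250))
print(cart.manifest())
print(cart.total_cents())
print(cart.total_cents(coupons={"0001": 50}))
```

Because each product carries its own identifier and price, the container can act on its contents without asking any other system what is inside it.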
Applying the “Smart Object” concept to the Media Industry
Can these concepts be applied to the media industry? I think they can. The media industry has its own version of objects and containers of those objects. Our objects are most importantly Content – Articles, Photos, Videos, etc., and the discussions and conversations around that content. Our packages are the containers for this content – Publications, Websites, Web pages, and Stories that aggregate multiple types of content.
So, like the smart objects above, our objects – our Content – need to be “smart” or “intelligent”. They need to know about themselves; they need to be self-describing. And our “containers” and media products need to be able to take advantage of that intelligence – to understand what content is most relevant to our audiences, and make sure that content is available and discoverable by our users when they want it, where they want it, and in the form they want to consume it.
On the web, of course, one of the most important rationales for “smart content” is to make it easily discoverable by Search Engines – SEO-friendly, as they say. In this sense, a Search Results page is like a “dynamic container” that is constructed on-the-fly according to a Query that specifies relevance criteria (metadata) that express the intention of a user/consumer (machine or human) at that particular moment.
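The “dynamic container” idea can be sketched in a few lines of Python. This is a toy filter over invented content records, not how any real search engine works: the field names (`type`, `topics`, `year`) and the `build_container` function are my own assumptions.

```python
# Hypothetical content items, each carrying its own descriptive metadata.
content = [
    {"id": "a1", "type": "Article", "topics": {"semantic web", "rdfa"}, "year": 2010},
    {"id": "v1", "type": "Video", "topics": {"rdfa"}, "year": 2009},
    {"id": "p1", "type": "Photo", "topics": {"shipping"}, "year": 2010},
]


def build_container(items, topic, content_type=None):
    """Assemble a container on the fly from items whose metadata matches the query."""
    return [
        item["id"]
        for item in items
        if topic in item["topics"]
        and (content_type is None or item["type"] == content_type)
    ]


print(build_container(content, "rdfa"))
print(build_container(content, "rdfa", content_type="Video"))
```

The container does not exist until the query arrives. It is the metadata on each item that lets it be assembled at that moment.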
We also need to be able to learn from the behavior and media consumption patterns of our users, and to calibrate our content delivery to their preferences and behaviors.
Well that’s it really. Just wanted to:
- Introduce the notion of “smart” or “intelligent” objects and containers, powered by metadata
- Suggest the power of these intelligent objects to transform existing, and enable new, business models, and
- Suggest that the Media industries have their own versions of smart objects and containers – their content, and the platforms, products, and delivery channels that showcase their content.
Currently reading a very nice book on designing Classification systems and Taxonomies titled Building Enterprise Taxonomies, authored by Darin L. Stewart (Director of Web Strategies and Research Information Services for Oregon Health and Science University), published in 2008.
The title of the book, IMO, is a bit of a misnomer. This is not so much a book about designing Taxonomies for Enterprises, as it is an elegant, easy-to-digest framework for classifying knowledge and designing Knowledge Representation, Search, and Discovery environments. To prove the point, here are the chapter titles (with my comments appended):
- Findability – on Search and Information Discovery
- Metadata – including an overview of Dublin Core
- Taxonomy – an overview of Classification systems, and the role of Controlled Vocabularies
- Preparations – references what Stewart calls the Taxonomy Development Lifecycle
- Structure – a deeper exploration of the task of Categorization
- Ontology – exploring the Semantic Web
- Folksonomy – community-generated Classification in an era of the Social Web
Pretty great stuff eh?
Stewart also makes reference to a very nice research article from 1999 by Barbara H. Kwasnik: Role of Classification in Knowledge Representation and Discovery. It’s a nice piece.
All for now.
I was re-viewing Birbeck’s Google TechTalk on RDFa, and really liked his brief explanation about what RDFa actually is. So thought I’d quote Birbeck from his talk:
I’m using RDFa as a bit of a shorthand, because I’m saying really “embedded metadata”. I’m saying any way of actually putting information into the HTML page, rather than the traditional semantic web approach of having a “separate channel”. By separate channel, I’m saying you might have had an RDF-XML document, or even an RSS feed you could regard as a kind of semantic channel of information. But a channel of information that’s kind of distinct from the web page.
Whereas what we’ve done with RDFa, and what the people behind Microformats were doing, basically the same goal, was actually make the HTML page the carrier of the metadata. And some times it’s carrying metadata about other things, and sometimes it’s carrying metadata about itself. So really, when I say RDFa (throughout this talk) I’m generally meaning those kind of solutions that allow you to embed metadata.
The reason I’m favoring RDFa is because its very specific goal was to align itself with RDF, so it’s actually much more precise than Microformats, but the idea is the same that you embed information [in the HTML page].
So that’s the purpose of RDFa according to Birbeck. As far as what RDFa actually is:
As for what it is, it’s a W3C standard now. It’s something we’ve been working on for four or so years – which I guess is quick for the W3C, we’ve been working on it for quite a long time, and it recently became a standard.
And it’s very much about defining the syntax of how you embed information. It’s not really about saying what the vocabularies should be. Whereas Microformats is very much more about the vocabularies.
And a good example of the flexibility of what that brings is when Google did its Rich Snippets, it just came out with its own vocabulary. It got a lot of stick for it from the Semantic Web community, or some there. But the point is that you were able to just come out with your own vocabulary, because RDFa is about the syntax and the structure, rather than the actual terms.
So it’s very much in the spirit of the Web in the sense that it allows people to define their own vocabularies or reuse existing vocabularies, and put them into their documents however they see fit.
So RDFa is a standard, and its goal is embedding metadata in pages.
That’s a very nice explanation I must say. Please view the entirety of Birbeck’s talk for deeper insight into the mechanics of RDFa.
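To illustrate what “making the HTML page the carrier of the metadata” looks like in practice, here is a small sketch using Python’s standard `html.parser` module. It is emphatically not a conforming RDFa processor: it only collects `property` attributes and their values, and it stops capturing text at the first closing tag, so nested markup is not handled. The sample page, the author name, and the date are placeholders I made up, and the vocabulary happens to be Dublin Core only because RDFa lets you pick whichever vocabulary you like.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the metadata is embedded in the HTML itself.
PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="/posts/smart-objects">
  <h1 property="dc:title">Metadata enables Smart Objects</h1>
  <span property="dc:creator">A. Blogger</span>
  <meta property="dc:date" content="2010-07-01" />
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect property/value pairs (a simplification, not full RDFa)."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._pending = None   # property whose text content is being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # Value given explicitly in the content attribute.
            self.found[prop] = attrs["content"]
        else:
            # Value is the element's text; capture it in handle_data.
            self._pending = prop
            self.found[prop] = ""

    def handle_data(self, data):
        if self._pending is not None:
            self.found[self._pending] += data

    def handle_endtag(self, tag):
        self._pending = None


parser = PropertyExtractor()
parser.feed(PAGE)
for prop, value in parser.found.items():
    print(prop, "=", value)
```

There is no separate channel here: the same markup that a browser renders for a human reader is what the program reads for its metadata.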
A nice set of learning resources for the Dublin Core Metadata Initiative at the DCMI’s Metadata Training Resources page. In particular, there’s a series of links to presentations delivered by Makx Dekkers and Thomas Baker in December 2009 in Florence, Italy. For ease of access, here are the links:
- History, objectives and approaches of the Dublin Core Metadata Initiative – Makx Dekkers
- DCMI and the metadata landscape – Makx Dekkers
- Basics of Dublin Core Metadata – Thomas Baker
- Data Integration and Structured Search – Thomas Baker
- The “metadata record” and DCMI Abstract Model – Thomas Baker
- Web-enabled vocabularies – Thomas Baker
- Linking legacy data – Thomas Baker
- Outcomes of DC-2009 – Makx Dekkers
Just reading the Basics of Dublin Core Metadata presentation now, and for someone who’s relatively new to Dublin Core, it’s both fascinating and very well presented. Just a couple quick slide visuals to illustrate. First, the Dublin Core vocabulary circa 2000:
A very nice, clean, well-factored representation. And then there’s the important migration from 2003-2007 of the Dublin Core to RDF and the Semantic Web:
And then there are the structured search scenarios that Dublin Core seeks to enable today:
And so the story progresses. Lots of semantic gold in them thar presentations. 🙂
The company I work for is about to embark on an Enterprise Metadata initiative. So I thought I’d write an introductory blog post on what metadata is, and how I think of metadata within the Enterprise. So here goes …
What is metadata?
So here’s the Wikipedia page on metadata. Blah, blah, blah. To me, saying metadata is “data about data” is about as useful as saying information is information about information. I mean, it’s a bit recursive, don’t you think?
I view metadata as descriptive information about a “thing”, where the “thing” is anything that can be represented as a concept. Depending on the context, this “thing” could be a person, a topic, a piece of content, an ad, a real-world entity like a truck or a house, or even an abstract concept like “love” or “beauty”. Any piece of information, or “semantics”, describing the underlying entity can be viewed as metadata.
The descriptive information can be information about the thing itself (for example the name and address of a business), or the relational context of the thing (for example, as we will see below, related content or persons associated with that thing).
What is “enterprise” metadata?
Enterprise metadata, therefore, is descriptive information that describes the core concepts within an enterprise – customers, ads, content, business units, you name it – and the web of things that are related to them.
So how is metadata represented in information systems? Well, it can be represented in many ways. It can be represented as fields in a table (i.e. a relational database), as tags, as categories, or as Properties associated with an Object in code, or even associated with a variable baked into the code (bad, bad programming!). Metadata can be represented as “structured” data (as in a relational database or markup language), or “semi-structured” or “unstructured” data (as in a Word document).
Representing metadata as a Graph
However, the most powerful way of representing metadata that is highly-relational and subject to change is with a graph. Here, we don’t mean “graph” in the sense of a visual representation of data in Excel. But rather in the mathematical sense of the term, as a network of nodes and links (or edges). In social networking, this is how people are related together … in terms of a Social Graph.
With graph-based data representations, there’s essentially no difference between data and metadata. What is viewed as data from one perspective, can be viewed as metadata from another perspective. For more on this topic, see my previous posts here and here.
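A tiny Python sketch may help show why data and metadata blur together in a graph. The node names and the subject/predicate/object layout below are my own illustrative assumptions (loosely in the spirit of RDF triples), not a particular product or standard.

```python
# Each edge is a (subject, predicate, object) triple. All identifiers are invented.
triples = [
    ("listing:42", "name", "Joe's Diner"),
    ("listing:42", "located_in", "city:portland"),
    ("city:portland", "name", "Portland"),
    ("review:7", "about", "listing:42"),
    ("review:7", "written_by", "person:ann"),
]


def describe(node):
    """Outgoing edges: what is said about this node (node as 'data')."""
    return [(p, o) for s, p, o in triples if s == node]


def mentions(node):
    """Incoming edges: where this node describes something else (node as 'metadata')."""
    return [(s, p) for s, p, o in triples if o == node]


print(describe("city:portland"))
print(mentions("city:portland"))
```

Seen from its own perspective, `city:portland` is a thing with a name. Seen from the listing’s perspective, it is a piece of metadata about where the listing is located. Same node, two roles.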
Listings Metadata – an example
With this introduction, let’s consider what metadata might be associated with a Business Listing. So traditionally, when one thinks of a Business Listing, one might think of something that looks like this:
In this example above, you would probably say that the metadata associated with this Listing is the name of the business, the location of the business, phone #, etc.
However, what if the content associated with a business listing was displayed on an entire page, like this:
Here we see the entire page is full of metadata – or associated content – about the business listing. We have the business listing itself, but we also have all sorts of additional information/content associated with the listing: comments, editorial reviews, a map showing where the business is located, perhaps pricing information about the business’ products, and even a video supplied by the proprietor. We may also have descriptive tags associated with the listing or business, as well as people who “subscribe” or “follow” the listing, and want to be notified of updates to the listing.
Now the listing is less the descriptive data associated with the physical image of the listing provided in the initial example, but more like a “concept”, with all sorts of associated content and metadata – some basic textual information, and a whack of related content and even the social context of community members who might be interested in the listing, or who have contributed content.
This “web of related content” associated with the listing can be represented as a “graph” (as discussed above), which forms a “web” of related objects of content associated with the listing.
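As a rough sketch, the “web of related content” around a listing could be walked like this. The node names and the `related` helper are hypothetical, and the breadth-first walk is simply one reasonable way to gather everything within a given number of hops.

```python
from collections import deque

# Hypothetical adjacency list: each node points at the content related to it.
edges = {
    "listing": ["map", "review", "video", "tag:pizza", "follower:ann"],
    "review": ["author:bob", "comment"],
    "comment": ["author:ann"],
}


def related(start, max_hops):
    """Breadth-first walk returning every node within max_hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, hops + 1))
    return found


print(related("listing", 1))
print(related("listing", 2))
```

One hop out gives the immediately associated content. Two hops out starts to pull in the social context: who wrote the review, and the comments attached to it.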
In Summary …
And that is really how I view metadata. It’s the immediate information that describes or characterizes a “thing”. But it’s also the web of contextual information that is associated with the thing – related content, user-generated content, social context, and so forth.