In a previous blog post, I made the comment:
I’ll be posting further on the Semantic Web in the coming weeks, and I’ll explore both how graph-like data representation differs from traditional relational modeling, and the benefits such a representation provides over more traditional data modeling approaches.
This post briefly elaborates on this topic by exploring two examples of post-relational data representations.
Key-Value Data Stores
ReadWriteWeb published an interesting article in February 2009 titled “Is the Relational Database Doomed?” If I understand it correctly, the core issue is storing and retrieving vast numbers of items by a key – for example, documents on the Web.
This data management strategy is the norm where indexing has to scale massively. The ReadWriteWeb article discusses key-value data stores in the context of Cloud Computing.
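To make the key-value idea concrete, here is a minimal Python sketch. It is my own illustration, not something taken from the ReadWriteWeb article: the shard count, the example URLs, and the helper names `shard_for`, `put`, and `get` are all assumptions. It shows the one operation such stores are built around (look an item up by its key) plus the usual trick for scaling out, which is to hash the key to pick a partition.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of partitions
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    """Pick a partition from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key: str, value: str) -> None:
    """Store a value under its key in the partition the key hashes to."""
    shards[shard_for(key)][key] = value

def get(key: str, default=None):
    """Fetch a value by key; only one partition is ever consulted."""
    return shards[shard_for(key)].get(key, default)

# Made-up documents, keyed by URL.
put("http://example.org/a", "contents of page a")
put("http://example.org/b", "contents of page b")

page = get("http://example.org/a")
missing = get("http://example.org/zzz")
```

Note what is absent: there are no joins and no schema, only a key and an opaque value. That simplicity is what makes this style of store easy to spread across many machines.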
*** Update 1 (11/09)
There is an interesting comment (comment #2) on the ReadWriteWeb article. Here it is:
There is also a new crop of databases called “graph databases” gaining traction (with a model based on nodes, relationships and properties), one of them being the open-source neo4j (http://neo4j.org).
Using graphs to structure information is very powerful and intuitive.
Exactly! See the section below. Also check out the remaining Comments associated with the ReadWriteWeb article. Fantastic discussion!
*** End Update 1
Graph-like Data Representations
Graph-based data representations (for example, RDF) provide an interesting contrast to traditional relational data modeling approaches, and are critical to the vision of the Semantic Web. Here are some of the key differences between graph-based and relational data representations:
- “Triple” as the key data construct – Graph-like data representations (for example, RDF) treat metadata and data in exactly the same way. Both metadata and data are expressed as a “triple” – a subject-predicate-object relation. The entire graph is nothing but a collection of these “triple” statements.
- Triples are composed to build the Graph – The database concept of a “join” is accomplished through the flexible “composition” of triple statements: the object of one triple can be the subject of another. This appears to be a much more flexible way to “compose” semantic structures dynamically, across multiple disparate data sources with different data representations/semantics.
- Metadata IS Data – Data and Metadata about “concepts” are both expressed in the same manner – as Triple statements. This provides an extremely flexible and scalable way to represent knowledge (i.e. data + semantics), because a new data element or metadata dimension can be added by simply adding another triple to the data store.
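The three points above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the `ex:` names are invented, the helpers `match` and `cities_where_people_work` are hypothetical, and a real RDF toolkit would use full URIs and a proper query language. The only borrowed vocabulary is `rdfs:domain` and `rdfs:range`, which are standard RDF Schema terms.

```python
# A graph is nothing but a set of (subject, predicate, object) triples.
graph = {
    # Data.
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:berlin"),
    # Metadata about the ex:worksFor property, in exactly the same form.
    ("ex:worksFor", "rdfs:domain", "ex:Person"),
    ("ex:worksFor", "rdfs:range", "ex:Company"),
}

def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def cities_where_people_work(graph):
    """A join by composition: the object of one triple is the subject of the next."""
    results = set()
    for person, _, company in match(graph, p="ex:worksFor"):
        for _, _, city in match(graph, s=company, p="ex:locatedIn"):
            results.add((person, city))
    return results

# A new kind of fact needs no schema change, just one more triple.
graph.add(("ex:alice", "ex:speaks", "ex:german"))

work_cities = cities_where_people_work(graph)
works_for_metadata = match(graph, s="ex:worksFor")
```

The same `match` function answers questions about the data (who works where) and about the metadata (what `ex:worksFor` connects), which is the practical meaning of “Metadata IS Data”.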
The above being said, relational data representations still have some important advantages where the data schema is well-known and relatively static. They also tend to be a good choice for transactional systems where the schema for key entities is, again, well-known and non-volatile. However, relational schemas are more brittle, less malleable, and less composable than graph-based data representations.
*** Update 2 (11/09)
Here’s another key point about graphs, in contrast to tree-like data representations such as XML, from the Semantic Web Programming book (p. 72):
Graphs do not have roots. Some other representations, for example XML, are tree based. In an XML document, the root element of the tree has a special significance because all the other elements are oriented with respect to the document root. When trying to merge two trees, it can be difficult to determine what the root node should be because the structure of the tree is so important to the overall significance of the data. In an RDF graph, by contrast, no single resource is of any inherent significance as compared to any other.
*** End Update 2
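The “no roots” point has a nice practical consequence that can be sketched in Python. If a graph is modeled as a set of triples (my simplification, with invented `ex:` names), merging two graphs is just a set union, with no decision to make about which root wins. Real RDF merging also has to take care of blank nodes, which this sketch ignores.

```python
# Two graphs from hypothetical, independent sources.
graph_a = {
    ("ex:alice", "ex:knows", "ex:bob"),
}
graph_b = {
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),  # same statement as in graph_a
}

# Merging is a plain set union; duplicate statements collapse on their own.
merged = graph_a | graph_b

# No resource is privileged: the merged graph can be entered from any node.
facts_about_bob = {t for t in merged if t[0] == "ex:bob" or t[2] == "ex:bob"}
```

Compare this with merging two XML documents, where you first have to decide which element becomes the root and how the other tree hangs off it.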
Linked Data Initiative and RDF
A powerful example of the impact of RDF and graph-like data representations is the W3C’s Linked Data initiative. Spearheaded by Tim Berners-Lee, the inventor of the Web, it aims to put data on the Web using URIs and RDF. I’ve blogged about Linked Data in a previous post.
BTW, I love this quote from Tim Berners-Lee in response to a question on how Linked Data relates to the Semantic Web:
“Linked Data is the Semantic Web done as it should be. It is the Web done as it should be.”
The above quote was taken from this article from 2008.
Triple Stores – Key-Value Data Stores for the Semantic Web
Interestingly, a Triple Store can be viewed as a Key-Value data store purpose-built to manage RDF Triples. Basically, a Triple Store is the Semantic Web’s version of an RDBMS.
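Here is a toy Python sketch of that framing, assuming nothing about how any real triple store is implemented. The class name `TinyTripleStore`, its methods, and the `ex:` names are all hypothetical. The idea is simply that each triple is filed under three keys (its subject, its predicate, and its object), so any pattern lookup becomes a handful of key-value reads.

```python
from collections import defaultdict

class TinyTripleStore:
    """Illustrative only: three key-value indexes over the same triples."""

    def __init__(self):
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, s, p, o):
        """File the triple under each of its three components."""
        triple = (s, p, o)
        self.by_subject[s].add(triple)
        self.by_predicate[p].add(triple)
        self.by_object[o].add(triple)

    def find(self, s=None, p=None, o=None):
        """Look up each given component by key and intersect the results."""
        candidates = None
        for index, key in (
            (self.by_subject, s),
            (self.by_predicate, p),
            (self.by_object, o),
        ):
            if key is not None:
                found = index.get(key, set())
                candidates = found if candidates is None else candidates & found
        if candidates is None:
            # No component given: return every stored triple.
            candidates = set().union(*self.by_subject.values())
        return set(candidates)

store = TinyTripleStore()
store.add("ex:alice", "ex:knows", "ex:bob")
store.add("ex:alice", "ex:worksFor", "ex:acme")
store.add("ex:bob", "ex:worksFor", "ex:acme")

acme_staff = {s for s, _, _ in store.find(p="ex:worksFor", o="ex:acme")}
```

Production triple stores go well beyond this (persistence, query languages such as SPARQL, inference), so treat the sketch as a picture of the key-value intuition and nothing more.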
In conclusion …
I’ll be commenting more on graph-based data representations in the coming weeks, as well as foundational Semantic Web standards such as RDF and OWL.