
NoSQL and Graph Databases – Neo4j’s Emil Eifrem at QCON London 2010

April 2, 2011

Really enjoyed Neo4j’s Emil Eifrem’s presentation on NoSQL and Graph Databases at QCON London 2010. The presentation can be viewed at the preceding link, and the slides can be viewed below:

It is a very nice overview and comparison of the four main NoSQL database categories: (i) Key-value stores, (ii) BigTable clones, (iii) Document databases, and (iv) Graph databases. One particular insight I picked up from the presentation is how pervasive key-value data representations are across all four categories. Quoting Eifrem:

[Document databases – e.g. CouchDB and MongoDB] are inspired by Lotus Notes. CouchDB was founded by the guy that wrote Lotus Notes at IBM, and it basically has a data model of a collection of key-value pairs that they call documents – a JSON document – and then a collection of those, sometimes hierarchically organized. …

The fourth category are Graph databases … The data model here is nodes, with relationships between nodes. And then key-value pairs that you can attach to both nodes and relationships.
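
The graph data model described in the quote above can be sketched in a few lines of Python. This is only an illustration and not Neo4j’s API: the class names `Node` and `Relationship`, the `KNOWS` relationship type, and the sample properties are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """A graph node carrying arbitrary key-value properties."""
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed link between two nodes, with its own key-value properties."""
    start: Node
    end: Node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


# A "document" in the sense of the quote: just a collection of key-value pairs.
document = {"name": "Alice", "city": "London"}

# The same kind of key-value pairs can sit on nodes and on relationships.
alice = Node({"name": "Alice", "city": "London"})
bob = Node({"name": "Bob"})
knows = Relationship(alice, bob, "KNOWS", {"since": 2010})
```

Note that `alice.properties` and `document` hold the same data; the graph model adds the `Relationship` on top.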

The prominence of the key-value structure relates to two fundamental advantages of NoSQL datastores: (i) their ability to manage flexible schemas and complex data structures, and (ii) their ability to scale. Eifrem makes the case that different families of NoSQL databases make different design decisions, trading off ease of scaling against ease of representing complex data. This is illustrated by the slide below:

Again quoting Eifrem from the presentation:

If you look at [these 4 families of NoSQL databases] they’re all about scaling. But there are two aspects to scaling: data complexity … and scaling to size. If you map these models you see they’re positioned differently along [the two axes].

We have Key Value stores [at the top left] – an extremely simple data model, which means it’s poor at handling complex data. It’s just a hash table, right? But the fact that it has such a simple data model means that it’s really easy to scale out …

The Big Table clones have a less simple, more capable data model that can capture [semi-structured data]. But have slightly less ability to scale to size. … It’s more difficult to get HBase to scale [than] to get Voldemort to scale to insane size.

More over to the right we have the document databases. A more capable data model, but you can’t push to scale to the size [of the previous models.]

And finally all the way out to the right are the Graph databases. It’s the data model which is most capable of dealing with complexity. It’s easiest to model complex domains. But it’s most challenging to get it to scale to size.
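
One way to see why “just a hash table” scales out so easily is that every key can be routed to a machine by looking at the key alone. The sketch below is my own illustration and not something shown in the talk: the function names `shard_for`, `put`, and `get`, the shard count of 4, and the use of in-memory dicts as stand-ins for servers are all assumptions.

```python
import hashlib
from typing import Any, Dict, List, Optional


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the key; no other data is consulted."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dict stands in for one server in a hypothetical cluster.
shards: List[Dict[str, Any]] = [dict() for _ in range(4)]


def put(key: str, value: Any) -> None:
    """Store the value on the single shard that owns this key."""
    shards[shard_for(key, len(shards))][key] = value


def get(key: str) -> Optional[Any]:
    """Read the value back from the same shard, or None if it is absent."""
    return shards[shard_for(key, len(shards))].get(key)


put("user:1", {"name": "Alice"})
put("user:2", {"name": "Bob"})
```

Each entry lives on exactly one shard and never refers to another entry. A relationship between two graph nodes, by contrast, may span two shards, which is one plausible reading of why the models further to the right are harder to scale to size.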

Eifrem continues:

The interesting thing about these data models is that they’re all isomorphic. If you have data, you can squeeze it into a graph database or into a key value store, or into [the other two models].

For example, we sometimes jokingly say about document databases that if you want a document database, just take a graph database but remove the relationships. The nodes are key-value pairs just like the documents. So from a data model perspective, a graph database is clearly a superset of a document database.

And one document is sort of like the entire key value store in the key value store model. Now this is a pretty theoretic exercise … so when it comes down to specifics, obviously there are a bunch of things that [differentiate] a document database from a graph database, in terms of the REST API, in terms of how we handle indexes, and other things like that.

That’s great.
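
The “squeezing” Eifrem describes can be made concrete with a small sketch. It is purely illustrative: the function names and the `node:`/`rel:` key scheme are invented here, and real products differ in exactly the ways he lists (APIs, indexes, and so on).

```python
from typing import Any, Dict, List, Tuple

Document = Dict[str, Any]
Edge = Tuple[str, str, str]  # (start node id, relationship type, end node id)


def documents_to_graph(docs: Dict[str, Document]) -> Tuple[Dict[str, Document], List[Edge]]:
    """Each document becomes a node; the relationship list simply stays empty."""
    nodes = {doc_id: dict(doc) for doc_id, doc in docs.items()}
    return nodes, []


def graph_to_key_value(nodes: Dict[str, Document], edges: List[Edge]) -> Dict[str, Any]:
    """Flatten a graph into one key-value store using a made-up key scheme."""
    store: Dict[str, Any] = {}
    for node_id, props in nodes.items():
        for key, value in props.items():
            store[f"node:{node_id}:{key}"] = value
    for i, edge in enumerate(edges):
        store[f"rel:{i}"] = edge
    return store


docs = {"d1": {"title": "QCon notes"}, "d2": {"title": "Slides"}}
nodes, edges = documents_to_graph(docs)   # a graph database minus the relationships
edges.append(("d1", "REFERS_TO", "d2"))   # what the graph model adds on top
store = graph_to_key_value(nodes, edges)  # the same data as flat key-value pairs
```

The data survives both conversions, but the flat store can no longer follow `REFERS_TO` without application code that parses the keys, which is where the practical differences between the models begin.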

To understand more about the fundamentals of the Neo4j database, please see The Neo Database: a Technology Introduction.