Posts Tagged ‘NoSQL’

Ad Targeting at AOL using Couchbase

December 31, 2011

Nice presentation by Matt Ingenthron from August 2011 on ad targeting at AOL using Couchbase.

Slides for the presentation can be viewed here.

glenn

Categories: AOL, NoSQL Databases

NoSQL and Graph Databases – Neo4j’s Emil Eifrem at QCON London 2010

April 2, 2011

Really enjoyed Neo4j’s Emil Eifrem’s presentation on NoSQL and Graph Databases at QCON London 2010. The presentation can be viewed at the preceding link, and the slides can be viewed below:

A very nice overview and comparison of the four main NoSQL database categories: (i) Key-value stores, (ii) BigTable clones, (iii) Document databases, and (iv) Graph databases. One particular insight I picked up from the presentation is the pervasiveness of key-value data representations across all four categories. Quoting Eifrem:

[Document databases – e.g. CouchDB and MongoDB] are inspired by Lotus Notes. CouchDB was founded by the guy that wrote Lotus Notes at IBM, and it basically has a data model of a collection of key-value pairs that they call documents – a JSON document – and then a collection of those, sometimes hierarchically organized. …

The fourth category are Graph databases … The data model here is nodes, with relationships between nodes. And then key-value pairs that you can attach to both nodes and relationships.
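To make the two quoted data models concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the API of Neo4j, CouchDB, or MongoDB: the `Node` and `Relationship` classes and all of the sample data are hypothetical. It shows a document as a plain collection of key-value pairs, and a graph as nodes and relationships that each carry their own key-value pairs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """A graph node carrying arbitrary key-value properties (hypothetical class)."""
    node_id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge that also carries key-value properties (hypothetical class)."""
    start: Node
    end: Node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


# A "document" in the document-database sense: a collection of key-value pairs.
document = {"name": "Alice", "languages": ["en", "sv"]}

# The same key-value pairs attached to a node, plus a relationship
# that has key-value pairs of its own.
alice = Node(1, dict(document))
bob = Node(2, {"name": "Bob"})
knows = Relationship(alice, bob, "KNOWS", {"since": 2008})
```

The node holds exactly what the document holds; the relationship is the only thing the graph model adds.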

The prominence of the Key-Value structure relates to two fundamental advantages of NoSQL datastores: (i) their ability to manage flexible schemas and complex data structures, and (ii) their ability to scale. Eifrem makes the case that different families of NoSQL databases make different design decisions that trade off ease of scaling against ease of representing complex data. This is illustrated by the slide below:

Again quoting Eifrem from the presentation:

If you look at [these 4 families of NoSQL databases] they’re all about scaling. But there are two aspects to scaling: data complexity … and scaling to size. If you map these models you see they’re positioned differently along [the two axes].

We have Key Value stores [at the top left] – an extremely simple data model, which means it’s poor at handling complex data. It’s just a hash table, right? But the fact that it has such a simple data model means that it’s really easy to scale out …

The Big Table clones have a less simple, more capable data model that can capture [semi-structured data]. But have slightly less ability to scale to size. … It’s more difficult to get HBase to scale than to get Voldemort to scale to insane size.

More over to the right we have the document databases. A more capable data model, but you can’t push to scale to the size [of the previous models.]

And finally all the way out to the right are the Graph databases. It’s the data model which is most capable of dealing with complexity. It’s easiest to model complex domains. But it’s most challenging to get it to scale to size.

Eifrem continues:

The interesting thing about these data models is that they’re all isomorphic. If you have data, you can squeeze it into a graph database or into a key value store, or into [the other two models].

For example, we sometimes jokingly say about document databases that if you want a document database, just take a graph database but remove the relationships. The nodes are key-value pairs just like the documents. So from a data model perspective, a graph database is clearly a superset of a document database.

And one document is sort of like the entire key value store in the key value store model. Now this is a pretty theoretic exercise … so when it comes down to specifics, obviously there are a bunch of things that [differentiate] a document database from a graph database, in terms of the REST API, in terms of how we handle indexes, and other things like that.
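The "isomorphic" claim can be sketched in a few lines of Python. This is a toy illustration, not how any of these databases work internally: the key naming scheme (`node:` and `rel:` prefixes), the function names, and the sample data are all assumptions made up for the example. It squeezes a small graph into a flat key-value mapping and then recovers the same graph from it.

```python
import json


def graph_to_kv(nodes, edges):
    """Flatten a graph into a key-value mapping of string keys to JSON strings.

    nodes: dict of node id (str) -> properties dict
    edges: list of (start id, type, end id, properties dict) tuples
    """
    kv = {}
    for node_id, props in nodes.items():
        kv[f"node:{node_id}"] = json.dumps(props)
    for i, (start, rel_type, end, props) in enumerate(edges):
        kv[f"rel:{i}"] = json.dumps(
            {"start": start, "type": rel_type, "end": end, "props": props}
        )
    return kv


def kv_to_graph(kv):
    """Rebuild the nodes dict and the edge list from the flat mapping."""
    nodes, indexed_edges = {}, []
    for key, value in kv.items():
        kind, _, ident = key.partition(":")
        record = json.loads(value)
        if kind == "node":
            nodes[ident] = record
        elif kind == "rel":
            edge = (record["start"], record["type"], record["end"], record["props"])
            indexed_edges.append((int(ident), edge))
    # Restore the original edge order using the numeric suffix of each key.
    edges = [edge for _, edge in sorted(indexed_edges, key=lambda pair: pair[0])]
    return nodes, edges


nodes = {"alice": {"name": "Alice"}, "bob": {"name": "Bob"}}
edges = [("alice", "KNOWS", "bob", {"since": 2008})]

kv = graph_to_kv(nodes, edges)
restored_nodes, restored_edges = kv_to_graph(kv)
```

Calling `graph_to_kv(nodes, [])` yields only `node:` keys, which is the "graph database minus the relationships" view of a document store from the quote.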

That’s great.

To understand more about the fundamentals of the Neo4j database, please see The Neo Database: a Technology Introduction.

glenn

Building a Scalable Database on top of Apache Cassandra at SimpleGeo

April 1, 2011

Mike Malone presents SimpleGeo’s use of Apache Cassandra to scale geospatial data at Cloudstock 2010:

Slides from a very similar presentation delivered by Malone at Strange Loop 2010 can be found here.

glenn

Key-Value Programming Model in Redis – Billy Newport

March 31, 2011

Nice presentation by Billy Newport at QCon San Francisco 2009 on Key-Value programming using Redis.

glenn

Scaling with Redis – Amir Salihefendic Google TechTalk from December 2010

March 31, 2011

Nice introduction to Redis, a key-value data store, and the data management challenges that it solves, delivered by Amir Salihefendic at a Google TechTalk in December 2010:

Presentation slides can be found here.

glenn

Understanding Graph Databases – Marko Rodriguez

March 13, 2011

The best introduction to Graph Databases that I’ve seen, from Marko Rodriguez at the WindyCityDB 2010 event in June:

Slides to the talk can be found here. There is one particular slide in the presentation that I found extremely beautiful. It is this one:

What is so extremely elegant about this structure is that the Index structures are of the same form (i.e. a Graph) as the relational structure of the core domain model. Quoting Rodriguez:

So now this is what a Graph Database starts to look like. You have your domain model, this is the human world that we think about. And then you have these other structures on top – that’s how you are partitioning that world. And that’s more the computer’s interpretation of the world.

And again it’s just nodes and edges, it’s one atomic entity.
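As a rough way to see what "the index is also a graph" can mean, here is a small Python sketch. It is an assumption about the general idea and not a reproduction of Rodriguez's slide or of any graph database's indexing code: the `Graph` class, the `name_index` nodes, and the sample people are hypothetical. The index lives in the same node-and-edge structure as the domain data, so an index lookup is just another traversal.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory graph: nodes with properties, labeled directed edges."""

    def __init__(self):
        self.props = {}
        self.out = defaultdict(list)  # node id -> list of (label, target id)

    def add_node(self, node_id, **props):
        self.props[node_id] = props

    def add_edge(self, start, label, end):
        self.out[start].append((label, end))

    def follow(self, start, label):
        """Return the targets of all edges leaving `start` with the given label."""
        return [end for (lbl, end) in self.out[start] if lbl == label]


g = Graph()

# Domain model: people and who they know.
g.add_node("marko", name="Marko")
g.add_node("peter", name="Peter")
g.add_edge("marko", "knows", "peter")

# Index structure, stored as more nodes and edges in the same graph:
# index root -> one bucket node per initial -> the indexed domain nodes.
g.add_node("name_index")
g.add_node("name_index/M")
g.add_node("name_index/P")
g.add_edge("name_index", "M", "name_index/M")
g.add_edge("name_index", "P", "name_index/P")
g.add_edge("name_index/M", "indexes", "marko")
g.add_edge("name_index/P", "indexes", "peter")


def lookup_by_initial(graph, initial):
    """Find domain nodes by traversing the index: root -> bucket -> node."""
    hits = []
    for bucket in graph.follow("name_index", initial):
        hits.extend(graph.follow(bucket, "indexes"))
    return hits
```

Here `lookup_by_initial(g, "M")` and `g.follow("marko", "knows")` use the same `follow` primitive, which is the point: index and domain model are one kind of structure.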

I can’t speak to the computational efficiency of this model. But clearly there’s a conceptual elegance that feels very natural. I highly recommend watching the entire presentation.

glenn

Introduction to MongoDB – Document-oriented approach to NoSQL

March 12, 2011

In the video below, Mike Dirolf – a software engineer at 10gen – provides an introduction to MongoDB:

I wasn’t able to find the slides for this presentation, but a similar presentation can be found here.

glenn

Categories: NoSQL Databases

Schema Design for NoSQL data in Riak – Sean Cribbs of Basho

March 12, 2011

Informative presentation by Sean Cribbs of Basho Technologies on schema design when writing data-driven apps on top of the NoSQL database Riak.

A nice overview of thinking about schemas for key-value store databases. Presentation slides can be found here.

Cribbs also gives a brief overview of Links and Link Walking in Riak in the short video below:

This is a pretty simple form of link traversal, but it does inspire me to understand more about graph-oriented databases and traversal mechanisms.
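For a feel of what link traversal over a key-value store involves, here is a minimal in-memory Python sketch. It does not use Riak or any Riak client library; it only assumes that each stored object carries a list of links of the form (bucket, key, tag), and the `store` layout, the `walk` and `walk_path` functions, and the sample data are all hypothetical.

```python
# Each object is addressed by (bucket, key) and carries a list of links.
# A link is a (bucket, key, tag) triple pointing at another object.
store = {
    ("people", "alice"): {"links": [("people", "bob", "friend"),
                                    ("people", "carol", "friend")]},
    ("people", "bob"):   {"links": [("people", "dave", "friend")]},
    ("people", "carol"): {"links": [("people", "dave", "friend"),
                                    ("people", "erin", "colleague")]},
    ("people", "dave"):  {"links": []},
    ("people", "erin"):  {"links": []},
}


def walk(store, start, bucket=None, tag=None):
    """Follow one step of links from `start`, optionally filtered by bucket and tag."""
    results = []
    for link_bucket, link_key, link_tag in store[start]["links"]:
        if bucket is not None and link_bucket != bucket:
            continue
        if tag is not None and link_tag != tag:
            continue
        if (link_bucket, link_key) in store:  # skip dangling links
            results.append((link_bucket, link_key))
    return results


def walk_path(store, start, steps):
    """Follow several steps in sequence; each step is a (bucket, tag) filter."""
    frontier = [start]
    for bucket, tag in steps:
        next_frontier = []
        for obj in frontier:
            for target in walk(store, obj, bucket, tag):
                if target not in next_frontier:  # de-duplicate, keep order
                    next_frontier.append(target)
        frontier = next_frontier
    return frontier


friends_of_friends = walk_path(
    store, ("people", "alice"), [("people", "friend"), ("people", "friend")]
)
```

Each step is just a filtered lookup of the links on the current objects, which is why this kind of traversal stays simple compared with what a dedicated graph database offers.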

glenn

Categories: NoSQL Databases

Overview of Apache Cassandra – Eben Hewitt at Strange Loop 2010

March 12, 2011

Nice presentation by Eben Hewitt on Apache Cassandra at Strange Loop 2010.

Slides for the presentation can be found here. Hewitt is also the author of Cassandra: The Definitive Guide, published by O’Reilly in November 2010.

glenn

Introduction to NoSQL – John Nunemaker presentation from June 2010

March 11, 2011

Probably the best introduction I’ve seen to NoSQL databases:

Presentation slides for Nunemaker’s talk can be found here. For a nice analysis of the whole NoSQL space see Is the Relational Database Doomed? (ReadWriteWeb article from February 2009).

glenn

Categories: NoSQL Databases