A place to keep my thoughts on programming

February 22, 2010 geek

NoSQL is the new Arcadia

Recently a lot of people have been lamenting the sheep-like mentality of picking an RDBMS (and with it an ORM) as the way to model persistence, without first considering solutions that do not suffer from the object-relational impedance mismatch.

Many of the arguments for having to use an RDBMS are easily shot down, such as the relentless requirement for ad hoc reporting against production data (if your OLTP and OLAP are the same DB, you are doing it wrong™). But while the arguments for picking an RDBMS are often ill-considered, the reasons for abandoning it seem to suffer from a similar lack of consideration.

Let me be clear that I do my best to stay away from RDBMSs whenever I can. I have plenty of scars from supporting large production DB environments over the years, and there are lots of pain points in writing web applications against an RDBMS. I, too, love schema-less, document and object databases. They make so much sense. I rabidly follow the MongoDB and Riak mailing lists and prototype projects with them and other NoSQL tech, such as Db4o. However, following those lists it is clear to me that a) they are still re-discovering lessons painfully learned by RDBMS folks and b) my knowledge of how to work with these systems when something goes wrong lags woefully behind my knowledge of the same for an RDBMS.

Pick the best tool

So yes, marvel at the simplicity of mapping your object model to a document model, or even serialize that object graph using an object or graph DB. But don’t just concentrate on what they do better for development, ignoring the day-to-day production support issues. Take a minute and see if you can answer these questions for yourself:

Can you troubleshoot performance problems?

Every RDBMS has some kind of profiling tool and process list. And on the ORM side, Ayende’s Uberprof is doing a fantastic job of bringing additional transparency to many ORMs. Do you have any similar tools for the alternative persistence layer? Do you know what’s blocking your writes, your reads? What’s slowing down your map/reduce? What indices, if applicable, are being hit? And if you’re using a sharded setup, profiling just got an order of magnitude more complicated.
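
To make the "what indices are being hit?" question concrete, here is a minimal sketch of what that check looks like on the RDBMS side, using SQLite's EXPLAIN QUERY PLAN through Python's standard sqlite3 module. The users table, the idx_users_email index and the query_plan helper are made up for illustration; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of every plan row holds the textual description.
    return [row[-1] for row in rows]


# Hypothetical schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# Filtering on the indexed column: the plan should mention the index.
indexed = query_plan(
    conn, "SELECT name FROM users WHERE email = ?", ("a@example.com",)
)

# Filtering on a column without an index: the plan falls back to a scan.
unindexed = query_plan(
    conn, "SELECT email FROM users WHERE name = ?", ("Ann",)
)

for step in indexed + unindexed:
    print(step)
```

If you cannot get an equivalent answer out of your NoSQL store of choice, that is worth knowing before a slow query shows up in production.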

What about concurrency on non-key accesses?

Key/value stores are much faster than even primary key hits on an RDBMS. And document databases let you store the entire data hierarchy instead of normalizing it across foreign key tables, making graph retrieval cheap too.

But as NoSQL goes beyond simple key retrieval with query APIs and map/reduce, concurrency concerns sneak back in along with the query power. Many NoSQL stores are still using single-threaded concurrency per node or at least per data silo (read: table locking). In RDBMS land, MySQL was the last one to solve that, and it did so 6-7 years ago.
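
The difference between silo-level and record-level locking can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular store implements it: CoarseStore, FineStore and can_write_concurrently are hypothetical names, and a real engine has far more going on than a dictionary of locks.

```python
import threading


class CoarseStore:
    """One lock for the whole data silo (the 'table locking' model)."""

    def __init__(self):
        self._lock = threading.Lock()

    def lock_for(self, key):
        # Every key shares the same lock.
        return self._lock


class FineStore:
    """One lock per key (the 'row locking' model)."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, key):
        # The guard only protects the lock table itself.
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())


def can_write_concurrently(store, key_a, key_b):
    """True if a write to key_b could proceed while key_a is being written."""
    held = store.lock_for(key_a)
    held.acquire()
    try:
        other = store.lock_for(key_b)
        acquired = other.acquire(blocking=False)
        if acquired:
            other.release()
        return acquired
    finally:
        held.release()


print(can_write_concurrently(CoarseStore(), "user:1", "user:2"))
print(can_write_concurrently(FineStore(), "user:1", "user:2"))
```

With the coarse store, a write to one key blocks writes to every other key in the silo, which is exactly the contention that shows up once long-running queries and map/reduce jobs share a node with regular writes.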

What tools do you have to recover a corrupted data file?

Another set of tools you are guaranteed to find with any RDBMS are utilities for recovering corrupted data and index files. Or at the very least utilities for extracting data from them in case of catastrophic failure.

With many NoSQL stores using memory mapped files, corruption on power loss or DB crash is not uncommon. Does your persistence choice have ways to recover those files?

What’s your backup strategy?

Most DBs have non-blocking DB dumps. Almost all have replication. Both are valid mechanisms.

Some NoSQL stores use replication to address the problem, others seem to punt on it by duplicating data redundantly across nodes. But unless your redundant/replica nodes are geographically separated, it’s not the same as being able to go back to a backup on catastrophic loss.

How do you know your replicas are working?

So you say, you don’t care if your data gets corrupted or that you can’t do live backups, because it all gets replicated to a safe server. Well, much like going back to tape only to discover that your back-up process hasn’t actually backed up anything, do you have the tools to ensure that your replicas are up to date and didn’t get the corruption replicated into them?
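
As a rough illustration of what "ensuring your replicas are up to date" means in practice, here is a minimal Python sketch that compares a primary and a replica by content digest and lists the keys that diverge. It assumes you can read both sides as plain dictionaries of JSON-serializable records, which is a simplification; digest and check_replica are hypothetical helpers, not part of any store's API.

```python
import hashlib
import json


def digest(records):
    """Order-independent SHA-256 digest over a dict of records."""
    h = hashlib.sha256()
    for key in sorted(records):
        line = json.dumps([key, records[key]], sort_keys=True)
        h.update(line.encode("utf-8"))
    return h.hexdigest()


def check_replica(primary, replica):
    """Report whether the replica matches the primary, and where it differs."""
    return {
        "in_sync": digest(primary) == digest(replica),
        "missing": sorted(k for k in primary if k not in replica),
        "extra": sorted(k for k in replica if k not in primary),
        "differing": sorted(
            k for k in primary if k in replica and primary[k] != replica[k]
        ),
    }


# Made-up sample data: the replica lags on one record and lacks another.
primary = {"a": {"n": 1}, "b": {"n": 2}, "c": {"n": 3}}
replica = {"a": {"n": 1}, "b": {"n": 20}}

report = check_replica(primary, replica)
print(report)
```

Note that a check like this only catches divergence. Corruption that was faithfully replicated will match on both sides, so you still need the store's own integrity checks or a known-good backup to compare against.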

Do your sysadmins share your comfort level?

A lot of these production level and back-up related issues are not even something developers think about, because with the maturity of RDBMSs their maintenance and back-up are often tightly integrated into the sysadmin’s processes. If you don’t think you need to care about the above questions, chances are you have others doing it for you. And in that case, it’s vital that your sysadmins are versed in the NoSQL tool you are choosing before you throw the operations requirements over the wall at them.

The tool you know

Maybe you have all those questions covered for your NoSQL tool of choice. Google, Facebook, LinkedIn do. But likely, you don’t. Maybe you don’t have them covered for any RDBMS that you know either. But here’s the difference: These problems have been tackled in painstaking detail in thousands of RDBMS production environments. So, when you hit a wall with an RDBMS, chances are you can find an answer and get yourself out of that production mess.

The relative novelty and deployment size of most NoSQL solutions means you can’t easily fall back on established production experience. Until you have that same certainty when, not if, you face problems in production, you can’t really say that you objectively evaluated all choices and found NoSQL to be the superior solution to your problem.

2 Responses to “NoSQL is the new Arcadia”

  1. Bhaskar says...

    The following comments apply to GT.M, a key-value database engine (the paradigm is an array reference with multiple subscripts) which I manage.

    Can you troubleshoot performance problems?

    Yes. There is not a single tool, but there is a collection of tools (some part of GT.M and some part of UNIX/Linux) that form a capable toolbox for troubleshooting performance problems.

    What about concurrency on non-key access?

    I don't fully understand your issues with non-key access, but in GT.M you can bracket your code with TStart / TCommit commands and GT.M ensures ACID properties for the bracketed code. Unlike many databases, GT.M uses optimistic concurrency control to ensure ACIDity. (Details on request; I want to keep this response brief.)

    What tools do you have to recover a corrupted data file?

    GT.M comes with a Database Structure Editor which is a low level tool to repair damaged database files. However, damaged databases are so rare that as a practical matter, there are no more experts who can repair databases with one hand tied behind their backs when beeped at 3am. When a system crashes, yes, the database file is nominally damaged, but when the system comes up, a recover operation using the journal file makes the database whole (and ensures ACIDity on a multi-region database).

    What's your backup strategy?

    There is an online backup utility that provides a transaction-consistent snapshot of a multi-region logical database even as the application continues running.

    Also, an originating instance on which business logic is running can stream logical database updates to as many as sixteen replicating instances (each of which can stream to 16 more; and so on). In the event the originating instance goes down for a planned or unplanned event, any of the replicating instances can become the originating instance.

    How do you know your replicas are working?

    There are utility programs which monitor replicating instances that you can query. Also, you can fire application level queries at the replicating instances (updates are permitted only on originating instances).

    Do your sysadmins share your comfort level?

    Yes. Most of our sysadmins are actually UNIX/Linux sysadmins. GT.M operations are designed to be very comfortable to a UNIX/Linux sysadmin.

    Apropos the comments about novelty and deployment size, GT.M first went into production in 1986 and has continuously evolved since. It is the legal system of record for tens of millions of bank accounts around the world. The largest production sites have environments with individual database files in the hundreds of GB, and logical databases (consisting of one or more individual database files) to several TB.

  2. dm says...

    These are great questions. I'll try to answer for MongoDB which is the nosql product I know:

    >Can you troubleshoot performance problems?
    Yes, see explain() and the HTTP console in the docs.

    >What about concurrency on non-key accesses?

    >What tools to you have to recover a corrupted data file?
    See db.repairDatabase() and db.collection.validate() in the docs, and the dump utilities. And --syncdelay.

    >What's your backup strategy?

    >How do you know your replicas are working?
    See db.getReplicationInfo(), db.printReplicationInfo(), db.printSlaveReplicationInfo()

    >Do your sysadmins share your comfort level?
    It's certainly true the space is new and it is going to take a while for admins to get used to the new systems.
    Commercial support and training are available for MongoDB from 10gen.
