Recently, a lot of people have been lamenting the sheep-like mentality of picking an RDBMS (and with it an ORM) as the way to model persistence, without first considering solutions that do not suffer from the object-relational impedance mismatch.
Many of the arguments for having to use an RDBMS are easily shot down, such as the relentless requirement for ad hoc reporting against production data (if your OLTP and OLAP are the same DB, you are doing it wrong™). But while the arguments for picking an RDBMS are often ill-considered, the reasons for abandoning it seem to suffer from a similar lack of depth.
Let me be clear that I do my best to stay away from RDBMSs whenever I can. I have plenty of scars from supporting large production DB environments over the years, and there are lots of pain points in writing web applications against RDBMSs. I, too, love schema-less, document and object databases. They make so much sense. I rabidly follow the MongoDB and Riak mailing lists and prototype projects with them and other NoSQL tech, such as Db4o. However, following those lists, it is clear to me that a) they are still re-discovering lessons painfully learned by RDBMS folks and b) my knowledge of working with these systems when something goes wrong is woefully behind my knowledge of the same for RDBMSs.
So yes, marvel at the simplicity of mapping your object model to a document model, or even serialize that object graph using an object or graph DB. But don’t just concentrate on what they do better for development, ignoring the day-to-day production support issues. Take a minute and see if you can answer these questions for yourself:
Every RDBMS has some kind of profiling tool and process list. And on the ORM side, Ayende’s Uberprof is doing a fantastic job of bringing additional transparency to many ORMs. Do you have any similar tools for the alternative persistence layer? Do you know what’s blocking your writes, your reads? What’s slowing down your map/reduce? What indices, if applicable, are being hit? And if you’re using a sharded setup, profiling just got an order of magnitude more complicated.
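If your store of choice ships no profiler at all, the crudest fallback is to time the calls yourself at the application boundary. Here is a minimal sketch of that idea in Python. The `timed` decorator, the `timings` table and the `slowest` helper are names made up for illustration, not part of any NoSQL client library, and a plain dict stands in for the real store. It tells you which operations are slow, not why, so it is no substitute for a real process list.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process latency log: operation name -> list of durations (seconds).
timings = defaultdict(list)


def timed(op_name):
    """Wrap a persistence call and record how long each invocation took."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded even when the call raises, so failures show up too.
                timings[op_name].append(time.perf_counter() - start)
        return wrapper
    return decorator


def slowest(n=3):
    """Return (operation, worst latency) pairs, slowest first."""
    worst = [(op, max(samples)) for op, samples in timings.items()]
    return sorted(worst, key=lambda pair: pair[1], reverse=True)[:n]


# Stand-in for a real key/value or document store client.
_fake_store = {}


@timed("put")
def put(key, value):
    _fake_store[key] = value


@timed("get")
def get(key):
    return _fake_store.get(key)


if __name__ == "__main__":
    put("user:1", {"name": "Ann"})
    get("user:1")
    for op, seconds in slowest():
        print(f"{op}: {seconds:.6f}s")
```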
Key/value stores are much faster than even primary key hits on an RDBMS. And document databases let you store the entire data hierarchy instead of normalizing it across foreign key tables, making graph retrieval cheap too.
But as NoSQL goes beyond simple key retrieval with query APIs and map/reduce, concurrency concerns sneak back in along with the query power. Many NoSQL stores are still using single-threaded concurrency per node or at least per data silo (read: table locking). In RDBMS land, MySQL was the last one to solve that, and it did so 6-7 years ago.
Another set of tools you are guaranteed to find with any RDBMS are utilities for recovering corrupted data and index files. Or at the very least utilities for extracting data from them in case of catastrophic failure.
With many NoSQL stores using memory-mapped files, corruption on power loss or DB crash is not uncommon. Does your persistence choice have ways to recover those files?
Most DBs have non-blocking DB dumps. Almost all have replication. Both are valid mechanisms.
Some NoSQL stores use replication to address the problem, others seem to punt on it by using redundant data duplication across nodes. But unless your redundant/replica nodes are geographically separated, it’s not the same as being able to go back to a backup on catastrophic loss.
So you say, you don’t care if your data gets corrupted or that you can’t do live backups, because it all gets replicated to a safe server. Well, much like going back to tape only to discover that your back-up process hasn’t actually backed up anything, do you have the tools to ensure that your replicas are up to date and didn’t get the corruption replicated into them?
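To make that question concrete, here is a minimal sketch in Python of what checking a replica could look like, assuming you can pull a snapshot of both nodes as plain key-to-record dictionaries. The function names and the sample data are hypothetical, not any particular store's API. Note what it cannot do: if the corruption was faithfully replicated, both sides match and this check passes, which is exactly why you still want to compare against a known-good backup.

```python
import hashlib
import json


def record_digest(record):
    """Hash one record in a stable way (keys are sorted before serializing)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_replica(primary, replica):
    """Compare two {key: record} snapshots and report where they diverge."""
    missing = sorted(k for k in primary if k not in replica)
    extra = sorted(k for k in replica if k not in primary)
    mismatched = sorted(
        k for k in primary
        if k in replica and record_digest(primary[k]) != record_digest(replica[k])
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


if __name__ == "__main__":
    # Made-up sample data: u2 differs, u3 never reached the replica.
    primary = {
        "u1": {"name": "Ann", "age": 30},
        "u2": {"name": "Bob", "age": 41},
        "u3": {"name": "Cy", "age": 25},
    }
    replica = {
        "u1": {"age": 30, "name": "Ann"},
        "u2": {"name": "Bob", "age": 14},
    }
    print(diff_replica(primary, replica))
```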
A lot of these production-level and back-up related issues are not even something developers think about, because with the maturity of RDBMSs their maintenance and back-up are often tightly integrated into the sysadmin’s processes. If you don’t think you need to care about the above questions, chances are you have others doing it for you. And in that case, it’s vital that your sysadmins are versed in the NoSQL tool you are choosing before you throw the operations requirements over the wall at them.
Maybe you have all those questions covered for your NoSQL tool of choice. Google, Facebook, LinkedIn do. But likely, you don’t. Maybe you don’t have them covered for any RDBMS that you know either. But here’s the difference: These problems have been tackled in painstaking detail in thousands of RDBMS production environments. So, when you hit a wall with an RDBMS, chances are you can find an answer and get yourself out of that production mess.
The relative novelty and deployment size of most NoSQL solutions means you can’t easily fall back on established production experience. Until you have that same certainty when, not if, you face problems in production, you can’t really say that you objectively evaluated all choices and found NoSQL to be the superior solution to your problem.
I set up a new dev machine last week and decided to give Win7 a try. My most recent dev setup was Win2k8 Server, and it’s still my favorite dev environment. Fast, unobtrusive, things just worked.
Win7 appeared to be a different story, reminding me of the evil days of Vista. I had expected it to be more like Win2k8 Server, but it just wasn’t. I was trying to be zen about the constant UAC nagging and just get used to the way it wanted me to work. But two days in, it came to a head, and after wasting countless hours trying to work within the security circus it set up, I was ready to pave the machine.
Here’s just a couple of things that were killing me:
All these things need administrator privileges. But wait, I am an administrator, so what’s going on? It appears that being an administrator is more like being in the sudoers file on unix. I have the right to invoke commands in the context of an administrator, but my normal actions don’t run in that context. I tried to work around this with registry hacks, shortcuts set to run as administrator and so on, to try to get things to start up with administrator privs by default, but Visual Studio 2k8 just refused to play along. You cannot set it up so that you can double-click on a solution and have it launch as administrator in Win7. And even if you start VS as administrator, you cannot drag & drop files onto it, since it’s now running in a different context than Explorer.
And if you ask MS Connect about this, you’ll find that, like anything of value, the issue has been closed as “By Design.” Ok, look buddy, just because you designed a horrible user experience doesn’t mean the problem can just be dismissed.
But why was Win2k8 so much better an experience, a nagging voice kept asking. Turns out that on Win2k8, I just ran as Administrator. Win7 never gave that option (and you have to do some cmdline foo to enable the account). Being a unix guy as well, running dev in what is effectively root just felt distasteful. But distaste or not, it’s the key to actually being able to do productive development work in Windows. As soon as I became THE Administrator, instead of an administrator, all was smooth again.
Stupid lesson learned.