This weekend has been a hack-a-thon, trying to build a simple linq provider for MongoDB. I’m using Sam Corder, et al.’s excellent C# MongoDB Driver as the query pipeline, so my provider really is just a translator from Linq syntax to Mongo Document Query syntax. I call it a hack-a-thon, because it’s my first linq provider attempt and, boy, is that query translator state machine ugly already. However, I am covering every bit of syntax with tests, so that once i understand it all better, i can rewrite the translator in a cleaner fashion.
My goals for this provider is to replace a document storage layer i’ve built for a new notify.me project using NHibernate against mysql. This is in no way a judgment against NHibernate. It just happens that for this project, my schema is a heavily denormalized json document database. While fluent NHibernate made it a breeze to let me map it into mysql, it’s really an abuse of an RDBMS. It was a case of prototyping with what you know, but now it’s time to evaluate whether a document database is the way to go.
Replacing existing NHibernate code does mean that, eventually, i want the provider to work with POCO entities and use a fully strong-typed query syntax. But that layer will be built on top of the string-key based version i’m building right now. The string-key based version will be the primary layer, so that you never loose any of the schema-less flexibility of MongoDB, unless you choose to.
So, lacking an entity with named properties to map against, what does the syntax look like right now? First thing we need is an IQueryable<Document> which is created like this:
var mongo = new Mongo(); var queryable = mongo["db"]["collection"].AsQueryable();
Given the queryable, the queries can be built using the Document indexer like this:
var q = from d in queryable where (string)d["foo"] == "bar" select d;
The Document returns an object, which means a cast is unfortunately required on one side of the conditional. Alternatively, Equals, either the static or instance version, also works, alleviating the need for a cast:
var q = from d in queryable where Equals(d["foo"], "bar") select d;
// OR
var q = from d in queryable where d["foo"].Equals("bar") select d;
Better, but it’s not as nice as operator syntax would be, if we could get rid of the casts..
As it turns out there is a number of query operators in MongoDB that don’t have an equivalent syntax in Linq, so a helper class to generate query expression was already needed. The helper is instantiated via the Document extension method .Key(key), giving us the opportunity to overload operators for the various types recognized by MongoDB’s BSON. This allows for the following conditional syntax:
var q = from d in queryable
where d.Key("type") == "customer" &&
d.Key("created") >= DateTime.Parse("2009/09/27")
d.Key("status") != "inactive"
select d;
In addition to normal conditional operators, the query expression helper class also defines IN and NOT IN syntax:
var in = from d in queryable where d.Key("foo").In("bar", "baz") select d;
var notIn = from d in queryable where d.Key("foo").NotIn("bar", "baz") select d;
The helper will be the point of extension to support more of MongoDB’s syntax, so that most query definitions will use the d.Key(key) syntax.
Linq has matching counter parts of MongoDB’s findOne(), limit() and skip(), in First or FirstOrDefault, Take and Skip respectively, and the current version of Linq provider already supports them.
There is a lot in Linq that will likely never be supported, since MongoDB is not a relational DB. That means joins, sub-queries, etc. will not covered by the provider. Anything that does map to MongoDB’s capabilities, though, will be added over time. The low hanging fruit are Count() and order by, with group by following thereafter.
Surprisingly, || (or conditionals) are not going to happen as fast, since aside from or type queries using the .In syntax, it is not directly supported by MongoDB. In order to perform || queries, the query has to be written as a javascript function, which would basically mean that as soon as a single || shows up in the where clause the query translato would have to rewrite all other conditions in javascript as well. So, that’s a bit more on the nice to have end of the spectrum of priorities.
I will most likely concentrate on the low hanging fruit and then work on the POCO query layer next, since my goal is to be able to try out MongoDB as an alternative to my NHibernate code.
All that said, the code described above works now and is ready for some test driving. It’s currently only in my branch on github, but I hope it will make it into the master soon.
Been using NHibernate a lot recently, and just love having POCO data objects. This means that i can just hand out objects and manipulate them and then save them back to the DB without ever exposing my persistence model. For this purpose, I’d been using Data Access Objects to give me access to the objects. But like stored procedures, after a while you DAOs have this mess of Find methods on them that either have many overloads or, even worse, optionally nullable parameters. Never mind the acrobatics you jump through once you want to provide methods that query entities, based on some other entity, that’s abstracted by its on DAO.
The option was to expose the NHibernate query api (talking ICriteria here, never touched HQL because the reason i went NHibernate in the first place is because i didn’t want queries in strings), but that didn’t taste right, so i added more methods on my DAO instead, hiding the criteria queries.
Once i started playing with the experimental Linq support, i didn’t change my abstraction, just started using Linq instead of criteria inside the DAO. But last week, I finally broke down and just exposed session.Linq<T>(); as an IQueryable<T> on my DAO. For example my Account DAO, AccountService now simply looks like this:
public interface IAccountService
{
Account CreateAccount(JObject json);
Account GetAccount(int accountId);
IQueryable<Account> Accounts { get; }
void Save(Account account);
}
NHibernate is completely invisible, I have full query capabilities and even mocking has become easier, just returning a List<T>.AsQueryable() from my mock DAO. I was able to remove all my Find methods. I did leave the by Id accessor GetAccount in there, just because it's the predominant use case, and i didn't want to litter code with identical Linq queries.
I have to say, i like this approach a lot.