Linq2MongoDB: Building a Linq Provider for MongoDB

This weekend has been a hack-a-thon, trying to build a simple Linq provider for MongoDB. I'm using Sam Corder, et al.'s excellent C# MongoDB Driver as the query pipeline, so my provider really is just a translator from Linq syntax to Mongo Document Query syntax. I call it a hack-a-thon because it's my first attempt at a Linq provider and, boy, is that query translator state machine ugly already. However, I am covering every bit of syntax with tests, so that once I understand it all better, I can rewrite the translator in a cleaner fashion.

My goal for this provider is to replace a document storage layer I've built for a new notify.me project using NHibernate against MySQL. This is in no way a judgment against NHibernate. It just happens that for this project, my schema is a heavily denormalized JSON document database. While Fluent NHibernate made it a breeze to map it into MySQL, it's really an abuse of an RDBMS. It was a case of prototyping with what you know, but now it's time to evaluate whether a document database is the way to go.

Replacing existing NHibernate code does mean that, eventually, I want the provider to work with POCO entities and use a fully strongly typed query syntax. But that layer will be built on top of the string-key based version I'm building right now. The string-key based version will be the primary layer, so that you never lose any of the schema-less flexibility of MongoDB unless you choose to.

Basic MongoDB queries

So, lacking an entity with named properties to map against, what does the syntax look like right now? First thing we need is an IQueryable<Document> which is created like this:

var mongo = new Mongo();
var queryable = mongo["db"]["collection"].AsQueryable();

Given the queryable, the queries can be built using the Document indexer like this:

var q = from d in queryable where (string)d["foo"] == "bar" select d;

The Document returns an object, which means a cast is unfortunately required on one side of the conditional. Alternatively, Equals, either the static or instance version, also works, alleviating the need for a cast:

var q = from d in queryable where Equals(d["foo"], "bar") select d;
// OR
var q = from d in queryable where d["foo"].Equals("bar") select d;

Better, but it's not as nice as operator syntax would be, if only we could get rid of the casts.

As it turns out, there are a number of query operators in MongoDB that don't have an equivalent syntax in Linq, so a helper class to generate query expressions was already needed. The helper is instantiated via the Document extension method .Key(_key_), giving us the opportunity to overload operators for the various types recognized by MongoDB's BSON. This allows for the following conditional syntax:

var q = from d in queryable
        where d.Key("type") == "customer" &&
              d.Key("created") >= DateTime.Parse("2009/09/27") &&
              d.Key("status") != "inactive"
        select d;
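
To give a rough idea of how overloaded operators can feed a translator, here is a minimal sketch. It is not the provider's actual code: QueryKey, Doc, QueryTranslator and the "visits" key are all hypothetical stand-ins, and plain dictionaries stand in for the driver's Document. The assumption is that the operators never actually run; they only exist so the compiler records them in the expression tree, which the translator then reads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical stand-in for the helper returned by d.Key("...").
public sealed class QueryKey
{
    // The operator bodies never run: the translator only inspects the expression tree.
    static bool Untranslated() => throw new NotSupportedException("translate, don't execute");

    public static bool operator ==(QueryKey key, string value) => Untranslated();
    public static bool operator !=(QueryKey key, string value) => Untranslated();
    public static bool operator >=(QueryKey key, int value) => Untranslated();
    public static bool operator <=(QueryKey key, int value) => Untranslated();

    public override bool Equals(object obj) => ReferenceEquals(this, obj);
    public override int GetHashCode() => 0;
}

// Hypothetical stand-in for the driver's Document plus the Key extension method.
public sealed class Doc
{
    public QueryKey Key(string name) => new QueryKey();
}

public static class QueryTranslator
{
    // Turns a predicate into a MongoDB-style query document (modelled as a dictionary).
    public static Dictionary<string, object> Translate(Expression<Func<Doc, bool>> predicate)
    {
        var query = new Dictionary<string, object>();
        Visit(predicate.Body, query);
        return query;
    }

    static void Visit(Expression node, Dictionary<string, object> query)
    {
        var binary = (BinaryExpression)node;
        if (binary.NodeType == ExpressionType.AndAlso)
        {
            // && simply means both conditions end up in the same query document.
            Visit(binary.Left, query);
            Visit(binary.Right, query);
            return;
        }

        // Left side is d.Key("name"); right side is assumed to be a constant.
        var call = (MethodCallExpression)binary.Left;
        var key = (string)((ConstantExpression)call.Arguments[0]).Value;
        var value = ((ConstantExpression)binary.Right).Value;

        // Note: a second condition on the same key overwrites the first in this sketch.
        switch (binary.NodeType)
        {
            case ExpressionType.Equal: query[key] = value; break;
            case ExpressionType.NotEqual: query[key] = Op("$ne", value); break;
            case ExpressionType.GreaterThanOrEqual: query[key] = Op("$gte", value); break;
            case ExpressionType.LessThanOrEqual: query[key] = Op("$lte", value); break;
            default: throw new NotSupportedException(binary.NodeType.ToString());
        }
    }

    static Dictionary<string, object> Op(string name, object value) =>
        new Dictionary<string, object> { { name, value } };
}

public static class Program
{
    public static void Main()
    {
        var query = QueryTranslator.Translate(
            d => d.Key("type") == "customer" && d.Key("visits") >= 5);
        foreach (var pair in query)
            Console.WriteLine(pair.Key);
    }
}
```

The sketch only handles constants on the right-hand side. A call such as DateTime.Parse(...) would have to be evaluated by the translator before it could be placed in the query document.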

IN and NOT IN

In addition to normal conditional operators, the query expression helper class also defines IN and NOT IN syntax:

var inQuery = from d in queryable where d.Key("foo").In("bar", "baz") select d;

var notIn = from d in queryable where d.Key("foo").NotIn("bar", "baz") select d;
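
MongoDB expresses these two conditions with its $in and $nin operators, so a translator presumably has to turn the method call into a nested document such as { foo: { $in: ["bar", "baz"] } }. The following is only a sketch of that step under simplifying assumptions: QueryKey, Doc and InTranslator are hypothetical names, values are restricted to strings, and a dictionary stands in for the driver's Document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper: the methods are markers for the translator and never execute.
public sealed class QueryKey
{
    public bool In(params string[] values) => throw new NotSupportedException("translate, don't execute");
    public bool NotIn(params string[] values) => throw new NotSupportedException("translate, don't execute");
}

// Hypothetical stand-in for the driver's Document plus the Key extension method.
public sealed class Doc
{
    public QueryKey Key(string name) => new QueryKey();
}

public static class InTranslator
{
    // Translates d.Key("k").In(...) / .NotIn(...) into { k: { $in / $nin: [...] } }.
    public static Dictionary<string, object> Translate(Expression<Func<Doc, bool>> predicate)
    {
        var call = (MethodCallExpression)predicate.Body;      // .In(...) or .NotIn(...)
        var keyCall = (MethodCallExpression)call.Object;      // d.Key("k")
        var key = (string)((ConstantExpression)keyCall.Arguments[0]).Value;

        // The params arguments show up as an array initializer holding constants.
        var items = (NewArrayExpression)call.Arguments[0];
        var values = items.Expressions
            .Select(e => (string)((ConstantExpression)e).Value)
            .ToArray();

        var op = call.Method.Name == "In" ? "$in" : "$nin";
        return new Dictionary<string, object>
        {
            { key, new Dictionary<string, object> { { op, values } } }
        };
    }
}

public static class Program
{
    public static void Main()
    {
        var query = InTranslator.Translate(d => d.Key("foo").In("bar", "baz"));
        var condition = (Dictionary<string, object>)query["foo"];
        Console.WriteLine(string.Join(", ", (string[])condition["$in"]));
    }
}
```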

The helper will be the point of extension to support more of MongoDB's syntax, so that most query definitions will use the d.Key(_key_) syntax.

findOne, limit and skip

Linq has matching counterparts for MongoDB's findOne(), limit() and skip() in First or FirstOrDefault, Take and Skip respectively, and the current version of the Linq provider already supports them.
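
As an illustration of how Take and Skip can be recovered from a query, the sketch below walks the chain of Queryable calls in an IQueryable's expression and reads off the two numbers. PagingReader is a hypothetical name and this is not the provider's actual code; it runs against an ordinary in-memory queryable purely to show the shape of the expression tree.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class PagingReader
{
    // Returns the skip and limit values found in the query, or null where none was given.
    public static (int? Skip, int? Limit) Read(IQueryable query)
    {
        int? skip = null, limit = null;
        var node = query.Expression;

        // Each Queryable operator wraps the previous one as its first argument.
        while (node is MethodCallExpression call && call.Method.DeclaringType == typeof(Queryable))
        {
            var count = call.Arguments.Count > 1 ? call.Arguments[1] as ConstantExpression : null;
            if (count != null && count.Value is int n)
            {
                if (call.Method.Name == "Skip") skip = n;
                if (call.Method.Name == "Take") limit = n;
            }
            node = call.Arguments[0];
        }
        return (skip, limit);
    }
}

public static class Program
{
    public static void Main()
    {
        var source = new[] { "a", "b", "c" }.AsQueryable();
        var paging = PagingReader.Read(source.Skip(20).Take(10));
        Console.WriteLine(paging.Skip + " / " + paging.Limit);
    }
}
```

The sketch ignores the order in which Skip and Take were applied, which matters in Linq (Take followed by Skip is not the same as Skip followed by Take), so a real translator would need to account for that.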

What's missing?

There is a lot in Linq that will likely never be supported, since MongoDB is not a relational DB. That means joins, sub-queries, etc. will not be covered by the provider. Anything that does map to MongoDB's capabilities, though, will be added over time. The low-hanging fruit are Count() and order by, with group by following thereafter.

Surprisingly, || (or conditionals) are not going to happen as fast, since, aside from or-type queries using the .In syntax, they are not directly supported by MongoDB. In order to perform || queries, the query has to be written as a JavaScript function, which would basically mean that as soon as a single || shows up in the where clause, the query translator would have to rewrite all other conditions in JavaScript as well. So, that's a bit more on the nice-to-have end of the spectrum of priorities.
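
To make the cost of that rewrite concrete, here is a rough sketch of rendering a whole where clause, && conditions included, as the body of a JavaScript function of the kind MongoDB accepts in a $where clause. All names (QueryKey, Doc, WhereScript) are hypothetical, only string equality is handled, and keys are assumed to be valid JavaScript identifiers.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper: the operators are markers for the translator and never execute.
public sealed class QueryKey
{
    public static bool operator ==(QueryKey key, string value) => throw new NotSupportedException();
    public static bool operator !=(QueryKey key, string value) => throw new NotSupportedException();
    public override bool Equals(object obj) => ReferenceEquals(this, obj);
    public override int GetHashCode() => 0;
}

// Hypothetical stand-in for the driver's Document plus the Key extension method.
public sealed class Doc
{
    public QueryKey Key(string name) => new QueryKey();
}

public static class WhereScript
{
    // Renders the predicate as a JavaScript function evaluated against each document.
    public static string Build(Expression<Func<Doc, bool>> predicate) =>
        "function() { return " + Render(predicate.Body) + "; }";

    static string Render(Expression node)
    {
        var binary = (BinaryExpression)node;
        switch (binary.NodeType)
        {
            // Once || is present, the && conditions have to be rendered as script too.
            case ExpressionType.AndAlso:
                return "(" + Render(binary.Left) + " && " + Render(binary.Right) + ")";
            case ExpressionType.OrElse:
                return "(" + Render(binary.Left) + " || " + Render(binary.Right) + ")";
            case ExpressionType.Equal:
                return Field(binary) + " == " + Literal(binary);
            case ExpressionType.NotEqual:
                return Field(binary) + " != " + Literal(binary);
            default:
                throw new NotSupportedException(binary.NodeType.ToString());
        }
    }

    // d.Key("name") becomes this.name (assumes the key is a valid identifier).
    static string Field(BinaryExpression binary)
    {
        var call = (MethodCallExpression)binary.Left;
        return "this." + (string)((ConstantExpression)call.Arguments[0]).Value;
    }

    // Quotes the string constant, escaping backslashes and single quotes.
    static string Literal(BinaryExpression binary)
    {
        var text = (string)((ConstantExpression)binary.Right).Value;
        return "'" + text.Replace("\\", "\\\\").Replace("'", "\\'") + "'";
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(WhereScript.Build(d =>
            d.Key("type") == "customer" &&
            (d.Key("status") == "active" || d.Key("status") == "trial")));
    }
}
```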

Ready to go!

I will most likely concentrate on the low hanging fruit and then work on the POCO query layer next, since my goal is to be able to try out MongoDB as an alternative to my NHibernate code.

All that said, the code described above works now and is ready for some test driving. It's currently only in my branch on GitHub, but I hope it will make it into master soon.