fluent interface

Linq2MongoDB: Building a Linq Provider for MongoDB

This weekend has been a hack-a-thon, trying to build a simple Linq provider for MongoDB. I’m using Sam Corder, et al.’s excellent C# MongoDB Driver as the query pipeline, so my provider really is just a translator from Linq syntax to Mongo Document Query syntax. I call it a hack-a-thon because it’s my first Linq provider attempt and, boy, is that query translator state machine ugly already. However, I am covering every bit of syntax with tests, so that once I understand it all better, I can rewrite the translator in a cleaner fashion.

My goal for this provider is to replace a document storage layer I’ve built for a new notify.me project using NHibernate against MySQL. This is in no way a judgment against NHibernate. It just happens that for this project, my schema is a heavily denormalized JSON document database. While Fluent NHibernate made it a breeze to map it into MySQL, it’s really an abuse of an RDBMS. It was a case of prototyping with what you know, but now it’s time to evaluate whether a document database is the way to go.

Replacing existing NHibernate code does mean that, eventually, I want the provider to work with POCO entities and use a fully strongly-typed query syntax. But that layer will be built on top of the string-key based version I’m building right now. The string-key based version will be the primary layer, so that you never lose any of the schema-less flexibility of MongoDB, unless you choose to.

Basic MongoDB queries

So, lacking an entity with named properties to map against, what does the syntax look like right now? First thing we need is an IQueryable<Document> which is created like this:

var mongo = new Mongo();
var queryable = mongo["db"]["collection"].AsQueryable();

Given the queryable, the queries can be built using the Document indexer like this:

var q = from d in queryable where (string)d["foo"] == "bar" select d;

The Document returns an object, which means a cast is unfortunately required on one side of the conditional. Alternatively, Equals, either the static or instance version, also works, alleviating the need for a cast:

var q = from d in queryable where Equals(d["foo"], "bar") select d;
// OR
var q = from d in queryable where d["foo"].Equals("bar") select d;

Better, but it’s not as nice as operator syntax would be, if we could get rid of the casts.

As it turns out, there are a number of query operators in MongoDB that don’t have an equivalent syntax in Linq, so a helper class to generate query expressions was already needed. The helper is instantiated via the Document extension method .Key(key), giving us the opportunity to overload operators for the various types recognized by MongoDB’s BSON. This allows for the following conditional syntax:

var q = from d in queryable
        where d.Key("type") == "customer" &&
              d.Key("created") >= DateTime.Parse("2009/09/27") &&
              d.Key("status") != "inactive"
        select d;
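
To make the operator trick a bit more concrete, here is a minimal sketch of how such a helper could expose overloaded operators to an expression tree. QueryKey, the empty Document stand-in and OperatorDemo are hypothetical names made up for this illustration, not the types in the driver or in my branch, and the sketch uses current C# syntax. The point it shows is that the operator bodies never need to run: the translator only has to recognize the operator method recorded in the expression tree.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for the driver's Document type.
public class Document { }

// Hypothetical helper returned by Key(); the real helper may look different.
public sealed class QueryKey
{
    public string Name { get; }
    public QueryKey(string name) { Name = name; }

    // These bodies are never executed when the predicate is captured as an
    // expression tree; a translator inspects the tree instead.
    public static bool operator ==(QueryKey key, string value) => throw new NotSupportedException();
    public static bool operator !=(QueryKey key, string value) => throw new NotSupportedException();
    public static bool operator >=(QueryKey key, DateTime value) => throw new NotSupportedException();
    public static bool operator <=(QueryKey key, DateTime value) => throw new NotSupportedException();

    // Overridden only to keep the compiler quiet about the == / != overloads.
    public override bool Equals(object obj) => ReferenceEquals(this, obj);
    public override int GetHashCode() => Name.GetHashCode();
}

public static class DocumentExtensions
{
    public static QueryKey Key(this Document d, string key) => new QueryKey(key);
}

public static class OperatorDemo
{
    // Captures a predicate as an expression tree and reports which operator
    // method the compiler bound, which is what a translator would switch on.
    public static string Describe()
    {
        Expression<Func<Document, bool>> predicate = d => d.Key("type") == "customer";
        var binary = (BinaryExpression)predicate.Body;
        return binary.NodeType + " via " + binary.Method.Name;
    }
}
```

Calling OperatorDemo.Describe() reports the node type and the bound operator method, which is all a translator needs in order to emit the matching MongoDB condition.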

IN and NOT IN

In addition to normal conditional operators, the query expression helper class also defines IN and NOT IN syntax:

var isIn = from d in queryable where d.Key("foo").In("bar", "baz") select d;

var notIn = from d in queryable where d.Key("foo").NotIn("bar", "baz") select d;

The helper will be the point of extension to support more of MongoDB’s syntax, so that most query definitions will use the d.Key(key) syntax.
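
For reference, MongoDB expresses these two conditions with its $in and $nin operators, nested under the key being tested. The sketch below builds that shape with plain dictionaries; InQuerySketch is a hypothetical name for this illustration and says nothing about how the provider or the driver represent query documents internally.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of the query document shape, not provider code.
public static class InQuerySketch
{
    // Builds { key: { "$in": [values...] } }
    public static Dictionary<string, object> In(string key, params object[] values)
        => Wrap(key, "$in", values);

    // Builds { key: { "$nin": [values...] } }
    public static Dictionary<string, object> NotIn(string key, params object[] values)
        => Wrap(key, "$nin", values);

    private static Dictionary<string, object> Wrap(string key, string op, object[] values)
    {
        var condition = new Dictionary<string, object> { { op, values } };
        return new Dictionary<string, object> { { key, condition } };
    }
}
```

So InQuerySketch.In("foo", "bar", "baz") corresponds to the MongoDB query { foo: { $in: ["bar", "baz"] } }.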

findOne, limit and skip

Linq has matching counterparts to MongoDB’s findOne(), limit() and skip() in First or FirstOrDefault, Take and Skip respectively, and the current version of the Linq provider already supports them.
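
As a quick illustration of that mapping, here are the Linq operators in question run against an in-memory source. PagingSketch and its string data are assumptions for this example only; the in-memory IQueryable is a stand-in for the provider’s IQueryable<Document> so that the snippet runs without a database.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Stand-in for the provider's queryable; in-memory so no database is needed.
    public static IQueryable<string> Source()
        => new List<string> { "a", "b", "c", "d", "e" }.AsQueryable();

    // Skip(n).Take(m) plays the role of skip(n) and limit(m) in MongoDB.
    public static List<string> Page(int skip, int take)
        => Source().Skip(skip).Take(take).ToList();

    // FirstOrDefault plays the role of findOne(): one item, or null if none match.
    public static string FirstMatch(string value)
        => Source().Where(s => s == value).FirstOrDefault();
}
```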

What’s missing?

There is a lot in Linq that will likely never be supported, since MongoDB is not a relational DB. That means joins, sub-queries, etc. will not be covered by the provider. Anything that does map to MongoDB’s capabilities, though, will be added over time. The low hanging fruit are Count() and order by, with group by following thereafter.

Surprisingly, || (or conditionals) are not going to happen as fast, since aside from or-type queries using the .In syntax, they are not directly supported by MongoDB. In order to perform || queries, the query has to be written as a JavaScript function, which would basically mean that as soon as a single || shows up in the where clause, the query translator would have to rewrite all other conditions in JavaScript as well. So, that’s a bit more on the nice-to-have end of the spectrum of priorities.
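
To show why one || drags everything else along, here is a rough sketch of what the JavaScript fallback could look like. MongoDB’s $where clause takes a JavaScript function in which this refers to the document being tested, so every condition has to be rendered as a JavaScript expression. WhereClauseSketch and its methods are hypothetical names for this illustration, and the sketch deliberately ignores string escaping and non-string values.

```csharp
// Hypothetical illustration only: no escaping, string values only.
public static class WhereClauseSketch
{
    // Renders an equality test against a field of the current document.
    public static string Eq(string key, string value)
        => "this." + key + " == '" + value + "'";

    // Combines already-rendered JavaScript conditions.
    public static string Or(params string[] conditions)
        => "(" + string.Join(" || ", conditions) + ")";

    public static string And(params string[] conditions)
        => "(" + string.Join(" && ", conditions) + ")";

    // Wraps the combined expression in the function form used by $where.
    public static string ToFunction(string body)
        => "function() { return " + body + "; }";
}
```

Once the or branch exists only as a JavaScript string, any condition combined with it has to be rendered the same way, which is the rewrite described above.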

Ready to go!

I will most likely concentrate on the low hanging fruit and then work on the POCO query layer next, since my goal is to be able to try out MongoDB as an alternative to my NHibernate code.

All that said, the code described above works now and is ready for some test driving. It’s currently only in my branch on GitHub, but I hope it will make it into master soon.

Stupid ExtensionMethod tricks

I have yet to decide whether Extension Methods in C# are a boon or bane. I’ve already been frustrated several times by Intellisense not showing me a method that was legal somewhere else, until I could figure out which using statement brought that extension method into scope. On one hand, Extension Methods can be used to simplify code; on the other, I see them as the source of much confusion as they become popular.

Worse yet, they have potential for more abuse than the blink tag, imho.

The one place I see extension methods being instrumental is in defining fluent interfaces, yet another practice I have yet to decide whether I am in favor of or not. Partially, because I don’t see them as intrinsically easier to read. Partially because they allow for much syntactic abuse.

So today, I created a fluent interface for an operation that I wish was just supported in the language in the first place: the between operator. It exists in some SQL dialects and is a natural part of so many math equations. I wish I could just write:

if( 0.5 < x < 1.0 )
{
  // do something
}

Instead, I’ll settle for this:

if( x.Between(0.5).And(1.0) )
{
  // do something
}

The first part is easy: it’s just an Extension Method on double. And if I just had it take the lower and upper bound, then we would have been done. But this is where the fluent interface bug bites me and I want to say And. This means that Between can’t return a boolean. It needs to return the result of the lower bound test and the value to be tested. That means that Between returns a helper class, which has one method, And, which finally returns the boolean value.

public static class DoubleExtensions
{

  public static BetweenHelper Between(this double v, double lower)
  {
    return new BetweenHelper(v > lower, v);
  }

  public struct BetweenHelper
  {
    public bool passedLower;
    public double v;

    internal BetweenHelper(bool passedLower, double v)
    {
      this.passedLower = passedLower;
      this.v = v;
    }

    public bool And(double upper)
    {
      // True only if the lower-bound test passed and v is below the upper bound.
      return passedLower && v < upper;
    }
  }
}

That’s a lot of code for a simple operation, and it’s still questionable whether it really improves readability. But it is a common enough operation if you have a lot of bounds checking that it might be worth throwing into a common code DLL. I’ve yet to make up my mind; I mostly did it because I wanted to play with the syntax.
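
For comparison, here is the non-fluent version mentioned earlier, where Between simply takes both bounds. DoubleRangeExtensions is a name picked for this sketch so it does not clash with the class above; the bounds are exclusive, matching the v > lower and v < upper tests in the fluent version.

```csharp
public static class DoubleRangeExtensions
{
    // Exclusive on both ends: true only when lower < v < upper.
    public static bool Between(this double v, double lower, double upper)
        => v > lower && v < upper;
}
```

With this, the check reads x.Between(0.5, 1.0), which loses the And but needs no helper type.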