Skip to content

db40 indexing and query performance

Indexing on db4o is a bit non-transparent, imho. There's barely a blurp in their Documentation app and it just tells you how to create an index and how to remove it. But you can't easily inspect that one exists, or whether it's being used. So i spent a good bit of time today trying to figure out why my queries were so slow, was an index created and if so, was it being used? The final answer is, if querying is slow in db4o, you're not using an index, because, OMG, you'll know when you do an indexed query.

Index basics

Given an object such as

public class Foo
{
  public string Bar;
}

you create an index, globally (meh) for that object on all databases you create thereafter, with this call:

Db4oFactory.Configure().ObjectClass(typeof(Foo)).ObjectField("Bar").Indexed(true);

So far, straight forward enough. But let's say you're using a property? Well, db4o does its magic by inspecting your underlying storage fields, so you have to index them, not the properties that expose them. That means if our object was supposed to have a readonly property Bar, like this:

public class Foo
{
  private string bar;
  public Foo(string bar)
  {
    this.bar = bar;
  }
  public string Bar { get { return bar; } }
}

then the field you need to index is actually the private member bar:

Db4oFactory.Configure().ObjectClass(typeof(Foo)).ObjectField("bar").Indexed(true);

Given this idiosyncrasy, the obvious question is "what about automatic properties?". Well, as of right now the answer is, no such luck, because you'd have to reflect the underlying storage field that is created and index it, and you don't get any guarantees that field is named the same from compiler to compiler or version to version. That probably also means, that automatic properties are dangerous all around, because you may never get your data back if the storage changes, although on that conclusion i'm just speculating wildly.

Query performance

Index in hand, I decided to populate a DB, always checking if the existing item already existed, using a db4o native query. That started at 1 ms query time and then linearly increased with every item added. That sure didn't seem like an indexed search to me. I finally discovered a useful resource on the db4o site, but unfortunately it's behind a login, so google didn't help me find it and my link to it will only take you to the login. That's a shame because this bit of information ought to be somewhere in big bold letters!

You must have the following DLLs available for Native Queries to be optimized into SODA queries, which apparently is the format that hits the index:

  • Db4obects.Db4o.Instrumentation.dll
  • Db4objects.Db4o.NativeQueries.dll
  • Mono.Cecil.dll
  • Cecil.FlowAnalysis.dll

The query will execute fine, regardless of their presence, but the performance difference between the optimized, index using query and the unoptimized native query is orders of magnitude. My queries went from 100-500ms to 0.01ms, just by dropping those DLLs into my executable directory. Yeah, that's a useful change.

Interestingly enough, the same is not required for linq queries. They seem to hit the index without the extra help (although just to even run, Mono.Cecil and Cecil.FlowAnalysis need to be present, so here you at least get an error). There currently appears to be about 1ms overhead for parsing linq into SODA, but i'll take that hit for the syntactic sugar.

Conclusions

I'm pretty happy with simplicity and performance of db4o so far. It seems like an ideal local, queryable persistence layer. The way it works does want to make me abstract my data model into simple data objects that are then converted into business entities. I'd rather have the attribute based markup of ActiveRecord, but that's not a deal breaker.