ILoggable

A place to keep my thoughts on programming

geekblog [at] claassen [dot] net

Thursday, January 21, 2010

Duckpond: Lightweight duck-typing for C#

Edit: Changed As to AsImplementationOf since it's an extension method on object and too likely to collide.

A while back I was talking about Interface Segregation and proposed using either LinFu's DynamicObject or delegate injection. I played a bit more with delegate injection, but in practical use it turned out to be rather ugly and did not really improve readability.

So, I've come back to wanting to cast an object to an interface regardless of what interfaces that object implements. I wanted this to be as simple and lightweight as possible, so rather than using a dynamic proxy framework, I simply rolled my own IL and wrote a pure proxy that does nothing but call the identical method on the class it wraps.

Introducing DuckPond

DuckPond is a very simple and focused library. It currently adds only a single extension method: object.AsImplementationOf<Interface>

Given a class Duck that implements a number of methods, including Quack:

public class Duck {
  public void Quack(double decibels) {
    ...
  }
  ... various other methods ...
}
we can easily cast Duck to a more limited interface that the class doesn't implement such as:
public interface IQuacker {
  void Quack(double decibels);
}
using the object.AsImplementationOf<T> extension method:
using Droog.DuckPond;

...

var quacker = new Duck().AsImplementationOf<IQuacker>();
That's all there is to it.

But is it fast?

Honestly, I don't know yet. I have not benchmarked the generated classes against virtual method dispatch or LinFu's and Castle's dynamic proxies. I assume it is, since unlike a dynamic proxy, DuckPond doesn't use an interceptor. Instead it emits Intermediate Language for each method in the interface, dispatching the call to the wrapped instance's counterpart.
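To make the "pure proxy" idea concrete, here is a hand-written sketch of what an emitted class would presumably be equivalent to for the Duck/IQuacker pair. The class name DuckAsQuackerProxy, the LastDecibels field and the Waddle method are made up for illustration; DuckPond generates its class at runtime via IL and its actual output may differ.

```csharp
using System;

public class Duck {
    public double LastDecibels;                        // illustration only
    public void Quack(double decibels) { LastDecibels = decibels; }
    public void Waddle() { }                           // one of the "various other methods"
}

public interface IQuacker {
    void Quack(double decibels);
}

// Hypothetical hand-written stand-in for the runtime-generated proxy:
// it implements the interface and forwards each call to the wrapped instance.
public sealed class DuckAsQuackerProxy : IQuacker {
    private readonly Duck _wrapped;

    public DuckAsQuackerProxy(Duck wrapped) { _wrapped = wrapped; }

    // No interceptor, no reflection: just a direct call on the wrapped object.
    public void Quack(double decibels) { _wrapped.Quack(decibels); }
}
```

A caller holding an IQuacker only sees Quack, while the call still lands on the original Duck instance.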

Try it, fork it, let me know what you think

The code is available now at GitHub: http://github.com/sdether/duckpond

Sunday, September 27, 2009

Linq2MongoDB: Building a Linq Provider for MongoDB

This weekend has been a hack-a-thon, trying to build a simple Linq provider for MongoDB. I'm using Sam Corder, et al.'s excellent C# MongoDB Driver as the query pipeline, so my provider really is just a translator from Linq syntax to Mongo Document Query syntax. I call it a hack-a-thon, because it's my first Linq provider attempt and, boy, is that query translator state machine ugly already. However, I am covering every bit of syntax with tests, so that once I understand it all better, I can rewrite the translator in a cleaner fashion.

My goal for this provider is to replace a document storage layer I've built for a new notify.me project using NHibernate against mysql. This is in no way a judgment against NHibernate. It just happens that for this project, my schema is a heavily denormalized json document database. While fluent NHibernate made it a breeze to let me map it into mysql, it's really an abuse of an RDBMS. It was a case of prototyping with what you know, but now it's time to evaluate whether a document database is the way to go.

Replacing existing NHibernate code does mean that, eventually, I want the provider to work with POCO entities and use a fully strong-typed query syntax. But that layer will be built on top of the string-key based version I'm building right now. The string-key based version will be the primary layer, so that you never lose any of the schema-less flexibility of MongoDB, unless you choose to.

Basic MongoDB queries

So, lacking an entity with named properties to map against, what does the syntax look like right now? First thing we need is an IQueryable<Document> which is created like this:

var mongo = new Mongo();
var queryable = mongo["db"]["collection"].AsQueryable();
Given the queryable, the queries can be built using the Document indexer like this:
var q = from d in queryable where (string)d["foo"] == "bar" select d;
The Document returns an object, which means a cast is unfortunately required on one side of the conditional. Alternatively, Equals, either the static or instance version, also works, alleviating the need for a cast:
var q = from d in queryable where Equals(d["foo"], "bar") select d;
// OR
var q = from d in queryable where d["foo"].Equals("bar") select d;
Better, but it's not as nice as operator syntax would be, if we could get rid of the casts..

As it turns out there are a number of query operators in MongoDB that don't have an equivalent syntax in Linq, so a helper class to generate query expressions was already needed. The helper is instantiated via the Document extension method .Key(key), giving us the opportunity to overload operators for the various types recognized by MongoDB's BSON. This allows for the following conditional syntax:

var q = from d in queryable 
        where d.Key("type") == "customer" &&
              d.Key("created") >= DateTime.Parse("2009/09/27") &&
              d.Key("status") != "inactive"
        select d;

IN and NOT IN

In addition to normal conditional operators, the query expression helper class also defines IN and NOT IN syntax:

var inQuery = from d in queryable where d.Key("foo").In("bar", "baz") select d;

var notIn = from d in queryable where d.Key("foo").NotIn("bar", "baz") select d;
The helper will be the point of extension to support more of MongoDB's syntax, so that most query definitions will use the d.Key(key) syntax.

findOne, limit and skip

Linq has matching counterparts of MongoDB's findOne(), limit() and skip(), in First or FirstOrDefault, Take and Skip respectively, and the current version of the Linq provider already supports them.
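As a rough illustration of how those operators read, the following sketch runs them against an in-memory IQueryable<int> as a stand-in. It does not use the MongoDB provider; the class name PagingSketch and the data are made up, and only the standard System.Linq operators are assumed.

```csharp
using System.Linq;

public static class PagingSketch {
    // Stand-in data source; a real provider would translate these calls for MongoDB.
    public static IQueryable<int> Source() {
        return Enumerable.Range(1, 10).AsQueryable();
    }

    // Comparable in spirit to findOne(): first element matching the condition.
    public static int FindOne() {
        return Source().First(x => x > 3);
    }

    // Comparable in spirit to skip(4).limit(3): skip four elements, take the next three.
    public static int[] Page() {
        return Source().Skip(4).Take(3).ToArray();
    }
}
```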

What's missing?

There is a lot in Linq that will likely never be supported, since MongoDB is not a relational DB. That means joins, sub-queries, etc. will not be covered by the provider. Anything that does map to MongoDB's capabilities, though, will be added over time. The low hanging fruit are Count() and order by, with group by following thereafter.

Surprisingly, || (or conditionals) are not going to happen as fast, since aside from or type queries using the .In syntax, it is not directly supported by MongoDB. In order to perform || queries, the query has to be written as a javascript function, which would basically mean that as soon as a single || shows up in the where clause the query translator would have to rewrite all other conditions in javascript as well. So, that's a bit more on the nice to have end of the spectrum of priorities.

Ready to go!

I will most likely concentrate on the low hanging fruit and then work on the POCO query layer next, since my goal is to be able to try out MongoDB as an alternative to my NHibernate code.

All that said, the code described above works now and is ready for some test driving. It's currently only in my branch on github, but I hope it will make it into the master soon.

Thursday, September 24, 2009

About Concurrent Podcast #3: Coroutines

Posted a new episode of the Concurrent Podcast over on the MindTouch developer blog. This time Steve and I delve into Coroutines, a programming pattern we use extensively in MindTouch 2009 and one that I'm also trying out as an alternative to my actor based Xmpp code in Notify.me.

Since there isn't a native coroutine framework in C#, we're using the one provided by MindTouch Dream. It's built on top of the .NET iterator pattern (i.e. IEnumerable and yield) and makes the assumption that all coroutines are asynchronous methods using Dream's Result<T> object for coordinating the producer and consumer of a return value. Steve's previously blogged about Result. Since those posts there have also been a lot of performance and capability improvements to Result committed to trunk, primarily providing robust cancellation with resource cleanup callbacks. For background on coroutines, you can also check out previous posts I've written.

The cool thing about asynchronous coroutines compared to an actor model is that call/response based actions can be written as a single linear block of code, rather than separate message handlers whose contiguous flow can only be determined by examining the message dispatcher. With a message dispatcher that can correlate message responses with suspended coroutines, sending and waiting for a message in a coroutine can be made to look like a method call without blocking the thread, which, especially with message passing concurrency, is vital, since a response isn't in any way guaranteed to happen.

I'm due to write another post on how to use Dream's coroutine framework, but in the meantime I highly recommend checking out Dream from MindTouch's svn. Lots of cool concurrency stuff in there. Trunk is under heavy development, as we work towards Dream profile 2.0, but 1.7.0 is stable and production proven.
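The "single linear block" point can be sketched with nothing but the plain .NET iterator pattern. This is not Dream's coroutine API: there is no Result<T> and no dispatcher here, and the names CoroutineSketch, Handshake and Run are invented for the example. Each yield simply marks a spot where the method suspends until a driver resumes it.

```csharp
using System;
using System.Collections.Generic;

public static class CoroutineSketch {
    // A call/response exchange written as one linear method.
    // Each yield is a suspension point; a real framework would resume
    // the iterator when the awaited response arrives.
    public static IEnumerator<string> Handshake(List<string> log) {
        log.Add("send hello");
        yield return "waiting for reply";
        log.Add("got reply, send ack");
        yield return "waiting for confirmation";
        log.Add("done");
    }

    // Trivial stand-in driver: resumes the coroutine immediately each time
    // and returns how often it suspended.
    public static int Run(IEnumerator<string> coroutine) {
        var suspensions = 0;
        while(coroutine.MoveNext()) {
            suspensions++;
        }
        return suspensions;
    }
}
```

The same exchange in an actor style would typically be split across separate handlers for the reply and the confirmation.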

Tuesday, September 15, 2009

Composing remote and local linq queries

One of the cool things with Linq is that queries are composable, i.e. you can add further query constraints by selecting from an existing query. Nothing is executed until you try to read from the query. This allows IQueryable to compose all the added constraints and transform it into the underlying query structure, most commonly SQL.

However it does come with the pitfall that there are a lot of things legal in Linq expressions that will die at run time. This happens because an expression may not have an equivalent syntax in the transformed language, like calling a method as part of a where clause.

This does not mean that you can't use linq for the portion of the query that is not executable by the provider. As long as you know what expression is affected, you can use query composition to build a query that executes some part remotely and some part against the object model in memory.

Let's suppose we wish to execute the following query:

var q = from e in session.Queryable<Entry>()
    where e.Created > DateTime.Parse("2009/9/1") 
        &&  e.Created < DateTime.Now
        && e.Tags.Contains("foo")
    select e;
But our query provider doesn't understand the extension method that allows us to check the list of Tags. In order for this query to work, that portion must be executed against the result set of the date range query. We could coerce the first portion to a list or array and then query that portion, but that would just force the date query to be materialized before we could prune the set. Instead we want to feed the stream of matching entries into a second query, composing a new query that contains both portions as a single query and won't access the database until we iterate over it.

To accomplish this I created an extension method that coerces the query into a sequence that yields each item as it is returned by the database query:

public static class LinqAdapter {
    public static IEnumerable<T> AsSequence<T>(this IEnumerable<T> enumerable) {
        foreach(var item in enumerable) {
            yield return item;
        }
    }
}
UPDATE: As Scott points out in the comments, my AsSequence just re-implements what is already available in Linq as AsEnumerable. So the above really just serves to explain how AsEnumerable defers execution to enumeration rather than query definition.

Anyway, AsSequence or AsEnumerable allows me to compose the query from server and local expressions like this:

var q = from e in session.Queryable<Entry>()
    where e.Created > DateTime.Parse("2009/9/1") 
        &&  e.Created < DateTime.Now
    select e;
q = from e in q.AsSequence() where e.Tags.Contains("foo") select e;
When q is enumerated, the first expression is converted to SQL and executes against the database. Each item returned from the database is then fed into the second query, which checks its expression and yields the item to the caller, should the expression match. Since q.AsSequence() is used as part of query composition, it does not force the first expression to execute at the time of query definition as q.ToList() would. The additional benefit is that even when q.AsSequence() is executed, it never builds the entire result set in memory as a list to iterate over, but rather just streams each database query result item through its own expression evaluation.
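The deferral can be observed without a database. In this sketch an iterator method stands in for the remote query and counts how many items were pulled from it; the names DeferredDemo, RemoteQuery and Fetched are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo {
    public static int Fetched; // number of items pulled from the "remote" source

    // Stand-in for the database query: yields lazily and records each fetch.
    public static IEnumerable<int> RemoteQuery() {
        for(var i = 1; i <= 5; i++) {
            Fetched++;
            yield return i;
        }
    }

    // Returns { items fetched after composing, first match, items fetched after First() }.
    public static int[] Run() {
        Fetched = 0;
        // Composing the local filter on top of AsEnumerable() enumerates nothing yet.
        var q = RemoteQuery().AsEnumerable().Where(x => x % 2 == 0);
        var fetchedAfterCompose = Fetched;
        // Enumeration streams items one at a time and stops at the first match.
        var first = q.First();
        return new[] { fetchedAfterCompose, first, Fetched };
    }
}
```

Run() yields 0, 2 and 2: nothing is fetched while the query is composed, and only two items are pulled before the first even number is found.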

Of course, this still has the performance implications of sending data across the wire and filtering it locally. However, this is not an uncommon problem when SQL alone cannot provide all the filtering. The benefit of this approach is reduced memory pressure on execution, better control over when execution occurs and the ability to use Linq syntax to do the secondary filtering.

Thursday, September 10, 2009

Concurrent Podcast and Producer/Consumer approaches

As usual, I've been blogging over on the MindTouch Developer blog, and since the topics I post about over there have a pretty strong overlap with what I'd post here, I figured I might as well start cross-posting about it here.

Aside from various technical posts, Steve Bjork and I have started recording a Podcast about concurrent programming. It's currently 2 episodes strong, with a third one coming soon. Information on past and future posts can always be found here.

Today's post on the MindTouch dev blog is about the producer/consumer pattern and how I moved from using dedicated workers with a blocking queue to using Dream's new ElasticThreadPool to dispatch work.

Tuesday, September 08, 2009

Writing a Tag Algebra compiler with Coco/R

This week, I was digging back into the Coco/R implemented parser of DekiScript, tracking down a bug, which turned out to not be in the parsing bits at all. It did, however, get me familiarized with Coco/R again. So I thought I'd give myself an hour to implement the parser for my Tag related boolean algebra with Coco/R. If I could pull it off, I could forget about the regex/state-machine approach I was considering.

Took me about 15 minutes to set up the classes to represent the intermediate AST and another 30 minutes for the grammar in Coco/R ATG format. After that I wrote a couple of unit tests to check that the parsing was right, only to realize that while AND and OR are left-to-right associative, NOT is right-to-left associative. Figuring out how to adjust the grammar for that actually took me another 10-15 minutes. But overall, I hit the one hour goal.

The Syntax Tree

Before tackling the grammar to parse, I needed to define data structures to represent the parsed syntax tree, which I'd later convert to executable code. The syntax is fairly simple:

(foo+bar)|(^foo+baz)
This can be represented by just 4 tree node types (with common parent TagExpression): AndExpression, OrExpression, NotExpression and Tag. The parentheses are just tree structure artifacts, so are not represented in the AST.
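A minimal sketch of such node types follows. Only the type names and constructor shapes are taken from the grammar actions further down; the fields, the ToString rendering and the Wrap helper are assumptions made for illustration, and TagExpression.Empty is left out.

```csharp
using System;

public abstract class TagExpression {
    // Hypothetical helper: parenthesize binary sub-expressions when rendering.
    protected static string Wrap(TagExpression e) {
        return (e is Tag || e is NotExpression) ? e.ToString() : "(" + e + ")";
    }
}

public sealed class Tag : TagExpression {
    public readonly string Name;
    public Tag(string name) { Name = name; }
    public override string ToString() { return Name; }
}

public sealed class NotExpression : TagExpression {
    public readonly TagExpression Inner;
    public NotExpression(TagExpression inner) { Inner = inner; }
    public override string ToString() { return "^" + Wrap(Inner); }
}

public sealed class AndExpression : TagExpression {
    public readonly TagExpression Left, Right;
    public AndExpression(TagExpression left, TagExpression right) { Left = left; Right = right; }
    public override string ToString() { return Wrap(Left) + "+" + Wrap(Right); }
}

public sealed class OrExpression : TagExpression {
    public readonly TagExpression Left, Right;
    public OrExpression(TagExpression left, TagExpression right) { Left = left; Right = right; }
    public override string ToString() { return Wrap(Left) + "|" + Wrap(Right); }
}
```

Building OrExpression(AndExpression(foo, bar), AndExpression(NotExpression(foo), baz)) by hand and calling ToString() renders the example expression back, with the parentheses reconstructed from the tree shape rather than stored in it.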

The Coco/R grammar

I've broken the grammar up to discuss the various parts, but the below sections represent a single ATG file, the Coco/R grammar format.
COMPILER TagAlgebra

  public TagExpression Result = TagExpression.Empty;
The COMPILER defines the entrypoint PRODUCTION for the parser. The following lines, until the next grammar definition, are inserted into the generated Parser, and can be used to inject class fields, extra methods, etc. into the Parser source. The only thing I inserted was a field to hold the root of the AST and initialize it to empty.
IGNORECASE
This tells Coco/R that our grammar is case insensitive.
CHARACTERS
  tab = '\t'.
  eol = '\n'.
  cr = '\r'.
  nbsp = '\u00a0'. // 0xA0 = 'unbreakable space'
  shy = '\u00ad'.  // 0xAD = 'soft-hyphen'
  letter = 'a'..'z'.
  digit = '0'..'9'.
CHARACTERS defines characters that the parser should recognize for matches.
TOKENS
  tag = 
    letter { letter | digit | "_" | ":" }
    .
The only token in the grammar is tag, which is composed from the characters defined above and extra quoted characters.
IGNORE eol + cr + tab + nbsp + shy
IGNORE tells Coco/R what characters have no meaning in parsing the input.

Next come the PRODUCTIONS, i.e. the meat of the grammar. These are the rules for matching input and converting it into code. Coco/R is an LL(1) parser generator, i.e. the grammar must be parsable from Left to Right with Leftmost derivations and one look-ahead symbol. We also cannot have a loop in our grammar, i.e. all possible branches have to lead to a terminal via a unique set of production matches.

PRODUCTIONS

  TagAlgebra                      (. Result = TagExpression.Empty; .)
    =
    [ BinaryExpr<out Result> ]
    .
The first production is the entry point which, again, sets the result to an empty AST, since the same instance of the parser can parse multiple expressions. It then specifies 0 or 1 BinaryExpr productions.
  BinaryExpr<out TagExpression expr>  (.  expr = null; TagExpression right = null; .)
    =
    NotExpr<out expr> {               (. bool and = false; .)
      (
        "|"
        | "+"                         (. and = true; .)
      )
      NotExpr<out right>              (. expr = and ? (TagExpression)new AndExpression(expr,right) : new OrExpression(expr,right); .)
    }
    .
BinaryExpr is used for both AND and OR expressions, since both take two sub-expressions and combine them with a single operator. The production specifies a left side of a NotExpr followed by zero or more repetitions of an operator and another NotExpr, each repetition wrapping the expression built so far. I.e. should our first match happen to not be a BinaryExpr, the parser can fall through to NotExpr and return its result instead, without matching the optional production body.
  NotExpr<out TagExpression expr> (. expr = null; int notCount = 0; .)
    =
    { 
      "^"                         (. notCount++; .)
    }
    (
      "(" BinaryExpr<out expr> ")"
      | tag                       (. expr = new Tag(t.val); .)
    )                             (. for(var i=0;i<notCount;i++) expr = new NotExpression(expr); .)
    .

END TagAlgebra.
NotExpr, just like BinaryExpr, optionally matches on its operator. It requires only that the operand is either a BinaryExpr enclosed in parentheses (i.e. not a circular match back into BinaryExpr, since it requires additional surrounding matches) or a tag, the ultimate terminal in the grammar.

There is one tricky bit in this production, i.e. the NOT operator can match multiple times, which means, we need to accumulate the number of operator matches and build the chain of NOT expressions wrapping the current expression once we know how many, if any, matched.
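For readers who don't have Coco/R at hand, the two productions can be approximated by a hand-written recursive descent parser. This is only a sketch and not what Coco/R generates: the class name TagAlgebraSketch is invented, it builds Func delegates over a tag set directly instead of an AST, it skips all whitespace rather than the exact IGNORE set, and it treats an empty expression as "never matches", which is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;

public class TagAlgebraSketch {
    private readonly string _input;
    private int _pos;

    private TagAlgebraSketch(string input) { _input = input; }

    // Entry point, mirrors the TagAlgebra production: [ BinaryExpr ].
    // Input is lower-cased (IGNORECASE), so the tag set should be lower-case too.
    public static Func<ISet<string>, bool> Parse(string input) {
        var parser = new TagAlgebraSketch(input.ToLowerInvariant());
        if(parser.Peek() == '\0') return tags => false; // empty input: assumed behavior
        var result = parser.BinaryExpr();
        if(parser.Peek() != '\0') throw new FormatException("unexpected '" + parser.Peek() + "'");
        return result;
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char Peek() {
        while(_pos < _input.Length && char.IsWhiteSpace(_input[_pos])) _pos++;
        return _pos < _input.Length ? _input[_pos] : '\0';
    }

    // BinaryExpr = NotExpr { ("|" | "+") NotExpr }, combined left to right.
    private Func<ISet<string>, bool> BinaryExpr() {
        var expr = NotExpr();
        while(Peek() == '|' || Peek() == '+') {
            var isAnd = _input[_pos++] == '+';
            var left = expr;
            var right = NotExpr();
            expr = isAnd
                ? (Func<ISet<string>, bool>)(tags => left(tags) && right(tags))
                : tags => left(tags) || right(tags);
        }
        return expr;
    }

    // NotExpr = { "^" } ( "(" BinaryExpr ")" | tag ): count the NOTs first,
    // then wrap the operand once per NOT after it has been parsed.
    private Func<ISet<string>, bool> NotExpr() {
        var notCount = 0;
        while(Peek() == '^') { _pos++; notCount++; }
        Func<ISet<string>, bool> expr;
        if(Peek() == '(') {
            _pos++;
            expr = BinaryExpr();
            if(Peek() != ')') throw new FormatException("expected ')'");
            _pos++;
        } else {
            expr = Tag();
        }
        for(var i = 0; i < notCount; i++) {
            var inner = expr;
            expr = tags => !inner(tags);
        }
        return expr;
    }

    // tag = letter { letter | digit | "_" | ":" }
    private Func<ISet<string>, bool> Tag() {
        var c = Peek();
        if(c < 'a' || c > 'z') throw new FormatException("expected a tag");
        var start = _pos;
        while(_pos < _input.Length && IsTagChar(_input[_pos])) _pos++;
        var name = _input.Substring(start, _pos - start);
        return tags => tags.Contains(name);
    }

    private static bool IsTagChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_' || c == ':';
    }
}
```

Note that, as in the grammar, + and | share one precedence level and combine left to right, so foo|bar+baz reads as (foo|bar)+baz.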

What to do with the AST

The nice thing with Coco/R is that it adds no runtime dependency at all, building a fully self-contained Scanner and Parser. With these built, it is now possible to take Tag Algebra expressions and turn them into an executable tree of Func calls, as described in "Boolean Algebra for Tag queries".

The grammar could have been written to accumulate the unique tags and construct the Func tree right away, but the two benefits of going to an AST first are that a) the AST can easily be rendered back into text form (even placing parentheses properly for expressions that previously had no parentheses), and b) the AST can easily be programmatically composed with other expressions, or decomposed into sub-expressions, which can be used for caching and other efficiency operations.

I'll probably play with Irony next, but the "no runtime dependency" and existing familiarity made Coco/R the winner this time.

Tuesday, September 01, 2009

Boolean Algebra for Tag queries

Currently working on a tag query engine and couldn't find anything all that useful in the published approaches. I want to do arbitrary boolean algebras against a set of tags in a database, which seems to be out of scope of SQL approaches. All the various tagging schemas out there reduce to either just AND or just OR queries, but not complex logic. However, I want to be able to do something like:
(foo+bar)|(foo+baz)|(bar+^baz)
If there is a way to do this with SQL, I'd love to know. But the way I look at it, I really have to fetch all tags for each item and then apply that formula to the list of tags on the item.

But let's break down the matching problem itself into something I can execute. Let's assume I've got a simple parser that can turn the above into an AST. Really, I can decompose any variation into three operations, AND(a,b), OR(a,b) and NOT(a). And I can represent those with some simple Func<> definitions:

Func<bool, bool, bool> AND = (a, b) => a && b;
Func<bool, bool, bool> OR = (a, b) => a || b;
Func<bool, bool> NOT = (a) => !a;
Assuming that I have boolean tokens for foo, bar and baz, the expression becomes:

OR(AND(foo, bar), OR(AND(foo, baz), AND(bar,NOT(baz))))

Now, the above expression can be expressed as a function that takes three booleans describing the presence of the mentioned tags, ignoring any other tags that the item has, returning a boolean indicating a successful match. In C# that expression would look like this:

Func<bool[], bool> f = x => OR(AND(x[0], x[1]), OR(AND(x[0], x[2]), AND(x[1],NOT(x[2]))));

Next we need to generate this boolean map from the list of tags on the item. Assuming that the tag list is a list of strings, we can define an extension method on IEnumerable<string> to generate the boolean map like this:

public static bool[] GetBinaryMap(this IEnumerable<string> tags, string[] mask) {
    var map = new bool[mask.Length];
    foreach(var x in tags) {
        for(var i = 0; i < mask.Length; i++) {
            if(x == mask[i]) {
                map[i] = true;
            }
        }
    }
    return map;
}
And with this we can define a linq query that will return us all matching items:
var mask = new[] { "foo", "bar", "baz"};
Func<bool[], bool> f = x => OR(AND(x[0], x[1]), OR(AND(x[0], x[2]), AND(x[1],NOT(x[2]))));
var match = from item in items
            where f(item.Tags.GetBinaryMap(mask))
            select item;

Clearly this isn't the fastest executing query, since we first had to create our items, each of which has a collection of tags. But there are a lot of optimizations left on the table here, such as using our tag mask to pre-qualify items, breaking down the AST into sub-matches that could be used against a cache to find items, etc.
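One way the mask-based pre-qualification could look is sketched below. The names TagPrefilter, SharesTag and CanPrefilter are invented here. The one subtlety is NOT: a formula such as ^foo matches items that carry none of the masked tags, so skipping those items is only safe when the formula evaluates to false for the all-false map.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagPrefilter {
    // True if the item carries at least one tag that appears in the mask.
    public static bool SharesTag(IEnumerable<string> tags, string[] mask) {
        return tags.Any(t => Array.IndexOf(mask, t) >= 0);
    }

    // Pre-qualification is only safe if an item with none of the masked tags
    // can never match, i.e. the formula is false for the all-false map.
    public static bool CanPrefilter(Func<bool[], bool> f, int maskLength) {
        return !f(new bool[maskLength]); // new bool[n] is initialized to all false
    }
}
```

For the example formula the all-false map evaluates to false, so items sharing no tag with the mask could be dropped before building their binary map.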

But at least we have a fairly simple way to take complex boolean algebra on tags and convert it into something that we can evaluate generically.

Friday, June 19, 2009

Ultimate Interface Segregation: Dependency injection by Delegate

I've been on a bit of a tear about declaring dependency contracts and injecting only what is required. While examining the use of Interfaces in IoC and their shortcomings, I decided that taken to the extreme, dependencies come down to call dependencies, which could be modeled with delegates rather than interfaces. Instead of writing a novel, as I've been prone to, I thought I'd do a shorter post on my approach to this solution, and expand on the implementation in later posts.

To recap, in the SOLID principles, the Interface Segregation Principle states: Clients should not be forced to depend upon interfaces that they do not use. This means that interfaces should be fine-grained enough to expose no more than one responsibility. Taken to the extreme, this could be taken to mean that each interface only has a single method. There are valid SRP scenarios where a responsibility is modeled by more than one call, but let's start with the simplest scenario first, then see how well it applies to more complex responsibilities later.

In C# we have delegates, which describe a single method call. A delegate instance is a reference to a method that encapsulates a specific instance of a class, without exposing the underlying class (unless your delegate is a static method). A delegate can even be used to expose internal, protected and private methods.
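As a small illustration of that last point, the sketch below hands out a delegate to a private method. The class Counter and its members are invented for the example.

```csharp
using System;

public class Counter {
    private int _count;

    // Private: callers cannot invoke this directly.
    private void Increment() { _count++; }

    public int Count { get { return _count; } }

    // Hands out a reference to the private method, bound to this instance,
    // without exposing the Counter type's other members to the receiver.
    public Action GetIncrementer() { return Increment; }
}
```

Code that receives only the Action can trigger Increment on that specific instance, but knows nothing else about Counter.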

Instead of declaring a list of interfaces that the IoC container should inject, classes would define their dependencies as delegates. Taking the example from my duck typing post, we would get the following dependency declarations.

First, we have the same service provider, MessageQueue, which still doesn't need to implement an interface:

public class MessageQueue
{
 public void Enqueue(string recipient, string message) { ... }
 public string TryDequeue(string recipient) { ... }
}
Next, we have the new Producer, now declaring its dependency as a delegate:
public class Producer : IProducer
{
 public delegate void EnqueueDelegate(string recipient, string message);
 public Producer(EnqueueDelegate dispatcher) { ... }
}
And finally, we have the new Consumer, also declaring a delegate for construction time injection:
public class Consumer : IConsumer
{
 public delegate string TryDequeueDelegate(string recipient);
 public Consumer(TryDequeueDelegate inbox) { ... }
}

Think of the delegate as your Method Interface. You could define your dependencies as Func's and Action's, but that would obfuscate your dependencies beyond recognition in most scenarios. By using an explicit delegate, you get to attach the dependency to the class that has the dependency, in addition to having a descriptive signature.

Now, if we were to wire this up manually we'd get something like this:

var queue = new MessageQueue();
IProducer producer = new Producer(queue.Enqueue); 
IConsumer consumer = new Consumer(queue.TryDequeue);
That's simple enough, but not really very scalable, once you get a lot of dependencies to wire up. What we really need is an IoC container that lets us register delegates against classes, instead of having to have instances at dependency declaration time. Delegates can't be cast from one to another and are not, strictly speaking, types, which poses some challenges with creating a type-safe registration interface. There are a number of ways to accomplish this syntax, which I will elaborate on in my next post.

Wednesday, June 17, 2009

C# duck-typing to the rescue for Interface Segregation

Interfaces are the method by which we get multiple inheritance in C#, Java, etc. They are contracts without implementation. We don't get the messy resolution of which code to use from multiple base classes, because there's only one inheritance chain that includes code.

They're useful to let us provide one contract and multiple implementations or simply describe a contract for our code and allow someone else to come along and replace our implementation entirely.

In practice, though, I almost always use them purely for decoupling the contract from the code, so that I can replace the implementation with a mock version in unit tests. Hmm.. So, I use interfaces just to get around the yoke of the type system? Wait, why am I so in love with statically typed languages again?

Right... While the above sounds like the interface exists just so that I can mock my implementation, the real purpose of the interface is the ability for the consuming code to express a contract for its dependencies. It's unfortunate that interface implementation forces them to be attached at the implementation side, which is why I say that Interface attachment is upside down. And it's deep rooted in the language, after all it's called Interface Inheritance not Dependency Contract Definition.

Dynamic Proxy

So, it's not surprising that there is no CLR-level way around this limitation. Fortunately, you can always create a dynamic proxy that wraps your class and implements the Interface. Both Castle's DynamicProxy and LinFu's DynamicProxy are excellent frameworks for writing your own proxies. I've never tested them against each other, but have used both in production and neither showed up as culprits when time for profiling came about.

With a dynamic proxy, you can generate an object that claims to implement an interface but under the hood just has interceptors that provide the call signature to let you respond correctly or proxy the call on to a class you are wrapping. I've previously covered how you can even use them to have a class inherit from more than one base class via a proxy. This is necessary if you want to remote an object, which requires a base of MarshalByRefObject, but you already have a base class.

However, proxies require a fair bit of hand-rolling so they are not the most lightweight way, development time wise, to attach an Interface.

Duck Typing

What would be really useful would be the ability to cast an object to an interface:

IUser user = new User() as IUser;

The above code would even be compile time verifiable, since we can simply see if the User implements the call signatures promised by IUser. This would provide us with strongly typed Duck Typing -- an object that can quack ought to be able to be treated as a duck.

This is where LinFu goes a step further than just DynamicProxy and provides duck typing as well:

IUser user = new DynamicObject(new User()).CreateDuck<IUser>();

DynamicObject's constructor takes an instance of a class to wrap. You can then create a duck from that dynamic object which automatically proxies the given interface and will call the appropriate method on the wrapped class on demand.

Using duck typing to satisfy the Interface Segregation Principle

Saying that you may have a class that has the perfect method signature but doesn't implement an interface you already have, does sound rather contrived. However, forcing a class to implement an interface of your choosing does have some real benefits, aside from being able to abstract an existing class into a mockable dependency:

Clients should not be forced to depend upon interfaces that they do not use
One problem with interfaces is that they tell you everything an implementation can do. And often a class acts as a service that provides functionality to more than one client class, but provides just a single interface. That single interface may expose capabilities that you don't care about.

Instead, interfaces should be fine-grained to only include the methods appropriate to the client. But that's not always feasible. Aside from having a class implement lots of tiny interfaces, the service class does not know about the client's requirements, so it really doesn't know what the interfaces should include. The client, on the other hand, does know and can tailor exactly the right interface it wants as a contract with the dependency.

Suppose we have a message queue for passing data between decoupled classes:

public class MessageQueue
{
 public void Enqueue(string recipient, string message) { ... }
 public string TryDequeue(string recipient) { ... }
}
Proper interface segregation would have us create a Dispatcher interface for our message Producer
public interface IMessageDispatcher
{
 void Enqueue(string recipient, string message);
}

public class Producer : IProducer
{
 public Producer(IMessageDispatcher dispatcher) { ... }
}
and an inbox interface for our message Consumer
public interface IMessageBox
{
 string TryDequeue(string recipient);
}

public class Consumer : IConsumer
{
 public Consumer(IMessageBox inbox) { ... }
}
Assuming that MessageQueue does not implement our interfaces (yes, in this case it would not have been a problem to have the class implement them both, but this is a simplified example with obvious segregation lines), we can now configure our IoC container (example uses AutoFac) to create the appropriately configured IProducer and IConsumer, each receiving exactly those capabilities they should depend on:
var queue = new MessageQueue();
var builder = new ContainerBuilder();
builder.Register<Producer>().As<IProducer>();
builder.Register<Consumer>().As<IConsumer>();
builder.Register(c => new DynamicObject(queue).CreateDuck<IMessageDispatcher>()).As<IMessageDispatcher>();
builder.Register(c => new DynamicObject(queue).CreateDuck<IMessageBox>()).As<IMessageBox>();

using (var container = builder.Build())
{
 var producer = container.Resolve<IProducer>();
 var consumer = container.Resolve<IConsumer>();
}
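
For comparison, this is roughly the class the duck-typing proxy saves us from writing by hand. It is only a sketch: MessageDispatcherAdapter is a made-up name, and the MessageQueue bodies are assumed (the original elides them) so that the example can run on its own.

```csharp
using System.Collections.Generic;

public interface IMessageDispatcher
{
    void Enqueue(string recipient, string message);
}

// Stand-in for the MessageQueue above; the bodies are assumptions for illustration.
public class MessageQueue
{
    private readonly Dictionary<string, Queue<string>> boxes =
        new Dictionary<string, Queue<string>>();

    public void Enqueue(string recipient, string message)
    {
        Queue<string> box;
        if (!boxes.TryGetValue(recipient, out box))
        {
            box = new Queue<string>();
            boxes[recipient] = box;
        }
        box.Enqueue(message);
    }

    public string TryDequeue(string recipient)
    {
        Queue<string> box;
        return boxes.TryGetValue(recipient, out box) && box.Count > 0
            ? box.Dequeue()
            : null;
    }
}

// Hypothetical hand-rolled adapter: exposes only the one capability the Producer needs
// and forwards it to the wrapped queue.
public class MessageDispatcherAdapter : IMessageDispatcher
{
    private readonly MessageQueue queue;

    public MessageDispatcherAdapter(MessageQueue queue) { this.queue = queue; }

    public void Enqueue(string recipient, string message)
    {
        queue.Enqueue(recipient, message);
    }
}
```

One such adapter per client-side interface adds up quickly, which is the plumbing the generated proxy removes.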

But what about C# 4.0 & Dynamic

While I think Dynamic objects in C# 4.0 are very cool, as of right now, they seem to have skipped over duck typing, at least in a strongly typed fashion.

Sure, once you have a dynamic instance, the compiler will let you call whatever signature you wish on it and defers checking until execution time. But that means we have no contract on it, if used as a dependency, nor can we use it to dynamically create objects that provide implementations for existing contracts. So, you'd have to wrap a dynamic object with a proxy, in which case, LinFu's existing duck typing already provides a superior solution.
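
A small sketch of that limitation, as I understand the C# 4.0 behavior: the late-bound call works, but handing the same object to something that expects the interface does not. The MessageQueue body and the DynamicDemo class are assumptions made up for this example.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public interface IMessageDispatcher
{
    void Enqueue(string recipient, string message);
}

// Has a matching Enqueue method but does not declare IMessageDispatcher.
// The body is an assumption; the original elides it.
public class MessageQueue
{
    public int Count;
    public void Enqueue(string recipient, string message) { Count++; }
}

public static class DynamicDemo
{
    // Returns true when the runtime refuses to treat the queue as an IMessageDispatcher.
    public static bool ConversionIsRejected()
    {
        dynamic queue = new MessageQueue();
        queue.Enqueue("bob", "hello");   // compiles; the call is bound at execution time

        try
        {
            // Compiles, but the conversion is only checked at execution time.
            IMessageDispatcher dispatcher = queue;
            dispatcher.Enqueue("bob", "again");
            return false;
        }
        catch (RuntimeBinderException)
        {
            // Matching signatures are not enough: no declared interface, no conversion.
            return true;
        }
    }
}
```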

The lack of casting to an interface, imho, was already an oversight in C# 3.0, which introduced anonymous types that are so convenient for Linq projections, but can't be passed out of the scope of the current method, due to a lack of a named type.

So don't expect C# 4.0 to do anything to let you more easily attach your contracts at the dependency level. For the foreseeable future, this remains the territory of Dynamic Proxy.

Next time: Delegate injection

However, there is another way to deal with dependency injection that provides a fine-grained contract and imposes no proxies or other requirements on the classes providing the dependency: injection of the required capabilities as delegates.
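
To give a rough idea of what that could look like (a minimal sketch, not the framework mentioned below; the Greet method and the demo class are invented for illustration), the consumer asks for the capability itself rather than for an interface:

```csharp
using System;
using System.Collections.Generic;

public class Producer
{
    private readonly Action<string, string> enqueue;

    // The dependency is the capability itself, not an interface or a concrete class.
    public Producer(Action<string, string> enqueue) { this.enqueue = enqueue; }

    // Hypothetical method, only here to exercise the injected delegate.
    public void Greet(string recipient) { enqueue(recipient, "hello"); }
}

public static class DelegateInjectionDemo
{
    public static List<string> Run()
    {
        var sent = new List<string>();

        // Any method or lambda with a matching signature satisfies the contract,
        // e.g. a method group such as queue.Enqueue on an existing MessageQueue.
        var producer = new Producer((recipient, message) => sent.Add(recipient + ":" + message));
        producer.Greet("bob");
        return sent;
    }
}
```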

I've been experimenting with a framework to make this as painless as traditional IoC containers make dependency resolution. It's still a bit rough around the edges, but I hope to have some examples to write about soon.


Monday, June 15, 2009

Interfaces put contracts at the wrong end of the dependency

Over the years I've hopped back and forth between static and dynamically typed languages, trying to find the sweet spot. I currently still favor managed, static languages like C# and Java. But I agree that sometimes I have to write a whole lot of code to express my intent to the compiler without any great benefit. And no, i don't think that code-generation is a way out of this.

What's not to like about static?

I won't go over the usual arguments for dynamic, which basically boil down to "you can do what you want without having to explain it to the type system first". I'll stipulate that that is why most people choose dynamic, but it's not a significant pain point with static for me. I did spend a good many years in dynamic land and switched to static of my own free will. Instead, I want to concentrate on some specific cases.

Fine grained basic types are usually overkill

I generally don't care whether I am dealing with int or long, or int64, or double vs. decimal. For the most part, number would do just fine. I think these types can be useful optimizations for both speed and memory, but that is certainly something that would be better optimized by tracing at execution time rather than declaratively at compile time. And having to call special converters all over the place to go between these various types is just not useful. I think type inference can handle these scenarios just fine.

Execution speed is a red herring

I'm not saying that there aren't areas where speed is important enough to drop down to C/C++ levels, but anywhere you are willing to use a statically typed managed language, a dynamic language can either perform right now or is only a short time away from being performant. After all, I already sacrifice performance to be in managed land, so drawing the line at a little more sacrifice for the development benefits seems arbitrary. Besides, recent JavaScript optimizations paint a pretty good picture that tracing and JIT compilation can make dynamic code fast enough for most scenarios.

Declaration and Discoverability of Dependencies

So what's the pain point of dynamic for me? I care neither about the locking down of a class to handle only statically defined things, nor about the guarantee that a type is really a particular type. Frankly "types" are not important to me. However, declaration of dependencies in a discoverable fashion is!

What do I mean by that? A class should tell me via a machine discoverable contract what it expects the passed instance to be capable of. If I use a class that has service dependencies at construction time, or instance dependencies at method invocation time, I want to be able to discover this in code, rather than by looking at documentation or going by naming convention. After all, hasn't documentation been deemed a code smell? Why is it then, that in dynamic languages the expected capabilities of the object to be passed are not expressed in a fashion that can be discovered without breaking encapsulation and looking at what the code expects to do with the passed instance?

Sure, dynamic languages pride themselves on not requiring an IDE. This is often held up as a strength and a key reason why they are faster to develop in. In my experience, however, I find dynamic languages faster for small things but as the project grows, my velocity decreases:

Declaring Requirements instead of Capabilities

All the above is not a problem in static languages, but that comes at the cost of inflexible, rigid types. Types are a solution that is a Trojan horse of limitations completely orthogonal to the problem of dependency discovery. A class requiring an object of type User should have no dependence on the implementation details of User. Having such a dependence would be a clear violation of encapsulation. The class should simply want an instance of an object that has the capabilities of a User, i.e. it has a requirement for an object that exposes certain capabilities. The class should be able to declare a contract for the instance to be passed in.

In C# and similar languages this contract is an Interface. Interfaces allow the declaration of capabilities without an implementation. In interface inheritance, a class commits to providing the contract expressed by the interface. So a class requiring a specific interface can declare its requirements without any knowledge of implementation. All right, problem solved! Right?

Interfaces as Contracts are upside down

Interfaces unfortunately do not solve the problem, because the way they are attached to the implementation inverts the dependence hierarchy. I.e. User implements an interface its author declared, called IUser. Now IUser becomes my dependency, which is still a declaration outside of my control. I should not care where the implementation comes from. But an interface either puts the burden on a third party to implement my interface, which means I cannot use anything pre-existing, since it wouldn't have implemented my interface, or puts the burden on the third party to provide an interface for me to use, which means another third party solving the same problem provides their own interface.

This may be wonderful for mocking and unit testing, but it still ties me to a contract not of my own making and usually violates the Interface Segregation Principle: Clients should not be forced to depend upon interfaces that they do not use. So interfaces provide a solution, but they still enforce rigidity that has no benefit to the definition of dependency contracts.

Contracts for dependencies

At the end of the day, I have less to quarrel about dynamic vs. static, than tribal definition (naming conventions, documentation, etc.) of dependencies vs. declarative definition of dependencies. Until I can discover what a class expects as its input without being told or cracking open the man page, I will suffer the yoke of interfaces. Especially since I can still use Dynamic Proxies to fake a class implementing an interface -- in yet another "more code than you'd think" way of working, tho.

Are there any static or dynamic languages that have tackled declarative contracts that are not attached at the implementation side that I'm not aware of? It seems like a sweet spot that isn't yet addressed.

Update: I realize i left those wanting a solution to the interface issue in C# wanting. There are two ways to solve the problem that I'm aware of, Duck Typing as offered by LinFu and delegate injection, both of which I will cover in future posts.


Friday, June 12, 2009

When using won't Dispose

The using statement/block in C# (not the one used to pull in namespaces) is meant to aid in the IDisposable pattern, i.e. cleaning up resources that won't be handled by garbage collection and to do so in a deterministic fashion. I.e. everything that finalization is not. It really is just syntactic sugar to avoid try/finally all the time. I.e. this
using(var disposable = new Disposable())
{
  // do something with disposable
}
is pretty much the same as
var disposable = new Disposable();
try
{
  // do something with disposable
}
finally
{
  disposable.Dispose();
}

But there is a common pitfall with using, err, using. It's in the first line above: The disposable object is created outside the try/finally block! Now, a constructor failing shouldn't ever have allocated disposable resources, so you're usually safe here. But beware if the Disposable object is constructed by a method:

using(var disposable = CreateDisposable())
{
  // do something with disposable
}
If CreateDisposable() fails after it has created Disposable, you'll end up with a resource leak!

You can easily avoid this by catching failure in your method and cleaning up, but you can't use using for this purpose, since success would return an already disposed instance. A safe implementation of CreateDisposable() looks like this:

public Disposable CreateDisposable()
{
  Disposable disposable = new Disposable();
  try {
    // do some extra initialization of disposable
  }
  catch
  {
    if( disposable != null )
      disposable.Dispose();
    throw;
  }
  return disposable; 
}
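
To make the difference observable, here is a small self-contained sketch. The Disposable class with its Live counter, the Factory class and the deliberately failing Initialize step are all stand-ins invented for this demonstration.

```csharp
using System;

// Hypothetical resource that tracks how many instances have not been disposed yet.
public class Disposable : IDisposable
{
    public static int Live;
    private bool disposed;

    public Disposable() { Live++; }

    public void Dispose()
    {
        if (disposed) return;
        disposed = true;
        Live--;
    }
}

public static class Factory
{
    // Stand-in for "extra initialization" that fails after the resource exists.
    private static void Initialize(Disposable disposable)
    {
        throw new InvalidOperationException("initialization failed");
    }

    // Leaks: the instance exists, but the caller never receives it, so using cannot dispose it.
    public static Disposable CreateUnsafe()
    {
        var disposable = new Disposable();
        Initialize(disposable);
        return disposable;
    }

    // Same failure, but the instance is disposed before the exception propagates.
    public static Disposable CreateSafe()
    {
        var disposable = new Disposable();
        try
        {
            Initialize(disposable);
        }
        catch
        {
            disposable.Dispose();
            throw;
        }
        return disposable;
    }

    // Runs a factory inside a using statement and reports how many instances were left undisposed.
    public static int LeakedAfter(Func<Disposable> create)
    {
        Disposable.Live = 0;
        try
        {
            using (var disposable = create()) { }
        }
        catch (InvalidOperationException) { }
        return Disposable.Live;
    }
}
```

Calling Factory.LeakedAfter(Factory.CreateUnsafe) leaves one instance undisposed, while Factory.LeakedAfter(Factory.CreateSafe) leaves none.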

IDisposable is an important pattern in .NET, but because it is a pattern rather than a construct enforced by the compiler, it is a common source of "leaks" in .NET. using is very useful for handling the disposable, but it is important to remember that the only code covered by the automatic disposal logic is the code inside the using block, not the expression in the using statement that creates the disposable.


Thursday, June 11, 2009

Searching a Tree of Objects with Linq, Revisited

A while back, I wrote about searching through a tree using linq to objects. That post was mostly snippets of code about delegates, lambdas, yield and how they apply to linq -- more a technical exploration than an example. So I thought I'd follow it up with concrete extension methods to make virtually any tree searchable by Linq.

Linq, IEnumerable<T>, yield

All that is required to search a tree with Linq is creating a list of all nodes in the tree. Linq to Objects can operate on IEnumerable<T>. Really, Linq to objects is a way of expressing operations we've been doing forever in loops with if/else blocks. That means there isn't any search magic going on, it is a linear traversal of all elements in a set and examining each to determine whether it matches our search criteria.

To turn a tree into a list of nodes we need to walk and collect all children of every node. A simple task for a recursive method that carries along a list object to stuff every found node into. But there is a better way, using yield to return each item as it is encountered. Now we don't have to carry along a collection. Iterators using yield implement a pattern in which a method can return more than once. For this reason, a method using yield in C# must return an IEnumerable (or IEnumerator), so that the caller gets a handle to an object with which it can traverse the multiple return values.

IEnumerable is basically an unbounded set. This is also the reason why unlike collections, it does not have a Count Property. It is entirely possible for an enumerator to return an infinite series of items.
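
A minimal sketch of such an unbounded enumerator (the Sequences class and its method names are made up for this example): the loop never ends on its own, yet the caller decides how many items are actually produced.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Sequences
{
    // Never terminates on its own; each value is produced only when the caller asks for it.
    public static IEnumerable<int> Naturals()
    {
        var n = 0;
        while (true)
        {
            yield return n++;
        }
    }

    // Take(5) stops pulling after five items, so the infinite loop is never a problem.
    public static int[] FirstFive()
    {
        return Naturals().Take(5).ToArray();
    }
}
```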

Together IEnumerable<T> and yield are a perfect match for our problem, i.e. recursively walking a tree of nodes and returning an unknown number of nodes.

Two types of Tree Traversal

Depth First

In depth-first traversal, the algorithm will continue to dig down a node's children until it reaches a leaf node (a node without children), before considering the next child of the current parent node.

Breadth First

In breadth-first traversal, the algorithm will return all nodes at a particular depth first before considering the children at the next level. I.e. First return all the nodes from level 1, then all nodes from level 2, etc.

Tree to IEnumerable<T> Extension methods

public static class TreeToEnumerableEx
{
 public static IEnumerable<T> AsDepthFirstEnumerable<T>(this T head, Func<T, IEnumerable<T>> childrenFunc)
 {
  yield return head;
  foreach (var node in childrenFunc(head))
  {
   foreach (var child in AsDepthFirstEnumerable(node, childrenFunc))
   {
    yield return child;
   }
  }
 }

 public static IEnumerable<T> AsBreadthFirstEnumerable<T>(this T head, Func<T, IEnumerable<T>> childrenFunc)
 {
  yield return head;
  var last = head;
  foreach(var node in AsBreadthFirstEnumerable(head,childrenFunc))
  {
   foreach(var child in childrenFunc(node))
   {
    yield return child;
    last = child;
   }
   if(last.Equals(node)) yield break;
  }
 }
}

This static class provides two extension methods that can be used on any object, as long as it's possible to express a function that returns all children of that object, i.e. the object is a node in some type of tree and has a method or property for accessing a list of its children.
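
The breadth-first method above works by recursively enumerating itself one step behind. For comparison, here is the more conventional formulation with an explicit queue. This is an alternative sketch, not the code from this post, and the method name is made up to avoid a clash.

```csharp
using System;
using System.Collections.Generic;

public static class QueueTraversalEx
{
    // Breadth-first traversal using a FIFO queue of nodes whose children are still pending.
    public static IEnumerable<T> AsBreadthFirstEnumerableQueued<T>(
        this T head, Func<T, IEnumerable<T>> childrenFunc)
    {
        var pending = new Queue<T>();
        pending.Enqueue(head);
        while (pending.Count > 0)
        {
            var node = pending.Dequeue();
            yield return node;

            // Children go to the back of the queue, so a whole level is
            // returned before any node of the next level.
            foreach (var child in childrenFunc(node))
            {
                pending.Enqueue(child);
            }
        }
    }
}
```

The queue version does not rely on Equals to detect the end of the traversal and avoids nesting one enumerator per tree level.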

An Example

Let's use a hypothetical Tree model defined by this Node class:

public class Node
{
 private readonly List<Node> children = new List<Node>();
 
 public Node(int id)
 {
  Id = id;
 }
 
 public IEnumerable<Node> Children { get { return children; } }

 public Node AddChild(int id)
 {
  var child = new Node(id);
  children.Add(child);
  return child;
 }
 
 public int Id { get; private set; }
}
Each node simply contains a list of children and has an Id, so that we know what node we're looking at. The AddChild() method is a convenience method so we don't expose the child collection and no node can ever be added as a child twice.

The calling convention for a depth-first collection is:

IEnumerable<Node> nodes = node.AsDepthFirstEnumerable(n => n.Children);

The lambda expression n => n.Children is the function that will return the children of a node. It simply states given n, return the value of the Children property of n. A simple test to verify that our extension works and to show us using the extension in linq looks like this:

[Test]
public void DepthFirst()
{
 // build the tree in depth-first order
 int id = 1;
 var depthFirst = new Node(id);
 var df2 = depthFirst.AddChild(++id);
 var df3 = df2.AddChild(++id);
 var df4 = df2.AddChild(++id);
 var df5 = depthFirst.AddChild(++id);
 var df6 = df5.AddChild(++id);
 var df7 = df5.AddChild(++id);

 // find all nodes in depth-first order and select just the Id of each node
 var IDs = from node in depthFirst.AsDepthFirstEnumerable(x => x.Children)
        select node.Id;

 // confirm that this list of IDs is in depth-first order
 Assert.AreEqual(new int[] { 1, 2, 3, 4, 5, 6, 7 }, IDs.ToArray());
}

For breadth-first collections, the calling convention is:

IEnumerable<Node> nodes = node.AsBreadthFirstEnumerable(n => n.Children);

Again, we can test that the extension works like this:

[Test]
public void BreadthFirst()
{
 // build the tree in breadth-first order
 var id = 1;
 var breadthFirst = new Node(id);
 var bf2 = breadthFirst.AddChild(++id);
 var bf3 = breadthFirst.AddChild(++id);
 var bf4 = bf2.AddChild(++id);
 var bf5 = bf2.AddChild(++id);
 var bf6 = bf3.AddChild(++id);
 var bf7 = bf3.AddChild(++id);

 // find all nodes in breadth-first order and select just the Id of each node
 var IDs = from node in breadthFirst.AsBreadthFirstEnumerable(x => x.Children)
       select node.Id;

 // confirm that this list of IDs is in breadth-first order
 Assert.AreEqual(new int[] { 1, 2, 3, 4, 5, 6, 7 }, IDs.ToArray());
}

Searching Trees

The tree used in the example is of course extremely simple, i.e. it doesn't even have any worthwhile data to query attached to a node. But these extension methods could be used on a node of any kind of tree, allowing the full power of Linq, grouping, aggregation, sorting, projection, etc. to be used on the tree.

As a final note, you may wonder, why bother with depth-first vs. breadth-first? After all, in the end we do examine every node! There is however one particular case where the choice of algorithm can be very important: You are looking for one match or a particular number of matches. Since we are using yield, we can terminate the traversal at any time. Using the FirstOrDefault() extension on our Linq expression, the traversal would stop as soon as one match is found. And if you have any knowledge of where that node might be in the tree, the choice of search algorithm can be a significant performance factor.
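
A quick way to see the early termination is to count how many nodes get examined. The sketch below repeats the depth-first iterator so it stands alone, and uses a plain dictionary of child ids instead of the Node class; the EarlyExitDemo class is made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EarlyExitDemo
{
    // Same depth-first iterator as above, repeated so the sketch is self-contained.
    private static IEnumerable<T> DepthFirst<T>(T head, Func<T, IEnumerable<T>> childrenFunc)
    {
        yield return head;
        foreach (var node in childrenFunc(head))
        {
            foreach (var child in DepthFirst(node, childrenFunc))
            {
                yield return child;
            }
        }
    }

    // Returns how many nodes were examined before the search for 'target' stopped.
    public static int NodesExaminedToFind(int target)
    {
        // Same shape as the depth-first test tree: 1 has children 2 and 5,
        // 2 has children 3 and 4, 5 has children 6 and 7.
        var children = new Dictionary<int, int[]>
        {
            { 1, new[] { 2, 5 } }, { 2, new[] { 3, 4 } }, { 5, new[] { 6, 7 } },
            { 3, new int[0] }, { 4, new int[0] }, { 6, new int[0] }, { 7, new int[0] }
        };

        var examined = 0;
        DepthFirst(1, n => children[n])
            .FirstOrDefault(n => { examined++; return n == target; });
        return examined;
    }
}
```

Searching for node 3 examines only three nodes, while searching for node 7 has to walk all seven.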


Sunday, June 07, 2009

Log4Net filtering by logger

Since I keep overwriting my App.Config with revision control configs and promptly forgetting how to set up filters, I figured I might as well write a brief article on filters here, so I have a place to look it up next time :)

My basic tenet with logging is that lots of debug statements are good. Now some may say that it just gets too noisy after a while, so don't put them in unless you need to debug. The problem is that you usually don't know when you'll need to debug, and if the code is deployed on a server or worse with a customer, generating a new release so they can run the debug is a burden you shouldn't have to shoulder. And commenting out log statements (or even conditional compiles) is a code smell reeking of Console.WriteLine debugging.

But it does get noisy! And noisy also means slow! However with log4net, noise and performance degradation are non-arguments, since aside from levels, it has excellent filtering, which not only reduces the noise, but also cuts out 99% of the logging overhead. The worst-case debugging example I've had was tracking down behavior in the motion control software for Full Motion Racing. The physics calculations in this software ran between 60Hz and 100Hz. When I added debug logging in that physics loop, the rate dropped down to about 20Hz because of I/O overhead, and this was with either RollingFile or Udp appenders. Needless to say, motion became jerky and unusable for a rider. But I got the debug data I needed. Disabling those logging statements with filters rather than removing them left no appreciable degradation in the performance of the physics loop.

So, again, lots of debug logging == good. Because when you need that data, you need that data. But you may want to ship your code with a log4net configuration that pre-filters the loggers you know to be noisy, so that a user turning on debug logging doesn't overwhelm them.

How to filter

The basic deal with log4net filters is that they are applied in order and the first filter that matches decides the fate of the logging event, short-circuiting the rest of the chain. I.e. if the first filter is a DenyAllFilter, nothing else will even be considered, since it matches all loggers. That means there are generally two approaches to filtering, whitelisting and blacklisting. It also means that if you match a logger and a subsequent filter would remove that logger, the subsequent filter is never reached, since consideration of the filter chain stops at the first match.

Whitelisting by logger

<filter type="log4net.Filter.LoggerMatchFilter">
  <loggerToMatch value="Only.Logger.To.Match" />
</filter>
<filter type="log4net.Filter.DenyAllFilter" />
LoggerMatchFilter defaults to acceptOnMatch being true, i.e. if omitted, the filter accepts (includes in logging) on match. The above will only emit logging statements for the Only.Logger.To.Match logger, since all others will hit the DenyAllFilter and be excluded.

Blacklisting by logger

<filter type="log4net.Filter.LoggerMatchFilter">
  <loggerToMatch value="Logger.To.Filter.Out" />
  <acceptOnMatch value="false" />
</filter>
This filter will show all logging statements except those for Logger.To.Filter.Out.

LoggerMatchFilter also matches on partial namespaces, which is very useful when you have a noisy namespace, but one logger in that namespace that you do want in your logs such as:

<filter type="log4net.Filter.LoggerMatchFilter">
  <loggerToMatch value="Noisy.Namespace.But.Important" />
</filter>
<filter type="log4net.Filter.LoggerMatchFilter">
  <loggerToMatch value="Noisy.Namespace" />
  <acceptOnMatch value="false" />
</filter>
With these filters, all of Noisy.Namespace.* will be filtered out, except for Noisy.Namespace.But.Important.
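
The order dependence is easier to see in a toy model of the chain. To be clear, this is not log4net's code, just a simplified sketch of the behavior described above: every name in it is made up, and it assumes that a filter which doesn't match stays neutral and that an event no filter decides on gets logged.

```csharp
using System;
using System.Collections.Generic;

public enum Decision { Deny, Neutral, Accept }

public static class FilterChainModel
{
    // Simplified stand-in for a logger match filter: prefix match on the logger name.
    public static Func<string, Decision> LoggerMatch(string prefix, bool acceptOnMatch)
    {
        return logger =>
            !logger.StartsWith(prefix, StringComparison.Ordinal) ? Decision.Neutral
            : acceptOnMatch ? Decision.Accept
            : Decision.Deny;
    }

    // The first filter that is not Neutral decides; later filters are never consulted.
    public static bool IsLogged(string logger, IEnumerable<Func<string, Decision>> filters)
    {
        foreach (var filter in filters)
        {
            var decision = filter(logger);
            if (decision != Decision.Neutral) return decision == Decision.Accept;
        }
        return true;   // assumption: nothing matched, so the event passes through
    }

    // Mirrors the two filters above: accept the important logger, then deny its namespace.
    public static readonly Func<string, Decision>[] NoisyNamespaceFilters =
    {
        LoggerMatch("Noisy.Namespace.But.Important", true),
        LoggerMatch("Noisy.Namespace", false)
    };
}
```

Swapping the two entries would deny Noisy.Namespace.But.Important as well, because the namespace filter would match first.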

Filters are your friend

So, don't get locked into thinking that your choice for verbosity lies only in log levels and once committed to a level it's either all the noise or none of it. But to keep things sane, pre-populate your config with filters, because you are the one that knows best which loggers are of general use and which are special case only.


Monday, February 16, 2009

How I became a SOLID advocate and didn't even know it

There's been a lot of chatter of late about SOLID. It started with Uncle Bob talking on a couple of podcasts about the SOLID principles, but it really got going when Joel Spolsky and Jeff Atwood started talking smack about Uncle Bob on the Stackoverflow podcast. Since then battle lines seem to have been drawn between the TDD/SOLID folks and those who finally found a champion fighting to get them out from under the pattern yoke. Ok, that's a bunch of hyperbole, but the main objection to SOLID seems to be that having a list of Principles like that just feels bureaucratic and dogmatic, which rubs free thinking developers the wrong way. Since I've been practicing SOLID longer than I've been aware of it, I wanted to walk through how I got here and to illustrate that the principles espoused by Uncle Bob are not a yoke, but rather helpful guidelines that will save you a lot of grief down the line.

The overall goal of development should be delivering software that solves the stated problem. Beyond that I do have some personal guiding principles for programming, i.e. what I personally want to get out of it once the raison d'être is accomplished:

  1. I want to learn new things rather than maintain old things
  2. I never want to have to fix the same bug twice
  3. I don't want to be prevented from doing something better by legacy decisions

To me, 1) means writing code that's easy to maintain, so that maintenance does not become a time suck interfering with new things, 2) means that i protect myself from regressions and 3) means that my code should allow me to refactor it without screwing up 1) or 2). So far that seems pretty non-controversial.

As I go on, the common theme will be testing, not because testing is some higher goal and end in itself, but because testing, in my experience, lets me prove that code does what it claims and that I didn't break anything else by the addition/change. For those who think that writing tests is a lot of tedium that only leads to test maintenance rather than code maintenance, I can only respond "you're probably doing it wrong" and address that statement under Pain points of TDD below.

Doesn't QA test code?

In the early days of the web, testing was what the programmer did to make sure things didn't break and then you relied on the customer telling you if it didn't work correctly -- the wild west days of CGI scripts. Once I started at MP3.com, testing became more refined via QA. Sure, everybody tested their web apps before handing them off to QA (or at least they should), but there wasn't really any formalized testing on the development side. It was all manual functional testing of firing up the app, trying out the things that should work and looking at logs. QA was responsible for test plans, regression tests, etc.

Early on I switched from web apps to running the databases and with that came dealing with the pain of maintaining schemas when everyone had raw SQL in strings throughout their code. So I set out to write a DB API. Being an OO geek, it quickly morphed from an API into an ORM instead, which abstracted the DB and built the SQL on demand. This gave the DB group more freedom to refactor the database as needed without having to have every developer track down their SQL.

Developing the ORM did mean that I was now out of the QA loop, since my deliverables went to developers and had to work long before QA ever got involved. So I developed test suites that I could run from the shell whenever i changed something. These tests gave me confidence that I didn't just break live apps with code for a new app.

Part of having these types of tests, however, was a giant WTF in itself. Why was I constantly risking the codebase by futzing around in the guts? Yes, the ORM suffered horribly from fragile baseclass problems and I had designed myself into a number of corners that could only be addressed by modifying the base. Learning this lesson, I spent a lot more time trying to build object hierarchies to provide the proper hooks to let subclasses extend the functionality without affecting or overriding the base functionality. Little did I know that I had started practicing SOLID's O, or the Open/Closed Principle (OCP).

Unit tests, but not really

When I started at Printable Technologies a number of years later, I became part of an effort to migrate the existing application from ASP to ASP.NET. Like most legacy ASP applications it was the usual single code file per page mixing data access, business logic and html rendering. It was something we did not want to repeat. We set out to separate our logical layers carefully so that we would get greater re-use and transparency of what was going on in the application. I wanted to start off on the right foot and played around with NUnit to try out unit testing our new code base. This was before TestDriven.Net or similar tools for integration into the IDE. But at least it gave me an automated test suite, rather than a series of console apps. I thought, "Now I'm doing unit testing!".

Except, like many test adoptees, I really wasn't. I was doing functional tests with a test running harness. I was hitting test databases to check my DB Abstraction Layer, and as you moved up the object hierarchy, the graphs of dependent code supporting the code to be tested got deeper and deeper. Testing something that was at the front end really tested all pieces beneath it. It certainly gave us good test coverage, but the tests were fragile, took a lot of time to write, and it was often difficult to dig out what actually failed. That didn't seem right and made me wonder if this unit testing thing was really so great. But the test coverage did help to achieve my personal guiding principles, so I wasn't ready to give up on it just yet. I just needed to work through the pain points.

Towards actual unit testing

The two major problems we had in our tests were that each test really tested many things at once and that the setup to get a test running was tedious and pulled in too many dependencies.

The first was a problem with our class design. We needed to break classes up into smaller functional pieces that each could be tested before testing the whole. While testability was the driving force for this change, the need for it was really a symptom of how badly coupled things were, i.e. that the design was flawed. This is a pattern in testing that has since repeated itself many times: If your test is fragile or difficult, it's generally the fault of the design of the code to be tested, not the testing process.

There were a bunch of monolithic classes that did lots of things at once, which meant a test failure could be one of a hundred things. I started to break up classes into smaller pieces, each dedicated to one functional area. Now I could take those helper classes that the main class was composed of and test them independently. When the composite broke but none of the components did, I knew where to look for failure. This compartmentalization just happens to be Single Responsibility Principle (SRP).

So far so good, but our second problem still dogged us. Tests were still annoying to write the further you got into business logic, since everything built on the supporting infrastructure. I had heard about mocking and started looking into it, hoping for some magic bullet that could just create fakes for me (which had been easy back in the Perl days). This was before TypeMock hit the scene, so I couldn't create a fake version of my concrete type. It was either making everything virtual (yuck) or using interfaces instead of concrete classes. Interfaces won out, but because I had yet to discover the D in SOLID, introducing a lot of interfaces also led us to a pattern that itself became a major pain point. This pattern was the use of singletons and static factory methods, both well-meaning static accessors to get around the inability of new'ing up an interface. But before realizing this separate morass, using interfaces had led to using the L, Liskov Substitution, and I, Interface Segregation, principles.

Mocking out classes with interfaces exposed a couple of places where we had an abstract baseclass and code accepting the base class using typeof() to determine what class was actually provided. Well, with an interface being passed in instead of the abstract baseclass, the typeof() logic still worked, until the first test with a mock object was run. That failure illustrated what a bad idea that bit of code was. If we say we require an object implementing an interface, any object implementing that interface should work, and that right there is the Liskov Substitution Principle (LSP). Making sure that our interfaces really represented the required functionality enabled mocking and cleaned out some inappropriate knowledge embedded in code.
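
The original code isn't shown here, so the following is only a hypothetical reconstruction of that kind of failure. All type names (IShape, Square, MockShape, Reports) are invented for the example.

```csharp
using System;

public interface IShape
{
    double Area();
}

public class Square : IShape
{
    public double Side = 2;
    public double Area() { return Side * Side; }
}

// Hand-rolled mock, as a test would supply it.
public class MockShape : IShape
{
    public double Area() { return 42; }
}

public static class Reports
{
    // Fragile: inspects the concrete type, so it only works for classes it already knows about.
    public static double AreaByTypeCheck(IShape shape)
    {
        if (shape.GetType() == typeof(Square))
        {
            var square = (Square)shape;
            return square.Side * square.Side;
        }
        throw new NotSupportedException("unknown shape: " + shape.GetType().Name);
    }

    // Substitutable: relies only on what the interface promises.
    public static double AreaByContract(IShape shape)
    {
        return shape.Area();
    }
}
```

Passing a MockShape to AreaByTypeCheck throws, while AreaByContract happily works with any implementation, which is exactly the difference the first mock-based test exposed.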

Another aspect of mocking (rolling mocks by hand rather than using a mocking framework) was that laziness dictated that you didn't want to implement a lot of things just to get a test working. So large interfaces got whittled down to just the methods required by the object taking in that interface. And that happens to be the Interface Segregation Principle (ISP).

A brand new pain: wiring up lots of SRP objects

Many of the above principles were only partially applied because they imposed a new pain and a whole new set of plumbing code that was tedious to write and maintain. The issue with separation of concerns and abstracting those concerns with interfaces was two-fold:
  1. You can't new up an interface, so you needed factories everywhere
  2. Suddenly half the code seemed to be plumbing to wire up increasingly complex object graphs.

As I said, a lot of this was dealt with via Singletons and static factory methods. Both are really just degenerate implementations of the Service Locator Pattern, but we weren't even aware of that. Since this plumbing had its own set of pain points, we often skipped the abstractions unless we really needed them, to keep life simpler. Generally that meant we paid for that convenience in maintenance debt.

When I started writing code for Full Motion Racing, like every new project, I wanted to take the lessons learned from Printable and avoid the pains I had come across. I once again had need for object graphs that required access to singleton type service objects, but wanted to avoid statics as much as I could because of previous experience. I built a repository of objects that I could stuff instances into, providing me a Service Locator. Looking more into how other people were doing this, I came across talk about service locator still being an inappropriate coupling, since it itself is a dependency that had nothing to do with the responsibility of the consuming objects. Instead, services should be passed in at construction time whenever possible. Wow, really? That just seemed to take the pain of wiring up object graphs to unprecedented heights. That just couldn't be how people were writing their code.

Wanting to understand how this way of building decoupled systems could actually work in the real world, I learned about the D of SOLID, or the Dependency Inversion Principle (DIP) also (and maybe more accurately) known as Inversion of Control. In my opinion, DIP may be the last principle mentioned, but in many ways it is the enabling plumbing without which the remaining principles are all well in theory but often feel worse than the disease they aim to cure.
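The difference between reaching into a Service Locator and having a dependency passed in at construction time can be sketched roughly like this. The types here (IClock, FixedClock, ServiceLocator, the two greeters) are hypothetical stand-ins, not code from any of the projects mentioned:

```csharp
using System;
using System.Collections.Generic;

public interface IClock
{
    DateTime Now { get; }
}

public class FixedClock : IClock
{
    private readonly DateTime now;
    public FixedClock(DateTime now) { this.now = now; }
    public DateTime Now { get { return now; } }
}

// Service Locator: a global registry that consumers reach into.
public static class ServiceLocator
{
    private static readonly Dictionary<Type, object> services =
        new Dictionary<Type, object>();

    public static void Register<T>(T instance) { services[typeof(T)] = instance; }
    public static T Resolve<T>() { return (T)services[typeof(T)]; }
}

// Coupled to the locator; the dependency on IClock is hidden inside Greet().
public class LocatorGreeter
{
    public string Greet()
    {
        IClock clock = ServiceLocator.Resolve<IClock>();
        return clock.Now.Hour < 12 ? "Good morning" : "Good afternoon";
    }
}

// Constructor injection: the dependency is explicit and the locator is gone.
public class InjectedGreeter
{
    private readonly IClock clock;

    public InjectedGreeter(IClock clock) { this.clock = clock; }

    public string Greet()
    {
        return clock.Now.Hour < 12 ? "Good morning" : "Good afternoon";
    }
}
```

InjectedGreeter can be tested by handing it a FixedClock directly, while LocatorGreeter needs the global registry populated first. The price is that something has to build the object graph, which is the job an IoC container takes over.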

Agile in action

Early use of IoC for Full Motion Racing still relied on a singleton container that factory classes could use to create their dependencies for creating transient objects. Only over time did I learn to trust the container to build up all my objects for me, and learned how to register factories to support lifestyles other than singleton via the container.

It wasn't until I started at Bunkspeed that I really saw IoC used properly and was able to reap the true benefits of this design pattern. If you've ever seen Bunkspeed's HyperShot or HyperDrive in action you know that the visualizations they create are mind blowing, especially once you realize it's real-time. Needless to say, sitting down with this codebase was initially intimidating. It still is the largest single codebase I've worked on. Maybe some of the distributed web apps I've worked on had more code in total, but they were disparate systems that largely had no interdependence. The main Visual Studio Solution I worked on at Bunkspeed was one application with hundreds of projects all loaded at once.

I assumed this meant lots of branching, lots of areas of expertise where certain people would be responsible for a subset of the code. This was not the case. Everyone was trusted and had authority to modify, extend and refactor everything as they required it. A change that required another change much lower down in the system could be made by anyone. Tests ran with every build in addition to a full set of CI servers building various configurations on each check-in. And it all ran smoother than any other shop I'd been in and was easier to ramp up on than other, far simpler projects.

Bunkspeed employs every one of the SOLID principles. Systems were composed of lots of small classes with very limited responsibility, each being abstracted by an interface. One of the reasons for the many projects rather than fewer, larger projects was that areas of responsibility were segregated, including their interfaces being in separate DLLs so that low level changes wouldn't cause rebuilds of the entire system. Deep reaching refactors were not the norm but rather an indication that some inappropriate coupling had been done at some previous time and the refactor served to rectify the situation so that technical debt was accruing at much slower rates than is the norm. This was not some academic application of patterns from a book, but a truly agile development shop able to make significant changes with a small team in record time.

SOLID wasn't some rule set put before me, but a natural evolution of trying to make development easier. It wasn't until about 9 months ago that I read Robert C. Martin and Micah Martin's Agile Principles, Patterns, and Practices in C# because it was sitting on a co-worker's desk and for the first time I put names to the patterns I'd been applying this entire time.

Pain points of SOLID

There is definitely a different cost incurred by applying SOLID to design. Most of this cost is in navigating the granularity of the design, and here tooling is an important aid in making this not only painless but more productive than the alternatives. The issue is that Visual Studio really isn't all that well suited to navigating large object hierarchies, especially when using interfaces for abstraction. There are those who will point to this pain as evidence of SOLID being a bad practice. "I don't want to have to get special tools just to do development." But if tooling really is your enemy then you probably shouldn't be working with a language like C# in the first place, because it already does rely on the many crutches VS offers up. Try writing C# without an IDE and you'll quickly understand why people love the simple and terse syntax of Ruby and other dynamic languages. Saying "well, Visual Studio is as much tooling as I accept, beyond that it's ridiculous" is not an argument I can relate to, so if that's the objection to SOLID, I'll have to admit that I can't convince you.

The issue is just that VS does not provide efficient ways to navigate from class to class, from interface to implementers and from implementers to usage of the interface. This is where ReSharper entered the picture for me, and after adding more keybindings for some of their extended commands, the number of classes and abstractions really becomes a non-issue. I simply couldn't do development in C# without ReSharper at this point and that's more because of Visual Studio's shortcomings than anything else (Eclipse, for example, provides most of the same features I rely on out of the box).

The other pain created by lack of tooling is that the larger surface area of code and the increased abstraction mean that refactorings usually touch more code than before, meaning that refactoring takes more work. However, with proper tooling this also happens automatically. In addition, the flexibility of loose coupling introduced by SOLID generally makes code more pliable to refactoring.

Pain points of TDD

Finally there is the whole concept of TDD itself which to many seems like such a "making work" paradigm. Usually the examples of TDD failure pointed to are fragile tests and lost productivity due to writing tests instead of code.

Fragile tests refers to a simple change breaking a lot of tests and thereby incurring extra work instead of saving work. But if a simple change breaks a lot of tests, it's a clear indication that your design needs another look, because there seems to be coupling that is getting in the way. The only time (ideally at least) that more than one test breaks is when you change the expected behavior of a class, and in that case, refactoring the expected behavior would have included the refactoring of all dependencies, including the tests.

The lost productivity argument only holds water if you are not responsible for extension or maintenance. And all you are doing then is pushing the work that should have been yours in the first place onto the poor sucker that's inheriting your legacy. It has been my experience that any time I find a bug or break something with a new feature, it's because I didn't have test coverage on the affected code. Which means I get to lose productivity when I am likely already under time pressure rather than up front.

Another part of the lost productivity argument usually refers to the amount of code required to test a particular object vs. just using that object in production code. Between DIP for wiring things up and the numerous mocking frameworks available to declare your expectations on dependencies, wiring up your test harness should be short. If there is a lot of plumbing required just to set up the test conditions something's wrong with the design.

Not dogmatic, just providing guidance

Since I arrived at practicing the SOLID principles without being aware of them, I just can't see the dogma or bureaucracy in them. The recommendation in those 5 principles, when taken as a whole, is about making your life easier, not forcing some philosophy down your throat.

If someone had put them in front of me and told me "this is the way you must code, because good programmers do this", I'd likely be dismissive as well. Being thrown into implementing some process from a whitepaper without having seen it in practice and understood why it was useful has a high likelihood of leading to improper implementation, which makes the presumed failure a self-fulfilling prophecy. However, if taken as guidance, there is a lot of useful information there that can make projects, especially large projects, a lot less painful.


Saturday, January 10, 2009

db4o 7.4 binaries for mono

As I talked about recently, the standard binaries for db4o have some issues with mono, so I recompiled the unmodified source with the MONO configuration flag. I've packed up both the debug and release binaries and you can get them here. These are just the binaries (plus license). It's not the full db4o package. If you want the full package, just get it directly from the db4o site, set the MONO config flag and have Visual Studio rebuild the package.

This package should show up on the official db4o mono page shortly as well.


Saturday, January 03, 2009

db4o indexing and query performance

Indexing on db4o is a bit non-transparent, imho. There's barely a blurb in their Documentation app and it just tells you how to create an index and how to remove it. But you can't easily inspect whether one exists, or whether it's being used. So I spent a good bit of time today trying to figure out why my queries were so slow: was an index created and, if so, was it being used? The final answer is, if querying is slow in db4o, you're not using an index, because, OMG, you'll know when you do an indexed query.

Index basics

Given an object such as
public class Foo
{
  public string Bar;
}
you create an index, globally (meh) for that object on all databases you create thereafter, with this call:
Db4oFactory.Configure().ObjectClass(typeof(Foo)).ObjectField("Bar").Indexed(true);
So far, straight forward enough. But let's say you're using a property? Well, db4o does its magic by inspecting your underlying storage fields, so you have to index them, not the properties that expose them. That means if our object was supposed to have a readonly property Bar, like this:
public class Foo
{
  private string bar;
  public Foo(string bar)
  {
    this.bar = bar;
  }
  public string Bar { get { return bar; } }
}
then the field you need to index is actually the private member bar:
Db4oFactory.Configure().ObjectClass(typeof(Foo)).ObjectField("bar").Indexed(true);
Given this idiosyncrasy, the obvious question is "what about automatic properties?". Well, as of right now the answer is, no such luck, because you'd have to reflect the underlying storage field that is created and index it, and you don't get any guarantees that field is named the same from compiler to compiler or version to version. That probably also means, that automatic properties are dangerous all around, because you may never get your data back if the storage changes, although on that conclusion i'm just speculating wildly.
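To see what the compiler actually generates for an automatic property, you can list a type's private instance fields with plain reflection. This is only a sketch for poking around: AutoFoo and BackingFieldInspector are hypothetical names, and nothing here touches db4o itself.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// A class with an automatic property and no explicitly declared field.
public class AutoFoo
{
    public string Bar { get; set; }
}

public static class BackingFieldInspector
{
    // Returns the names of all non-public instance fields on a type,
    // which includes compiler-generated backing fields.
    public static List<string> PrivateFieldNames(Type type)
    {
        var names = new List<string>();
        FieldInfo[] fields = type.GetFields(BindingFlags.NonPublic | BindingFlags.Instance);
        foreach (FieldInfo field in fields)
        {
            names.Add(field.Name);
        }
        return names;
    }
}
```

Calling BackingFieldInspector.PrivateFieldNames(typeof(AutoFoo)) returns a single compiler-chosen name. With Microsoft's compiler it typically looks like <Bar>k__BackingField, but that is an implementation detail rather than a contract, which is the reason indexing it by name is risky.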

Query performance

Index in hand, I decided to populate a DB, always checking whether the item already existed, using a db4o native query. That started at 1 ms query time and then linearly increased with every item added. That sure didn't seem like an indexed search to me. I finally discovered a useful resource on the db4o site, but unfortunately it's behind a login, so Google didn't help me find it and my link to it will only take you to the login. That's a shame because this bit of information ought to be somewhere in big bold letters! You must have the following DLLs available for Native Queries to be optimized into SODA queries, which apparently is the format that hits the index: The query will execute fine, regardless of their presence, but the performance difference between the optimized, index-using query and the unoptimized native query is orders of magnitude. My queries went from 100-500ms to 0.01ms, just by dropping those DLLs into my executable directory. Yeah, that's a useful change.

Interestingly enough, the same is not required for linq queries. They seem to hit the index without the extra help (although just to even run, Mono.Cecil and Cecil.FlowAnalysis need to be present, so here you at least get an error). There currently appears to be about 1ms overhead for parsing linq into SODA, but i'll take that hit for the syntactic sugar.

Conclusions

I'm pretty happy with the simplicity and performance of db4o so far. It seems like an ideal local, queryable persistence layer. The way it works does make me want to abstract my data model into simple data objects that are then converted into business entities. I'd rather have the attribute based markup of ActiveRecord, but that's not a deal breaker.


Friday, January 02, 2009

Db4o on .NET and Mono

After failing to get a cross-platform sample of NHibernate/Sqlite going, I decided to try out Db4o. This is for a simple, local object persistence layer anyhow, nothing more than a local cache, so db4o sounded perfect.

The initial DLLs for 7.4 worked beautifully on .NET but ran into problems on Mono. Apparently db4o imports FlushFileBuffers from kernel32.dll if your build target is not CF or mono. And in its call to FlushFileBuffers it uses FileStream.SafeFileHandle.DangerousGetHandle() which is not yet implemented under Mono, resulting in this exception:

Unhandled Exception: System.NotImplementedException: The requested feature is not implemented.
  at System.IO.FileStream.get_SafeFileHandle () [0x00000]
  at Sharpen.IO.RandomAccessFile.Sync () [0x00000]
  at Db4objects.Db4o.IO.RandomAccessFileAdapter.Sync () [0x00000]
  ...
I found this page on the Db4o site, which suggested just falling back to FileStream.Handle. However, that for me just resulted in this:
Unhandled Exception: System.EntryPointNotFoundException: FlushFileBuffers
  at (wrapper managed-to-native) Sharpen.IO.RandomAccessFile:FlushFileBuffers (intptr)
  at Sharpen.IO.RandomAccessFile.Sync () [0x00000]
  at Db4objects.Db4o.IO.RandomAccessFileAdapter.Sync () [0x00000]
  ...
So, I simply defined MONO as a compilation symbol in Visual Studio and rebuilt it. I figure the only time this code will run on Windows is during testing, so treating it as mono is fine. And that did solve my issues and I now have a DLL for db4o 7.4 that works beautifully across .NET and mono from a single build.

Being a Linq nut, I immediately decided to skip the Native Query syntax and dive into using the Linq syntax instead. Which worked great on mono 2.0.1, but unfortunately on the current Redhat rpm (stuck back in 1.9.1 land), the Linq implementation isn't quite complete and you get this:

Unhandled Exception: System.NotImplementedException: The requested feature is not implemented.
  at System.Linq.Expressions.MethodCallExpression.Emit (System.Linq.Expressions.EmitContext ec) [0x00000]
  at System.Linq.Expressions.LambdaExpression.Emit (System.Linq.Expressions.EmitContext ec) [0x00000]
  at System.Linq.Expressions.LambdaExpression.Compile () [0x00000]
  at Db4objects.Db4o.Linq.Expressions.SubtreeEvaluator.EvaluateCandidate (System.Linq.Expressions.Expression expression) [0x00000]
  ...
But falling back from this syntax:
var items = from RosterItem r in db
            where r.CreatedAt > DateTime.Now.Subtract(TimeSpan.FromMinutes(10))
            select r;
to the NativeQuery syntax (with delegates replaced by lambdas) works fine:
var items = db.Query<RosterItem>(r => r.CreatedAt > DateTime.Now.Subtract(TimeSpan.FromMinutes(10)));
It's still a fairly compact and straightforward syntax, so until I finish setting up my own CentOS mono RPMs, I'll stick to this syntax.

I need to run db4o through some more serious paces, but I like what I see so far.


Saturday, December 27, 2008

Comparison with default(T)

I was working on a generic class that I had limited to where T : class because I wanted to use null as a valid return value. Well, then I needed to use the class on Guid, which is a value type. So I replaced all return null with return default(T). That was fine except for my Enumerator which used null to yield break out of the iteration. Unfortunately
  if(t == default(T)) {
    yield break;
  }
wasn't legal either. Then i thought, how about
  if(t.Equals(default(T))) {
    yield break;
  }
which compiles just fine, but of course throws a NullReferenceException when T is a reference type, since I am after all looking for a null value. After some digging I finally came across the solution:
  if(Comparer<T>.Default.Compare(t, default(T)) == 0) {
    yield break;
  }
and that did the trick.
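An alternative worth knowing about is EqualityComparer<T>.Default, which only asks "are these equal?" rather than "how do these order?". The sketch below assumes that equality is all the enumerator needs. DefaultCheck, IsDefault and TakeUntilDefault are hypothetical names of my own.

```csharp
using System.Collections.Generic;

public static class DefaultCheck
{
    // True when value equals default(T): null for reference types,
    // the zeroed value (e.g. Guid.Empty) for value types.
    public static bool IsDefault<T>(T value)
    {
        return EqualityComparer<T>.Default.Equals(value, default(T));
    }

    // Yields items until the first default(T) is encountered.
    public static IEnumerable<T> TakeUntilDefault<T>(IEnumerable<T> source)
    {
        foreach (T item in source)
        {
            if (IsDefault(item))
            {
                yield break;
            }
            yield return item;
        }
    }
}
```

The difference matters because Comparer<T>.Default is meant for ordering and can throw for types that do not implement IComparable, whereas the equality comparer falls back to Object.Equals and handles null on either side.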


Sunday, November 02, 2008

Saying LINQ is about databases is missing its true benefits

Just came across a long discussion about LINQ in Java on the ODBMS blog (thanks to Miguel's tweet). There is some excellent discussion in there, but aside from a couple of people the discussion seemed to largely center on
  1. It's from MS so it's a bad idea to copy, and
  2. I don't think it adds anything over normal SQL syntax

The first is an unfortunate dismissal of a very powerful functional language construct because of its origin. And the second illustrates that the commenter does not truly understand what LINQ brings to the language in the first place.

Of course, I'd bet that 90% of .NET developers, if polled, would also equate LINQ with "type-safe SQL in the language", so this isn't a dig against Java people. Hopefully as Parallel LINQ gains some traction, this simplification will lose its hold on people.

LINQ or Language INtegrated Query is really a functional way of expressing operations on collections. And if you decompose a lot of code, anywhere you are using loops to manipulate collections, LINQ is likely to create a more concise and powerful expression for the same operation. And being functional, the implementation of how LINQ does this is opaque to the caller. The caller simply describes what operations should be done on the data sources, allowing for optimization of the operations based on the data sources. That means they could be in a database, they could come from XML, or REST calls or simply exist as an in-memory object graph. But none of these things change the transformations desired.
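As a small illustration of the loop-versus-LINQ point, here is the same filter, transform and sort written both ways against an in-memory collection. NameQueries and its methods are hypothetical names for this sketch only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameQueries
{
    // Loop version: temporary list, explicit filtering, then a sort.
    public static List<string> LongNamesLoop(IEnumerable<string> names)
    {
        var result = new List<string>();
        foreach (string name in names)
        {
            if (name.Length > 3)
            {
                result.Add(name.ToUpperInvariant());
            }
        }
        result.Sort(StringComparer.Ordinal);
        return result;
    }

    // LINQ version: describes what should happen, not how to iterate.
    public static List<string> LongNamesLinq(IEnumerable<string> names)
    {
        return names
            .Where(name => name.Length > 3)
            .Select(name => name.ToUpperInvariant())
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

Both methods return the same list for the same input. No database is involved anywhere, which is the point: the query operators work on any IEnumerable<T>.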

I've only used LINQ once for SQL, although I use LINQ to objects, i.e. against IEnumerable<T>, almost daily and it's done away with a lot of foreach with temporary variables, temporary lists, etc. But even that scenario against the apparently now deprecated DLINQ or Linq2Sql illustrated how it wasn't just about replacing SQL with type-safe syntax, it allowed me to use one syntax for both database and local operations.

This project included doing a bunch of analytical processing against the data, including projection combining a number of data sources. Not all of this was expressible in SQL and some of it wasn't a wise use of live DB queries, and performing the additional work in memory was a lot more efficient. Traditionally this type of work would exhibit a fairly obvious syntactic break between the local and the DB operations. And moving some part (say a sort of a sub-set) from SQL to local or vice versa would be a significant re-write. But using LINQ, the syntax was identical; it was merely a matter of deciding at what point the query should be turned into concrete data vs. a cursor against the DB. This is simply done by turning any IEnumerable into a List, forcing immediate execution of the query represented by the IEnumerable source. Either way, local or remote, the syntax stayed the same and the power over where processing should happen was in my hands.
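The "turn it into a List" step can be sketched with nothing but LINQ to objects. In this hypothetical example (DeferredDemo and its members are my own names) a counter stands in for the expensive source, such as a round trip to the database, so you can see when the source is actually read:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the source was actually enumerated.
    public static int SourceReads;

    // Iterator method: the body does not run until enumeration starts.
    private static IEnumerable<int> Numbers()
    {
        SourceReads++;
        for (int i = 1; i <= 5; i++)
        {
            yield return i;
        }
    }

    // Returns the SourceReads value observed at four points in time.
    public static int[] Run()
    {
        SourceReads = 0;

        // Building the query reads nothing yet.
        IEnumerable<int> query = Numbers().Where(n => n % 2 == 1);
        int afterBuildingQuery = SourceReads;

        // ToList() forces execution and captures concrete data.
        List<int> materialized = query.ToList();
        int afterToList = SourceReads;

        // Working on the list does not touch the source again.
        int localSum = materialized.Sum();
        int afterLocalWork = SourceReads;

        // Enumerating the query itself goes back to the source.
        int requeriedSum = query.Sum();
        int afterRequery = SourceReads;

        return new[] { afterBuildingQuery, afterToList, afterLocalWork, afterRequery };
    }
}
```

Everything before the ToList() call is still just a description of work, and everything after it operates on data already in memory. Moving that one call up or down the chain is what shifts processing between the source and local code.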

I do support the goal of getting something akin to LINQ in Java, but I sure hope they don't attack the problem by creating some DB-centric query DSL. The greatest benefit of LINQ in .NET, imho, is that instead of hacking its syntax into the language, it is built on the building blocks of anonymous delegates, lambda expressions, anonymous types, var syntax and object initializers. Each one of these pieces is a fundamental part of C# 3.0 and can be used independently, but together they allow LINQ to exist. Discussion of common fluent APIs for databases or whole new languages like SBQL misses the benefit of "Language Integrated" in LINQ.

Going to be interesting to follow how this evolves, since I personally think that LINQ is one of the key differentiators between Java and C# that isn't just syntactic sugar (even though many think it is just that).


Tuesday, October 28, 2008

Dream access control

Just finished an article over on the MindTouch blog about tweaking Dream's default access patterns. I really like how Dream uses cookies, something you don't often see in REST services. Generally it's all about X-My-Cool-Auth-Header business, which is yet another manual burden for developers. Not sure if this originated because people did raw http requests and either didn't know that most http request mechanisms have cookie support (even curl has a cookie jar), or whether it was a dislike of cookies.

The article also briefly touches on Prologues and Epilogues, a topic I need to go into in more detail some time in the future. Basically every Feature call can have n pre and post actions that can do anything from checking authentication to mutating the request (think accepting data in json or Xml and having a prologue and epilogue do transformations on the way in and out so that the feature itself doesn't have to worry about the data format but can assume that it always gets Xml). The system kind of reminds me of apache handler chaining from mod_perl.
