Skip to content

2010

Promise: Inversion of Control is the new garbage collection

Before continuing with additional forms of method defintions, I want to take a detour through the Inversion of Control facilities, since certain method resolution behavior relies on those facilities. IoC is one feature of Promise that is meant to not be seen or even thought about 99% of the time, but when you need to manipulate its default behavior it is a fairly broad topic, which I will cover in the next 3 or 4 posts. If you want to see code, you probably want to just go to the next post, since this one is mostly about the reasoning for inclusion of IoC in the language itself.

The evolution of managing instances

Not too long Garbage Collection was considered the domain of academic programming. My first experience with it was writing LISP on Symbolics LISP machines. And while it was a wonderful development experience, you got used to the Listener (think REPL on LISP machines) to pause and the status Genera status bar blinking with (garbage-collect). Ok, but that's back on hardware significantly less powerful than my obsolete Razor flip-phone.

These days garbage collection is pretty much a given. The kind of people that say you have to use C to get things done are the same kind of people that used to say that you have to use assembly to get things done, i.e. they really are talking about edge cases. Even games are using scripting languages for much of their game logic these days. Whether it's complex generational garbage collection or simple reference counting, most languages are memory managed at this point.

The lifespan phases of an instance

But still we have the legacy of malloc and free with us. We still new-up instances and while there's fairly little use of destructors, we still run into scenarios that require decomissioning of objects before garbage collection gets rid of them. And while on the subject of construction and destruction, we're still manually managing the lifespan from creation to when we purposely let them drop out of scope so GC can do its magic.

Somehow while moving to garbage collection so that we don't have to worry about that plumbing, we kept the plumbing of manually handling construction, initialization and disposal. That doesn't seem like work related to solving the task at hand, but rather more like ceremony we've grown used to. We now have three phases in an instance lifespan, only one of which is actually useful to problem solving:

Construction/Initialization

Depending on which language you are using, this might be a single constructor stage (Java, C#, Ruby, et al) or an allocation and initialization stage (Smalltalk, Objective-C, et al). Either way, you do not want your code to start interacting with the instance until these stages are completed

Operation

This is the useful stage of the instance, when it actually can fullfill its purpose in our program. This should really be the only stage we ever need to see.

Disposal/Destruction

We're done with the instance, so we need to clean up any references it has and resources it has a hold of and let the garbage collector do the rest. By definition, it has passed its useful life and we just want to make sure it's not interfering with anything still executing.

Complicating disposal is that most garbage collected languages have non-deterministic destructors, which are not invoked until the time of collection and may be long after use of the instance has ceased. Since there are scenarios where clean-up needs to happen in a deterministic fashion (such as closing file and network handles), C# added the IDisposable pattern. This pattern seems more like a "oh, crap, what do we do about deterministic cleanup?" add-on than a language feature. It completely puts the onus on the programmer both for calling .Dispose (unless in a using block) and for handling access to an already disposed instance.

Enter Inversion of Control

For the most part, all we should care about is that when we want an instance with certain capabilities, we should be able to get access to one. Who cares if it was freshly constructed or a shared instance or a singleton or whatever. Those are details that are important but once defined not part of the user story we set out to satisfy.

In Java and C#, this need for pushing instance management out of the business logic and into dedicated infrastructure led to the creation of Inversion of Control containers, named thus because they invert the usual procedural flow of "create an object, hand it to another object constructor as a dependency, etc." to "ask for the object you need and the depedency chain will be resolved for you". There are numerous articles on the benefits of Dependency Injection and Inversion of control. One of the simplest explanation was given by John Munch to the Stackoverflow question "How to explain Dependency Injection to a 5-year-old":

When you go and get things out of the refrigerator for yourself, you can cause problems. You might leave the door open, you might get something Mommy or Daddy doesn't want you to have. You might even be looking for something we don't even have or which has expired.

What you should be doing is stating a need, "I need something to drink with lunch," and then we will make sure you have something when you sit down to eat.

But IoC goes beyond the wiring-up of object graphs that DI provides. It is also responsible for knowing when to hand you a singleton vs. a shared instance for the current scope vs. a brand new instance and handles disposal of those instance as their governing scopes are exited.

These frameworks are build on top of the existing constructor plumbing and use reflection to figure out how to take over the tasks that used to fall to the programmer. For Promise this plumbing is considered a natural extension of what we already expect of garbage collection and tries to be automatic and invisible.

By default every "constructor" access to an instance resolves the implicit Type to the Class of the same name, and creates an instance, i.e. behavior as you expect from OO languages. However, using nested execution scopes, lifespan management and Type mapping, this behavior can be modified without touching the business logic. In the next post, I'll start by explaining how the built in IoC works by tackling Type/Class mapping.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.

Promise: Constructor revisionism

Only 3 posts into the definition of the language and already I'm changing previously published specs. Well, that's the way it goes.

I'm currently writing the article about language level IoC which I eluded to previously, but the syntax effects I had not fully considered yet. The key concept, tho, is that there is no construction, there is only instance resolution, which .new being a call on the Type not Class hinted at. But that does mean that what you get does not necessarily represent a new instance.

And beyond naming implications, the implications of what arguments passed into the resolution call mean is also ambiguous. The could be initialization values or they could be arguments to determine which instance of that Type to fetch (like in a Repository pattern). And if that's the case, the overloading this process becomes tricky as well, since it should access the super class, which means it only makes sense in the construction paradigm.

Basically lots of syntactic implications I'm working through right now. The only thing that is certain is that .new will not make it through that review process.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.

Promise: Defining Types and Classes

Once you get into TDD with statically typed languages you realize you almost always want to deal with interfaces not classes, because you are almost always looking at two implementations of the same contract: the real one and the Mock. If you are a ReSharper junkie like myself, this duplication happens without giving it much thought, but it still is a tedious verbosity of code. Add to that that prototyping and simple projects carry with them syntactic burden of static tyypes (again, a lot less so for us ReSharper afflicted) and you can understand people loving dynamic languages. Now, I personally don't want to use one language for prototyping and smaller projects and then rewrite it in another when the codebase gets complex enough that keeping the contract requirements in your head becomes a hinderance to stability.

Promise tries to address these problems by being typed only when the programmer feels it's productive and by having classes automatically generate Types to match them without tying objects that want to be cast to the type of be instantiated by the class. This means that as soon as you create a class called Song, you have access to a Type called Song whose contract mirrors all non-wildcard methods from the class. Since the Type Song can be provided by any instance that services the contract, just taking an object, attaching a wildcard method handler to it creates a mock Song.

A diversion about what would generally be called static methods

The class model of Promise is a prototype object system inspired by Ruby/Smalltalk/javascript et al. Unlike class based languages, a class is actually a singleton instance. This means that a "static" method call is a dispatch against an instance method on that class singleton, so it would be better described as a Class Method.

But even that's not quite accurate. Another difference I eluded to in my Lambda post: Whenever you see the name Song in code and it's not used to change the definition of the class, it's actually the Type Song. So what looks like a call to the Class Method is a dispatch to the singleton instance of the class that is currently registered to resolve when a new instance for the Type Song is created. So, yes, it's a Class Method but not on the Class Song, but on the class that is the default provider for the Type Song.

If you've dealt with mocking in static languages, you are constantly trying to remove statics from your code, because they're not covered by interfaces and therefore harder to mock. In Promise, Class Methods are instance methods on the special singleton instance attached to an object. And since calling that methods isn't dispatched via the class but the implicit or explicit type, Class methods are definable in a Type.

Type definition

Type definitions are basically named slots followed by the left-hand-side of a lambda:

type Jukebox {
  PlayAll:();
  FindSongByName:(name|Song);
  Add:(Song song);
  ^FindJukeboxWithSong:(Song song|Jukebox);
}

One of the design goals I have with Promise is that I want keep Promise syntax fairly terse. That means that i want to keep a low number of keywords and anything that can be constructed using the existing syntax should be constructed by that syntax. That in turn means that I'm striving to keep the syntax flexible allow most DSL-like syntax addition. The end result, I hope is a syntax that facilitates the straddling of the dynamic/static line to support both tool heavy IDE development and quick emacs/vi coding. Here, instead of creating a keyword static, I am using the caret (^) to identify Class methods. The colon (:), while superfluous in Type definitions, is used to disambiguate method invocation from definition, when method assignment happens outside the scope of Type or Class definitions.

Attaching Methods to Instances

You don't have do define a class to have methods. You can simply grab a new instance and start attaching methods to it:

// add a method to a class
Object.Ping:() { println "Ping!"; };

// create blank instance
var instance = Object.new;

// attach method to instance
instance.Say:(str) {
  println "instance said '{str}'";
};

instance.Say("hello"); // => instance said 'hello'
instance.Ping(); // => Ping!

A couple of things of note here:

First, the use of the special base class Object from which everything derives. Attaching methods to Object makes them available to any instance of any class, which means that any object created now can use Ping().

Second, there is the special method .new, which creates a new instance. .new is a special method on any Type that invokes the languate level IoC to build up a new instance. It can be called with the JSON call convention, which on a Object will initialize that object with the provided fields. If an instance from a regular class is instantiated with the JSON call convention, then only matching fields in the json document are initialized in the class, all others are ignored. I'll cover how JSON is a first class construct in Promise and used for DTOs as well as the default serialization format in another post.

Last, the method Say(str) is only available on instance, not on other instances created from Object. You can, however call instance.new to get another instance with the same methods as the prior instance.

Defining a Class

Another opinionated style from Ruby that I like is the use prefixes to separate local variables from fields from class fields. Ruby uses no prefix for local, @ for fields (instance variables) and @@ for class fields. Having spent a lot of time in perl, @ still makes me think of arrays, so it's not my favorite choice of symbol, but I'd prefer it over the name collision and this.foo disambiguation of Java/C#.

Having used a leading underscore ( _ ) for fields in C# for a while, I've opted to use it as the identifying prefix for fields in Promise. In addition, we already have the leading caret as the prefix for Class Methods, so we can use it for Class Fields as well.

class Song {

  // Class Field
  Num ^songCount;

  // Class Method
  TotalSongs:(|Num) { ^songCount; };

  // Fields
  _name;
  _filename;
  Stream _stream;

  // public Method
  Play:() {
    CheckStream();
    _driver.Read(_stream);
  };

  // protected Method
  _CheckStream:() { ... };
}

Just as the Caret is used both for marking Class Fields and Methods, underscore does the same for Methods and Fields: While Fields can only be protected, methods can be public or protected -- either way underscore marks the member as protected.

The method definition syntax is one of assigning a lambda to a named slot, such that :. The aspect of this behavior that is different from attaching functions to fields in JSON constructs in javascript is that these slots are polymorphic. In reality the slots are HashSets that can take many differeny lambda's as long as the signature is different. If you assign multiple lambdas with same signature to a single slot, the last one wins. This means that not only can you attach new methods to a class after definition, you can also overwrite them one signature at a time.

More on instance construction

Although .new is special, it is a Class Method, so if a more complex constructor is needed or the constructor should do some other initialization work, an override can easily be defined. Since Class Methods are reflected by types, this means that custom constructors can be part of the contract. The idea is that if the construction requirements are stringent enough for the class, they should be part of the Type so that alternative implemenentations can satisfy the same contract.

Song.^new:(name) {
  var this = super.new;
  this._name = name;
  return this;
}

An override to .new is just a Class method not a constructor. That means that super.new has to be called to create an instance, but since the method is in the scope of the Class definition, the instance's fields are accessible to override. There is no this in Promise, so the use of this in the above example is just a variable used by convention, similar to the perl programmer's usage of $self inside of perl instance subs.

But wait, there is more!

There are a number of special method declarations beyond simple alphanumerically named slot, such as C# style setters, operators, wildcards, etc. But there is enough detail in those cases, that I'll save that for the next post.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.

Promise: Lambdas

Lambdas in Promise, like other languages, are anonymous functions and first-class values. They act as closures over the scope they are defined in and can capture the free variables in their current lexical scope. Promise uses lambdas for all function definitions, going further even than javascript which has both named and anonymous functions. In Promise there are no named functions. Just slots that lambdas get assigned to for convenient access.

Straddling the statically/dynamically typed divide by allowing arguments and return values to optionally declare a Type, Promise mimicks C# lambda syntax more than say, LISP, javascript, etc. A simple lambda example looks like this:

var i = 0;
var incrementor = { ++i; };
print incrementor(); // => 1

This declaration doesn't have any input arguments to declare, so it uses a shortform of simply assigning a block to a variable. The standard form uses the lambda operator =>, so the above lambda could just as well be written as:

var incrementor = () => { ++i; };

I'm currently debating whether I really need the =>. It's mostly that i'm familiar with the form from C#. But given that there are no named functions, parentheses followed by a block can't occur otherwise, so there is no ambiguity. So, i'm still deciding whether or not to drop it:

var x = (x,y) => { x + y };
// vs.
var x = (x,y) { x + y };

Inputs

The signature definition of lambdas borrows heavily from C#, using a left-hand side in parantheses for the signature, followed by the lambda operator. Input arguments can be typed, untyped or mixed:

var untypedAdd = (x,y) => { x + y; };
var typedAdd = (Num x, Num y) => { x + y; };
var mixedtypeAdd = (Num x, y) => { x + y; };

Output

In dynamic languages, lambda definitions do not need a way to express whether they return a value--there is no type declaration so whether or not to expect a value is convention and documentatio driven. In C# on the other hand, a lambda can either not return a value, a void method, which uses one of the Action delegates, or return a value and declare it as the last type in the declaration using the Func delegates. In Promise all lambdas return a value, even if that value is nil (more about the special singleton instance nil later). Values can be returned by explicitly using the return keyword, otherwise it defaults simply to the value of the last statement executed before exiting the closure. Since return values can be typed, we need a way to declare that Type. Unlike C#, our lambdas aren't a delegate signature, so instead of reserving the last argument Type as the return Type, which would be ambiguous, Promise uses the pipe '|' character to optionally declare a return type:

var returnsUntyped = (x,y) => { x + y; };
var returnsTyped = (x,y|Num) => { x + y; };
var explicitReturn = (|Num) => { returnsTyped(1,2); };

Default values

Lambdas can also declare default values for arguments, which can be simple values or expressions:

var simple = (x=2,y=3) => { x + y; };
var complex= (x=simple()) => { x; };

Calling Conventions

Promise supports three different method calling styles. The first is the standard parentheses style as shown above. In this style, optional values can only be used by leaving out trailing arguments like this:

var f = (x=1,y=2,z=3) => { x + y +z; };
print f(2,2,2); // => 6
print f(2,2);   // => 7
print f(4);     // => 9
print f();      // => 6

If you want to omit a leading argument, you have to use the named calling style, using curly brackets, which was inspired by DekiScript. The curly bracket style uses json formatting, and since json is a first-class construct in Promise, calling the function by hand with {} or providing a json value behaves the same, allowing for simple dynamic argument construction:

print f{y: 1};       // => 5
print f{z: 1, y: 1}; // => 3
var args = {z: 5};
print f args;        // => 8

Finally there is the whitespace style, which replaces parentheses and commas with whitespace. This style exists primarily to make DSL creation more flexible:

print f 2 2 2; // => 6 print f 2 2; // => 7 print f 4; // => 9 print f; // => 6

Note the final call simply uses the bare variable f. This is possible because in Promise a lambda requiring no arguments can take the place of a value and accessing the variable executes the lambda. Sometimes it's desirable to access or pass a reference to a lambda, not execute it, in which case the reference notation '&' is needed. Using reference notation on a value is harmless (at least that's my current thinking), since Promise has no value types, so the reference of a value is the value:

var x = 2;
var y = () => { x+10; };
var x2 = &x;
var y2 = &y;
var y3 = y;
x++;
print x2; // => 3;
print y2; // => 13;
print y3; // => 12;

The output of y3 is only 12, because assignment of y3 evaluated y, capturing x before it was incremented.

Closures and Scope

As mentioned above, Lambdas can capture variables from their current scope. Scopes are nested, so a lambda can capture variables from any of the parent scopes

var x = 1;
var l1 = (y) => {
  return () => { x++; y++; x + y; };
};
print l1(2); // => 5
print l1(2); // => 6
print x;     // => 3

Similar to javascript, a block is not a new scope. This is done primarily for scoping simplicity, even if it introduces some side-effects:

() => {
  var x = 5;
  if( x ) {
    var x = 5; // illegal
    var y = 10;
  }
  return y; // legal since the variable declaration was in the same scope
};

Using anonymous functions

As I've said that lambdas are the basic building block, meaning there is no other type of function definition. You can use them as lazily evaluated values, you can pass them as blocks to be invoked by other blocks and as I will discuss next time, Methods are basically polymorphic, named slots defined inside the closure of the class (i.e. capturing the class definition's scope), which is why there is no need for explicitly named functions.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.

Promise: Classes aren't Types

I wanted to start with the basic unit of construction in Promise, the lambda or closure. However, since Promise's lambda's borrow from C#, allowing Type definition in the declaration, I really need to cover the concepts of the promissory type system before i can get into that.

Classes are prototypes

Borrowing from javascript and Ruby and by extension Smalltalk, Promise classes are prototypes that are used to instantiate objects with the characteristics of the class. But those instances are untyped and while they can be inspected to see the originating class, the instances can also be changed with new characteristics making them diverge from that class.

Types are contracts

Similar to Interfaces in C#, Java, etc., Types in Promise describe the characteristics of some object. Unlike those languages, classes do not implement an interface. Rather, they are similar to Interfaces in Go in that any instance that has the methods promised by the Type can be used as that Type. Go, however, is still a typed language where an instance has the Type of its class but can be cast to an Interface, while Promise doesn't require any Type.

Compile time checking of Types

This part I still need to play with syntax to figure out useful and intuitive behavior. Classes aren't static and neither are instances, so checking the Type contract against the instantiation Class' capabilities isn't always the final word, Furthermore, if the object ever is passed without a Type annotation, the compiler can't determine the class to inspect. Finally, classes can have wildcard methods to catch missing method calls. In order to have some kind of compile time checking on the promissory declarations, there will have to be some way to mark variables as promising a Type cast that the compiler will trust, so that Type annotated and pure dynamic instances can work together.

This last part is where the dynamic keyword in C# fails for me, since you cannot cross from dynamic into static without a way of statically declaring the capabilities of an instance. That means that in C# you cannot take a dynamic object and use it anywhere that expects a typed class, regardless of its capabilities. I wrote Duckpond for C# to only address casting classes to interfaces that they satisfy, but I should extend it to proxy dynamic objects as well.

Implicit Types and the Type/Class naming overlap

In order to avoid needless Type declarations, every Class definition generates a matching Type with the same name (based only on its non-wildcard methods) at compile time. If you create a class called Song and a lambda with the signature (Song song) => { ... };, it might look like the Class is the Type. What's really going on though is that the class definition generated a Type called Song that expects an instance with the capabilities of the compile time definition of the Song class.

The shadowing of Classes by Types is possible because Class and Type names do not collide. A class name is only used for Class definition and modification, while Type names are used for signature and variable type. And since the shadowing is implicit, it's also possible to create a Song Type by hand to use instead of the implicit one. This is particularly useful when the Song class has several similar methods handled by a single wildcard method, but those signatures should be captured in the Type.

A final effect of the naming overlap, that I won't get into until I talk about the language level IoC, is that Song.new() is not a call on the class, but on the Type -- this is also an artifact of how Promise treats Class/static methods and how the IoC container resolves instance creation. Can you see how this could be useful for mocking?

Enough with the theory already

Sorry for another dry, codeless post, but I couldn't see getting into the syntax that will use Types and Classes without explaining how Classes and Types interact in Promise first.

Next time, lambdas, lambdas, lambdas.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.

I made this half-pony half-monkey monster to please you

I made this half-pony half-monkey monster to please you But I get the feeling that you don't like it What's with all the screaming? You like monkeys, you like ponies Maybe you don't like monsters so much Maybe I used too many monkeys Isn't it enough to know that I ruined a pony making a gift for you?

Jonathan Coulton - Skullcrusher Mountain

I'm primarily a C# developer these days, but tinker in various other languages to see what others are doing better or worse. I am clearly biased towards static languages, despite lots of features I do like in various dynamic languages. This isn't some fear of the unknown, it's preference from extensive experience -- before switching to C# by way of Java, I was a perl developer for about 8 years. But I'm not trying to start yet another static vs. dynamic flame war here.

Playing with lots of languages got me thinking about what i'd ideally like to see in a programming language. So over the next couple of posts are going to be largely a thought experiment in designing that language. I do this not because i think there aren't enough languages, but to gain getter insights into how I work, how language design works and what goes into good usability design.

The conclusion of these posts may be that I find a language that already offers what I am looking for or that I start prototyping the new language in the DLR, since that's another skill i want to expand on.

Promise

The primary feature of this language, which I'm calling Promise for now, is that it's a dynamic language with a duck-typing system to allow declaration of contracts both for discoverability and compile time (AOT or JIT) verification, but without forcing strict class inheritance hierarchies on the programmer. Its mantra is "I promise that you will get an instance that does what you expect it to do".

Below are the top-level features I find desirable:

Optional Type Annotation

Some things can be completely dynamic, but I find it a lot more useful if a method can express the type of instance it expects, rather than having that information out-of-band in the documentation. As I said, I want it to be pure duck-typing: Classes aren't types and Types aren't classes, a class can be cast to a type if it satisfies the contract. This attaching the Types at the consumption rather than the declaration side I've previously written about and it's one thing i really like about Go. I want types to aid in discovery and debugging, not be the yoke it can become in purely statically typed languages

Runs in a virtual machine

I like virtual machines that can host many languages. It allows you to write in your favorite language but still take advantage of features and libraries written in other languages without complicated and platform specific interop stories. It took a while for this promise to come to fruit but both the JVM and CLR are turning into pretty cool polyglot ecosystems. So the language must run in one of those two environments (pending the emergence of another VM that has as much adoption).

Just-in-Time compilation

While I am a big fan of compiling and deploying bytecode, for rapid development and experimentation, I really want the language to be able to run either as bytecode or as loose files that are compiled just in time.

Object-Orientation

I will continue to structure logical units as work as classes with state and methods and pass around complex data in some kind of entity for a fair amount of the work I do. So organizing code in classes that are instantiated is a fundamental building block I rely oh.

Lambdas

But just as much as object-orientation is useful for organization of responsibilities, defining anonymous functions with closures for callbacks, continuations, etc. are another essential to way I code and a fundamental building block for asynchronous programming.

Mix-ins

While C# 3.5 extension methods are a decent way of extending existing classes and attaching functionality to interfaces, I would prefer the mix-in style of attaching functionality to a class or instance without going down the multiple inheritance path.

Language level Inversion of Control

Having to know how to construct dependency hierarchies by hand or knowing about and managing the lifetime scopes of instances is, imho, an imperative programming concept that just needlessly complicates things. I just want to get an instance that I can work with and not have to deal with a myriad of constructors or have to know how to initialize a fresh instance otherwise. Nor do I want to deal with knowing when to get my own instance vs. a shared one. All that is plumbing that is common and fundamental enough that it should move into the language itself, rather than having to build an IoC framework that takes over constructors for you, or using Service Location to self-initialize.

With the 10k foot overview out of the way...

From these qualities, I have started to define the syntax for Promise and next time I'll go into some detailed specifications and start working through various syntax examples.

If you want to skip ahead and see my brainstorm notes, they can be found here.

If you think there already is a language that satisfies most if not even all my requirements, I'd love to hear about it as well.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.

Don't blow your stack: C# trampolining

A couple of weeks ago someone posted two links on twitter suggesting the trampolining superiority of Clojure to C#. (Can't find that tweet anymore, and yes, that tweet was what finally motivated me to migrate my blog so i could post again.)

Well, the comparison wasn't really apples to apples. The C# article "Jumping the trampoline in C# – Stack-friendly recursion" is trying to do a lot more than the Clojure article "Understanding the Clojure `trampoline'". But maybe that's the point as well: that with C# people end up building much more complex, verbose machinery when the simple style of Clojure is all that is needed. That determination wanders into the subjective, where language feuds are forged, and i'll avoid that diversion this time around.

What struck me more was that you can do trampolining a lot more simply in C# if we mimic the Clojure example given. Let's start by reproducing the stack overflow handicapped example first:

Func<int, int> funA, funB = null;
funA = n => n == 0 ? 0 : funB(--n);
funB = n => n == 0 ? 0 : funA(--n);

I'd dare say that the Lambda syntax is more compact than the Clojure example :) Ok, ok, the body is artificially small, allowing me to replace it with a single expression, which isn't a realistic scenario. Suffice it to say, you can get quite compact with expressions in C#.

Now, if we call funA(4), we'd get the same call sequence as in Clojure, i.e.

funA(4) -> funB(3) -> funA(2) -> funB(1) -> funA(0) -> 0

And if you, instead, call funA(100000), you'll get a StackOverflowException.

So far so good, but here is where we diverge from Clojure. Clojure is dynamically typed, so it can return a number or an anonymous function that produces that number. We can't do that (lest we return object, _ick), b_ut we can come pretty close.

Bring in the Trampoline

The idea behind the Trampoline is that it unrolls the recursive calls into sequential calls, by having the functions involved return either a value or a continuation. The trampoline simply does a loop that keeps executing returned continuations until it gets a value, at which point it exits with that value.

What we need for C# then is a return value that can hold either a value or a continuation and with generics, we can create one class to cover this use case universally.

public class TrampolineValue<T> {
    public readonly T Value;
    public readonly Func<TrampolineValue<T>> Continuation;

    public TrampolineValue(T v) { Value = v; }
    public TrampolineValue(Func<TrampolineValue<T>> fn) { Continuation = fn; }
}

Basically it's a container for either a value T or a func that produces a new value container. Now we can build our Trampoline:

public static class Trampoline {
    public static T Invoke<T>(TrampolineValue<T> value) {
        while(value.Continuation != null) {
            value = value.Continuation();
        }
        return value.Value;
    }
}

Let's revisit our original example of two functions calling each other:

Func<int, TrampolineValue<int>> funA, funB = null;
funA = (n) => n == 0
  ? new TrampolineValue<int>(0)
  : new TrampolineValue<int>(() => funB(--n));
funB = (n) => n == 0
  ? new TrampolineValue<int>(0)
  : new TrampolineValue<int>(() => funA(--n));

Instead of returning an int, we simply return a TrampolineResult<int> instead. Now we can invoke funA without worrying about stack overflows like this:

Trampoline.Invoke(funA(100000));

Voila, the stack problem is gone. It may require a bit more plumbing than a dynamic solution, but not a lot more than adding type declarations, which will always be the syntactic differences between statically and dynamically typed. But with lambdas and inference, it doesn't have to be much more verbose.

But wait, there is more!

Using Trampoline.Invoke with TrampolineValue<T> is a fairly faithful translation of the Clojure example, but it doesn't feel natural for C# and actually introduces needless verbosity. It's functional rather than object-oriented, which C# can handle but it's not its best face.

What TrampolineValue<T> and its invocation really represent are a lazily evaluated value. We really don't care about the intermediaries, nor the plumbing required to handle it.

What we want is for funA to return a value. Whether that is the final value or lazily executes into the final value on examination is secondary. Whether or not TrampolineValue<T> contains a value or a continuation shouldn't be our concern, neither should passing it to the plumbing that knows what to do about it.

So let's internalize all this into a new return type, Lazy<T>:

public class Lazy<T> {
    private readonly Func<Lazy<T>> _continuation;
    private readonly T _value;

    public Lazy(T value) { _value = value; }
    public Lazy(Func<Lazy<T>> continuation) { _continuation = continuation; }

    public T Value {
        get {
            var lazy = this;
            while(lazy._continuation != null) {
                lazy = lazy._continuation();
            }
            return lazy._value;
        }
    }
}

The code for funA and funB is almost identical, simply replacing TrampolineValue with Lazy:

Func<int, Lazy<int>> funA, funB = null;
funA = (n) => n == 0
  ? new Lazy<int>(0)
  : new Lazy<int>(() => funB(--n));
funB = (n) => n == 0
  ? new Lazy<int>(0)
  : new Lazy<int>(() => funA(--n));

And since the stackless chaining of continuations is encapsulated by Lazy, we can simply invoke it with:

var result = funA(100000).Value;

This completely hides the difference between a Lazy<T> that needs to have its continuation triggered and one that already has a value. Now that's concise and easy to understand.

Migrated to Wordpress

A couple of months ago google deprecated their SFTP publishing from blogger. I hadn't been excited about blogger for a long time, but not doing anything to change things won out over doing something about it. At least until i couldn't post anything anymore. To be fair, google gave me plenty warning to migrate, but I've been pretty buried with development for the Olympic release of MindTouch, so it's not until now that I've migrated my blog over to Wordpress and can finally post again.

I've got a back log of things i've been wanting to write about, so hopefully I actually can make the time to commit those thoughts here in the coming weeks.

NoSQL is the new Arcadia

Recently there's been a lot of people lamenting the sheep like mentality of picking RDBMS (and with it ORMs) as the way to model persistence, without first considering solutions that do not suffer the object-relational impedance mismatch.

Many of the arguments for having to use RDBMS' are easily shot down, such as the relentless requirements for adhoc reporting against production data (If your OLTP and OLAP are the same DB you are doing it wrong™.) But just because the arguments for picking an RDBMS are often ill-considered, the reasons for abandoning it also seem to suffer from some depth of consideration.

Let me be clear that I do my best to stay away from RDBMS' whenever i can. I have plenty of scars from supporting large production DB environments over the years and there are lots of pain points in writing web applications against RDBMS'. I, too, love schema-less, document and object databases. They make so much sense. I rabidly follow the MongoDB and Riak mailing lists and prototype projects with them and others NoSQL tech, such as Db4o. However, following those lists it is clear to me that a) they are still re-discovering lessons painfully learned by RDBMS folks and b) my knowledge of working with these systems when something goes wrong is woefully behind my knowledge of the same for RDBMS.

Pick the best tool

So yes, marvel at the simplicity of mapping your object model to a document model, or even serialize that object graph using an object or graph DB. But don't just concentrate on what they do better for development, ignoring the day-to-day production support issues. Take a minute and see if you can answer these questions for yourself:

Can you troubleshoot performance problems?

Every RDBMS has some kind of profiling tool and process list. And on the ORM side, Ayende's Uberprof is doing a fantastic job of bringing additional transparency to many ORMs. Do you have any similar tools for the alternative persistence layer? Do you know what's blocking your writes, your reads? What's slowing down your map/reduce? What indicies, if applicable, are being hit? And if you're using a sharded setup, profiling just got an order of magnitude more complicated.

What about concurrency on non-key accesses?

Key/value stores are much faster than even primary key hits on RDBMS. And document databases let you store the entire data hierarchy instead of normalizing them across foreign key tables making graph retrieval cheap too.

But as NoSQL goes beyond simple key retrieval with query APIs and map/reduce, concurrency concerns sneak back in along with the query power. Many NoSQL stores are still using single threaded concurrency per node or at least data silos (read: table locking).In RDBMS land, mysql was the last one to solve that and it did it 6-7 years ago.

What tools to you have to recover a corrupted data file?

Another set of tools you are guaranteed to find with any RDBMS are utilities for recovering corrupted data and index files. Or at the very least utilities for extracting data from them in case of catastrophic failure.

With many NoSQL stores using memory mapped files, corruption on power loss or DB crash is not uncommon. Does you persistence choice have ways to recover those files?

What's your backup strategy?

Most DBs have non-blocking DB dumps. Almost all have replication. Both are valid mechanisms.

Some NoSQL stores use replication to address the problem, others seem to punt on it by using redundant data duplication across nodes. But unless your redundant/replica nodes are geographically co-located, it's not the same as being able to go back to a backup on catastrophic loss.

How do you know your replicas are working?

So you say, you don't care if your data gets corrupted or that you can't do live backups, because it all gets replicated to a safe server. Well, much like going back to tape only to discover that your back-up process hasn't actually backed up anything, do you have the tools to ensure that your replicas are up to date and didn't get the corruption replicated into them?

Do your sysadmins share your comfort level?

A lot of these production level and back-up related issues are not even something developers think about, because with the maturity of RDBMS' their maintenance and back-up are often tightly integrated into the sysadmin's processes. If you don't think you need to care about the above questions, chances are you have others doing it for you. And in that case, it's vital that your sysadmins are versed the in NoSQL tool you are choosing before you throw the operations requirements over the wall at them.

The tool you know

Maybe you have all those questions covered for your NoSQL tool of choice. Google, Facebook, LinkedIn do. But likely, you don't. Maybe you don't have them covered for any RDBMS that you know either. But here's the difference: These problems have been tackled in painstaking detail in thousands of RDBMS production environments. So, when you hit a wall with an RDBMS, chances are you can find an answer and get yourself out of that production mess.

The relative novelty and deployment size of most NoSQL solutions means you can't easily fall back on established production experience. Until you have that same certainty when, not if, you face problems in production, you can't really say that you objectively evaluated all choices and found NoSQL to be the superior solution to your problem.

You're an administrator, not THE Administrator

I set up a new dev machine last week and decided to give win7 a try. Most recent dev setup was using win2k8 server and it's still my favorite dev environment. Fast, unobtrusive, things just worked.

Win7 appeared to be a different story, reminding me of the evil days of Vista. I had expected it to be more like Win2k8 server, but it just wasn't. I was trying to be zen about the constant UAC nagging and just get used to the way it wanted me to work. But two days in, it just came to a head and after wasting countless hours trying to work within the security circus it set up, i was ready to pave the machine.

Here's just a couple of things that were killing me:

Can't save into Program Files from the web

Had to save into my documents then move it there. Worse, it told me i had to talk to an administrator about that. I am an administrator!

Can't unzip into Program Files

Same story as above.

Have to whitelist reserve Uri's for HttpListeners and you can't wildcard ports.

This was the final straw, since my unit tests create random port listeners so that the shutdown failures of a previous test doesn't hose the registration of the next.

All these things need administrator privileges. But wait, I am an administrator, so what's going on? It appears that being an administrator is more like being in the sudoers file on unix. I have the right to invoke commands in the context of an administrator, but my normal actions aren't. I tried to work around this with registry hacks, shortcuts set to run as administrator and so on, to try to get things to start-up with administrator privs by default, but Visual Studio 2k8 just refused to play along. You cannot set it up so that you can double-click on a solution and it launch the solution as administrator in Win7. And even if you start VS as administrator, you cannot drag&drop files to it since it's now running in a different context as Explorer.

And if you ask MS Connnect about this you'll find that like anything of value the issue has been closed as "By Design.". Ok, look buddy, just because you designed a horrible user experience doesn't mean the problem can just be dismissed.

But why was win2k8 so much better an experience, a nagging voice kept asking. Turns out that on win2k8, i just run as Administrator. Win7 never gave that option (and you have to do some cmdline foo to enable the account.) Being a unix guy as well, running dev in what is root, just felt distasteful. But distaste or not, it's the key for actually being able to do productive development work in windows. As soon as I became THE Administrator, instead of an administrator, all was smooth again.

Stupid lesson learned.