

Promise: Object notation and serialization

I thought I had only one syntax post left before diving into posts about attempting to implement the language. But starting on a post about method slots and operators, I decided that there was something else I needed to cover in more detail first: the illustrious JSON object.

I've alluded to JSON objects more than a couple of times in previous posts, generally as an argument for lambda calls. Since everything in Promise is a class, JSON objects are a bit of an anomaly. Simply put, they are the serialization format of Promise, i.e. any object can be reduced to a JSON graph, and as such they exist outside the normal class regime. The format is actually closer to BSON, since it retains type information unless serialized to text, and it can be serialized on the wire as either JSON or BSON. So while it looks like JavaScript object notation (JSON), it's really Promise object notation. For simplicity, I'm going to keep calling it JSON, though.

Initialization

Creating a JSON object is the same as in javascript:

var a = {};
var b = [];
var c = { foo: ["bar","baz"] };
var d = { song: Song{name: "ShopVac"} };

The notation accepts hash and array initializers and their nesting, as well as object instances as values. Field names are always strings.

Serialization

The last example shows that you can put Promise objects into a JSON graph, and the object initializer itself takes another JSON object. I explained in "Promise: IoC Type/Class mapping" that passing a JSON object to the Type allows the mapped class constructor to intercept it, but in the default case, it's simply a mapping of fields:

class Song {
  _name;
  Artist _artist;

  Artist:(){ _artist; }
  ...
}

class Artist {
  _name;
  ...
}

var song = Song{ name: "The Future Soon", artist: { name: "Jonathan Coulton" } };

// get the Artist object
var artist = song.Artist;

//serialize the object graph back to JSON
print song.Serialize();
// => { name: "The Future Soon", artist: { name: "Jonathan Coulton" } };

Lacking any intercepts and maps, the initializer will assign the value of name to _name, and when it maps artist to _artist, the typed nature of _artist invokes its initializer with the JSON object from the artist field. Once .Serialize() is called, the graph is reduced to the most basic types possible, i.e. the Artist object is serialized as well. Since the serialization format is meant for passing DTOs, not Types, the type information (beyond fundamental types like String, Num, etc.) is lost at this stage. Circular references in the graph are dropped: any object already encountered during serialization causes the field to be omitted. It is omitted rather than set to nil so that, used as an initializer, it does not set the slot to nil but instead allows the default initializer to execute.

Above I mentioned that JSON field values are typed and showed the variable d set to have an object as the value of field song. This setting does not cause Song to be serialized. When assigning values into a JSON object, they retain their type until they are used as arguments for something that requires serialization or are manually serialized.

var d = { song: Song{name: "ShopVac"} };

// this works, since song is a Song object
d.song.Play(); 

var e = d.Serialize(); // { song: { name: "ShopVac" } }

// this will throw an exception
e.song.Play();

// this clones e
var f = e.Serialize();

Serialization can be called as many times as you want and acts as a clone operation for graphs that have nothing further to serialize. The clone is a lazy operation, making it very cheap: basically a pointer to the original JSON is returned, and it is only fully cloned if either the original or the clone is modified. This means the penalty for calling .Serialize() on a fully serialized object is minimal, making it an ideal way to propagate data that is considered immutable.
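
This copy-on-write behavior isn't Promise-specific. A minimal C# sketch of the idea, with all names hypothetical and a plain dictionary standing in for the object graph:

using System.Collections.Generic;

// Copy-on-write: Serialize() hands out a cheap shared reference and the
// underlying storage is only copied when either side writes to it.
public class CowJson {
    private Dictionary<string, object> _data;
    private bool _shared;

    public CowJson() {
        _data = new Dictionary<string, object>();
    }

    private CowJson(Dictionary<string, object> data) {
        _data = data;
        _shared = true;
    }

    // on an already-serialized graph this is just a pointer copy
    public CowJson Serialize() {
        _shared = true;
        return new CowJson(_data);
    }

    public object this[string key] {
        get { return _data[key]; }
        set {
            if(_shared) { // the first write after sharing pays for the full copy
                _data = new Dictionary<string, object>(_data);
                _shared = false;
            }
            _data[key] = value;
        }
    }
}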

Access and modification

JSON objects are fully dynamic and can be accessed and modified at will.

var x = {foo: "bar"};

// access by dot notation
print x.foo; // => "bar"

// access by name (for programmatic access or access of non-symbolic names)
print x["foo"]; // => "bar"

x.foo = ["bar","baz"]; // {foo: ["bar","baz"]}
x.bar = "baz"; // {bar: "baz", foo: ["bar", "baz"]};

// delete a field via self-reference
x.foo.Delete();
// or by name
x["foo"].Delete();

The reason JSON objects exist as entities distinct from class-defined objects is to provide a clear separation between objects with behavior and data-only objects. Attaching functionality to data should be an explicit conversion from a data object to a classed object, rather than mixing the two, JavaScript-style.

Of course, this dichotomy could theoretically be abused with something like this:

var x = {};
x.foo = (x) { x*x; };
print x.foo(3); // => 9

I am considering disallowing the assignment of lambdas as field values, since they cannot be serialized, thus voiding this approach. I'll punt on the decision until implementation. If lambdas end up as first class objects, the above would have to be explicitly prohibited, which may lead me to leave it in. If, however, I'd have to manually support this use case, I'm going to leave it out for sure.

JSON objects exist as a convenient data format internally and for getting data in and out of Promise. The ubiquity of JSON-like syntax in most dynamic languages and its easy mapping to object graphs makes it the ideal choice for Promise to foster simplicity and interop.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.

IronRuby isn't dead, it's been set free on a farm upstate

Yesterday Jimmy Schementi posted his farewell to Microsoft and with it his thoughts on the future of IronRuby. This made it onto Twitter almost immediately as "IronRuby is dead". This was shortly followed by a number of voices countering with, "No it's not, it's OSS, step up community!". I think that's rather missing the point.

Microsoft needs Ruby more than Ruby needs Microsoft

I'll even go as far as saying Ruby has no need for Microsoft. But Microsoft has a clear problem with attracting a community of passionate developers building the next generation of apps. There is a large, deeply entrenched community building enterprise apps that is going to stay loyal for a long time, but just as Office will not stay dominant as a monolithic desktop app, the future of development for Microsoft needs to be web-based, and there isn't a whole lot of fresh blood coming in that way.

Fresh blood that is passionate and vocal is flocking to Ruby, Scala, Clojure, node.js, etc. Even the vocal developers on the MS stack are pretty much all playing with Ruby or another dynamic language in some capacity or other. Maybe you think that those people are just a squeaky wheel minority, and maybe you are right. But minority or not, they are people who shape the impressions that the next wave of newcomers sees first. It's the bleeding edge guys that pave the road others will follow.

Startup technology decisions are done via peer networks, not by evaluating vendor marketing messages. Instead of trying to attract and keep alpha geeks, Microsoft is pushing technologies like WebMatrix and Lightswitch, as if drag-n-drop/no-code development wasn't an already reviled stereotype of the MS ecosystem.

Some have said that IronRuby is not in the interest of Microsoft. It would just be a gateway drug that makes it even easier to jump the MS ship. Sorry, but that cat's long out of the bag. Ruby's simplicity and ecosystem already make jumping ship as easy as could be. And right now, once they jump ship, with no integration story, they will quickly lose any desire to hold on to their legacy stack.

IronRuby, while no panacea to these problems, at least offered a way for people considering .NET to not have to choose Ruby or .NET. And it had the potential to expose already devoted Ruby fans to what .NET could offer on top (I know that one is a much harder sell). If you look at the Ruby space, Ruby often benefits from having another language backing it up, which makes Ruby front-end with Scala back-end popular. And if IronRuby were competitive, being able to bridge a Ruby front-end with a C# or F# back-end and have the option to stay in-process is a story worth trying to sell.

Let the community foster IronRuby, that's what OSS is all about!

Ok, that sounds idealistic and lovely, but, um, what community are you talking about? The Ruby community? They're doing fine without IronRuby. The .NET community? Well, OSS on .NET is tiny compared to all other stacks.

Most OSS projects are 99% consumers with 1% actual committers. And that's fine, those 99% consumers grow the community and with it the pool of one-percenters increases as well. But that only works if the project has an appeal to grow the 99% pool. It is virtually impossible to reach the bulk of the .NET pool without a strong push from Microsoft. I'm always amazed how many .NET developers I meet are completely oblivious to technology not emanating from Redmond. And these are not just people that stumbled into VB because their Excel macros didn't cut it anymore. There are good, smart developers out there that have been well served by Microsoft and have not felt the need to look outside. For a vendor that's a great captive audience, worth a lot of money, so it's been in the interest of Microsoft to keep it that way. But that also means that for a project to get momentum beyond alpha geeks on the MS stack, it's gotta be pushed by Microsoft. It needs to be a first-class citizen on the platform, built into Visual Studio, etc.

The IronRuby catch-22

IronRuby has the potential to draw fresh blood into the existing ecosystem, but won't unless it's already got a large momentum inside of the .NET ecosystem. Nobody is going to use IronRuby instead of Ruby because it's almost as good. You can't get that momentum without leveraging the captive audience. And you can't leverage that captive audience without real support from within Microsoft. The ball's been in Microsoft's court, but apparently it rolled under a table and has been forgotten.

When cloning isn't faster than copy via serializer

Yesterday I removed Subzero's dependency on Metsys.Little. This wasn't because I have any problem with it. On the contrary, it's a great library and provides really easy serialization and deserialization. But since the whole point of Subzero is to provide very lightweight data passing, I figured cloning was preferable.

My first pass was purely a "get it working" attempt and I knew that I left a lot of performance on the table by reflecting over the type each time I cloned. I didn't think it would be this bad, though. I used two classes for my test, a Simple and a Complex object:

public class Simple {
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Complex {
    public Simple Owner { get; set; }
    public IList<Simple> Friends { get; set; }
}

The results were sobering:

Simple Object:
  Incubator.Clone: 142k/sec
  BinaryFormatter:  50k/sec
  Metsys.Little:   306k/sec
  ProtoBuf:        236k/sec

Complex Object:
  Incubator.Clone: 37k/sec
  BinaryFormatter: 15k/sec
  Metsys.Little:   44k/sec
  ProtoBuf:        80k/sec

As expected, BinaryFormatter is the worst and my Incubator.Clone beats it. But clearly, I either need to do a lot of optimizing, or not bother with cloning, because Metsys.Little and ProtoBuf are far superior. My problem was reflection, so I took a look at MetSys.Litte's code, since it had to do the same things as clone + reading/writing binary. This lead me to "Reflection : Fast Object Creation" by Ziad Elmalki and "Dodge Common Performance Pitfalls to Craft Speedy Applications" by Joel Pobar, both of which provide great insight on how to avoid performance problems with Reflection.

The resulting values are

Simple Object:
  Incubator.Clone: 458k/sec

Complex Object:
  Incubator.Clone: 123k/sec

That's more like it. 1.5x over MetSys.Little on simple and 1.5x over Protobuf on complex. I still have to optimize the collection cloning logic, which should improve complex objects, since a complex, collection-less graph resulted in this:

Complex Object:
  Incubator.Clone: 226k/sec
  BinaryFormatter:  20k/sec
  Metsys.Little:   113k/sec
  ProtoBuf:         82k/sec

Metsys.Little pulled back ahead of ProtoBuf, but Incubator pulled ahead enough that its overall ratio is now 2x. So I should have some more performance to squeeze out of Incubator.

The common lesson from all this is that you really need to measure. Things that "ought to be fast" may turn out to be disappointing. But equally important, imho, is to get things working simply first, then measure to see if optimizing is needed. Just as bad as assuming it's going to be fast is assuming that it's going to be slow and prematurely optimizing.
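
For reference, the numbers above are iterations per second from a simple timing loop. My actual harness isn't shown here, but a minimal version looks something like this (the Incubator.Clone call shape in the usage comment is illustrative):

using System;
using System.Diagnostics;

// Time an action and report iterations per second (the k/sec numbers above).
public static class Bench {
    public static void Measure(string label, int iterations, Action action) {
        action(); // warm up: JIT compilation, reflection caches
        var sw = Stopwatch.StartNew();
        for(var i = 0; i < iterations; i++) {
            action();
        }
        sw.Stop();
        Console.WriteLine("{0}: {1:0}k/sec", label, iterations / sw.Elapsed.TotalSeconds / 1000);
    }
}

// usage: Bench.Measure("Incubator.Clone", 100000, () => Incubator.Clone(complexInstance));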

Reproducing Promise IoC in C#

Another diversion before getting back to actual Promise language syntax description, this time trying to reproduce the Promise IoC syntax in C#. Using generics gets us a good ways there, but we do have to use a static method on a class as the registrar, giving us this syntax:

$#[Catalog].In(:foo).Use<DbCatalog>.ContextScoped;
// becomes
Context._<ICatalog>().In("foo").Use<DbCatalog>().ContextScoped();

Not too bad, but certainly more syntax noise. Using a method named _ is rather arbitrary, I know, but it at least kept it more concise. Implementation-wise there are a lot of assumptions here: this approach forces the use of interfaces for Promise types, which can't be enforced by generic constraints. It would also be fairly simple to pre-initialize the Context with registrations that look for all interfaces IFoo and then find the implementor Foo and register that as the default map, mimicking the default Promise behavior by naming convention instead of Type/Class name shadowing.

Next up, instance resolution:

var foo = Foo();
// becomes
var foo = Context.Get<IFoo>();

This is where the appeal of the syntax falls down, imho. At this point you might as well just go to constructor injection, as with normal IoC. Although you do need that syntax for just plain inline resolution.

And the whole thing uses a static class, so that seems rather hardcoded. Well, at least that part we can take care of: If we follow the Promise assumption that a context is always bound to a thread, we can use [ThreadStatic] to chain context hierarchies together so that what looks like a static accessor is really just a wrapper around the context thread state. Given the following Promise syntax:

context(:inner) {
  $#[Catalog].In(:inner).ContextScoped;
  $#[Cart].ContextScoped;

  var catalogA = Catalog();
  var cartA = Cart();

  context {
    var catalogB = Catalog(); // same instance as catalogA
    var catalogC = Catalog(); // same instance as A and B
    var cartB = Cart(); // different instance from cartA
    var cartC = Cart(); // same instance as cartB
  }
}

we can write it in C# like this:

using(new Context("inner")) {
  Context._<ICatalog>().In("inner").ContextScoped();
  Context._<ICart>().ContextScoped();

  var catalogA = Context.Get<ICatalog>();
  var cartA = Context.Get<ICart>();

  using(new Context()) {
    var catalogB = Context.Get<ICatalog>(); // same instance as catalogA
    var catalogC = Context.Get<ICatalog>(); // same instance as A and B
    var cartB = Context.Get<ICart>(); // different instance from cartA
    var cartC = Context.Get<ICart>(); // same instance as cartB
  }
}

This works because Context is IDisposable. When we new up an instance, it takes the current threadstatic and stores it as its parent and sets itself as the current. Once we leave the using() block, Dispose() is called, at which time we set the current context's parent back as current, allowing us to build up and un-roll the context hierarchy:

public class Context : IDisposable {

  ...

  [ThreadStatic]
  private static Context _current; // the context active on the current thread

  private Context _parent;

  public Context() {
    // capture the active context as our parent and make ourselves current
    if(_current != null) {
      _parent = _current;
    }
    _current = this;
  }

  public void Dispose() {
    // restore our parent as the active context
    _current = _parent;
  }
}

I'm currently playing around with this syntax a bit more and using Autofac inside the Context to do the heavy lifting. If I find the syntax more convenient than plain Autofac, I'll post the code on GitHub.

Freezing DTOs by proxy

In my previous post about sharing data without sharing data state, I proposed an interface for governing the transformation of mutable to immutable and back for data transfer objects (DTOs) that are shared via a cache or message pipeline. The intent was to avoid copying the object when not needed. The interface I came up with was this:

public interface IFreezable<T> where T : class {
    bool IsFrozen { get; }

    void Freeze();
    T FreezeDry();
    T Thaw();
}

The idea being that a mutable instance could be made immutable by freezing it and receivers of the data could create their own mutable copy in case they needed to track local state.

  • Freeze() freezes a mutable instance and is a no-op on a frozen instance
  • FreezeDry() returns a frozen clone of a mutable instance or the current already frozen instance. This method would be called by the container when data is submitted to make sure the data is immutable. If the originator froze its instance, no copying is required to put the data into the container
  • Thaw() will always clone the instance, whether it's frozen or not, and return an unfrozen instance. This is done so that no two threads accidentally get a reference to the same mutable instance.

Straightforward behavior, but annoying to implement on objects that really should only be data containers and not contain functionality. You either have to create a common base class or roll the behavior for each implementation. Either is annoying and a distraction.

Freezable by proxy

What we really want is a mix-in that we can use to attach shared functionality to DTOs without their having to take on a base class or implementation burden. Since .NET doesn't have mix-ins, we do the next best thing: We wrap the instance in question with a Proxy that takes on the implementation burden.

Consider this DTO, implementing IFreezable:

public class FreezableData : IFreezable<FreezableData> {

    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    #region Implementation of IFreezable<Data>
    public virtual void Freeze() { throw new NotImplementedException(); }
    public virtual FreezableData FreezeDry() { throw new NotImplementedException(); }

    public virtual bool IsFrozen { get { return false; } }
    public virtual FreezableData Thaw() { throw new NotImplementedException(); }
    #endregion
}

This is a DTO that supports the interface, but lacks the implementation. The implementation we can provide by transparently wrapping it with a proxy.

var freezable = Freezer.AsFreezable(new FreezableData { Id = 42, Name = "Everything" });
Assert.IsTrue(Freezer.IsFreezable(freezable));

// freeze
freezable.Freeze();
Assert.IsTrue(freezable.IsFrozen);

// this will throw an exception
freezable.Name = "Bob";

Under the hood, Freezer uses Castle's DynamicProxy to wrap the instance and handle the contract. Now we have an instance of FreezableData that supports the IFreezable contract. Simply plug the Freezer call into your IoC or whatever factory you use to create your DTOs and you get the behavior injected.
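
The interceptor at the heart of such a proxy is small. A simplified sketch of what it might look like (the real Freezer internals differ, and the real proxy also implements IFreezable<T> itself):

using System;
using Castle.DynamicProxy;

// Intercept property setters and refuse them once the instance is frozen.
// Only virtual members can be intercepted, hence the DTO requirements below.
public class FreezeInterceptor : IInterceptor {
    private bool _frozen;

    public void Intercept(IInvocation invocation) {
        var name = invocation.Method.Name;
        if(name == "Freeze") { _frozen = true; return; }
        if(name == "get_IsFrozen") { invocation.ReturnValue = _frozen; return; }
        if(_frozen && name.StartsWith("set_")) {
            throw new InvalidOperationException("instance is frozen");
        }
        invocation.Proceed(); // everything else passes through to the target
    }
}

// var proxy = new ProxyGenerator().CreateClassProxy<FreezableData>(new FreezeInterceptor());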

In order to provide the cloning capabilities required for FreezeDry() and Thaw(), I'm using MetSys.Little and perform a serialize/deserialize to clone the instance. This is currently hardcoded and should probably have some pluggable way of providing serializers. I also look for a method with signature T Clone() and will use that instead of serialize/deserialize to clone the DTO. It would be relatively simple to write a generic memberwise deep-clone helper that works in the same way as the MetSys.Little serializer, except that it never goes to a byte representation.
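
Such a memberwise helper might start out like the naive sketch below: no collection or cycle handling yet, mirroring the DTO constraints listed further down.

using System;
using System.Reflection;

// A naive memberwise deep-clone: copy every public read/write property,
// recursing into reference types. Relies on the same constraints as the
// proxy approach (data exposed as properties, parameterless constructor).
public static class DeepCloner {
    public static T Clone<T>(T source) where T : class {
        return (T)CloneObject(source);
    }

    private static object CloneObject(object source) {
        if(source == null) return null;
        var type = source.GetType();
        if(type.IsValueType || type == typeof(string)) return source; // immutable enough
        var clone = Activator.CreateInstance(type, true); // non-public ctor is fine
        foreach(var prop in type.GetProperties(BindingFlags.Public | BindingFlags.Instance)) {
            if(!prop.CanRead || !prop.CanWrite || prop.GetIndexParameters().Length > 0) continue;
            prop.SetValue(clone, CloneObject(prop.GetValue(source, null)), null);
        }
        return clone;
    }
}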

But why do I need IFreezable?

With the implementation injected and its contract really doing nothing but ugly up the DTO, why do we have to implement IFreezable at all? Well, you really don't. Freezer.AsFreezable() will work on any DTO and create a proxy that implements IFreezable.

public class Data {
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}

var data = Freezer.AsFreezable(new Data{ Id = 42, Name = "Everything" });
Assert.IsTrue(Freezer.IsFreezable(data));

// freeze
var freezable = data as IFreezable<Data>;
freezable.Freeze();
Assert.IsTrue(freezable.IsFrozen);

No more unsightly methods on our DTO. But in order to take advantage of the capabilities, we need to cast it to IFreezable, which from the point of view of a container, cache or pipeline, is just fine, but for general usage, might be just another form of ugly. Luckily Freezer also provides helpers to avoid the casts:

// freeze
Freezer.Freeze(data);
Assert.IsTrue(Freezer.IsFrozen(data));

// thaw, which creates a clone
var thawed = Freezer.Thaw(data);
Assert.AreEqual(42, thawed.Id);

Everything that can be done via the interface also exists as a helper method on Freezer. FreezeDry() and Thaw() will both work on an unwrapped instance, resulting in wrapped instances. That means that a container could easily take non-wrapped instances in and just perform FreezeDry() on them, which is the same as receiving an unfrozen IFreezable.

Explicit vs. Implicit Behavior

Whether to use the explicit interface implementation or just wrap the DTO with the implementation comes down to personal preference. Either method works the same, as long as the DTO satisfies a couple of conditions:

  • All data must be exposed as Properties
  • All properties must be virtual
  • Cannot contain any methods that change data (since those changes can't be intercepted and prevented on a frozen instance)
  • Collections are not yet supported as Property types (working on it)
  • DTOs must have a no-argument constructor (does not have to be public)

If the DTO represents a graph, the same must be true for all child DTOs.

The code is functional but still a work in progress. It's released under the Apache license and can be found over at GitHub.

Sharing data without sharing data state

I'm taking a break from Promise for a post or two to jot down some stuff that I've been thinking about while discussing future enhancements to MindTouch Dream with @bjorg. In Dream all service to service communication is done via HTTP (although the traffic may never hit the wire). This is very powerful and flexible, but also has performance drawbacks, which have led to many data sharing discussions.

Whether you are using data as a message payload or even just putting data in a cache, you want sender and receiver to be unable to see each other's interactions with that data, which would happen if the data were a shared, mutable instance. Allowing shared modification, whether on purpose or by accident, can have very problematic consequences:

  1. Data corruption: Unless you wrap the data with a lock, two threads could try to modify the data at the same time
  2. Difference in distributed behavior: As soon as the payload crosses a process boundary, it ceases to be shared, so changing the topology changes the data behavior

There are a number of different approaches for dealing with this, each a trade-off in performance and/or usability. I'll use caching as the use case, since it's a bit more universal than message passing, but the same patterns apply.

Cloning

A naive implementation of a cache might just be a dictionary. Sure, you've wrapped the dictionary access with a mutex so that you don't get corruption accessing the data, but multiple threads would still have access to the same instance. If you aren't aware of this sharing, expect to spend lots of time trying to debug the behavior. If you are unlucky, it doesn't cause crashes but quietly produces strange data corruption that you won't even know about until your data is in shambles. If you are lucky, the program crashes because of an access violation of some sort.

Easy, we'll just clone the data going into the cache. Hrm, but now two threads getting the value are still messing with each other. Ok, fine, we'll clone it coming out of the cache. Ah, but if the original thread is still manipulating its copy of the data while others are getting the data, the cache keeps changing. That kind of invalidates the purpose of caching data.

So, with cloning we have to copy the data going in and coming back out. That's quite a bit of copying and in the case that the data goes into the cache and expires before someone uses it, it's a wasted copy to boot.
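
Spelled out in C#, the clone-in/clone-out cache looks something like this; ICloneable<T> is a stand-in for whatever copy mechanism you use (the BCL's ICloneable is non-generic, so it's defined here):

using System.Collections.Generic;

public interface ICloneable<T> { T Clone(); }

// Defensively clone on the way in and on the way out, so no two callers
// ever share a mutable instance.
public class CloningCache<T> where T : class, ICloneable<T> {
    private readonly Dictionary<string, T> _store = new Dictionary<string, T>();
    private readonly object _mutex = new object();

    public void Put(string key, T value) {
        lock(_mutex) { _store[key] = value.Clone(); } // copy on the way in
    }

    public T Get(string key) {
        lock(_mutex) {
            T value;
            return _store.TryGetValue(key, out value) ? value.Clone() : null; // copy on the way out
        }
    }
}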

Immutability

If you've paid any attention to concurrency discussions you've heard the refrain from the functional camp that data should be immutable. Every modification of the data should be a new copy with the original unchanged. This is certainly ideal for sharing data without sharing state. It's also a degenerate version of the cloning approach above, in that we are constantly cloning, whether we need to or not.

Unless your language supports immutable objects at a fundamental level, you are likely to be building this by hand. There's certainly ways of mitigating its cost, using lazy cloning, journaling, etc. i.e. figuring out when to copy what in order to stay immutable. But likely you are going to be building a lot of plumbing.

But if the facilities exist and if the performance characteristics are acceptable, Immutability is the safest solution.

Serialization

So far I've ignored the distributed case, i.e. sending a message across process boundaries or sharing a cache between processes. Both Cloning and Immutability rely on manipulating process memory. The moment the data needs to cross process boundaries, you need to convert it into a format that can be re-assembled into the same graph, i.e. you need to serialize and deserialize the data.

Serialization is another form of Immutability, since you've captured the data state and can re-assemble it into the original state with no ties to the original instance. So Serialization/Deserialization is a form of Cloning and can be used as an engine for immutability as well. And it goes across the wire? Sign me up, it's all I need!

Just like Immutability, if the performance characteristics are acceptable, it's a great solution. And of course, all serializers are not equal. .NET's default serializer, I believe, exists as a schoolbook example of how not to do it. It's by far the slowest, biggest and least flexible of them. On the other end of the scale, Google's protobuf is the fastest and most compact I've worked with, but there are some flexibility concessions to be made. BSON is a decent compromise when more flexibility is needed. A simple, fast and small enough serializer for .NET that I like is @karlseguin's Metsys.Little. Regardless of serializer, even the best serializer is still a lot slower than copying in-process memory, never mind not even having to copy that memory.

Freeze

It would be nice to avoid the implicit copies and only copy or serialize/deserialize when we need to. What we need is a way for the originator to declare that no more changes will be made to the data, and for the receivers of the data to declare whether they intend to modify the retrieved data, providing the following usage scenarios:

  • Originator and receiver won't change the data: same instance can be used
  • Originator will change data, receiver won't: need to copy in, but not coming out
  • Originator won't change the data, receiver will: can put instance in, but need to copy on the way out

In Ruby, freeze is a core language concept (although I confess I don't know how to get a mutable instance back again, or whether this works on object graphs as well). To let the originator and receiver declare their intended use of data in .NET, we could require data payloads to implement an interface, such as this:

public interface IFreezable<T> {
  bool IsFrozen { get; }

  void Freeze(); // freeze instance (no-op on frozen instance)
  T FreezeDry(); // return a frozen clone or if frozen, the current instance
  T Thaw();      // return an unfrozen clone (regardless whether instance is frozen)
}

On submitting the data, the container (cache or message pipeline) will always call FreezeDry() and store the returned instance. If the originator does not intend to modify the instance submitted further, it can Freeze() it first, turning the FreezeDry() that the container does into a no-op.

On receipt of the data, the instance is always frozen, which is fine for any reference use. But should the receiver need to change it for local state tracking, or submitting the changed version, it can always call Thaw() to get a mutable instance.
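
A container built on this contract then only pays for copies when the usage demands it. A sketch of such a cache, reusing the interface above:

using System.Collections.Generic;

// A cache built on IFreezable<T>: FreezeDry() on the way in is a no-op
// copy-wise if the originator already froze its instance; receivers get
// the frozen instance back and only pay for a copy if they Thaw() it.
public class FreezingCache<T> where T : class, IFreezable<T> {
    private readonly Dictionary<string, T> _store = new Dictionary<string, T>();

    public void Put(string key, T value) {
        _store[key] = value.FreezeDry(); // free if value was already frozen
    }

    public T Get(string key) {
        T value;
        return _store.TryGetValue(key, out value) ? value : null; // always frozen
    }
}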

While IFreezable certainly offers some benefits, it'd be a pain to add to every data payload we want to send. This kind of plumbing is a perfect scenario for AOP, since it's a concern of the data consumer, not of the data. In my next post, I'll talk about some approaches to avoid the plumbing. In the meantime, the WIP code for that post can be found on GitHub.

Promise: Building the repository pattern on the language IoC

Before I get into the code samples, I should point out one more "construction" caveat and change from my previous writing: Constructors don't have to be part of the Type. What does that mean? If you were to explicitly declare the Song Type and exclude the Song:(name) signature from the Type, it would still get invoked if someone were to call Song{name: "foo"}, i.e. given a JSON resolution call, the existence of fields is used to try to resolve to a constructor, resulting in a call to Song:(name). Of course that's assuming that instance resolution actually hits construction and isn't using a Use lambda or returning an existing ContextScoped instance.

A simple Repository

Let's assume we have some persistence layer session and that it can already fetch DTO entities, a la ActiveRecord. Now we want to add a repository for entities fetched so that unique entities from the DB always resolve to the same instance. A simple solution to this is just a lookup of entities at resolution time:

$#[Session].In(:session).ContextScoped;
$#[Dictionary].In(:session).ContextScoped;
$#[User].Use {
  var rep = Dictionary<string,User>();
  var name = $_.name;
  rep.Get(name) ?? rep.Set(name,Session.GetByName<User>(name));
};

In the above the $_ implicit JSON initializer argument is used to determine the lookup value. I.e. given a JSON object, we can use dot notation to get to its fields, such as $_.name. This name is then used to do a lookup against a dictionary. Promise adopts the C# ?? operator to mean "if nil, use this value instead", allowing us to call .Set on the dictionary with the result from the Session. There is no return since the last value of a lambda is returned implicitly and Set returns the value set into it.
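
For comparison, the same get-or-fetch pattern spelled out in C#, where assignment being an expression plays the role of Set returning its value (rep, session and GetByName in the usage comment are illustrative stand-ins):

using System;
using System.Collections.Generic;

// Get-or-fetch against a dictionary: assignment is an expression in C#,
// so the dictionary write doubles as the fallback value.
public static class DictionaryExtensions {
    public static TValue GetOrFetch<TKey, TValue>(
        this IDictionary<TKey, TValue> rep, TKey key, Func<TValue> fetch) {
        TValue value;
        return rep.TryGetValue(key, out value) ? value : (rep[key] = fetch());
    }
}

// usage: var user = rep.GetOrFetch(name, () => session.GetByName<User>(name));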

One other thing to note is the registration of Dictionary as ContextScoped. Since Dictionary is a generic type, each variation of type arguments will create a new context instance of Dictionary. For our example this means that the lambda executed for User resolution always gets the same dictionary instance back.

context(:session) {
  var from = User{ name: request.from };
  var to = User{ name: request.to };
  var msg = Message{ from: from, to: to, body: request.body };
  msg.Send();
}

The usage of our setup stays nice and declarative. Getting User instances requires no knowledge of how the instance is created; the caller just declares what instance it wants, i.e. one with a given name. Swapping out the resolution behavior for a service layer to get users, a mock layer to test the code, or a different DB layer can all be done without changing the business logic operating on the User instances.

A better Repository

Of course the above repository is just a dictionary and only supports getting. It assumes that Session<User>.GetByName will succeed and even then only acts as a session cache. So let's create a simple Repository class that also creates new entities and lets them be saved.

class Repository<TEntity> {
  Session _session = Session();             // manual resolve/init
  +Dictionary<String,Enumerable> _entities; // automatic resolve/init

  Get:(name|TEntity) {
    var e = _entities[name] ?? _entities.Set(name,_session.GetByName<TEntity>(name) ?? TEntity{name});
    e.Save:() { _session.Save(e); };
    return e;
  }
}

Since the Repository class has dependencies of its own, this class introduces dependency injection as well. The simplest way is to just initialize the field using the empty resolver. In other languages this would be hardcoding construction, but with Promise this is of course implicit resolution against the IoC. Still, that's the same extraneous noise as C# and Java that I want to stay away from, even if the behavior is nicer. Instead of explicitly calling the resolver, Promise provides the plus (+) prefix to indicate that a field should be initialized at construction time.

The work of the repository is done in Get, which takes the name and returns the entity. As before, it does a lookup against the dictionary and otherwise sets an instance into the dictionary. However, now if the session returns nil, we call the entity's resolver with an initializer. But if we set up the resolver to call the repository, doesn't that just result in an infinite loop? To avoid this, Promise will never call the same resolver registration twice for one instance. Instead, resolution bubbles to the next higher context and its registration. That means, lacking any other registration, this call will just create a new instance.
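
The re-entrancy rule can be pictured in C# terms as a guard around the registration's lambda; this only illustrates the semantics, with bubbling to the parent context reduced to falling back to plain construction:

using System;
using System.Collections.Generic;

// Illustration of the re-entrancy rule: if a registration's lambda ends up
// resolving its own type again, that registration is skipped and resolution
// bubbles up (here: falls through to plain construction).
public class ResolverContext {
    private readonly Dictionary<Type, Func<object>> _registrations = new Dictionary<Type, Func<object>>();
    private readonly HashSet<Type> _inProgress = new HashSet<Type>();

    public void Register<T>(Func<object> factory) {
        _registrations[typeof(T)] = factory;
    }

    public T Resolve<T>() where T : new() {
        Func<object> factory;
        if(_registrations.TryGetValue(typeof(T), out factory) && _inProgress.Add(typeof(T))) {
            try {
                return (T)factory();
            } finally {
                _inProgress.Remove(typeof(T));
            }
        }
        return new T(); // registration already active: fall through
    }
}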

Finally, we attach a Save() method to the entity instance, which captures the session and saves the entity back to the DB. This last bit is really just there to show how entities can be changed at runtime. As repositories goes, it's actually a bad pattern and we'll fix it in the next iteration.

$#[Repository].In(:session).ContextScoped;
$#[User].Use { Repository().Get($_.name); };

The registration to go along with the Repository has gotten a lot simpler as well. Since the repository is context scoped and gets a dictionary and session injected, these two Types do not need to be registered as context scoped themselves. And User resolution now just calls the Repository getter.

context(:session) {
  var user = User{ name: request.name };
  user.email = request.email;
  user.Save();
}

The access to the instance remains unchanged, but now we can change its data and persist it back using the Save() method.

Now with auto-commit

As I mentioned, the attaching of Save() was mostly to show off monkey-patching and in itself is a bad pattern. A true repository should just commit for us. So let's change the repository to reflect this:

class Repository<TEntity> {
  +Session _session;
  +Dictionary<String,Enumerable> _entities;
  _rollback = false;

  Get:(name|TEntity) {
    var e = _entities[name] ?? _entities.Set(name,_session.GetByName<TEntity>(name) ?? TEntity{name});
    return e;
  };

  Rollback:() { _rollback = true; };

  ~ {
    _entities.Each( (k,v) { _session.Save(v) } ) unless _rollback;
  }
}

By attaching a Disposer to the class, we get the opportunity to save all instances at context exit. But having automatic save at the end of the :session context begs for the ability to prevent committing data. For this, the Rollback() method simply sets a _rollback flag that governs whether we call save on the entities in the dictionary.

context(:session) {
  var user = User{ name: request.name };
  user.email = request.email;
}

We've iterated over our repository a couple of times, each time changing it quite a bit. The important thing to note, however, is that the repository itself, as well as the session, have stayed invisible to the business logic. Both are an implementation detail, while the business logic itself just cared about retrieving and manipulating users.

I hope that these past posts give a good overview of how language level IoC is a simple, yet powerful way to control instance lifespan and mapping without cluttering up code. Next time, I'll return to what can be done with methods, since fundamentally Promise tries to keep keywords to a minimum and treat everything as a method/lambda call.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.

Public Static Void - Go at OSCON

Rob Pike - Public Static Void

My favorite keynote at OSCON was Rob Pike's "Public Static Void", which in a nutshell is "Why we created Go". But it is also a history lesson on how we got from C to C++ to Java and C# and how that spawned a new dynamic language revolution against the sometimes ridiculously complex syntax the big static languages have. Rob very succinctly illustrates that this fight has been falsely characterized as static vs. dynamic typing, when it's really more a revolt against repetitive, explicit syntax. Yes, he might be preaching to the converted here, but one of my primary motivators for playing with Promise language design is that I fundamentally believe in the benefits of strong type systems, but also dislike the overly verbose and restrictive syntax used by the industry leaders.

The reception of this talk was likely the cause for the attendance spike in Rob's Go talk (sorry, can't find a video link for it). It started out in one of the smallest rooms at OSCON, really not a good choice to start. It quickly filled until there wasn't even standing room in the aisles. We then changed rooms to one more than twice as big and when the dust settled there were still people around the edge finding only standing room.

I'd looked at Go when it first came out and again about 6 months ago. It had lots of things I really liked and its interface duck-typing was the primary inspiration for Promise's type system. So I filed it once more in the stack of languages I need to write some simple projects in to get a better feel for, but it was behind my current work with Ruby and planned excursions into Scala, Clojure and node.js.

However, Rob's talk has moved Go to the top of the stack. It is a language that I philosophically agree with more than any other language. The question is whether this philosophic agreement translates into enjoying its practical use. And that can only be done by writing real code. I currently have an ASP.NET/SQL Server application I'm moving to Ruby/mongo so that I can shut down my remaining win2k8 virtual machine. My plan was to immediately turn around and rewrite it in node.js, Scala and Clojure for comparison. I will have to investigate web.go and gomongo to see how far along they are for turning this around using Go, or whether plumbing is something that still needs to be manually cobbled together.

One of the main things I still need to figure out about Go is how a modern static type system can get away with not having generics. For creating such essentials as containers, enumerables, etc., I can't see a way around having generics or templating without needlessly copying code. The FAQ claims that with the use of the empty interface, this is largely not needed. I have to see how this plays out in practice, since I can't visualize it just now.

Overall, despite attending some great JavaScript talks at OSCON, I am most excited about Go right now. Need to harness that excitement and turn it into experience quickly.

Promise: IoC Lifespan management

In my last post about Promise I explained how a Type can be mapped to a particular Class to override the implicit Type/Class mapping like this:

$#[User].Use<MockUser>;

This registration is global and always returns a new instance, i.e. it acts like a factory for the User Type, producing MockUser instances. In this post I will talk about the creation and disposal of instances and how to control that behavior via IoC and nested execution contexts.

Dependency Injection

So far all we have done is Type/Class mapping. Before I talk about lifespans, I want to cover Dependency Injection, both because it's one of the first motivators for people to use an IoC container and because lifespan is affected by your dependencies as well. Unlike traditional dependency injection via constructors or setters, Promise can inject dependencies in a way that looks a lot more like Service Location without its drawbacks. We don't have constructors, just a resolution mechanism. We do not inject dependencies through the initialization call of the class; we simply declare fields and either execute resolution manually or have the language take care of it for us:

class TwitterAPI {
  _stream = NetworkStream{host: "api.twitter.com"};   // manual resolution
  +AuthProvider _authProvider;                        // automatic resolution
  ...
}

_stream simply calls the NetworkStream resolver as its initializer, which uses the IoC to resolve the instance, while _authProvider uses the plus (+) prefix on the field type to tell the IoC to initialize the field. The only difference in behavior is that the first allows the passing of an initializer JSON block; using the resolver with just () is identical to the + notation.

Instance Disposal and Garbage Collection

Promise eschews destructors and provides Disposers in their stead. What, you may ask, is the difference? Instance destruction does not happen until garbage collection, which happens at the discretion of the garbage collector. But disposal happens at context exit, which is deterministic behavior.

class ResourceHog {
   +Stream _stream; // likely automatic disposal promotion because of disposable field

  ~:{
      // explicit disposer
   };
}

Instances go through disposal if they either have a Disposer or have a field value that has a Disposer. The Disposer is a method slot named by a tilde (~). Of course the above example would only need a disposer if Stream was mapped to a non-disposing implementation. Accessing a disposed instance will throw an exception. Disposers are not part of the Type contract, which means that deciding whether or not to dispose an instance at context exit is a runtime decision made by the context.

Having deterministic clean-up behavior is very useful, but does mean that if you capture an instance from an inner context in an outer context, it may suddenly be unusable. Not defining a Disposer may not be enough, since an instance with fields does not know until runtime if one of the fields is disposable and the instance may be promoted to disposable. The safest path for instances that need to be created in one context and used in another is to have them attached to either a common parent or the root context, both options covered below.

Defining instance scope

FactoryScoped

This default scope for creating a new instance per resolver invocation is called FactoryScoped and can also be manually set (or reset on an existing registration) like this:

// Setup (or reset) the default lifespan to factory
$#[User].FactoryScoped;

// two different instances
var bob = User{name: "bob"};
var mary = User{name: "mary"};

A .FactoryScoped instance may be garbage collected when no one is holding a reference to it anymore. Disposal will happen either at garbage collection or when its execution context is exited, whichever comes first.

ContextScoped

The other type of lifespan scoping is .ContextScoped:

// Setup lifespan as singleton in the current context
$#[Catalog].ContextScoped;

// both contain the same instance
var catalogA = Catalog();
var catalogB = Catalog();

This registration produces a singleton for the current execution context, giving everyone in that context the same instance at resolution time. This singleton is guaranteed to stay alive throughout the context's life and disposed at exit.

Defining execution contexts

All code in Promise runs in an execution context, i.e. at the very least there is always the default root context. If you never define another context, a context scoped instance will be a process singleton.

You can start a new execution scope at any time with a context block:

context {
  ...
}

Context scoped instances are singletons in the current scope. You can define nested contexts, each of which will get their own context scoped instances, providing the following behavior:

$#[Foo].ContextScoped;
context {
  var fooA = Foo();

  context {
    var fooB = Foo(); // a different instance from fooA
  }
}

Since the context scope is tied to the context the instance was resolved in, each nested context will get its own singleton.

Context names

But what if I'm in a nested context, and want the instance to be a singleton attached to one of the parent contexts, or want a factory scoped instance to survive the current context? For finer control, you can target a specific context by name. The root context is always named :root, while any child context can be manually named at creation time. If not named, a new context is assigned a unique, random symbol.

println context.name; // => :root

context(:inner) {
  $#[Catalog].In(:inner).ContextScoped;
  $#[Cart].ContextScoped;

  var catalogA = Catalog();
  var cartA = Cart();

  context {
    var catalogB = Catalog(); // same instance as catalogA
    var catalogC = Catalog(); // same instance as A and B
    var cartB = Cart(); // different instance from cartA
    var cartC = Cart(); // same instance as cartB
  }
}

While .Use and .(Factory|Context)Scoped can be used in any order, the .In method on the registration should generally be the first method called in the chain. When omitted, the global version of the Type registration is modified, but when invoked with .In, a shadow registration is created for that Type in the specified context. The reason for the deterministic ordering is that registration is just chaining method calls, each modifying a registration instance and returning the modified instance. But .In is special in that it accesses one registration instance and returns a different one. Consider these three registrations:

$#[Catalog].In(:foo).Use<DbCatalog>.ContextScoped;
// vs.
$#[Catalog].ContextScoped.In(:foo).Use<DbCatalog>;
// vs.
$#[Catalog].ContextScoped.Use<DbCatalog>;

These registrations mean, in order:

  • "for the type Catalog in context :foo, make it context scoped and use the class DbCatalog,"
  • "for the type Catalog__, make it context scoped, and in context :foo__, use class DbCatalog," and
  • "for the type Catalog, make it context scoped, use the class DBCatalog and in context :foo ..."

The first is what is intended 99% of the time. The second one might have some usefulness, where a global setting is attached and then additional qualifications are added for context :foo. The last, however, is just accidental, since we set up the global case and then access the context specific one based on the global, only to not do anything with it.
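
In C# terms, this falls out of what each fluent call returns: most calls mutate and return the same registration instance, while .In looks up a different one. A sketch with illustrative names, not a real registry:

using System.Collections.Generic;

// Why order matters in the chain: Scoped() and Use() mutate and return the
// same registration, but In() returns a different instance, the shadow
// registration for the named context.
public class Registration {
    private readonly Dictionary<string, Registration> _shadows = new Dictionary<string, Registration>();

    public bool ContextScoped { get; private set; }
    public string ClassName { get; private set; }

    public Registration Scoped() { ContextScoped = true; return this; }
    public Registration Use(string className) { ClassName = className; return this; }

    public Registration In(string context) {
        Registration shadow;
        if(!_shadows.TryGetValue(context, out shadow)) {
            _shadows[context] = shadow = new Registration();
        }
        return shadow; // later calls in the chain modify the shadow, not the global
    }
}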

This ambiguity of chained methods could be avoided by making the chain a set of modifications that are pending until some final command like:

$#[Catalog].ContextScoped.In(:foo).Use<DbCatalog>.Build;

Now it's a set of instructions that are order independent and not applied to the registry until the command to build the registration. I may revisit this later, but for right now, I prefer the possible ambiguity to the extraneous syntax noise and the possibility of unapplied registrations because .Build was left off.

What about thread isolation?

One other common scope in IoC is one that has thread affinity. I'm omitting it because as of right now I plan to avoid exposing threads at all. My plan is to use task based concurrency with messaging between task workers and the ability to suspend and resume execution of methods a la coroutines instead. So the closest analog to thread affinity I can think of is that each task will be fired off with its own context. I haven't fully developed the concurrency story for Promise, but the usual thread spawn mechanism is just too imperative where I'd like to stay declarative.

Putting it all together

With scoping and context specific registration, it is fairly simple to produce very custom behavior on instance access without leaking the rules about mapping and lifespan into the code itself. Next time I will show how all these pieces can be put together, to easily build the Repository Pattern on top of the language level IoC.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.

Promise: IoC Type/Class mapping

Before I can get into mapping, I need to change the way I defined getting an instance in the Type and Class definitions:

Getting an instance in Promise, revisited

When I talked about Object.new, I alluded to it being a call on the Type, not the Class, with the IoC layer taking over, but I was still trapped in the constructor metaphor so ubiquitous in Object Oriented programming. .new is really not appropriate, since we don't know if what we are accessing is truly new. You never call a constructor; there is no access to such a beast. Instead, it can best be thought of as an instance accessor or instance resolution. To avoid confusing things further with a loaded term like new, I've modified the syntax to this:

// get an instance
var instance = Object();

We just use the Type name followed by empty parentheses, or in the case that we want to pass a JSON initializer to the resolution process we can use:

// get an instance w/ initializer
var instance = Object{ foo: "bar" };

As before, this is a call against the implicit Type Object, not the Class Object. And, also as before, creating your own constructor intercept is still a Class Method, but now one without a named slot. The syntax looks like this (using the previous post's example):

Song:(name) {
  var this = super;
  this._name = name;
  return this;
}

The important thing to remember is that the call is against the Type, but the override is against the Class. As such we have access to the constructor super, really the only place in the language where this is possible. Being a constructor overload does mean that a call to Song{ ... } will not necessarily result in a call to the Song class constructor intercept, either because of type mapping or lifespan management, but I'm getting ahead of myself.

How an instance is resolved

Confused yet? The Type/Class overlapping namespace approach does seem needlessly confusing when you start to dig into the details, but I feel it's a worthwhile compromise, since for the 99% use case it's an invisible distinction. Hopefully, once I work through everything, you shouldn't even worry about there being a difference between Type and Class -- things should just work, or my design is flawed.

In the spirit of poking into the guts of the design and explaining how this all should work, I'll stop hinting at the resolution process and instead dig into the actual usage of the context system.

The Default Case

// creates new instance of class User by default
var user = User{name: "bob"};

This call uses the implicit mapping of the User Type to the User Class and creates a new instance. If there is no intercept for the User() Class Method, the deserializer construction path is used, and if there exists a field called _name, it is initialized with "bob".

Type to Class mapping

// this happens implicitly
$#[User].Use<User>; // first one is the Type, the second is the Class

// Injecting a MockUser instance when someone asks for a User type
$#[User].Use<MockUser>;

Promise uses the $ followed by a symbol convention for environment variables popularized by Perl and adopted by PHP and Ruby. In Perl, $ actually is the general scalar variable prefix and there just exist certain system populated globals. In Promise, like Ruby, $ is used for special variables only, such as the current environment, regex captures, etc. $# is the IoC registry. Using the array accessor with the Type name accesses the registry value for that Type, on which we then call the method Use<>.

The Use<> call betrays that Promise supports a Generics system, which is pretty much a requirement the moment you introduce a Type system. Otherwise you can't create classes that can operate on a variety of other typed instances without the caller having to cast instances coming out to what they expect. Fortunately Generics only come into play when you have chosen typed instances; otherwise you just treat them as dynamic duck-typed instances that you can call whatever you want on.

Type to lambda mapping

The above mapping is a straight resolution from a Type to a Class. But sometimes you don't want a one-to-one mapping, but rather want a way to dynamically execute some code to make runtime decisions about construction. For this, you can use the lambda signature of .Use:

$#[User].Use {
  var this = Object $_;
  this:Name() { return _name; };
  return this;
};

The above is a simple example of how a dynamic type can be built at runtime to take the place of a typed instance. Of course any methods promised by User not implemented on that instance will result in a MethodMissing runtime exception on access.

The $_ environment variable is the implicit capture of the lambda's signature as a JSON construct. This allows our mapping to access whatever initializer was passed in at resolution time.

$#[User].Use { return MockUser $_; };

The above example looks like it's the same as the $#[User].Use<MockUser> example, but it has the subtle difference that MockUser in this scenario is the Type, not the Class. If MockUser were mapped as well, the resolved instance would be of another class.

Doing more than static mapping

But you don't have to create a new instance in the .Use lambda, you could do something like this:

// Don't do this!
var addressbook = AddressBook();
$#[AddressBook].Use { return addressbook; };

This will create a singleton for AddressBook, but it's a bad pattern, being a process-wide global. The example only serves to illustrate that .Use can take any lambda.

So far, mapping just looks like a look-up table from Type to Class, and worse, one that is statically defined across the executing process. Next time I will show how the IoC container isn't just a globally defined Class mapper. Using nested execution contexts, context-specific mappings and lifespan mappings, you can easily create factories, singletons and shared services, including repositories, and have those definitions change depending on where in your code they are accessed.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.