c#

Using log4net without requiring it as a dependency

Logging, to me, is a basic requirement of any code I write. There's nothing more useful to track down an odd bug than being able to turn on debug logging in production, potentially filter it down to the area of interest and get some real live data. When that code is causing trouble at a customer location, this capability becomes worth its weight in gold.

And while I am a big Interface abstraction and IoC nut, logging is so fundamental to me that I do use a static logger instance in every class and just have a hard requirement for log4net. I just want this to work and not have people worry about having to inject an ILog into my classes, forcing them to think about a concern of mine. It's far less obtrusive this way, imho.

But it does mean that no matter how small and compact my code is, i suddenly tack on a ~240KB dll requirement, often several times larger than my own code. And there goes my "i don't want them to worry about a concern of mine" out the window.

Depending on log4net only when it's there

That got me thinking about how I can avoid this dependency. I knew that if my code path doesn't hit a code in a referenced DLL, the absense of the DLL does not cause a problem. Of course that means i can't even reference their interfaces.

Instead I had to create a duplicate of the log4net ILog interface and proxy my calls to it. The result is a single file, Logger.cs with the following wire-up:

internal static class Logger {

  private static readonly bool _loggingIsOff = true;

  static Logger() {
    try {
        Assembly.Load("log4net");
        _loggingIsOff = false;
    } catch {}
  }

  public static ILog CreateLog() {
    var frame = new System.Diagnostics.StackFrame(1, false);
    var type = frame.GetMethod().DeclaringType;
    return _loggingIsOff ? (ILog)new NoLog() : new Log4NetLogger(type);
  }
  ...
}

(Full code is here)

Because it's not supposed to create a dependency, it's marked internal. Yes, that means that every assembly has a copy of it, but that's really not significant in the grand scheme of things. In returns for a couple of redundant bytes, I can now do add a logger to every class simply with:

private static readonly Logger.ILog _log = Logger.CreateLog();

And if log4net is present and configured, commence the flow of debug statements. Meanwhile I can ship my code without the log4net DLL.

By arne on | .net | A comment?
Tags: , ,

Implementing “exports” in C#

The other day i was musing about improving readability by liberating verbs from the containing objects that own them. Similar things are accomplished with mix-ins or traits in other languages, but I wanted to go another step further and allow for very context specific mapping, rather than just mixing ambigiously named methods from other objects. Since the exports construct looked a lot like a map of expressions, I decided to see what replicating the behavior with existing C# would look like.

To review, this is what I want (in C#-like pseudo code + exports):

class PageWorkflow {
  UserService _userService exports { FindById => FindUserById };
  PageService _pageService exports {
    FindById        => FindPageById,
    UpdatePage(p,c) => Update(this p,c)
  };
  AuthService _authService exports {
    AuthorizeUserForPage(u,p,Permissions.Write) => UserCanUpdatePage(u,p)
  };

  UpdatePage(userid, pageid, content) {
    var user = FindUserById(userid);
    var page = FindPageById(pageid);
    if(UserCanUpdatePage(user,page)) {
      page.Update(content);
    } else {
      throw;
    }
  }
}

And this is C# implementation of the above:

public class PageWorkflow {

    private readonly Func<int, User> FindUserById;
    private readonly Func<int, Page> FindPageById;
    private readonly Action<Page, string> Update;
    private readonly Func<User, Page, bool> UserCanUpdatePage;

    public PageWorkflow(IUserService userService, IPageService pageService, IAuthService authService) {
        FindUserById = (id) => userService.FindById(id);
        FindPageById = (id) => pageService.FindById(id);
        Update = (page, content) => pageService.UpdatePage(page, content);
        UserCanUpdatePage = (user, page) => authService.AuthorizeUserForPage(user, page, Permissions.Write);
    }

    public void UpdatePage(int userid, int pageid, string content) {
        var user = FindUserById(userid);
        var page = FindPageById(pageid);
        if(UserCanUpdatePage(user, page)) {
            Update(page, content);
        } else {
            throw new Exception();
        }
    }
}

As I mentioned, it's all possible, short of the context sensitive extension method on Page. Lacking extension methods, I was going to name the imported method UpdatePage, but since it would be a field, it conflicts with the UpdatePage workflow method despite functionally having different signatures.

All in all, the public workflow UpdatePage is pretty close to what I had wanted, but the explicit type declaration of the each exports makes it boilerplate that is likely not worth the trouble of writing, and, no, i won't even consider code generation.

This exercise along with every other language feature I've dreamt up for Promise does illustrate one thing quite clearly to me: My ideal language should provide programatic access to its parser so that the syntax of the language can be extended by the libraries. Internal DSLs are a nice start with most languages, but often they fall short of being able to reduce default boilerplate and just create new boilerplate. Sure, designing the language to be more succinct is desirable, but if anything, that only covers what the language creators could imagine. Being able to tweak a language into a DSL for the task at hand, a la MPS, seems a lot more flexible.

Is this added flexibility worth the loss of a common set of constructs that is shared by all programmers knowing language X? It certainly could be abused to become incomprehensible, but I would suggest that even knowing language X joining any Team working on a project of sufficient complexity adds its own wealth of implicit patterns and constructs and worse than a language whose compiler is extended, these constructs are communicated via comments and documentation that are not part of language X, i.e. the compiler and IDE lack the ability to aid someone learning these constructs.

Considering this, I really need to do a survey of languages that already offer this capability as well as take a closer look at MPS and Nemerle to see if the language I want is just a few parser rules a way from an existing meta programming language.

By arne on | geek, Promise | A comment?
Tags: ,

Func/Action vs. Delegate

A while back I wrote that you really never have to write another delegate again, since any delegate can easily be expressed as an Action or Func. After all what's preferable? This:

var work = worker.ProcessTaskWithUser(delegate(Task t, User u) {
  // define the work callback
});

or this:

var work = worker.ProcessTaskWithUser((t, u) => {
  // define the work callback
});

I know I prefer lambda's over delegates. But this is just on the consuming end. The signature for the above could be either:

delegate Task TaskUserDelegate(Task inputTask, User contextUser);
IEnumerable<Task> ProcessTaskWithUser( TaskUserDelegate processCallback );

or:

IEnumerable<Task> ProcessTaskWithUser( Func<Task,User,Task> processCallback );

Either one can be used with the same lambda, so using the delegate doesn't inconvenience us in consumption. But writing the Func version is certainly more concise so it seems like the winner once again. But In terms of consumption of that API, we've lost the signature of the method which would explain what each parameter is used for. Sure, .Where(Func<T,bool> filter) is pretty self-explanatory, but .WhenDone(Func<T,V,string,T> callback) really doesn't tell us much of anything.

So there seems to be straight forward usability rule of thumb: Use a delegate if the parameter's meaning isn't obvious from the usage of the lambda. But if the goal here is to make it easier for the consumer of the API, unfortunately it's not that simple, since the primary tool for communicating the API's documentation, intellisense, actually makes things worse.

Usability of delegate

For maximum usability, let's document the the API so it's meaning is discoverable:

/// <summary>
/// The task user delegate is meant to transform a given task into a new one in the context of a user.
/// </summary>
/// <param name="inputTask">The task to transform.</param>
/// <param name="activeUser">The user context to use for the transform.</param>
/// <returns>A new task in the user's context.</returns>
delegate Task TaskUserDelegate(Task inputTask, User activeUser);

/// <summary>
/// Transform all tasks for a set of users.
/// </summary>
/// <param name="processCallback">Callback for transforming each task for a specific user</param>
/// <returns>Sequence of transformed tasks</returns>
IEnumerable<Task> ProcessTaskWithUser(TaskUserDelegate processCallback) {
    //...
}

And this is what it looks like on code completion:

While TaskUserDelegate is well documented, this does not get exposed via intellisense. Worse, this signature tells us nothing about the arguments for our lambda. So, yeah, we created a better documented API, but made it's discovery worse.

Usability of func

Now, let's do the same for the func signature:

/// <summary>
/// Transform all tasks for a set of users.
/// </summary>
/// <param name="processCallback">Callback for transforming each task for a specific user</param>
/// <returns>Sequence of transformed tasks</returns>
IEnumerable<Task> ProcessTaskWithUserx(Func<Task, User, Task> processCallback) {
    //...
}

which gives us this completion:

Now we at least know the exact signature of the lambda we're creating, even if we don't know what the purpose of the arguments is.

Best usability: Plain Old Documentation

In both cases, the best discoverability ends up being plain old textual documentation of the parameter and even though the delegate provides extra documentation possibilities, their access is not convenient that for expediency i'd still have to vote for the Func signature.

The one exception to the rule would be a lambda that is meant as a dependency. I.e. a class or method that has a callback that you attach for later use, rather than immediate dispatch. In that case the lambda really functions as a single method interface and should be treated like any other dependency and be as explicit as possible.

By arne on | .net, geek | 1 comment
Tags: , , ,

I need a pattern for extending a class by an unknown other

I'm currently splitting up some MindTouch Dream functionality into client and server assemblies and running into a consistent pattern where certain classes have augmented behavior when running in the server context instead of the client context. Since the client assembly is used by the server assembly, and these classes are part of the client assembly, they do not even know about the concept of the server context. Which leads me to the dilemma, how do I inject different behavior into these classes when they run under the server context?

Ok, the short answer to this is probably "you're doing it wrong", but let's just go with it and accept that this is what's happening. Let's also assume that the server context can be discovered by static means (i.e. a static singleton accessor) which also lives in the server assembly.

Here's what I've come up with: Extract the desired functionality into an interface and chain implementations together. Each class that needs this facility would have to create it's own interface and default implementation and looks something like this:

public interface IFooHandler {
  ushort Priority { get; }
  bool TryFoo(string input, out string output);
}

TryFoo gives the handler implementation a chance to look at the inputs and decide whether to handle it, or whether to pass on it. The usage of collection of handlers takes the following form:

public string Foo(string input) {
  string output = null;
  _handlers.Where(x => x.TryFoo(input, out output)).First();
  return output;
}

This assumes that _handlers is sorted by priority. It return the result of the first handler report true on invocation. Building up the _handlers happens in the static constructor:

static Bar() {
  _handlers = typeof(IUriTranslator)
    .DiscoverImplementors()
    .Instantiate()
    .Cast<IUriTranslator>()
    .OrderBy(x => x.Priority)
    .ToArray(); 
}

where DiscoverImplementors and Instantiate are extension methods i'll leave as an exercise to the reader.

Now the server assembly simply creates its implementation of IFooHandler, gives it a higher priority and on invocation checks its static accessor to see if it's running in the server context and if not, lets the chain fall through to the next (likely the default) implementation.

This works just fine. I don't really like the static discovery of handlers, and if it weren't for backwards compatibility, I'd move the injection of handlers into a factory class and leave all that wire-up to IoC. Since that's not an option, I think this is the most elegant and performant solution.

It still feels clunky, tho. Anyone have a better solutions for changing the behavior of a method in an existing class that doesn't required the change to be in a subclass (since the instance will be of the base type)? Is there a pattern i'm overlooking?

By arne on | .net, mindtouch | 1 comment
Tags: ,

libBeanstalk.NET, a Beanstalkd client for .NET/mono

Image curtesy of jepeters74A couple of years back I wrote a store-and-forward message queue called simpleMQ for vmix. A year later, Vmix was kind enough to let me open source it and I put it up on sourceforge (cause that was the place back in the day). But it never got any documentation or promotion love from me, being much too busy building notify.me and using simpleMQ as its messaging backbone. Over those last couple of years, simpleMQ was served us incredibly well at notify.me, passing what must be billions of messages by now. But it does have warts and I have a backlog of fixes/features i've been meaning to implement.

Beanstalkd: simple & fast workqueue

Rather than invest more time on simpleMQ, i've been watching other message/work queues to see whether there is another product i could use instead. I've yet to find a product that i truly like, but Beanstalkd is my favorite of the bunch. Very simple, fast and with optional persistence, it addresses most of my needs.

Beanstalkd's protocol is inspired by memcached. It uses a persistent TCP connection, but relies on connection state only to determine which "tube" (read: workqueue) you are using and to act as a work timeout safeguard. The protocol is ASCII verbs with binary payloads and uses yaml for structured responses.

Tubes are created on demand and destroyed once empty. By default beanstalkd is in-memory only, but can use a binary log to store items and recover the in-memory state by log playback. I had briefly looked at zeromq, but after finding out that its speed relies on no persistence, I decided to give it a skip. zeromq might be web scale, but i prefer a queue that doesn't degrade to behaving like /dev/null :) Maybe my transactional RDBMS roots are showing, but I have a soft spot for at least journaling data to disk.

One concept of beanstalkd that i'm still conflicted about is that work is given a processing time-out (time-to-run) by the producer of the work, rather than having the consumer of the work declare its intended responsiveness. Since the producer doesn't know how soon the work gets picked up, i.e. time-to-run is not a measure of work item staleness, I don't see a great benefit for having the producer dictate the work terms.

The other aspect of work distribution beanstalkd lacks for my taste is the idea of being able to produce work in one instance and have it forwarded to another instance when that instance is available, i.e. store-and-forward queues. I like to keep my queues on the current host so i can produce work without having to rely on the uptime of the consumer or some central facility. However, store-and-forward is an implementation detail I can easily fake with a daemon on each machine that acts as a local consumer and distributor of work items, so it's not something i hold against beanstalkd.

libBeanstalk.NET

Notify.me being a mix of perl and C#, i needed a .NET client. A protocol complete one not existing and given the simplicity of the Beanstalkd protocol, I opted to write my own and have released it under Apache 2.0 on github.

I've not put DLLs up for downlad, since the API is still somewhat in flux as I continue to add features, but the current release supports the entire 1.3 protocol. By default it considers all payloads as binary streams, but I've included extension methods to handle simple string payloads:

  // connect to beanstalkd
  using(var client = new BeanstalkClient(host, port)) {

    // put some data
    var put = client.Put("foo");
 
    // reserve data from queue
    var reserve = client.Reserve();
    Console.Writeline("data: {0}", reserve.Data);
    
    // delete reserved data from queue
    client.Delete(reserve.JobId);
  }

The binary surface is just as simple:

  // connect to beanstalkd
  using(var client = new BeanstalkClient(host, port)) {

    // put some data
    var put = client.Put(100, 0, 120, data, data.Length);
 
    // reserve data from queue
    var reserve = client.Reserve();
    
    // delete reserved data from queue
    client.Delete(reserve.JobId);
  }

I've tried to keep the interface IBeanstalkClient to be a close as possible to the protocol verb signatures and rely on extension methods to create simpler versions on top of that interface. To facilitate extensions that provide smart defaults, the client also has an instance of a Defaults member that can be used to initialize those values.

The main deviation from the protocol is how I handle producer and consumer tubes. Rather than have a separate getter and setter for the tube that put will enter work into, I simply have a settable property CurrentTube. And rather than surfacing watch, ignore and listing of consumer tubes, the client includes a special collection, WatchedTubes, with the following interface:

interface IWatchedTubeCollection : IEnumerable<string> {
    int Count { get; }
    void Add(string tube);
    bool Remove(string tube);
    bool Contains(string tube);
    void CopyTo(string[] array, int arrayIndex);
    void Refresh();
}

I was originally going to use ICollection<string>, but Clear() did not make sense and I wanted to have a manual method to reload the list from the server, which is exposed via Refresh(). Under the hood, watched tubes is a hashset, so adding the same tube multiple times has no effect, neither is order of tubes in the collection guaranteed.

Future work

The client is functional and can do everything that Beanstalkd offers, but it's really just a wire protocol, akin to dealing with files as stream. To make this a useful API, it really needs to take the 90% use cases and remove any friction and repetition they would encounter.

Connection pooling

BeanstalkClient isn't, nor is it meant to be, thread-safe. It assumes you create a client when you need it and govern access to it, rather than sharing a single instance. This was motivated by Beanstalkd's behavior of storing tube state as part of the connection. Given that I encourage clients to be created on the fly to enqueue work, it makes sense that under the hood clients should use a connection pool both to re-use existing connections rather than constantly open and close sockets and to limit the maximum sockets a single process tries to open to Beanstalkd. Pooling wouldn't mean sharing of a connection by clients, but handing off connections to new clients and putting them in a pool to be closed on idle timeout once the client is disposed.

Most of this work is complete and on a branch, but i want to put it through some more testing before merging it back to master, especially since it will introduce client API changes.

Distributed servers

The Beanstalkd FAQ has this to say about distribution:

Does beanstalk inherently support distributed servers?

Yes, although this is handled by the clients, just as with memcached. The beanstalkd server doesn’t know anything about other beanstalkd instances that are running.

I need to take a look at the clients that do implement this and determine what that means for me. I.e. do they use some kind of consistent hashing to determine which node to use for a particular tube, etc. But I do want to have parity with other clients on this.

POCO Producers and Consumers

For me, the 90% use case for a work queue is produce work on some threads/processes/machines and consume that work on a number of workers. Generally that item will have some structured fields describing the work to be done and producers and consumers will use designated tubes for specific types of work. These use cases imply that producers and consumers are separate user stories, that they are tied to specific tubes and deal with structured data. My current plan is to address these user stories with two new interfaces that will look similar to these:

public interface IBeanstalkProducer<T> {
  BeanstalkProducerDefaults Defaults { get; }
  PutResponse Put(T);
}

public interface IBeanstalkConsumer<T> {
  BeanstalkConsumerDefaults Defaults { get; }
  Job<T> Reserve();
  bool Delete(Job<T> job);
  Release(Job<T> job);
}

The idea with each is that it's tied to a tube (or tubes for the consumer) at construction time and that the implementation will have a simple way of associating a serializer for the entity T (will provide protobuf and MetSys.Little support out of the box).

Rx support via IObservable<Job<T>>

Once there is the concept of a Job<T>, it makes sense that reservation of jobs should be exposed as a stream of work that can be processed via link. Although since items should only be reserved when the subscriber accepts the work, it should probably be encapsulated in something like this:

public interface Event<T> {
  Job<T> Take();
  Job<T> Job { get; }
}

This way, multiple subscribers can try to reserve an item and items not reserved by anyone are released automatically.

As I work on the future work items, I will also use the library in production so i can get better educated about the real world behavior of Beanstalkd and what uncovered scenarios the client runs into. There is ok test coverage over the provided behavior but I certainly want to increase that signficantly as i keep working on it.

For the time being, I hope the library proves useful to other .NET developers and would love to get feedback, contributions and issues you may encounter.

By arne on | .net, geek, mono | 3 comments
Tags: , , ,

Loading per solution settings with Visual Studio 2008

If you ever work on a number of projects (company, oss, contracting), you're likely familiar with coding style issues. At MindTouch we use a number of naming and formatting styles that are different from the Visual Studio defaults, so when I work on github and other OSS projects my settings usually cause formatting issues. One way to address this is to use ReSharper and its ability to store per solution settings. But I still run into some formatting issues with Visual Studio settings that are not being overriden by ReSharper, especially when using Ctrl-K, Ctrl-D, which has become a muscle memory keystroke for me.

While Visual Studio has only per install global settings, it does at least let you import and export them. You'd figure that that could be a per solution setting, but after looking around a bit and getting the usual re-affirmation that Microsoft Connect exists purely to poke developers in the eye, i only found manual or macro solutions. So i decided to write a Visual Studio 2008 Add-in to automate this behavior.

Introducing: SolutionVSSettings Add-in

The goal of the Add-in is to be able to ship your formatting rules along with your solution (which is why I also highly recommend using ReSharper, since you can set up naming conventions and much more). I wanted to avoid any dialogs or user interaction requirements with the process, but wanted to leave open options for overriding settings in case you do use the Add-in but have one or two projects you don't want accept settings for or want to have a default setup you want to use without a per solution setting. The configuration options are listed in order of precedence:

Use the per solution solutionsettings.config config xml file

The only option for the config file right now is an absolute or relative path (relative to the solution) to the .vssettings file to load as the solution is loaded. You can check the config file in with your solution, or keep it as a local file ignored by source control and point to a settings file in the solution or a common one somewhere else on your system. Currently the entirely of configuration looks like this:

<config>
  <settingsfile>{absolute or relative path to settingsfile}</settingfile>
</config>

The purpose of this method, even though it is the highest precendence, is to easily set up an override for a project that already has a settings file that the Add-In would otherwise load.

Recommended default: Include 'solution.vssettings' in Solution

If no solutionsettings.config is found, the Add-in will look for a solution item named 'solution.vssettings' and load it as the solution settings. Since this file is part of the solution and will be checked in with the code, this is the recommended default for sharing settings.

Use environment variable 'solutionsettings.config'

Finally, if no settings are found by the other methods, the Add-in will look for an environment variable 'solutionsettings.config' to find an absolute or relative path (relative to the solution) to a config file (same as above) from which to get the settings file path. This is particularily useful if you have a local standard and don't include it in your own solutions, but need to make sure that local standard is always loaded, even if another solution previously loaded its own settings.

How does it work?

The workhorse is simply a call to:

DTE2.ExecuteCommand("Tools.ImportandExportSettings", "/import:{settingfile}");

The rest is basic plumbing to set up the Add-in and find the applicable settings file.

The Add-In subclasses IDTExtensibility2 and during initialization subscribes itself to receive all solution loaded events:

public void OnConnection(object application, ext_ConnectMode connectMode, object addInInst, ref Array custom) {
    _applicationObject = (DTE2)application;
    _addInInstance = (AddIn)addInInst;
    _debug = _applicationObject.ToolWindows.OutputWindow.OutputWindowPanes.Add("Solution Settings Loader");
    Output("loaded...");
    _applicationObject.Events.SolutionEvents.Opened += SolutionEvents_Opened;
    Output("listening for solution load...");
}

I also setup an OutputWindow to report what the Add-In is doing. OutputWindows are a nice, unobtrusive ways in Visual studio to report status and not be seen unless the user cares.

The event handler for solutions being opened does the actual work of looking for possible settings and if one is found to load it:

void SolutionEvents_Opened() {
    var solution = _applicationObject.Solution;
    Output("loaded solution '{0}'", solution.FileName);

    // check for solution directory override
    var configFile = Path.Combine(Path.GetDirectoryName(solution.FileName), "solutionsettings.config");
    string settingsFile = null;
    if(File.Exists(configFile)) {
        Output("trying to load config from '{0}'", configFile);
        settingsFile = GetSettingsFile(configFile, settingsFile);
        if(!string.IsNullOrEmpty(settingsFile)) {
            Output("unable to find override '{0}'", settingsFile);
        } else {
            Output("using solutionsettings.config override");
        }
    }

    // check for settings in solution
    if(string.IsNullOrEmpty(settingsFile)) {
        var item = _applicationObject.Solution.FindProjectItem(SETTINGS_KEY);
        if(item != null) {
            settingsFile = item.get_FileNames(1);
            Output("using solution file '{0}'", settingsFile);
        }
    }

    // check for environment override
    if(string.IsNullOrEmpty(settingsFile)) {
        configFile = Environment.GetEnvironmentVariable("solutionsettings.config");
        if(!string.IsNullOrEmpty(configFile)) {
            settingsFile = GetSettingsFile(configFile, settingsFile);
            if(string.IsNullOrEmpty(settingsFile)) {
                Output("unable to find environment override '{0}'", settingsFile);
            } else {
                Output("using environment config override");
            }
        }
    }
    if(string.IsNullOrEmpty(settingsFile)) {
        Output("no custom settings for solution.");
        return;
    }
    var importCommand = string.Format("/import:\"{0}\"", settingsFile);
    try {
        _applicationObject.ExecuteCommand("Tools.ImportandExportSettings", importCommand);
        Output("loaded custom settings\r\n");
    } catch(Exception e) {
        Output("unable to load '{0}': {1}", settingsFile, e.Message);
    }
}

And that's all there is to it.

More work went into figuring out how to build an installer than building the Add-In…. I hate MSIs. At least i was able to write the installer logic in C# rather than one of the more tedious extensibility methods found in InstallShield (the least value for your money I've yet to find in any product) or Wix (a vast improvment over other installers, but it's still the victim of MSIs.)

Installation, source and disclaimer

The source can be found under Apache license at GitHub, which also has an MSI for those just wishing to install it.

NOTE: The MSI and source code come with no express or implied guarantees. It may screw things up in your environment. Consider yourself warned!!

This Add-In does blow away your Visual Studio settings with whatever settings file is discovered via the above methods. So, before installing this you should definitely back up your settings. I've only tested it on my own personal setup, so it may certainly misbehave on someone else's setup. It's certainly possible it screws up your settings or even Visual Studio install. I don't think it will, but I certainly can't call this well tested across environment, so backup and use at your own risk.

By arne on | .net, geek | 1 comment
Tags: , , , ,

Reproducing Promise IoC in C#

Another diversion before getting back to actual Promise language syntax description, this time trying to reproduce the Promise IoC syntax in C#. Using generics gets us a good ways there, but we do have to use a static method on a class as the registrar giving us this syntax:

$#[Catalog].In(:foo).Use<DbCatalog>.ContextScoped;
// becomes
Context._<ICatalog>().In("foo").Use<DbCatalog>().ContextScoped();

Not too bad, but certainly more syntax noise. Using a method named _ is rather arbitrary, i know, but it at least kept it more concise. Implementation-wise there's a lot of assumptions here: This approach forces the use of interfaces for Promise types, which can't be enforced by generic constraints. It would also be fairly simple to pre-initialize the Context with registrations that look for all interfaces IFoo and then find the implementor Foo and register that as the default map, mimicking the default Promise behavior by naming convention instead of Type/Class name shadowing.

Next up, instance resolution:

var foo = Foo();
// becomes
var foo = Context.Get<IFoo>();

This is where the appeal of the syntax falls down, imho. At this point you might as well just go to constructor injection, as with normal IoC. Although you do need that syntax for just plain inline resolution.

And the whole thing uses a static class, so that seems rather hardcoded. Well, at least that part we can take care of: If we follow the Promise assumption that a context is always bound to a thread, we can use [ThreadStatic] to chain context hierarchies together so that what looks like a static accessor is really just a wrapper around the context thread state. Given the following Promise syntax:

context(:inner) {
  $#[Catalog].In(:inner).ContextScoped;
  $#[Cart].ContextScoped;

  var catalogA = Catalog();
  var cartA = Cart();

  context {
    var catalogB = Catalog(); // same instance as catalogA
    var catalogC = Catalog(); // same instance as A and B
    var cartB = Cart(); // different instance from cartA
    var cartC = Cart(); // same instance as cartB
  }
}

we can write it in C# like this:

using(new Context("inner")) {
  Context._<ICatalog>().In("inner").ContextScoped();
  Context._<ICart>().ContextScoped();

  var catalogA = Context.Get<ICatalog>();
  var cartA = Context.Get<ICart>();

  using(new Context()) {
    var catalogB = Context.Get<ICatalog>(); // same instance as catalogA
    var catalogC = Context.Get<ICatalog>(); // same instance as A and B
    var cartB = Context.Get<ICart>(); // different instance from cartA
    var cartC = Context.Get<ICart>(); // same instance as cartB
  }
}

This works because Context is IDisposable. When we new up an instance, it takes the current threadstatic and stores it as it's parent and sets itself as the current. Once we leave the using() block, Dispose() is called, at which time, we set the current context's parent back as current, allowing us to build up and un-roll the context hierarchy:

public class Context : IDisposable {

  ...

  [ThreadStatic]
  private static Context _current;

  private Context _parent;

  public Context() {
    if(_current != null) {
      _parent = _current;
    }
    _current = this;
  }

  public void Dispose() {
    _current = _parent;
  }
}

I'm currently playing around with this syntax a bit more and using Autofac inside the Context to do the heavy lifting. If I find the syntax more convenient than plain Autofac, i'll post the code on github.

By arne on | .net, geek | A comment?
Tags: , , ,

Freezing DTOs by proxy

In my previous post about sharing data without sharing data state, I proposed an interface for governing the transformation of mutable to immutable and back for data transfer objects (DTOs) that are shared via a cache or message pipeline. The intent was to avoid copying the object when not needed. The interface I came up with was this:

public interface IFreezable<T> where T : class {
    bool IsFrozen { get; }

    void Freeze();
    T FreezeDry();
    T Thaw();
}

The idea being that a mutable instance could be made immutable by freezing it and receivers of the data could create their own mutable copy in case they needed to track local state.

  • Freeze() freezes a mutable instance and is a no-op on a frozen instance
  • FreezeDry() returns a frozen clone of a mutable instance or the current already frozen instance. This method would be called by the container when data is submitted to make sure the data is immutable. If the originator froze it's instance, no copying is required to put the data into the container
  • Thaw() will always clone the instance, whether its frozen or not, and return a frozen instance. This is done so that no two threads accidentally get a reference to the same mutable instance.

Straight forward behavior but annoying to implement on objects that really should only be data containers and not contain functionality. You either have to create a common base class or roll the behavior for each implementation. Either is annoying and a distraction.

Freezable by proxy

What we really want is a mix-in that we can use to attach shared functionality to DTOs without their having to take on a base class or implementation burden. Since .NET doesn't have mix-ins, we do the next best thing: We wrap the instance in question with a Proxy that takes on the implementation burden.

Consider this DTO, implementing IFreezable:

public class FreezableData : IFreezable<FreezableData> {

    public virtual int Id { get; set; }
    public virtual string Name { get; set; }

    #region Implementation of IFreezable<Data>
    public virtual void Freeze() { throw new NotImplementedException(); }
    public virtual FreezableData FreezeDry() { throw new NotImplementedException(); }

    public virtual bool IsFrozen { get { return false; } }
    public virtual FreezableData Thaw() { throw new NotImplementedException(); }
    #endregion
}

This is a DTO that supports the interface, but lacks the implementation. The implementation we can provide by transparently wrapping it with a proxy.

var freezable = Freezer.AsFreezable(new FreezableData { Id = 42, Name = "Everything" });
Assert.isTrue(Freezer.IsFreezable(data);

// freeze
freezable.Freeze();
Assert.IsTrue(freezable.IsFrozen);

// this will throw an exception
freezable.Name = "Bob";

Under the hood, Freezer uses Castle's DynamicProxy to wrap the instance and handle the contract. Now we have an instance of FreezableData that supports the IFreezable contract. Simply plug the Freezer call into your IoC or whatever factory you use to create your DTOs and you get the behavior injected.

In order to provide the cloning capabilities required for FreezeDry() and Thaw(), I'm using MetSys.Little and perform a serialize/deserialize to clone the instance. This is currently hardcoded and should probably have some pluggable way of providing serializers. I also look for a method with signature T Clone() and will use that instead of serialize/deserialize to clone the DTO. It would be relatively simple to write a generic memberwise deep-clone helper that works in the same way as the MetSys.Little serializer, except that it never goes to a byte representation.

But why do I need IFreezable?

With the implementation injected and its contract really doing nothing but ugly up the DTO, why do we have to implement IFreezable at all? Well, you really don't. Freezer.AsFreezable() will work on any DTO and create a proxy that implements IFreezable.

public class Data {
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
}

var data = Freezer.AsFreezable(new Data{ Id = 42, Name = "Everything" });
Assert.IsTrue(Freezer.IsFreezable(data);

// freeze
var freezable = data as IFreezable<Data>;
freezable.Freeze();
Assert.IsTrue(freezable.IsFrozen);

No more unsightly methods on our DTO. But in order to take advantage of the capabilties, we need to cast it to IFreezable, which from the point of view of a container, cache or pipeline, is just fine, but for general usage, might be just another form of ugly. Luckily Freezer also provides helpers to avoid the casts:

// freeze
Freezer.Freeze(data);
Assert.IsTrue(Freezer.IsFrozen(data));

// thaw, which creates a clone
var thawed = Freezer.Thaw(data);
Assert.AreEqual(42, thawed.Id);

Everything that can be done via the interface also exists as a helper method on Freezer. FreezeDry() and Thaw() will both work on an unwrapped instance, resulting in wrapped instances. That means that a container could easily take non-wrapped instances in and just perform FreezeDry() on them, which is the same as receiving an unfrozen IFreezable.

Explicit vs. Implicit Behavior

Whether to use the explicit interface implementation or just wrap the DTO with the implementation comes down to personal preference. Either methods works the same, as long as the DTO satisfies a couple of conditions:

  • All data must be exposed as Properties
  • All properties must be virtual
  • Cannot contain any methods that change data (since it can't be intercepted and prevented on a frozen instance)
  • Collections are not yet supported as Property types (working on it)
  • DTOs must have a no-argument constructor (does not have to be public)

If the DTO represents a graph, the same must be true for all child DTOs.

The code is functional but still a work in progress. It's released under the Apache license and can be found over at github.

By arne on | .net, geek | 2 comments
Tags: , ,

Promise: Inversion of Control is the new garbage collection

Before continuing with additional forms of method defintions, I want to take a detour through the Inversion of Control facilities, since certain method resolution behavior relies on those facilities. IoC is one feature of Promise that is meant to not be seen or even thought about 99% of the time, but when you need to manipulate its default behavior it is a fairly broad topic, which I will cover in the next 3 or 4 posts. If you want to see code, you probably want to just go to the next post, since this one is mostly about the reasoning for inclusion of IoC in the language itself.

The evolution of managing instances

Not too long Garbage Collection was considered the domain of academic programming. My first experience with it was writing LISP on Symbolics LISP machines. And while it was a wonderful development experience, you got used to the Listener (think REPL on LISP machines) to pause and the status Genera status bar blinking with (garbage-collect). Ok, but that's back on hardware significantly less powerful than my obsolete Razor flip-phone.

These days garbage collection is pretty much a given. The kind of people that say you have to use C to get things done are the same kind of people that used to say that you have to use assembly to get things done, i.e. they really are talking about edge cases. Even games are using scripting languages for much of their game logic these days. Whether it's complex generational garbage collection or simple reference counting, most languages are memory managed at this point.

The lifespan phases of an instance

But still we have the legacy of malloc and free with us. We still new-up instances and while there's fairly little use of destructors, we still run into scenarios that require decomissioning of objects before garbage collection gets rid of them. And while on the subject of construction and destruction, we're still manually managing the lifespan from creation to when we purposely let them drop out of scope so GC can do its magic.

Somehow while moving to garbage collection so that we don't have to worry about that plumbing, we kept the plumbing of manually handling construction, initialization and disposal. That doesn't seem like work related to solving the task at hand, but rather more like ceremony we've grown used to. We now have three phases in an instance lifespan, only one of which is actually useful to problem solving:

Construction/Initialization

Depending on which language you are using, this might be a single constructor stage (Java, C#, Ruby, et al) or an allocation and initialization stage (Smalltalk, Objective-C, et al). Either way, you do not want your code to start interacting with the instance until these stages are completed

Operation

This is the useful stage of the instance, when it actually can fullfill its purpose in our program. This should really be the only stage we ever need to see.

Disposal/Destruction

We're done with the instance, so we need to clean up any references it has and resources it has a hold of and let the garbage collector do the rest. By definition, it has passed its useful life and we just want to make sure it's not interfering with anything still executing.

Complicating disposal is that most garbage collected languages have non-deterministic destructors, which are not  invoked until the time of collection and may be long after use of the instance has ceased. Since there are scenarios where clean-up needs to happen in a deterministic fashion (such as closing file and network handles), C# added the IDisposable pattern. This pattern seems more like a "oh, crap, what do we do about deterministic cleanup?" add-on than a language feature. It completely puts the onus on the programmer both for calling .Dispose (unless in a using block) and for handling access to an already disposed instance.

Enter Inversion of Control

For the most part, all we should care about is that when we want an instance with certain capabilities, we should be able to get access to one. Who cares if it was freshly constructed or a shared instance or a singleton or whatever. Those are details that are important but once defined not part of the user story we set out to satisfy.

In Java and C#, this need for pushing instance management out of the business logic and into dedicated infrastructure led to the creation of Inversion of Control containers, named thus because they invert the usual procedural flow of "create an object, hand it to another object constructor as a dependency, etc." to "ask for the object you need and the depedency chain will be resolved for you". There are numerous articles on the benefits of Dependency Injection and Inversion of control. One of the simplest explanation was given by John Munch to the Stackoverflow question "How to explain Dependency Injection to a 5-year-old":

When you go and get things out of the refrigerator for yourself, you can cause problems. You might leave the door open, you might get something Mommy or Daddy doesn't want you to have. You might even be looking for something we don't even have or which has expired.

What you should be doing is stating a need, "I need something to drink with lunch," and then we will make sure you have something when you sit down to eat.

But IoC goes beyond the wiring-up of object graphs that DI provides. It is also responsible for knowing when to hand you a singleton vs. a shared instance for the current scope vs. a brand new instance and handles disposal of those instance as their governing scopes are exited.

These frameworks are build on top of the existing constructor plumbing and use reflection to figure out how to take over the tasks that used to fall to the programmer. For Promise this plumbing is considered a natural extension of what we already expect of garbage collection and tries to be automatic and invisible.

By default every "constructor" access to an instance resolves the implicit Type to the Class of the same name, and creates an instance, i.e. behavior as you expect from OO languages. However, using nested execution scopes, lifespan management and Type mapping, this behavior can be modified without touching the business logic. In the next post, I'll start by explaining how the built in IoC works by tackling Type/Class mapping.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.

Promise: Lambdas

Lambdas in Promise, like other languages, are anonymous functions and first-class values. They act as closures over the scope they are defined in and can capture the free variables in their current lexical scope. Promise uses lambdas for all function definitions, going further even than javascript which has both named and anonymous functions. In Promise there are no named functions. Just slots that lambdas get assigned to for convenient access.

Straddling the statically/dynamically typed divide by allowing arguments and return values to optionally declare a Type, Promise mimicks C# lambda syntax more than say, LISP, javascript, etc. A simple lambda example looks like this:

var i = 0;
var incrementor = { ++i; };
print incrementor(); // => 1

This declaration doesn't have any input arguments to declare, so it uses a shortform of simply assigning a block to a variable. The standard form uses the lambda operator =>, so the above lambda could just as well be written as:

var incrementor = () => { ++i; };

I'm currently debating whether I really need the =>. It's mostly that i'm familiar with the form from C#. But given that there are no named functions, parentheses followed by a block can't occur otherwise, so there is no ambiguity. So, i'm still deciding whether or not to drop it:

var x = (x,y) => { x + y };
// vs.
var x = (x,y) { x + y };

Inputs

The signature definition of lambdas borrows heavily from C#, using a left-hand side in parantheses for the signature, followed by the lambda operator. Input arguments can be typed, untyped or mixed:

var untypedAdd = (x,y) => { x + y; };
var typedAdd = (Num x, Num y) => { x + y; };
var mixedtypeAdd = (Num x, y) => { x + y; };

Output

In dynamic languages, lambda definitions do not need a way to express whether they return a value–there is no type declaration so whether or not to expect a value is convention and documentatio driven. In C# on the other hand, a lambda can either not return a value, a void method, which uses one of the Action delegates, or return a value and declare it as the last type in the declaration using the Func delegates. In Promise all lambdas return a value, even if that value is nil (more about the special singleton instance nil later). Values can be returned by explicitly using the return keyword, otherwise it defaults simply to the value of the last statement executed before exiting the closure. Since return values can be typed, we need a way to declare that Type. Unlike C#, our lambdas aren't a delegate signature, so instead of reserving the last argument Type as the return Type, which would be ambiguous, Promise uses the pipe '|' character to optionally declare a return type:

var returnsUntyped = (x,y) => { x + y; };
var returnsTyped = (x,y|Num) => { x + y; };
var explicitReturn = (|Num) => { returnsTyped(1,2); };

Default values

Lambdas can also declare default values for arguments, which can be simple values or expressions:

var simple = (x=2,y=3) => { x + y; };
var complex= (x=simple()) => { x; };

Calling Conventions

Promise supports three different method calling styles. The first is the standard parentheses style as shown above. In this style, optional values can only be used by leaving out trailing arguments like this:

var f = (x=1,y=2,z=3) => { x + y +z; };
print f(2,2,2); // => 6
print f(2,2);   // => 7
print f(4);     // => 9
print f();      // => 6

If you want to omit a leading argument, you have to use the named calling style, using curly brackets, which was inspired by DekiScript. The curly bracket style uses json formatting, and since json is a first-class construct in Promise, calling the function by hand with {} or providing a json value behaves the same, allowing for simple dynamic argument construction:

print f{y: 1};       // => 5
print f{z: 1, y: 1}; // => 3
var args = {z: 5};
print f args;        // => 8

Finally there is the whitespace style, which replaces parentheses and commas with whitespace. This style exists primarily to make DSL creation more flexible:

print f 2 2 2; // => 6
print f 2 2;   // => 7
print f 4;     // => 9
print f;       // => 6

Note the final call simply uses the bare variable f. This is possible because in Promise a lambda requiring no arguments can take the place of a value and accessing the variable executes the lambda. Sometimes it's desirable to access or pass a reference to a lambda, not execute it, in which case the reference notation '&' is needed. Using reference notation on a value is harmless (at least that's my current thinking), since Promise has no value types, so the reference of a value is the value:

var x = 2;
var y = () => { x+10; };
var x2 = &x;
var y2 = &y;
var y3 = y;
x++;
print x2; // => 3;
print y2; // => 13;
print y3; // => 12;

The output of y3 is only 12, because assignment of y3 evaluated y, capturing x before it was incremented.

Closures and Scope

As mentioned above, Lambdas can capture variables from their current scope. Scopes are nested, so a lambda can capture  variables from any of the parent scopes

var x = 1;
var l1 = (y) => {
  return () => { x++; y++; x + y; };
};
print l1(2); // => 5
print l1(2); // => 6
print x;     // => 3

Similar to javascript, a block is not a new scope. This is done primarily for scoping simplicity, even if it introduces some side-effects:

() => {
  var x = 5;
  if( x ) {
    var x = 5; // illegal
    var y = 10;
  }
  return y; // legal since the variable declaration was in the same scope
};

Using anonymous functions

As I've said that lambdas are the basic building block, meaning there is no other type of function definition. You can use them as lazily evaluated values, you can pass them as blocks to be invoked by other blocks and as I will discuss next time, Methods are basically polymorphic, named slots defined inside the closure of the class (i.e. capturing the class definition's scope), which is why there is no need for explicitly named functions.

More about Promise

This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.