
Moq'ing a Func or Action

Just ran into trying to test a method that takes a callback in the form of Func. Initially I figured, it's a func, dead simple, I'll just roll my own func. But the method calls it multiple times with different input and expects different output. At that point, I really didn't want to write a fake that could check the input on each of those invocations, return the right thing and let me track invocations. That's what Moq does so beautifully. Except it doesn't for Func or Action.

Fortunately the solution was pretty simple. All I needed was a class that implements a method with the same signature as the Func; mock it and pass a reference to the mocked method in as my func, and voila, a mockable Func:

public class FetchStub {
  public virtual PageNodeData Fetch(Title title) { return null; }
}

[Test]
public void Can_moq_func() {

  // Arrange
  var fetchMock = new Mock<FetchStub>();
  var title = ...;
  var node = ...;
  fetchMock.Setup(x => x.Fetch(It.Is<Title>(y => y == title)))
    .Returns(node)
    .Verifiable();
  ...

  // Act
  var nodes = titleResolver.Resolve(titles, fetchMock.Object.Fetch);

  // Assert
  fetchMock.VerifyAll();
  ...
}

Now I can set up as many expectations for different inputs as I want and verify all invocations at the end.
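
For example, continuing the test above -- homeTitle/homeNode and aboutTitle/aboutNode are hypothetical stand-ins for whatever the elided arrange code builds:

  // one Setup per expected input, each returning its own node
  fetchMock.Setup(x => x.Fetch(homeTitle)).Returns(homeNode).Verifiable();
  fetchMock.Setup(x => x.Fetch(aboutTitle)).Returns(aboutNode).Verifiable();

  // the code under test calls the mocked method once per title
  var nodes = titleResolver.Resolve(new[] { homeTitle, aboutTitle }, fetchMock.Object.Fetch);

  // and VerifyAll() confirms every expected invocation happened
  fetchMock.VerifyAll();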

Using lambda expression with .OrderBy

It's always bugged me that I could use a lambda expression with List<T>.Sort(), but not with IEnumerable<T>.OrderBy, i.e. given a data object Data:

public class Data {
  public int Id;
  public string Name;
  public int Rank;
}

you can sort List<Data> in place like this:

data.Sort((left, right) => {
  var cmp = left.Name.CompareTo(right.Name);
  if(cmp != 0) {
    return cmp;
  }
  return right.Rank.CompareTo(left.Rank);
});

But if I had an IEnumerable<Data>, OrderBy needs a key selector plus a custom IComparer and can't take a comparison lambda.

Fortunately, it is fairly easy to create a generic IComparer that wraps a lambda, plus an extension method that reduces the signature to just the lambda expression:

public static class OrderByExtension {
    private class CustomComparer<T> : IComparer<T> {
        private readonly Func<T,T,int> _comparison;

        public CustomComparer(Func<T,T,int> comparison) {
            _comparison = comparison;
        }

        public int Compare(T x, T y) {
            return _comparison(x, y);
        }
    }

    public static IEnumerable<T> OrderBy<T>(this IEnumerable<T> enumerable, Func<T,T,int> comparison) {
        return enumerable.OrderBy(x => x, new CustomComparer<T>(comparison));
    }

    public static IEnumerable<T> OrderByDescending<T>(this IEnumerable<T> enumerable, Func<T,T,int> comparison) {
        return enumerable.OrderByDescending(x => x, new CustomComparer<T>(comparison));
    }
}

Now I can use a lambda just like in sort:

var ordered = unordered.OrderBy((left, right) => {
    var cmp = left.Name.CompareTo(right.Name);
    if(cmp != 0) {
        return cmp;
    }
    return right.Rank.CompareTo(left.Rank);
});

Of course, .OrderByDescending is kind of redundant, since you can invert the order in the lambda just as easily, but it is included simply for completeness.
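
For instance, these two calls produce the same ordering of the Data objects from above (just a sketch against the unordered enumerable):

var descending1 = unordered.OrderBy((left, right) => right.Name.CompareTo(left.Name));

var descending2 = unordered.OrderByDescending((left, right) => left.Name.CompareTo(right.Name));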

Towards a decentralized, federated status network

So everyone is talking about join.app.net and I agree with almost everything they say. I'm all for promoting services in which the user is once again the customer rather than, as in ad-supported systems, the product. But it seems that in the end, everyone wants to build a platform. From the user's perspective, though, that just pushes the ball down the court: you are still beholden to a (hopefully benevolent) overlord. Especially since the premise that such an overlord is required to create a good experience is faulty. RSS, email and the web itself are just a few of the technologies we rely on that have no central control, are massively adopted, work pretty damn well and continue to enable lots of successful and profitable businesses without anyone owning "the platform". So can't we have a status network that truly is infrastructure, built on the same principles and enabling companies to build successful services on top?

Such a system would have to be decentralized, distributed and federated, so that there is no barrier for anyone to start producing or consuming data in the network. As long as there is a single gatekeeper owning the pipeline, the best everyone else can hope to do is innovate on the API surface that owner has had the foresight to expose.

The Components

Thinking about a simple spec for such a status network, I started on a more technical spec over at github, but wanted to cover the motivations behind the concepts separately here. I'm eschewing basing the network on obviously sympathetic technologies such as Atom and PubSubHubbub, because they carry a lot of legacy that is not needed while still not being a proper fit.

I'm also defining the system as a composition of independent components to avoid requiring the development of a large stack -- or platform -- to participate. Rather, implementers should be able to pick only the parts of the ecosystem for which they believe they can offer a competitive value.

Identity

A crucial feature of centralized systems is that of being the guardian of identity: you have an arbiter of people's identity and can therefore trust the origin of an update (at least to a degree). But conversely, the closed system also controls that identity, making it vulnerable to changes in business models.

A decentralized ecosystem must solve the problems of addressing, locating and authenticating data. It should also be flexible enough that it does not suffer from excessive lock-in with any identity provider.

The internet already provides a perfect mechanism for creating unique names in the user@host scheme most commonly used by email. The host covers locating the authority, while the user identifies the author. Using a combination of convention and DNS SRV records, a simple REST API can provide a universal whois look-up for any author, recipient or mentioned user in status feeds:

GET:/who/joe

{
  _sig: '366e58644911fea255c44e7ab6468c6a5ec6b4d9d700a5ed3a810f56527b127e',
  id: 'joe@droogindustries.com',
  name: 'Joe Smith',
  status_uri: 'http://droogindustries.com/status',
  feed_uri: 'http://droogindustries.com/status/feed'
}
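
The DNS half of that convention could be as simple as an SRV record advertising which host answers the /who look-ups for a domain -- the service label and target host below are purely illustrative, not part of any finalized spec:

_whois._tcp.droogindustries.com. 3600 IN SRV 0 0 443 id.droogindustries.com.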

That solves the addressing.

As shown above, the identity document includes the uri to the canonical status location (as well as the feed location -- more on that difference later). That solves the locating.

Finally, public/private key encryption is used to sign the identity document (as well as any status updates), allowing the authenticity of the document to be verified and all status updates to be authenticated. The public key (or keys -- allowing for creating new keys without losing authentication ability for old messages) is part of the feed, while the nameservice record is signed by it. And that solves authentication.

Note that name resolution, name lookup and feed hosting are independent pieces. They certainly could be implemented in a single system by any vendor, but do not have to be and provide transparent interoperation. This separation has the side effect of allowing the user control over their data.

Status Feed

The actual status uri, much like a blog uri, is just an HTML page, not the feed document itself. The feed is identified by a link tag in the status page:

<link rel="alternate" type="application/hsf+json" href="http://droogindustries.com/status/feed"/>

The reason for using an HTML page is that it allows you to separate your status page from the place where the data lives. You could plop it on a plain home page that just serves static files and point to a third party that manages your actual feed. It also allows your status page to provide a user friendly representation of your feed.

The feed document, as the content-type betrays, is simply json. While XML is more flexible, that same flexibility (especially via namespaces) has made XML rather broadly reviled by developers. In contrast, JSON maps more easily to commonly used storage engines, is easier to read and write by hand or tooling, and is readily consumed by javascript, making browser consumption a snap.

While the feed provides meta data about the author and the feeds the author follows, the focus is of course status update entries, which have a minimal form of:

{
  _sig: 'aadefbd0d0bc39a062f87107de126293d85347775152328bf464908430712789',
  id: '4AQlP4lP0xGaDAMF6CwzAQ',
  href: 'http://droogindustries.com/joe/status/feed/4AQlP4lP0xGaDAMF6CwzAQ',
  created_at: '2012-07-30T11:31:00Z',
  author: {
    id: 'joe@droogindustries.com',
    name: 'Joe Smith',
    profile_image_uri: 'http://droogindustries.com/joe.jpg',
    status_uri: 'http://droogindustries.com/joe/status',
    feed_uri: 'http://droogindustries.com/joe/status/feed',
  },
  text: 'Hey #{bob}, current status #{beach} #{vacation}',
  entities: {
    beach: 'http://droogindustries.com/images/beach.jpg',
    bob: 'bob@foo.com',
    vacation: '#vacation'
  }
}

In essence, the status network works by massively denormalized propagation of json messages. Even though each message has a canonical uri, the network will be storing many copies, so updates are treated as basically read-only. There is a mechanism for advisory updates and deletes, but of course there is no guarantee that such messages will be respected.

As for the message itself, I fully subscribe to the philosophy that status updates need to be short and contain little to no mark-up. Status feeds are a crappy way to have a conversation, and trying to extend them to allow it (looking at you, g+) is a mess. In addition, since everybody is storing copies, arbitrary length can become an adoption issue with unreasonable storage requirements. For these reasons, I'm currently limiting the size of the message body to 1000 characters (not bowing to SMS here). For the content format, I'm leaning towards a template format for entity substitution, but I am also considering a severely limited subset of html.

While the feed itself is simply json, there needs to be an authoring tool, if only to be able to push updates to PubSub servers. Since the only publicly visible part of the feed is the json document, implementers have complete freedom over the authoring experience, just as wordpress, livejournal, tumblr, etc. all author RSS in their own way.

Aggregation

If the status feed is about publishing one's updates, aggregators are the maintainers of one's timeline. The aggregator service provides two public functions: receiving updates from subscriptions (described below) and receiving mentions. Both are handled by a single required REST endpoint, which must be published in one's feed.

Aggregators are the most likely component to grow into applications, as they are the basis for consumption of status. For example, if someone wanted to create a mobile application for status updates, it would need to rely on an aggregator as its backend. Likely an aggregation service provider would integrate status feed authoring as well, and maybe even take on the naming services, since combining them allows for significant economies of scale. The important thing is that these parts do not have to be a single system.

This separation also allows application authors to use a third party to act as their aggregator. Since aggregators will receive a lot of real-time traffic and are likely to be the arbiters of what is spam, being able to offload this work to a dedicated service provider may be desirable for many status feed applications.

PubSub

A status network relies on timely dissemination of updates, so a push system to followers is a must. Taking a page from PubSubHubbub, I propose subscription hubs that feed publishers push their updates into. These hubs in turn push the updates to subscribers and mentioned users.

The specification assumes these hubs to be publicly accessible, at least for subscribers, and only requires a rudimentary authentication scheme. It is likely that the ecosystem will require different usage and pricing models with more strenuous authorization and rate-limiting, but at the very least the subscription part will need to be standardized so that aggregators can easily subscribe. It is, however, too early to try to further specialize that part of the spec, and the public subscription mechanism should be sufficient for now.

In addition to the REST spec for publish and subscribe, PubSub services are ideal candidates for other transports, such as RabbitMQ, XMPP, etc. Again, as the traffic needs of the system grow, so will the delivery options, and extension should be fairly painless, as long as the base REST pattern remains as a fallback.

In addition to delivering updates to subscribers, subscription hubs are also responsible for the distribution of mentions. Since each message posted by a publisher already includes the parsed entities, and a user's feed (and thereby aggregator) can be resolved from a name, the hub can deliver mentions in the same way as subscriptions: as POSTs against the aggregator uri.
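
Concretely, delivering the earlier example entry to bob@foo.com would mean resolving bob's feed, reading the aggregator endpoint published there and POSTing the entry to it -- the uri below is illustrative only:

POST http://foo.com/bob/aggregator
Content-Type: application/hsf+json

{ ...the status update entry shown earlier... }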

Since PubSub services know about the subscribers, they are also the repository of a user's followers -- though that list is only available to the publisher, who may opt to share this information on their status page.

Discovery

Discovery in existing status networks generally happens in one of the following ways: reshared entries received from one of the subscribed feeds, browsing the subscriptions of someone you are subscribed to, or search (usually for hashtags).

The first is implicitly implemented by subscribing and aggregating the subscribed feeds.

The second can be exposed in any timeline UI, given the mechanisms of name resolution and feed location.

The last is the only part that is not covered in the existing infrastructure. It comes down to the collection and indexing of users and feeds, which is already publicly available and can even be pushed directly to the indexer.

Discovery of feeds happens via following mentions and re-posts in existing feeds, but that does mean there is no easy way to advertise one's existence to the ecosystem.

All of these indexing/search use cases are opportunities for realtime search services offered by new or existing players. There is little point in formalizing APIs for these, as they will be organic in emergence and either be human digestible or APIs for application developers to integrate with the specific features of the provider.

To name just a few of the unmet needs:

  • Deep search of only your followers
  • Indexing of hashtags or other topic identifiers with or without trending
  • Indexing of name services and feed meta data for feed and user discovery
  • Registry of feeds with topic meta data

As part of my reference implementation I will set up a registry and world feed (consuming all feeds registered) as a public timeline to give the ecosystem something to get started with.

How it all fits together

As outlined above there are 5 components that in combination create the following workflow:

  • Users are created by registering a name with a name service and creating a feed
  • Users create feed entries in their Status Feed
  • The Status Feed pushes entries to PubSub
  • PubSub pushes entries to recipients -- subscribers and mentions
  • Aggregation services collect entries into user timelines
  • Discovery services also collect entries and expose mechanisms to explore the ecosystem's data

It's possible to implement all of the above as a single platform that interacts with the ecosystem via the public endpoints of name service, subscription and aggregation, or to just implement one component and offer it as a service to others that also implement a portion. Either way, the ecosystem as a whole can function wholly decentralized, and by adhering to the basic specification put forth herein, implementers can benefit from the rest of the ecosystem.

I believe that sharing the ecosystem will have cumulative benefits for all providers and should discourage walled-garden communities. Sure, someone could set up private networks, but I'm sure an RBL-like service for such bad actors would quickly come into existence to prevent them from leeching off the system as a whole.

Who is gonna use it?

The worth of a social network is measured by its size. Clearly there is an adoption barrier to creating a new one. For this reason, I've tried to define a specification as only the simplest thing that could possibly work, given the goals of basic feature parity with existing networks of this type while remaining decentralized, federated and distributed. I hope that by allowing many independent implementations without the burden of a large, complicated stack or any IP restrictions, the specification is easy enough for people to see value in implementing while providing enough opportunities for implementers to find viable niches.

Of course, simplicity of implementation is meaningless to end users. While all this composability provides freedom to grow the network, end users will demand applications (web/mobile/desktop) that bring the pieces together and offer a single, unified experience, hiding the nature of the ecosystem. In the end I see all this a lot like XMPP or SMTP/POP/IMAP: most people don't know anything about them, but use them every day when they use gmail/gtalk, etc.

I'm currently working on a reference implementation in node.js, less to provide infrastructure than to show how simple implementation should be. I was hesitant to post this before having the implementation ready to go, since code talks a lot louder than specs, but I also wanted the opportunity to get feedback on the spec before a de facto implementation enshrines it. I also don't want my implementation to be mistaken as "the platform". We'll see how that approach works out.

In the meantime, the repo, where the code will be found and where the spec is being documented in a less prosaic form, lives at https://github.com/sdether/happenstance.

Ducks Unlimited

Most complaints levied against static type systems come down to the verbosity and complexity required to express intent, which are usually the fault of the language, not of the type system. Type inference, parametric types, etc. are all refinements that make type systems more fluid and, err, dynamic.

One language capability that reduces a lot of ceremony is statically verifiable duck-typing. While interfaces are the usual approach to letting independent implementations serve the same role, it has always bugged me that interfaces put the contract at the wrong end of the dependency, i.e. I have to implement your interface in order for you to recognize that I can quack!

I had hoped that C# 4.0's dynamic keyword would fix this, but this is the best you can do:

public class AVerySpecialDuck {
  public void Quack() {
    Console.WriteLine("quack");
  }
}

public class DuckFancier {
  public void GimmeADuck(dynamic duck) {
    duck.Quack();
  }
}

new DuckFancier().GimmeADuck(new AVerySpecialDuck());

Sure, I can now call .Quack() without having to require a concrete type or even an interface, but I also could have provided a rabbit to GimmeADuck without compilation errors. And as the user of GimmeADuck there is no machine discoverable way to determine what it expects.

I solved half of this problem with Duckpond:

public interface IQuacker {
  void Quack();
}

public class DuckFancier {
  public void GimmeADuck(IQuacker duck) {
    duck.Quack();
  }
}

new DuckFancier().GimmeADuck(new AVerySpecialDuck().AsImplementationOf<IQuacker>());

Yes, this is back to interfaces to give us discoverability, but with AsImplementationOf<T>, any class with Quack() could be turned into that interface without ever knowing about it. However, whether I satisfy that contract is still a runtime check.

Duck, Duck, go

What I really want is what go does:

type Quacker interface {
  Quack()
}

func GimmeADuck(duck Quacker) {
  duck.Quack()
}

I.e. true duck-typing. If the type provided satisfies the contract, it can get passed in and the compiler makes sure. Unfortunately, other than this feature go doesn't entice me all that much.

By trait or structure

A language that's a bit more after my own heart than go and has even greater flexibility in regards to duck-typing is scala. It actually provides two different approaches to this problem.

The first is a structural type, which uses method signatures to define the contract for an expected argument type:

class AVerySpecialDuck {
  def quack() = println("Quack!")
}

object DuckFancier {
  type Quacker = { def quack(): Unit }
  def GimmeADuck(duck: Quacker) = duck.quack()
}

DuckFancier.GimmeADuck(new AVerySpecialDuck())

A structural type is simply a contract of what methods a type should have in order to satisfy the constraints. In this case the singleton DuckFancier defines the structural type Quacker as any object with a quack method.

The second way we could have achieved this is with a trait bound to an instance:

class AVerySpecialDuck {
  def quack() = println("Quack!")
}

trait Quacker {
  def quack(): Unit
}

object DuckFancier {
  def GimmeADuck(duck: Quacker) = duck.quack()
}

val duck = new AVerySpecialDuck with Quacker
DuckFancier.GimmeADuck(duck)

A trait is basically an interface with an optional implementation (think interface plus extension methods), but in this case I'm treating it purely as an interface to take advantage of scala's ability to mix a trait into an object at instantiation. This lets me attach the contract at the point of use rather than at class definition.

Duck-typing in statically typed languages allows for syntax that is concise yet still statically verifiable and discoverable, avoiding the usual ceremony and coupling that inheritance models impose. It's a feature I hope more languages adopt, and along with type inference it invalidates a lot of the usual gripes levied at static typing.

Implicits instead of Extension Methods?

After the latest round of "scala is teh complex", I started thinking a bit more about the similar roles that implicit conversions in scala and extension methods in C# play. I can't objectively comment on whether extension methods are simpler (they still suffer from a discoverability issue), but I can say that the idea of using implicit conversion to attach new functionality to existing code really intrigued me. It seems less obvious but also a lot more powerful.

Then I thought: wait, C# has implicit... did they just invent extension methods without needing to? I've never used implicit conversion in C# for anything but automatic casting, but maybe I was just not imaginative enough. So here goes nothing:

public class RichInt {
  public readonly int Value;

  public static implicit operator RichInt(int i) {
    return new RichInt(i);
  }

  public static implicit operator int(RichInt i) {
    return i.Value;
  }

  private RichInt(int i) {
    Value = i;
  }

  public IEnumerable<int> To(int upper) {
    for(int i = Value;i<=upper;i++) {
      yield return i;
    }
  }
}

Given the above I was hoping I could mimic scala's RichInt to() method and write code like this:

foreach(var i in 10.To(20)) {
  Console.WriteLine(i);
}

Alas, that won't compile. Implicit conversions aren't considered for member access on the int itself, so I had to write

foreach(var i in ((RichInt)10).To(20)) {
  Console.WriteLine(i);
}

So I do have to use an extension method to create To() in C#.
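
For reference, that extension method is trivial -- a minimal sketch, using a hypothetical IntExtensions class:

public static class IntExtensions {

  // 10.To(20) works here because extension methods participate in member
  // look-up on int, which implicit conversions do not
  public static IEnumerable<int> To(this int lower, int upper) {
    for(var i = lower; i <= upper; i++) {
      yield return i;
    }
  }
}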

And, yes, I'm grossly simplifying what scala's implicits can accomplish. Also, I wish I could have block-scope using statements to import extension methods for just a block, the way scala's import allows you to handle local-scope implicits.

Node 0.6.x & AWS EC2 Micro troubles

Tried to upgrade node from 0.4.5 to 0.6.x and my micro kept falling over dead. I know it's an edge case, but it's an annoying set of symptoms that I figured I should post in case someone else runs into the same issue.

tl;dr => It's not a node problem, it's an AWS kernel issue with old AWS AMIs and Micro instances.

So I have a micro that's about a year old, i.e. a beta AWS AMI, but I gather the same problem happens with pretty much every AMI prior to 2011.09. I was running node 0.4.5, but had started using 0.6.4 on my dev box and some modules were now dependent on it. Since micro instances go into throttle mode when building anything substantial, I hoped to use the build from my dev server. The dev machine is centos, so I crossed my fingers, copied the build over and ran make install. No problem. Then I tried npm install -g supervisor and it locked up. Load shot up, the process wouldn't let itself be killed, and I got a syslogd barf all over my console:

Message from syslogd@ at Wed Dec 28 00:58:19 2011 ...
ip-**-**-**-** klogd: [  440.293407] ------------[ cut here ]------------
ip-**-**-**-** klogd: [  440.293418] invalid opcode: 0000 [#1] SMP
ip-**-**-**-** klogd: [  440.293424] last sysfs file: /sys/kernel/uevent_seqnum
ip-**-**-**-** klogd: [  440.293501] Process node (pid: 1352, ti=e599c000 task=e60371a0 task.ti=e599c000)
ip-**-**-**-** klogd: [  440.293508] Stack:
ip-**-**-**-** klogd: [  440.293545] Call Trace:
ip-**-**-**-** klogd: [  440.293589] Code: ff ff 8b 45 f0 89 ....
ip-**-**-**-** klogd: [  440.293644] EIP: [] exit_mmap+0xd5/0xe1 SS:ESP 0069:e599cf08

So I killed the instance. Figuring it was config diffs between centos and the AMI, I cloned my live server and fired it up as a small instance to get decent build perf. Tested 0.6.4, all worked, brought it back up as a micro and, blamo, same death spiral. Back to a small instance, tried 0.6.6, and once again as a small instance it worked, but back as a micro it still had the same problem.

Next up was a brand new AMI: build node 0.6.6 and run as a micro. Everything was happy. So it must be something that had gotten fixed along the way. Back to the clone and a yum upgrade. Build node, try to run, death spiral. Argh! So finally I thought I'd file a ticket with node.js, but first I looked through existing issues and found this:

Node v0.6.3 crashes EC2 instance

which pointed me at the relevant Amazon release notes which had this bit in it:

After using yum to upgrade to Amazon Linux AMI 2011.09, t1.micro 32-bit instances fail to reboot.

There is a bug in PV-Grub that affects the handling of memory pages from Xen on 32bit t1.micro instances. A new release of PV-Grub has been released to fix this problem. Some manual steps need to be performed to have your instance launch with the new PV-Grub.

As of 2011-11-01, the latest version of the PV-Grub Amazon Kernel Images (AKIs) is 1.02. Find the PV-Grub AKI's for your given region by running:

ec2-describe-images -o amazon --filter "manifest-location=*pv-grub-hd0_1.02-i386*" --region REGION.

Currently running instances need to be stopped before replacing the AKI. The following commands point an instance to the new AKI:

ec2-stop-instance --region us-east-1 i-#####
ec2-modify-instance-attribute --kernel aki-805ea7e9 --region us-east-1 i-#####
ec2-start-instance --region us-east-1 i-#####.

If launching a custom AMI, add a --kernel parameter to the ec2-run-instances command or choose the AKI in the kernel drop-down of the console launch widget.

Following these instructions finally did the trick and 0.6.6 is happily running on my old micro instance. Hope this helps someone else get this resolved more smoothly.

Whopper of a javascript extension

I consider the start of my programming career to be when I learned Genera LISP on Symbolics LISP machines. Sure, I had coded in Basic, Pascal and C, and unfortunately Fortran, before this, but it had always just been a hobby. With LISP, I got serious about languages, algorithms, etc.

Genera LISP had its own object system called Flavors, much of which eventually made it into CLOS, the Common Lisp Object System. Flavors had capabilities called Wrappers and Whoppers, which provided aspect-oriented capabilities before that term was even coined. Both achieved fundamentally the same goal: to wrap a function call with pre and post conditions, including preventing the underlying function call from occurring. Wrappers achieved this via LISP macros, i.e. the calls they wrapped were compiled into new calls, with calls using the same wrapper sharing zero code. Whoppers did the same thing dynamically, allowing whopper code to be shared, but also requiring at least two additional function calls at runtime for every whopper.

So what's all this got to do with javascript? Well, yesterday I got tired of repeating myself in some CPS node coding and decided to just turn my continuation into a new continuation wrapped with my common post condition, and so I wrote the Whopper capability for javascript. But first a detour through CPS land and how it can force you to violate DRY.

CPS means saying the same thing multiple times

So in a normal synchronous workflow you might have some code like this:

function getManifest(refresh) {
  if(!refresh && _manifest) {
    return _manifest;
  }
  var manifest = fetchManifest();
  if(!_manifest) {
    var pages = getPages();
    _manifest = buildManifest(pages);
  } else {
    _manifest = manifest;
    if(refresh) {
      var pages = getPages();
      updateManifest(pages);
    }
  }
  saveManifest(manifest);
  return _manifest;
};

But with CPS style asynchrony you end up with this instead:

function getManifest(refresh, continuation, err) {
  if(!refresh && _manifest) {
    continuation(_manifest);
    return;
  }
  fetchManifest(function(manifest) {
    if(!_manifest) {
      getPages(function(pages) {
        _manifest = buildManifest(pages);
        saveManifest(_manifest,function() {
          continuation(_manifest);
        });
      }, err);
      return;
    }
    _manifest = manifest;
    if(refresh) {
      getPages(function(pages) {
        updateManifest(pages);
        saveManifest(_manifest,function() {
          continuation(_manifest);
        });
      }, err);
    } else {
      saveManifest(_manifest,function() {
        continuation(_manifest);
      });
    }
  }, err);
};

Because the linear flow is interrupted by asynchronous calls with callbacks, our branches no longer converge, so the common exit condition -- saveManifest and then hand the manifest to the continuation -- is repeated 3 times.

While I can't stop the repetition entirely, I could at least reduce it by capturing the common code into a new function. But even better, how about I wrap the original continuation with the additional code so that I can just call the continuation and it runs the save as a precondition:

function getManifest(refresh, continuation, err) {
  if(!refresh && _manifest) {
    continuation(_manifest);
    return;
  }
  continuation = continuation.wrap(function(c, manifest) { saveManifest(manifest, c); });
  fetchManifest(function(manifest) {
    if(!_manifest) {
      getPages(function(pages) {
        _manifest = buildManifest(pages);
        continuation(_manifest);
      }, err);
      return;
    }
    _manifest = manifest;
    if(refresh) {
      getPages(function(pages) {
        updateManifest(pages);
        continuation(_manifest);
      }, err);
    } else {
      continuation(_manifest);
    }
  }, err);
};

Finally! The wrap function...

What makes this capture possible is this extension to the Function prototype:

Object.defineProperty(Function.prototype, "wrap", {
  enumerable: false,
  value: function(wrapper) {
    var func = this;
    return function() {
      var that = this;
      var args = arguments;
      var argsArray = [].slice.apply(args);
      var funcCurry = function() {
        func.apply(that, args);
      };
      argsArray.unshift(funcCurry);
      wrapper.apply(that, argsArray);
    };
  }
});

It rewrites the function as a new function that, when called, calls the passed wrapper function with a curried version of the original function and the arguments passed to the function call. This allows us to wrap any pre or post conditions, including pre-conditions that initiate asynchronous calls themselves, and even lets the wrapper function inspect the arguments that the original function will be passed (assuming the wrapper decides to call it via the curried version).

  continuation = continuation.wrap(function(c, manifest) {
    saveManifest(manifest, c);
  });

The above overwrites the original continuation with a wrapped version of itself. The wrapper is passed c, the curried version of the original function, and the argument that the continuation is called with, which we know will be the manifest. The wrapper in turn calls the async function saveManifest with the passed manifest and passes the curried continuation as its continuation. So when we call continuation(_manifest), first saveManifest is called, which then calls the original continuation with the _manifest argument as well.

Building emacs-23 on AWS Linux AMI to get jade-mode

So there I was trying to get jade-mode running and it kept dropping into fundamental mode. The error it gave in *Messages* was:

File mode specification error: (void-function whitespace-mode)

Problem was I was running emacs-22.2.3, the latest in centos and thereby in the AWS Linux AMI. And whitespace-mode, the mode that jade-mode and stylus-mode rely on, requires emacs-23. Fine, tell me what repo to add and let me get on with my life. What, no rpm's?

That's where I usually draw the line and tell myself that it's something I probably don't need. Over the years, I've developed an aversion to building from source, if only because then I have software on my machine that other rpm's can't count on as a pre-requisite. But this time, that wasn't gonna fly. I wanted jade-mode!

As my usual build recipes go, this is what I had to do on an existing AWS Linux AMI, so some of the yum deps may be missing. Don't worry, when you run ./configure it'll bitch, and it'll usually be about some *-devel package. So here goes building from source:

yum -y install ncurses-devel
cd /tmp
wget http://ftp.gnu.org/pub/gnu/emacs/emacs-23.3a.tar.gz
tar zxf emacs-23.3a.tar.gz
cd emacs-23.3
./configure --prefix=/usr/local --with-xpm=no
make
make install

Done!

Implementing "exports" in C

The other day I was musing about improving readability by liberating verbs from the containing objects that own them. Similar things are accomplished with mix-ins or traits in other languages, but I wanted to go a step further and allow for very context-specific mapping, rather than just mixing in ambiguously named methods from other objects. Since the exports construct looked a lot like a map of expressions, I decided to see what replicating the behavior with existing C# would look like.

To review, this is what I want (in C#-like pseudo code + exports):

class PageWorkflow {
  UserService _userService exports { FindById => FindUserById };
  PageService _pageService exports {
    FindById        => FindPageById,
    UpdatePage(p,c) => Update(this p,c)
  };
  AuthService _authService exports {
    AuthorizeUserForPage(u,p,Permissions.Write) => UserCanUpdatePage(u,p)
  };

  UpdatePage(userid, pageid, content) {
    var user = FindUserById(userid);
    var page = FindPageById(pageid);
    if(UserCanUpdatePage(user,page)) {
      page.Update(content);
    } else {
      throw;
    }
  }
}

And this is the C# implementation of the above:

public class PageWorkflow {

    private readonly Func<int, User> FindUserById;
    private readonly Func<int, Page> FindPageById;
    private readonly Action<Page, string> Update;
    private readonly Func<User, Page, bool> UserCanUpdatePage;

    public PageWorkflow(IUserService userService, IPageService pageService, IAuthService authService) {
        FindUserById = (id) => userService.FindById(id);
        FindPageById = (id) => pageService.FindById(id);
        Update = (page, content) => pageService.UpdatePage(page, content);
        UserCanUpdatePage = (user, page) => authService.AuthorizeUserForPage(user, page, Permissions.Write);
    }

    public void UpdatePage(int userid, int pageid, string content) {
        var user = FindUserById(userid);
        var page = FindPageById(pageid);
        if(UserCanUpdatePage(user, page)) {
            Update(page, content);
        } else {
            throw new Exception();
        }
    }
}

As I mentioned, it's all possible, short of the context sensitive extension method on Page. Lacking extension methods, I was going to name the imported method UpdatePage, but since it would be a field, it conflicts with the UpdatePage workflow method despite functionally having different signatures.

All in all, the public workflow UpdatePage is pretty close to what I had wanted, but the explicit type declaration of each export makes it boilerplate that is likely not worth the trouble of writing, and, no, I won't even consider code generation.

This exercise, along with every other language feature I've dreamt up for Promise, does illustrate one thing quite clearly to me: my ideal language should provide programmatic access to its parser so that the syntax of the language can be extended by libraries. Internal DSLs are a nice start in most languages, but often they fall short of being able to reduce the default boilerplate and just create new boilerplate. Sure, designing the language to be more succinct is desirable, but if anything, that only covers what the language creators could imagine. Being able to tweak a language into a DSL for the task at hand, a la MPS, seems a lot more flexible.

Is this added flexibility worth the loss of a common set of constructs shared by all programmers who know language X? It certainly could be abused to become incomprehensible, but I would suggest that, even if you know language X, joining any team working on a project of sufficient complexity adds its own wealth of implicit patterns and constructs. And worse than in a language whose compiler is extended, those constructs are communicated via comments and documentation that are not part of language X, i.e. the compiler and IDE lack the ability to aid someone learning them.

Considering this, I really need to do a survey of languages that already offer this capability, as well as take a closer look at MPS and Nemerle, to see if the language I want is just a few parser rules away from an existing meta programming language.

Of Workflows, Data and Services

This is yet another in my series of posts musing about what my ideal language would look like. This one is about readability.

Most code I write these days seems to utilize three types of classes: Data, Services and Workflow.

Data Classes

These are generally POCO object hierarchies with fields/accessors but minimal logic for manipulating that data. They should not have any dependencies. If an operation on a data object has a dependency, it's really a service for that data. Data objects don't get mocked/stubbed/faked, since we can just create and populate them.

Service Classes

These are really containers for verbs. The verbs could have been methods on the calling object, but by pulling them into these containers we enable a number of desirable capabilities:

  • re-use -- different workflows can use the same logic without creating their own copy
  • testing -- by faking the service we get greater control over testing different responses from the service
  • dependency abstraction -- there might be a number of other bits of logic that have to be invoked in order to do the work the verb provides, but that aren't a concern of the workflow
  • organization -- related verbs are collected in one place

Workflow Classes

These end up being classes, primarily because in most OO languages everything's a class, but really workflow classes are organizational constructs used to collect related workflows as procedural execution environments. They can be set up with pre-requisites and promote code re-use via shared private members for common sub-tasks of the workflow. They are also responsible for the condition and branching logic that does the actual work.

Actions (requests from users, triggered tasks, etc.) start at some entry point method on a workflow object, such as a REST endpoint, manipulate data via Data objects using services, the results of which trigger paths defined by the workflow.

Same construct, radically different purposes

Let's look at how this works out for a fictional content management scenario. I'm using a C#-like pseudo syntax to avoid unnecessary noise (ironic, since this post is all about readability):

class PageWorkflow {
  ...
  UpdatePage(userid, pageid, content) {
    var user = _userService.FindById(userid);
    var page = _pageService.FindById(pageid);
    if(_authService.AuthorizeUserForPage(user,page,Permissions.Write)) {
      _pageService.UpdatePage(page,content);
    } else {
      throw;
    }
  }
}

UpdatePage is part of PageWorkflow, i.e. a workflow in our workflow class. It is configured with _userService, _pageService and _authService as our service classes. Finally, user and page are instances of our data classes. Nice for maintainability and separation of concerns, but awkward from a readability perspective. It would be much more readable with syntax like this:

class PageWorkflow {
  ...
  UpdatePage(userid, pageid, content) {
    var user = FindUserById(userid);
    var page = FindPageById(pageid);
    if(UserCanUpdatePage(user,page)) {
      page.Update(content);
    } else {
      throw;
    }
  }
}

Much more like we think of the flow. Of course this could easily be done by creating those methods on PageWorkflow, but that's the first step toward building a god object, and don't even get me started on putting Update on the Page data object.

Importing verbs

So let's assume that this separation of purposes is desirable -- I'm sure there'll be plenty of people who disagree with that premise, but the premise isn't the topic here. What we really want to do is alias or import the functionality into our execution context. Something like this:

class PageWorkflow {
  UserService _userService exports { FindById => FindUserById };
  PageService _pageService exports {
    FindById        => FindPageById,
    UpdatePage(p,c) => Update(this p,c)
  };
  AuthService _authService exports {
    AuthorizeUserForPage(u,p,Permissions.Write) => UserCanUpdatePage(u,p)
  };
  ...
}

Do not confuse this with VB's or javascript's with keyword. Both import the entirety of the referenced object into the current scope. The much maligned javascript version does this by importing it into the global namespace, which, given the dynamic nature of those objects, makes the variable use completely ambiguous. While VB kept scope ambiguity in check by forcing a . (dot) preceding the imported object's members, it is a shorthand that is only questionably more readable.

The above construct is closer to the @EXPORT syntax of the perl Exporter module. Except instead of exporting functions, exports exposes methods on an instance as methods on the current context. It also extends the export concept in three ways:

Aliasing

Instead of just blindly importing a method from a service class, the exports syntax allows for aliasing. This is useful because the imported method likely deferred some of its naming context to the owning class and could collide with other imported methods, e.g. FindById on PageService and UserService.

Argument rewriting

As the method name is rewritten, the argument order may no longer be appropriate, or we may want to change the argument modifiers, such as turning a method into an extension method.

UpdatePage(p,c) => Update(this p,c)

The above syntax captures the arguments into p and c, then aliases the method into the current class' context and turns it into a method attached to p, i.e. the page, so that we can call page.Update(content).

Currying

But why stop at just changing the argument order and modifiers? We're basically defining expressions that translate calls from one form to another, so why shouldn't we be able to make every argument an expression itself?

AuthorizeUserForPage(u,p,Permissions.Write) => UserCanUpdatePage(u,p)

This syntax curries the Permissions.Write argument so that we can define our aliased entry point without the last argument and instead name it to convey the write permission implicitly.

Writing workflows more like we think

Great, some new syntactic sugar. Why bother? Well, most language constructs are some level of syntactic sugar over the raw capabilities of the machine to let us express our intent more clearly. Generally syntactic sugar ought to meet two tests: make code easier to read and more compact to write.

The whole of the import mechanism could easily be accomplished (except maybe for the extension method rewrite) by creating those methods on PageWorkflow and calling the appropriate service members from there. The downside to this approach is that the methods are not differentiated from other methods in the body of PageWorkflow and are therefore not easily recognizable as aliasing constructs. In addition, the setup as wrapper methods is syntactically a lot heavier.

The exports mechanism allows for code to be crafted more closely to how we would talk about accomplishing the task without compromising on the design of the individual pieces or tying their naming and syntax to one particular workflow. It is localized to the definition of the service classes and provides a more concise syntax. In this way it aids the readability as well as the authoring of a common task.