I just had a lovely object lesson in lazy evaluation of Iterators. I wanted to have a method that would return an enumerator over an encapsulated set after doing some sanity checking:
public IEnumerable<Subscription> Filter(Func<Subscription, bool> filter) {
    if(filter == null) {
        throw new ArgumentNullException("filter", "cannot execute with a null filter");
    }
    foreach(var subInfo in _subscriptions.ToArray()) {
        Subscription sub;
        try {
            var subDoc = XDocFactory.LoadFrom(subInfo.Path, MimeType.TEXT_XML);
            sub = new Subscription(subDoc);
            if(filter(sub)) {
                continue;
            }
        } catch(Exception e) {
            // a subscription that cannot be loaded is logged and skipped
            _log.Warn(string.Format("unable to retrieve subscription for path '{0}'", subInfo.Path), e);
            continue;
        }
        yield return sub;
    }
}
I was testing registering a subscription in the repository with this code:
IEnumerable<Subscription> query;
try {
    query = _repository.Filter(handler);
} catch(ArgumentException e) {
    return;
}
foreach(var sub in query) {
    ...
}
And the test would throw an ArgumentNullException because handler was null. What? But, but I clearly had a try/catch around it! Well, here's where clever bit me. By using yield, the method had turned into an iterator: calling it no longer ran the body, it just returned an enumerable. The method body gets squirreled away into a compiler-generated enumerator that does not execute until the first MoveNext(). And that in turn meant that my sanity check on handler didn't happen at Filter() but at the first iteration of the foreach.
Instead of doing "return an Iterator for subscriptions", I needed to do "check the arguments" and then "return an Iterator for subscriptions" as a separate action. This can be accomplished by factoring the yield into a method called by Filter() instead of being in Filter() itself:
public IEnumerable<Subscription> Filter(Func<Subscription, bool> filter) {
    // no yield in this method, so the check runs as soon as Filter() is called
    if(filter == null) {
        throw new ArgumentNullException("filter", "cannot execute with a null filter");
    }
    return BuildSubscriptionEnumerator(filter);
}
// private, so callers cannot bypass the argument check in Filter()
private IEnumerable<Subscription> BuildSubscriptionEnumerator(Func<Subscription, bool> filter) {
    foreach(var subInfo in _subscriptions.ToArray()) {
        Subscription sub;
        try {
            var subDoc = XDocFactory.LoadFrom(subInfo.Path, MimeType.TEXT_XML);
            sub = new Subscription(subDoc);
            if(filter(sub)) {
                continue;
            }
        } catch(Exception e) {
            _log.Warn(string.Format("unable to retrieve subscription for path '{0}'", subInfo.Path), e);
            continue;
        }
        yield return sub;
    }
}
Now the sanity check happens at Filter() call time, while the enumeration of subscriptions still only occurs as it's being iterated over, allowing for additional filtering and Skip/Take additions without having to traverse the entire possible set.
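For anyone reading along from the java side, the same split between "check now" and "iterate later" can be sketched with a lazy Stream. This is only an illustration under assumptions: SubscriptionQueries, its filter method and the loaded list are made-up names, plain strings stand in for subscriptions, the predicate here keeps matching items, and a null argument surfaces as a NullPointerException rather than C#'s ArgumentNullException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the repository; strings play the role of subscriptions.
class SubscriptionQueries {
    // Records which items were actually visited, to make the laziness observable.
    static final List<String> loaded = new ArrayList<>();

    // The argument check runs immediately, at call time.
    // The returned pipeline does no work until a terminal operation pulls from it.
    static Stream<String> filter(List<String> paths, Predicate<String> keep) {
        Objects.requireNonNull(keep, "keep");
        return paths.stream()
                    .peek(loaded::add)   // stands in for the expensive document load
                    .filter(keep);
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("a.xml", "b.xml", "c.xml", "d.xml");

        try {
            filter(paths, null);
        } catch (NullPointerException e) {
            System.out.println("rejected at call time: " + e.getMessage());
        }

        Stream<String> query = filter(paths, p -> !p.startsWith("a"));
        System.out.println("loaded before iterating: " + loaded.size());

        // limit(2) plays the role of Take(2): traversal stops once two items are found.
        List<String> firstTwo = query.limit(2).collect(Collectors.toList());
        System.out.println("results: " + firstTwo);
        System.out.println("loaded after limit(2): " + loaded);
    }
}
```

In this sketch nothing is loaded until collect() runs, and with limit(2) the last path is never visited, which is the same benefit the lazy enumerator gives Skip/Take in the C# version.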
Lambdas in Promise, like in other languages, are anonymous functions and first-class values. They act as closures over the scope they are defined in and can capture the free variables in their current lexical scope. Promise uses lambdas for all function definitions, going further even than javascript, which has both named and anonymous functions. In Promise there are no named functions, just slots that lambdas get assigned to for convenient access.
Straddling the statically/dynamically typed divide by allowing arguments and return values to optionally declare a Type, Promise mimics C# lambda syntax more than, say, LISP, javascript, etc. A simple lambda example looks like this:
var i = 0;
var incrementor = { ++i; };
print incrementor(); // => 1
This declaration doesn't have any input arguments to declare, so it uses a shortform of simply assigning a block to a variable. The standard form uses the lambda operator =>, so the above lambda could just as well be written as:
var incrementor = () => { ++i; };
I'm currently debating whether I really need the =>. It's mostly that I'm familiar with the form from C#. But given that there are no named functions, parentheses followed by a block can't occur otherwise, so there is no ambiguity. So, I'm still deciding whether or not to drop it:
var x = (x,y) => { x + y };
// vs.
var x = (x,y) { x + y };
The signature definition of lambdas borrows heavily from C#, using a left-hand side in parentheses for the signature, followed by the lambda operator. Input arguments can be typed, untyped or mixed:
var untypedAdd = (x,y) => { x + y; };
var typedAdd = (Num x, Num y) => { x + y; };
var mixedtypeAdd = (Num x, y) => { x + y; };
In dynamic languages, lambda definitions do not need a way to express whether they return a value: there is no type declaration, so whether or not to expect a value is convention- and documentation-driven. In C#, on the other hand, a lambda either returns no value (a void method, which uses one of the Action delegates) or returns a value and declares its type as the last type argument of one of the Func delegates. In Promise all lambdas return a value, even if that value is nil (more about the special singleton instance nil later). Values can be returned explicitly using the return keyword; otherwise the return value defaults to the value of the last statement executed before exiting the closure. Since return values can be typed, we need a way to declare that Type. Unlike C#, our lambdas aren't a delegate signature, so instead of reserving the last argument Type as the return Type, which would be ambiguous, Promise uses the pipe '|' character to optionally declare a return type:
var returnsUntyped = (x,y) => { x + y; };
var returnsTyped = (x,y|Num) => { x + y; };
var explicitReturn = (|Num) => { return returnsTyped(1,2); };
Lambdas can also declare default values for arguments, which can be simple values or expressions:
var simple = (x=2,y=3) => { x + y; };
var complex= (x=simple()) => { x; };
Promise supports three different method calling styles. The first is the standard parentheses style as shown above. In this style, optional values can only be used by leaving out trailing arguments like this:
var f = (x=1,y=2,z=3) => { x + y +z; };
print f(2,2,2); // => 6
print f(2,2); // => 7
print f(4); // => 9
print f(); // => 6
If you want to omit a leading argument, you have to use the named calling style, using curly brackets, which was inspired by DekiScript. The curly bracket style uses json formatting, and since json is a first-class construct in Promise, calling the function by hand with {} or providing a json value behaves the same, allowing for simple dynamic argument construction:
print f{y: 1}; // => 5
print f{z: 1, y: 1}; // => 3
var args = {z: 5};
print f args; // => 8
Finally there is the whitespace style, which replaces parentheses and commas with whitespace. This style exists primarily to make DSL creation more flexible:
print f 2 2 2; // => 6
print f 2 2; // => 7
print f 4; // => 9
print f; // => 6
Note the final call simply uses the bare variable f. This is possible because in Promise a lambda requiring no arguments can take the place of a value and accessing the variable executes the lambda. Sometimes it's desirable to access or pass a reference to a lambda, not execute it, in which case the reference notation '&' is needed. Using reference notation on a value is harmless (at least that's my current thinking), since Promise has no value types, so the reference of a value is the value:
var x = 2;
var y = () => { x+10; };
var x2 = &x;
var y2 = &y;
var y3 = y;
x++;
print x2; // => 3;
print y2; // => 13;
print y3; // => 12;
The output of y3 is only 12, because assignment of y3 evaluated y, capturing x before it was incremented.
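The same distinction, holding on to the lambda versus holding on to its result, can be sketched in java, where a lambda is never invoked implicitly. This is an analogy and not Promise semantics: a Supplier variable plays the role of &y, an explicit get() plays the role of a bare y, and ReferenceVsValue is a made-up name.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class ReferenceVsValue {
    // Returns { value seen through the kept lambda, value captured by early evaluation }.
    static int[] run() {
        // A mutable cell, because java lambdas may only capture effectively final locals.
        AtomicInteger x = new AtomicInteger(2);
        Supplier<Integer> y = () -> x.get() + 10;

        Supplier<Integer> y2 = y;   // like &y: keeps the lambda itself
        int y3 = y.get();           // like bare y: evaluates now, while x is still 2

        x.incrementAndGet();        // x is now 3

        return new int[] { y2.get(), y3 };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("y2 = " + result[0]);  // evaluated after the increment
        System.out.println("y3 = " + result[1]);  // evaluated before the increment
    }
}
```

Here y2 yields 13 and y3 stays at 12, matching the Promise listing.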
As mentioned above, lambdas can capture variables from their current scope. Scopes are nested, so a lambda can capture variables from any of the parent scopes:
var x = 1;
var l1 = (y) => {
return () => { x++; y++; x + y; };
};
print l1(2); // => 5
print l1(2); // => 6
print x; // => 3
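To make the two levels of capture explicit, here is a rough java rendering of the same example. It is a sketch under assumptions rather than a statement of how Promise would implement scopes: NestedCapture is a made-up name, and AtomicInteger cells stand in for captured mutable variables. The outer x is shared by every inner lambda, while each call to l1 gets its own y.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

class NestedCapture {
    // Returns { first result, second result, final value of x }.
    static int[] run() {
        AtomicInteger x = new AtomicInteger(1);   // outer scope, shared

        Function<Integer, Supplier<Integer>> l1 = yStart -> {
            AtomicInteger y = new AtomicInteger(yStart);   // one cell per call to l1
            return () -> {
                x.incrementAndGet();
                y.incrementAndGet();
                return x.get() + y.get();
            };
        };

        int first = l1.apply(2).get();    // x: 1 -> 2, y: 2 -> 3
        int second = l1.apply(2).get();   // x: 2 -> 3, fresh y: 2 -> 3
        return new int[] { first, second, x.get() };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```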
Similar to javascript, a block is not a new scope. This is done primarily for scoping simplicity, even if it introduces some side-effects:
() => {
var x = 5;
if( x ) {
var x = 5; // illegal
var y = 10;
}
return y; // legal since the variable declaration was in the same scope
};
As I've said, lambdas are the basic building block, meaning there is no other type of function definition. You can use them as lazily evaluated values, and you can pass them as blocks to be invoked by other blocks. As I will discuss next time, Methods are basically polymorphic, named slots defined inside the closure of the class (i.e. capturing the class definition's scope), which is why there is no need for explicitly named functions.
This is a post in an ongoing series of posts about designing a language. It may stay theoretical, it may become a prototype in implementation or it might become a full language. You can get a list of all posts about Promise, via the Promise category link at the top.
This may be ancient news, but I just came across an article that strongly implied that the reason .NET came about was because Sun didn’t like Microsoft’s addition of delegates to J++. That surely is a condensation of events, but it’s certainly an interesting yarn.
I got into this whole subject because I needed an object to subscribe to the state change of another object in my java project. And I didn’t want to create a tight coupling between otherwise unrelated objects. In C#, I would have created an event for the state change and have the second object subscribe to it. Done.
Alas, I’m in java and don’t have delegates, i.e. no events in the way I’m used to. I remembered dealing with creating lots of anonymous inner classes during some long ago experiences with swing programming, which I always found to be less than transparent in presentation. So I figured I’d do some digging to see how delegation and events should be handled in java. I don’t like seeing getFoo()/setFoo() from someone not used to C# properties, and I don’t want to subject someone else to my C#-ing of java code in return.
I started googling java, delegates and inner classes and came across a plethora of interesting articles, including the obligatory "C# is better than java because it has delegates" and "java kicks C#’s ass because it isn’t littered with atrocious syntactic sugar like delegates when inner classes will do" variety. Of these, the most entertaining was a rather testy condemnation of delegates in J++ in the form of a white paper on Sun’s site. Considering the stance Sun has taken over the years on java, i.e. binding the language, runtime and philosophy into a single indivisible unit, having MS try to subvert the language with what to many looks like a procedural programming throwback certainly could have been a significant motivator for the lawsuit that revoked Microsoft’s java license. Looking at C# and the CLR, Microsoft obviously saw a lot of things it liked in the java language and the jvm. So the conclusion that .NET came about because Sun rejected delegates doesn’t seem too far-fetched, and it is my favorite version of the story so far.
Now, back to the problem at hand: how does one do delegation in java? The answer does appear to be inner classes functioning as callbacks. This certainly does the trick, but it’s a bunch of code and interfaces to create, which in the end doesn’t improve the readability of the code. As an illustration, here is the C# code and the java code I created to get the same effect. Note: I didn’t need the extra information that C# events provide, i.e. the event source and event arguments, so I left them out of the java version to keep the code more concise. I also didn’t do any checking whether there are subscribers, etc. (read: this is an illustration, not production code).
// C# Publisher
public class Publisher {
    // create the event, which implicitly gives us add/delete subscribers
    public event EventHandler someAction;

    public void DoAction() {
        Console.WriteLine("Start action");
        // implicitly call all subscribers
        someAction(this, EventArgs.Empty);
        Console.WriteLine("End action");
    }
}
// java Publisher
public class Publisher {
    private List<EventHandler> subscribers = new ArrayList<EventHandler>();

    public void subscribeToAction(EventHandler notifier) {
        subscribers.add(notifier);
    }

    public void doAction() {
        System.out.println("Start action");
        for( EventHandler subscriber : subscribers ) {
            subscriber.handle(this);
        }
        System.out.println("End action");
    }
}
// C# Subscriber
public class Subscriber {
    private string name;

    public Subscriber(string name) {
        this.name = name;
    }

    public void AttachToPublisher(Publisher publisher) {
        // subscribe to the event. This creates a closure for this particular
        // instance of Subscriber.
        publisher.someAction += new EventHandler(RespondToAction);
    }

    void RespondToAction(object sender, EventArgs e) {
        Console.WriteLine("Responding to action for '" + name + "'");
    }
}
// java Subscriber
public class Subscriber {
    private String name;

    public Subscriber(String name) {
        this.name = name;
    }

    public void attachToPublisher(Publisher publisher) {
        // create a new anonymous instance of the EventHandler
        // as a closure for this instance of Subscriber
        publisher.subscribeToAction(new EventHandler() {
            public void handle(Publisher publisher) {
                respondToAction();
            }
        });
    }

    private void respondToAction() {
        System.out.println("Responding to action for '" + name + "'");
    }
}
In C# this is just built in plumbing. In java we create a simple interface that our anonymous inner class will implement:
public interface EventHandler {
    void handle(Publisher publisher);
}
Now we exercise the code:
Publisher p = new Publisher();
Subscriber s1 = new Subscriber("abc");
Subscriber s2 = new Subscriber("xyz");
s1.AttachToPublisher(p);
s2.AttachToPublisher(p);
p.DoAction();
The java code is virtually identical, just with different casing for code style, and both produce this output:
Start action
Responding to action for 'abc'
Responding to action for 'xyz'
End action
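For completeness, here is the java side assembled into a single compilable file, including the java-cased version of the exercise code. It follows the java listings above, with three small assumptions of mine: the classes are package-private so they fit in one file, the handler's parameter is named source to avoid shadowing, and DelegateDemo is a made-up name for the driver class.

```java
import java.util.ArrayList;
import java.util.List;

// The callback contract the anonymous inner class implements.
interface EventHandler {
    void handle(Publisher source);
}

class Publisher {
    private final List<EventHandler> subscribers = new ArrayList<EventHandler>();

    public void subscribeToAction(EventHandler notifier) {
        subscribers.add(notifier);
    }

    public void doAction() {
        System.out.println("Start action");
        // explicitly call all subscribers, in subscription order
        for (EventHandler subscriber : subscribers) {
            subscriber.handle(this);
        }
        System.out.println("End action");
    }
}

class Subscriber {
    private final String name;

    Subscriber(String name) {
        this.name = name;
    }

    public void attachToPublisher(Publisher publisher) {
        // the anonymous inner class closes over this Subscriber instance
        publisher.subscribeToAction(new EventHandler() {
            public void handle(Publisher source) {
                respondToAction();
            }
        });
    }

    private void respondToAction() {
        System.out.println("Responding to action for '" + name + "'");
    }
}

// Hypothetical driver class, mirroring the C# exercise code.
class DelegateDemo {
    public static void main(String[] args) {
        Publisher p = new Publisher();
        Subscriber s1 = new Subscriber("abc");
        Subscriber s2 = new Subscriber("xyz");
        s1.attachToPublisher(p);
        s2.attachToPublisher(p);
        p.doAction();
    }
}
```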
Now add lots of different events and unsubscribing of events, plus more complex EventHandlers, and the amount of code you end up writing quickly becomes significant. If there is one thing object oriented programming encourages us to do, it is to take repetitive code patterns and formulate reusable objects. Plenty of people in the java community have created delegate-like helpers that make delegation easier to read and maintain than simple inner classes, my favorite so far being Alex Winston’s strongly typed approach.
So are delegates just syntactic sugar or a throw-back to procedural coding? For those with only a cursory understanding of delegates, they do just look like function pointers, as in C, or at best, type-safe function pointers. But just like inner classes they create instance-specific closures, plus they throw in functionality for handling multi-casting and handling synchronous and asynchronous invocation of the closure. I, at least, think delegates as a first-class citizen of the runtime make life easier, improve readability and do not detract from the object oriented nature of the surrounding code.
Just after starting to play with closures in javascript (to fake delegates), I ran across an excellent series of articles on closures and anonymous functions in C# 2.0, complete with pitfalls. Cool stuff…
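To give a feel for what such a reusable helper might look like, here is a minimal sketch of a generic multicast event with unsubscribe support. It is my own illustration and not Alex Winston’s approach: the Event and EventDemo names are made up, and it assumes a java version with lambdas and java.util.function.Consumer, which is newer than the inner-class code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical helper: a small multicast "event" any publisher can reuse.
class Event<T> {
    // Copy-on-write, so a handler may unsubscribe while the event is firing.
    private final List<Consumer<T>> handlers = new CopyOnWriteArrayList<>();

    // Registers a handler and returns a handle that removes it again when run.
    Runnable subscribe(Consumer<T> handler) {
        handlers.add(handler);
        return () -> handlers.remove(handler);
    }

    // Calls every registered handler in subscription order.
    void fire(T payload) {
        for (Consumer<T> handler : handlers) {
            handler.accept(payload);
        }
    }
}

class EventDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Event<String> stateChanged = new Event<>();

        Runnable unsubscribeAbc = stateChanged.subscribe(s -> log.add("abc saw " + s));
        stateChanged.subscribe(s -> log.add("xyz saw " + s));

        stateChanged.fire("started");   // both handlers run
        unsubscribeAbc.run();
        stateChanged.fire("stopped");   // only xyz is still subscribed
        return log;
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```

Returning the unsubscribe handle from subscribe() is one design choice among several; it avoids having to keep the handler object around just to remove it later.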
The implementation of anonymous methods in C# and its consequences