ILoggable

A place to keep my thoughts on programming

 Subscribe

geekblog
[at]
claassen [dot] net

Powered by Blogger

Wednesday, November 28, 2007

Searching a Tree of Objects with Linq

I've finally had legitimate use for LINQ to Objects, not just to make the syntax cleaner, but also to simplify the underlying code and provide me a lot of flexibility without significant work.

The scenario

I have a tree of objects that have both a type and a name. The name is unique, the Type is not. The interface is this:

public interface INode
{
  int Id { get; }
  string Name { get; set; }
  string Type { get; set; }
  List<INode> Children { get; }
}

I want to be able to find a single named Node in the tree and I want to be able to retrieve a collection of all nodes for a particular type. The searchable interface could be expressed as this:

public interface ISearchableNode : INode
{
  INode FindByName(string name);
  IEnumerable<INode> FindByType(string name);
}
Both require me to walk the tree and examine each node, so clearly I just want to have one walk routine and generically evaluate the node in question. In C# 2.0 parlance, that means I could pass an anonymous delegate into my generic find routine and have it recursively iterate through all the children. I also pass along a resultset to be populated.

The signature for the evaluation delegate looks like this:

delegate bool FindDelegate(INode node);
but since I'm using C# 3.0 (i.e. .NET 3.5) I can use lambda expressions to avoid creating a delegate and simplify my syntax. Instead of FindDelegate, I can simply use Func<INode,bool>:
// Instead of this:
private void Find(INode node, List<INode> resultSet, FindDelegate findDelegate);
// called like this for a Name search:
Find(this, resultSet, delegate(INode node) { return node.Name == name; });

// I can use this:
private void Find(INode node, List<INode> resultSet, Func<INode, bool> f)
// called like this:
Find(this, resultSet, node => node.Name == name);

Thus giving me the following implementation for ISearchableNode:

public INode FindByName(string name)
{
  List<INode> resultSet = new List<INode>();
  Find(this, resultSet, x => x.Name == name);
  return resultSet.FirstOrDefault();
}

public IEnumerable<INode> FindByType(string type)
{
  List<INode> resultSet = new List<INode>();
  Find(this, resultSet, x => x.Type == type);
  return (IEnumerable<INode>)resultSet;
}

private void Find(INode node, List<INode> resultSet, Func<INode, bool> f)
{
  if (f(node))
  {
    resultSet.Add(node);
  }
  foreach (INode child in node.Children)
  {
    Find(child, resultSet, f);
  }
}

Problem solved, move on... Well, except there is significant room for improvement. Here are the two main issues that ought to be resolved:

  1. Syntax is limited to two types of searches and exposing the generic find makes for an ugly syntax. It would be much nicer if queries to the tree could be expressed in LINQ syntax.
  2. It's also inefficient for the Name search, since I'm walking the entire tree, even if the first node matched the criteria.

LINQ to Hierarchical Data

In order to use LINQ to objects, I need to either create a custom query provider or implement IEnumerable. The latter is significantly simpler and could be expressed using the following interface:

public interface IQueryableNode : IEnumerable<INode>
{
}
Ok, ok, I don't even need an interface, I could just implement IEnumerable<INode>... But what does that actually mean? In the simplest sense, I'm iterating over the node's children, however, I also with do descend into the children's children and so on. So a simple foreach won't do. I could just do the same tree walking with a resultset as I did above and return the Enumerator of the resulting list to implement the interface, but C# 2.0 introduced a much more useful way to implement non-linear Iterators, i.e. the yield keyword. Instead of building a list to be interated over, yield let's the iterating code return values as they are found, which means it can be used for recursive iteration. Thus the GetEnumerator is implemented simply as follows
#region IEnumerable<Node> Members
public IEnumerator<INode> GetEnumerator()
{
  yield return this;
  foreach (Node child in Children)
  {
    foreach (Node subchild in child)
    {
      yield return subchild;
    }
  }
}
#endregion

#region IEnumerable Members
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
  return this.GetEnumerator();
}
#endregion
Nice and simple and ready for LINQ.

Searching for all Nodes of a type becomes

var allBar = from n in x
                where n.Type == "bar"
                select n;
foreach (Node n in allBar)
{
  // do something with that node
}
and the search for a specifically named node becomes
INode node = (from n in x
              where n.Name == "i"
              select n).FirstOrDefault();
But the real benefit of this approach is that I don't have hard-coded search methods, but can express much more complex queries in a very natural syntax without any extra code on the Node.

Deferred execution

As it turns out, using yield for the recursive iteration also solved the second issue. As yield returns values as it encounters them during iteration, the search doesn't happen until the query is executed. And one of the side effects of LINQ syntax is that creating a query does not execute it until the result set is iterated over. Therefore, FirstOrDefault() actually short-circuits the query as soon as the first match (and in case of Name, it's going to be the only match) is hit.

Labels: , , , , ,

Thursday, November 22, 2007

Threading: Mail.app vs. Thunderbird

I generally prefer message forums over mailing lists for community discussions because of the better separation of topics and implicit threading. Not that forums are ideal for threads, since they generally cannot easily spawn sub-threads.

So, when reading discussions on mailing lists, I try to use the threading mode of the mail client to bring some clarity to the discussion and filter out discussions I don't care about. I am a big Imap proponent and read mail on PC with Thunderbird and Mail.app on Mac. Overall i like Mail.app better but sometimes it does exhibit the Mac app tendency of "if you don't like the way we do things, well, sucks for you", vs. Thunderbird's more liberal configurability.

I'll leave the whole subject of how well clients group messages into threads.... Ok, just one stab at that subject. Subject of "Hey" is not too uncommon. So all messages of "Hey", which are almost guaranteed to be unrelated, get lumped into a thread together. Happens in both readers. I know that threading in mail is ad-hoc, so I am not faulting the clients. It's just a silly artifact.

But here's some behavior that I find not only unintuitive but downright tedious:

On both Mail.app and Thunderbird, i can collapse and expand threads by using left and right arrows. So when I decide that a discussion is not of interest to me, i just collapse the thread and hit delete. On Mail.app, the thread is deleted, as I'd expect. On Thunderbird, the current message in the thread is deleted and the next message becomes the head of the thread. So to delete a thread i have to open the thread, select all messages manually and then delete. Bah!

I've dug around the config and even advanced config, but can't find a way to change this behavior.

Labels: , ,

Tuesday, November 20, 2007

Stupid ExtensionMethod tricks

I have yet to decide whether Extension Methods in C# are a boon or bane. I've already several times, been frustrated by Intellisense not showing me a method that was legal somewhere else, until I could figure out what using statement brought that extension method into scope. On one hand Extension Methods can be used to simplify code, on the other, I see them as the source of much confusion as they become popular.

Worse yet, they have potential for more about than the blink tag, imho.

The one place I see extension methods being instrumental is in defining fluent interfaces, yet another practice I have yet to decide whether I am in favor of or not. Partially, because I don't see them as intrinsically easier to read. Partially because they allow for much syntactic abuse.

So today, I created a fluent interface for an operation that I wish was just support in the language in the first place -- the between operator. It exists in some SQL dialects and is a natural part of so many math equations. I wish I could just write:

if( 0.5 < x < 1.0 )
{
  // do something
}
Instead, I'll settle for this:
if( x.Between(0.5).And(1.0) )
{
  // do something
}
The first part is easy, it's just an Extension Method on double. And if I just had it take the lower and upper bound, then we would have been done. But this is where the fluent interface bug bites me and I want to say And. This means, that Between can't return a boolean. It needs to return the result of the lower bound test and the value to be tested. That means that Between returns a helper class, which has one method And, which finally returns the boolean value.
public static class DoubleExtensions
{
  
  public static BetweenHelper Between(this double v, double lower)
  {
    return new BetweenHelper(v > lower, v);
  }

  public struct BetweenHelper
  {
    public bool passedLower;
    public double v;

    internal BetweenHelper(bool passedLower, double v)
    {
      this.passedLower = passedLower;
      this.v = v;
    }

    public bool And(double upper)
    {
      if (passedLower && v < upper)
      {
        return true;
      }
      else
      {
        return false;
      }
    }
  }
}
That's a lot of code for a simple operation and it's still questionable whether it really improves readability. But it is a common enough operation if you have a lot of bounds checking, that it might be worth throwing into a common code dll. I've yet to make up my mind, I mostly did it because i wanted to play with the syntax.

Labels: , ,

Saturday, November 03, 2007

One week of Windows on Macbook

Spent this past week doing development in XP under VMWare Fusion. I hooked up a windows keyboard and mouse when I'm stationary and when in the Windows Space, there was no way to tell that wasn't on a native machine. I actually had a harder time on the Mac side, since I had to remeber to hit the Windows key to get the normal Apple-Command behavior. The latest thing I tried, that I seriously didn't expect to work was hook up my HTC Apache. As much trouble as I've had with ActiveSync on my desktop machine, I figured that either the USB connection wouldn't even be seen by Windows or that ActiveSync just bombed. But instead, ActiveSync found the phone and synced everything. Now I even have Standard Time with me on a laptop instead of my old setup of Desktop and phone for time tracking.

Labels: , , ,