Extension method

Searching a Tree of Objects with Linq, Revisited

A while back, I wrote about searching through a tree using linq to objects. That post was mostly snippets of code about delegates, lambda’s, yield and how it applies to linq — more a technical exploration than an example. So I thought I’d follow it up with concrete extension methods to make virtually any tree searchable by Linq.

Linq, IEnumerable<T>, yield

All that is required to search a tree with Linq is creating a list of all nodes in the tree. Linq to Objects can operate on IEnumerable<T>. Really, Linq to objects is a way of expressing operations we’ve been doing forever in loops with if/else blocks. That means there isn’t any search magic going on, it is a linear traversal of all elements in a set and examining each to determine whether it matches our search criteria.

To turn a tree into a list of node we need to walk and collect all children of every node. A simple task for a recursive list that carries along a list object to stuff every found node into. But there is a better way, using yield to return each item as it is encountered. Now we don’t have to carry along a collection. Iterators using yield implement a pattern in which a method can return more than once. For this reason, a method using yield in C# must return an IEnumerable, so that the caller gets a handle to an object it can traverse the result of the multiple return values.

IEnumerable is basically an unbounded set. This is also the reason why unlike collections, it does not have a Count Property. It is entirely possible for an enumerator to return an infinite series of items.

Together IEnumerable<T> and yield are a perfect match for our problem, i.e. recursively walking a tree of nodes and return an unknown number of nodes.

Two types of Tree Traversal

Depth First

In depth-first traversal, the algorithm will dig continue to dig down a nodes children until it reaches a leaf node (a node without children), before considering the next child of the current parent node.

Breadth First

In breadth-first traversal, the algorithm will return all nodes at a particular depth first before considering the children at the next level. I.e. First return all the nodes from level 1, then all nodes from level 2, etc.

Tree to IEnumerable<T> Extension methods

public static class TreeToEnumerableEx
{
 public static IEnumerable<T> AsDepthFirstEnumerable<T>(this T head, Func<T, IEnumerable<T>> childrenFunc)
 {
  yield return head;
  foreach (var node in childrenFunc(head))
  {
   foreach (var child in AsDepthFirstEnumerable(node, childrenFunc))
   {
    yield return child;
   }
  }
 }

 public static IEnumerable<T> AsBreadthFirstEnumerable<T>(this T head, Func<T, IEnumerable<T>> childrenFunc)
 {
  yield return head;
  var last = head;
  foreach(var node in AsBreadthFirstEnumerable(head,childrenFunc))
  {
   foreach(var child in childrenFunc(node))
   {
    yield return child;
    last = child;
   }
   if(last.Equals(node)) yield break;
  }
 }
}

This static class provides two extension methods that can be used on any object, as long as it’s possible to express a function that returns all children of that object, i.e. the object is a node in some type of tree and has a method or property for accessing a list of its children.

An Example

Let’s use a hypothetical Tree model defined by this Node class:

public class Node
{
 private readonly List<Node> children = new List<Node>();

 public Node(int id)
 {
  Id = id;
 }

 public IEnumerable<Node> Children { get { return children; } }

 public Node AddChild(int id)
 {
  var child = new Node(id);
  children.Add(child);
  return child;
 }

 public int Id { get; private set; }
}

Each node simply contains a list of children and has an Id, so that we know what node we’re looking at. The AddChild() method is a convenience method so we don’t expose the child collection and no node can ever be added as a child twice.

The calling convention for a depth-first collection is:

IEnumerable<Node> = node.AsDepthFirstEnumerable(n => n.Children);

The lambda expression n => n.Children is the function that will return the children of a node. It simply states given n, return the value of the Children property of n. A simple test to verify that our extension works and to show us using the extension in linq looks like this:

[Test]
public void DepthFirst()
{
 // build the tree in depth-first order
 int id = 1;
 var depthFirst = new Node(id);
 var df2 = depthFirst.AddChild(++id);
 var df3 = df2.AddChild(++id);
 var df4 = df2.AddChild(++id);
 var df5 = depthFirst.AddChild(++id);
 var df6 = df5.AddChild(++id);
 var df7 = df5.AddChild(++id);

 // find all nodes in depth-first order and select just the Id of each node
 var IDs = from node in depthFirst.AsDepthFirstEnumerable(x => x.Children)
        select node.Id;

 // confirm that this list of IDs is in depth-first order
 Assert.AreEqual(new int[] { 1, 2, 3, 4, 5, 6, 7 }, IDs.ToArray());
}

For breadth-first collections, the calling convention is:

IEnumerable<Node> = node.AsBreadthFirstEnumerable(n => n.Children);

Again, we can test that the extension works like this:

[Test]
public void BreadthFirst()
{
 // build the tree in breadth-first order
 var id = 1;
 var breadthFirst = new Node(id);
 var bf2 = breadthFirst.AddChild(++id);
 var bf3 = breadthFirst.AddChild(++id);
 var bf4 = bf2.AddChild(++id);
 var bf5 = bf2.AddChild(++id);
 var bf6 = bf3.AddChild(++id);
 var bf7 = bf3.AddChild(++id);

 // find all nodes in breadth-first order and select just the Id of each node
 var IDs = from node in breadthFirst.AsBreadthFirstEnumerable(x => x.Children)
       select node.Id;

 // confirm that this list of IDs is in depth-first order
 Assert.AreEqual(new int[] { 1, 2, 3, 4, 5, 6, 7 }, IDs.ToArray());
}

Searching Trees

The tree used in the example is of course extremely simple, i.e. it doesn’t even have any worthwhile data to query attached to a node. But these extension methods could be used on a node of any kind of tree, allowing the full power of Linq, grouping, aggregation, sorting, projection, etc. to be used on the tree.

As a final note, you may wonder, why bother with depth-first vs. breadth first? After all, in the end we do examine every node! There is however one particular case where the choice of algorithm can be very important: You are looking for one match or a particular number of matches. Since we are using yield, we can terminate the traversal at any time. Using the FirstOrDefault() extension on our Linq expression, the traversal would stop as soon as one match is found. And if have any knowledge where that node might be in the tree, the choice of search algorithm can be a significant performance factor.

IEnumerable.ForEach()

I keep wanting to do ForEach() on a collection, and noticed that it was inconsistent whether that extension method was available or not. I always figured that i wasn’t importing the right namespace, and blamed ReSharper for not being smart enough to figure out what namespace I was missing. So I finally googled the problem only to find this. Turns out ForEach() isn’t on IEnumerable, only on Array and List. Meh. But thanks to Dave, i now have it in my core assembly.

By arne on | .net | A comment?
Tags: , ,

Epoch DateTime conversion Extension Methods

Interop with unix often requires dealing with Epoch or Unix time, or the number of seconds since January 1st, 1970 UTC. And if you throw java in the mix, then epoch becomes milliseconds, since they define System.currentTimeMillis() as the number milliseconds since the Epoch. Anyway, i figured, this was the perfect use of Extension methods. Well, almost… There’s still the issue that getting a DateTime object from seconds requires a static method added to DateTime, which extension methods do not currently support. This means that instead of

DateTime utcDateTime = DateTime.FromEpochSeconds(seconds)

we have to be content with

DateTime utcDateTime = DateTimeEx.FromEpochSeconds(seconds)

Now i also added extensions on long and int but, really, that falls into the realm of stupid extension method tricks. I left them in there, only because they are part of DateTimeEx, therefore will only be available if the appropriate namespace is included and so is at least tangentially relevant in the current scope. Well, that’s my rationalization, at least. With this extra extension method you now can do

DateTime utcDateTime = 1205003848.DateTimeFromEpochSeconds();

The one thing to be aware of with all these helpers is that it always deals with UTC time, i.e. the DateTime that you convert to epoch time needs to have a DateTimeKind that is not Unspecified. Conversely, the DateTime you get back is UTC and if you want to deal with it with localtime, you need to call ToLocalTime() on it first.

Anyway, here’s the class:

using System;

namespace Droog.DateTime
{
  public static class DateTimeEx
  {
    public static DateTime FromEpochMilliseconds(long milliseconds)
    {
      return new DateTime(1970, 1, 1,0,0,0,DateTimeKind.Utc).AddMilliseconds(milliseconds);
    }

    public static DateTime FromEpochSeconds(int seconds)
    {
      return FromEpochMilliseconds((long)seconds * 1000);
    }

    public static DateTime DateTimeFromEpochMilliSeconds(this long milliseconds)
    {
      return FromEpochMilliseconds(milliseconds);
    }

    public static DateTime DateTimeFromEpochSeconds(this int seconds)
    {
      return FromEpochSeconds(seconds);
    }

    public static int ToEpochSeconds(this DateTime dt)
    {
      return (int)(ToEpochMilliseconds(dt)/1000);
    }

    public static long ToEpochMilliseconds(this DateTime dt)
    {
      return (long)(dt.ToUniversalTime() - new DateTime(1970, 1, 1)).TotalMilliseconds;
    }
  }
}
By arne on | .net | A comment?
Tags: , , ,

Stupid ExtensionMethod tricks

I have yet to decide whether Extension Methods in C# are a boon or bane. I’ve already several times, been frustrated by Intellisense not showing me a method that was legal somewhere else, until I could figure out what using statement brought that extension method into scope. On one hand Extension Methods can be used to simplify code, on the other, I see them as the source of much confusion as they become popular.

Worse yet, they have potential for more about than the blink tag, imho.

The one place I see extension methods being instrumental is in defining fluent interfaces, yet another practice I have yet to decide whether I am in favor of or not. Partially, because I don’t see them as intrinsically easier to read. Partially because they allow for much syntactic abuse.

So today, I created a fluent interface for an operation that I wish was just support in the language in the first place — the between operator. It exists in some SQL dialects and is a natural part of so many math equations. I wish I could just write:

if( 0.5 < x < 1.0 )
{
  // do something
}

Instead, I’ll settle for this:

if( x.Between(0.5).And(1.0) )
{
  // do something
}

The first part is easy, it’s just an Extension Method on double. And if I just had it take the lower and upper bound, then we would have been done. But this is where the fluent interface bug bites me and I want to say And. This means, that Between can’t return a boolean. It needs to return the result of the lower bound test and the value to be tested. That means that Between returns a helper class, which has one method And, which finally returns the boolean value.

public static class DoubleExtensions
{

  public static BetweenHelper Between(this double v, double lower)
  {
    return new BetweenHelper(v > lower, v);
  }

  public struct BetweenHelper
  {
    public bool passedLower;
    public double v;

    internal BetweenHelper(bool passedLower, double v)
    {
      this.passedLower = passedLower;
      this.v = v;
    }

    public bool And(double upper)
    {
      if (passedLower && v < upper)
      {
        return true;
      }
      else
      {
        return false;
      }
    }
  }
}

That’s a lot of code for a simple operation and it’s still questionable whether it really improves readability. But it is a common enough operation if you have a lot of bounds checking, that it might be worth throwing into a common code dll. I’ve yet to make up my mind, I mostly did it because i wanted to play with the syntax.