ILoggable

A place to keep my thoughts on programming

January 8, 2011 .net, geek , ,

Materializing an Enumerable

Yesterday I posted the question "Is there a way to Memorize or Materialize an IEnumerable?" on stackoverflow, hoping that there was already a built in way in the BCL. The answers and comments showed, that there wasn't but also challenged my existing assumptions as well as illustrated that materializing and/or memorizing could be interpreted in a number of ways. I figured that amount of ambiquity required a deeper dive into the subject.

What's this for anyway?

I try to use IEnumerable<T> as the return value for any method that is supposed to return a sequence meant purely for consumption. I choose IEnumerable<T> over an array or list because T[] exposes an unneeded implementation details while returning IList<T> or ICollection<T> allow modification of the sequence which is almost always undesirable behavior. And that doesn't even address that the enumerable might be a stream of items coming from an external source like a database cursor, a file stream or from executing a linq AST.

The drawback of this is that making multiple calls on an IEnumerable<T> that enumerate it under the hood may either incur a large cost, in the case of executing a linq AST repeatedly, or fail, in the case of stream or cursor. In order to be able to do something like the below, you really want to be certain that you have a finite sequence to query:

if(enumberable.Any() {
  foreach(var item in enumerable) {
    ...
  }
} else {
  ...
}

.Any() has to get an enumerator and call .MoveNext() once to see if it returns true and foreach, of course, gets the enumerator and iterates over it until the end. In order to safely write the above code, you really want the IEnumerable<T> converted into a computed collection.

The usual solution is to just call either .ToList() or .ToArray() and be done with it. But both have undesirable side-effects. Both will always create a new copy of the collection, which may have a non-insignifcant cost. And both change the type from IEnumerable<T>. Sure you can cast it back, but because both are not idempotent, casting to IEnumerable<T> hides the only clue that you don't want to call .ToList()/.ToArray() again. In addition, .ToList() also produces a mutable collection.

Most of the time, none of these side-effects are significant detractors, but should you return the memorized version from a method, you probably would want to cast it back to IEnumerable<T> and then the cost of this behavior can start to add up. Having a method that lets you memorize or materialize in an idempotent fashion would be useful.

Memorize()

What is the expected behavior of .Memorize()? It should capture the current state of sequence at the time of call and return an immutable sequence and it should force that sequence into memory so that multiple enumerations are relatively cheap. This one is fairly simple to implement:

public static IEnumerable<T> Memorize<T>(this IEnumerable<T> enumerable) {
  return enumerable.GetType().IsArray ? enumerable : enumerable.ToArray();
}

Arrays are already immutable sequences, so we can use them reliably as our memorized collection. And if the source already is an array, we can safely return it unmodified. Now we can pass the resultant enumerable arround without concern that someone else calling .Memorize() again needlessly copies it.

Materialize()

Unlike .Memorize(), .Materialize() does not imply that the enumerable becomes a private, immutable copy. It only wants to make certain that the type can be safely enumerated. This lesser requirement actually complicates the idempotency scenario, requiring a internmediate collection class to be created:

public static class LinqEx {

    public static IEnumerable<T> Materialize<T>(this IEnumerable<T> enumerable) {
        if(enumerable is MaterializedEnumerable<T> || enumerable.GetType().IsArray) {
            return enumerable;
        }
        return new MaterializedEnumerable<T>(enumerable);
    }

    private class MaterializedEnumerable<T> : IEnumerable<T> {
        private readonly ICollection<T> _collection;
        public MaterializedEnumerable(IEnumerable<T> enumerable) {
            _collection = enumerable as ICollection<T> ?? enumerable.ToArray();
        }

        public IEnumerator<T> GetEnumerator() {
            return _collection.GetEnumerator();
        }

        IEnumerator IEnumerable.GetEnumerator() {
            return GetEnumerator();
        }
    }
}

The purpose of MaterializedEnumerable<T> is as marker for a previous materialization that can wrap or coerce a collection, so that no unnecessary copying is done.

A word on the use of .ToArray() instead of .ToList(): I've always leaned towards .ToArray(), both because it creates an immutable collection and because I thought arrays to be more lightweight than lists. After cracking them both open in Reflector, it became apparent that they should be about the same and confirmed that there is no significant difference with some simple tests. 

While memorize and materialize have subtly different meaning, both intending to optimize access to an enumerable idempotently, in day to day use simply using .ToArray() will usually be just fine.

1 to “Materializing an Enumerable”

Trackbacks/Pingbacks

  1. […] Materializing an Enumerable […]

  1. More on .ToList() vs. .ToArray() says...

    […] Materializing an Enumerable […]

Leave a comment