When cloning isn't faster than copy via serializer

Yesterday I removed Subzero's dependency on Metsys.Little. This wasn't because I have any problem with it. On the contrary, it's a great library that makes serialization and deserialization really easy. But since the whole point of Subzero is to provide very lightweight data passing, I figured cloning would be preferable.

My first pass was purely a "get it working" attempt, and I knew I was leaving a lot of performance on the table by reflecting over the type each time I cloned. I didn't think it would be this bad, though. I used two classes for my test, a Simple and a Complex object:

using System.Collections.Generic; // needed for IList<T> below

public class Simple {
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Complex {
    public Simple Owner { get; set; }
    public IList<Simple> Friends { get; set; }
}
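To make the "reflecting over the type each time" cost concrete, here is a minimal sketch of what such a first pass might look like. It is not Incubator's actual code: `NaiveCloner` and `ShallowClone` are hypothetical names, the copy is shallow only, and it assumes a public parameterless constructor. `Simple` is repeated so the sketch compiles on its own.

```csharp
using System;
using System.Reflection;

public class Simple {
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical illustration of a "get it working" clone, not Incubator's code.
public static class NaiveCloner {
    public static T ShallowClone<T>(T source) where T : class {
        if (source == null) return null;
        Type type = source.GetType();

        // All of this reflection work is repeated on every single call.
        object copy = Activator.CreateInstance(type);
        PropertyInfo[] props = type.GetProperties(BindingFlags.Public | BindingFlags.Instance);
        foreach (PropertyInfo prop in props) {
            // Skip indexers and anything without a public getter and setter.
            if (prop.GetIndexParameters().Length > 0) continue;
            if (prop.GetGetMethod() == null || prop.GetSetMethod() == null) continue;
            prop.SetValue(copy, prop.GetValue(source, null), null);
        }
        return (T)copy;
    }
}
```

Every call pays for the property lookup, the late-bound constructor call and a boxed get/set per property, which is the overhead the numbers below reflect.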

The results were sobering:

Simple Object:
  Incubator.Clone: 142k/sec
  BinaryFormatter:  50k/sec
  Metsys.Little:   306k/sec
  ProtoBuf:        236k/sec

Complex Object:
  Incubator.Clone: 37k/sec
  BinaryFormatter: 15k/sec
  Metsys.Little:   44k/sec
  ProtoBuf:        80k/sec

As expected, BinaryFormatter is the worst and my Incubator.Clone beats it. But clearly I either need to do a lot of optimizing or not bother with cloning, because Metsys.Little and ProtoBuf are far superior. My problem was reflection, so I took a look at Metsys.Little's code, since it has to do the same things as a clone, plus reading and writing binary. This led me to "Reflection : Fast Object Creation" by Ziad Elmalki and "Dodge Common Performance Pitfalls to Craft Speedy Applications" by Joel Pobar, both of which provide great insight into how to avoid performance problems with reflection.
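The general idea behind avoiding repeated reflection is to do the reflection once per type and cache a compiled delegate that does the real work. Below is a minimal sketch of that idea using expression trees. This is one possible technique, not necessarily the one the articles describe or the one Incubator ended up with. `CachedCloner` is a hypothetical name, the copy is shallow, and a public parameterless constructor is assumed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public class Simple {
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical sketch: reflect once per type, then reuse a compiled delegate.
public static class CachedCloner {
    private static readonly ConcurrentDictionary<Type, Func<object, object>> Cache =
        new ConcurrentDictionary<Type, Func<object, object>>();

    public static T ShallowClone<T>(T source) where T : class {
        if (source == null) return null;
        // Build runs only the first time a given type is seen.
        Func<object, object> copier = Cache.GetOrAdd(source.GetType(), Build);
        return (T)copier(source);
    }

    private static Func<object, object> Build(Type type) {
        ParameterExpression input = Expression.Parameter(typeof(object), "input");
        ParameterExpression typed = Expression.Variable(type, "typed");

        // One binding per public read/write property: P = typed.P
        IEnumerable<MemberBinding> bindings = type
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetIndexParameters().Length == 0
                     && p.GetGetMethod() != null
                     && p.GetSetMethod() != null)
            .Select(p => (MemberBinding)Expression.Bind(p, Expression.Property(typed, p)));

        // Equivalent to: var typed = (T)input; return new T { P1 = typed.P1, ... };
        Expression body = Expression.Block(
            new[] { typed },
            Expression.Assign(typed, Expression.Convert(input, type)),
            Expression.Convert(
                Expression.MemberInit(Expression.New(type), bindings),
                typeof(object)));

        return Expression.Lambda<Func<object, object>>(body, input).Compile();
    }
}
```

After the first call for a type, cloning costs a dictionary lookup plus a delegate invocation that runs ordinary property gets and sets, with no `PropertyInfo` calls on the hot path.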

The resulting values are:

Simple Object:
  Incubator.Clone: 458k/sec

Complex Object:
  Incubator.Clone: 123k/sec

That's more like it: roughly 1.5x over Metsys.Little on simple and 1.5x over ProtoBuf on complex. I still have to optimize the collection cloning logic, which should improve complex objects, since a complex, collection-less graph resulted in this:

Complex Object:
  Incubator.Clone: 226k/sec
  BinaryFormatter:  20k/sec
  Metsys.Little:   113k/sec
  ProtoBuf:         82k/sec

Metsys.Little pulled back ahead of ProtoBuf, but Incubator pulled ahead enough that its lead over the next fastest (Metsys.Little) is now 2x. So I should have some more performance to squeeze out of Incubator.

The common lesson from all this is that you really need to measure. Things that "ought to be fast" may turn out to be disappointing. But equally important, imho, is to get things working simply first, then measure to see whether optimizing is needed. Assuming something is going to be slow and prematurely optimizing it is just as bad as assuming it's going to be fast.
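For what it's worth, measuring operations per second doesn't take much. The following is a minimal sketch of a throughput helper, not the harness used for the numbers above. `Throughput` and `OpsPerSecond` are hypothetical names, and a single warm-up call is assumed to be enough to get JIT compilation and any one-time caching out of the timed loop.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for rough ops/sec comparisons.
public static class Throughput {
    public static double OpsPerSecond(Action action, int iterations) {
        if (action == null) throw new ArgumentNullException("action");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        action(); // warm-up: JIT and one-time setup stay out of the measurement

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) {
            action();
        }
        watch.Stop();

        return iterations / watch.Elapsed.TotalSeconds;
    }
}
```

Passing each candidate as a lambda, with the same object and the same iteration count, keeps the comparison like-for-like. The results are only rough, since things like garbage collection pauses are not controlled for.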