A case for XML

XML gets maligned a lot. It's enterprisey, bloated, overly complex, etc. And the abuses visited upon it, like trying to express flow control or whole DSLs in it or being proposed as some sort of panacea for all interop problems only compound this perception. But as long as you treat it as what it is, data storage, I generally can find little justification to use something else. Not because it's the best, but because it's everywhere.

If you are your own consumer and you want a more efficient data storage, just go binary already. If you're not, then I bet your data consumers are just tickled that they have to add another parser to their repository of data ingestors. Jim Clark probably put it best when he said:

"For the payload format, XML has to be the mainstay, not because it's technically wonderful, but because of the extraordinary breadth of adoption that it has succeeded in achieving. This is where the JSON (or YAML) folks are really missing the point by proudly pointing to the technical advantages of their format: any damn fool could produce a better data format than XML."

Ok, I won't get religious on the subject, but mostly wanted to give a couple of examples, where the abilities and the adoption of XML have been a godsend for me. All this does assume you have a mature XML infrastructure. If you're dealing with XML via SAX or even are doing the parsing and writing by hand, then you are in a world of hurt, I admit. But unless it's a memory constraint there really is no reason to do that. Virtually every language has an XML DOM lib at this point.

I love namespaces

One feature a lot of people usually point to when they decry XML to me is namespaces. They can be tricky, i admit, and a lot of consumers of XML don't handle them right, causing problems. Like Blend puking on namespaces that weren't apparently hardcoded into its parser. But very simply, namespaces let you annotate an existing data format without messing with it.

<somedata droog:meta="some info about somedata">
  <droog:metablock>And a whole block of extra data</droog:metablock>
</somedata>

Here's the scenario. I get data in XML and need to reference metadata for processing further down the pipeline. I could have ingested the XML and then written out my own data format. But that would mean I'd have to also do the reverse if I wanted to pass the data along or return it after some modifications and I have to define yet another data format. By creating my own namespace, I am able to annotate the existing data without affecting the source schema and I can simply strip out my namespace when passing the processed data along to someone else. Every data format should be so versatile.

Transformation, Part 1: Templating

When writing webapps, there are literally dozens of templating engines and there's constantly new ones emerging. I chose to learn XSLT some years back because I liked how Cocoon and AxKit handled web pages. Just create your data in XML and then transform it using XSLT according to the delivery needs. So far, nothing especially unique compared to other templating engines. Except unlike most engines, it didn't rely on some program creating the data and then invoking the templating code. XSLT works with dynamic Apps as easily as with static XML or third party XML without having.

Since those web site roots, I've had need for email templating and data transformation in .NET projects and was able to leverage the same XSLT knowledge. That means I don't have to pick up yet another tool to do a familiar task just a little differently.

What's the file format?

When I first started playing with Xaml, I was taking Live For Speed geometry data and wanted to render it in WPF and Silverlight. Sure, I had to learn the syntax of the geometry constructs, but I didn't have to worry about figuring out the data format. I just used the more than familiar XmlDocument and was able to concentrate on geometry, not file formats.

Transformation, Part 2: Rewriting

Currently I'm working with Xaml again for a Silverlight project. My problem was that I had data visualization in Xaml format (coming out of Illustrator), as well as associated metadata (a database of context data) and I needed to attach the metadata to the geometry, along with behavior. Since the first two are output from other tools I needed a process that could be automated. One way would be to walk the Visual tree once loaded, create a parallel hierarchy of objects containing the metadata and behavior and attach their behavior to the visual tree. But i'd rather have the data do this for itself.

<Canvas x:Name="rolloverContainer_1" Width="100" Height="100">
  <!-- Some geometry data -->
</Canvas>

<!-- becomes -->

<droog:RolloverContainer x:Name="rolloverContainer_1" Width="100" Height="100">
  <!-- Some geometry data -->
</droog:RolloverContainer>

So I created custom controls that subclassed the geometry content containers. I then created a post-processing script that simply loaded the Xaml into the DOM and rewrote the geometry containers as the appropriate custom controls using object naming as an identifying convention. Now the wiring happens automatically at load, courtesy of Silverlight. Again, no special parser required, just using the same XmlDocument class I've used for years.

And finally, Serialization

I use XML serialization for over the wire transfers as well as data and configuration storage. In all cases, it lets me simply define my DTOs and use them as part of my object hierarchy without ever having to worry about persistence. I just save my object graph by serializing it to XML and rebuild the graph by deserializing the stream again.

I admit that this last bit does depend on some language dependent plumbing that's not all that standard. In .NET, it's built in and let's me mark in my objects with attributes. In Java, I use Simple for the same effect. Without this attribute driven mark up, I'd have to walk the DOM and build m objects by hand, which would be painful.

Sure, for data, binary serialization would be cheaper and more compact, but that misses the other benefits I get for free. The data can be ingested and produced by a wide variety of other platforms, I can manually edit it, or easily build tools for editing and generation, without any specialized coding.

For my Silverlight project, I'm currently using JSON as my serialization layer between client and server, since there currently is no XmlSerializer or even XmlDocument in Silverlight 1.1. It, too, was painless to generate and ingest and, admittedly, much more compact. But I then I added this bit to my DTO:

List<IContentContainer> Containers = new List<IContentContainer>();

It serialized just fine, but then on the other end it complained about there not being a no-argument constructor for IContentContainer. Ho Hum. Easily enough worked around for now, but I will be switching back to XML for this once Silverlight 2.0 fleshes out the framework. Worst case, I'll have to build XmlSerializerLitem, or something like that, myself.

All in all, XML has allowed me to do a lot of data related work without having to constantly worry about yet another file format, or parser. It's really not about being the best format, but about it virtually being everywhere and being supported with a mature toolchain across the vast majority of programming environment and that pays a lot of dividents, imho.