XML indentation in .NET

Author Tim

Since integrating Chorus into WeSay, and more specifically since using a DVCS (Mercurial) to manage WeSay's XML-encoded .lift file, the exact format of that file has taken on a new importance. Because Mercurial (and Chorus at this point) uses a standard line diffing tool to express the difference between two revisions, line breaks, indentation and other whitespace have suddenly become an issue where they normally are not in XML documents.
As it turns out, formatting XML in .NET is not entirely trivial. Though XmlWriter and XmlReader, as well as their respective XmlWriterSettings and XmlReaderSettings, have various switches for enabling and disabling indentation and line breaks on attributes, these are bound together by subtle interactions which I hope to clarify in this post.

First some background:
The indentation and whitespace in a given XML file can be of interest for at least two reasons:
– line diffing tools typically care about whitespace, and the name itself bears witness to the importance of newlines.
– readability. It's much easier for a human being to read a nicely formatted XML file.

In WeSay the .lift file is frequently created from two separate files: first a valid .lift file, and second an XML fragment file. Each time an entry is added or modified, these two files are merged to form the new .lift file. For this reason we are interested in the interaction between an XmlReader and an XmlWriter that outputs said reader's data.

As an example we will use some very simple XML rather than an actual lift entry, as that construct is unnecessarily complex for this discussion.
Here are the two source files:
File 1:

               

File 2:

                

Here is our envisioned result:

                                    

All of these files have been written with indentation and new lines for each element and attribute. This makes for ok readability and should keep our diff files nice and small.

The first thing we are going to try is to see what happens when we start with completely unformatted input files, default readers and a default writer:
File 1:

File 2:

And the code goes something like this:
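// stream0 holds the source XML; stream1 receives the output.
XmlReaderSettings readerSettings = new XmlReaderSettings
{
    ConformanceLevel = ConformanceLevel.Fragment
};

XmlWriterSettings writerSettings = new XmlWriterSettings
{
    ConformanceLevel = ConformanceLevel.Document
};

XmlReader reader = XmlReader.Create(stream0, readerSettings);
XmlWriter writer = XmlWriter.Create(stream1, writerSettings);

// WriteNode copies the current node (and its children) and advances the reader.
while (!reader.EOF)
{
    writer.WriteNode(reader, true);
}
// Make sure buffered output actually reaches stream1.
writer.Flush();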
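
For anyone who wants to try this at home, here is a minimal self-contained sketch of the same copy loop. It assumes in-memory strings instead of streams, and the sample XML is hypothetical (it is not the data used in this post). OmitXmlDeclaration is set only to keep the output short; the class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DefaultCopyDemo
{
    // Copies every node of `source` through an XmlReader into an XmlWriter
    // and returns whatever the writer produced.
    public static string Copy(string source, XmlWriterSettings writerSettings)
    {
        var readerSettings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        };
        var output = new StringWriter();
        using (XmlReader reader = XmlReader.Create(new StringReader(source), readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            while (!reader.EOF)
            {
                writer.WriteNode(reader, true);
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Hypothetical sample input with no whitespace between the tags.
        string source = "<root><item id=\"1\" lang=\"en\" /><item id=\"2\" lang=\"fr\" /></root>";

        var writerSettings = new XmlWriterSettings
        {
            ConformanceLevel = ConformanceLevel.Document,
            OmitXmlDeclaration = true // assumption: only here to keep the output short
        };

        // Indent defaults to false, so the writer adds no line breaks of its own.
        Console.WriteLine(Copy(source, writerSettings));
    }
}
```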

With these settings the resulting file looks like this:

Just one long line… pretty much the worst case possible for a line diffing tool and for reading. So let's spruce it up a bit and add some formatting to the writer by changing the writer settings a bit:
XmlWriterSettings writerSettings = new XmlWriterSettings
{
    Indent = true,
    NewLineOnAttributes = true,
    ConformanceLevel = ConformanceLevel.Document
};
Resulting file:

                                    

*sigh*… beautiful.
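
To reproduce this step in isolation, a minimal sketch might look like the following. It assumes in-memory strings and hypothetical sample XML; OmitXmlDeclaration and the class and method names are additions for illustration only.

```csharp
using System;
using System.IO;
using System.Xml;

public static class IndentedCopyDemo
{
    // Copies `source` through a default fragment reader into a writer that
    // indents elements and puts each attribute on its own line.
    public static string Copy(string source)
    {
        var readerSettings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        };
        var writerSettings = new XmlWriterSettings
        {
            Indent = true,
            NewLineOnAttributes = true,
            ConformanceLevel = ConformanceLevel.Document,
            OmitXmlDeclaration = true // assumption: only here to keep the output short
        };
        var output = new StringWriter();
        using (XmlReader reader = XmlReader.Create(new StringReader(source), readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            while (!reader.EOF)
            {
                writer.WriteNode(reader, true);
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Hypothetical, completely unformatted input.
        string source = "<root><item id=\"1\" lang=\"en\" /><item id=\"2\" lang=\"fr\" /></root>";

        // Each element and each attribute should now start on its own line.
        Console.WriteLine(Copy(source));
    }
}
```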
But being geeks we just can't help but fix something that ain't broke. So in spite of this beautiful result we now want to try and fix the source file. This isn't entirely unreasonable considering you may want to look at the source file while debugging, and it would be nice if it were a bit more legible. So just for kicks, let's see what happens when we put a single line break in the source file, say after the element.

File 1:

Resulting File:

                          

?!!? What happened?! Not only do we have a line break after the element, but also the element and the closing element are not indented!!
This brings us to our first interesting observation: whitespace in a source document causes the writer to ignore its Indent property until the containing element of the whitespace (in our case ) is closed. This is true of whitespace such as “spaces” as well. Here is the resulting file if I substitute the newline of our last example with a simple space:

                           

Interestingly, you'll notice that this is not the case for the NewLineOnAttributes property of the XmlWriterSettings. This is even more interesting when you consider that this property is ignored UNLESS the Indent property is true. Here it is straight from the horse's mouth (i.e. MSDN): “This setting has no effect when the Indent property value is false.”
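
Both points can be checked with a small sketch like the one below. It assumes in-memory strings and hypothetical sample XML, and the helper name and OmitXmlDeclaration setting are additions for illustration. The first call feeds the writer a source with one line break in it; the second turns on NewLineOnAttributes while leaving Indent off.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WhitespaceDemo
{
    // Copies `source` through a default reader (whitespace is NOT ignored)
    // into a writer configured with the given flags.
    public static string Copy(string source, bool indent, bool newLineOnAttributes)
    {
        var readerSettings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        };
        var writerSettings = new XmlWriterSettings
        {
            Indent = indent,
            NewLineOnAttributes = newLineOnAttributes,
            ConformanceLevel = ConformanceLevel.Document,
            OmitXmlDeclaration = true // assumption: only here to keep the output short
        };
        var output = new StringWriter();
        using (XmlReader reader = XmlReader.Create(new StringReader(source), readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            while (!reader.EOF)
            {
                writer.WriteNode(reader, true);
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Hypothetical input with a single line break after the first item.
        string withNewline = "<root><item id=\"1\" />\n<item id=\"2\" /><item id=\"3\" /></root>";

        // The first item is indented; everything after the whitespace node,
        // up to and including the closing root tag, is written without indentation.
        Console.WriteLine(Copy(withNewline, true, false));

        // NewLineOnAttributes without Indent: the writer adds no line breaks at all.
        string clean = "<root><item id=\"1\" lang=\"en\" /></root>";
        Console.WriteLine(Copy(clean, false, true));
    }
}
```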

Ok.. so we've established that whitespace is an issue. The easiest way to get around this is to instruct the reader to ignore whitespace so that the writer doesn't get too clever on us:

XmlReaderSettings readerSettings = new XmlReaderSettings
{
    IgnoreWhitespace = true,
    ConformanceLevel = ConformanceLevel.Fragment
};
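
A minimal sketch of the effect, assuming in-memory strings and hypothetical sample XML (the helper name and OmitXmlDeclaration are additions for illustration): the same whitespace-laden source is copied once with a default reader and once with IgnoreWhitespace switched on.

```csharp
using System;
using System.IO;
using System.Xml;

public static class IgnoreWhitespaceDemo
{
    // Copies `source` into an indenting writer; the reader either passes
    // whitespace nodes through or drops them, depending on `ignoreWhitespace`.
    public static string Copy(string source, bool ignoreWhitespace)
    {
        var readerSettings = new XmlReaderSettings
        {
            IgnoreWhitespace = ignoreWhitespace,
            ConformanceLevel = ConformanceLevel.Fragment
        };
        var writerSettings = new XmlWriterSettings
        {
            Indent = true,
            ConformanceLevel = ConformanceLevel.Document,
            OmitXmlDeclaration = true // assumption: only here to keep the output short
        };
        var output = new StringWriter();
        using (XmlReader reader = XmlReader.Create(new StringReader(source), readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            while (!reader.EOF)
            {
                writer.WriteNode(reader, true);
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Hypothetical input with a line break after the first item.
        string source = "<root><item id=\"1\" />\n<item id=\"2\" /><item id=\"3\" /></root>";

        Console.WriteLine("Whitespace passed through:");
        Console.WriteLine(Copy(source, false));

        Console.WriteLine("Whitespace ignored by the reader:");
        Console.WriteLine(Copy(source, true));
    }
}
```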

So now we are back on track and looking good! To celebrate, let's tell the world how happy we are! Let's write a string into our first file that will proclaim our joy! Of course we will do this without spaces.. just in case.

File 1:

I'mSoHappy

Resulting file:

      I'mSoHappy                    

Arrrgh! It did it again!!! So here is observation number two: a text node in a source document causes the writer to ignore its Indent property until the containing element of the text node (in our case ) is closed.
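
The second observation can be reproduced with a sketch along these lines. It assumes in-memory strings and hypothetical sample XML; the helper name and OmitXmlDeclaration are additions for illustration. The reader ignores whitespace, so the only thing disturbing the writer is the text node.

```csharp
using System;
using System.IO;
using System.Xml;

public static class TextNodeDemo
{
    // Copies `source` through a whitespace-ignoring reader into an indenting writer.
    public static string Copy(string source)
    {
        var readerSettings = new XmlReaderSettings
        {
            IgnoreWhitespace = true,
            ConformanceLevel = ConformanceLevel.Fragment
        };
        var writerSettings = new XmlWriterSettings
        {
            Indent = true,
            ConformanceLevel = ConformanceLevel.Document,
            OmitXmlDeclaration = true // assumption: only here to keep the output short
        };
        var output = new StringWriter();
        using (XmlReader reader = XmlReader.Create(new StringReader(source), readerSettings))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            while (!reader.EOF)
            {
                writer.WriteNode(reader, true);
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Hypothetical input: a text node sits between the first and second item.
        string source = "<root><item id=\"1\" />I'mSoHappy<item id=\"2\" /><item id=\"3\" /></root>";

        // The first item is indented; once the text node has been written,
        // the remaining children and the closing root tag are not.
        Console.WriteLine(Copy(source));
    }
}
```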

Finally, WeSay uses an XPathNavigator in some places, and in the course of my testing I noticed that XmlWriter.WriteNode() behaves slightly differently when it is passed an XPathNavigator rather than an XmlReader. Specifically, it seems to always ignore whitespace. So passing an XmlReader (with IgnoreWhitespace = false) to WriteNode for the first file and an XPathNavigator (created from an XmlDocument) for the second file, where the files look like this:
File 1:

File 2:

Results in a final file looking like this:

                          

Here's an outline of the code:

XmlReaderSettings readerSettings = new XmlReaderSettings
{
    ConformanceLevel = ConformanceLevel.Fragment
};

XmlWriterSettings writerSettings = new XmlWriterSettings
{
    ConformanceLevel = ConformanceLevel.Document
};

// First file: read directly with an XmlReader.
XmlReader reader = XmlReader.Create(stream0, readerSettings);
// Second file: loaded into an XmlDocument. The original outline reused stream0 here;
// stream2 is a hypothetical name for the stream holding the second file.
XmlReader reader2 = XmlReader.Create(stream2, readerSettings);
XmlDocument document = new XmlDocument();
document.Load(reader2);
XmlWriter writer = XmlWriter.Create(stream1, writerSettings);

while (!reader.EOF)
{
    writer.WriteNode(reader, true);
}
// Write the second file through an XPathNavigator instead of a reader.
writer.WriteNode(document.CreateNavigator(), true);

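Note that this is the case even when you create the XmlDocument from an XmlReader with IgnoreWhitespace = false. A likely explanation is that XmlDocument.PreserveWhitespace defaults to false, so whitespace-only nodes are dropped when the document is loaded and never reach the navigator.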
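
A minimal sketch of the navigator case, assuming in-memory strings and hypothetical sample XML (the helper name and OmitXmlDeclaration are additions for illustration). The document is loaded with XmlDocument's default PreserveWhitespace = false, which is probably why the whitespace never reaches the writer.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NavigatorDemo
{
    // Loads `source` into an XmlDocument (default PreserveWhitespace = false)
    // and writes it through an XPathNavigator into an indenting writer.
    public static string CopyViaNavigator(string source)
    {
        var readerSettings = new XmlReaderSettings
        {
            IgnoreWhitespace = false, // the reader itself keeps whitespace nodes
            ConformanceLevel = ConformanceLevel.Fragment
        };
        var writerSettings = new XmlWriterSettings
        {
            Indent = true,
            ConformanceLevel = ConformanceLevel.Document,
            OmitXmlDeclaration = true // assumption: only here to keep the output short
        };

        var document = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(source), readerSettings))
        {
            document.Load(reader);
        }

        var output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            writer.WriteNode(document.CreateNavigator(), true);
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Hypothetical input with a line break after the first item.
        string source = "<root><item id=\"1\" />\n<item id=\"2\" /></root>";

        // Despite the whitespace in the source, the output comes out indented.
        Console.WriteLine(CopyViaNavigator(source));
    }
}
```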

So that about wraps it up. This was not meant to be an exhaustive study of all the XmlReader/XmlWriter/XmlDocument/XmlWriterSettings/XmlReaderSettings/XPathNavigator interactions, so if you find anything else unusual, or find that I grossly misrepresented something, please feel free to let me know!



Reader's Comments

  1. cambell |

    Dare I ask about the consistent / equivalent behaviour on Mono?

  2. Tim |

    Always the troublemaker… glad you asked of course 🙂
    It seems that Mono behaves identically to .NET in this case, with one exception (see addendum above). But an equally large issue that I should have caught before did rear its ugly head: the difference between newline characters in Linux and Windows land. I changed liftIO to always output “\r\n” for newlines, so all should be well now. Glad we caught that.