intertwingly

It’s just data

Trust, but verify some more


Last week, I created a nightly job to verify that my inputs are clean, well-formed XML.  That took care of my inputs, but it didn't verify the process by which the web pages were created.

I've since added some code to verify that each page in the cache of pages served in the past 24 hours is well-formed and valid XHTML.  This uncovered an interesting boundary case that I hadn't considered.

Specifically, this blog entry.  Notice that the title has two consecutive dashes in it.  Seem innocuous?  Well, the title is repeated in the trackback metadata, and the trackback metadata is contained in an XML comment, and consecutive dashes are illegal in the body of an XML comment.
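For illustration, here is a minimal sketch of that kind of check in Python. It is not the code that runs on this site: the function names and the cache layout (a directory of `*.html` files) are assumptions, and it only tests well-formedness, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def well_formedness_error(page_source):
    """Return the parser's error message, or None if the page is well-formed."""
    try:
        ET.fromstring(page_source)
    except ET.ParseError as err:
        return str(err)
    return None


def broken_pages(cache_dir):
    """Map each cached page that fails to parse to its error message.

    cache_dir is a hypothetical directory holding one *.html file per page.
    """
    problems = {}
    for path in sorted(Path(cache_dir).glob("*.html")):
        error = well_formedness_error(path.read_bytes())
        if error is not None:
            problems[path.name] = error
    return problems


# A comment whose body contains two consecutive dashes is rejected.
good = "<div><!-- title: Trust, but verify --></div>"
bad = "<div><!-- title: Trust -- but verify --></div>"
```

Here `well_formedness_error(good)` returns `None`, while `well_formedness_error(bad)` returns an error message from the parser, which is the failure the nightly check ran into.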

Unfortunately, since the W3C validator doesn't allow trackback metadata to be directly nested in the XHTML, I will continue to place this information inside a comment.  So whenever a title contains consecutive dashes, I now replace those dashes with numeric character references.
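The replacement itself can be sketched in a few lines of Python. Again, this is an assumption about one way to do it rather than the code actually in use; `escape_comment_dashes` is a made-up name, and `&#45;` is the numeric character reference for the dash.

```python
import re


def escape_comment_dashes(text):
    """Replace every dash in a run of two or more with &#45;.

    Lone dashes, as in "well-formed", are left untouched.
    """
    return re.sub(r"-{2,}", lambda m: "&#45;" * len(m.group(0)), text)


title = "Trust -- but verify"
safe_title = escape_comment_dashes(title)
```

With this, `safe_title` is `Trust &#45;&#45; but verify`, which no longer contains consecutive dashes and so can sit inside the comment.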