intertwingly

It’s just data

Is my weblog well formed?


The W3C validator says it not only is well formed, but valid.  I also run a nightly cron job which validates the pages served that day against the XHTML DTDs provided.  And I serve the content using the XHTML MIME type to browsers which support it, which causes Mozilla, at least, to be ultra-strict about well formedness.
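For what it's worth, the heart of such a nightly check can be very small.  Here is a sketch in Python of the well formedness half of it, not my actual cron job: the URL is a placeholder, and validating against the XHTML DTDs would require a validating parser on top of this.

    # Sketch of a well formedness check of the sort a nightly cron job might run.
    # This only tests well formedness, not validity; the URL is a placeholder.
    import urllib.request
    import xml.parsers.expat

    def well_formed(url):
        data = urllib.request.urlopen(url).read()
        parser = xml.parsers.expat.ParserCreate()
        try:
            parser.Parse(data, True)   # True: this is the final (and only) chunk
            return True
        except xml.parsers.expat.ExpatError as error:
            print("%s: %s" % (url, error))
            return False

    well_formed("http://example.org/blog/index.html")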

But can I be sure?

Based on these two tests, my conclusion is that, at the present time, the requirement that all XML parsers reject every non-well formed document they are presented with is a Platonic ideal... something that perhaps can be aspired to, but something that is rarely if ever seen in the real world.

So, in a nihilistic sense, no, I cannot be sure.  I'm relying on imperfect tools to reassure me that I am doing it right.

- - -

Brent and Nick have both stated their intent to reject feeds that are not well formed.  They, too, will undoubtedly be relying on imperfect tools to implement this policy.  So they, too, can never be quite sure.  Frankly, I don't see how a requirement that Atom consumers make up for the inadequacies of whatever XML parser they choose to use, either by providing a front end filter or by writing their own parser, would substantially improve the situation.
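To make the "front end filter" option concrete (purely as an illustration, not a description of what either of them is actually doing), it amounts to little more than a strict pre-check sitting in front of whatever lenient parsing code an aggregator already has.  The names below are made up for the example.

    # Illustrative only: a front end filter that rejects a feed before it ever
    # reaches the aggregator's normal (possibly lenient) parsing code.
    import xml.etree.ElementTree as ET

    class IllFormedFeed(Exception):
        pass

    def front_end_filter(feed_bytes):
        try:
            ET.fromstring(feed_bytes)        # strict: raises on any well formedness error
        except ET.ParseError as error:
            raise IllFormedFeed(str(error))  # refuse to go any further
        return feed_bytes                    # hand the untouched bytes onward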

So, why are they doing this?  My intuition tells me that this is based on a sincere desire to move from a world in which producers need to conform to whatever a predominance of consumers happens to accept, into a world in which there is a single clear definition of what is acceptable.

It also appears to be a response to the growing recognition that liberal parsing strategies are a slippery slope, and not a particularly evolutionarily stable strategy.  My next (and final) test will explore this a bit further.

- - -

If you accept that perfection can't ever be achieved, the next question to face is whether such policies will substantially improve the quality of inbound feeds, or whether they will in fact cause a mass defection of users to other tools or formats.  Or both.

Given the data presented so far, I don't see conclusive evidence for the oft-repeated claim that feed parsers that aren't liberal will be at a significant competitive disadvantage.  SharpReader is (fairly) conservative, has competition, and seems to be doing fine.

Whether ill formed feeds exist because early aggregators were liberal, or whether many of today's aggregators are liberal because of the existence of ill formed feeds, is an imponderable.  Both are likely to be true.  The key ingredient that appears to be lacking to break this vicious circle is an effective feedback loop.

I do agree that, in an abstract sense, the efforts organized by Syndic8 and the existence of the feedvalidator are the "right" way to address the problem of well formedness, but these efforts to date do not appear to be sufficient.

I have hopes that the courageous and noble stands being made by Brent and Nick will make a difference, and that the end result will benefit Luke and Dare and others who wish to employ "real" XML parsers.  This is because, contrary to popular belief, there are exceptions to Postel's "law".

Note: nothing in this endorsement should be construed to imply that an aggregator needs to be abrasive or abusive in its application of this policy.  I may be biased, but I do like SharpReader's approach of linking to the feedvalidator first, and providing the aggregator author's email address for feedback second.

- - -

One thing that needs to be said is that this must be a voluntary action on the part of aggregator authors.  Each tool author needs to be free to modify, and potentially reverse entirely, their stated policy based on the feedback they receive, without feeling that they are somehow letting down the Atom community.  As tool authors, their first responsibility is not to the producers or to the spec writers, but to their user base.