It’s just data

Constrained Data and Interoperability

Tim Bray: Any standard that tries to constrain the way in which data, once received, is processed, is broken.

All generalizations are false, including this one.

OK, that was too easy.  So, let’s try again: HTTP is “broken” as it constrains what GET means vs, say, POST or PUT or DELETE.  Without such constraints, web crawlers simply wouldn’t be possible.  But perhaps Tim’s rule can be fixed by saying that there is an exception for headers vs bodies?  Nah, let’s not go there.  Let’s go find an example that deals with content.
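To make the crawler point concrete, here is a minimal sketch in Python.  The set of safe methods comes from the HTTP specification (RFC 9110); the function name, the sample links, and the policy itself are made up for illustration and are not taken from any real crawler.

```python
# Methods the HTTP specification (RFC 9110) defines as "safe":
# a client may issue them without asking for any change on the server.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})


def crawler_may_request(method):
    """Hypothetical crawler policy: only issue requests the spec calls safe."""
    return method.upper() in SAFE_METHODS


# Made-up (method, URL) pairs that a crawler might discover on a page.
links = [("GET", "/archive"), ("POST", "/comments"), ("DELETE", "/entry/42")]

# Only the GET link survives; the crawler can follow it precisely because
# the standard constrains what a server may do in response to GET.
crawlable = [url for method, url in links if crawler_may_request(method)]
```

Take away the constraint on what GET means and a crawler has no basis for deciding which links are safe to follow.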

HTML5 is clearly broken by Tim’s definition.  And while it may go too far in places, I can say that there are definitely many areas where being “broken” in that sense is a good thing.  I wouldn’t have agreed with that statement a few years ago, but I do now.  Enthusiastically.  But to explain why, I need to first back up.

Like most people, I learned HTML via view-source.  I learned how to produce tables by looking at examples.  Such as this one.  Tables have rows, rows have data cells.  One can use th elements instead of td elements when one wants headers.  I’ve seen some people replace the tr elements with thead elements, but it didn’t seem to make much difference, so I didn’t always follow that practice.

People who have learned by viewing my source may have learned similar lessons, I guess.

It turns out that this is wrong.  And it is wrong in a way that affects plugins like tablesorter.  Tables have bodies, and possibly heads and foots, and captions and whatnot; but tables have no rows, at least not as immediate children.  And if you don’t include tbody elements, the browser will insert the necessary elements for you (assuming you are using text/html like any sane person would).  This means that things “just work” even if you learned the lesson I “learned” and tried to use this plugin.

Of course, this plugin didn’t work for me, at least not at first, as I didn’t initially include tbody elements and I do serve my content as application/xhtml+xml.  No biggie, easily fixed.
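The difference between the two parsing modes can be sketched with Python’s standard library.  An XML parser builds exactly the tree the markup spells out, so the rows end up as immediate children of the table and a script looking for rows under a tbody finds nothing.  The helper wrap_rows_in_tbody below is hypothetical: it is a rough imitation of what a text/html parser does, not the real algorithm, which handles many more cases.  The tbody/tr lookup is likewise only a stand-in for what a plugin might do.

```python
import xml.etree.ElementTree as ET

# The kind of table one learns from view-source: rows directly in the table.
MARKUP = "<table><tr><th>Name</th></tr><tr><td>Sam</td></tr></table>"


def child_tags(element):
    """Return the tag names of an element's immediate children."""
    return [child.tag for child in element]


def wrap_rows_in_tbody(table):
    """Hypothetical helper: move each run of tr children into a new tbody."""
    tbody = None
    for child in list(table):  # iterate over a snapshot; we mutate the table
        if child.tag == "tr":
            if tbody is None:
                # Put the new tbody where the first row of the run was.
                tbody = ET.Element("tbody")
                table.insert(list(table).index(child), tbody)
            table.remove(child)
            tbody.append(child)
        else:
            tbody = None  # any other child ends the current run of rows
    return table


# Parsed as XML (the application/xhtml+xml case): no tbody appears.
table = ET.fromstring(MARKUP)
xml_children = child_tags(table)
rows_seen_before = len(table.findall("tbody/tr"))

# After imitating the text/html behavior, the same lookup finds both rows.
wrap_rows_in_tbody(table)
html_like_children = child_tags(table)
rows_seen_after = len(table.findall("tbody/tr"))
```

Same bytes, two different trees.  Which tree a script gets depends entirely on which set of parsing rules was applied.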

Now let’s look at this from a browser vendor perspective.  If you want things to “just work” you need to know this.  And this is just one small example.  There are many more, and they lead to an alternate postulate, namely:

If what you want is interoperability, a DOM, and JavaScript, then you need the mapping of stream-o-bytes to a tree-structure to be completely well defined.

This conclusion continues to be controversial, but as I indicated, I have been convinced.  And furthermore, as my tbody example shows, this goes well beyond well-formedness.  It has to do with validity too.

And, again, I won’t dispute the possibility that there are other areas where HTML5 has gone too far.  But even if a long list of such areas is produced, you can’t prove a negative with any number of examples.

Therefore, the more interesting question is:

When are constraints useful for achieving interoperability?

P.S.  HTML5 allows tr elements as direct children of table elements.  It is quite instructive to look at the conditions under which it allows this.