More Honoured in the Breach
Joshua Allen: the spec could easily be modified to be just as useful without breaking so many feeds
Surely, then, somewhere in Microsoft there are sufficient resources to apply to improving the spec? Particularly if it could be done so easily?
A few notes:
- re: well-formed is a very specific definition; that definition can only be meaningfully applied if the encoding of the document is known. Otherwise, different observers could reach different, inconsistent conclusions about whether a given document is well-formed.
- re: the platform is consistent; it is always possible to take a subset of data points and draw this conclusion. Viewed in an entirely different context, however, you may end up drawing another, entirely different, conclusion. In this case, I’d suggest IE7 as the context. On the bulk of pages (HTML), the charset parameter of the Content-Type header is very relevant. Apparently, on others (XML), this parameter is ignored. Within a single product, this state of affairs can hardly be described as consistent.
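The first note can be made concrete with a small experiment. The sketch below is illustrative only: the helper name `is_well_formed` is hypothetical, and it uses Python's standard `xml.etree.ElementTree`. Two observers examine the same bytes but assume different encodings. The byte 0xE9 is "é" in ISO-8859-1, but followed by "<" it is not a valid UTF-8 sequence, so one observer sees a well-formed document and the other does not.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes, assumed_encoding: str) -> bool:
    """Report whether `data` is well-formed XML under an *assumed* encoding.

    Bytes that cannot be decoded in the assumed encoding are a fatal error
    in XML, so a decoding failure is treated as "not well-formed" here.
    """
    try:
        text = data.decode(assumed_encoding)
    except UnicodeDecodeError:
        return False
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The same bytes, with no BOM or XML declaration to settle the question.
doc = b"<title>caf\xe9</title>"

print(is_well_formed(doc, "iso-8859-1"))  # True
print(is_well_formed(doc, "utf-8"))       # False
```

Neither observer is wrong given its assumption; the disagreement comes entirely from not knowing the encoding.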
Note: attempts have been made to revise RFC 3023, but they appear to have stalled. In my opinion, the revisions suggested so far (including deprecating text/xml) don’t address the core problem.
I’d suggest that inspiration be taken from the HTML 4.0 specification, section 5.2.2, which notes:
The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a default character encoding when the “charset” parameter is absent from the “Content-Type” header field. In practice, this recommendation has proved useless because some servers don’t allow a “charset” parameter to be sent, and others may not be configured to send the parameter. Therefore, user agents must not assume any default value for the “charset” parameter.
If this approach were adopted for XML as well, IE7’s consistency could be made whole again. The resulting situation would then be as follows:
- servers that are unconfigured (i.e., don’t specify a charset) would continue to operate as they do today.
- servers that are properly configured (i.e., specify the correct MIME type and, optionally, the correct charset) would work as intended, including some that might not work today.
- servers that are misconfigured (i.e., explicitly specify a charset that does not match the encoding of the document) would behave in accordance with the relevant standards.
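The three cases above can be sketched as a single decision rule: if the charset parameter is present, it is authoritative; if it is absent, no default is assumed and the document's own signals decide. This is a minimal Python sketch under stated assumptions, not any product's implementation. The function name `effective_encoding` is hypothetical, and the fallback is a simplified version of XML's own autodetection (UTF-8 or UTF-16 BOM, then the encoding declaration, then UTF-8). It does not handle UTF-32, EBCDIC, or UTF-16 without a BOM.

```python
import codecs
import re
from email.message import Message

# Simplified encoding-declaration matcher; assumes an ASCII-compatible prefix.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def effective_encoding(content_type: str, body: bytes) -> str:
    """Hypothetical rule: a charset parameter, when present, wins; when absent,
    no default is assumed and the document's BOM or declaration decides."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased parameter value, or None
    if charset:
        return charset
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # XML's own default when nothing else says otherwise


latin1_feed = b'<?xml version="1.0" encoding="iso-8859-1"?><title>caf\xe9</title>'

# Unconfigured server: no charset, so the declaration decides, as today.
print(effective_encoding("text/xml", latin1_feed))                      # iso-8859-1
# Properly configured server: the header agrees with the document.
print(effective_encoding("text/xml; charset=iso-8859-1", latin1_feed))  # iso-8859-1
# Misconfigured server: the header wins, per the standards.
print(effective_encoding("text/xml; charset=utf-8", latin1_feed))       # utf-8
```

In the misconfigured case, decoding `latin1_feed` as UTF-8 then fails, which surfaces the server's configuration error instead of silently guessing. By contrast, RFC 3023 as written requires a text/xml entity with no charset parameter to be treated as us-ascii, which is what turns the unconfigured case into a problem today.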