Abstract
Today's most widely used web servers choose the Content-Type based on the file extension of the resource. If the extension is '.html', the file is served as Content-Type: text/html; if it is '.xml', it is served as Content-Type: text/xml, and so on. This method is not very RESTful and conflicts with some specifications on character encoding, specifically RFC 3023.
Status
Withdrawn.
Author: AsbjornUlsberg.
Rationale
RFC 3023 states that all content served as text/* without a 'charset' parameter should be interpreted as encoded in US-ASCII. This makes it impossible for UnprivilegedUsers to serve non-US-ASCII XML files from web servers with Content-Type: text/xml, even if the encoding is set in the XML declaration. Example:
Content-Type: text/xml <?xml version="1.0" encoding="utf-8"?> <doc>Asbjørn's document</doc>
According to RFC 3023, the above document should not be parsed using the 'encoding' value of the XML declaration; it should be parsed as US-ASCII. To serve the above document correctly, one needs to either replace text/xml with application/xml or add a charset parameter to the Content-Type header. Examples:
Content-Type: text/xml; charset=utf-8 <?xml version="1.0" encoding="utf-8"?> <doc>Asbjørn's document</doc>
Content-Type: application/xml <?xml version="1.0" encoding="utf-8"?> <doc>Asbjørn's document</doc>
Content-Type: application/xml; charset=utf-8 <?xml version="1.0" encoding="utf-8"?> <doc>Asbjørn's document</doc>
All of the above examples are valid. The problem is that all of them are virtually impossible for UnprivilegedUsers, who cannot change the Content-Type header.
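The rule described above can be sketched in Python. This is a simplified illustration, not a conforming implementation: the function name `effective_charset` is hypothetical, byte order marks and UTF-16 detection are ignored, and the fallback to UTF-8 for application/xml without any declaration is assumed from the usual XML defaults.

```python
import re
from email.message import Message


def effective_charset(content_type: str, body: bytes) -> str:
    """Return the charset a consumer following RFC 3023 would assume.

    Simplified sketch: BOM and UTF-16 detection are not handled.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()  # lower-cased, e.g. "text/xml"
    charset = msg.get_param("charset")

    # An explicit charset parameter always wins.
    if charset:
        return str(charset).lower()

    # text/* without a charset parameter is treated as US-ASCII,
    # regardless of what the XML declaration says.
    if media_type.startswith("text/"):
        return "us-ascii"

    # application/xml without charset: use the XML declaration if present,
    # otherwise assume UTF-8.
    match = re.match(
        rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', body
    )
    return match.group(1).decode("ascii").lower() if match else "utf-8"


body = '<?xml version="1.0" encoding="utf-8"?><doc>Asbjørn</doc>'.encode("utf-8")
print(effective_charset("text/xml", body))
print(effective_charset("text/xml; charset=utf-8", body))
print(effective_charset("application/xml", body))
```

Only the first call yields US-ASCII, which is the case UnprivilegedUsers are stuck with.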
Note that it's still possible to serve the document correctly by rewriting it as:
Content-Type: text/xml <?xml version="1.0"?> <doc>Asbj&#248;rn's document</doc>
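A document can only be served correctly as US-ASCII if every character outside US-ASCII is written as a numeric character reference. A minimal Python sketch of that conversion follows; the helper name `to_ascii_xml` is hypothetical, and the approach only works for character data and attribute values, not for element or attribute names.

```python
def to_ascii_xml(text: str) -> bytes:
    """Encode XML text as US-ASCII, replacing other characters
    with decimal numeric character references (e.g. &#248;)."""
    return text.encode("ascii", errors="xmlcharrefreplace")


print(to_ascii_xml("<doc>Asbjørn's document</doc>"))
```

Any 'encoding' attribute in the XML declaration would also have to be removed or changed to match, which this sketch does not do.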
What we can do
- We can choose not to care about the UnprivilegedUsers. If they can't somehow modify the Content-Type header to serve their XML and Atom documents with the correct Content-Type or an appropriate charset, then their feeds will always be invalid.
- We can try to influence web server developers to do the right thing, which would be to:
  - Implement support for an '.atom' extension which would map to Content-Type: application/atom+xml.
  - Make it easier, and maybe mandatory, to explicitly state charset information for all text/* content. This will need heavy rewrites of the installation procedures for all web servers, or some other configuration option.
- We can state that all XML documents served as text/xml, and thus as US-ASCII, must use numeric character references for characters outside of US-ASCII.
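What the server-side option might look like can be sketched with Python's standard `mimetypes` module. This is an illustration only, not how any particular web server is configured: the function `content_type_for` and its `default_charset` parameter are hypothetical, and UTF-8 is an assumed default.

```python
import mimetypes

# A private registry, so the global mimetypes tables are left untouched.
types = mimetypes.MimeTypes()
# Map the '.atom' extension to the Atom media type.
types.add_type("application/atom+xml", ".atom")


def content_type_for(filename: str, default_charset: str = "utf-8") -> str:
    """Pick a Content-Type from the file extension and always state
    a charset explicitly for text/* content."""
    media_type, _encoding = types.guess_type(filename)
    if media_type is None:
        return "application/octet-stream"
    if media_type.startswith("text/"):
        return f"{media_type}; charset={default_charset}"
    return media_type


print(content_type_for("feed.atom"))
print(content_type_for("index.html"))
```

A real server would need the charset to reflect the file's actual encoding rather than a fixed default, which is the hard part of this option.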
Proposal
Note: Some spec text will be sketched out here in the not so distant future. In the meantime, it is empty.
Impacts
The consequence is that the UnprivilegedUsers need to be either explicitly supported or explicitly not supported. If they are supported, Atom needs to be servable as 'text/xml'. If they aren't, I think this needs to be explicitly stated somewhere:
- If you do not have the ability to alter the Content-Type of the Atom feeds you serve so that they are application/atom+xml, you SHOULD NOT use Atom. Atom accepts one Content-Type only, and cannot be served as text/xml or application/xml.
Extensibility
Notes
This Pace has been withdrawn because it is superseded by the much more complete PaceShouldBeWellFormed.