Anne van Kesteren: If your host has already configured your server like this you can not alter the character encoding using a META element. Every document that suggests otherwise is incorrect.
Nearly ten months ago, I set out to tackle internationalization issues on my weblog. My research included not only the specs but also experimentation with web browser software. My conclusion at the time was that the specs and the browsers were out of sync.
Time for an update. For starters, Anne points to a new emerging standard that is consistent with previous W3C specs and tutorials.
But do these specs represent reality? In my DevCon 2004 slides, I asserted otherwise. This was based on testing I had done, in particular two tests:
iso-8859-1: While the HTTP content-type header specifies that this page is utf-8, it actually contains iso-8859-1 content, and it does everything it can within the page to say so.
utf-8: Again, the HTTP content-type header specifies that this page is utf-8, this time correctly. However, the XML declaration and the meta tag attempt to fool the browser into thinking otherwise.
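To make the precedence these two tests probe concrete, here is a minimal Python sketch, assuming the reading of the specs in which a charset on the HTTP Content-Type header wins over anything declared inside the document. The function names and the regular expressions are hypothetical and heavily simplified (no BOM handling, no real HTML or XML parsing); this is an illustration, not how any particular browser does it.

```python
import re
from email.message import Message

# Simplified, hypothetical patterns for in-document declarations.
XML_DECL_RE = re.compile(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)', re.I)
META_RE = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9._:-]+)', re.I)


def http_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def document_charset(body):
    """Return the charset the document claims for itself, or None."""
    head = body[:1024]  # assumption: declarations appear near the start
    match = XML_DECL_RE.search(head) or META_RE.search(head)
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, body):
    """HTTP charset takes priority; in-document claims are only a fallback."""
    return http_charset(content_type) or document_charset(body)


page = (b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
        b'<meta http-equiv="Content-Type" '
        b'content="text/html; charset=iso-8859-1" />')

print(effective_charset("text/html; charset=utf-8", page))  # utf-8
print(effective_charset("text/html", page))                 # iso-8859-1
```

Under this reading, both test pages are utf-8 as far as the browser is concerned, whatever the XML declaration or meta tag says.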
I'm now getting different results than the ones I reported on last time, ones that are more consistent with the standards as written. Perhaps the declarations that XML on the Web Has Failed were premature, and we only need to give it more time?
On the other hand, and on a much narrower scope, the consensus continues to build that any notion of HTTP having a meaningful default charset is foolish.
Meanwhile, try the two tests above, and if you get any interesting results, please leave a comment specifying what you saw and what browser (including version) you used. As I understand the specs, iso-8859-1 should be treated as if it had unprintable characters in it, and utf-8 should display correctly. After you view each page, try a refresh, particularly in IE.
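For a rough idea of what "treated as if it had unprintable characters" looks like, here is a small Python sketch. It assumes the test string is "Iñtërnâtiônàlizætiøn" (an assumption on my part, though it fits the patterns reported in the comments) and that a spec-following client decodes the bytes strictly by the HTTP charset, substituting a replacement character for each invalid byte. The helper name is hypothetical.

```python
# Assumed test string; the pages themselves are not reproduced here.
SAMPLE = "Iñtërnâtiônàlizætiøn"


def decode_as_declared(raw, http_charset):
    """Decode bytes using the charset from the HTTP header.

    Invalid byte sequences become U+FFFD, the Unicode replacement character.
    """
    return raw.decode(http_charset, errors="replace")


# First test: iso-8859-1 bytes served with charset=utf-8.
mismatched = decode_as_declared(SAMPLE.encode("iso-8859-1"), "utf-8")

# Second test: utf-8 bytes served with charset=utf-8.
matched = decode_as_declared(SAMPLE.encode("utf-8"), "utf-8")

# Show replacement characters as "?" for readability.
print(mismatched.replace("\ufffd", "?"))  # I?t?rn?ti?n?liz?ti?n
print(matched == SAMPLE)                  # True
```

Each accented letter is a single high byte in iso-8859-1, and none of those bytes forms a valid utf-8 sequence when followed by an ASCII letter, so every one of them is replaced individually.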
Using Safari 1.2.4, the utf-8 one works as expected but the iso-8859-1 omits the internationalised characters entirely, instead displaying "Itrntinliztin". In Firefox 1.0 on OS X the utf-8 one works but the iso-8859-1 displays "I?t?rn?ti?n?liz?ti?n".
Opera 7.50 on OS X displays the utf-8 one correctly but displays the iso-8859-1 in the same way as Firefox but with squares instead of question marks.
Aside: the title of the utf-8 example page is currently "iso-8859-1".
IE5/Mac 5.20 on OS X does something really weird: it displays the iso-8859-1 one the same way the other browsers display the utf-8 one, but completely mangles the utf-8 one. Picture here: [link]
The only bug that currently exists in browsers that choose application/xhtml+xml as the MIME type in the above test cases is in the iso-8859-1 test case, where they should throw a well-formedness error. This is a known bug in Mozilla: https://bugzilla.mozilla.org/show_bug.cgi?id=174351
Sam Ruby: Meta Charset Update. Anyone who wants to understand charset issues on the web should go through most of what he links. The main thing I took away is that the charset specified in HTTP has primacy. And that makes sense. Since that makes...
Firefox 1.0 and Opera 7.54 running on Linux both say it's text/html and display it as expected: the Latin1 page has unprintable character placeholders while the UTF-8 page shows up correctly.
Friday 20 May 2005 08:57 No, that's nonsense. You have absolutely no idea how it works. Internet Explorer shows it correctly at first. But if you then press refresh, they look at it again and show the corrected result. See...