Rick Jelliffe: if you make up or maintain a public text format, and you don’t provide a mechanism for clearly stating the encoding, then, on the face of it, you are incompetent. If you make up or maintain a public text format, it is not someone else’s job to figure out the messy encoding details, it is your job.

I guess it would follow that Python and Perl are competent programming languages.

“Incompetent?”  What is it with XML gurus and the name-calling?

Seriously though, Korean text is usually in EUC-KR, not UTF-8, and it’s one of the easiest languages/encodings to auto-detect accurately, using a multibyte prober and some frequency distribution analysis. And Korean (or anything else) in UTF-8 can be detected with a regular expression, because well-formed UTF-8 follows a strict byte pattern. I’m not sure what the chances are that a random binary file would match, but I’m sure they go down as the file size goes up.
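
For the curious, here is roughly what the regular expression approach looks like in Python. This is a minimal sketch, not anybody's official detector: the name `looks_like_utf8` is made up for illustration, and treating only tab, LF, CR and printable ASCII as "text" is my assumption (loosen that first branch if your files contain other control characters). The remaining branches spell out the well-formed UTF-8 byte sequences, rejecting overlong forms and surrogates.

```python
import re

# Byte-level pattern for well-formed UTF-8. Each branch starts with a
# different lead byte range, so the alternatives never overlap.
UTF8_RE = re.compile(rb"""
    (?:
        [\x09\x0A\x0D\x20-\x7E]             # ASCII: tab, LF, CR, printable (assumption)
      | [\xC2-\xDF][\x80-\xBF]              # 2-byte sequences, no overlongs
      | \xE0[\xA0-\xBF][\x80-\xBF]          # 3-byte sequences, no overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}   # 3-byte sequences, straight
      | \xED[\x80-\x9F][\x80-\xBF]          # 3-byte sequences, no surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}       # 4-byte sequences, planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}           # 4-byte sequences, planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}       # 4-byte sequences, plane 16
    )*
""", re.VERBOSE)


def looks_like_utf8(data: bytes) -> bool:
    """Return True if every byte of data fits the UTF-8 pattern above."""
    return UTF8_RE.fullmatch(data) is not None


korean = "한국어"
print(looks_like_utf8(korean.encode("utf-8")))   # Korean text encoded as UTF-8
print(looks_like_utf8(korean.encode("euc-kr")))  # same text encoded as EUC-KR
```

Note that pure ASCII passes too, since ASCII is a subset of UTF-8, so a match says "this could be UTF-8", not "this is definitely UTF-8". Telling EUC-KR apart from other legacy multibyte encodings is where the prober and frequency analysis come in, and this sketch does none of that.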

(Yah, those quotes were intentional.  Taken from here.  Savor the irony.)

Random Thoughts, in no particular order:

