intertwingly

It’s just data

Semper fidelis


Tim Bray: In the past few days I’ve been watching two debates on the subject of Unicode

This led me to a morning full of exploration, where I learned quite a few things about m17n (multilingualization), a term I had not come across before.

I was aware that the Latin a (a) and the Cyrillic а (а) are different characters that display using the same glyph.
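A quick way to see the difference is to ask for each character's code point and official name. This is a minimal sketch in Python using the standard `unicodedata` module; the helper name `describe` is made up for illustration.

```python
import unicodedata

latin_a = "\u0061"     # LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A


def describe(ch):
    """Return (code point, official Unicode name) for a single character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


print(describe(latin_a))      # ('U+0061', 'LATIN SMALL LETTER A')
print(describe(cyrillic_a))   # ('U+0430', 'CYRILLIC SMALL LETTER A')
print(latin_a == cyrillic_a)  # False: same glyph on screen, different characters
```

The two strings never compare equal, no matter how alike they look on screen.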

I was aware that the Latin small letter e with acute could be encoded either as a single precomposed character, U+00E9 (é), or as an e followed by the combining acute accent U+0301 (é), and that both encodings are intended to be canonically equivalent.
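Canonical equivalence does not mean the two encodings compare equal byte for byte; they only match after normalization. Here is a small sketch in Python, again with the standard `unicodedata` module. The helper `canonically_equal` is a hypothetical name, and NFC is just one of the normalization forms that would work for this comparison.

```python
import unicodedata

precomposed = "\u00e9"  # single code point: e with acute
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT


def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)                   # False: raw code points differ
print(len(precomposed), len(decomposed))           # 1 2
print(canonically_equal(precomposed, decomposed))  # True
```

Any code that compares, hashes, or indexes user-supplied text has to decide whether and when to normalize, which is part of what makes language-level Unicode support contentious.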

What was news to me is that characters like the Unicode Han character ‘enter, come in(to), join’ (入) have common semantics across the various so-called CJK languages but are displayed with a different glyph for zh-tw than for ko, depending on the value of lang and/or xml:lang, and that this controversy is one of several factors shaping Ruby’s Unicode roadmap.

This is apparently very troubling to Tim, who (like most of us) has neither the inclination, the time, nor the means to fork either the IETF or Ruby. That leads me back to Mark Pilgrim’s quest for fidelity, which led to the statement that “Openness is not a cargo cult. Some get it, some don’t. Apple doesn’t.” This sentiment seems to me to be the underlying foundation for Mark’s beautiful theory, one that is increasingly at odds with uncomfortable facts, such as Jacques’ observations on how to obtain the best fidelity for MathML with various products.

A theory that I’m more comfortable with was expressed over half a decade ago in Mark Pilgrim’s Misspent youth. Generalizing this, it seems to me that:

Unfortunately, this theory has to contend with its own set of ugly facts, namely Perl, PHP, Python, and Ruby.

Bah