intertwingly

It’s just data

Telex Digraph Mappings


Aurélio, Küng, Stärk, Uña, Łuksza: these are but a few of the names of contributors to the ASF.  All of them contain non-ASCII characters, and Subversion doesn’t handle such characters consistently between Mac and other platforms.

I can map these names, as well as a few others, to Subversion-safe file names (albeit in a lossy manner) using the following JavaScript:

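// Each pattern matches the precomposed character or, where listed,
// the base letter followed by a combining mark (the decomposed form).
name=name.replace(/\u00e4|a\u0308/g,'ae');  // ä
name=name.replace(/\u00e5|a\u030a/g,'aa');  // å
name=name.replace(/\u00e7|c\u0327/g,'c');   // ç
name=name.replace(/\u00e9|e\u0301/g,'e');   // é
name=name.replace(/\u00f1|n\u0303/g,'ny');  // ñ
name=name.replace(/\u00f6|o\u0308/g,'oe');  // ö
name=name.replace(/\u00f8/g,'o');           // ø
name=name.replace(/\u00fc|u\u0308/g,'ue');  // ü
name=name.replace(/\u0141/g,'L');           // Ł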
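
The same replacements could also be driven from a table. What follows is only a sketch, not what I actually run: the names DIGRAPHS and toSafeName are made up for illustration, and it assumes a JavaScript engine that provides String.prototype.normalize. Normalizing to NFC first folds a base letter plus combining mark into the precomposed character, so each mapping needs only one entry.

```javascript
// Hypothetical table mirroring the replacements above (precomposed forms only).
const DIGRAPHS = {
  '\u00e4': 'ae', // ä
  '\u00e5': 'aa', // å
  '\u00e7': 'c',  // ç
  '\u00e9': 'e',  // é
  '\u00f1': 'ny', // ñ
  '\u00f6': 'oe', // ö
  '\u00f8': 'o',  // ø
  '\u00fc': 'ue', // ü
  '\u0141': 'L'   // Ł
};

// Hypothetical helper: map a contributor name to a lossy ASCII form.
function toSafeName(name) {
  // NFC turns "u" + U+0308 into U+00FC, so both spellings hit the same entry.
  return name.normalize('NFC').replace(/[^\x00-\x7f]/g, function (ch) {
    // Characters without an entry are left as they are.
    return DIGRAPHS[ch] || ch;
  });
}

console.log(toSafeName('K\u00fcng'));   // precomposed ü
console.log(toSafeName('Ku\u0308ng'));  // decomposed ü
```

One difference from the replace chain: characters that are not in the table come out in their composed (NFC) form rather than exactly as they went in.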

But at this point, it occurs to me that such a set of mappings must have been done before.  The Wikipedia entry for Umlaut indicates that there is such a set of rules for Telex devices, but I have been unable to locate these rules.  Anybody have a pointer?