intertwingly

It’s just data

Atom to JSON with Erlang


atom2json.erl converts a directory of Atom files to a directory of JSON files.  As with most real-life problems, this one has multiple layers.

First, one needs to settle on an XML-to-JSON mapping.  It turns out that there are many different approaches to this problem.  For now, I elected to go with a generic XML-to-JSON mapping.  An RFC in this area would be helpful, particularly one that dealt with the notion of Extensions and exposed the true structure of [x]html Text Constructs, as those would be crucial enablers for things like standard Map/Reduce jobs that extract Microformats and RDFa.

Next, it turns out that the data structures returned from the XML parser/builder (xmerl) are not what the JSON parser/builder (rfc4627) expects, so there’s yet another layer of impedance mismatch.
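To make the mismatch concrete, here is a minimal sketch of one way to bridge the two shapes. It is not the code from atom2json.erl. It assumes the rfc4627 convention of representing a JSON object as {obj, [{Key, Value}]}, a JSON array as a list, and a JSON string as a binary. The module name xml2json_sketch and the "name"/"attributes"/"children" keys are hypothetical, chosen only for illustration.

```erlang
-module(xml2json_sketch).
-export([to_json_term/1, demo/0]).

%% Record definitions for #xmlElement{}, #xmlAttribute{} and #xmlText{}.
-include_lib("xmerl/include/xmerl.hrl").

%% An element becomes a JSON object with three (hypothetical) keys.
to_json_term(#xmlElement{name = Name, attributes = Attrs, content = Content}) ->
    {obj, [{"name", atom_to_binary(Name, utf8)},
           {"attributes", {obj, [attribute(A) || A <- Attrs]}},
           {"children", children(Content)}]};
%% A text node becomes a JSON string, i.e. a UTF-8 binary.
to_json_term(#xmlText{value = Value}) ->
    unicode:characters_to_binary(Value).

%% Assumes attribute values arrive as strings, as the xmerl scanner produces.
attribute(#xmlAttribute{name = Name, value = Value}) ->
    {atom_to_list(Name), unicode:characters_to_binary(Value)}.

%% Keep only elements and text; comments and processing instructions
%% are dropped in this sketch.
children(Content) ->
    [to_json_term(Node) || Node <- Content,
                           is_record(Node, xmlElement) orelse
                           is_record(Node, xmlText)].

%% Parse a tiny document and map it to the assumed JSON term shape.
demo() ->
    {Doc, _Rest} = xmerl_scan:string(
                     "<entry lang=\"en\"><title>Hi</title></entry>"),
    to_json_term(Doc).
```

The term that demo/0 returns is what one would then hand to the JSON encoder. Element and attribute names come back from xmerl as atoms while text comes back as character lists, which is exactly the kind of detail this layer has to smooth over.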

The next level down, there are Erlang concepts of tuples, arrays, binaries, and (lower case) atoms that need to be dealt with.  Even lower down, there is UTF-8, which apparently the current rfc4627 implementation doesn’t properly handle, so that module needs to be patched.  (Note: this is only for the JSON builder part; another patch would be required to support JSON parsing.)
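As an illustration of this lowest layer, the sketch below normalises atoms, character lists, and binaries into UTF-8 binaries. It is not the rfc4627 patch. It relies on the unicode module found in current Erlang/OTP releases, and the module name utf8_sketch and the function to_utf8/1 are hypothetical.

```erlang
-module(utf8_sketch).
-export([to_utf8/1, demo/0]).

%% Atoms (such as element names) are converted by name.
to_utf8(Atom) when is_atom(Atom) ->
    atom_to_binary(Atom, utf8);
%% Binaries are assumed to hold UTF-8 already and pass through unchanged.
to_utf8(Bin) when is_binary(Bin) ->
    Bin;
%% Character lists hold one integer per code point, so they must be
%% encoded; list_to_binary/1 would instead emit one byte per integer.
%% Note that characters_to_binary/1 returns an error tuple on bad input.
to_utf8(Chars) when is_list(Chars) ->
    unicode:characters_to_binary(Chars).

demo() ->
    %% [99, 97, 102, 233] is "caf" followed by U+00E9 (e with acute accent).
    [to_utf8(title), to_utf8([99, 97, 102, 233]), to_utf8(<<"ok">>)].
```

The interesting case is the middle one: the single code point 233 becomes the two bytes 195 and 169 in the resulting binary, which is the step a builder gets wrong if it treats a character list as a list of bytes.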

Add in requirements like coalescing consecutive XML text nodes, and the desire to spawn a separate thread per conversion, and the task seems like a fairly daunting one.  Yet the resulting Erlang program is remarkably compact, clean, and simple.
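Both of those requirements fit in a few lines each. The following is a sketch under stated assumptions rather than an excerpt from atom2json.erl: text nodes are modelled as hypothetical {text, String} tuples, Convert stands in for whatever function converts one file, and each conversion runs in its own lightweight Erlang process.

```erlang
-module(convert_sketch).
-export([coalesce/1, convert_all/2]).

%% Merge runs of adjacent text nodes into one. Any other node is
%% passed through untouched.
coalesce([{text, A}, {text, B} | Rest]) ->
    coalesce([{text, A ++ B} | Rest]);
coalesce([Node | Rest]) ->
    [Node | coalesce(Rest)];
coalesce([]) ->
    [].

%% Run Convert(Item) in a separate process per item, then collect the
%% results in the original order by matching on each unique reference.
convert_all(Convert, Items) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, Convert(Item)} end),
                Ref
            end || Item <- Items],
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

For example, convert_sketch:coalesce([{text, "a"}, {text, "b"}, {elem, x}]) yields [{text, "ab"}, {elem, x}], and convert_sketch:convert_all(fun(X) -> X * 2 end, [1, 2, 3]) yields [2, 4, 6]. Because the processes are linked, a crash in one conversion takes the caller down with it; a more forgiving version would monitor the workers instead.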

With many frameworks and languages, I get the feeling that I’m dealing with a metal cabinet covered by layers of marine paint; one where scratches tend to reveal sharp edges.  With Erlang, I get the feeling of a Victorian mahogany armoire; one where scratches in the wood simply reveal more rich wood.