atom2json.erl converts a directory of Atom files to a directory of JSON files. As with most real-life problems, this one has multiple layers.
First, one needs to settle on an XML to JSON mapping. It turns out that there are many different approaches to this problem. For now, I elected to do some generic XML-to-JSON mapping crap. An RFC in this area would be helpful, particularly one that dealt with the notion of Extensions, and one that exposed the true structure of [x]html Text Constructs, as those would be crucial enablers for things like standard Map/Reduce jobs that extract Microformats and RDFa.
Next, it turns out that the data structures returned from the XML parser/builder (xmerl) are not what the JSON parser/builder (rfc4627) expects, so there’s yet another layer of impedance mismatch.
The next level down, there are Erlang concepts of tuples, arrays, binaries, and (lower case) atoms that need to be dealt with. Even lower down, there is utf-8, which the current rfc4627 implementation apparently doesn’t handle properly, so that module needs to be patched. (Note: this is only for the JSON builder part; another patch would be required to support JSON parsing.)
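To make the mismatch concrete, here is a minimal sketch of the kind of glue involved. It is not the code in atom2json.erl. It assumes the term shape that rfc4627 is generally described as accepting ({obj, [{Key, Value}]} for objects, lists for arrays, binaries for strings), and the "name", "attributes", and "children" keys, as well as the module name, are made up for illustration.

```erlang
-module(xml2json_sketch).
-export([to_json_term/1, demo/0]).
-include_lib("xmerl/include/xmerl.hrl").

%% Turn an xmerl node into an rfc4627-style term (assumed shape:
%% {obj, [{Key, Value}]}, lists for arrays, binaries for strings).
%% The key names used here are hypothetical, not a standard mapping.
to_json_term(#xmlElement{name = Name, attributes = Attrs, content = Content}) ->
    %% Keep only element and text children; comments and PIs are dropped.
    Children = [to_json_term(C) || C <- Content,
                is_record(C, xmlElement) orelse is_record(C, xmlText)],
    {obj, [{"name", atom_to_binary(Name, utf8)},
           {"attributes", {obj, [attr(A) || A <- Attrs]}},
           {"children", Children}]};
to_json_term(#xmlText{value = Value}) ->
    %% xmerl hands back a list of code points; JSON strings become binaries.
    unicode:characters_to_binary(Value).

attr(#xmlAttribute{name = Name, value = Value}) ->
    {atom_to_list(Name), unicode:characters_to_binary(Value)}.

%% Parse a tiny document with xmerl and convert it.
demo() ->
    {Doc, _Rest} = xmerl_scan:string("<title type=\"text\">Hi</title>"),
    to_json_term(Doc).
```

Under those assumptions, the result of demo() is the sort of term one would then hand to the JSON builder. Note how every layer shows up: records from xmerl, atoms for names, lists of integers for text, and binaries on the way out.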
Add in requirements like coalescing consecutive XML text nodes, and the desire to spawn a separate thread per conversion, and the task seems like a fairly daunting one. Yet the resulting Erlang program is remarkably compact, clean, and simple.
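As a rough illustration of why these two requirements stay small in Erlang, here is a sketch, again not the actual atom2json.erl source. It assumes text nodes have already been turned into binaries, and the module and function names are hypothetical. What the paragraph above calls a thread is an Erlang process here.

```erlang
-module(coalesce_sketch).
-export([coalesce/1, convert_all/2]).

%% Merge runs of adjacent binaries (text nodes) into a single binary,
%% leaving every other item in place.
coalesce([A, B | Rest]) when is_binary(A), is_binary(B) ->
    coalesce([<<A/binary, B/binary>> | Rest]);
coalesce([H | T]) ->
    [H | coalesce(T)];
coalesce([]) ->
    [].

%% Spawn one process per input and collect the results in input order.
%% Sketch only: if Fun crashes, the matching receive waits forever.
convert_all(Fun, Inputs) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, Fun(Input)} end),
                Ref
            end || Input <- Inputs],
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

For example, coalesce_sketch:convert_all(fun(X) -> X * 2 end, [1, 2, 3]) runs three processes and gathers their answers; in the real program the function would be the per-file conversion.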
With many frameworks and languages, I get the feeling that I’m dealing with a metal cabinet covered by layers of marine paint; one where scratches tend to reveal sharp edges. With Erlang, I get the feeling of a Victorian mahogany armoire; one where scratches in the wood simply reveal more rich wood.
Interesting, and thanks again - my planned bit of late night fun for today is to drag various chunks of pre-existing XML data into Mnesia. It looks very much like your code will give me a bit of a leg up. At the very least, I no longer have to look in the documentation to find file:list_dir. :)
I have similar feelings about Erlang, it would seem. It was Ewan Silver’s comment of “The more I look at, and play with, Erlang the more I like it.” that made me finally take the plunge and start tinkering around with Erlang. I know exactly what he meant now - I like it more every day.
It’s worth noting though, that while YOUR resulting Erlang program is undeniably clean, compact and simple (I was expecting a lot more code after reading your post), it’s also possible to produce an extremely unpleasant mess with Erlang in the wrong hands. That’s true of any language of course, but I have a hunch that Erlang is very near the top of the “bad code potential per pound” list.
Sam Ruby gushes ... With many frameworks and languages, I get the feeling that I’m dealing with a metal cabinet covered by layers of marine paint; one where scratches tend to reveal sharp edges. With Erlang, I get the feeling of a Victorian...
Sam, that’s great stuff! I’m very interested in making rfc4627.erl support utf-8 properly. Bart, your comment above looks like a good approach. I have yet to completely digest the way the specification wants Unicode to be approached, but it seems to me that the Unicode conversion needs to run over the input byte stream, rather than only over bytes within strings. Anyway, I’m heads-down on another project at the moment, but hope to find some time to integrate some utf-8 support over the next few weeks. If you like, please feel free to contact me via email - tonyg at lshift dot net.
I am watching Sam Ruby's conversion to Erlang with a mixture of envy and skepticism. Sam Ruby is a typical technology sniffer ("goleor tecnológico" ™). He is always up to date on web standards and languages. In fact, he makes some of them himself :) As...
Typo: The link to the code (atom2json.erl) in the excerpt (hence home page and atom feed, I guess day or month list resource too) is wrong, it is pointing to the January entry on application/atom+json instead of the source code.
Anant Jhingran: Counter example. I’ve been playing with CouchDB. That code is definitely pre-alpha at this point, but this post is not about the code itself, but about the interface it provides. I was testing i...
I finished testing the Erlang program by Sam Ruby to convert atom files to JSON. I ran it a number of times on the atom part of the repository for the wiki / blog I’m converting from JSPWiki to mombo. I found it fast and, especially, it...
Erlang represents strings as lists of (ASCII, or possibly iso8859-1) codepoints. In this regard, it’s weakly typed - there’s no hard distinction between a string, “ABC”, and a list of small integers, [65,66,67]. For example:...