It’s just data

Atom to JSON with Erlang

atom2json.erl converts a directory of Atom files to a directory of JSON files.  As with most real-life problems, this one has multiple layers.

First one needs to settle on an XML to JSON mapping.  It turns out that there are many different approaches to this problem.  For now, I elected to do some generic XML-to-JSON mapping crap.  An RFC in this area would be helpful, particularly one that dealt with the notion of Extensions and exposed the true structure of [x]html Text Constructs, as those would be crucial enablers for things like standard Map/Reduce jobs that extract Microformats and RDFa.

Next, it turns out that the data structures returned from the XML parser/builder (xmerl) are not what the JSON parser/builder (rfc4627) expects, so there’s yet another layer of impedance mismatch.
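To make the mismatch concrete, here is a minimal sketch that parses a one-element document with xmerl and sets the pieces it returns next to a hand-written term in the [Name, {obj, Attributes}, Content] shape that appears later in the comments. The module name shape_demo and the function demo/0 are made up for illustration, and the target shape is an assumption about what the JSON builder wants, not something taken from atom2json.erl.

```erlang
-module(shape_demo).
-export([demo/0]).
%% Record definitions for #xmlElement{}, #xmlAttribute{} and #xmlText{}.
-include_lib("xmerl/include/xmerl.hrl").

%% Returns {XmerlView, JsonView}: what xmerl produces versus the term
%% shape assumed (hypothetically) for the JSON builder.
demo() ->
    {Doc, _Rest} = xmerl_scan:string("<title type=\"text\">Hi</title>"),
    #xmlElement{name = Name, attributes = Attrs, content = Content} = Doc,
    [#xmlAttribute{name = AttrName, value = AttrValue}] = Attrs,
    [#xmlText{value = Text}] = Content,
    %% xmerl side: atoms for names, plain lists of code points for text.
    XmerlView = {Name, [{AttrName, AttrValue}], Text},
    %% JSON side: binaries for strings, {obj, PropList} for objects.
    JsonView = [<<"title">>, {obj, [{type, <<"text">>}]}, [<<"Hi">>]],
    {XmerlView, JsonView}.
```

The conversion code's job is essentially to turn the first element of that pair into the second.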

The next level down, there are Erlang concepts of tuples, lists, binaries, and (lower case) atoms that need to be dealt with.  Even lower down, there is utf-8, which the current rfc4627 implementation apparently doesn’t handle properly, so that module needs to be patched.  (Note: this is only for the JSON builder part; another patch would be required to support JSON parsing.)
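A small sketch of why utf-8 needs care at this level: a string in Erlang is a list of code points, and list_to_binary/1 only accepts byte values, so anything above 255 has to be expanded to UTF-8 bytes first. The module name utf8_demo is hypothetical; xmerl_ucs:to_utf8/1 is the same call used in the comments below, and this is not a description of how rfc4627 itself was patched.

```erlang
-module(utf8_demo).
-export([demo/0]).

demo() ->
    %% U+00E9 (e with acute accent) and U+20AC (euro sign) as code points.
    CodePoints = [16#E9, 16#20AC],
    %% list_to_binary/1 rejects the euro sign because it is not a byte.
    Naive = try list_to_binary(CodePoints)
            catch error:badarg -> badarg
            end,
    %% Expanding each code point to its UTF-8 byte sequence first works.
    Utf8 = list_to_binary(xmerl_ucs:to_utf8(CodePoints)),
    {Naive, Utf8}.
```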

Add in requirements like coalescing consecutive XML text nodes, and the desire to spawn a separate process per conversion, and the task seems like a fairly daunting one.  Yet the resulting Erlang program is remarkably compact, clean, and simple.
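Running each conversion concurrently takes very little code. The sketch below is not taken from atom2json.erl; the module name spawn_demo and the function convert_all/2 are hypothetical, and Convert stands in for whatever function converts one file. It assumes Convert does not crash; a crashing worker would leave the caller waiting forever, so real code would want a monitor or a timeout.

```erlang
-module(spawn_demo).
-export([convert_all/2]).

%% Run Convert(Item) in its own Erlang process for every item, then
%% collect the results in the same order as the input list.
convert_all(Convert, Items) ->
    Parent = self(),
    Refs = [begin
                %% A unique reference ties each reply to its request.
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, Convert(Item)} end),
                Ref
            end || Item <- Items],
    %% Selective receive: wait for each worker's reply in turn.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

For example, spawn_demo:convert_all(fun(X) -> X * 2 end, [1, 2, 3]) doubles each number in a separate process and returns the results in order.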

With many frameworks and languages, I get the feeling that I’m dealing with a metal cabinet covered by layers of marine paint; one where scratches tend to reveal sharp edges.  With Erlang, I get the feeling of a Victorian mahogany armoire; one where scratches in the wood simply reveal more rich wood.


Interesting, and thanks again - my planned bit of late night fun for today is to drag various chunks of pre-existing XML data into Mnesia. It looks very much like your code will give me a bit of a leg up. At the very least, I no longer have to look in the documentation to find file:list_dir. :)

I have similar feelings about Erlang, it would seem. It was Ewan Silver’s comment of “The more I look at, and play with, Erlang the more I like it.” that made me finally take the plunge and start tinkering around with Erlang. I know exactly what he meant now - I like it more every day.

It’s worth noting though, that while YOUR resulting Erlang program is undeniably clean, compact and simple (I was expecting a lot more code after reading your post), it’s also possible to produce an extremely unpleasant mess with Erlang in the wrong hands. That’s true of any language of course, but I have a hunch that Erlang is very near the top of the “bad code potential per pound” list.

Posted by Ciaran at

I think that if you change the one occurrence of list_to_binary/1 in rfc4627.erl into utf8:to_binary from utf8.erl, the JSON parser will also give you UTF8 binaries.

Posted by Bart Schuller at

Sam Ruby: Atom to JSON with Erlang

“With Erlang, I get the feeling of a Victorian mahogany armoire; one where scratches in the wood simply reveal more rich wood.”...

Excerpt from del.icio.us/tag/erlang at

An Atom store on top of CouchDb would be fantastic.  You could have the best of both worlds: easy to interact with and programmable (Couch) and supported by millions of clients out of the box (Atom).

Posted by Adam Hupp at

It's Simply Different

Sam Ruby gushes ... With many frameworks and languages, I get the feeling that I’m dealing with a metal cabinet covered by layers of marine paint; one where scratches tend to reveal sharp edges. With Erlang, I get the feeling of a Victorian...

Excerpt from Making it stick. at

Sam, that’s great stuff! I’m very interested in making rfc4627.erl support utf-8 properly. Bart, your comment above looks like a good approach. I have yet to completely digest the way the specification wants Unicode to be approached, but it seems to me that the Unicode conversion needs to run over the input byte stream, rather than only over bytes within strings. Anyway, I’m heads-down on another project at the moment, but hope to find some time to integrate some utf-8 support over the next few weeks. If you like, please feel free to contact me via email - tonyg at lshift dot net.

Posted by Tony Garnock-Jones at

Yet Another Programming Weblog: Sam Ruby y Erlang

I am watching Sam Ruby's conversion to Erlang with a mixture of envy and skepticism. Sam Ruby is a typical technology sniffer ™. He is always up to date on standards and languages for the web. In fact, he writes some of them himself :) As...

Excerpt from Planeta Código at

The RDFa syntax is not yet stabilized but there is an « RDFa distiller » [1] to test the implementability of RDFa. Ivan Herman tested it on his homepage.[2]

[1] [link]
[2] [link]

Posted by karl dubost, w3c at

Beautiful work!

The part that converts an XML entity into a list/tuple ready for the JSON encoder can actually be even more concise:

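%% Assumes -include_lib("xmerl/include/xmerl.hrl"). for the #xml...{} records,
%% and that atom_to_binary/1 is available (a local helper, or the BIF in
%% recent OTP releases).
json(#xmlElement{name=Name, attributes=Attributes, content=Content}) ->
    [atom_to_binary(Name), {obj, json(Attributes)}, json(Content)];
json(List) when is_list(List) ->
    json_1(List, []).

%% Walk a list of attributes or child nodes, accumulating in reverse.
json_1([], Acc) -> lists:reverse(Acc);
json_1([#xmlAttribute{name=Name, value=Value}|Rest], Acc) ->
    json_1(Rest, [{Name, list_to_binary(xmerl_ucs:to_utf8(Value))}|Acc]);
json_1([#xmlElement{}=Element|Rest], Acc) ->
    json_1(Rest, [json(Element)|Acc]);
%% Coalesce two consecutive text nodes into one before emitting.
json_1([#xmlText{value=Value1},#xmlText{value=Value2}|Rest], Acc) ->
    json_1([#xmlText{value = Value1 ++ Value2} | Rest], Acc);
json_1([#xmlText{value=Value}|Rest], Acc) ->
    json_1(Rest, [list_to_binary(xmerl_ucs:to_utf8(Value))|Acc]);
%% Skip anything else (comments, processing instructions, ...).
json_1([_Other|Rest], Acc) ->
    json_1(Rest, Acc).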


Posted by Caoyuan Deng at

Typo: the link to the code (atom2json.erl) in the excerpt (and hence on the home page and in the atom feed, and I guess in the day or month list resources too) is wrong; it points to the January entry on application/atom+json instead of the source code.

Posted by Santiago Gala at

Santiago: fixed.  Thanks!

Posted by Sam Ruby at

Ascetic Database Architectures

Anant Jhingran:  Counter example.  I’ve been playing with CouchDB.  That code is definitely pre-alpha at this point, but this post is not about the code itself, but about the interface it provides. I was testing i... [more]

Trackback from Sam Ruby at

Converting Atom to JSON (recipes from Sam Ruby)

I finished testing the Erlang program by Sam Ruby to convert atom files to JSON . I ran it a number of times on the atom part of the repository for the wiki / blog I’m converting from JSPWiki to mombo . I found it fast and, specially, it...

Excerpt from Boxes and Glue at

How should JSON strings be represented in Erlang?

Erlang represents strings as lists of (ASCII, or possibly iso8859-1) codepoints. In this regard, it’s weakly typed - there’s no hard distinction between a string, “ABC”, and a list of small integers, [65,66,67]. For example:...

Excerpt from LShift Ltd. at

atom2json.erl converts a directory of Atom files to a directory of JSON files....

Excerpt from del.icio.us/rybesh at

JSON and XML Conversion

← Older revision Revision as of 15:25, 23 April 2010 Line 610: Line 610: Other relevant resources: Other relevant resources: * JsonML * JsonML   + *...

Excerpt from Open311 Wiki - Recent changes [en] at
