Parsing Atom with Erlang
A simple program for parsing memes.atom. Below is an annotated version.
-module(memes).
-export([scan/0]).
-include_lib("xmerl/include/xmerl.hrl").
Define a module named memes that exports a single function named scan which takes zero parameters. Include the headers for xmerl, a library for processing XML.
memes_url() -> "http://planet.intertwingly.net/memes.atom".
Define a simple function that returns a constant string. Some people prefer to use macros for things like this.
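For comparison, here is a minimal sketch of the macro alternative mentioned above. The module name memes_macro and the macro name MEMES_URL are made up for illustration; only the URL comes from the program itself.

```erlang
-module(memes_macro).
-export([memes_url/0]).

%% Hypothetical macro name; the preprocessor substitutes the string
%% wherever ?MEMES_URL appears.
-define(MEMES_URL, "http://planet.intertwingly.net/memes.atom").

%% Kept as a function so callers see the same interface as before.
memes_url() -> ?MEMES_URL.
```

A macro is expanded at compile time and is only visible inside the module (or header) that defines it, whereas a function can be exported and called from elsewhere.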
scan() ->
    application:start(inets),
    { ok, {_Status, _Headers, Body }} = http:request(memes_url()),
    { Xml, _Rest } = xmerl_scan:string(Body),
    format_entries(xmerl_xpath:string("//entry",Xml)),
    init:stop().
Main program
- Start inets. Erlang is all about loadable modules and long-running processes.
- Fetch memes_url() using http. The result is pattern matched against a tuple of length two, the first term of which must be the atom (think symbol or interned String) ok. The second term in the pair must itself be a tuple of length three, of which the first two terms are discarded and the final term is bound to the variable named Body. Erlang is designed to deal with components that fail, and each of these assertions is a part of that philosophy.
- Parse the Body using xmerl_scan. The resulting structure is bound to a variable named Xml, and the remainder of the string is discarded.
- Invoke an XPath expression on the Xml using xmerl_xpath. The result is a list, which is passed directly to a function named format_entries/1.
- Call init:stop to gracefully shut down the runtime and all of its running processes.
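To make the "pattern match as assertion" idea concrete, here is a small sketch that needs no network access. The module match_demo and its two functions are hypothetical; the only thing taken from the program is the expected shape of the response, { ok, {Status, Headers, Body} }.

```erlang
-module(match_demo).
-export([body/1, safe_body/1]).

%% Asserting style, as in scan/0: any value that does not have the
%% shape {ok, {_, _, Body}} raises a badmatch error.
body(Response) ->
    {ok, {_Status, _Headers, Body}} = Response,
    Body.

%% Alternative style: handle the failure shape explicitly and return
%% a tagged tuple instead of crashing.
safe_body(Response) ->
    case Response of
        {ok, {_Status, _Headers, Body}} -> {ok, Body};
        {error, Reason} -> {error, Reason}
    end.
```

The first style leaves recovery to whatever is supervising the process, which is the approach the program above takes; the second is useful when the caller wants to decide what to do with a failed fetch.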
format_entries([]) -> done;
format_entries([Node|Rest]) ->
    [ #xmlText{value=Title} ] = xmerl_xpath:string("title/text()", Node),
    [ #xmlAttribute{value=Link} ] = xmerl_xpath:string("link/@href", Node),
    Message = xmerl:export_simple_content([{a,[{href,Link}],[Title]}],xmerl_xml),
    io:format('~s~n', [xmerl_ucs:to_utf8(Message)]),
    format_entries(Rest).
In lieu of looping constructs, Erlang programs tend to use recursion and pattern matching.
- When format_entries/1 is called with an empty list, done is returned.
- Otherwise, when format_entries/1 is called with a non-empty list, the first node is bound to the variable Node, and the remainder of the list is bound to a variable named Rest, at which point:
  - Another XPath expression is used to extract the title. Assertions are made that the result is a list of length one, that the first and only item in that list is a record of type #xmlText, and the field named value in that record is bound to a variable named Title.
  - Yet another XPath expression is used to extract the href attribute of the link element, this time matching a record of type #xmlAttribute.
  - The Link and Title are combined to form an XHTML anchor element, which is exported into a string and bound to the variable Message.
  - The Message is then converted to utf-8 (from a list of unbounded integers each representing a Unicode character) and output using io:format.
  - The function format_entries/1 is again called, this time with the remainder of the list.
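The recursion above can be tried without fetching anything. The sketch below uses the same two XPath expressions on a tiny inline feed. The module name memes_offline, the helper names, and the sample feed contents are all invented for illustration, and it collects {Title, Link} pairs rather than printing anchors.

```erlang
-module(memes_offline).
-export([entries/0]).
-include_lib("xmerl/include/xmerl.hrl").

%% Made-up sample data standing in for the body of memes.atom.
sample() ->
    "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
    "<entry><title>First</title><link href=\"http://example.com/1\"/></entry>"
    "<entry><title>Second</title><link href=\"http://example.com/2\"/></entry>"
    "</feed>".

%% Parse the sample and walk every entry element.
entries() ->
    {Xml, _Rest} = xmerl_scan:string(sample()),
    collect(xmerl_xpath:string("//entry", Xml)).

%% Same shape as format_entries/1: one clause for the empty list,
%% one clause that handles the head and recurses on the tail.
collect([]) -> [];
collect([Node|Rest]) ->
    [#xmlText{value=Title}] = xmerl_xpath:string("title/text()", Node),
    [#xmlAttribute{value=Link}] = xmerl_xpath:string("link/@href", Node),
    [{Title, Link} | collect(Rest)].
```

Calling memes_offline:entries() from the Erlang shell should yield one pair per entry. An entry with no title, or with more than one link, would fail the single-element list match, which is the same assertion behavior described above.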
Clearly dumping XHTML fragments to stdout isn’t ideal (perhaps XHTML-IM instead?), and you wouldn’t want to dump every meme on every run, but those are problems for another day.