Parsing Atom with Erlang
A simple program for parsing memes.atom. Below is an annotated version.
-module(memes).
-export([scan/0]).
-include_lib("xmerl/include/xmerl.hrl").

Define a module named memes that exports a single function named scan, which takes zero parameters. Include the headers for xmerl, a library for processing XML.
memes_url() -> "http://planet.intertwingly.net/memes.atom".
Define a simple function that returns a constant string. Some people prefer to use macros for things like this.
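For comparison, the macro alternative might look like the sketch below. The module name memes_macro and the macro name MEMES_URL are made up for illustration; the original program uses the plain function shown above.

```erlang
-module(memes_macro).
-export([memes_url/0]).

%% Hypothetical macro name; not part of the original program.
-define(MEMES_URL, "http://planet.intertwingly.net/memes.atom").

%% ?MEMES_URL is replaced by the string literal at compile time,
%% so this function behaves like the original memes_url/0.
memes_url() -> ?MEMES_URL.
```

The function form can be called from other modules and from the shell; the macro form is only visible inside the file (or header) that defines it.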
scan() ->
    application:start(inets),
    {ok, {_Status, _Headers, Body}} = http:request(memes_url()),
    {Xml, _Rest} = xmerl_scan:string(Body),
    format_entries(xmerl_xpath:string("//entry", Xml)),
    init:stop().
Main program
- start inets. Erlang is all about loadable modules and long running processes.
- fetch the memes_url() using http. The result is pattern matched against a tuple of length two, the first of which must be the atom (think symbol or interned String) ok. The second term in the pair must itself be a tuple of length three, of which the first two terms are ignored, and the final term is bound to the variable named Body. Erlang is designed to deal with components that fail, and each of these assertions is a part of that philosophy. (Later OTP releases replaced the http client module with httpc, so there the call would be httpc:request/1.)
- parse the Body using xmerl_scan. The resulting structure is bound to a variable named Xml, and the remainder of the string is discarded.
- invoke an XPath expression on the Xml using xmerl_xpath. The result is a list, which is passed directly to a function named format_entries/1.
- init:stop is called to gracefully shut down the runtime and all of its running processes.
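To make the "assertion" idea concrete, here is a minimal sketch that needs no network access. The module match_sketch and both function names are hypothetical; body/1 mirrors the match used in scan/0, while safe_body/1 is a defensive alternative that the original program does not use.

```erlang
-module(match_sketch).
-export([body/1, safe_body/1]).

%% Asserting form, as in scan/0: any term that is not shaped like
%% {ok, {_, _, Body}} raises a badmatch error in the calling process.
body(Response) ->
    {ok, {_Status, _Headers, Body}} = Response,
    Body.

%% Defensive form (an alternative, not the original's approach):
%% unexpected shapes become an error tuple instead of a crash.
safe_body({ok, {_Status, _Headers, Body}}) -> {ok, Body};
safe_body(Other) -> {error, Other}.
```

In a one-shot script the crash is usually what you want: the failure is loud and carries the unexpected value in the error term.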
format_entries([]) ->
    done;
format_entries([Node|Rest]) ->
    [ #xmlText{value=Title} ] = xmerl_xpath:string("title/text()", Node),
    [ #xmlAttribute{value=Link} ] = xmerl_xpath:string("link/@href", Node),
    Message = xmerl:export_simple_content([{a,[{href,Link}],[Title]}],xmerl_xml),
    io:format('~s~n', [xmerl_ucs:to_utf8(Message)]),
    format_entries(Rest).
In lieu of looping constructs, Erlang programs tend to use sequential logic and pattern matching.
- When format_entries/1 is called with an empty list, done is returned.
- Otherwise, when format_entries/1 is called with a non-empty list, the first node is bound to the variable Node, and the remainder of the list is bound to a variable named Rest, at which point:
  - Another XPath expression is used to extract the title. Assertions are made that the result is a list of length one, that the first and only item in that list is a record of type #xmlText, and that the field named value in that record is bound to a variable named Title.
  - Yet another XPath expression is used to extract the href attribute of the link element. Here the single result must be a record of type #xmlAttribute, whose value field is bound to Link.
  - The Link and Title are combined to form an XHTML anchor element, which is exported into a string and bound to the variable Message.
  - The Message is then converted to utf-8 (from a list of unbounded integers each representing a Unicode character) and output using io:format.
  - The function format_entries/1 is again called, this time with the remainder of the list.
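The same two-clause recursion can be tried offline against a hand-written feed. This is only a sketch: the module entries_sketch, the functions sample_body/0, anchors/0 and collect/1, and the example.com links are all invented for illustration, and the function collects the anchors into a list instead of printing them.

```erlang
-module(entries_sketch).
-export([anchors/0]).
-include_lib("xmerl/include/xmerl.hrl").

%% Hand-written stand-in for the fetched feed (an assumption; the real
%% memes.atom carries many more elements per entry).
sample_body() ->
    "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
    "<entry><title>One</title><link href=\"http://example.com/1\"/></entry>"
    "<entry><title>Two</title><link href=\"http://example.com/2\"/></entry>"
    "</feed>".

%% Parse the sample and walk the matching entry nodes.
anchors() ->
    {Xml, _Rest} = xmerl_scan:string(sample_body()),
    collect(xmerl_xpath:string("//entry", Xml)).

%% Same shape as format_entries/1: one clause for the empty list,
%% one clause that handles the head and recurses on the tail.
collect([]) ->
    [];
collect([Node | Rest]) ->
    [#xmlText{value = Title}] = xmerl_xpath:string("title/text()", Node),
    [#xmlAttribute{value = Link}] = xmerl_xpath:string("link/@href", Node),
    Message = xmerl:export_simple_content([{a, [{href, Link}], [Title]}], xmerl_xml),
    %% export_simple_content/2 returns a nested list; flatten it to a string.
    [lists:flatten(Message) | collect(Rest)].
```

Returning a list rather than calling io:format keeps the traversal free of side effects, which makes it easier to check from the shell.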
Clearly dumping XHTML fragments to stdout isn’t ideal (perhaps XHTML-IM instead?), and you wouldn’t want to dump every meme on every run, but those are problems for another day.