Parsing Atom with Erlang
A simple program for parsing memes.atom. Below is an annotated version.
-module(memes).
-export([scan/0]).
-include_lib("xmerl/include/xmerl.hrl").
Define a module named memes that exports a single function named scan which takes zero parameters. Include the headers for xmerl, a library for processing XML.
memes_url() -> "http://planet.intertwingly.net/memes.atom".
Define a simple function that returns a constant string. Some people prefer to use macros for things like this.
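For comparison, here is a minimal sketch of the macro alternative. The module name memes_macro and the macro name MEMES_URL are made up for illustration; neither appears in the program above.

```erlang
-module(memes_macro).
-export([memes_url/0]).

%% Hypothetical alternative: keep the URL in a compile-time macro.
%% The macro name MEMES_URL is an assumption made for this sketch.
-define(MEMES_URL, "http://planet.intertwingly.net/memes.atom").

%% A macro is only visible inside the module that defines it (or in
%% files that include its header), so this wrapper exposes the value.
memes_url() -> ?MEMES_URL.
```

The function version can be called from other modules without a shared header file, which may be one reason to prefer it in a small program like this one.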
scan() ->
    application:start(inets),
    { ok, {_Status, _Headers, Body }} = http:request(memes_url()),
    { Xml, _Rest } = xmerl_scan:string(Body),
    format_entries(xmerl_xpath:string("//entry",Xml)),
    init:stop().
Main program
- start inets. Erlang is all about loadable modules and long running processes.
- fetch the memes_url() using http. The result is pattern matched against a tuple of length two, the first of which must be the atom (think symbol or interned String) ok. The second term in the pair must itself be a tuple of length three, of which the first two terms are discarded, and the final term is bound to the variable named Body. Erlang is designed to deal with components that fail, and each of these assertions is a part of that philosophy.
- parse the Body using xmerl_scan. The resulting structure is bound to a variable named Xml, and the remainder of the string is discarded.
- invoke an XPath expression on the Xml using xmerl_xpath. The result is a list, which is passed directly to a function named format_entries/1.
- init:stop is called to gracefully shut down all running processes.
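The tuple shapes described above can also be matched without crashing on failure. The following is a sketch only: the module memes_fetch and the function body_of/1 are hypothetical names, and it assumes the request returns either {ok, {StatusLine, Headers, Body}}, where the status line is itself a three-tuple of version, code and reason phrase, or {error, Reason}.

```erlang
-module(memes_fetch).
-export([body_of/1]).

%% body_of/1 (a hypothetical helper) inspects the value returned by the
%% HTTP request instead of asserting on it with a single match.

%% Success: status code 200, so hand back the body.
body_of({ok, {{_Version, 200, _Phrase}, _Headers, Body}}) ->
    {ok, Body};
%% The request completed, but the server answered with another status.
body_of({ok, {{_Version, Code, _Phrase}, _Headers, _Body}}) ->
    {error, {http_status, Code}};
%% The request itself failed (for example a timeout or refused connection).
body_of({error, Reason}) ->
    {error, Reason}.
```

A caller could then write something like case memes_fetch:body_of(http:request(memes_url())) of ... end and decide what to do in each branch. In current OTP releases the HTTP client module is named httpc rather than http, so the call would need adjusting there.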
format_entries([]) -> done;
format_entries([Node|Rest]) ->
    [ #xmlText{value=Title} ] = xmerl_xpath:string("title/text()", Node),
    [ #xmlAttribute{value=Link} ] = xmerl_xpath:string("link/@href", Node),
    Message = xmerl:export_simple_content([{a,[{href,Link}],[Title]}],xmerl_xml),
    io:format('~s~n', [xmerl_ucs:to_utf8(Message)]),
    format_entries(Rest).
In lieu of looping constructs, Erlang programs tend to use sequential logic and pattern matching.
- When format_entries/1 is called with an empty list, done is returned.
- Otherwise when format_entries/1 is called with a list, the first node is bound to the variable Node, and the remainder of the list is bound to a variable named Rest, at which point:
  - Another XPath expression is used to extract the title. Assertions are made that the result is a list of length one, the first and only item in that list is a record of type #xmlText, and the field named value in that data structure is bound to a variable named Title.
  - Yet another XPath expression is used to extract the href attribute of the link element.
  - The Link and Title are combined to form an XHTML anchor element, which is exported into a string and bound to the variable Message.
  - The Message is then converted to utf-8 (from a list of unbounded integers each representing a Unicode character) and output using io:format.
  - The function format_entries/1 is again called, this time with the remainder of the list.
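The same empty-list and head-and-tail pattern can be tried without a network connection. This is a sketch under stated assumptions: the module memes_walk, the functions titles/1, collect/1 and demo/0, and the tiny inline feed are all invented for illustration. Unlike format_entries/1, it returns the titles instead of printing them, and it joins however many text nodes the XPath yields rather than asserting that there is exactly one.

```erlang
-module(memes_walk).
-export([titles/1, demo/0]).
-include_lib("xmerl/include/xmerl.hrl").

%% Parse a feed held in a string and return the title of each entry.
titles(FeedString) ->
    {Xml, _Rest} = xmerl_scan:string(FeedString),
    collect(xmerl_xpath:string("//entry", Xml)).

%% Empty list: nothing left to do.
collect([]) -> [];
%% Otherwise: handle the first node, then recurse on the remainder.
collect([Node | Rest]) ->
    TextNodes = xmerl_xpath:string("title/text()", Node),
    %% Concatenate the value of every text node; a missing title
    %% yields an empty string instead of a failed match.
    Title = lists:append([V || #xmlText{value = V} <- TextNodes]),
    [Title | collect(Rest)].

%% Hypothetical two-entry feed, made up for this example.
demo() ->
    Feed = "<feed><entry><title>First</title></entry>"
           "<entry><title>Second</title></entry></feed>",
    titles(Feed).
```

Calling memes_walk:demo() from the shell should return one title string per entry in the sample feed.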
Clearly dumping XHTML fragments to stdout isn’t ideal (perhaps XHTML-IM instead?), and you wouldn’t want to dump every meme on every run, but those are problems for another day.
Sam Ruby: Parsing Atom with Erlang
simple example of something in erlang.......Excerpt from del.icio.us/tag/erlang at
Thanks Sam! I’m just starting to learn the language, and for me XML tools are mustUnderstand (and mustHave, amongst other things [link] ). Hopefully by the time I need them, someone will have turned “XSLT like transformations in Erlang” [link] into real XSLT...
Posted by Danny at
IPC with Erlang
Previous status: I’ve shown Erlang code that sends a Jabber message. I’ve shown Erlang code that parses a planet Venus generated Mememes feed. Now, let’s consider the problem of sending notifications for new items. Typically... [more] Trackback from Sam Ruby at
Nitbits
I have been worked up for several days now because of this here. After the fuss about Ruby and Rails (Twitter? Scalability problems? Anyone?), and a small revival of PHP (although PHP5 has been out for three years already and apparently things are still not getting that busy...Excerpt from Urgh at
links for 2007-08-30
How to Configure Swiftiply with Nginx (tags: ruby rails nginx mongrel sysadmin cluster 247up) Parsing Atom with Erlang (tags: erlang programming web xml) Full Stack: Portable Home Directory over NFS on OSX authenticated via OpenLDAP on Debian Linux...Excerpt from Bloggitation at
Sideways Computing
On many-core, wasted cores, threads, processes, transactional memory, and the E-word....Excerpt from Planet Atom at
Hey, Sam - just browsing about, and noticed your “http” link here ("fetch the memes_url() using...") actually points to “httpd.html”. Thanks for all the Erlang experimentation. :)
Posted by Glen at
This code is a bit less robust than one might like in practice.
a) http:request can fail, so you probably want a case statement to deal with errors
b) xmerl_xpath:string sometimes returns a list of multiple text nodes (for example when an entity is contained in the text, don’t ask me why), and when it does the pattern match will fail.
If you want to use this in a long-running process, both of these need to be dealt with one way or another.
Posted by Kevin Scaldeferri at
Forcing a Code Update in a Running Erlang Process
A couple days ago I wrote my first IRC bot, to echo updates from an RSS feed into an IRC channel. It seemed like a fun thing to do in Erlang, and the bot ended up being basically a splice of Jonathon Roes’ bot skeletor with Sam Ruby’s post about...Excerpt from Kevin's Weblog at