It’s just data

Publishing a Blog From a mod_atom Store

Seth Gordon: Planet (http://www.planetplanet.org/) was designed to crawl all the feeds on the blogroll and produce some appropriately formatted HTML page with all their contents; you could just set it up so it only read your own blog’s mod_atom feed, make some appropriate template, and voila!

That would certainly cover the front page, but that’s about it.

Fortunately, there are bits and pieces that cover the rest.  I’ve contributed heavily to Planet, the Universal Feed Parser, and html5lib, and I maintain what is effectively the only active development branch of Planet at this point, which I call Venus.  As Venus has been refactored, it is easier to discuss this in terms of Venus’s architecture than of Planet’s.

Venus has been split into two phases: Spider, which fetches the data, and Splice, which selects and formats entries.  They communicate by means of an Atom store.  Let’s look at each in turn.

Spider’s output is placed on disk, one file per entry.  At this point, it is worth considering the internal data format of Tim’s mod_atom, where all data is also placed on disk, one file per entry.  Hmmm...  Atom Store!
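To make the “one file per entry” idea concrete, here is a minimal sketch of such a store in Python, using only the standard library.  The naming scheme (a SHA-1 hash of the entry’s atom:id) and the function names are hypothetical; this is not mod_atom’s actual on-disk layout, just an illustration of why a directory of entry files works as a hand-off point between a writer and a reader.

```python
import hashlib
import os
import tempfile
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom elements without an ns0: prefix

def entry_filename(entry_id):
    """Hypothetical naming scheme: a hash of atom:id gives a stable, filesystem-safe name."""
    return hashlib.sha1(entry_id.encode("utf-8")).hexdigest() + ".atom"

def store_entry(store_dir, entry):
    """Write one atom:entry to its own file; storing the same id again replaces it."""
    entry_id = entry.findtext(f"{{{ATOM_NS}}}id")
    if not entry_id:
        raise ValueError("entry has no atom:id")
    path = os.path.join(store_dir, entry_filename(entry_id.strip()))
    # Write to a temporary file and rename it, so a reader (Splice, a web server)
    # never sees a half-written entry.
    fd, tmp = tempfile.mkstemp(dir=store_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        ET.ElementTree(entry).write(f, encoding="utf-8", xml_declaration=True)
    os.replace(tmp, path)
    return path

def load_entries(store_dir):
    """Read every stored entry back, ignoring temporary files."""
    return [ET.parse(os.path.join(store_dir, name)).getroot()
            for name in sorted(os.listdir(store_dir))
            if name.endswith(".atom")]
```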

The bazillion feed formats issue is a non-issue here, as are the eight ways to specify an author name and the seemingly endless creative ways in which people misuse RFC 822 formatted dates; all that remains as an unaddressed issue is the cleansing of the HTML.  In terms of this diagram, that simply means that html5lib needs to shift from the left to the right, and Spider is no longer necessary.
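html5lib’s sanitizer is the tool meant here.  Purely to illustrate what cleansing the HTML at output time involves, the sketch below uses a much smaller whitelist sanitizer built on Python’s standard html.parser instead.  The allowed tags, attributes and URI schemes are assumptions, and it handles far fewer cases than html5lib does.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; html5lib's sanitizer covers far more elements and attacks.
ALLOWED_TAGS = {"a", "b", "blockquote", "br", "code", "em", "i",
                "li", "ol", "p", "pre", "strong", "ul"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
VOID_TAGS = {"br"}
DROP_WITH_CONTENT = {"script", "style"}
SAFE_SCHEMES = ("http:", "https:", "mailto:")

def safe_href(value):
    """Allow relative URIs and a few schemes; refuse anything else that looks like a scheme."""
    v = value.strip().lower()
    if v.startswith(SAFE_SCHEMES):
        return True
    # A colon before the first '/', '?' or '#' would be read as a scheme (e.g. javascript:).
    return ":" not in re.split(r"[/?#]", v, maxsplit=1)[0]

class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []     # whitelisted elements currently open
        self.skipping = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skipping += 1
            return
        if self.skipping or tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still kept
        kept = ""
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, ()):
                continue
            if name == "href" and not safe_href(value):
                continue
            kept += f' {name}="{escape(value, quote=True)}"'
        self.out.append(f"<{tag}{kept}>")
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skipping = max(0, self.skipping - 1)
            return
        if self.skipping or tag not in self.stack:
            return
        # Close any children left open so the output stays properly nested.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))

def sanitize(fragment):
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out) + "".join(f"</{t}>" for t in reversed(parser.stack))
```

Fed a paragraph carrying an onclick handler and an embedded script element, it keeps the paragraph and its text but drops both the handler and the script.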

Now, let’s look at that right-hand side.  Splice is brain-dead simple.  It reads a set of entries, concatenates them into a feed, and then sends that feed to the template engine of your choice.
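A minimal sketch of that Splice step in Python, with string.Template standing in for “the template engine of your choice”.  The function names, the page markup and the ten-entry default limit are hypothetical, and sorting on the raw atom:updated text assumes all timestamps share one RFC 3339 form (UTC with a trailing Z); a real implementation would parse the dates.

```python
import xml.etree.ElementTree as ET
from html import escape
from string import Template

ATOM = "{http://www.w3.org/2005/Atom}"

def splice(entries, title, limit=10):
    """Concatenate the most recently updated entries into one atom:feed element.

    Assumes every atom:updated uses the same RFC 3339 form (e.g. UTC with a
    trailing "Z"), so that string order matches chronological order.
    """
    feed = ET.Element(ATOM + "feed")
    ET.SubElement(feed, ATOM + "title").text = title
    newest = sorted(entries, key=lambda e: e.findtext(ATOM + "updated", ""), reverse=True)
    feed.extend(newest[:limit])
    return feed

# string.Template stands in for whatever template engine you prefer.
PAGE = Template("<h1>$title</h1>\n$items")
ITEM = Template('<h2>$title</h2>\n<p class="updated">$updated</p>')

def render(feed):
    """Run the spliced feed through the (stand-in) template engine."""
    items = "\n".join(
        ITEM.substitute(title=escape(e.findtext(ATOM + "title", "")),
                        updated=escape(e.findtext(ATOM + "updated", "")))
        for e in feed.findall(ATOM + "entry"))
    return PAGE.substitute(title=escape(feed.findtext(ATOM + "title", "")), items=items)
```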

It is simple enough that I don’t believe there will actually be any code worth reusing.  If you are producing your web site dynamically, you need a controller that parses the URI to determine which file(s) to read off of disk, parses those files (an XML parser will do just fine here), sanitizes the HTML (again, all you need is in html5lib), resolves relative URIs, and then passes the output through a template of your choice.
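The routing and URI-resolution parts of such a controller might look like the following sketch.  The URI layout (/ for the newest entries, /YYYY/MM/ for a month), the (updated, filename) index and the function names are all hypothetical; parsing, sanitizing and templating would slot in between route() and the response.  The regular expression in absolutize() assumes double-quoted href and src attributes, which is what a sanitizer that re-serializes its output produces; a more careful version would resolve URIs against xml:base while walking the parsed XML.

```python
import re
from urllib.parse import urljoin, urlsplit

# Hypothetical URI layout: "/" shows the newest entries, "/2007/02/" one month's worth.
MONTH = re.compile(r"^/(\d{4})/(\d{2})/$")
# Matches double-quoted href/src attributes, as emitted by a re-serializing sanitizer.
URI_ATTR = re.compile(r'\b(href|src)="([^"]*)"')

def route(uri, index, limit=10):
    """Map a request URI onto the store files that should be read off of disk.

    index is a list of (updated, filename) pairs; updated values are assumed to be
    RFC 3339 strings in UTC, so sorting the strings sorts chronologically.
    Returns None when the URI matches nothing (the caller would answer 404).
    """
    path = urlsplit(uri).path
    newest_first = sorted(index, reverse=True)
    if path == "/":
        return [name for _, name in newest_first[:limit]]
    match = MONTH.match(path)
    if match:
        prefix = f"{match.group(1)}-{match.group(2)}-"
        return [name for updated, name in newest_first if updated.startswith(prefix)]
    return None

def absolutize(html, base):
    """Resolve relative href/src values against base (e.g. the entry's xml:base)."""
    return URI_ATTR.sub(lambda m: f'{m.group(1)}="{urljoin(base, m.group(2))}"', html)
```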

If you are generating your web site statically, you do basically the same thing, but place the output on disk instead.
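For the static case, the rendered pages just need to land on disk where the web server can find them.  A minimal sketch, assuming a hypothetical layout in which every page URI ends in a slash and maps to an index.html file; publish_static() is an illustrative name, not part of Venus.

```python
import os
import tempfile

def publish_static(pages, out_dir):
    """Write each rendered page under out_dir, mirroring its URI path.

    pages is an iterable of (uri_path, html) pairs where every uri_path ends in "/"
    (hypothetical layout: "/" -> index.html, "/2007/02/" -> 2007/02/index.html).
    """
    written = []
    for uri_path, html in pages:
        parts = [p for p in uri_path.split("/") if p]
        if ".." in parts:
            raise ValueError(f"refusing to write outside {out_dir}: {uri_path}")
        directory = os.path.join(out_dir, *parts)
        os.makedirs(directory, exist_ok=True)
        target = os.path.join(directory, "index.html")
        # Write a temporary file and rename it, so the web server never serves a
        # half-written page; mkstemp creates it 0600, so open it up for the server.
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(html)
        os.chmod(tmp, 0o644)
        os.replace(tmp, target)
        written.append(target)
    return written
```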

Oh, and did I mention that html5lib was available in two languages: Python and Ruby?

But enough with hand-waving.  Time for some real code.  Check out or download this.  Tailor two lines.  And then:

eruby atompub.rhtml

Joe can port it to Python in 10 minutes.  Steve to JavaScript in 20 hours or so.  Prefer Java?  C#?  Perl?  Go for it!


I’ve been puzzled by this. Couldn’t something similar be written in 50 lines of Python combined with some “Script PUT” directives in an .htaccess file? If the POST and PUT handlers generate static XML files, and run the feedvalidator code on incoming requests, you’d catch most of the bogus client implementations out there.

Posted by Robert Sayre at

Wouldn’t it be better to make html5lib part of mod_atom? You really want your atom store to contain clean xhtml.

Posted by Sjoerd Visscher at

Sam Ruby: Publishing a Blog From a mod_atom Store

Excerpt from reddit.com: programming - newest submissions at

“Blosxom, the Next Generation” is truly a beautiful thing. :)

Posted by d.w. at

Sam Ruby: Publishing a Blog From a mod_atom Store

Sam Ruby: Publishing a Blog From a mod_atom Store by benoit & 1 other(s) python mod_atom atom

Excerpt from Public marks from user benoit at

I’m starting to agree with Sjoerd.  I’m really uncomfortable about accepting raw claims-to-be-HTML from the wild and sticking it in something with a URI which can be publicly fetched by anyone.  So a C version of html5lib that I could jam into mod_atom (at least as an optional step) would be a good thing.

Posted by Tim Bray at

I’m really uncomfortable about accepting raw claims-to-be-HTML from the wild and sticking it in something with a URI which can be publicly fetched by anyone.

Yeah, I don’t see how that could possibly work on a large scale.  Oh, wait...

Posted by Mark at

So a C version of html5lib that I could jam into mod_atom (at least as an optional step) would be a good thing.

Bring it on.  :)  mod_atom would be insanely capable with this addition.

Posted by Scott Johnson at

[from ttopper] Sam Ruby: Publishing a Blog From a mod_atom Store

Excerpt from del.icio.us/network/2mm at

Sam Ruby: Publishing a Blog From a mod_atom Store

Excerpt from Public marks with search atom publishing protocol at

Sam Ruby: Publishing a Blog From a mod_atom Store

Excerpt from del.icio.us/tag/samruby at

Tab Sweep

I think my list is indicative that I divide my attention too thin: Java’s Fear of Commitment ObjectGrid v6.1 User Guide Grails Object Relational Mapping Exhibit Examples from the SMILE project World of Resources in Rails Planet Venus Code Robaccia...

Excerpt from 16cards at
