intertwingly

It’s just data

Planet Webtuesday


Planet Webtuesday is an aggregator of member blogs.  One cool feature is that members can add new feeds to the list simply by editing their member page to include a wiki link to a feed along with the word “feed” in the link description.

This setup can also automatically extract individual user posts from a group blog, simply by specifying the desired author’s name (example).  This works independently of the character encoding of the source: not everybody in Zürich has the foresight or hospitality to limit their names to the ASCII character set.  It also works independently of the feed format.  RSS 2.0, for example, doesn’t define a place to put people’s names, so some put their names in RFC 822 style comments, others ignore the specification and put their names in place of email addresses, and still others resort to so-called “funky” extensions.  In every case, the Universal Feed Parser ferrets this information out and canonicalizes it.
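To give a feel for the kind of cleanup involved, here is a minimal sketch in Python. It is not the Universal Feed Parser's actual code; the function names `canonical_author` and `entries_by`, and the idea of entries as plain dicts with an `"author"` key, are assumptions made up for illustration. It also assumes the text has already been decoded to Unicode, and it covers only the first two RSS 2.0 habits mentioned above, not the “funky” extensions.

```python
import re
import unicodedata

# "addr (Name)": the name tucked into an RFC 822 style comment
_ADDR_WITH_COMMENT = re.compile(
    r"^\s*(?P<email>[^\s()<>]+@[^\s()<>]+)\s*\((?P<name>[^)]*)\)\s*$")
# "Name <addr>": the other common RFC 822 spelling
_NAME_WITH_ADDR = re.compile(
    r"^\s*(?P<name>[^<>]*?)\s*<(?P<email>[^\s<>]+@[^\s<>]+)>\s*$")
# A bare address with no name at all
_BARE_ADDR = re.compile(r"^\s*(?P<email>[^\s()<>]+@[^\s()<>]+)\s*$")


def canonical_author(raw):
    """Split a raw author string into (name, email); either may be None."""
    # Normalize so that composed and decomposed accents compare equal
    text = unicodedata.normalize("NFC", raw)
    for pattern in (_ADDR_WITH_COMMENT, _NAME_WITH_ADDR):
        match = pattern.match(text)
        if match:
            return (match.group("name").strip() or None, match.group("email"))
    match = _BARE_ADDR.match(text)
    if match:
        return (None, match.group("email"))
    # No address anywhere: a name was put where an email was expected
    return (text.strip() or None, None)


def entries_by(entries, wanted):
    """Keep only the entries whose canonical author name matches `wanted`."""
    wanted = unicodedata.normalize("NFC", wanted).casefold()
    return [
        entry for entry in entries
        if (canonical_author(entry["author"])[0] or "").casefold() == wanted
    ]
```

With this in place, `entries_by(entries, "Jürg Müller")` would pick out one author's posts from a group feed whether the name arrived as a bare name or inside an RFC 822 comment.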

Being able to depend on a canonical format for every entry (well-formed, UTF-8, XHTML, with fully qualified rather than relative URIs, and in Atom 1.0) does make many things easier for designers of filters and templates, but it does require some ability to visualize the mapping.  One thing I often found myself doing to test things out was building a temporary configuration file, creating a temporary cache, running a few tests, viewing the outputs, and then cleaning up afterwards.
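That build, test, and clean up cycle could be sketched roughly as follows. This is only an illustration of the pattern using the Python standard library; the helper name `with_scratch_planet`, the section and option names in the configuration, and the file name `config.ini` are all assumptions, not the actual layout used by the planet software.

```python
import configparser
import tempfile
from pathlib import Path


def with_scratch_planet(feed_url, run):
    """Create a throwaway config and cache, call run(config_path), clean up."""
    # Everything lives in one temporary directory that is removed on exit,
    # even if run() raises an exception.
    with tempfile.TemporaryDirectory(prefix="planet-test-") as work:
        work = Path(work)
        cache = work / "cache"
        cache.mkdir()

        # Interpolation is disabled so a '%' in a URL is left untouched.
        config = configparser.ConfigParser(interpolation=None)
        # Hypothetical section and option names, for illustration only
        config["Planet"] = {
            "name": "scratch",
            "cache_directory": str(cache),
            "output_dir": str(work / "out"),
        }
        # One section per feed, keyed by its URL (also an assumption)
        config[feed_url] = {"name": "feed under test"}

        config_path = work / "config.ini"
        with open(config_path, "w", encoding="utf-8") as handle:
            config.write(handle)

        # Run whatever tests the caller wants against the scratch setup
        return run(config_path)
```

The caller passes in a function that does the actual testing and output viewing; by the time `with_scratch_planet` returns, the temporary configuration and cache are gone.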

This is the type of repetitive stuff that scripts are good at, so I wrote one.  It is called tests/reconstitute.py, and an example usage is as follows:

python tests/reconstitute.py http://feeds.feedburner.com/boingboing/iBag

You can get it here.