It’s just data

Planetary Exploration

The original Planet was simply named Planet (not Planet Planet, despite what the web site says).  It was originally created by Scott James Remnant and Jeff Waugh.  My original interest was in ensuring that it had proper Atom support.  Over time, I became the de facto primary maintainer.

Planet’s strength was that it built on Mark Pilgrim’s feedparser, which converted any feed into a lazy dictionary, and Tomas Styblo’s templating engine, which converted a static file and a dictionary into HTML (or whatever format you like).  So if you do the math, feed plus static file (plus some configuration) gives you a site.  State information was managed using a dbhash and a cache directory.
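To make the "feed plus static file gives you a site" idea concrete, here is a minimal, hypothetical sketch of that pipeline.  The dictionary below stands in for what feedparser would produce from a live feed, and the stdlib string.Template stands in for Tomas Styblo's htmltmpl engine; the entry data and URL are invented for illustration.

```python
from string import Template

# Stand-in for what feedparser would hand back: a feed entry as a dict.
# (Invented data; a real feed would supply these values.)
entry = {
    "title": "Planetary Exploration",
    "link": "http://example.com/2010/planet",
}

# Stand-in for the static template file: placeholders get filled from
# the dictionary.  htmltmpl's syntax differs, but the idea is the same.
page = Template('<li><a href="$link">$title</a></li>')

html = page.substitute(entry)
print(html)
```

Running this prints `<li><a href="http://example.com/2010/planet">Planetary Exploration</a></li>` — feed data in, markup out, with no intermediate representation beyond the dictionary.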

While small, it became difficult to maintain (pretty much all of the mainline logic was in the __init__ file), and I had dreams of introducing filters.  So, I embarked on a radical refactoring.  In the process, I eliminated the need for the dbhash.

With this new design, I was able to create a number of filters, but the way content was handled was always an issue.  All content was simply serialized as a string.  This meant that sanitization required parsing the content into tokens, removing or modifying nodes, and re-serializing.  Expanding relative URIs involved the same process.  Extracting microdata involved the same process.  And so on, each step introducing the possibility of mangling such things as MathML.
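The round trip each filter had to make can be sketched in a few lines.  This is illustrative only: Venus used its own parsing stack, and the base URI and content here are invented; the stdlib ElementTree and urljoin are just stand-ins for the real machinery.

```python
from urllib.parse import urljoin
from xml.etree import ElementTree as ET

# Assumed inputs: the feed's base URI and an entry's content, which
# arrives as a serialized string.
base = "http://example.com/blog/"
content = '<div><a href="entry1">one</a> <a href="/about">two</a></div>'

tree = ET.fromstring(content)                # parse: string -> nodes
for a in tree.iter("a"):                     # modify: expand relative URIs
    a.set("href", urljoin(base, a.get("href")))
out = ET.tostring(tree, encoding="unicode")  # re-serialize: nodes -> string
print(out)
```

Every filter — sanitization, URI expansion, microdata extraction — repeats that parse/modify/serialize cycle on the same string, and each serialization pass is a fresh chance to mangle markup the serializer doesn't fully understand.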

Mars took a different direction.  Instead of a dictionary, there was a DOM.  Feed elements were placed into the DOM.  Content elements were in the DOM.  You could iterate over everything.  Everything was parsed only once.  Everything was serialized only once.  A much cleaner design... if you like XSLT templates.  If you like more traditional templates, like Haml, all this ended up doing was moving the problem of converting the DOM into a dictionary/hash to another place.

Now that I’m exploring node.js, I have the opportunity to revisit this once again.  Since I have access to jQuery, I should be able to eliminate the pesky problem of converting a DOM into a format usable by a templating engine.  I should be able to pass in a single value, named $, which contains a set of entries to be iterated over.

As this is a journey, I’m not sure where this will end up, or if it will end up with anything useful at all.  Perhaps it will end up with a more scalable and dynamic server that ties into PubSubHubbub.  Perhaps the Planet software itself will move from the server to the client and take advantage of web workers and local storage.

I would love such a distributed, client-storage aggregator, where users with different online habits contribute their availability to checking each other's feeds.

Posted by Hoàng Đức Hiếu at

Awesome. :-)

Posted by Jeff Waugh at

I keep meaning to revisit Brent Simmons’s notion for thin-server RSS sync: [link]

Seems like maybe between that, and what you’re talking about, someone could build a sort of store-and-forward service for browser-based feed clients across desktop and mobile.

Posted by l.m.orchard at

I don’t understand how such a thing could “know[s] about ... the status of news items” while achieving “No latency”, “Security”, and “Reachability”.  For that matter, I don’t understand the value of “Longer limits on news item status” without archiving of the content.

Posted by Sam Ruby at

So what language/skill will you be learning this year?

Rafe is going to be investing in learning JavaScript and Node.js. After some server-side JavaScript work last year with Alfresco...

Excerpt from Karl Martino at

Well, the thin-server sync idea is syncing user-generated state between feed-reading clients. It’s just annotations attached to URLs - or hashes of URLs, say. It’s all assuming that you can attach a unique identifier across clients to every news item.

So, no latency means that a feed reader pulls a feed straight from the originating web site, only consulting the sync server for flags on the items in the feed. That’s as opposed to loading the whole bundle from Google Reader, after their pollers have gotten around to hitting a feed.

Security means that the sync server never has feed content, which helps for private intranet feeds.

Reachability means that your feed reader doesn’t necessarily have to run in the cloud to share state between readers. That is, you could run NetNewsWire, read some pre-fetched things on a plane without wifi, then sync your read-item and starred-item flags to the server once you get on the ground.

As for longer limits on news item status - that means your desktop, laptop, or self-hosted feed aggregator in the cloud can archive as long as you like, as a separate service entirely from the feed item status sync server.
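The scheme described above can be sketched in a few lines of Python.  Everything here is invented for illustration — the class, method names, and flag vocabulary are not from Brent's actual design — but it shows the key property: the server stores only per-item flags keyed by a hash of each item's URL, never the content itself.

```python
import hashlib

def item_key(url: str) -> str:
    """Identify a news item across clients by hashing its URL."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()

class FlagServer:
    """A hypothetical thin sync server: holds {url-hash: set-of-flags},
    no feed content, so private intranet feeds never leave the intranet."""
    def __init__(self):
        self.flags = {}

    def mark(self, url, flag):
        self.flags.setdefault(item_key(url), set()).add(flag)

    def status(self, url):
        return self.flags.get(item_key(url), set())

# A client that read items offline pushes its state once reconnected:
server = FlagServer()
server.mark("http://example.com/entry1", "read")
server.mark("http://example.com/entry1", "starred")
print(sorted(server.status("http://example.com/entry1")))  # ['read', 'starred']
```

Any client that can compute the same hash can reconcile read and starred state, whether it fetched the feed itself, pre-fetched it for a plane ride, or archived it locally for years.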

Posted by l.m.orchard at


Elijah Insua:  Now we have a release of jsdom capable of supporting xml5:

nodeunit --reporter minimal testrunner.js
testrunner.js: ...............................
OK: 31 assertions (144ms)

weld also looks promising.  For my use case, I would... [more]

Trackback from Sam Ruby
