Reading Lists for Planet

2006-05-12T04:28:00Z

Mary Gardiner: Anyone know of any showstopper bugs preventing a stable release? Anyone else in favour of poking Jeff until he does some magic?

Until now, planet.intertwingly has focused on ensuring that Atom 1.0 support is complete and correct. It is time to widen the net.

The best and easiest way to do this is to try a diverse bunch of popular feeds. And as luck would have it Dave Winer just posted a top100.opml file.

The native format for Planet configuration is a config.ini file, so adding support for OPML requires a converter. The conversion I wrote is rather dumb and unforgiving. If the document is not well formed, it will not be accepted. I realize that some people say xmlUrl, others say xmlurl, but only the former is accepted for now. I realize that some use text, others use title, and still others use both, but again, this initial implementation will only accept text.
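A converter along these lines can be sketched in a few lines of Python. This is a minimal illustration and not the code that planet.intertwingly runs: the function name opml_to_config is hypothetical, and the output layout (one section per feed URL with a name entry) is an assumption about what the config.ini sections look like.

```python
import xml.etree.ElementTree as ET


def opml_to_config(opml_text):
    """Convert OPML text into config.ini style sections (assumed layout)."""
    # ET.fromstring raises ParseError when the document is not well formed,
    # so malformed OPML is rejected outright rather than repaired.
    root = ET.fromstring(opml_text)
    sections = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")  # exact case only; xmlurl is ignored
        name = outline.get("text")   # title is ignored
        if url and name:
            sections.append("[%s]\nname = %s\n" % (url, name))
    return "\n".join(sections)
```

Outlines that carry only xmlurl or only title are silently skipped here, which mirrors the strictness described above; a more forgiving version would fall back to those attributes.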

When I was done, I realized that I had just implemented Reading Lists. To use, simply place a list of OPML files, one per line, into your config.ini. If you are online, these files will be fetched on every run. If there are fetch errors or you are offline, the last cached version will be used. Feeds that are no longer in the list will no longer be polled. Feeds that are added to the list will start to be polled.
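The fetch-or-fall-back behavior, and the bookkeeping for added and dropped feeds, might look roughly like the following. This is a sketch under assumptions: load_reading_list, diff_subscriptions, and the cache_path parameter are hypothetical names, and the fetch argument exists only so the network call can be swapped out.

```python
import urllib.request


def load_reading_list(url, cache_path, fetch=None):
    """Return the OPML bytes for url, falling back to the cached copy."""
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=30) as response:
                return response.read()
    try:
        data = fetch(url)
    except OSError:
        # Offline or fetch error (URLError is an OSError subclass):
        # reuse the last cached version. If no cache exists yet, this
        # open() raises FileNotFoundError and the caller has to cope.
        with open(cache_path, "rb") as cached:
            return cached.read()
    # A successful fetch replaces the cached copy.
    with open(cache_path, "wb") as cached:
        cached.write(data)
    return data


def diff_subscriptions(old_urls, new_urls):
    """Return (feeds to start polling, feeds to stop polling)."""
    old, new = set(old_urls), set(new_urls)
    return sorted(new - old), sorted(old - new)
```

For example, diff_subscriptions(["a", "b"], ["b", "c"]) reports that "c" should start being polled and "a" should stop.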

Results

See for yourself.

One of the first things I noticed is that the OPML top 100 list is really a top 91 list. Several weblogs publish multiple feeds, either in multiple formats, or have one that they publish and one that FeedBurner provides. This means that the popularity of people who publish only one feed is overstated, and the popularity of people who publish multiple feeds is understated.

In all, multiple feeds cause more work for everybody. People really should pick one x.0 format (RSS 1.0, RSS 2.0, or Atom 1.0) per feed, and stick with it.

That being said, Planet seems to do a fairly decent job of detecting and eliminating these duplicates.

Planet also excels at handling encoding issues. £500 comes out as £500 on planet.intertwingly, whereas it shows up as Â£500 on hosting.opml. And the ever so popular â€™ is correctly displayed as a smart quote.
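Both of those garbled strings are what you get when UTF-8 bytes are decoded as a single-byte Western encoding. The snippet below reproduces the effect; it is only a demonstration of one common cause, not a claim about what any particular site is doing, and misdecode is a hypothetical helper name.

```python
def misdecode(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_encoding)


# U+00A3 POUND SIGN is the two bytes C2 A3 in UTF-8.
print(misdecode("\u00a3500"))
# U+2019 RIGHT SINGLE QUOTATION MARK is the three bytes E2 80 99 in UTF-8.
print(misdecode("\u2019"))
```

The first call yields the pound sign preceded by a stray Â, and the second yields the familiar three-character sequence in place of the smart quote.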

It seems that not everybody provides author information. Even for group blogs like Make — though this information is in Make’s Atom 0.3 feed. This is silly. Pick one feed format. Preferably with an x.0 version number. And provide the same information to everyone.

I have yet to spot a relative URI reference issue.

All of the popular feeds seem to be present, active, responsive.

Potential improvements

ExtremeTech’s managing editor value is not exactly an email address, and this confuses the feed parser slightly. This feed also doesn’t have any pubDates.

Ars Technica double escapes titles. While what the Universal Feed Parser is doing is defensible, there are some heuristics that can be added. While guessing can never be as good as knowing, the odds can certainly be improved.
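One such heuristic, sketched here as a guess and not as anything the Universal Feed Parser actually does: unescape once, and if the result still contains something that looks like an entity reference, unescape again. The function name unescape_title is hypothetical.

```python
import html
import re

# Matches named, decimal, and hexadecimal character references.
_ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def unescape_title(raw):
    """Unescape a title, with a second pass if it looks double escaped."""
    once = html.unescape(raw)
    if _ENTITY.search(once):
        # Leftover entity references suggest the publisher escaped twice.
        return html.unescape(once)
    return once
```

This guesses wrong for a title that is legitimately about markup, such as one that really means to display the literal text of an entity reference, which is exactly why guessing can never be as good as knowing.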

While tools like IE7 can be described as Draconian, there are a few places where the feed parser borders on being Procrustean. <font size="1"> is AOK, but <span style="font-style: italic;"> is not. A white-list of CSS properties should be defined.
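A white-list filter for style attributes could start out as simple as this. It is a sketch only: the property set in ALLOWED_PROPERTIES and the name sanitize_style are hypothetical, and a real list would need more thought about which properties and values are safe.

```python
# Hypothetical starting set; not a vetted list of safe properties.
ALLOWED_PROPERTIES = {"color", "font-style", "font-weight", "text-decoration"}


def sanitize_style(style):
    """Keep only white-listed declarations from a style attribute value."""
    kept = []
    for declaration in style.split(";"):
        name, colon, value = declaration.partition(":")
        name = name.strip().lower()
        value = value.strip()
        if not colon or not value or name not in ALLOWED_PROPERTIES:
            continue
        # Conservatively drop values containing parentheses or backslashes,
        # which rules out url(...) and expression(...) but also rgb(...).
        if "(" in value or "\\" in value:
            continue
        kept.append("%s: %s" % (name, value))
    return "; ".join(kept)
```

With this in place, font-style: italic survives while position: fixed and anything using expression(...) are dropped.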

Conclusions?

Anybody spot anything I missed?

As near as I can tell, the issues are minor, and the process of tweaking is one that never ends. It’s time to start wrapping this up.