intertwingly

It’s just data

Feedback on feeds


There are a lot of experimental NEcho 0.1 feeds out there, of varying quality.  I've seen some that aren't well-formed XML (no, I'm not providing any names).  I've seen others that are excellent.  I'm going to take a look at two here that are high profile and implemented across a large number of blogs, and therefore likely to be emulated by others.  As others come online, I'll try to do likewise.

Typepad's feeds validate, and are tracking to the standard as it evolves.  The feed is clean and simple.  Title, subtitle, summary, and content are well formed and inline.  You can clearly see that the original post (the one at the bottom) was modified from its original form.

If you have a feed parser or format driver that you are experimenting with, you should definitely try it against this feed.

Blogger's feeds are clearly marked as temporary prototypes.  They should be considered experimental.  And for that purpose, they don't disappoint.  They contain a number of elements and attributes that are currently marked as blogger extensions.  A number of these (e.g., generator) should be made common.

More interesting is the one issue that causes the feed to fail to validate.  Looking closer at the feed, you can see that summaries are not text/plain, but text/html.  In most cases the markup is inline, but in one case it is escaped.  By looking at these side by side, you can see the differences.

Apparently, there are requirements for html in titles.  At a minimum, people argue for the ability to use bold and italics.  Others significantly overdo it, IMHO.

This leads me to think that the right answer is to define all of the content related items (title, subtitle, summary, content) the same way: with a default of text/plain and with the single level of escaping required by XML.  Those that wish to use other types or an additional level of escaping simply are required to note this with an attribute.
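To make the proposal concrete, here is a minimal sketch of what a consumer might do under it.  The attribute names `type` and `mode`, the value `escaped`, and the function name `read_construct` are assumptions chosen for illustration, not taken from any spec text; namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET

def read_construct(xml_fragment):
    """Return (content_type, value) for a title/subtitle/summary/content element.

    Assumed convention (hypothetical): no attributes means text/plain with
    only the single level of escaping that XML itself requires.
    """
    elem = ET.fromstring(xml_fragment)
    content_type = elem.get("type", "text/plain")  # default proposed in the post
    mode = elem.get("mode", "xml")                 # assumed name for the escaping marker

    if mode == "escaped" or content_type == "text/plain":
        # The XML parser has already undone the XML-level escaping.
        # For escaped markup, what remains is markup source as a string.
        return content_type, elem.text or ""

    # Inline markup: re-serialize the child elements (tails included).
    inner = (elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem
    )
    return content_type, inner
```

The only branching a consumer needs is on the two attributes; a producer who writes a plain title by hand never has to think about either one.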

I say this knowing that the discussion that Tim Bray captured so well nearly three weeks ago is still ongoing.  Apparently there are multiple use cases for how feeds are produced.  And multiple use cases for how feeds are consumed.  And these give inconsistent guidance on what is ideal.

What I will say is that given my experience with the validator, I find that people don't read specs carefully, if at all.  More often, they emulate what they see.  They follow examples.  And when they see mostly escaped content, they emulate poorly.  If you want to see what I mean, ask somebody to create a title of "Ben & Jerry's".  Then tell them that you want "Ben" in italics and "Jerry's" in bold.
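For reference, here is a small sketch of the three forms that title can take.  The helper names and the `type`/`mode` attributes are hypothetical, used only to show where each level of escaping lands; namespaces are omitted.

```python
from xml.sax.saxutils import escape  # escapes &, < and > for XML text

def markup(parts):
    """Build HTML from (tag, text) pairs; a tag of None means unstyled text."""
    out = []
    for tag, text in parts:
        body = escape(text)  # first level: "&" becomes "&amp;" in the HTML itself
        if tag:
            out.append("<%s>%s</%s>" % (tag, body, tag))
        else:
            out.append(body)
    return "".join(out)

def plain_title(text):
    # Plain text: only the single level of escaping XML requires.
    return "<title>%s</title>" % escape(text)

def inline_title(parts):
    # Markup carried inline as child elements (attribute name is an assumption).
    return '<title type="text/html">%s</title>' % markup(parts)

def escaped_title(parts):
    # Markup carried as a string, so it is escaped a second time for XML.
    return '<title type="text/html" mode="escaped">%s</title>' % escape(markup(parts))

parts = [("i", "Ben"), (None, " & "), ("b", "Jerry's")]
print(plain_title("Ben & Jerry's"))
print(inline_title(parts))
print(escaped_title(parts))
```

The ampersand is the part people get wrong by hand: it is `&amp;` in the plain and inline forms, but `&amp;amp;` in the escaped form.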

Having a validator and working with people one on one to fix their feeds certainly helps, but frankly is an uphill battle.  Particularly when people note that their feed "works OK in aggregator X" - not the definition of interop that I for one particularly aspire to.

So, while I'm sensitive to the notion that consumers would have a few fewer lines of code to write if there were only one way, I feel that we should face reality.  Pick a default that matches what most people are likely to do by hand, and define an explicit marker for what a number of programs will generate.

IMHO.