The “it” in this case is the White House feed. Previously, it used the same UUID in every entry. That would have been valid only if the intent was for the blog to contain a single entry, replaced in place with each new post. Clearly that is not the intent, so the feed was invalid.
On Sunday, I deployed a change to the Feed Validator to flag this error.
Today the feed has been corrected.
Whoa.
Now, I doubt that the people who maintain the software behind the White House blog read this blog. Perhaps they care about the Feed Validator. More likely it was just a coincidence. In any case, they have gone from something that an Asshole might claim is spec compliant to being a Moron, in a way that isn’t likely to cause anywhere near as many problems as before, and that also escaped the notice of the Feed Validator.
Apparently, they are in good company: in the second example on page 6 of RFC 5005, the feed/entry/@id has the wrong number of characters in its first part and non-hex digits in its last two parts.
In any case, I’m not certain that messing with the White House in this manner is a good idea. Should I happen to mysteriously disappear, let’s just say that it has been nice knowing you.
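Both defects are purely syntactic, so they are easy to catch mechanically. Here is a minimal Python sketch, assuming the ID is meant to be a urn:uuid: URN with the 8-4-4-4-12 hex layout of RFC 4122. It is not the Feed Validator's actual code, the function name is hypothetical, and the sample IDs in the usage note are invented rather than copied from RFC 5005.

```python
import re

# RFC 4122 textual layout: 8-4-4-4-12 hexadecimal digits, carried in the
# "urn:uuid:" namespace when used as an Atom ID. IGNORECASE because hex
# digits (and the URN prefix) may appear in either case.
UUID_URN = re.compile(
    r"urn:uuid:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)

def is_valid_uuid_urn(atom_id: str) -> bool:
    """Return True if atom_id is syntactically a urn:uuid: URN."""
    # fullmatch (rather than match with ^...$) also rejects a trailing newline.
    return UUID_URN.fullmatch(atom_id) is not None
```

For example, `urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a` passes, while the same ID with a seven-character first part, or with an `x` in a later part, fails. A check like this only confirms the shape of an ID. It says nothing about whether the same ID is reused across entries or regenerated on every edit.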
Rob, I understand the sentiment, but such IDs will cause a very real problem with Venus if others use the same scheme; still, this is a marked improvement over how the feed looked just a day or so ago.
“Should I happen to mysteriously disappear, let’s just say that it has been nice knowing you.”
If you end up chained to a Teletype in the White House basement, forced to generate the feed by hand, remember that you can place hidden calls for help in comments. :)
And the crying shame is, the Atom entry ID problem could be so easily solved by using tag URIs.
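For reference, a tag URI (RFC 4151) has the form tag:authority,date:specific, where the authority is a domain name or email address that the minter held on the given date. Here is a sketch in Python. The function `mint_tag_uri` and its parameters are hypothetical. The key assumption is that it is called once, when an entry is created, and that the result is stored.

```python
import datetime
from urllib.parse import quote

def mint_tag_uri(authority: str, date: datetime.date, specific: str) -> str:
    """Build a tag URI of the form tag:authority,YYYY-MM-DD:specific.

    `authority` is a domain name or email address the minter controlled on
    `date`. Call this once when the entry is created and store the result;
    recomputing it later from mutable data defeats the purpose of an ID.
    """
    # Percent-encode anything outside the characters RFC 4151 allows in the
    # specific part (unreserved, sub-delims, ":", "@", "/" and "?").
    encoded = quote(specific, safe="/?:@!$&'()*+,;=")
    return f"tag:{authority},{date.strftime('%Y-%m-%d')}:{encoded}"
```

With these inputs, `mint_tag_uri("example.com", datetime.date(2009, 1, 22), "blog/post 42")` returns `tag:example.com,2009-01-22:blog/post%2042`.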
I don’t see the point of tag URIs. Either your alternate link is truly unchanging, in which case you can just use http IDs – like Tim Bray does. Or you store an unchanging ID as part of the entry in your data store somewhere, in which case there is no practical difference between tag and uuid except that the latter makes this requirement blindingly obvious.
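Aristotle's second case, an unchanging ID kept in the data store, might look like the following sketch. The `Entry` class is hypothetical and stands in for whatever storage a real blog engine uses. The ID is minted once, as a urn:uuid: URN from Python's standard `uuid` module, and editing never touches it.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Entry:
    """Hypothetical stored blog entry whose Atom ID is minted exactly once."""
    title: str
    body: str
    # default_factory runs only when the Entry is first constructed,
    # so the ID is fixed for the life of the entry.
    atom_id: str = field(default_factory=lambda: uuid.uuid4().urn)

    def edit(self, title: str, body: str) -> None:
        # Editing changes the content, never the ID.
        self.title = title
        self.body = body
```

Calling `edit` on an `Entry` leaves `atom_id` exactly as it was, which is the property a feed reader relies on.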
Out of three hand-rolled feed generators using tag IDs whose source I have seen, all of them implemented the “How to construct an Atom ID using tag: URIs” section of Mark’s article by transliterating the alternate link at feed generation time. It’s like the rest of his article isn’t even there. Then when their alternate links change, so do their IDs. Remind me what the point is of having IDs in the first place?
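The failure mode is easy to reproduce. In this hypothetical Python sketch, `tag_from_link` transliterates a link into a tag: URI. The anti-pattern recomputes that URI at feed generation time, while the intended use computes it once at creation and stores it. The URLs and dates are invented.

```python
from urllib.parse import urlsplit

def tag_from_link(link: str, date: str) -> str:
    """Hypothetical transliteration of an alternate link into a tag: URI."""
    parts = urlsplit(link)
    return f"tag:{parts.hostname},{date}:{parts.path}"

def id_at_generation_time(entry: dict) -> str:
    # Anti-pattern: derive the ID from whatever the link happens to be today.
    return tag_from_link(entry["link"], entry["created"])

def create_entry(link: str, created: str) -> dict:
    # Intended use: derive the ID once, at creation, and store it.
    return {"link": link, "created": created, "id": tag_from_link(link, created)}

entry = create_entry("http://example.com/2009/01/post", "2009-01-22")
entry["link"] = "http://example.com/blog/post"  # the site gets reorganized
```

After the reorganization, the stored `entry["id"]` is still `tag:example.com,2009-01-22:/2009/01/post`, but `id_at_generation_time(entry)` now yields `tag:example.com,2009-01-22:/blog/post`, so every reader sees a "new" entry.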
So in my experience the most successful strategy in getting Morons to do the right thing is to keep absolutely mum about the existence of Mark’s article and tell them to use uuid IDs.
Isn’t an aggregator that is broken by this putting too much trust in the authors of the feeds? Some tag URI schemes will be predictable, and a malicious author could generate a bunch of entries that deliberately collide with another feed’s.
Ben: that’s definitely a valid concern, and has been much discussed. A point on the other side is that some feeds (like my Planet’s feed) syndicate data from others, and duplicate detection across feeds is a valid feature too.
To me, it comes down to this: I have selected the people I subscribe to, and implicit in that selection is a measure of trust.
The feed is not fixed: a new ID is generated every time an entry is edited, which is just as invalid as before, although slightly less harmful and much harder to detect.
(I’ve also just noticed that GWB had several feeds on his site, for press briefings, news releases and radio addresses, and by the looks of it they had decent GUIDs.)
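It is harder to detect because any single snapshot of the feed looks fine; the problem only shows up across fetches. One hypothetical heuristic, sketched in Python, compares two successive fetches and reports entries whose alternate link stayed the same while their ID changed. It assumes links are stable. An entry whose link and ID both change will slip through.

```python
def find_id_churn(old: list[dict], new: list[dict]) -> list[tuple[str, str, str]]:
    """Heuristic: report (link, old_id, new_id) where the link is unchanged
    but the ID differs between two fetches of the same feed.

    Each entry is a hypothetical dict with "id" and "link" keys.
    """
    old_by_link = {e["link"]: e["id"] for e in old}
    churn = []
    for e in new:
        old_id = old_by_link.get(e["link"])
        if old_id is not None and old_id != e["id"]:
            churn.append((e["link"], old_id, e["id"]))
    return churn
```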
Ben: The primary purpose of IDs in feed readers (or in mine at least) is so that read/unread states (and other attributes) can be correctly persisted when the feed is refreshed, especially when the text content changes slightly. So I’m only ever comparing ids with the ids gathered from the same feed the previous time it was loaded. For exactly the reason you mention (as well as general laziness), I never compare across feeds.
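Keying stored state on the pair (feed URL, entry ID), rather than on the ID alone, is one way to get the per-feed comparison described above. The `ReadState` class below is a hypothetical sketch of that design, not code from any particular reader. With this keying, an ID that shows up in some other feed, by accident or malice, cannot touch this feed's state. The flip side is the problem from earlier in the thread: an entry whose ID is regenerated on edit reappears as unread.

```python
class ReadState:
    """Hypothetical feed-reader store of read flags, keyed per feed."""

    def __init__(self) -> None:
        self._read: set[tuple[str, str]] = set()

    def mark_read(self, feed_url: str, entry_id: str) -> None:
        self._read.add((feed_url, entry_id))

    def is_read(self, feed_url: str, entry_id: str) -> bool:
        # Only IDs previously seen in this same feed count.
        return (feed_url, entry_id) in self._read
```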
@Aristotle: Tag URIs are useful where you don’t have a decent way to mint UUIDs (such as when you’re writing code in PHP, or in Python before 2.5, and implementing RFC 4122 just isn’t worth the time and effort) but still need some way to generate an identifier with the same properties. That’s it. The tag URI site explains under the header “Why not use an http URL instead?” why something like a tag URI or a UUID is preferable to an HTTP URI, a point also covered in a recent post by Subbu Allamaraju. Still, if you can generate a UUID rather than resort to something else, or if you’re happy with using HTTP URIs, more power to you.
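This sketch shows what minting a random UUID involves without a library. It follows the version 4 algorithm from RFC 4122 (section 4.4) in Python. Python 2.5 and later ship this as `uuid.uuid4()`, and the function name `uuid4_urn` is hypothetical. It assumes `os.urandom` is an acceptable randomness source. Whether hand-rolling the equivalent in another language is worth it remains the judgment call described above.

```python
import os

def uuid4_urn() -> str:
    """Mint a random (version 4) UUID as a urn:uuid: string, per RFC 4122
    section 4.4, without using Python's uuid module."""
    b = bytearray(os.urandom(16))
    b[6] = (b[6] & 0x0F) | 0x40  # version 4 in the high nibble of time_hi_and_version
    b[8] = (b[8] & 0x3F) | 0x80  # variant bits 10xxxxxx (the RFC 4122 variant)
    h = b.hex()
    return f"urn:uuid:{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:32]}"
```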