When it first appeared, the White House feed had a few entries, all with the same id. Now it has 87 such entries. As first suggested by James Holderness, the feedvalidator now marks this feed as invalid, and it will do so for any feed that contains ten or more entries all with the same id. If this ends up producing too many false positives, I’ll tweak the algorithm.
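For the curious, here is a rough sketch of that kind of check. It is not the feedvalidator's actual code. It assumes an Atom 1.0 feed, reads "ten or more entries all with the same id" as "some single id is used by at least ten entries", and the names `most_repeated_id` and `has_duplicate_id_problem` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the feed is Atom 1.0, so entries and ids live in this namespace.
ATOM = "{http://www.w3.org/2005/Atom}"

# From the post: ten or more entries sharing one id makes the feed invalid.
THRESHOLD = 10


def most_repeated_id(feed_xml):
    """Return (id, count) for the most frequently used entry id, or (None, 0)."""
    root = ET.fromstring(feed_xml)
    ids = [(entry.findtext(ATOM + "id") or "").strip()
           for entry in root.iter(ATOM + "entry")]
    counts = Counter(i for i in ids if i)  # ignore missing or empty ids
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


def has_duplicate_id_problem(feed_xml, threshold=THRESHOLD):
    """True when a single id is shared by at least `threshold` entries."""
    _, count = most_repeated_id(feed_xml)
    return count >= threshold
```

Keeping the threshold as a parameter makes it easy to adjust if ten turns out to flag too many legitimate feeds.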
Also noted in the process: the feed itself contains a fair amount of debris. A sytle attribute? A meta tag? o:p is common in content carelessly copy/pasted from Microsoft Word.
script elements and onclick attributes generally aren’t syndication friendly.
Using the correct MIME type and adding a self link wouldn’t be a bad idea either.
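The content debris mentioned above is easy enough to spot mechanically. Below is a minimal sketch using Python's standard `html.parser`. The lists of suspect tags and attributes are my own guesses based on the items named in this post, not anything the feedvalidator is known to use, and `DebrisFinder` and `find_debris` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumption: these sets only cover the items named in the post.
SUSPECT_TAGS = {"script", "meta", "o:p"}
# "sytle" is included as well as "style", in case the misspelling is in the feed itself.
SUSPECT_ATTRS = {"style", "sytle"}


class DebrisFinder(HTMLParser):
    """Collects elements and attributes that tend not to belong in feed content."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag in SUSPECT_TAGS:
            self.findings.append("element: " + tag)
        for name, _value in attrs:
            # Event handlers such as onclick all start with "on".
            if name in SUSPECT_ATTRS or name.startswith("on"):
                self.findings.append("attribute: %s on <%s>" % (name, tag))


def find_debris(html):
    """Return a list of human-readable findings for one chunk of entry content."""
    finder = DebrisFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

Run against the content of each entry, this gives a quick list of things to clean up before publishing.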
Are there automated tools that will convert the HTML output of Office into valid HTML that could be used in a feed?
Sam Ruby notes that the White House feed contains a fair bit of debris: Also noted in the process: the feed itself contains a fair amount of debris. A sytle attribute? A meta tag? o:p is common in content [...]...
Bob,
Tidy or one of its variants, like JTidy, can do a pretty good job of cleaning up Word’s HTML. It's not as easy for normal authors as the paste-and-forget experience EditLive! gives you, but it's very useful to have in your set of tools.
Why do you discuss this case on this blog? Why don't you call your new president and tell him that his feed is invalid? I thought IBM had a direct line to the White House; in the late seventies this was definitely the case :)