Note: in the proper application of
scientific method, observation precedes the formulation of a
hypothesis. In that spirit, I’d like to ask that people
indulge me for a moment and refrain from rationalizations,
justifications, and explanations, and focus for the moment on
simply gathering data.
BottomFeeder displays the feed and its single item, using a space instead of a quote. Browsing the page spawns a browser to this page (as expected in Bf).
Shows up as a broken feed in Wildgrape NewsDesk 1.1. Feed details show the following error:
"Unable to update this channel
Reason: This channel may be publishing invalid RSS, or you may have discovered a bug in NewsDesk. Please submit this channel to support@wildgrape.net."
Subscribes OK in NewsGator 1.3 in Outlook 2002, displays an entry titled "Not so smart quotes" from "It's just broken" containing the body text: "Postels law".
"Error parsing RSS XML: There is an invalid character in the given encoding. Line 15, position 26.
Please try to validate this feed(link). If this feed validates as correct RSS, you can send an error report(link)."
Radio Userland: no errors
Everything works, even the right quote (which is interesting, because quotes often show up wrong in Radio, e.g. in Tim Bray's feed).
Userland Radio 8 on Windows XP parses the feed without throwing an error and displays the "smart quote" in its rendering without attempting to entify it. I'll pull it up in Radio on the Mac when I'm near my Mac tonight.
Feed is correctly added in my web-based RSS aggregator called "MyBlogroll" ([link]). The title of the feed is "It's just broken", and I have one item entitled "Not so smart quotes" with "Postels Law" for the description.
I just got the list of recently updated blogs from blo.gs/changes.xml and validated all the feeds listed. Filtering out bad data in the list itself (feeds that weren't actually feeds, or that could not be retrieved), there were a total of 1181 feeds tested, of which 92 were invalid. 31 were invalid because of non-well-formed XML. Keep in mind these are actively maintained sites (they've updated in the past few hours).
Then I checked the Technorati Top 100. Of 100 sites, 35 have feeds listed, of which 4 were invalid. 1 was invalid because of non-well-formedness.
Now, we all draw our lines in the sand in different places, and that's fine, but I'm thinking that any aggregator that can't even handle the 35 most popular feeds in the world is broken, regardless of what Norman Walsh or the XML specification says.
Mark,
You state that only 1 of the Top 100 feeds was invalid due to non-well-formedness, so why do you quote the number 35 and then mention the XML specification, or Norman Walsh for that matter?
PS: Isn't this offtopic for this thread? I believe Sam specifically stated that folks should "refrain from rationalizations, justifications, and explanations, and focus for the moment on simply gathering data".
Dare, only 35 of the Technorati Top 100 sites have feeds listed ( [link] ), thus the sample size is 35, not 100. I mention Norman Walsh because he commented in this thread that "a parser that accepts that document is broken". We are having a semantic disagreement over the definition of the word "broken". He believes it should be used to mean "not conformant to the specification", and I believe it should be used to mean "functions in the best interests of the vendor's paying customers".
My ultra orthodox feed reader (which is just an XQuery in Saxon) reports an "Unconvertable UTF-8 character . . ." error, and fails to parse the document.
Radio on the Mac (running in Safari) also throws no parsing error on this feed, but the display in Safari is odd (Postel[base ']s law) and the "smart quote" does get entified as ’ when slurped into the weblog posting function.
What parts of this are done by Radio and what are done by Safari, I couldn't tell you.
The issue of quality has been on the mailing list a lot recently, along with many blog posts referring to Postel's Law. Although there are doubters, the consensus seems to be that insisting on feed producers sticking to the spec,......
Version 2.7.2 of my feed parser, released today, will by default refuse to parse this feed. It does a first-pass check for wellformedness, and when that fails it sets the 'bozo' bit in the result to 1 and immediately terminates. You can revert to the previous behavior by passing disableWellFormedCheck=1, but it will print arrogant warning messages to stderr to the effect that anyone who can't create a well-formed XML feed is a bozo and an incompetent fool.
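For readers who want to see the shape of that behavior, here is a minimal sketch of a first-pass well-formedness check that sets a 'bozo' flag and stops. It is not the actual feed parser code: the function name `parse_feed`, the result keys other than 'bozo', and the sample feed are assumptions made for illustration, and it uses Python's standard `xml.etree.ElementTree` as the strict parser.

```python
import xml.etree.ElementTree as ET


def parse_feed(data: bytes) -> dict:
    """Parse feed bytes, refusing anything that is not well-formed XML."""
    result = {"bozo": 0, "titles": []}
    try:
        # First pass: a conforming XML parser rejects ill-formed input,
        # including bytes that are invalid in the declared encoding.
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        # Set the flag and terminate without extracting anything.
        result["bozo"] = 1
        result["bozo_exception"] = str(exc)  # hypothetical key, for diagnostics
        return result
    # Only well-formed feeds reach this point.
    result["titles"] = [t.text for t in root.iter("title")]
    return result


# Hypothetical sample: declared as UTF-8 but containing a raw
# Windows-1252 "smart quote" byte (0x92), which is invalid UTF-8.
BROKEN = (b'<?xml version="1.0" encoding="utf-8"?>'
          b"<rss><channel><title>Postel\x92s law</title></channel></rss>")

print(parse_feed(BROKEN)["bozo"])
```

Calling `parse_feed` on a well-formed feed leaves the flag at 0 and returns the titles; on the sample above it returns with the flag set and no titles.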
The respective authors of NetNewsWire and FeedDemon announced this week that their Atom support would be strictly XML-conformant. This means that Atom feeds will have to be valid XML to be read in these two aggregators and that...
Is an XML parser that accepts non-well-formed XML broken?
Sam has been doing some experimenting w/ non-well-formed XML. The result is that most parsers actually accept his invalid XML document. Are these parsers broken? The answer to the question is obviously yes! These products were found...
I think there are a lot of people who are willing to do a little bit to improve feed quality, if it's not too hard and they can do it from where they are already....
RSS (RDF Site Summary or Really Simple Syndication) has come a long way, up to RSS 3.0 now (which is no longer XML -- after all, since when has XML been simple?). I'd like to see Atom catch on as a syndication format, and as is being pointed out,...
Norman Walsh. "A parser that accepts that feed is broken" - Absolutely.
But we're not talking about parsers here, we're talking about RSS reader applications. So:-
Julian Bond. "A feed reader that fails to display the feed, and fails also to inform the user that the feed is broken, is broken".
This form of error is so common and so trivial to handle that failing to display the data (perhaps with a ?) is being unnecessarily awkward to the user. But we should still be warning the user, so they can warn the feed owner, that there was a problem.
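A rough sketch of that approach: try a strict parse first, and if it fails, substitute a replacement character for the undecodable bytes, display what can be recovered, and hand back a warning for the user. The function name `read_titles`, the warning text, and the sample feed are hypothetical, and the repair step assumes the feed was meant to be UTF-8; U+FFFD stands in for the "?" mentioned above.

```python
import xml.etree.ElementTree as ET


def read_titles(data: bytes):
    """Return (titles, warning); warning is None when the feed is clean."""
    warning = None
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        # Keep the error so the user can pass it on to the feed owner.
        warning = f"Feed is not well-formed XML: {exc}"
        # Assumption: the feed declares UTF-8. Undecodable bytes become
        # U+FFFD, then the text is re-encoded so the declaration still holds.
        repaired = data.decode("utf-8", errors="replace").encode("utf-8")
        try:
            root = ET.fromstring(repaired)
        except ET.ParseError:
            # Some other well-formedness error: nothing to display.
            return [], warning
    return [t.text for t in root.iter("title")], warning


# Hypothetical sample with a raw Windows-1252 quote byte (0x92).
SAMPLE = (b'<?xml version="1.0" encoding="utf-8"?>'
          b"<rss><channel><title>Postel\x92s law</title></channel></rss>")

titles, warning = read_titles(SAMPLE)
print(titles, warning)
```

This only recovers from encoding errors. Unclosed tags or bad entities would still leave the feed unreadable, though the warning is returned in either case.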