It’s just data

Scientific Method

What does your aggregator do with this?  Is it valid?

Note: in the proper application of scientific method, observation precedes the formulation of a hypothesis.  In that spirit, I’d like to ask that people indulge me for a moment and refrain from rationalizations, justifications, and explanations, and focus for the moment on simply gathering data.


FeedDemon: No error, no display problem.

Mozilla: No error, display is broken.

IE: Error.

Posted by Roger Benningfield at

Firebird 0.7/Windows displays OK (but with an symbol instead of the apostrophe in "Postel's"). Bloglines subscribes to the feed fine.

Posted by sil at

Displays fine in Feed on Feeds.

Posted by Claude Montpetit at

NetNewsWire subscribes with no complaints, displays a reasonable-looking message, and comes back to this page when you click on it.

Posted by Tim Bray at

BottomFeeder displays the feed and its single item, using a space instead of a quote. Browsing the page spawns a browser to this page (as expected in Bf).

Posted by James Robertson at

Shows up as a broken feed in Wildgrape NewsDesk 1.1.  Feed details show the following error:

"Unable to update this channel

Reason: This channel may be publishing invalid RSS, or you may have discovered a bug in NewsDesk. Please submit this channel to support@wildgrape.net."

Posted by Jason Lefkowitz at

Subscribes OK in NewsGator 1.3 in Outlook 2002, displays an entry titled "Not so smart quotes" from "It's just broken" containing the body text: "Postels law".

Posted by Mark Gardner at

Seems to be OK in Bloglines. Does that show up in Mark Pilgrim's list? Presumably not if it's doing its job correctly.

Posted by adrian cuthbert at

From SharpReader:

"Error parsing RSS XML: There is an invalid character in the given encoding. Line 15, position 26."

Makes sense, but for the sake of collecting data I'll pretend I don't know why. ;P

Posted by Drew Marsh at

[SharpReader v0.9.3.2]

"Error parsing RSS XML: There is an invalid character in the given encoding. Line 15, position 26.
Please try to validate this feed(link). If this feed validates as correct RSS, you can send an error report(link)."

Posted by Grant at

[AmphetaDesk 0.93.1 OS X]

"AmphetaDesk could not determine the format of http:// intertwingly.net/stories/2004/01/12/broken.rss."

(I added the space in the URL so the link parser wouldn't get it)

Posted by Robert Sayre at

[Feedreader 2.5, build 610]: Parser error.

Posted by DeanG at

Radio Userland: no errors
Everything works, even the right quote (which is interesting, because quotes often show up wrong in Radio, e.g. in Tim Bray's feed).

Posted by Sjoerd Visscher at

Userland Radio 8 on Windows XP parses the feed without throwing an error and displays the "smart quote" in its rendering without attempting to entify it. I'll pull it up in Radio on the Mac when I'm near my Mac tonight.

Posted by Michael S. Manley at

A parser that accepts that document is broken.

Posted by Norman Walsh at

Feed is correctly added in my web-based RSS aggregator called "MyBlogroll" ([link]). The title of the feed is "It's just broken", and I have one item entitled "Not so smart quotes" with "Postels Law" for the description.

Posted by Julien Julien at

Shrook 1.33: feed entry goes grey (the Colour of the Broken Feed)

Posted by Mark Nottingham at

[Newz Crawler 1.6.1 beta]

Subscribes OK, renders the quote wrong.

[FeedDemon 1.0]

Subscribes OK, renders OK

[Bloglines in IE6]

Subscribes OK, renders a question mark at the location of the quote

Posted by Werner Vogels at

NewsMonster 1.2.2 on Netscape 7.1 subscribes fine, but the "smart quote" displays as a "?".

Posted by Nick Chalko at

RE: Scientific Method

RSS Bandit: can subscribe, then reports same error as SharpReader: invalid XML

Message from TorstenR at

[AmphetaDesk 0.93.1] on Win2000

AmphetaDesk could not determine the format of http://intertwingly.net/stories/2004/01/12/broken.rss.

Posted by Bill Stoddard at

I just got the list of recently updated blogs from blo.gs/changes.xml and validated all the feeds listed.  Filtering out bad data in the list itself (feeds that weren't actually feeds, or that could not be retrieved), there were a total of 1181 feeds tested, of which 92 were invalid.  31 were invalid because of non-well-formed XML.  Keep in mind these are actively maintained sites (they've updated in the past few hours).

Then I checked the Technorati Top 100.  Of 100 sites, 35 have feeds listed, of which 4 were invalid.  1 was invalid because of non-well-formedness.

Now, we all draw our lines in the sand in different places, and that's fine, but I'm thinking that any aggregator that can't even handle the 35 most popular feeds in the world is broken, regardless of what Norman Walsh or the XML specification says.

Posted by Mark at
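For readers who want the counts above as rates, here is a small sketch of the arithmetic. It uses only the numbers quoted in the comment; the function and variable names are made up for illustration.

```python
def invalid_rate(bad: int, total: int) -> float:
    """Return bad/total as a percentage, rounded to one decimal place."""
    return round(100 * bad / total, 1)

# Counts quoted in the comment above (blo.gs sample, then Technorati Top 100 sample).
BLOGS_TOTAL, BLOGS_INVALID, BLOGS_MALFORMED = 1181, 92, 31
TOP_TOTAL, TOP_INVALID, TOP_MALFORMED = 35, 4, 1

rates = {
    "blo.gs invalid": invalid_rate(BLOGS_INVALID, BLOGS_TOTAL),
    "blo.gs not well-formed": invalid_rate(BLOGS_MALFORMED, BLOGS_TOTAL),
    "top feeds invalid": invalid_rate(TOP_INVALID, TOP_TOTAL),
    "top feeds not well-formed": invalid_rate(TOP_MALFORMED, TOP_TOTAL),
}

for label, pct in rates.items():
    print(f"{label}: {pct}%")
```

Note that the second sample is only 35 feeds, so a single feed moves its rate by almost three percentage points.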

RE: Scientific Method

Mark, you state that only 1 of the top 100 feeds was invalid due to non-well-formedness, so why do you quote the number 35 and then mention the XML specification, or Norman Walsh for that matter? PS: Isn't this off-topic for this thread? I believe Sam specifically stated that folks should "refrain from rationalizations, justifications, and explanations, and focus for the moment on simply gathering data".

Message from Dare Obasanjo at

Dare, only 35 of the Technorati Top 100 sites have feeds listed ( [link] ), thus the sample size is 35, not 100.  I mention Norman Walsh because he commented in this thread that "a parser that accepts that document is broken".  We are having a semantic disagreement over the definition of the word "broken".  He believes it should be used to mean "not conformant to the specification", and I believe it should be used to mean "functions in the best interests of the vendor's paying customers".

Posted by Mark at

feedvalidator.org reports an XML well-formedness error and immediately stops parsing.  [link]

Userland's validator reports no errors.  [link]

Posted by Mark at

I'd have expected Shrook to have shown it, but it doesn't, which I've just fixed.  It can only cope with malformed Unicode though, not malformed XML.

Posted by Graham at

The Syndication Subscription Service [link] flags it as not well-formed: [link] .

Posted by Morten Frederiksen at

Fatal Error: Is HTTP missing a verb?

There's been a rather widespread debate about what Atom aggregators should do in the case of invalid feeds. The two...... [more]

Trackback from franklinmint.fm at

Awasu rejects it:

XML parse failed (4:L15/C25): not well-formed (invalid token)

Posted by Taka at

My ultra orthodox feed reader (which is just an XQuery in Saxon) reports an "Unconvertable UTF-8 character . . ." error, and fails to parse the document.

Posted by Jay Fienberg at

Radio on the Mac (running in Safari) also throws no parsing error on this feed, but the display in Safari is odd (Postel[base ']s law) and the "smart quote" does get entified as ’ when slurped into the weblog posting function.

What parts of this are done by Radio and what are done by Safari, I couldn't tell you.

Posted by Michael S. Manley at

BlogExpress processes and displays the feed like a regular one.

Posted by Sérgio Nunes at

offtopic: I get a CGI error trying to use 'é' on the name field of the comments.

Posted by Se'rgio Nunes at

Thanks, Sérgio... fixed!

Posted by Sam Ruby at

To my surprise, IdeaGraph did accept the feed, but didn't give a good rendering of the character. I expected Xalan and/or Xerces to balk.

Screenshot (from an old build - I'm currently reworking a lot including a tag soup RSS front end, Atom support will follow the spec, naturally):

[link]

Nice spellchecking, btw.

Posted by Danny at

Mark - if 4 of the top 35 feeds don't conform to the spec, it suggests the spec is broken, not the readers.

Posted by Danny at

Atom, the well-formed format

The issue of quality has been on the mailing list a lot recently, along with many blog posts referring to Postel's Law. Although there are doubters, the consensus seems to be that insisting on feed producers sticking to the spec,......

Excerpt from Finally Atom at

Version 2.7.2 of my feed parser, released today, will by default refuse to parse this feed.  It does a first-pass check for wellformedness, and when that fails it sets the 'bozo' bit in the result to 1 and immediately terminates.  You can revert to the previous behavior by passing disableWellFormedCheck=1, but it will print arrogant warning messages to stderr to the effect that anyone who can't create a well-formed XML feed is a bozo and an incompetent fool.

[link]

Posted by Mark at
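A minimal sketch of the kind of first-pass check described above, using Python's standard expat bindings. This is not the feed parser's actual code or interface: `check_feed`, `GOOD`, and `BAD` are hypothetical names, and the stray 0x92 byte in `BAD` is an assumption about what the test feed contains.

```python
import xml.parsers.expat

def check_feed(data: bytes) -> dict:
    """Run a well-formedness pass and record the outcome in a 'bozo' flag.

    Hypothetical helper for illustration only.
    """
    result = {"bozo": 0, "error": None}
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as exc:
        result["bozo"] = 1          # document is not well-formed XML
        result["error"] = str(exc)  # keep the parser's message for reporting
    return result

# A tiny well-formed document.
GOOD = (b'<?xml version="1.0" encoding="utf-8"?>'
        b'<rss version="2.0"><channel><title>ok</title></channel></rss>')

# Same shape, but with a lone windows-1252 smart-quote byte in a document
# that declares itself UTF-8 (assumed to mirror the test feed).
BAD = (b'<?xml version="1.0" encoding="utf-8"?>'
       b'<rss version="2.0"><channel><title>Postel\x92s law</title></channel></rss>')

print(check_feed(GOOD)["bozo"], check_feed(BAD)["bozo"])
```

A caller can then decide what to do when the flag is set: stop, as described above, or fall back to a more forgiving parse.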

Atom and XML

The respective authors of NetNewsWire and FeedDemon announced this week that their Atom support would be strict about XML. This means that Atom feeds will have to be valid XML to be read in these two aggregators, and that...

Excerpt from Znarf Infos - le carnet web at

MyHeadlines works.

Sample:

[link]

Cheers,
Mike

Posted by Mike Agar at

Could you do this again but in reverse?
Save the XML file as UTF-8 and say it is ISO-8859-1?

Posted by Sjoerd Visscher at
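The two directions behave differently at the byte level, which can be sketched in a few lines of Python. This assumes the test feed amounts to a windows-1252 smart quote inside a document labelled UTF-8; the variable names are illustrative only.

```python
SMART_QUOTE = "\u2019"  # right single quotation mark
text = f"Postel{SMART_QUOTE}s law"

# Case 1 (assumed to match the test feed): windows-1252 bytes labelled as UTF-8.
cp1252_bytes = text.encode("cp1252")  # the quote becomes the single byte 0x92
try:
    cp1252_bytes.decode("utf-8")
    case1 = "decoded"
except UnicodeDecodeError:
    case1 = "error"  # 0x92 cannot start a UTF-8 sequence

# Case 2 (the reverse): UTF-8 bytes labelled as ISO-8859-1.
utf8_bytes = text.encode("utf-8")        # the quote becomes bytes E2 80 99
case2 = utf8_bytes.decode("iso-8859-1")  # every byte maps to some character

print(case1)
print(repr(case2))
```

In the reverse case the decoder has nothing to reject, so the mismatch shows up only as three wrong characters in place of the quote rather than as an error.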

Is a XML parser that accepts non well-formed XML broken?

Sam has been doing some experimenting w/ non well-formed XML. The result is that most parsers actually accept his invalid XML document. Are these parsers broken? The answer to the question is obviously yes! These products were found...

Excerpt from iBLOGthere4iM at

Mozilla’s behavior is bug 174351.

Posted by Henri Sivonen at

Is my weblog well formed?

Can I ever be sure?... [more]

Trackback from Sam Ruby at

If people won't go to the validator

I think there are a lot of people who are willing to do a little bit to improve feed quality, if it's not too hard and they can do it from where they are already.... [more]

Trackback from dive into mark at

eRONA ([link]), or rather Magpie ([link]), which is the parser behind eRONA, parses it. The question mark gets translated into a ', that's it.

Posted by Sascha at

Really Simple Standards

RSS (RDF Site Summary or Really Simple Syndication) has come a long way, up to RSS 3.0 now (which is no longer XML -- after all, since when has XML been simple?). I'd like to see Atom catch on as a syndication format, and as is being pointed out,...

Excerpt from amphiskios.net at

Personality test

Universal Feed Parser 3.0 beta 19 is out. (149 words)...

Excerpt from dive into mark at

Norman Walsh. "A parser that accepts that feed is broken" - Absolutely.

But we're not talking about parsers here, we're talking about RSS reader applications. So:-

Julian Bond. "A feed reader that fails to display the feed, and fails also to inform the user that the feed is broken, is broken".

This form of error is so common and so trivial to handle that failing to display the data (perhaps with a ?) is being unnecessarily awkward to the user. But we should still be warning the user, so they can warn the feed owner, that there was a problem.

Posted by Julian Bond at
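One way to read the suggestion above, sketched in Python: show the text with a replacement character where the bad byte was, and hand back a warning for the user as well. `decode_leniently` is a hypothetical helper, not taken from any aggregator mentioned in this thread, and it covers only character decoding, not XML parsing.

```python
from typing import Optional, Tuple

def decode_leniently(data: bytes, encoding: str = "utf-8") -> Tuple[str, Optional[str]]:
    """Decode feed bytes, substituting U+FFFD for bad bytes instead of failing.

    Returns (text, warning); warning is None when the input decoded cleanly.
    """
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        text = data.decode(encoding, errors="replace")
        warning = f"feed is not valid {encoding}: bad byte at offset {exc.start}"
        return text, warning

text, warning = decode_leniently(b"Postel\x92s law")
print(text)     # the bad byte shows up as the replacement character
print(warning)  # something the reader can pass on to the feed owner
```

The point of returning both values is that the application can display the item and still surface the problem, rather than choosing one or the other.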

Sam Ruby's test feed did not fail anymore..., but depends

... [more]

Trackback from torsten's .NET blog at

Sam Ruby's test feed

Referring to this post: [link], Sauce Reader parses the feed with no problem....

Excerpt from Sauce-Dev at
