It’s just data

RSS Profile Up For Vote

Rogers Cadenhead: We propose that the board endorse and publish the RSS Profile, making it available under a Creative Commons Attribution-ShareAlike 2.0 license so that others can build upon and extend it with their own recommendations.

I’ve taken a first pass at what changes would be required for this profile to be supported by the Feed Validator.

In most cases (background color: white), no changes will be required beyond verifying that the test cases are in place, though some message texts may need to be updated.  When possible, I have tried to link to the actual message produced by the Feed Validator for comparison purposes.

In a number of places (background color: yellow), there are some new requirements that won’t be hard to implement.  In a few cases (background color: orange), I have a few questions.

In parallel with the vote, I will be prototyping a number of these changes (just the sections in yellow for now) on beta.feedvalidator.org.


Disclaimer: speaking for myself, not the board, etc.

First let me just say that your attempt to translate the profile into feed validator tests has raised some issues which I believe are worth addressing in a future version of the profile. In particular I think we could do with fewer uppercase SHOULDs. However, I don’t believe there are any issues that would make me doubt the usefulness of the profile in its current form. “Perfect is the enemy of good enough”.

In reply to your orange questions:

4.1 Version 2.0 REQUIRED: No. Although the text for that could be worded better. I don’t believe the profile has the authority to deprecate older versions of RSS even if we wanted to.

4.1.1.4 Slash-delimited category: No, it is not intended that all category values SHOULD have a slash in them. It makes more sense if you read that statement as two separate sentences. 1. The value SHOULD be a slash-delimited string (that doesn’t mean a slash is required, just that slashes may be present and, if present, serve to delimit the string). 2. The value identifies a hierarchical position in a taxonomy (if there are slashes present). The text could certainly be clearer, but I think categories are a disaster in general and that section could do with a lot more advice on the subject.

4.1.1.7 docs should point to rssboard: Only if you can detect that the feed producer is relying on the specification and profile published by the board. And the easiest way to detect that is by checking the docs value. Circular reference. Validator asplode.

4.1.1.16.1 skipHours 24: I’d guess a warning, but the best person to ask would be the developer of NewzCrawler - that’s the only aggregator I’m aware of that does anything with the skipHours element. Or if someone had more enthusiasm than me they could run a quick test.

5.1.1 SHOULD include self link: “Eyebrowse” - nice typo. :) I think it’s a good (lowercase) recommendation although of limited benefit (and becoming less so with the proliferation of FF2 and IE7). So perhaps the SHOULD is too strong. As much as I’d like to see more feeds doing this, I don’t know if it’s a good idea to force such a recommendation and I don’t know to what extent users will feel pressured and annoyed by a warning.

5.1.1 Core elements: This is referring specifically to the use of atom:link relations other than self. That’s not to say that a recommendation against FUNKY elements isn’t a good thing. But as you say, it needs an enumeration and IMO more research.

I think that’s all of the orange covered. I have comments on some of your yellow additions too, but that will have to wait for another day.

Posted by James Holderness at

James: thanks.  I see the above as a tepid endorsement of a warning on skipHours='24' and missing atom:link/@rel='self'; and a hold on the rest.  I’ll wait a few days, and barring other input I’ll proceed on just those two.
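As a rough illustration of those two warnings, here is a minimal sketch using only the Python standard library. It is not the Feed Validator's code: the function name `profile_warnings` and the message strings are made up for this example, and it assumes a well-formed RSS 2.0 document with a single channel.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def profile_warnings(rss_text):
    """Return warning strings for the two checks (hypothetical wording)."""
    warnings = []
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        return warnings

    # skipHours: flag an hour value of 24.
    for hour in channel.findall("skipHours/hour"):
        if (hour.text or "").strip() == "24":
            warnings.append("skipHours contains hour 24")

    # atom:link rel="self": flag a channel that has no such link.
    links = channel.findall("{%s}link" % ATOM_NS)
    if not any(link.get("rel") == "self" for link in links):
        warnings.append("missing atom:link with rel='self'")

    return warnings


feed = """<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example</title>
    <skipHours><hour>24</hour></skipHours>
  </channel>
</rss>"""

for message in profile_warnings(feed):
    print(message)
```

Both conditions are reported as warnings rather than errors, which matches the tentative level of support expressed so far.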

Initial implementation of all the yellow additions is now complete and deployed on beta.feedvalidator.org.  Please do post your comments on these items.

Posted by Sam Ruby at

Other than section 3.1 which we’ve already discussed, I think the only yellow section I have a problem with is 3.3 (E-mail addresses). On my test feed, I’m seeing four error/warning messages:

The first one, while not new, I believe could do with less strictness if it is to match the profile more closely (which accepts any form of email address as valid). I’m seeing that error for an author like this:

<author>John Smith, jsmith@example.org</author>

which clearly does include an email address (just not a format that you might expect). I’m not exactly sure what your current checks look like, but I’d suggest something as simple as looking for an @ character (possibly also some basic checks of the surrounding characters).
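A minimal sketch of that kind of lenient check, in Python. The reading of "basic checks of the surrounding characters" as "at least one non-space, non-@ character on each side" is an assumption made for this example, and `contains_email` is a hypothetical name.

```python
import re

# Assumption: an "@" counts only if it has at least one character that is
# neither whitespace nor "@" on each side of it.
AT_SIGN = re.compile(r"[^\s@]+@[^\s@]+")


def contains_email(text):
    """Return True if the text contains something that looks like an address."""
    return AT_SIGN.search(text) is not None


print(contains_email("John Smith, jsmith@example.org"))
print(contains_email("John Smith"))
```

A check this loose accepts many strings that are not deliverable addresses, which is the point: it only decides whether the "must include an email address" error applies.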

The second message seems to be a catch-all for any address that doesn’t match the RECOMMENDED format, even those that do include a real name. The example above produces that warning and it clearly (to a human) includes a real name. I think you’d be better off with more generic text that just says something like: “author SHOULD be in the form username@hostname.tld (Real Name)”.

Also, I think this is another area where the profile might possibly be too strict in its usage of SHOULD (at least in respect to the real name part of that recommendation). At least one example in the RSS spec omitted the real name, and it’s also a fairly common choice for feeds in the wild. Bottom line: while I’d certainly like to encourage a real name, I wouldn’t want to push it if you got a lot of complaints for such a warning.

The warning about HTML seems reasonable, although in some cases it’s overkill if you’re already warning about not matching the RECOMMENDED format.

Finally, the warning about hexadecimal character references doesn’t really apply here at all. I think we should have been more specific about which elements were being referred to in the Character Data section, because IMO address elements are really a special case. Users should not be getting that warning for something like this:

<author>John Smith &lt;jsmith@example.org&gt;</author>

which is a common way to represent RFC 2822 style addresses.
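To show why the escaped angle brackets are meaningful here rather than stray markup, this sketch decodes the element text and hands it to the standard library's address parser. `split_author` is a hypothetical helper, and `html.unescape` stands in for the entity decoding an XML parser would normally do.

```python
from email.utils import parseaddr
from html import unescape


def split_author(raw_element_text):
    """Split escaped author text into (real name, address)."""
    # An XML parser performs this unescaping itself; it is done by hand here
    # so the sketch can start from the text as it appears in the feed source.
    decoded = unescape(raw_element_text)
    return parseaddr(decoded)


print(split_author("John Smith &lt;jsmith@example.org&gt;"))
# ('John Smith', 'jsmith@example.org')
```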

Posted by James Holderness at

Other than section 3.1 which we’ve already discussed

You say this like I did not address your concerns there.

test case

Are there still ways in which this could be improved?

On my test feed, I’m seeing four error/warning messages

OK, I’ve reordered the checks so that you only get one message per email address, with the most specific and most common errors checked first.

test cases

Does this address your concerns?

Posted by Sam Ruby at

Other than section 3.1 which we’ve already discussed

You say this like I did not address your concerns there.

I haven’t actually looked at that section since the problems I reported on the mailing list, but I suspect I’d be happy with what you have for now. There may be areas where the tests could be improved, but I don’t think I have enough up-to-date data to make any more solid recommendations.

OK, I’ve reordered the checks so that you only get one message per email address, with the most specific and most common errors checked first.

I think that’s a lot better, but I would have expected the “Email address is not in recommended format” warning on the rfc2368 address. Also the help text for that message could use some work (I’m referring to the “solution” section - you’ll see what I mean when you read it).

And there are still some forms of email where I’m getting the “author must include an email address” error, e.g.

<author>John Smith (mailto:jsmith@example.org)</author>

Then there’s this:

<author>mailto:John%20Smith%20%3Cjsmith%40example%2Eorg%3E?subject=some%20feedback</author>

I can understand you not recognising that as being an email address (having just recommended you use a simple check for @ characters), but why am I getting the “Encode "&" and "<" in plain text using hexadecimal character references” warning?

Posted by James Holderness at

OK, I’ve added a few test cases, reordered the logic slightly, and tweaked a few messages.  It’s not perfect, but is explainable.

For starters, I don’t recall hearing complaints about this, so I haven’t invested in improving it.

Now for some reason the advisory board has decided that mailto: is simultaneously now both valid and deprecated.  As I haven’t seen this in practice, I’m not all that much concerned.  So I invested the minimum in supporting this: if I see a mailto: address, I will strip the URI scheme and URL decode the rest and proceed on.  That means that if somebody does this and URI encodes the recommended format, parentheses and all, they will escape detection.  At this point, I’m not losing much sleep over that.
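The strip-and-decode step can be sketched in a few lines of Python. This is an illustration of the approach described above, not the deployed code; `strip_mailto` is a hypothetical name, and the case-insensitive scheme match is an assumption.

```python
from urllib.parse import unquote


def strip_mailto(value):
    """Drop a leading mailto: scheme and URL-decode what remains."""
    prefix = "mailto:"
    if value.lower().startswith(prefix):
        return unquote(value[len(prefix):])
    # Anything without the scheme is passed through untouched.
    return value


print(strip_mailto("mailto:jsmith%40example.org"))
print(strip_mailto("jsmith@example.org (John Smith)"))
```

Note that any `?subject=...` query stays in the decoded string; nothing above says it is removed, so the sketch leaves it in place.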

Now for the good stuff.  This recommendation will mostly affect people who have simple email addresses, so I want to optimize for that.  So I check for three characters: “@”, “)” and space (note: this is after URI decoding).  For some combinations of these marker characters missing, I can produce tailored messages.  For others, I simply fall back to the checking that was in place before I started making changes to support the profile.  That includes checking for HTML characters in plain text, but in most cases such conditions will be caught much earlier.
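One way the marker-character idea might look in Python is sketched below. The exact mapping from marker combinations to messages is not spelled out above, so the mapping here is an assumption; only the two quoted message texts come from this thread, and `classify_author` and the `"fallback"` label are hypothetical.

```python
def classify_author(value):
    """Return a message, None for the recommended shape, or "fallback"."""
    has_at = "@" in value
    has_paren = ")" in value
    has_space = " " in value

    if not has_at:
        # No "@" anywhere: nothing that could be an address.
        return "author must include an email address"
    if not has_paren and not has_space:
        # A bare address such as jsmith@example.org, with no real name.
        return "Email address is not in recommended format"
    if has_paren and has_space:
        # Looks like username@hostname.tld (Real Name); nothing to report.
        return None
    # Any other combination goes to the checks that existed before.
    return "fallback"


for author in ["John Smith", "jsmith@example.org",
               "jsmith@example.org (John Smith)",
               "John Smith, jsmith@example.org"]:
    print(repr(author), "->", classify_author(author))
```

Checking the cheap markers first keeps the common case (a simple address with no real name) down to a single, specific message.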

Posted by Sam Ruby at

Now for some reason the advisory board has decided that mailto: is simultaneously now both valid and deprecated.

Valid but not recommended.

The only requirement that the RSS 2.0 spec makes for fields like author is that the value be an email address; it doesn’t specify any particular form of email address. A mailto: address is most definitely a form of email address and thus is valid (as are many other formats not accepted by the feedvalidator).

However, since all the examples in the RSS 2.0 spec use an email address of the form username@hostname.tld (Real Name) - although sometimes without the Real Name - it was argued that could be considered an implicit recommendation for that particular format. Add to that the explicit recommendation in the RSS 0.91 spec, and it seemed like a good idea that the profile include that recommendation too.
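For concreteness, the recommended shape can be approximated with a regular expression. This is a rough sketch, not the profile's definition: the pattern is an assumption that treats any run of non-space characters around the "@" as a username and hostname, and `is_recommended` is a hypothetical name.

```python
import re

# Approximation of "username@hostname.tld (Real Name)": an address with at
# least one dot in the host part, one space, then a parenthesised name.
RECOMMENDED = re.compile(r"^[^\s@()]+@[^\s@()]+\.[^\s@()]+ \([^()]+\)$")


def is_recommended(value):
    """Return True if the value matches the approximated recommended form."""
    return RECOMMENDED.match(value) is not None


print(is_recommended("jsmith@example.org (John Smith)"))
print(is_recommended("jsmith@example.org"))
print(is_recommended("John Smith <jsmith@example.org>"))
```

The second and third inputs are still valid email addresses in the sense described above; they just fall outside the recommended form.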

Now most aggregators couldn’t care less what is included in the various address fields, since they just display the content as is (or outright ignore them). However, for the few that do attempt to parse email addresses, it certainly helps when feeds follow this recommendation.

I hope that explains the reasoning.

Posted by James Holderness at

However, for the few that do attempt to parse email addresses, it certainly helps when feeds follow this recommendation.

This set includes the Universal Feed Parser.

Posted by Sam Ruby at

RSSBus and Atom

Sam Ruby pointed out some of the changes that would be required for Feed Validator to support the new RSS Profile.  The RSS Profile is the result of checking popular feed reader capabilities in order to put together a “best practices” document for...

Excerpt from textBox1 at

I don’t recall hearing complaints about this,

Upon further review, that message tried to convey the most common cause for the problem.  As that cause now generates a different message, the original text is no longer likely to be helpful, so it has been replaced with a more generic message.  Still not perfect — it doesn’t tell you about other options which, while technically valid, will generate a warning, and yet does tell you about options which will also generate a warning.  So, not perfect, but still better.

Posted by Sam Ruby at

Validating the RSS Profile

It didn’t take long for the RSS profile to brew up some controversy in the RSS community. FeedValidator.org is now issuing some warnings where the RSS profile is providing additional guidance. Todd Cochrane, in his usual demeanor, wrote a...

Excerpt from The RSS Blog at
