intertwingly

It’s just data

Happy Birthday, Feed Validator


The Feed Validator has been giving advice for five years as of today.  From a modest beginning of 300 test cases, it has grown to over two thousand.

My favorite post on this topic during these past five years is Common Feed Errors.  Time to revisit.

Missing atom:link with rel="self" (3072)
This is a relatively new recommendation from the RSS Advisory Board.  One thing I don’t remember sharing before is that typically when I do these checks, I find that fully one out of three feeds has already fixed the problem by the time I can recheck the feed myself.  This message appears to be no exception.  In addition, it is already fixed in WordPress HEAD.  Needless to say, I expect the frequency of this message to go down quickly.
XML Parsing error: syntax error (901)
This is the glaring exception to the one out of three rule I mentioned above.  XML Errors in Feeds is still the most systematic analysis of such errors that I am aware of to date.  It would be nice if that study were to be updated.
Email address is missing real name (822)
Another new recommendation.  Again, this should work itself out over time.  Adding either your real name, or a recognizable pseudonym, should increase usability with a number of feed aggregators.
item should contain a guid element (634)
This is not a new recommendation.  From the original RSS 2.0 spec:
In all cases, it’s recommended that you provide the guid, and if possible make it a permalink.
Undefined parent element: child (574)
This message covers two separate symptoms: typos and people not knowing about RSS 2.0’s support for namespaces.  While this issue is considerably less troublesome than non-well-formed feeds, what is a concern is that this many years after the RSS 2.0 spec was released, this problem is as prevalent as it is.
element must be an RFC-822 date-time (500)
This continues to be the most problematic date format ever.  I’m pleased to see that extensions such as SSE have moved away from it.
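For anyone generating these dates by hand, a minimal sketch in Python may help.  It leans on the standard library's email.utils.format_datetime, which always emits English day and month names regardless of locale.  The function name rfc822_date and the sample timestamp are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rfc822_date(moment: datetime) -> str:
    """Format a timezone-aware datetime in the RFC-822 style used by RSS 2.0."""
    if moment.tzinfo is None:
        # A naive datetime would be emitted with a "-0000" zone, which is
        # rarely what a feed author intends, so this sketch rejects it.
        raise ValueError("a timezone-aware datetime is required")
    return format_datetime(moment)


# Hypothetical publication time, chosen only for the example.
print(rfc822_date(datetime(2007, 11, 20, 14, 30, 0, tzinfo=timezone.utc)))
# prints: Tue, 20 Nov 2007 14:30:00 +0000
```

Building the string with strftime instead is a common source of trouble, since %a and %b follow the server's locale.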
Feeds should not be served with the type/subtype media type (479)
Misconfigured servers, often serving feeds as either text/html or text/plain, have regrettably led browser vendors and even spec writers to conclude that content sniffing is a necessity.
Your feed appears to be encoded as “this”, but your server is reporting “that” (376)
Another way in which servers are commonly misconfigured: the use of text/xml in ways that don’t comply with RFC 3023.
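The RFC 3023 rule that trips people up is that text/xml served without a charset parameter is to be treated as us-ascii, whatever the XML declaration says.  Here is a rough sketch of that decision in Python.  The function name effective_charset and its declared_in_xml parameter are hypothetical, and BOM detection is left out entirely.

```python
from email.message import Message


def effective_charset(content_type: str, declared_in_xml: str = "utf-8") -> str:
    """Guess which encoding a strict RFC 3023 consumer would apply."""
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_param("charset")
    if charset:
        # An explicit HTTP charset parameter wins over the XML declaration.
        return charset.lower()
    if header.get_content_maintype() == "text":
        # text/* with no charset parameter defaults to us-ascii.
        return "us-ascii"
    # application/* types defer to the document itself (simplified here).
    return declared_in_xml.lower()


print(effective_charset("text/xml"))
print(effective_charset("application/rss+xml", "iso-8859-1"))
```

In other words, a utf-8 feed served as plain text/xml is mislabeled before a single byte of it is parsed.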
HTTP Error (381)
It is clear that not everybody has mastered even the most basic concepts of the internet; many still need a bit of help.  Don’t laugh, undoubtedly there are areas where you aren’t an expert.  Now look again at that count.  That many people needed additional help when the Feed Validator said that their feed was not found, or that there is a server error.  In the past week alone.
Image title doesn’t match channel title (278)
Another, relatively new, recommendation.
Invalid email address (274)
In general, this means that people are incorrectly using RSS 2.0 core elements when the Dublin Core extension is what they really want.
Self reference doesn’t match document location (214)
Sometimes this simply means that there are multiple URIs which can be used to fetch a feed (for example, http://www.example.com/… vs http://example.com/…), but in other cases there is a real problem.
element should not contain script tag (172)
Most well-maintained aggregators these days strip scripts from incoming feeds, so if you include such things in your feeds with the expectation that users will see the effects, you will often be disappointed.  Unfortunately, this often affects embedded YouTube videos.
Invalid HTML (166)
While HTML grammar rules are fairly lax (especially when compared against XML), there actually are some rules.  While browsers routinely deal with common variations (at times, with minor differences), the more important consideration is that a simple unmatched quote may confuse the code that scans your markup for security risks.  This can lead to users seeing widely divergent, often severely stripped, output.
element should not contain script attribute (150)
Same basic issue, but in this case dealing with attributes like onclick.
UnicodeError: decoding error, invalid data (146)
This is a common enough subclass of well-formedness errors that it merits its own message.  And, yes, that means that this count really should be added to the XML Parsing error count above.  Most commonly this error occurs when people write code that essentially does a bit-for-bit copy of data from a webpage (which defaults to iso-8859-1 encoding) to an XML feed (which defaults to utf-8).
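To make the failure concrete, here is a small Python sketch.  It assumes the source page really is iso-8859-1; the function name transcode_to_utf8 and the sample text are invented for the example.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode bytes using the page's encoding, then re-encode for the feed."""
    return raw.decode(source_encoding).encode("utf-8")


# Bytes as they might be copied from an iso-8859-1 web page.
page_bytes = "café".encode("iso-8859-1")

try:
    # What a utf-8 XML parser effectively does with a bit-for-bit copy.
    page_bytes.decode("utf-8")
    copied_ok = True
except UnicodeDecodeError:
    copied_ok = False

feed_bytes = transcode_to_utf8(page_bytes)
print(copied_ok, feed_bytes.decode("utf-8"))
```

The cure is always the same: decode with the encoding the data actually has, then encode with the one the feed declares.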
Invalid URI character (93)
Most commonly, a space character.
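A minimal sketch of escaping such characters in Python, using urllib.parse.quote.  The SAFE set below is my own assumption: it leaves reserved characters and existing percent-escapes alone so that the structure of the URI survives.  The function name escape_uri is likewise made up.

```python
from urllib.parse import quote

# Reserved characters plus "%" are passed through untouched; everything
# else outside the unreserved set is percent-encoded as utf-8.
SAFE = ":/?#[]@!$&'()*+,;=%"


def escape_uri(uri: str) -> str:
    """Percent-encode characters that are not legal in a URI."""
    return quote(uri, safe=SAFE)


print(escape_uri("http://example.com/my feed.xml"))
# prints: http://example.com/my%20feed.xml
```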
Undefined named entity (86)
This is yet another common enough well-formedness error to merit its own message.  &nbsp; and &mdash; are not predefined in XML.
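One way out is to swap HTML's named entities for numeric character references before the text reaches the feed.  The following is only a sketch: it relies on the HTML 4 entity table in Python's html.entities, and the function name named_to_numeric is hypothetical.

```python
import re
from html.entities import name2codepoint

# The five entities XML does predefine; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text: str) -> str:
    """Replace HTML named entities with numeric character references."""
    def swap(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities, and anything unrecognized, as found.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", swap, text)


print(named_to_numeric("a&nbsp;b &mdash; c &amp; d"))
# prints: a&#160;b &#8212; c &amp; d
```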
The XML encoding does not appear to match the characters used (83)
This is a variation on Unicode Errors.  In this case what you have is an incorrect encoding, but one that technically is legal, such as taking data that is either utf-8 or win-1252 encoded and declaring it as iso-8859-1.  In many cases, what you will see in a feed is incorrect numeric character references, like &#146; when what is desired is a right single quote or &#8217;.
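References in the 128 to 159 range are nearly always win-1252 byte values in disguise, so a repair pass is possible.  This Python sketch assumes exactly that, which will not hold for every feed; the function name repair_c1_references is invented for illustration.

```python
import re


def repair_c1_references(text: str) -> str:
    """Rewrite numeric references 128-159 as the win-1252 characters they imply."""
    def swap(match: "re.Match[str]") -> str:
        code = int(match.group(1))
        if 128 <= code <= 159:
            try:
                # Interpret the number as a win-1252 byte and look up
                # the Unicode code point it stands for.
                code = ord(bytes([code]).decode("cp1252"))
            except UnicodeDecodeError:
                # A few byte values are undefined in win-1252; leave them.
                return match.group(0)
        return "&#%d;" % code

    return re.sub(r"&#(\d+);", swap, text)


print(repair_c1_references("it&#146;s"))
# prints: it&#8217;s
```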
Incorrect day of week (83)
All I can say is that the sheer frequency of this error flabbergasts me.  People even have been known to protest when they get this message.  Again, don’t laugh, one day it could be you.
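Checking this yourself takes only a few lines.  A sketch in Python, assuming the date is otherwise parseable; the function name day_of_week_matches and the sample dates are made up.

```python
from email.utils import parsedate_to_datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_of_week_matches(date_text: str) -> bool:
    """Compare the day name a date claims with the one its calendar date implies."""
    if "," not in date_text:
        # The day name is optional; with none present there is nothing to check.
        return True
    claimed = date_text.split(",", 1)[0].strip()
    # The parser ignores the claimed day name, so weekday() reflects
    # only the actual calendar date.
    actual = DAY_NAMES[parsedate_to_datetime(date_text).weekday()]
    return claimed == actual


print(day_of_week_matches("Tue, 20 Nov 2007 14:30:00 +0000"))  # prints: True
print(day_of_week_matches("Mon, 20 Nov 2007 14:30:00 +0000"))  # prints: False
```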
Email address is not in recommended format (81)
Another new recommendation, but one that affects relatively few feeds.
Missing recommended iTunes parent element: child (75)
iTunes is optional, but if you add iTunes elements you might as well follow the recommendations.
element should not contain HTML (75)
People still try to put escaped HTML in some of the darndest locations.  But I am pleased to report that this is down slightly from before.
Image link doesn’t match channel link (66)
Another long-standing recommendation -- this one is down significantly from prior times.
element must be a full URI (65)
Also down significantly.