Another context that this comes up in is the Feed Validator. Originally, the Feed Validator employed a mix of strategies, but over time, I’ve been converting each and every one over to a white list strategy.
Why? I can’t begin to list all the possible misspellings of isPermaLink, nor all of the possible places where itunes:category can not be placed.
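To make the contrast concrete, here is a minimal sketch in Python. It is not the Feed Validator's actual code; the names `KNOWN_MISSPELLINGS`, `GUID_ATTRIBUTES`, `blacklist_flags`, and `whitelist_flags` are made up for illustration. A blacklist only catches the misspellings somebody thought to list, while a whitelist catches all of them.

```python
# Hypothetical blacklist: misspellings someone has already seen. Never complete.
KNOWN_MISSPELLINGS = {"ispermalink", "isPermalink", "IsPermaLink"}

# Hypothetical whitelist: the attributes accepted on an RSS 2.0 guid element.
GUID_ATTRIBUTES = {"isPermaLink"}


def blacklist_flags(attr):
    """Flag the attribute only if it is a misspelling we already know about."""
    return attr in KNOWN_MISSPELLINGS


def whitelist_flags(attr):
    """Flag the attribute unless it is explicitly allowed."""
    return attr not in GUID_ATTRIBUTES


# A misspelling nobody anticipated slips past the blacklist
# but is still caught by the whitelist.
typo = "isPermaLnk"
print(blacklist_flags(typo), whitelist_flags(typo))
```

The cost of the second approach is that anything legitimate but not yet listed gets flagged too.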
This does mean that from time to time, people will notice that false positives do occur. All I can say when such happens is that I’m sorry, and I will try to be responsive when people provide specific use cases. But even then, every effort will be made to simply whitelist just those use cases and nothing more. A relatively recent example involved the use of specific rdf elements in the context of an Atom feed. This came up in September, and again this month.
First time I’ve seen RDF used in Atom like this, although it’s very much the kind of thing I personally had in mind when Atom extensions were under discussion (for my own stuff I wound up thinking more about RDF as Atom content, or just using GRDDL on the Atom doc as a whole).
Anyhow, quick question: the construct here looks a bit unusual:
When we’re talking about extension elements, I’d strongly recommend making this a warning instead of an error, call it “Questionable Use of Extension Element” with the following description:
Your feed contains an element that this validator does not recognize in this context... This may just be a typo. Element names are case-sensitive; make sure you’re using the right case. For example, pubDate has a capital “D”... This may simply be a case of an element being placed in the wrong context or apparently being used for purposes other than what it was originally intended. For example: itunes:category can only be placed inside of the channel element, it does not belong inside an item
(note that this does not cover trying to validate something that is not a feed, which the current error covers)
James, we’ve had this discussion before, and undoubtedly we will have it again.
From my point of view, every element in an Atom feed is in a namespace, and the use and possible position of that element is defined by the appropriate spec. And while, no, I can’t rule out a future RFC 4685bis defining a “thr:when” attribute, I can say that the current one doesn’t, and comfortably flag any such attributes as an error. Even though it is an extension attribute.
If/when such an RFC 4685bis does materialize, the appropriate additions to the whitelist will be made.
If you review the feed validator mailing list, you will see that not everybody who participates has the same level of understanding of the issues as you do. An all too common question is “how do I fix a 404 error?”. To explain to such a person that absolutely nothing in the atom namespace is technically an error as the authors of the spec reserve the right to define future additions to this namespace doesn’t do anybody any good. No matter how “strongly” or how often you have made this recommendation to me.
For that matter, an RFC 4287bis could very well define additional root elements. Whee! Everything is valid. Nobody goes home without a trophy. Not.
Nor do I want to stop flagging the improper placement of itunes:category elements (probably the most significant contributor to what still is the fifth most common feed validator reported error) simply because you “strongly” recommended it.
I will again encourage you to find specific, and real world, usages which the feed validator flags inappropriately. Last night, I added a check to cover your absurd soap body example. I’m thankful that I have a large test suite as my first attempt to fix this would have caused a message to be produced on virtually every RSS 1.0 feed.
The point of the soap:body example was to demonstrate a single point: the validity of an RFC4287 document does not depend on the currently correct usage of extension elements. If you wish to validate extension elements in addition to the feed, flag errors relating to those extensions separately from validation errors relating to the feed. I’m certain there are simple ways to tell your users that while the feed itself may be valid, the particular use of any given extension may not be valid or is, at the very least, questionable. As it stands now, the current validator output causes confusion by leading users to believe that perfectly valid feeds are not valid due simply to the fact that extension elements are being used in a way the specs allow but the validator developers had not anticipated.
The other example I gave in my soap:body post demonstrates a specific real world use case: using the app:categories element within an atom:feed to point to an Atompub Categories document. The element is used correctly, and it appears in a location that is perfectly valid according to RFC4287, and RFC5023 does not forbid that element from being reused in other contexts. The validator should not be flagging its use within atom:feed as an error. It’s really no different than reusing the atom:link element in other contexts (e.g. an RSS feed). Flagging such things as errors will prematurely restrict the adoption of potentially useful serendipitous reuse of existing extensions.
Arguing that a dc:date that is encoded as RFC 822 should not make an RSS 1.0 feed be marked as invalid because that particular feed format chose to focus on a small core and a rich ecosystem of extensions kinda misses the point. This particular check was in the feed validator from the first day it was deployed.
When people point out real world, non fabricated, not hypothetical, and deployed use cases, those specific use cases are quickly accommodated. I can point to several documents which describe how to use atom:link in the context of RSS 2.0, and to a considerably larger body of existing feeds, and even to a few consumers that actually take this information into account.
If you can do the same for app:categories, then fine. But bringing up an absurd usage of soap:Body doesn’t actually help your case.
There are several thousand Atom feed documents currently published on IBM’s intranet that contain app:categories elements. Does that count as "real-world, non fabricated, not hypothetical and deployed"?
If an extension element is being used incorrectly based on some standard definition or well documented and commonly understood best practice, then by all means signal an error. If, however, you cannot point to any spec text or documented best practice anywhere that says that a particular, unexpected use of any given element is clearly wrong, then the validator should not signal an error; a warning is fine, but it’s certainly not an error.
There are several thousand Atom feed documents currently published on IBM’s intranet that contain app:categories elements. Does that count as "real-world, non fabricated, not hypothetical and deployed"?
Perhaps. The first I heard of it was when I was forwarded an email by you that included someone who was saying that that usage seemed to be somewhat outside of what the spec intended, and you alone defending your rather unusual perspective. When my initial response was to agree with the person who read the spec differently than you did, the next thing I knew you went public with an absurd example.
If an extension element is being used incorrectly based on some standard definition or well documented and commonly understood best practice, then by all means signal an error. If, however, you cannot point to any spec text or documented best practice anywhere that says that a particular, unexpected use of any given element is clearly wrong, then the validator should not signal an error; a warning is fine, but it’s certainly not an error.
Please read the body of this post. I readily will agree that employing the use of white lists involves a tradeoff. One that on balance trades off a few false positives for a more robust approach to validation. I do not agree with your premise that all that should be thrown away lightly.
When my initial response was to agree with the person who read the spec differently than you did, the next thing I knew you went public with an absurd example.
You make that sound like a bad thing. The reason for posting publicly was to see if others felt the same way about it... that is, I specifically wanted to solicit the opinion of a broader community.
The absurd example was used solely to demonstrate the point that the validity of a feed per RFC4287 has absolutely nothing to do with the definition or utility of the extension elements I may choose to include in my feed. The feed may be silly and useless, but it’s still valid. If the validator chooses to support the validation of certain extensions, the warnings and errors related to those should be kept separate from the warnings and errors relating to the validity of the feed.
Please read the body of this post. I readily will agree that employing the use of white lists involves a tradeoff. One that on balance trades off a few false positives for a more robust approach to validation. I do not agree with your premise that all that should be thrown away lightly.
No one is saying that anything should be thrown away. What I’m saying is that a number of questionably-defensible error conditions should be changed to warnings. What I’m saying is that if you’re going to presume to validate a feed based on specs and documented best practices, there ought to be actual spec language and documented best practices to back it up.
Aristotle: That’s not what I said. The Atom namespace is well documented and does not include an element called “atom:LiNK”. That would be an obvious error that is well supported by existing spec language. What I am saying is that it should not be an error to use known elements (e.g. app:categories) in new and undocumented ways when the specification of those elements does not explicitly rule out such use.
For that matter, an RFC 4287bis could very well define additional root elements. Whee! Everything is valid. Nobody goes home without a trophy. Not.
That reminds me, I need to dig up and dust off the source to my OPML validator. It was... concise. Didn’t have a cool trophy icon, though. Sounds like a good LazyWeb project.
I linked to it - how much clearer could I be? Considering your name is listed in the copyright at the bottom of the page, I would have thought you had some idea of its existence.
And your point is? I’ve been very consistent about what I think is and isn’t valid and how I believe the validator should be handling extensions. In each case where this has come up, I’ve been able to demonstrate a real use case and back it up with spec text or precedent. Where is the spec text that says it is invalid to use app:categories as an extension element within an atom:feed element? (I’d even settle for a documented best practice.)
I am trying to understand why it would be unreasonable for the validator to issue a warning rather than an error when a known extension element is used in any context other than what it was originally intended when the specs for those extensions do not implicitly or explicitly rule out such use.
It would be helpful to this discussion if you could demonstrate that you understand the concept of a whitelist
Heh... it would be more helpful if you’d just answer the question.
Perhaps if I asked the question differently it would help: Regardless of the method you are using to validate feeds, why is it ok for the feed validator to say that a perfectly valid feed is invalid simply because you do not agree with how a particular extension element is being used?
Regardless of the method you are using to validate feeds, why is it ok for the feed validator to say that a perfectly valid feed is invalid simply because you do not agree with how a particular extension element is being used?
Sigh. Look up the definition of false positive. I’ve used it several times.
The very same line of code that produces valuable feedback on misplaced itunes categories also provides incorrect feedback sometimes. A whitelist of elements and locations fixes those specific problems. An answer of all elements everywhere misses the point, as it suppresses valuable feedback. Trying to patch that approach with a blacklist is a dead end, as it requires you to enumerate all possible misspellings. The feed validator has a lot of test cases, but nowhere near enough to support such an approach.
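As a rough illustration of how a whitelist of elements and locations behaves (a sketch only, with made-up names such as `ALLOWED` and `check`, not the validator's real data structures): the same lookup that flags a misplaced itunes:category also flags a legitimate usage nobody has listed yet, and the fix for the latter is one more entry rather than a looser rule.

```python
# Hypothetical whitelist of (parent element, child element) placements.
ALLOWED = {
    ("channel", "itunes:category"),
    ("item", "guid"),
}


def check(parent, child, allowed=ALLOWED):
    """Return 'ok' for a whitelisted placement and 'error' for anything else."""
    return "ok" if (parent, child) in allowed else "error"


# Valuable feedback: itunes:category does not belong inside an item.
print(check("item", "itunes:category"))

# False positive: atom:link inside an RSS 2.0 channel is a documented, deployed usage.
print(check("channel", "atom:link"))

# The fix is to whitelist just that use case and nothing more.
extended = ALLOWED | {("channel", "atom:link")}
print(check("channel", "atom:link", extended))
print(check("item", "itunes:category", extended))
```

Note that the misplaced itunes:category is still reported after the whitelist grows.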
The very same line of code that produces valuable feedback on misplaced itunes categories also provides incorrect feedback sometimes. A whitelist of elements and locations fixes those specific problems. An answer of all elements everywhere misses the point, as it suppresses valuable feedback. Trying to patch that approach with a blacklist is a dead end, as it requires you to enumerate all possible misspellings. The feed validator has a lot of test cases, but nowhere near enough to support such an approach.
It seems to me that when a known extension element is used in an unexpected yet valid way, a warning would be more appropriate than an error in that it provides valuable feedback AND avoids the false positive.
The very same line of code that produces valuable feedback on misplaced itunes categories also provides incorrect feedback sometimes.
I think the point JamesS is trying to make is that when there’s a chance of incorrect feedback, the feed validator should be giving a warning rather than an error. I get where he’s coming from, and probably have argued that myself in the past, but I’ve come around to your way of thinking. Especially with the warning having been toned down to a recommendation (which I think is good), something like a misplaced itunes category is too important to just be warned. And, as you say, when there’s a false positive (from a real world use case) you can whitelist it.
JamesH: iTunes is very specific on where in the feed its elements can be used and the meaning/usefulness of those tags within a feed is well-established. I would fully expect that any validator that claims to comprehend the itunes namespace would signal an error when dealing with an out of place itunes:category element. However, RFC5023 is not as explicit about where the app:categories element can be used. It defines two locations where it is meaningful within the context of Atom Service Documents and says absolutely nothing about its use in RFC4287 documents. The most a validator claiming to comprehend the Atompub namespace can reasonably do is signal a warning when the app:categories element is used within an atom:feed.
FWIW, here’s another example that is based on a “real-world, non fabricated, not hypothetical and deployed” use case. Specifically, the Lotus Connections Activities component implements the notion of a “Collection of Collections”. Within the top level collection, each entry represents a sub-collection and contains a corresponding app:collection element.
And another... this one also based on a “real-world, non fabricated, not hypothetical” approach we are currently exploring as a solution to the problem that Atom Service Documents have no means of uniquely identifying Atompub collections or differentiating between different kinds of Atompub collections (e.g. a service document may have one workspace with a collection used to manage blog instances along with one workspace per individual blog instance). Unfortunately, the FeedValidator’s whitelist is incorrectly claiming that this solution is invalid.
Oh, and as a side note: it appears that the validator may be having problems on service documents served with the proper application/atomsvc+xml media type. When I attempt to serve up the document using the proper media type, the validator complains that it can’t locate the file.
JamesS: I can’t really comment on your specific case since I know almost nothing about atompub elements and how they’re being used (or are intended to be used) - I stopped following atompub some time ago. I’m just saying that, in general, I think it’s perfectly reasonable for the feed validator to mark everything as invalid that doesn’t have a known valid use case. That’s the whole point of whitelisting. When in doubt, assume the worst - and in this case the worst means invalid.
If you want to argue that your particular usage should be added to the whitelist, that’s a different issue (which as I say I can’t comment on).
Sam: I just noticed that the fragment part of the urls in your comment feed all have an extraneous “.0” on the end at the moment. Looking at past comments, it seems to have started sometime around November 6th.
Why is a validator expected to proclaim valid extensions it does not support? Isn’t acting as a white list what a validator is all about?
Of course, one might argue that people get scared of extensions if they are proclaimed invalid. But wouldn’t it be better if writers of extensions specs (we want there to be specs for them, right?) coordinated with validator developers to permit the extensions from Day One of each extension?
That is, isn’t it better to make the process of amending the white list as low-barrier as possible instead of letting extensions pass by putting them into the unchecked space of a black list?
FWIW, the way I try to tackle the issue in Validator.nu is allowing user-provided schemas, so users can use their extended copies of schemas. This way the validator doesn’t let typos go unnoticed but allows users who know what they are doing to punch the holes they want.
Thanks. It is worth noting that it took quite a lengthy period of time for me to come around to that way of thinking. Time looking at a lot of feeds, both buggy and non-buggy. Time listening to the questions, comments, and complaints that have shown up on the feed validator mailing list, and on other feed-related lists (like rss-public).
here’s another example ... and another
Both test cases have been added (entry-with-collection, service-with-id) and fixes have been made and deployed. The fixes included not only adding these elements, but also adding supporting infrastructure and fixing latent bugs in the feed validator itself necessary to make this work. I’m not suggesting that it was hard, just that it wasn’t automatic or free.
it appears that the validator may be having problems on service documents served with the proper application/atomsvc+xml media type.
Fixed. Thanks!
I just noticed that the fragment part of the urls in your comment feed all have an extraneous “.0” on the end at the moment.
Fixed. Thanks!
And why would a “Questionable use of a Known Extension” warning not also be appropriate?
While the current feedvalidator source does have a list of known namespaces, it does not have a centralized list of known elements. More importantly, most elements have a lot of implicit semantics, and that requires some code. A few examples from just the two test cases you provided: I’m assuming that a workspace can have multiple categories but can only have one id. And that categories can have a term attribute but an id can not. And that ids can’t be duplicated in a service document (whereas they can in an atom feed). Hardcoded knowledge such as this allows the feedvalidator to produce error messages such as this one.
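To give a feel for what "that requires some code" means, here is a small sketch of the three assumptions listed above, written against Python's standard xml.etree.ElementTree. It is an illustration, not the feedvalidator's implementation: the function name `check_service` and the wording of the messages are invented here.

```python
import xml.etree.ElementTree as ET

APP = "http://www.w3.org/2007/app"    # Atom Publishing Protocol namespace
ATOM = "http://www.w3.org/2005/Atom"  # Atom namespace


def check_service(xml_text):
    """Return a list of problems found in a service document (illustrative only)."""
    problems = []
    seen_ids = set()
    root = ET.fromstring(xml_text)
    for workspace in root.iter("{%s}workspace" % APP):
        # Only direct atom:id children of the workspace are considered.
        ids = workspace.findall("{%s}id" % ATOM)
        # Assumption 1: a workspace can have only one id.
        if len(ids) > 1:
            problems.append("workspace has more than one atom:id")
        for id_element in ids:
            # Assumption 2: an id cannot carry a term attribute.
            if "term" in id_element.attrib:
                problems.append("atom:id must not have a term attribute")
            # Assumption 3: ids cannot be duplicated within a service document.
            value = (id_element.text or "").strip()
            if value in seen_ids:
                problems.append("duplicate atom:id: " + value)
            seen_ids.add(value)
    return problems


doc = """<service xmlns="http://www.w3.org/2007/app"
                  xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace><atom:id>urn:example:a</atom:id></workspace>
  <workspace><atom:id>urn:example:a</atom:id></workspace>
</service>"""
print(check_service(doc))
```

None of these checks fall out of a flat list of element names, which is why each newly accepted usage tends to need a little code of its own.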
Normally, the above can be discussed and consensus can be reached before being deployed in the feed validator. Ideally, this discussion would take place in a mailing list like atom-syntax where others may participate. It boggles my mind that Lotus Connections has deployed (at least internally) without that discussion taking place.
In any case, my point here is that the checks that the feed validator makes are context dependent. Atom elements in Atom feeds have a certain meaning. Atom elements in an RSS feed tend to enforce less semantics. Atom elements in other contexts may have more or less semantics, and the only way to determine that would be to look at actual use cases.
Henri: Why is a validator expected to proclaim valid extensions it does not support
Never said it was.
Sam: "More importantly, most elements have a lot of implicit semantics, and that requires some code. A few examples from just the two test cases you provided: I’m assuming that a workspace can have multiple categories but can only have one id. And that categories can have a term attribute but an id can not. And that ids can’t be duplicated in a service document (whereas they can in an atom feed). Hardcoded knowledge such as this..."
While I appreciate you fixing the validator to address the false positives I posted, I never asked or suggested that the validator needs to be able to validate the use of elements like atom:id when they’re unexpectedly used as extensions in other contexts. In the absence of clear public documentation, the most the validator should be doing is returning a “Questionable Use” warning.
Sam: "Ideally, this discussion would take place in a mailing list like atom-syntax where others may participate. It boggles my mind that Lotus Connections has deployed (at least internally) without that discussion taking place."
I wasn’t aware that implementors had to get prior approval from the mailing list to deploy new solutions; I had assumed that conforming to the relevant specifications and documented best practices would be enough but, hey, I guess not. And, FWIW, I’ve discussed the use of atom:id’s in service documents and app:collection’s in entries on several occasions on the atompub mailing list.
Sam: "In any case, my point here is that the checks that the feed validator makes are context dependent. Atom elements in Atom feeds have a certain meaning. Atom elements in an RSS feed tend to enforce less semantics. Atom elements in other contexts may have more or less semantics, and the only way to determine that would be to look at actual use cases."
Once again I have to ask: why would a “Questionable Use” warning not be appropriate? Given a lack of context, when you encounter a known element in an unexpected location and you have no idea why it is there, the only test you can reasonably fall back on is a) whether the container element allows it to be there and b) whether the definition of the element explicitly rules out such use. If it passes either of those checks, issue a “Questionable Use” warning and move on to the next item. From an implementation point of view, I cannot see how issuing such a warning would be difficult to do.
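A sketch of the fallback test described above, in Python. It reads the proposal as requiring both conditions to hold (the container permits extension elements, and the element's own definition does not rule the placement out); that reading, along with the names `OPEN_CONTAINERS`, `RULED_OUT`, and `fallback_severity` and the contents of both sets, is an assumption made for illustration.

```python
# Assumption: containers whose spec allows foreign (extension) child elements.
OPEN_CONTAINERS = {"atom:feed", "atom:entry"}

# Assumption: placements that the element's own documentation rules out.
RULED_OUT = {("item", "itunes:category")}


def fallback_severity(container, element):
    """Severity for a known element found in a location the validator did not expect."""
    allowed_by_container = container in OPEN_CONTAINERS
    ruled_out_by_definition = (container, element) in RULED_OUT
    if allowed_by_container and not ruled_out_by_definition:
        return "warning"  # "Questionable Use": report it and move on
    return "error"


print(fallback_severity("atom:feed", "app:categories"))
print(fallback_severity("item", "itunes:category"))
```

The open question in the thread is less about whether this is hard to write and more about who maintains the two sets and on what evidence.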
Henri: Why is a validator expected to proclaim valid extensions it does not support
Never said it was.
I disagree.
James Snell: RFC4287 is very clear about the fact that any namespaced elements are allowed as extensions within atom:feed. So is the Feed Validator right to mark the feed invalid? I don’t think so. (emphasis added)
I guess that’s easier than answering the question.
I count at least seven attempts to answer that question in the text above. Adding an eighth (or ninth? or tenth? I lost count) seems pointless.
I now realize that my sentence was ambiguous. It was not meant as an imperative, but merely as a statement or an observation. Your lapsing into sarcasm and injecting sentiments that weren’t expressed merely for the emotional impact that such would create is somewhat less than constructive. There clearly is no discussion going on here, what there is is a soliloquy.
Here’s an offer. I will mark these specific additions as questionable if that is what you wish. Not because they are “unrecognized” and certainly I will not downgrade the error of placing itunes:category to warning as you originally suggested, but because these usages are recognized as questionable.
The key lesson from this discussion appears to be this: The FeedValidator will, on occasion, indicate that perfectly valid feeds are invalid for no reason that can be explained by looking at the relevant specs or documentation. Rather than fixing...
The following comment is awaiting moderation on James Snell’s blog:
I’ve thought about it overnight. There are vocabularies like SSE and iTunes that are very context specific. There are vocabularies like Dublin Core (and perhaps large portions of Atom) that are fairly generic.
A constructive way to contribute would be to suggest a list of element names that would be whitelisted for generic usage. When elements in this whitelist are found in unexpected locations a “Questionable Use of Extension Element” warning would be generated. Upon demonstration of a real and public use, coupled with even the most minimal amount of documentation, the warning on this specific usage would be eliminated. My intent continues to be to make this a low bar to encourage reuse, and the warning itself would reflect this.
Elements that are in completely unknown vocabularies will generate a different warning.
Elements that are in “known” vocabularies but are inappropriately included in or excluded from the whitelist would be treated as a simple FeedValidator bug.
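One way to picture the proposal, as a sketch under stated assumptions rather than a description of how the FeedValidator is or will be built: the set contents, the `classify` function, and the message strings below are all hypothetical, and element names are assumed to be written as prefix:name.

```python
# Hypothetical data, for illustration only.
KNOWN_NAMESPACES = {"atom", "dc", "itunes"}
GENERIC_ELEMENTS = {"dc:date", "atom:link"}  # candidates for generic reuse
ALLOWED_PLACEMENTS = {("channel", "itunes:category"), ("channel", "atom:link")}


def classify(parent, element):
    """Map an element in a given location to the outcome sketched in the proposal."""
    prefix = element.split(":", 1)[0]
    if (parent, element) in ALLOWED_PLACEMENTS:
        return "ok"
    if prefix not in KNOWN_NAMESPACES:
        return "warning: unknown vocabulary"
    if element in GENERIC_ELEMENTS:
        return "warning: questionable use of extension element"
    # Context-specific vocabularies (such as iTunes) keep their error.
    return "error"


for case in [("channel", "itunes:category"), ("item", "itunes:category"),
             ("item", "dc:date"), ("item", "soap:Body")]:
    print(case, classify(*case))
```

Under this scheme, demonstrating a real and public use moves a placement from the questionable tier into the allowed set.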
The current list of known namespaces can be found here:
Not quite. What I’m complaining about is the fact that when I search for something on the weblog which isn’t there, I should be told that the search failed (and given a 404) rather than splashing up an unfriendly page with the message “The requested URL /blog/?q=foobarbaz was not found on this server”. The search interface is broken because it doesn’t handle the failure case properly. Compare a failing search here with a gracefully failing search.
As a post scriptum, sorry for not explaining what I meant more clearly originally. I was a little surprised by the stark Apache 404 page and expected something that looked more like the rest of the site, with an explanation that the search found nothing.
Ah, but still, it’s a rather stark and unfriendly carefully crafted facsimile. :-)
Seriously though, why did you choose to do it that way rather than something somewhat more user-friendly? The first thing that popped into my head when I saw it was that I’d gone and done something wrong somehow.
why did you choose to do it that way rather than something somewhat more user-friendly?
Mostly because I was solving another problem at the time (I don’t recall which one, one of the problems I had was crawlers chasing archives to the ends of time — in both directions), and then simply reused the technique in another situation.
I also have a tendency to think of the features on my weblog as only things I would use.