intertwingly

It’s just data

Validate on subscription?


I've thought about Brent's proposed compromise, and to borrow a phrase that is a favorite of Tim Bray, I think that there is a way that 80% of the value can be obtained with 20% of the effort.  Is there really a market requirement to be selectively pedantic on a feed-by-feed basis?

It seems to me that there are two levels of errors: unrecoverable and recoverable.  An HTTP status code of 404 is something that the aggregator cannot work around.  On the other hand, a malformed date may marginally degrade the user's experience, but arguably should not prevent the user from seeing what other data can be salvaged from the feed.
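To make the two levels concrete, here is a minimal sketch in Python. The names `Severity` and `classify` are hypothetical, and treating every 4xx or 5xx status as unrecoverable is an assumption made for illustration; only the 404 case comes from the discussion above.

```python
from enum import Enum
from typing import List, Optional


class Severity(Enum):
    UNRECOVERABLE = "unrecoverable"  # nothing can be shown to the user
    RECOVERABLE = "recoverable"      # the feed is still usable, with caveats


def classify(http_status: int, parse_warnings: List[str]) -> Optional[Severity]:
    """Return the severity of a fetch, or None if nothing went wrong.

    Assumption: any 4xx/5xx status (404 included) is unrecoverable, while
    parser warnings such as a malformed date are recoverable.
    """
    if http_status >= 400:
        return Severity.UNRECOVERABLE
    if parse_warnings:
        return Severity.RECOVERABLE
    return None
```

An aggregator could then handle `Severity.UNRECOVERABLE` on every retrieval, and decide separately when, if ever, to surface the recoverable cases.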

Unrecoverable errors, by necessity, need to be handled each time a feed is retrieved, but do recoverable errors need to be reported each time such an error is encountered?  I mean, do thousands of people need to be alerted whenever a stray smart quote appears on boingboing?

From strictly an engineering point of view, is that the right design for a feedback loop?  My experience is that, in addition to alerting the wrong person, an overabundance of such alerts tends to dull the message.  People simply will tune them out.

An alternative might be to only validate on subscription.  This would certainly reduce the number of such messages.  It would also present such messages to users at a time when they might expect feedback.

I would also suggest that all such messages be oriented to their target audience.  If a feed contains encoding errors, let the user know that some characters may not appear as intended.  If the feed is missing a required element, tell the user what they will be missing.  If a date is not of the appropriate format, let the user know that such information may be misinterpreted or ignored.
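One way to keep messages oriented to the reader rather than the feed author is a simple lookup from error kind to consumer-facing wording. This is only a sketch: the error kinds, the `describe` function, and the exact wording are all hypothetical.

```python
# Hypothetical error kinds mapped to wording aimed at the person reading
# the feed, not at the person who produced it.
USER_MESSAGES = {
    "encoding": "Some characters in this feed may not appear as intended.",
    "missing_element": "This feed does not provide {detail}, so it will not be shown.",
    "bad_date": "Dates in this feed may be misinterpreted or ignored.",
}

FALLBACK = "This feed has a problem that may affect how it is displayed."


def describe(kind: str, detail: str = "") -> str:
    """Turn an internal error kind into a message for the end user."""
    return USER_MESSAGES.get(kind, FALLBACK).format(detail=detail)
```

For example, `describe("missing_element", "a title")` yields "This feed does not provide a title, so it will not be shown."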

This information could be accompanied by a simple checkbox to inhibit the display of further messages.
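Putting the last few ideas together, a rough sketch of validate-on-subscription with an opt-out might look like the following. The `Subscriptions` class and the `validate` callable are hypothetical, and modelling the checkbox as a single global `muted` flag (rather than a per-feed setting) is an assumption.

```python
from typing import Callable, List, Set


class Subscriptions:
    """Sketch: run recoverable-error checks once, when a feed is first added."""

    def __init__(self, validate: Callable[[str], List[str]]):
        self._validate = validate      # hypothetical: url -> list of warnings
        self._feeds: Set[str] = set()
        self.muted = False             # the "don't show this again" checkbox

    def subscribe(self, url: str) -> List[str]:
        """Add a feed; return warnings to show the user, if any."""
        is_new = url not in self._feeds
        self._feeds.add(url)
        if is_new and not self.muted:
            return self._validate(url)
        return []                      # already subscribed, or messages muted
```

Routine polling never calls `validate`, so the thousands of existing subscribers are not alerted each time a stray character shows up.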

Hopefully, such an approach will ultimately result in a more educated consumer base.  A greater demand for higher quality feeds would certainly not be an unwelcome side effect.  It also means that feeds would be sampled regularly.

Parting thought: in my opinion, such checks don't have to be bulletproof, merely effective.  Apply the 80/20 rule here too.  The well-formedness checks provided by your off-the-shelf parser can generally be obtained with a few lines of code.  Ditto for a simple scan for required elements.  I can share the regular expressions used by the feedvalidator.
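As an illustration of how little code this takes, here is a sketch using Python's standard `xml.etree.ElementTree` parser. It is not the feedvalidator's own code; the `check_feed` name is hypothetical, and it assumes an RSS 2.0 feed, where the channel's required elements are title, link, and description.

```python
import xml.etree.ElementTree as ET
from typing import List

# Required channel elements in RSS 2.0.
REQUIRED_CHANNEL_ELEMENTS = ("title", "link", "description")


def check_feed(text: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(text)          # well-formedness check
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    channel = root.find("channel")
    if channel is None:
        return ["no <channel> element found"]

    # Simple scan for required elements; merely effective, not bulletproof.
    return [
        f"missing required element <{name}>"
        for name in REQUIRED_CHANNEL_ELEMENTS
        if channel.find(name) is None
    ]
```

A check like this reports problems in terms of the specification itself, and deliberately ignores anything beyond well-formedness and the presence of required elements.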

However, I do have one suggestion.  I would suggest that this not lead to a practice whereby each consumer documents what subset or superset of the various specifications they support at the moment.  It would be better for all concerned if such checks are made, and errors are reported, in terms of the original specifications.