Whitelisting

2007-11-16T06:57:39Z

From time to time, the subject of whether to use whitelists or blacklists comes up. For example, when Mark Pilgrim originally wrote How To Consume RSS Safely (way back in 2003!), he described a list of elements that needed to be blacklisted, and mentioned, almost in passing, that whitelisting might be a reasonable alternative. Over time, Mark came to realize that there really isn’t any contest: a whitelist is the best way to validate input. It basically comes down to which kind of error you are willing to tolerate: a blacklist lets through anything you failed to anticipate, while a whitelist occasionally rejects something harmless that you simply haven’t listed yet.
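To make the trade-off concrete, here is a minimal sketch contrasting the two approaches for filtering element names in feed content. The element lists are hypothetical and deliberately tiny; they are not the lists from Mark’s article or from any real sanitizer.

```python
# Hypothetical element lists, for illustration only.
BLACKLIST = {"script", "iframe", "object", "embed"}
WHITELIST = {"a", "b", "blockquote", "br", "em", "i", "li", "ol", "p", "strong", "ul"}


def allowed_by_blacklist(tag: str) -> bool:
    # Anything not explicitly forbidden gets through.
    return tag.lower() not in BLACKLIST


def allowed_by_whitelist(tag: str) -> bool:
    # Only explicitly permitted elements get through.
    return tag.lower() in WHITELIST


for tag in ["p", "script", "applet", "blink"]:
    print(f"{tag:8} blacklist={allowed_by_blacklist(tag)!s:5} whitelist={allowed_by_whitelist(tag)}")
```

Both filters agree on `p` and `script`. They differ on elements nobody thought to list: the blacklist waves `applet` through (a dangerous miss), while the whitelist rejects `blink` (a harmless false positive). That second kind of error is the one a whitelist asks you to tolerate.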

Another place this comes up is the Feed Validator. Originally, the Feed Validator employed a mix of strategies, but over time I’ve been converting each and every check over to a whitelist strategy.

Why? I can’t begin to list all the possible misspellings of isPermaLink, nor all of the places where itunes:category cannot be placed.
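A whitelist sidesteps both problems by listing only what is permitted. The sketch below is a simplified illustration, not the Feed Validator’s actual code: the rule tables and function names are hypothetical, and it assumes, for the sake of the example, that itunes:category may appear only directly inside channel or nested inside another itunes:category.

```python
# Hypothetical rule tables; the Feed Validator's real rules differ.
ALLOWED_ATTRIBUTES = {
    "guid": {"isPermaLink"},
}
ALLOWED_PARENTS = {
    # Assumed placement rule, for illustration only.
    "itunes:category": {"channel", "itunes:category"},
}


def check_attributes(element: str, attrs: dict[str, str]) -> list[str]:
    """Report every attribute not explicitly whitelisted for this element."""
    allowed = ALLOWED_ATTRIBUTES.get(element, set())
    return [f"unexpected attribute {name!r} on <{element}>"
            for name in attrs if name not in allowed]


def check_parent(element: str, parent: str) -> list[str]:
    """Report a placement unless the parent is explicitly whitelisted."""
    allowed = ALLOWED_PARENTS.get(element, set())
    if parent in allowed:
        return []
    return [f"<{element}> not allowed inside <{parent}>"]


print(check_attributes("guid", {"isPermalink": "false"}))
print(check_parent("itunes:category", "item"))
```

Neither check needs to know that `isPermalink` is a common misspelling or that `item` is a common wrong parent; anything not on the list is reported. The flip side is that an element or attribute missing from the tables is rejected too, which is exactly where false positives come from.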

This does mean that, from time to time, people will run into false positives. All I can say when that happens is that I’m sorry, and I will try to be responsive when people provide specific use cases. Even then, every effort will be made to whitelist just those use cases and nothing more. A relatively recent example involved the use of specific RDF elements in the context of an Atom feed. This came up in September, and again this month.