It’s just data

Sanitized For Your Protection

Jacques Distler: So I sat down and wrote a sanitization function for Instiki based, in part, on Sam’s Python code. ... It comes with unit tests, lots of unit tests.

At some point, I’ll likely backport this version to Python. I have an unfinished branch of Venus where all sgmllib processing is replaced with html5lib, and sanitization is done after that.

I would very much like to see a more robust (html5lib) error-correcting parser in the Ruby version.

This is not a pressing issue for me, since, in my case, the parser is consuming well-formed XHTML previously serialized by REXML. But, for more general use, it needs to be able to sanitize arbitrary tag-soup input.
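For readers who haven’t looked inside one, the core of a whitelist sanitizer for tag-soup input fits in a few dozen lines. The sketch below is not Jacques’s Ruby code, Sam’s Python code, or html5lib’s sanitizer. It uses Python’s standard-library `html.parser` as a stand-in tokenizer, which tolerates tag soup but does not implement the HTML5 tree-construction rules that html5lib does. The tag and attribute whitelists, the `sanitize` function, and the `WhitelistSanitizer` class are hypothetical names chosen for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists for illustration only; real sanitizers use far longer lists.
ALLOWED_TAGS = {"a", "b", "br", "code", "em", "i", "li", "ol", "p", "pre", "strong", "ul"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
SAFE_URL_PREFIXES = ("http:", "https:", "mailto:")
VOID_TAGS = {"br"}                  # elements that never get a closing tag
DROP_CONTENT = {"script", "style"}  # elements whose text is discarded entirely


class WhitelistSanitizer(HTMLParser):
    """Re-serializes a fragment, keeping only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive already decoded
        self.out = []
        self.open_tags = []  # stack of emitted, still-open elements
        self.skip = 0        # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # reject javascript: and any other unexpected scheme
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip = max(0, self.skip - 1)
            return
        if tag not in self.open_tags:
            return  # stray end tag (common in tag soup): ignore it
        # Close anything left open inside this element, then the element itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions fall through to
    # HTMLParser's default no-op handlers, so they are dropped.


def sanitize(fragment: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    # Close any elements the input never closed.
    parser.out.extend(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <b>there<script>alert(1)</script> <a href="javascript:evil()">x</a>'))
# prints: <p>Hi <b>there <a>x</a></b></p>
```

Unknown tags are dropped while their text is kept, `script`/`style` content is discarded, attribute values and text are re-escaped, and an explicit stack keeps the output well-formed even when the input isn’t. This sketch also rejects relative URLs. A production sanitizer would additionally filter CSS in `style` attributes and check many more URI-bearing attributes, which is exactly where lots of unit tests earn their keep.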

Also, more unit tests would be most welcome.

Posted by Jacques Distler at

P.S.: Just as a stylistic matter, you might want to

1. Remove the “1” from your quote. In the original, it was a superscript link to a footnote.
2. Link to the blog entry, rather than to the main page of the blog. Entries eventually fade from the main page (though, in my case, perhaps they don’t fade fast enough).

Posted by Jacques Distler at

Jacques: Both points fixed.  Thanks!

Posted by Sam Ruby at


Since you’re making links, you might want to turn unit tests into a link, too. (Also, people might search in vain for XHTML::Node if we don’t tell them where to find it.)

Posted by Jacques Distler at

you might want to turn unit tests into a link, too


Posted by Sam Ruby at

HTML5 Sanitizer

A while back, I commented that I would likely backport Jacques’s sanitizer to Python. I still haven’t gotten around to that, but I have ported it to html5lib (source, tests). My approach was slightly different.

Trackback from Sam Ruby

