"""
While there unquestionably are many applications of XML for which strict,
draconian error handling is appropriate, there are also a number of use cases
for which robust scavenging is required, as is evidenced by the popularity of
libraries such as BeautifulSoup and the Universal Feed Parser.

HTML5's grammar is a rich blend of SGML (the common ancestor of both HTML and
XML), XML, and custom parsing rules. These rules were arrived at by observing
the effective consensus that browser vendors have converged on in the process
of dealing with the enormous diversity of documents that exist on the
internet, many of them produced by hand editing and copy/pasting templates.

Much of that experience can directly benefit those who need to recover data
from malformed XML at any cost, particularly when the XML documents are
produced using the same hand editing, copy/pasting, and templating techniques
that are used to produce invalid HTML. Additionally, given the rough
similarity between HTML and XML syntax, naive users will often copy things
that happen to work in HTML into XML documents.

Just be aware that in scavenge mode, some data will be interpreted in ways
different from what the author intended, as such intent can't be determined.
Also be aware that some of the more advanced XML features that are less
commonly used in hand-produced XML, like internal DTD subsets, are not
supported by this process. For these reasons, it is recommended that data
first be parsed by a "real" XML parser and that this logic only be used as a
fallback.

References:
 * http://googlereader.blogspot.com/2005/12/xml-errors-in-feeds.html
 * http://wiki.whatwg.org/wiki/HtmlVsXhtml

@@TODO:
 * Build a Treebuilder that produces Python DOM objects:
   http://docs.python.org/lib/module-xml.dom.html
 * Produce SAX events based on the produced DOM. This is intended not to
   support streaming, but rather to support application level compatibility.
 * Optional namespace support
 * Special case the output of XHTML