It’s just data

No more "XML parsing failed" errors

Andreas Bovens: we’ve decided to stop throwing draconian XML parsing failed error messages, and instead, attempt to reparse the document automatically as HTML.
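To make the behaviour concrete, here is a minimal sketch of "try XML first, fall back to HTML on a well-formedness error". It is not Opera's implementation: it assumes Python's standard `xml.etree.ElementTree` and `html.parser` modules as stand-ins for a browser's two parsers, and `TagCollector` and `parse_with_fallback` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Tolerant HTML parser that only records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_with_fallback(markup):
    """Parse strictly as XML; on a parse error, reparse the same text as HTML.

    Returns (mode, tags), where mode is "xml" or "html".
    """
    try:
        root = ET.fromstring(markup)
        return "xml", [element.tag for element in root.iter()]
    except ET.ParseError:
        # The draconian path would stop here and show an error page.
        collector = TagCollector()
        collector.feed(markup)
        collector.close()
        return "html", collector.tags


well_formed = "<html><body><p>ok</p></body></html>"
broken = "<html><body><p>unclosed<br></body></html>"  # <br> is never closed

print(parse_with_fallback(well_formed)[0])  # xml
print(parse_with_fallback(broken)[0])       # html
```

The fallback is unconditional here: any parse error triggers the HTML path, whatever the document looks like.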

Note that the reason for this is to deal with bad browser sniffing: some sites take HTML/XHTML markup that is meant to be served as text/html and send it, to Opera only, as application/xhtml+xml, application/xml or text/xml. Opera then encounters an XML parse error, and the site breaks for Opera.

However, when the parse error is encountered, Opera doesn’t do any detection/sniffing to see if the markup looks like html or xhtml before it does the reparsing. This means that even an XML document served as application/xml with a root element of “zipzambam” will be automatically reparsed as HTML if there’s a parse error. It might be cooler if Opera only did the reparsing when the type is application/xhtml+xml or if the root element is “html” etc. A spec for this part might be cool.
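The narrower rule suggested here could look roughly like the sketch below. It is only an assumption about how such a check might be written, not anything Opera does: `should_reparse_as_html` and `root_element_name` are hypothetical names, and the regular expressions are a deliberately simple prolog skipper that does not handle a DOCTYPE with an internal subset.

```python
import re

XHTML_TYPE = "application/xhtml+xml"
GENERIC_XML_TYPES = {"application/xml", "text/xml"}

# XML declaration / processing instruction, comment, or simple DOCTYPE.
_PROLOG = re.compile(r"\s*(<\?.*?\?>|<!--.*?-->|<!DOCTYPE[^>]*>)",
                     re.DOTALL | re.IGNORECASE)
# First element name, optionally with a namespace prefix.
_ROOT = re.compile(r"\s*<([A-Za-z_][\w.\-]*(?::[\w.\-]+)?)")


def root_element_name(markup):
    """Return the name of the first element in the markup, or None."""
    pos = 1 if markup.startswith("\ufeff") else 0  # skip a decoded BOM
    while True:
        match = _PROLOG.match(markup, pos)
        if not match:
            break
        pos = match.end()
    match = _ROOT.match(markup, pos)
    return match.group(1) if match else None


def should_reparse_as_html(content_type, markup):
    """Reparse only for XHTML, or for generic XML whose root is 'html'."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == XHTML_TYPE:
        return True
    if media_type in GENERIC_XML_TYPES:
        name = root_element_name(markup)
        return name is not None and name.lower() == "html"
    return False


print(should_reparse_as_html("application/xml", "<zipzambam><a></zipzambam>"))
print(should_reparse_as_html(
    "text/xml; charset=utf-8",
    '<?xml version="1.0"?>\n<!DOCTYPE html>\n<html><p></html>'))
```

With this rule the "zipzambam" document keeps its error page, while a mislabelled HTML page still gets reparsed.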

Also, there’s currently no option in Opera to turn this off to revert to the old behavior. And, there’s no support for any no-reparse type of http header a site can send to hint to Opera not to do this.

Ultimately, this is a good change and a good default. The only problem is the user has no way to control it. There’s no global option, no site preference, and no way to turn it off for local files.

However, this is Opera’s first try with this. I’m sure they can make it better. I’ve always wanted to try out automatic reparsing like this. Now I can.

(Note: This doesn’t affect XHR if you were wondering.)

Posted by Michael A. Puls II at

Also, sometimes, sites not only send Opera the wrong mime type, but they also send Opera different content. In those cases, reparsing as HTML doesn’t really help Opera get what other browsers get.

Posted by Michael A. Puls II at

@Michael: “And, there’s no support for any no-reparse type of http header a site can send to hint to Opera not to do this.”

Note that the content-type is already a contract from the web site owner telling the rest of the world, parse this as XML. Adding a new header on failure is just pushing the failure a bit further. It doesn’t solve anything. It will work for the first few months and then it will be implemented in libraries and then [clueless|clumsy|framework-tied] developers will send wrong headers.

The first nasty issue is bad user-agent sniffing.
The second is sending content-type based on user agent instead of the type of the content.

Sniffing the type of content in the browser adds heavy heuristics which don’t necessarily work. You already mentioned the fact that Opera could be receiving a different representation of the resource. But there are also, for example, people who want to serve application/xhtml+xml for an XHTML file which looks like an HTML file. XHTML5 is possible. And then the heuristics fall apart.

It is a very interesting issue in terms of contracts, usability in distributed environment with browsers having different market shares. Users will always think the browser is wrong.

As for the error message about reparsing. Note that the solution is now only deployed in an experimental build and still being worked on. For example, it breaks acid3 test. That in turn raises the question of what the XML spec means by “stop processing” (with an XML parser). There is no way to know on the server side whether the client has switched its parsing engine. We could imagine a DOM flag DOM_PARSER_TYPE: [html|xml]. But then you would still need a round trip between the server and the client. At least a script could check locally in the DOM which flag has been set.

Posted by karl at

“XHTML5 is possible. And then the heuristics fall apart.”

I meant that in this case, it’d always reparse on error, even if it is XHTML5. Meaning, if using XHTML5 and expecting an error page if there’s a parse error, you still don’t get one.

“Sniffing the type of content in the browser adds heavy heuristics which don’t necessarily work.”

Correct. But, perhaps it could be just simple enough that it works for the sites Opera has problems with.

“As for the error message about reparsing. Note that the solution is now only deployed in an experimental build and still being worked on. For example, it breaks acid3 test.”

Yes. I’ve mentioned elsewhere that Opera could have a default site preference (in override_downloaded.ini like it does for other sites) to show the error page on acid3 instead of reparsing. Or, acid3 could be changed in some way. Even though it’s just a snapshot that has this, I think it makes sense to deal with use-cases fast so it doesn’t get stuck like this in a final build.

But, to be honest, the simplest solution is to just provide an option for advanced users to turn off the automatic reparsing.

“The first nasty issue is bad user-agent sniffing. The second is sending content-type based on user agent instead of the type of the content.”

True. But, when getting sites fixed fails, it’s time for mumbo jumbo parsing trickery.

Posted by Michael A. Puls II at

Advanced users have access to validators. Admittedly that doesn’t solve the problem when servers vary their responses based on sniffing the user agent...

Most browsers these days have the ability to notify users of pop-ups that were not processed or plugins that are required, often by a simple bar across the top that appears with a blind-down animation. If it is felt that it is important to notify the user of the recovery action that was taken (and to be quite honest, I am not convinced that it is), perhaps something like this could be effective.

Posted by Sam Ruby at

Michael said:

Ultimately, this is a good change and a good default. The only problem is the user has no way to control it.

Well, there is more than one problem with this. Reparsing as HTML has at least three problems:

(1) A variant of your point about lack of user control, but still: it introduces a gap between how the code is served and how it is consumed. Consequently, when using Dragonfly, one will have to be careful to check whether Dragonfly reports XML errors or HTML errors. Currently Dragonfly categorizes the XML error as an “other” error, whereas in reality it is the primary error ...

(2) An architectural problem is the fact that XML defaults to UTF-8, while HTML defaults to a locale-dependent encoding, typically Windows-1252. Thus, in the worst cases, reparsing an XML doc as HTML could in fact make the page unreadable.

(3) Yet another architectural issue is that XML does not have quirks mode. And so, if the page contains a BOM that contradicts the encoding info from HTTP, reparsing the page as HTML because it has an illegal BOM would lead to quirks mode, like on this page: [link] And that is exactly how Opera’s reparsing works. Bad, in my book.
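Point (2) is easy to reproduce outside a browser. The sketch below assumes a UTF-8 document with no encoding declaration and uses Python's `cp1252` codec as a stand-in for an HTML parser's Windows-1252 default; `decode_with_default` is a hypothetical helper, not a browser API.

```python
def decode_with_default(body, parser):
    """Decode undeclared bytes using the default each parser would assume."""
    default = "utf-8" if parser == "xml" else "cp1252"
    # errors="replace" because a few byte values are undefined in cp1252.
    return body.decode(default, errors="replace")


body = "café".encode("utf-8")  # the bytes as they travel over the wire

print(decode_with_default(body, "xml"))   # café
print(decode_with_default(body, "html"))  # cafÃ©
```

Every non-ASCII character turns into two or more wrong ones, which is what "unreadable" means in practice for text that is mostly non-ASCII.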

Other browsers are also dealing with these issues: when the BOM contradicts HTTP, Webkit and IE ignore the encoding info coming from HTTP. This is a behaviour that, from these browsers’ perspective, probably increases the number of well-formed XML pages. One could claim that they should rather have displayed a parsing error. And if Webkit and IE gave preference to the BOM over HTTP when reparsing, this would probably be problem-free for the user, since it would lead to standards-mode parsing. But if Webkit/IE gave heed to HTTP over the BOM, this would lead to quirks mode, as it currently does in Opera. (Firefox neither reparses nor forgives an illegal BOM, so in Firefox’s case, it just displays the YSOD.)

Opera has already tried to circumvent the BOM/quirks-mode problem by doing some BOM removal, against HTML5, so that an HTML page can be in standards mode even if it has an illegal BOM. Would it not be more meaningful to give heed to encoding (read: the BOM) first and foremost, perhaps as part of this new reparsing behaviour?

Opera’s BOM removal is just weird. Take this page: [link] It is served as ISO-8859-1, but in reality it is UTF-8 encoded and has a BOM. Thus, Webkit/IE “lock” it into UTF-8 and thus lands in quirks-mode. Firefox 7, however, adheres to HTML5 and to HTTP, and thus brings it into quirks-mode and windows-1252 encoding. This is all logical, even if I like Webkit/IE’s behavior more. But what does Opera do, including the alpha/beta of Opera 12? It removes the BOM, and thus lands in no-quirks mode.
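The two encoding policies described here (BOM wins versus HTTP wins) can be sketched as follows. This covers only the choice of encoding, not quirks-mode selection, and it is not any browser's actual code: `effective_encoding` and its `bom_wins` flag are hypothetical, and only the UTF-8 BOM is handled.

```python
import codecs


def effective_encoding(body, http_charset, bom_wins):
    """Pick an encoding when a UTF-8 BOM and the HTTP charset may disagree."""
    has_utf8_bom = body.startswith(codecs.BOM_UTF8)
    if has_utf8_bom and bom_wins:
        return "utf-8"  # the policy attributed above to Webkit/IE
    if http_charset:
        label = http_charset.strip().lower()
        # HTML treats the ISO-8859-1 label as windows-1252.
        return "windows-1252" if label == "iso-8859-1" else label
    return "utf-8" if has_utf8_bom else "windows-1252"


# A page like the one described: labelled ISO-8859-1, really UTF-8 with a BOM.
page = codecs.BOM_UTF8 + "é".encode("utf-8")

bom_first = effective_encoding(page, "ISO-8859-1", bom_wins=True)
http_first = effective_encoding(page, "ISO-8859-1", bom_wins=False)

print(bom_first, repr(page.decode("utf-8-sig")))  # utf-8 'é'
print(http_first, repr(page.decode(http_first)))  # windows-1252 'ï»¿Ã©'
```

Under the HTTP-first policy the BOM bytes themselves become visible text at the very start of the document, ahead of any doctype.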

Karl said:

It is a very interesting issue in terms of contracts, usability in distributed environment with browsers having different market shares. Users will always think the browser is wrong.

Yeah, that’s probably often true. But that doesn’t need to imply that we should feel sorry for the browser. Too much self-pity can lead to too much patching ... which in turn triggers more patching. I used to think that Opera’s HTML reparsing was a good thing. But when I discovered the patchy approach described above, I became critical of its current implementation.

Posted by Leif Halvard Silli at

Err, I said:

Thus, Webkit/IE “lock” it into UTF-8 and thus lands in quirks-mode.

I meant: Thus, Webkit/IE “lock” it into UTF-8 and thus lands in no-quirks-mode.

Posted by Leif Halvard Silli at

The first browser blinks on XHTML parsing

I’m late to the party, but Opera has decided to stop strict parsing of XHTML (via Sam Ruby): [...] we’ve decided to stop throwing draconian XML parsing failed error messages [on invalid XHTML], and...

Excerpt from Chris's Wiki :: blog at
