html5lib 0.10
If you can cite an example, and in particular, cite the behavior that IE exhibits in response to this example, I am confident that it will be taken seriously and quickly converted into spec text, a test case, and fixes to both code bases. Furthermore, I would expect the browser vendors that participate in the WHATWG to come into compliance with the specified behavior.
Posted by Sam Ruby at
Though not with all of the IE behavior, Sam. We want trees, not graphs.
Posted by Anne van Kesteren at
As far as entity parsing in particular goes, I believe the algorithms match what IE implements. The exception is that some entities that IE supports but nobody else does are not part of HTML5 (yet).
Posted by Anne van Kesteren at
I believe the algorithms match what IE implements.
Anne, given past history with James Holderness, I believe that he has spotted a case or two where it has not. But instead of giving specific error reports, he tends to start out with hyperbole ("don’t even come close") and will take every opportunity to cast vague aspersions ("someone’s supposition that has been erroneously quoted"). But stick with him, eventually he will produce a wonderfully detailed report of the issue, and it will be real.
James, Anne is from Opera software and is an active participant in WHATWG. If you can convince him that there is a problem, he will generally take it from there.
Posted by Sam Ruby at
[The following three entries were rescued from my log files; see this bug - Sam Ruby]
If you can cite an example, and in particular, cite the behavior that IE exhibits in response to this example
AtoZ
By my reading of the spec, that should be displayed as AtoZ, but IE displays it as AtoZ.
Of course other browsers do things differently, and obviously the spec can’t match the behaviour of every browser exactly, but I’m curious whether there was a conscious decision to match any browser. You seem to imply that IE was the target, which would certainly make sense, but is that formally documented somewhere?
PS: Your live preview kind of burst into flames when I wrapped the IE output in a code marker (triple braces). I think it had something to do with the ampersand, because converting that to an amp entity solved the problem (I didn’t think that was a good idea though).
Posted by James Holderness
For some reason I keep getting “POST limit exceeded” when I try to reply. I’m wondering whether the content has anything to do with it, so trying again with something simple.
Posted by James Holderness
If you can cite an example, and in particular, cite the behavior that IE exhibits in response to this example
AtoZ
By my reading of the spec, that should be displayed as AtoZ, but IE displays it as AtoZ.
Of course other browsers do things differently, and obviously the spec can’t match the behaviour of every browser exactly, but I’m curious whether there was a conscious decision to match any browser. You seem to imply that IE was the target, which would certainly make sense, but is that formally documented somewhere?
PS: Your live preview kind of burst into flames when I wrapped the IE output in a code marker (triple braces). I think it had something to do with the ampersand, because converting that to an amp entity solved the problem (I didn’t think that was a good idea though).
Posted by James Holderness at
Of course other browsers do things differently, and obviously the spec can’t match the behaviour of every browser exactly, but I’m curious whether there was a conscious decision to match any browser. You seem to imply that IE was the target, which would certainly make sense, but is that formally documented somewhere?
The target is to be good enough that it is possible to implement the HTML 5 spec directly and produce a parser that is compatible with enough content to be deployed in a mainstream web browser. In practice that means the spec is largely based on the behaviour of IE simply because that is the browser against which most content is authored. However, in cases where existing mainstream web browsers do something different to IE and don’t have a large number of bug reports arising from the difference, exactly matching the IE behaviour is apparently not required for good enough web compatibility. In these cases, the spec can opt to choose a behaviour that is different to IE (but similar to another widely deployed browser) for reasons including performance, consistency, etc.
Of course, this doesn’t mean that this specific issue isn’t worth considering for change.
Posted by jgraham at
By my reading of the spec, that should be displayed as AtoZ, but IE displays it as AtoZ.
Firefox displays it as AtoZ. Producers would be well advised to avoid this.
Posted by Sam Ruby at
IE seems never to require the semicolon for &#DD, but always to require it for &#xHH.
Posted by zcorpan at
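The rule zcorpan describes can be modelled in a few lines of Python. This is only a minimal sketch of the stated rule, not IE's implementation and not the algorithm in the HTML 5 spec. The name `decode_numeric_refs` is made up for illustration, and the sketch deliberately ignores attribute values, which are handled differently.

```python
import re

# Hypothetical model of the rule described in the comment above:
# decimal references (&#DD) decode with or without a trailing semicolon,
# hexadecimal references (&#xHH) decode only when the semicolon is present.
# Assumption: regular content only; attribute values are not modelled.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+);|([0-9]+);?)")


def decode_numeric_refs(text):
    """Replace numeric character references according to the rule above."""

    def _replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Out-of-range code point: leave the reference untouched.
            return match.group(0)

    return _NUMERIC_REF.sub(_replace, text)


if __name__ == "__main__":
    for sample in ("&#65;", "&#65 ", "&#x41;", "&#x41 "):
        print(repr(sample), "->", repr(decode_numeric_refs(sample)))
```

Under this model a decimal reference is decoded even when the semicolon is missing, while a hexadecimal reference without a semicolon is passed through as literal text. A parser that follows a different rule will produce different output for the same input, which is the kind of difference under discussion here.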
PS: Your live preview kind of burst into flames when I wrapped the IE output in a code marker (triple braces). I think it had something to do with the ampersand, because converting that to an amp entity solved the problem (I didn’t think that was a good idea though).
Fixed. Thanks!
Posted by Sam Ruby at
But instead of giving specific error reports, he tends to start out with hyperbole ("don’t even come close")
Sorry, that was my fault for misreading the spec. At first glance, I honestly believed it didn’t come close to what any of the major browsers were doing (I didn’t notice the weird layout of the entities table). I’ll admit it’s fairly close to IE.
Firefox displays it as AtoZ. Producers would be well advised to avoid this.
That’s why the spec gives different instructions to producers. We’re talking about instructions for consumers here. Firefox also displays named entities differently to IE (as does Opera). In that case the spec appears to describe the IE behaviour, rather than the Firefox behaviour or the Opera behaviour. If this was a conscious design decision I would have expected more consistency, but I understand there are other factors that come into play.
In these cases, the spec can opt to choose a behaviour that is different to IE (but similar to another widely deployed browser) for reasons including performance, consistency, etc.
Fair enough. I just think it would have been nice to know up front what behaviour the spec was documenting so I could make an educated decision about using their parsing algorithms or not. The vague assertion that the spec is based on popular browsers doesn’t fill me with confidence. You have answered my question though. Thanks.
Posted by James Holderness at
The idea is that over time “popular browsers” adopt the specification and all do the same thing. I suppose it might be nice to also document how browsers currently don’t implement the specification (i.e. how they differ), but that’s a lot of additional work if you want to write it all down properly. (If you follow the WHATWG mailing list you’ll get lots of bits and pieces on how browsers implement the current specification though.)
Posted by Anne van Kesteren at
The idea is that over time “popular browsers” adopt the specification and all do the same thing.
Conversely, the hope is that those browsers that adopt the specification and all do the same thing become more popular. :-)
Posted by Sam Ruby at
IE seems never to require the semicolon for &#DD, but always to require it for &#xHH.
For the record, it’s slightly more complicated than that, since attributes are handled differently to regular content. Also, there are other minor differences between what the spec says and what browsers do (I’m not going to start listing them all - it’s easy enough to check if anybody cares). The thing is that none of this really matters if it isn’t the intention to match the behaviour of IE exactly, which is the impression I’m getting.
If you follow the WHATWG mailing list you’ll get lots of bits and pieces on how browsers implement the current specification though.
I was kind of hoping the WHATWG spec itself would be the definitive source for all that information (or at least in regards to IE error handling). Personally, all I’m interested in is how to write an IE-compatible parser. If the WHATWG spec can’t tell me that, it’s not going to be as useful to me as I’d hoped. But that’s my problem, not theirs.
Conversely, the hope is that those browsers that adopt the specification and all do the same thing become more popular. :-)
Good luck with that. :)
Posted by James Holderness at
I really like the work that the WHATWG is doing to document HTML parsing. In fact I hope to make use of their documentation to improve my own parsing code at some point. What bothers me, though, is the assertion I keep seeing that their algorithms are reverse engineered from popular web browsers. In the small area of HTML parsing with which I am somewhat familiar (entity parsing in particular), their algorithms don’t even come close to matching what any of the major browsers are doing.
I’m beginning to wonder whether it really was their intention to document current browser behaviour. Is that not just someone’s supposition that has been erroneously quoted as fact so often now that everyone believes it to be true?
Posted by James Holderness at