It’s just data

Authoring Conformance Requirements

The HTML5 validator produces both errors and warnings. I personally believe that many of the so-called “errors” are at best SHOULDs. At worst, they pose no real interoperability problems and are so frequently violated that the messages produced only serve to obscure real problems.

To help evaluate this thesis, I’ve analyzed a few sites, categorized each error and warning, and taken a first pass at sorting the messages. Those I have sorted to the top are the ones I think are less likely to be intentional and/or more likely to cause interoperability issues. Those that appear later tend to be ones that I find either likely to be willful violations or unlikely to cause any problems at all.

I want to stress that this ordering was done quickly and is likely to contain many, many errors. I’m presenting it early in the hope that others will comment on it. Such comments may very well influence any further exploration I do in this area.
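For anyone who wants to repeat the tallying step on their own data, here is a minimal sketch in Python. It is not the script used for this post. The input shape (`SAMPLE_RESULTS`), the site names, and the two placeholder messages are hypothetical; only the gb2312 warning text is quoted from a real validator result. The sort order is also an assumption: it uses "how many sites trigger this message" as a crude stand-in for the hand sorting described above, so that the most widely violated messages land at the bottom.

```python
from collections import Counter, defaultdict

# Hypothetical sample input: site -> list of (kind, message) pairs.
# The two "placeholder" messages are invented for illustration only.
SAMPLE_RESULTS = {
    "site-a.example": [
        ("error", "placeholder: obsolete element used"),
        ("error", "placeholder: obsolete element used"),
        ("error", "placeholder: duplicate ID"),
    ],
    "site-b.example": [
        ("error", "placeholder: obsolete element used"),
        ("warning", "The character encoding gb2312 is not widely supported."),
    ],
    "site-c.example": [
        ("error", "placeholder: obsolete element used"),
    ],
}


def tally(results):
    """Return (sites per message, total occurrences per message)."""
    sites_by_message = defaultdict(set)
    occurrences = Counter()
    for site, messages in results.items():
        for kind, message in messages:
            key = (kind, message)
            sites_by_message[key].add(site)
            occurrences[key] += 1
    return sites_by_message, occurrences


def rank(results):
    """Sort messages so that the rarest come first.

    Assumption: a message seen on few sites is more likely to be an
    accident worth a closer look, while one seen on nearly every site
    is more likely to be a willful or harmless violation.
    """
    sites_by_message, occurrences = tally(results)
    rows = [
        (kind, message, len(sites), occurrences[(kind, message)])
        for (kind, message), sites in sites_by_message.items()
    ]
    # Fewest sites first, then fewest occurrences, then a stable text order.
    rows.sort(key=lambda r: (r[2], r[3], r[0], r[1]))
    return rows


if __name__ == "__main__":
    for kind, message, n_sites, n_hits in rank(SAMPLE_RESULTS):
        print(f"{n_sites} site(s), {n_hits} hit(s)  [{kind}] {message}")
```

Frequency alone cannot tell a willful violation from a common mistake, so a ranking like this is only a starting point for the manual review.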


From Alexa Top 500:

  1. google.com
  2. facebook.com
  3. yahoo.com
  4. youtube.com
  5. live.com
  6. wikipedia.org
  7. blogger.com
  8. baidu.com
  9. msn.com
  10. qq.com
  11. yahoo.co.jp
  12. twitter.com
  13. google.co.in
  14. sina.com.cn
  15. google.cn
  16. google.de
  17. wordpress.com
  18. myspace.com
  19. microsoft.com
  20. google.co.uk

Honorable mentions:

HTML5 notables (and/or organizations I’m associated with):

See also Bug 7034.


Awesome data!

Some of these errors are bogus or misclassified. I filed a few validator.nu bugs:

[link]
[link]
[link]

Posted by Maciej Stachowiak at

I’m sure that the Chinese authors of sina.com.cn found the following warning useful:

The character encoding gb2312 is not widely supported. Better interoperability may be achieved by using UTF-8.

Posted by Leif Halvard Silli at

Sam, you can’t explain orchards to people who only understand apple-pie-on-my-plate.

[link]

This validator.nu result reminds me of a scene from The Incredibles:

Syndrome (to Mr. Incredible): “I’ll give them the most spectacular heroics anyone’s ever seen. And when I’m old and I’ve had my fun, I’ll sell my inventions so that everyone can be superheroes. Everyone can be super. And when everyone’s super, no one will be.” [evil laughter]

Posted by Shelley at

Aryeh Gregor and I have started gathering some more detailed data (including hand-classification of the errors). If anyone wants to help (or add additional data or cover more sites) that would be most welcome.

[link]

Posted by Maciej Stachowiak at

Updated and cross-referenced.

Posted by Sam Ruby at

FWIW, the google.com results are actually for google.fr. html5.validator.nu is hosted in France, so google.com redirects to google.fr by IP-to-country mapping.

Posted by Henri Sivonen at

HTML5 Authoring Conformance Study

Methodology (revision as of 20:37, 28 March 2010): For pages that declared themselves to be something other than HTML5, [link] was used to validate them as...

Excerpt from HTML WG Wiki - Recent changes [en] at

Tracking changes to the spec.

Posted by Sam Ruby at
