It’s just data

Misdirection

Anne van Kesteren: What also was funny was that the Web was not about the browser except that lots of people here at TPAC wanted browsers to do things differently.

I’m not there, but I can’t believe that anybody there would ever say or even want to imply that the web does not include browsers.  Perhaps the solution is to add the word “just” to the line, thus: the Web was not just about the browser.  There... all better.  Contradiction is all gone now.

It seems that the distributed extensibility discussion won’t go away, as some apparently hope it would.  This proposal only affects the performance of web pages with element and attribute names which contain a colon in them, and only affects the local name and namespace URI of such elements and attributes.  Those are values that are essentially unused in HTML4.
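To make concrete what “only affects the local name and namespace URI” could mean, here is a minimal sketch.  It is not the proposal’s actual algorithm: the function name `resolve_name` and the `in_scope` prefix map are hypothetical, and the rule of leaving unbound prefixes untouched is an assumption made for illustration.

```python
from typing import Dict, Optional, Tuple


def resolve_name(qname: str, in_scope: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Return (namespace_uri, local_name) for an element or attribute name.

    `in_scope` is a hypothetical map from declared prefixes to namespace
    URIs, as might be collected from xmlns:* attributes.
    """
    prefix, sep, local = qname.partition(":")
    if not sep or not prefix or not local:
        # No colon (or a degenerate name such as ":p"): nothing changes.
        return None, qname
    uri = in_scope.get(prefix)
    if uri is None:
        # Assumption: an undeclared prefix leaves the name exactly as it is
        # today, so existing pages see no difference.
        return None, qname
    return uri, local
```

Names without a colon take the first early return, which is the basis for the claim that most pages would not notice the change.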

It occurs to me that Anne may be intentionally being thick here.  “What is wrong with using XML for this?”  Come on.  I can answer that with two words: IE and Postel.  Next question?


Well, Postel is a good one (and XML5 theoretically solves that issue), but IE doesn’t do the other things they seem to want either. However, the discussion was pretty meta so I’m not sure I understood it all correctly.

Posted by Anne van Kesteren at

XML5 did sound promising, but the source was last updated 2007-10-21.  A solution that was tentatively explored and abandoned over a year ago is not a solution.  It needs to be both finished and implemented interoperably in at least two browsers.

My feeling is that while the technical approach is sound, calling it XML is a non-starter.  A new mime type is required.  Unfortunately, such an approach would also be a non-starter without buy-in from MS.

Until there is buy-in from MS (or until they become irrelevant, and neither appears likely to happen anytime soon), the only viable solutions are ones that involve the use of the text/html mime type.

Posted by Sam Ruby at

This proposal only affects the performance of web pages with element and attribute names which contain a colon in them

That “only” sounds potentially misleading. It’s quite a lot of pages - about 10% of pages from dmoz.org contain element or attribute names which contain a colon in them. (Most of that is xml:lang, and most of the rest is xmlns:o and xmlns:v and o:p from Microsoft Office). So it’s probably more useful to argue that there will be no negative effect on compatibility for any pages that currently exist, rather than that it doesn’t affect most pages.
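For anyone wanting to check a figure like this against their own sample, a rough sketch using Python’s standard `html.parser` follows. This is not the method used for the dmoz.org survey above; the class name `ColonNameFinder` and the helper `colon_names` are made up for illustration.

```python
from html.parser import HTMLParser


class ColonNameFinder(HTMLParser):
    """Collects element and attribute names that contain a colon."""

    def __init__(self):
        super().__init__()
        self.colon_names = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if ":" in tag:
            self.colon_names.add(tag)
        for name, _value in attrs:
            if ":" in name:
                self.colon_names.add(name)


def colon_names(markup):
    """Return the set of colon-containing names found in one page."""
    finder = ColonNameFinder()
    finder.feed(markup)
    finder.close()
    return finder.colon_names
```

Running `colon_names` over each page in a sample and counting the non-empty results gives the share of pages affected; grouping by name would show how much of it is xml:lang versus Office markup.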

Posted by Philip Taylor at

If a new media type is required XML5 would not be the preferred solution. It contains some legacy handling that would not be needed and several other decisions in XML could be rethought as well.

Anyway, my point is that for the use cases people have in mind, more is needed than just distributed extensibility. “Just” having namespaces was not enough as far as I could tell.

Posted by Anne van Kesteren at

Philip: I’ve heard people dismiss proposals involving namespaces for performance reasons.  I feel it is important to address that issue head on.  I don’t think the performance difference will even be measurable for the preponderance of existing web pages.  That is especially true if the only uses of colons in elements or attributes involve names that start with the letters x, m, and l.

Posted by Sam Ruby at

As a general rule of thumb, dismissing things for performance reasons is a very poor argument.  If actual benchmarks haven’t demonstrated a performance hit of, say, 50% or more, my feeling is that it’s a moot point.  If there are no benchmarks, then you can’t really argue the point at all.  And regardless of the issue being discussed, performance always becomes less of an issue as time passes, simply due to hardware improvements.  Further, performance issues are always surmountable.  They’re certainly not one of the hard problems of computer science.

Posted by Bob Aman at

The discussion was triggered in part because it was said that the Web was only defined by what was browsable with a Web browser. Other communities in the room, including authoring tool implementers, mobile content developers, and people developing APIs (whether based on semantic web technologies or XML), voiced their disagreement.

Because browsers are part of the ecosystem and highly visible in it, people did want them to change some of their behaviour, for example by showing when a page has non-conformant markup. Some people also suggested being able to save and/or view the source of a document as a conformant document. Let’s say a “Save as conformant xhtml 1…” or a “Save as conformant html 5” option. It definitely has issues. It would be a kind of built-in htmltidy.

I have noticed that some of the issues come from a misunderstanding of the specification, specifically of the difference between the parsing section and the content model, which are quite different things. There is nothing in the specification which explains how, once you have parsed a document, to make it conformant html 5. You can write html 5. You can parse tag soup, including html 5. But you can’t parse tag soup and serialize it as conformant html 5.

Posted by karl dubost, w3c at

But you can’t parse tag soup and serialize it as conformant html 5.

I am very interested in that.  Venus consistently produces well-formed output.  I’d like Mars to be able to consistently produce conformant output.  Given a white-list approach to sanitization, that would seem to be within reach.
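A minimal sketch of what a white-list pass looks like, using only Python’s standard library.  This is not Venus or Mars code; the `ALLOWED` table, `AllowListFilter`, and `sanitize` are hypothetical names, and the allow-list itself is a toy.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list: element name -> attribute names permitted on it.
ALLOWED = {"p": set(), "em": set(), "a": {"href"}}


class AllowListFilter(HTMLParser):
    """Re-emits only allowed tags and attributes; text is always kept."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; its text content still passes through
        kept = [(n, v) for n, v in attrs if n in ALLOWED[tag] and v is not None]
        attr_text = "".join(f' {n}="{escape(v, quote=True)}"' for n, v in kept)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape.
        self.out.append(escape(data, quote=False))


def sanitize(markup):
    f = AllowListFilter()
    f.feed(markup)
    f.close()
    return "".join(f.out)
```

A filter like this controls which names appear in the output, but it does not balance tags, check URL schemes in href, or enforce content models, so by itself it guarantees neither well-formed nor conformant output.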

For this to be able to handle all inputs, it is important to me that there is a proper super/subset relationship between the set of DOMs that can be serialized as HTML5 and XHTML5.

Posted by Sam Ruby at

A Peek Inside the W3C

I’ve long believed that the Ajax/JavaScript communities and the W3C should communicate more and have more awareness of what both camps are doing so we can work together better and get things done. In light of this, here are some updates on a...

Excerpt from Ajaxian » Front Page at

Extensibility and Markup, Again and Again

Proving that the issues with extensibility will never go away until faced, and resolved: Anne van Kesteren : Concerns that HTML5 does not have distributed extensibility. That is, namespaces. What people seem to want is to extend the browser with...

Excerpt from Burningbird's RealTech at

A Peek Inside the W3C

I’ve long believed that the Ajax/JavaScript communities and the W3C should communicate more and have more awareness of what both camps are doing so we can work together better and get things done. In light of this, here are some updates on a special...

Excerpt from Ajax Blog at

I’d like Mars to be able to consistently produce conformant output.  Given a white-list approach to sanitization, that would seem to be within reach.

HTML 5 requires that elements be used for their defined semantics, so producing conformant HTML 5 programmatically from non-conformant input is hard. (Where hard means needs AI.)

Posted by Geoffrey Sneddon at

Fair enough.  I may not be able to reach that goal.  But I would like to try and see how close I can get.  And the lessons learned from that process may be able to be fed back into producing a better specification.

Posted by Sam Ruby at

What if there already was broad support for XHTML?

W3C XHTML 1.0 Appendix C : This appendix summarizes design guidelines for authors who wish their XHTML documents to render on existing HTML user agents. W3C XHTML Media Types : The use of ‘text/html’ for XHTML SHOULD be limited for the purpose of...

Excerpt from adrianba.net at

Distributed Extensibility stays alive and Anne

Anne says "Concerns that HTML5 does not have distributed extensibility. That is, namespaces. What people seem to want is to extend the browser with hundreds of markup languages. (How this keeps things simple to answer was not something I saw......

Excerpt from Dave Orchard's Blog at
