It’s just data

HTML Reunification

Rob Sayre: this objection wouldn’t be relevant to a document with no “author conformance requirements”, right?

At the present time, the HTML 5 document is a browser behavior specification and a list of author conformance requirements.

The first part is essentially uncontroversial — nobody, including the browser vendors, likes it, but it is what it is, and it ain’t changing.  Authors of libraries recognize this too.  Everybody is OK with that.

The second part is the source of seemingly unending controversy.  Are alt attributes required?  What should be considered a conformance error in SVG?  Is RDFa legal? The current draft hasn’t been built based on consensus, and this needs to be resolved prior to Last Call.  Deferring this reminds me of a commercial...

Meanwhile, extensibility and the relative roles of HTML and XHTML2 working groups were hot topics at the AC meeting last month.  Steven Pemberton and I have been having a productive discussion, and we’ve also consulted with a number of AC representatives.  This post is the result of those discussions.  Quick summary:

Further background:

At the second AC meeting breakout session, TV Raman made the point a number of times that we need to partition the idea of extensibility into two parts: extending the platform vs. extending the language.  Those two words didn’t make much sense to me, but his examples did.  Video and 2D graphics are things that need to be implemented by the browser vendors.  And care needs to be taken that such features are defined in a way that they interact consistently with other components of the platform, such as CSS.  Raman refers to these as platform features.

The second type of feature is one that does not need any support from browser vendors other than to simply parse the unknown elements and attributes into a DOM, and to provide access to the DOM via JavaScript.  RDFa comes to mind as such a feature, as does ubiquity-xforms.  TV Raman refers to these as language features.

I like and endorse TV Raman’s split.  Different burdens of proof and policies need to be applied to each type of feature.

First the obvious: for platform features (i.e., the ones that impact browser implementations), consensus by browser vendors is essential.  I will also note that there are platform features in both of the current HTML5 and XHTML2 published working drafts which do not yet enjoy that level of consensus (trivial example: <q> elements).  My own personal opinion on how we should deal with this situation is that we should give the HTML5 working draft a severe haircut (along the lines of Rob’s draft, but perhaps only slightly less brutal), and then add back only the bare minimum of syntax necessary to support the required platform features.  I care not what version number of HTML we assign to that document.

For language features, the bar can (and should) be much lower.  Today’s browsers have a default rendering and a default mapping to the DOM for unknown markup.  In many cases (e.g., attributes), that means that such markup will not be visible.  An unfortunate consequence of this is that an invaluable feedback loop is lost, and therefore data quality will suffer.  We need to agree up front that it is entirely the responsibility of library developers to make this stuff visible.  Experience has shown that validators, while necessary and important, are not sufficient. FWIW, a similar observation can be made about lang attributes for non-CJK languages.

Meanwhile, many of the requirements for unfettered language-level extensibility come from vendors that produce content via XML pipelines.  Love it or hate it (and there are plenty in the latter camp, even within XML circles), XML namespaces are the way to do such extensibility.  Ben Adida (correctly, in my opinion) observed that the working group that has responsibility for an XML serialization of HTML needs to be aware of and respect namespaces as *a* mechanism for extensibility.  It was also widely observed that those with such pipelines don’t find the distinction between HTML and XHTML to be a useful one, as it is only the party that controls the final transfer (and therefore the content-type) that has any control over the serialization.  The obvious implication of this is that markup requirements for language features will bleed through from one serialization to the other.  Direct observation has corroborated this.

My conclusion is that xmlns attributes, and both element and attribute names containing colons, need to be allowed in conformant HTML.  It needs to be noted that such nodes are placed into the DOM today differently by HTML and XML parsers.  This is unfortunate, but given the experience of Opera, it appears to be beyond our ability to correct at this point.  Judging by the people who have coded RDFa and ubiquity-xforms libraries, this is accepted as the cost of doing business on today’s web, and a little JavaScript in a library can smooth over the differences for applications built upon these libraries.  Yes, this will impact people who code validators, but such an impact is much smaller than any impact on content creators.
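To make the smoothing-over concrete, here is a sketch of the kind of normalization such a library might perform.  The element shape and the prefix map are assumptions for illustration, not any particular library’s API:

```javascript
// Hypothetical helper: recover a (namespace URI, local name) pair for an
// element regardless of which parser produced it.  An XML parser resolves
// the prefix and sets namespaceURI; an HTML parser leaves the colon inside
// the (lowercased) localName and sets no namespace.
function normalizedName(el, prefixMap) {
  if (el.namespaceURI) {
    // XML-parsed: the parser already did the work.
    return { uri: el.namespaceURI, local: el.localName };
  }
  const colon = el.localName.indexOf(":");
  if (colon === -1) {
    return { uri: null, local: el.localName };
  }
  // HTML-parsed: resolve the prefix ourselves from a library-supplied map.
  const prefix = el.localName.slice(0, colon);
  return {
    uri: prefixMap[prefix] || null,
    local: el.localName.slice(colon + 1),
  };
}
```

Applications built on the library then see one consistent name whichever serialization the content arrived in (modulo the case folding that the HTML parser applies).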

Effectively this means that Design Principle 3.5 ("DOM Consistency") needs to be relaxed — but for language extensions ONLY.

RDFa and ubiquity-xforms libraries require zero support from browsers today.  I’d like to see an exercise undertaken to decide which elements and attributes are to be considered “conformant”, and by that I only mean that such can exist in HTML5 content and not cause a conformance checker to throw up all over the page.  Any additional features required of browsers in support of XForms would have to meet a very high bar indeed and should not be presumed to be included in any revision earlier than HTML6, and quite possibly not even then.

The goal here is not to repeat the exercise where people present use cases and continue to have the HTML working group be a gatekeeper and king maker and sole determinant as to which features are permissible and which are not in the open web.  Specifically, if RDFa is “out”, then I see HTML5 and XHTML2 continuing their separate ways.

On the other hand, XForms and RDFa would only be “in” to the extent that they don’t require any browser changes; anything more will have to meet a very high bar.  Additionally, features capriciously removed in HTML 5, such as font and @profile, would likely be restored.

Beyond this, the only responsibility we are talking about here for implementers of HTML parsers in general (and browsers in particular) is to place this data into the DOM.  We are not talking about enforcing quotes on attributes, switching to a “real” XML parser in such instances and building a quilt out of the various DOM fragments produced, or draconian error handling. None of this applies.  The only request is that we explore treating element names with colons in them as foreign content (specifically, the presence or absence of a trailing slash should determine whether the next tag should be treated as a sibling or a child).  As HTML parsers won’t need to deal directly with resolving namespaces, even reparenting nodes isn’t an interoperability issue: libraries that operate on the resulting DOM will consistently evaluate namespaces after the reparenting operation.
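As a thought experiment only (this is not a real HTML parser, and the tag syntax is drastically simplified), the trailing-slash rule could be sketched like this:

```javascript
// Toy illustration of the proposed rule for colon-named ("foreign") tags:
// a trailing slash makes the element empty, so the next tag is a sibling;
// without it, the next tag nests as a child.
function parseForeign(src) {
  const root = { name: "#root", children: [] };
  const stack = [root];
  const tagRe = /<(\/?)([A-Za-z][\w.-]*:[\w.-]+)\s*(\/?)>/g;
  let m;
  while ((m = tagRe.exec(src)) !== null) {
    const [, close, name, selfClose] = m;
    if (close) {
      stack.pop();                        // </x:y> ends the open element
    } else {
      const node = { name, children: [] };
      stack[stack.length - 1].children.push(node);
      if (!selfClose) stack.push(node);   // <x:y/> stays a leaf
    }
  }
  return root;
}
```

The point is only that the rule is locally decidable by the tokenizer, with no namespace resolution involved.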

So far, I’ve only talked about element and attribute names which contain a colon in them.  Since HTML parsers are unaware of xmlns attributes, it is entirely too dangerous to encourage the use of default namespaces in conformant HTML.

That’s the high level sketch.  As always, the devil’s in the details. We will need to go case by case through the features that are planned for HTML5 and XHTML2 and see how to deal with each one.  Examples:

That should be enough examples to illustrate the overall approach.  I’m sure that when we work through each feature there will be some tough issues.  But feature by feature, element by element, and attribute by attribute, we will need to work through this together.

When we are done, we will have a criterion for determining which features are platform features (and therefore subject to greater scrutiny), and which are language features (and how such are to be expressed in markup in a way that avoids overlap).  Within the scope of platform features, we need to identify the absolute minimum that absolutely needs to go into the next release of HTML (picture the run on the bank scene from It’s a Wonderful Life), and have confidence that there will be a next release of HTML after that one.

One thing I have not talked much about is modularity.  If there are those who wish to pursue a non-normative definition of the XML infoset produced as a result of a parse in terms of an XML Schema, they should be free to do so as long as it remains entirely optional and without requiring any runtime overhead for parsers.  Additionally, such a definition of modularity should in no way endorse the ability for platform vendors to subset and therefore fragment the platform.

I don’t presume that this approach is ideal by any means.  Instead the question is whether or not something along these lines is something everybody could live with.

This in no way precludes someone or some group producing separate best practices guidelines for authors, and if they want to suggest that their version of best practices is in conflict with the recommendations of the User Agent Accessibility Guidelines Working Group, the SVG Working Group, or the Semantic Web Deployment Working Group, I wish them luck.  An alternate approach would be to aim higher and recognize that not everybody will either want to, or be able to, meet that bar.  Make alt attributes a best practice, make quoting attributes a best practice, etc.  If this group focuses on best practices instead of conformance requirements, it may gain consensus faster.

Such a document could even be generated from the same source file.

But meanwhile, perhaps we can get a candidate recommendation out next year for the browser behavior portions of the document.


Hear hear!

Excellent suggestions, well put, absolutely necessary!

Posted by Jarvklo at

I’d like to retain intact the design principles for platform design as worked out by the browser vendors.

The Design Principles are not just for stuff that browsers implement. They are for designing HTML in general, which matters for moving the Web forward in general. Particularly, the Priority of Constituencies principle places users and authors ahead of implementors even though you portray the Design Principles as having been worked out by the browser vendors (as opposed to having been worked out by the WG?).

For language features, the bar can (and should) be much lower.

Shouldn’t the Design Principles apply, though? Why should the bar be lower?

We need to agree up front that it is entirely the responsibility of library developers to make this stuff visible.

This seems like making it “Someone Else’s Problem”, but I have a feeling I might find myself in the role of “Someone Else” as a library developer down the road.

Meanwhile, many of the requirements for unfettered language-level extensibility come from vendors that produce content via XML pipelines.

Isn’t it then a particularly bad idea to put the colon in local names? Wouldn’t one then need to audit all the code in the XML pipeline to make sure it doesn’t enforce XML NCName rules for local names? How would you work with such an infoset in an XML pipeline whose tools have been designed with the assumption that you cannot ever find colons in local names? How would you select “language extension” elements in XPath 1.0, for example?

It was also widely observed that those with such pipelines don’t find the distinction between HTML and XHTML to be a useful one, as it is only the party that controls the final transfer (and therefore the content-type) that has any control over the serialization.  The obvious implication of this is that markup requirements for language features will bleed through from one serialization to the other.  Direct observation has corroborated this.

I think things will become worse if you create a situation where either colons in local names will try to enter an XML pipeline or XML pipelines will have to use a different text/html to infoset mapping than browsers.

My conclusion is that xmlns attributes, and both element and attribute names containing colons need to be allowed in conformant HTML.  It needs to be noted that such nodes are placed into the DOM today differently by HTML and XML parsers.  This is unfortunate, but given the experience of Opera, it appears to be beyond our ability to correct at this point.  Judging by the people who have coded RDFa and ubiquity-xforms libraries, this is accepted as the cost of doing business on today’s web, and a little JavaScript in a library can smooth over the differences for applications built upon these libraries.  Yes, this will impact people who code validators, but such an impact is much smaller than any impact on content creators.

Why do RDFa or ubiquity-xforms need to use colons in local names? Why would “language extensions” in general need to use colons in local names? Since colons are already special in XML in a way that can no longer be reverted and you believe full Namespaces processing in text/html is not feasible, shouldn’t the logical conclusion be that any extension syntax use anything but colons?

Effectively this means that Design Principle 3.5 ("DOM Consistency") needs to be relaxed — but for language extensions ONLY.

Earlier, you said you’d like to retain the Design Principles intact. The DOM Consistency Principle is not just for browsers. It is also very much for non-browser apps that use XML pipelines with an HTML parser at the start of the pipeline when the input is text/html.

RDFa and ubiquity-xforms libraries require zero support from browsers today.

Time and again, RDFa proponents mention Firefox extensions that are sensitive to RDFa. When something is implemented as a Firefox extension, you should assume that if the feature becomes successful, it should be able to migrate into the native feature set of Firefox, and that browsers that do not support Firefox extensions should be able to implement support for the feature in their native feature set.

Therefore, any extension to HTML for which a Firefox extension is put forward as a client-side implementation mechanism should be subject to the design criteria for “platform extensions” if one accepts that there’s a dichotomy between “language extensions” and “platform extensions”.

I’d like to see an exercise undertaken to decide which elements and attributes are to be considered “conformant”, and by that I only mean that such can exist in HTML5 content and not cause a conformance checker to throw up all over the page.  Any additional features required of browsers in support of XForms would have to meet a very high bar indeed and should not be presumed to be included in any revision earlier than HTML6, and quite possibly not even then.

If an “XForms” implementation is just another JavaScript library next to the others, why does it need special treatment that the other JavaScript libraries don’t need? In particular, if the JS library itself consumes the local names in the DOM, why can’t the local names be chosen not to venture in the “here be dragons” territory of colons? This way the library could work in both text/html and application/xhtml+xml instead of being limited to one of them.

On the other hand, XForms and RDFa would only be “in” to the extent that they don’t require any browser changes; anything more will have to meet a very high bar.

This assumes that the right question is “Does this require changes in browsers?” as opposed to “Is this good for the Web?” In the case of the latter question, there is no “platform features” / “language features” dichotomy.

Beyond this the only responsibility we are talking about here for implementers of HTML parsers in general (and browsers in particular) is to place this data into the DOM.

What about parsers that feed into a SAX2 pipeline? What about parsers that build an XOM tree?

RDFa is a language level feature and defines a small number of attributes without a colon which are to be placed on pre-existing elements which also do not have a colon.

Do you mean a new flavor of RDFa that uses a prefix attribute where RDFa currently uses xmlns:foo? The current flavor introduces a countably infinite number of attributes with a colon.

ARIA introduces elements and attributes with a common prefix of aria- (as well as a very few attributes which do not follow this convention).  This is a form of lesser-distributed extensibility (specifically, it is non URI based and the prefix is fixed), but the exact same syntax that works in the HTML serialization needs to be supported in the XHTML serialization.

If we must do extensibility, why not do it like ARIA without colons?

If there are those who wish to pursue a non-normative definition of the infoset produced as a result of a parse in terms of an XML Schema, they should be free to do so as long as it remains entirely optional and without requiring any runtime overhead for parsers.

As a developer of a RELAX NG schema, I’m curious why might want an XSD schema instead.

I don’t presume that this approach is ideal by any means.  Instead the question is whether or not something along these lines is something everybody could live with.

That seems like a recipe for “committee design”.

Posted by Henri Sivonen at

Oops. In “I’m curious why might want an XSD schema instead.” s/why/who/

Posted by Henri Sivonen at

Hmmm... I wonder if twitter’s mapping to tinyurls is stable?

[link]

Posted by Sam Ruby at

the Priority of Constituencies principle places users and authors ahead of implementors

I’ve seen enough feedback from users and authors to question whether or not this goal was faithfully met.

Earlier, you said you’d like to retain the Design Principles intact.

... for platform features.

why can’t the local names be chosen not to venture in the “here be dragons” territory of colons

Potential fodder for a “Best Current Practices” document perhaps, but not as a conformance requirement.

Is this good for the Web?

This is a question upon which reasonable and intelligent people can disagree.

As a developer of a RELAX NG schema, I’m curious why might want an XSD schema instead.

I didn’t say instead.  I would support publishing your RELAX NG schema as a non-normative appendix or document.  Perhaps people could even generate XSD from that.

Posted by Sam Ruby at

If an “XForms” implementation is just another JavaScript library next to the others, why does it need special treatment that the other JavaScript libraries don’t need? In particular, if the JS library itself consumes the local names in the DOM, why can’t the local names be chosen not to venture in the “here be dragons” territory of colons? This way the library could work in both text/html and application/xhtml+xml instead of being limited to one of them.

Disambiguation comes to mind.

Posted by Julian Reschke at

Whether the Tiny URL is stable: yes, but with caveats. Another search shows another Tiny URL which differs because of the use of the www. as part of the original URL, all of which explains why Twitter hash tags, such as #htmlreunification, have become so popular. The use of hash tags was a best practice derived, over time, by the tool users without any adverse impact on Twitter’s ability to parse the text in which they are included.

Speaking of which, on the question of whether your suggestion is good: Emphatically, yes.

Posted by Shelley at

I didn’t say instead.  I would support publishing your RELAX NG schema as a non-normative appendix or document.  Perhaps people could even generate XSD from that.

I’ll rephrase: If there is a RELAX NG schema and an XSD schema, who wants to use the XSD schema? (This is entirely a side question in the “out of curiosity” department compared to my other points.)

More interestingly: Do does one write a RELAX NG schema or an XSD schema for “language extensions” that put a colon in the local name?

Disambiguation comes to mind.

Who does the author need to disambiguate for other than the potential other JavaScript libraries (s)he includes on the page? Why wouldn’t a non-colon delimiter work for that purpose if there is indeed a need to disambiguate?

Posted by Henri Sivonen at

[from callmepep] Sam Ruby: HTML Reunification

[link]...

Excerpt from Delicious/network/simonh4 at

"The goal here is not to repeat the exercise where people present use cases and continue to have the..."

“The goal here is not to repeat the exercise where people present use cases and continue to have the HTML working group be a gatekeeper and king maker and sole determinant as to which features are permissible and which are not in the open web.” -...

Excerpt from adactumblr at

Sigh. Lots of typos today:

How does one write a RELAX NG schema or an XSD schema for “language extensions” that put a colon in the local name?

Posted by Henri Sivonen at

OK, I realise that this is a blog frequented by people far cleverer than me, and I’m therefore exposing myself to much mockery and contempt by asking the question, but I don’t get it.

What is gained, and what problem(s) solved, by filleting out the HTML 5 spec in the manner Rob describes: https://bugzilla.mozilla.org/show_bug.cgi?id=478665#c1

Without new elements, the sectioning algorithm, client-side storage, Undo history, new forms stuff, audio and video, what is left that is of benefit to anybody?

(I don’t have an agenda here: I genuinely don’t comprehend how emasculation of the spec is helpful. Although please, take the damn dialog element or at least spell it correctly).

Posted by bruce at

bruce: nobody is suggesting that there never be an HTML 6.

HTML 5, as defined by Rob, has well defined error handling and a very select few features (e.g. canvas).

HTML 5, as defined by Ian, has a rather long gestation period.

Posted by Sam Ruby at

If there is a RELAX NG schema and an XSD schema, who wants to use the XSD schema?

A reasonable question to ask should anybody step forward and produce such an XSD schema for HTML 5.  If nobody does so, I see no reason why such a schema needs to be included in the spec.

Who does the author need to disambiguate for other than the potential other JavaScript libraries (s)he includes on the page? Why wouldn’t a non-colon delimiter work for that purpose if there is indeed a need to disambiguate?

The key is to have the portion before the delimiter to be treated as a prefix and resolved correctly — in application level code, e.g. JavaScript.  It might be useful to get an update on the 4 actionable bugs from Yahoo!.
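A rough sketch of what that application-level resolution might look like; the node shape here is a stand-in for illustration, not any particular library’s API:

```javascript
// Walk up from a node looking for the nearest xmlns:prefix declaration,
// the way RDFa-style libraries resolve prefixes in application code.
function resolvePrefix(node, prefix) {
  for (let n = node; n; n = n.parent) {
    const uri = n.attrs && n.attrs["xmlns:" + prefix];
    if (uri) return uri;
  }
  return null; // undeclared: the consuming library decides how to recover
}
```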

Posted by Sam Ruby at

Sam Ruby: HTML Reunification

submitted by ndanger [link] [0 comments]...

Excerpt from programming: what's new online at

bitter shitter

Sam Ruby: HTML Reunification The current draft hasn’t been built based on consensus, and this needs to be resolved prior to Last Call. The goal here is not to repeat the exercise where people present use cases and continue to have the HTML working...

Excerpt from Last Week in HTML5 at

Thanks Sam

So, are you proposing

Doesn’t this just delay/ defer the religious wars that we see now?

Posted by bruce at

As far as I can tell, nobody is suggesting deferring anything.  Everybody wants the WHATWG work to continue, and the WHATWG work is proceeding according to the WHATWG schedule.  Whatever “wars” there might be over those features (and frankly, I don’t see much controversy there other than perhaps a mild suggestion that standards bodies are a bad place for innovation) will presumably continue unabated.

In other words, once those features are ready and have proven themselves, they should be included into subsequent releases of HTML.  Whether those releases are numbered with integer values or should asymptotically approach, but never reach, the magic number 5, I care not.

Meanwhile the questions that make sense to me to be asking at this point are:

As for me, I am definitely interested in seeing if we can produce a CR by next year.

Posted by Sam Ruby at

DOM Consistency

This is in reply to Sam Ruby ’s HTML Reunification —go read that first. (When I tried to post this as a comment, his comment system didn’t seem to be working.) As a web developer, one of the most appealing (if not the most appealing) parts of HTML5...

Excerpt from Edward O’Connor at

Thanks; I think I got your point.

Posted by bruce at

The key is to have the portion before the delimiter to be treated as a prefix and resolved correctly — in application level code, e.g. JavaScript.

Resolve to what?

It seems that resolving the prefix is not the key for the ubiquity-xforms case: baseline, demo (credit: Philip Taylor)

Sam, do you personally think that prefix-based indirection in URI-based extensibility is a good idea? It seems to be failing even for early adopters as predicted.

Posted by Henri Sivonen at

do you personally think that prefix-based indirection in URI-based extensibility is a good idea?

Me personally?  I have a confused muddle of thoughts on the matter.

The above list is clearly not ordered.  My strongest convictions lie in the last one.  Whether it is a good idea or not, it will be done.  And the people who will do so outnumber us.  We figuratively are in Pamplona and 7 July is rapidly approaching.  The question is one of whether we choose to pave the cowpaths or be trampled by the bulls.  Defiance doesn’t tend to feel very satisfying, at least not for very long.

Posted by Sam Ruby at

@Henri: Tool support? I’m not saying it’s perfect, but e.g. Eclipse comes with built-in Schema support. Same for Visual Studio. There is merit in having schemas readily available. I’d also like HTML5 to benefit from the XHTML Modularisation effort.

Posted by Laurens Holst at

@Henri: I could point to many similar examples of how, e.g., HTML construction by string concatenation fails. The absolute need for error lenience as advocated by html-proponents, and the odd ways in which this manifests itself, is an example of how badly this practice fails. Yet everybody is doing it, and I don’t see you pointing that out all the time :) (well, I don’t read the list mail much, so I might’ve missed it).

In the case of RDFa, if prefixes are used that are not defined, I suppose the triples it would generate are ignored. Kinda like CSS. No significant harm done... And either way, I don’t see any other way to achieve distributed extensibility. Simple prefixes without URIs assigned to them aren’t unique enough. Using only full URIs is not acceptable either; they are so verbose that nobody would use them. I’m sure you also realise this, but you probably do care for RDFa to succeed.

By the way, I’m sure there are also numerous examples on the interweb of unparsable microformats because of typos or other structural errors, even though they do not use namespace prefixes. The underlying cause of these kinds of errors going undiscovered is not complexity, but the fact that you are putting metadata somewhere with no direct obvious output in today’s browsers. E.g. the ALT attribute suffers from the same problem.

Posted by Laurens Holst at

I’d also like HTML5 to benefit from the XHTML Modularisation effort.

What’s the benefit?

Posted by Henri Sivonen at

I believe that in contexts where the namespace declaration is likely to be found near the actual use, the problems that people like you delight in pointing out are grossly overstated.  Examples of this include the CC license, which to most mortals is a blob no better or worse than the blobs that people copy/paste when they embed YouTube videos on their blog entries.

I don’t have enough data to know how common those errors are, so I’ll try anecdotal evidence. How to add a creative commons license to your blog: <a cc="http://creativecommons.org/ns#" href="http://armenzg.blogspot.com/" property="cc:attributionName" rel="cc:attributionURL">Zambrano Gasparnian, Armen</a>. (See cc vs xmlns:cc.)

(As a bonus, the attributionURL turned into attributionurl when the blog post was displayed on Planet Mozilla, and (as far as I can tell) RDFa is case sensitive.)

Posted by Philip Taylor at

Can we also take the chance to amend the CSS spec along these lines as well? The CSS spec actually says you shouldn’t parse and include unknown CSS rules and property/values into the CSS object model, which is a mistake in my opinion. This makes it impossible to experiment at the language level (like you suggest), or to grab things like -moz-radial-gradient from a JavaScript shim library to simulate native support for these things in other browsers. In reality most browsers include these in the parsed CSS DOM (such as IE), but Firefox, I believe, does not, and they regularly close bugs that ask for this by pointing to the CSS spec where it says you “WILL NOT” parse and represent unknown CSS rules/properties.

Posted by Brad Neuberg at

One thing to add to this: we should ensure that nested sets of unknown elements are correctly represented in the DOM, including empty nodes:

<unknown1>
  <unknown-child2></unknown-child2>
  <unknown-child3/>
</unknown1>

Firefox currently doesn’t do this, for example, which again makes JS shims harder.

Also, you will run into casing issues very fast in these situations. There is a tension between the HTML browser makers, who do not want case sensitivity in normal HTML for performance reasons, and the XML folks. The following would fail, for example:

<cc:Work>

because of the upper-case Work. Not sure how to handle this.

Also, there is the tricky issue of id="" not working on unknown elements, due to the silliness of xml:id issues:

<cc:Work id="foobar">

getElementById("foobar") won’t work in this situation. This can again make JS shims to simulate these things (such as a JS XForms library for normal HTML) very hard. If your thinking above can automatically assume that an id attribute, even on unknown namespaced elements in a text/html situation, is automatically DTDed as an ID, that will help greatly. xml:id does not help in this situation, as I highly doubt we will see it in all the browsers, even upcoming ones, due to philosophical differences.
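A shim can work around this by walking the tree itself. Below is a sketch over a plain object tree (the node shape is hypothetical; in a browser you would walk child elements and read getAttribute("id") instead):

```javascript
// Fallback lookup for when getElementById ignores ids on unknown elements:
// search every element for a matching id attribute, whatever its name.
function findById(node, id) {
  if (node.attrs && node.attrs.id === id) return node;
  for (const child of node.children || []) {
    const hit = findById(child, id);
    if (hit) return hit;
  }
  return null;
}
```

Of course, a linear walk loses the constant-time lookup that a native ID map provides, which is part of why shims remain a poor substitute.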

Posted by Brad Neuberg at

Everyone wants their thing to be HTML

Sam Ruby: The goal here is not to repeat the exercise where people present use cases and continue to have the HTML working group be a gatekeeper and king maker and sole determinant as to which features are permissible and which are not in the open...

Excerpt from Rob Sayre's Mozilla Blog at

@Bruce: Great point. If we follow Rob’s disassembly of HTML 5, there won’t be anything there to actually drive adoption. Sam, you suggest that his version has well-defined error semantics... though that didn’t work too well to drive adoption of XHTML. We need a real-world standard with compelling features to keep up with Silverlight, AIR, etc., not an academic document that simply cleans things up. Unfortunately that’s not enough.

Posted by Brad Neuberg at

@bruce

One of the benefits is to publish now a specification with features already implemented and tested (and not in 2042). That could be an HTML 4.5. It would also be beneficial in giving a bit of juice to implementers (those who don’t have the workforce of the browser vendors), by letting them make baby steps in terms of implementation.

Look at the time that CSS 2.1 and CSS 3 have spent in discussion land. Not saying it is bad, but it just shows that sometimes it is better to make a smaller step. It is something which is good for the community. (I hear anne saying in my head “but that’s plain marketing”. Maybe. I would prefer to say that it is making a contract with the community and not discouraging the good will. [sorry anne if I mischaracterized you :) ])

Posted by karl dubost at

Brad: Rob’s approach does not preclude RDFa or <audio> or <svg>, it merely doesn’t endorse any of the above.  It can be viewed as a basis which can be (a) rapidly evolved, and (b) built upon.

I would be very interested in what your approach would be, particularly if it is expressed in terms such as “I would be prepared to...”.  If, instead, your opinion is that you would like to see Ian’s draft to be done earlier, or for Rob’s draft to include more... you should be talking to them and not me.

Posted by Sam Ruby at

@brad

We need a real-world standard with compelling features to keep up with Silverlight, AIR, etc., not an academic document that simply cleans things up.

This is a noble goal, but it really looks like a Geek Wet Dream. There are awesome things done with Canvas, but the thing that the hardcore geeks of HTML5 tend to forget is that it is still far behind what the Flash or Adobe AIR platforms offer. I’m not advocating for them. I really wish it were different.

You can definitely say that canvas is a baby step toward making an application platform to compete with Adobe AIR in 15 years, indeed. (Though don’t forget the competition will advance.) But selling canvas and some of the features of HTML 5 as a competitive technology with regard to the others makes us (people in the HTML 5 WG) look ridiculous.

Just go see a Flash developer in a high-profile Web agency. Show him Canvas and its development/authoring platform… ah, oops. There is none.

Again, there are über cool things which have been made such as porting Processing to Canvas… but…

Posted by karl dubost at

Brad said: We need a real-world standard with compelling features to keep up with Silverlight, AIR, etc., not an academic document that simply cleans things up.

Mr Dubost’s riposte: This is a noble goal, but it really looks like a Geek Wet Dream.

I’m with Brad, here. It might be a noble goal (porting Processing to canvas is a great idea, btw), but one that is always playing catch-up with the proprietary (=anti-Web) frameworks.

(A lot of people’s use of Flash is replaceable by “full” HTML 5 and other web standards: SIFR is replaceable by CSS web fonts, Flash video by a genuinely interoperable video tag, random screen bling can be done with canvas or SVG.)

But (entering metaphor hell:) I’d rather strive for the wet dream that Brad proposes than the drab committee chamber hall that is the stripped-down, error-corrected HTML 4.1 that is proposed above.

Posted by bruce at

(I really shouldn’t try to type before 7 am)

Posted by bruce at

felch

jimmy junior jgraham: hsivonen: Sure. But W3C people are being told that their baby is ugly and they are trying to find ways to route around the objections rather than consider their merits jgraham: It seems to be trying to be an attempt to create...

Excerpt from Last Week in HTML5 at

The evangelists have declared that my draft belongs in a drab committee chamber hall. Surely a good sign.

Posted by Rob Sayre at

Bruce, let me try it another way then: I take it then that you are entirely happy with the content and schedule of Ian’s draft.  If so, rest assured that nothing Rob is doing inhibits in any way the important work that Ian is doing.

Posted by Sam Ruby at

I have a confused muddle of thoughts on the matter.

I’m a bit concerned about a proposal getting made on that basis.

We figuratively are in Pamplona and 7 July is rapidly approaching.  The question is whether we choose to pave the cowpaths or be trampled by the bulls.

It seems to me that the cowpath is dispatching directly on the prefixed name—not resolving the prefix into something and dispatching on the expansion.

Thought experiment: Couldn’t ubiquity-xforms compare the string xf-input instead of xf:input if there were a “Best Practice” document saying that xf-input were the “best practice” for unilaterally extending HTML? If not, why not? If not, why would it be a problem for the HTML WG?
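As a rough illustration of that thought experiment (the handler table and names are made up for this sketch): a library that keys its handlers on a hyphenated name like xf-input needs no prefix resolution at all, so the same string lookup behaves identically whether the markup came through an HTML parser (which leaves a colon inside the local name) or an XML parser.

```javascript
// Hypothetical dispatch table keyed on hyphenated extension names.
// Nothing here depends on namespace resolution: the tag name string
// is lower-cased and looked up directly, so the logic is the same for
// markup parsed as text/html or as XML.
const handlers = {
  'xf-input': () => 'render an XForms-style input',
};

function dispatch(tagName) {
  const handler = handlers[tagName.toLowerCase()];
  return handler ? handler() : null;
}
```

The lower-casing matters because HTML parsers report tag names case-insensitively, while XML parsers preserve case.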

Aside: Why does ubiquity-xforms merit special attention from the HTML WG as opposed to e.g. YUI, Prototype, jquery or MooTools?

Defiance doesn’t tend to feel very satisfying, at least not for very long.

That’s an amusing cartoon, but I observe that there are unaddressed technical questions in my previous comments:

Where are the XML experts, who disapproved of my (misplaced) suggestion to document the WebKit XPath matching behavior, when you suggest putting colons in local names, which is sure to cause grief with XML pipelines?

(Posting without OpenID, because it is suspicious that myvidoop no longer has an EV cert today.)

Posted by Henri Sivonen at

@Henri: We can extend it easily. But you must already know that, because that is the core purpose of XHTML Modularisation:

“This modularization provides a means for subsetting and extending XHTML, a feature needed for extending XHTML’s reach onto emerging platforms.”

So I can only interpret your question as implying that you do not find this useful. To which I would respond that if your vision is limited to HTML5’s design-by-committee approach to making language additions/extensions (i.e. limited to majority groups or whatever browser vendors find useful), then yes, you probably would not be interested in something that provides people with the opportunity to tie XHTML into extensions of their own.

E.g. I made a g:insertMail extension to XHTML on my website’s about page; if I grew this library of extensions to the point that I wanted support for code completion and validation in my editor, I would build it on top of XHTML Modularisation. For the product we created at work, we have done exactly that.

The very nature of g:insertMail (as a way to deceive spam bots) would not allow it to be standardised, because then the spam bots would learn how to interpret it.

Posted by Laurens Holst at

@bruce

Ah, interesting: two mistakes from me. 1. Assuming that we were talking about the same type of content. 2. Using “geek wet dream” in a casual way; it seems the words have more impact than I expected :)

(I really shouldn’t try to type before 7 am)

I thought the message was quite cute.

(A lot of people’s use of Flash is replaceable by “full” HTML 5 and other web standards: SIFR is replaceable by CSS web fonts, Flash video by a genuinely interoperable video tag, random screen bling can be done with canvas or SVG.)

yes, and that’s cool, but it is exactly what I call baby steps. Nothing compared to what is done in the Flash designer world (to my big regret). The issue is that if we come to a Flash designer and tell him about Canvas without the same authoring framework and the same capabilities, we will fail and be ridiculous.

Posted by karl dubost at

If we must do extensibility, why not do it like ARIA without colons?

The key word in that sentence is “we”.

If you have the cycles to go write each and every last extension, go for it.

If not, we need an answer to Julian’s disambiguation point.

Failing that, people have proven that they will go with xmlns-like approaches, warts and all.  And, yes, that means that if you choose to use SAX or XOM, and you wish to harvest some of this wonderful metadata, you might actually have to code an if statement or two.

Posted by Sam Ruby at

@Laurens: No, I don’t think it is useful to subset HTML for emerging platforms. I think platforms for which a full browser engine that can deal with all of HTML (or a thin client to full engine such as Opera Mini or Skyfire) is unavailable are uninteresting.

Besides, XHTML Mobile Profile proves that the “emerging platform” use case for Modularization is a failure. XHTML-MP didn’t follow the boundaries set by the XHTML2 WG (then known as the HTML WG).

If you wish to validate your extension in your own site back end system, why do you need normative modules on the spec level? Isn’t it enough to take an Open Source schema (perhaps monolithic, perhaps non-normatively modularized) and edit it however you like without having to feel constrained by module boundaries contemplated by a working group in advance?

Note that the whattf RELAX NG schema for (X)HTML5 is already non-normatively modularized. If you don’t like the module boundaries, that’s OK. You can edit the boundaries to be different without violating any Modularization rules set by a WG.

@Sam:

The key word in that sentence is “we”.

Rephrasing: If the HTML WG must state a “best practice” for extending HTML, why not formulate a “best practice” that isn’t poisonous to XML tools (i.e. one that doesn’t put colons in local names in apps that use the same bytes-to-infoset mapping as browsers)?

If not, we need an answer to Julian’s disambiguation point.

But “language extensions” only need lesser disambiguation, right? That is, disambiguation that lets multiple JS libraries (all chosen by the page author) not step on each other’s toes. Global disambiguation is unnecessary when the Rule of Least Power is violated and a JavaScript program is sent along with markup that looks like declarative data.

Consider embedding a TIFF inside a PostScript program that displays the TIFF. Who needs to know it is a TIFF? Recipients need to run the program to get an image displayed.

And, yes, that means that if you choose to use SAX or XOM, and you wish to harvest some of this wonderful metadata, you might actually have to code an if statement or two.

I take this to mean that you suggest that browsers and other types of HTML consumers use different bytes-to-infoset mappings for text/html. I suspect that varying parsing by class of product will lead to trouble. What would non-browsers put as the namespace when there’s nothing to resolve a prefix to, for example? Hard-wire interesting prefixes on a per-app basis? Surely causing issues like this should not be put forward as “best practice” even if the HTML WG can’t stop IBM/webBackplane from shipping ubiquity-xforms with colons?

Are you OK with breaking browser-side XPath for “language extensions”?

Posted by Henri Sivonen at

But “language extensions” only need lesser disambiguation, right?

That could very well be true.  Seems to work for CSS.  Java’s model for package names also seems workable.

Of course, the “HTML WG can’t stop IBM/webBackplane”.  That’s not how things work on the Internet.  Instead, and to paraphrase Mark Nottingham: the WHAT WG and the HTML WG have no-one to blame but themselves for this situation.  Ian Hickson won’t address this, and Chris Wilson consistently hasn’t.

Do you care to make a concrete proposal?  If you do make a general proposal and show how it could be applied to RDFa and XForms, and can enumerate the advantages, I think such a document would make an excellent companion to Rob’s document.

Posted by Sam Ruby at

Of course, the “HTML WG can’t stop IBM/webBackplane”.  That’s not how things work on the Internet.

Indeed. So what problem would be solved if the HTML WG sacrificed its Design Principles to endorse ubiquity-xforms?

Do you care to make a concrete proposal?  If you do make a general proposal and show how it could be applied to RDFa and XForms, and can enumerate the advantages,

I notice that you are evading my technical questions.

I have on various occasions said how RDFa could be modified not to violate the DOM Consistency Design principle, so in the interest of time, I opt not to repeat myself. I’m waiting at least until Hixie emerges with his proposal.

As for ubiquity-xforms: HTML5 already addresses the use case of using script-private markup to initialize the state of a JavaScript library for attributes: the data-* attributes. It seems that HTML5 is lacking analogous data-* elements.

Anyway, as I already pointed out, the minimal adjustment to ubiquity-xforms to make it not violate the Design Principle you are putting at jeopardy is simple: removing the colon (xfinput) or using another separator (xf-input). (Note that ubiquity-xforms may violate other Design Principles that you didn’t suggest the HTML WG “relax”.)
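The data-* mechanism referred to above is easy to sketch (the attribute names below are invented for illustration): script-private state is carried in conforming data- prefixed attributes and read back by the library, with no colons and no namespace machinery. In a browser this is what element.dataset exposes; here it is shown as a plain function over a name-to-value map so the idea stands alone.

```javascript
// Pull the script-private data-* entries out of an element's attribute
// map, stripping the "data-" prefix, so a JS library can read its own
// initialization state without touching namespaces.
function extractDataAttributes(attributes) {
  const out = {};
  for (const [name, value] of Object.entries(attributes)) {
    if (name.startsWith('data-')) {
      out[name.slice('data-'.length)] = value;
    }
  }
  return out;
}
```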

Frankly, I’m disappointed that you suggested sacrificing one of our Design Principles when the adjustment needed in order not to sacrifice it is trivial.

The advantage is that the resulting markup can be dispatched on in applications of any kind without special casing around the differences in the bytes-to-infoset mappings of text/html and application/xhtml+xml.

(Previously, you’ve asked me to call you on things that might be perceived as lack of transparency, so I note that you didn’t mention in your initial post that one of the two examples you were using to motivate the “relax[ation]” of the Design Principle in question is an IBM-sponsored product.)

Posted by Henri Sivonen at

So what problem would be solved if the HTML WG sacrificed its Design Principles to endorse ubiquity-xforms?

The question I would like to pose is this: under what conditions must another W3C Working Group be expected to comply with the design principles established by this working group?

Regarding “evading technical questions”, I want to first understand why my particular point of view is relevant before I go throwing my weight around.  We are not talking about any changes to any browser here.  Instead we are talking about author conformance criteria, criteria which quite evidently are getting in the way of people doing what they consider to be useful and productive work.  That’s the part that I think needs to be justified.

And to date, the most sane observation I’ve heard on the matter is that such an objection wouldn’t be relevant to a document with no “author conformance requirements”.

Posted by Sam Ruby at

rob, i only count as one evangelist.

sam, there is plenty to dislike about ian’s draft. but what it does offer is a vision of the web that normal mortals can code using open standards, which is what has excited people i’ve spoken to on university tours in india, indonesia and the uk

Posted by bruce at

The question I would like to pose is this: under what conditions must another W3C Working Group be expected to comply with the design principles established by this working group?

If the other WG wishes to collaborate with the HTML WG on deploying stuff in the text/html and/or application/xhtml+xml media types and/or the http://www.w3.org/1999/xhtml namespace, I’d think, although I’m only offering my own guesses here, not speaking as a spokesperson of the HTML WG.

Instead we are talking about author conformance criteria, criteria which quite evidently is getting in the way of people doing what they consider to be useful and productive work.  That’s the part that I think needs to be justified.

If authors violate DOM Consistency, the markup is more complex to process in any application that has a unified above-parsing code path for HTML and XHTML (not just browsers; I, for one, care about the DOM Consistency Design Principle also from the point of view of developing a non-browser app and a library meant for non-browser apps, in addition to caring about it from a browser development point of view). I accept as a fact that markup for ubiquity-xforms (xf:input) and, for example, output from MS Word (o:p) violate the Design Principles of the HTML WG. It isn’t a matter of the existence of violations of DOM Consistency out there. It is about how much of it will be out there, which is relevant to the probability that a given Web author or software developer needs to deal with the issue. I think the issue will be more contained and fewer Web authors and software developers will be exposed to it if the HTML WG doesn’t start endorsing practices that violate its Design Principles. Currently, authors and software developers can usually avoid having to deal with the issue of non-NCName local names.

Putting colons (or uppercase ASCII letters, which Brad mentioned, for that matter) in the names of elements or attributes in HTML extensions is a practice but not the “best practice”. Let’s not endorse it as such.
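The NCName point can be made concrete with a rough check (an ASCII-only approximation of the Namespaces in XML “non-colonized name” production; the real production also allows a wide range of non-ASCII letters):

```javascript
// Approximate test for an XML NCName ("non-colonized name"),
// restricted to ASCII for brevity. "xf:input" fails because a colon is
// not allowed anywhere in an NCName, which is exactly the grief a
// colon-in-the-local-name element causes namespace-aware XML tools.
function isAsciiNCName(name) {
  return /^[A-Za-z_][A-Za-z0-9._-]*$/.test(name);
}
```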

Why would ubiquity-xforms need an HTML WG endorsement? I cannot think of technical reasons, only marketing reasons. You still did not say what problem you are trying to solve.

Posted by Henri Sivonen at

You still did not say what problem you are trying to solve.

I’ll take this slowly.  Let’s start with the design principles.  They state, up front:

This document describes the set of guiding principles used by the HTML Working Group for the development of HTML5

Now, let’s take your example:

output from MS Word (o:p) violate the Design Principles of the HTML WG

That may be going too far.  I believe that it is fair to say that if the HTML Working Group were attempting to address whatever use case o:p is purported to address, this Working Group would likely come up with a different solution.  Undoubtedly a “better” solution, but I’m skeptical it would be one that would be as widely adopted as o:p appears to be pretty much everywhere.  But as to the question of whether the work that Microsoft clearly started well before the Design Principles were published violates said, carefully scoped, Design Principles?  I think that conclusion may be a bit overreaching.

So: what problem am I trying to solve?  I am trying to figure out if principles defined by the HTML Working Group designed to apply to the HTML Working Group for the purposes of the development of HTML5 apply elsewhere, specifically to other work products and to other Working Groups, much less other companies past, present, and future.

You seem to be assuming that they do, and that everybody who assumes otherwise has the burden of proof.

Meanwhile, it is not obvious to me that these Principles are universally true.  At the AC meeting, attended by 80-100 people, in a breakout session with perhaps 20-25 people, TV Raman made an observation that made some sense to me.  If we have features, like SVG, that are expected to be implemented by browser vendors and are expected to interact with other parts of the “platform” (his term, not mine), then YES, it does make sense to apply some of the same principles to such work.  Meanwhile, if other features, like RDFa, are not expected to be implemented by the platform, and do not interact with other features in ways that can’t be easily addressed via, say, a JavaScript library, then the answer could very well be NO.

You continue to try to make the case that perhaps XForms could be designed “better”, for some definition of “better”.  As someone who is co-chair of the HTML Working Group, and as someone who authors content, and even as someone who writes code that parses and processes HTML content, I have been struggling to figure out why I should care.

At the moment, my opinion on the topics of ubiquity-xforms and RDFa is decidedly mu.  To date, I have found precious little that they “violate” that I care about.

So, just to make certain that I have thoroughly and exhaustively answered the question of “what problem I am trying to solve?”:  I would like to get back to the business of defining HTML 5.  I do not like the fact that there are two different definitions for the q element.  I’m a bit concerned about the various definitions of the rel attribute.  But beyond that, I would love to see HTML 5 and features like XForms evolve completely independently.

Is that clear enough?

Posted by Sam Ruby at

output from MS Word (o:p) violate the Design Principles of the HTML WG

That may be going too far.  I believe that it is fair to say that if the HTML Working Group were attempting to address whatever use case o:p is purported to address, this Working Group would likely come up with a different solution.  Undoubtedly a “better” solution, but I’m skeptical it would be one that would be as widely adopted as o:p appears to be pretty much everywhere.  But as to the question of whether the work that Microsoft clearly started well before the Design Principles were published violates said, carefully scoped, Design Principles?  I think that conclusion may be a bit overreaching.

I see. o:p would violate the Design Principles if it were in their scope. I think having the HTML WG endorse o:p would put it in scope. However, the HTML WG presently doesn’t endorse o:p.

So: what problem am I trying to solve?  I am trying to figure out if principles defined by the HTML Working Group designed to apply to the HTML Working Group for the purposes of the development of HTML5 apply elsewhere, specifically to other work products and to other Working Groups, much less other companies past, present, and future. You seem to be assuming that they do, and that everybody who assumes otherwise has the burden of proof.

Only to the extent the HTML WG is expected to say something about the work product of others, or to the extent it is proposed the Design Principles be relaxed due to the work product of others.

If you believe things are out of scope of the Design Principles, why do you propose relaxing the Design Principles due to things that are out of scope?

Meanwhile, it is not obvious to me that these Principles are universally true.

Without a doubt, there are other WGs that do not subscribe to the Design Principles of the HTML WG. It doesn’t follow that the HTML WG should relax its Design Principles to cater to those who do not subscribe to them.

You may have noticed that I have carefully tried to avoid debating any WG “reunification” in this thread. I’ve tried to focus only on your proposal not to uphold the Design Principles adopted by the HTML WG. Whether something violates the Design Principles is relevant to the extent you found it relevant to propose relaxing them.

You continue to try to make the case that perhaps XForms could be designed “better”, for some definition of “better”.  As someone who is co-chair of the HTML Working Group, and as someone who authors content, and even as someone who writes code that parses and processes HTML content, I have been struggling to figure out why I should care.

I think you shouldn’t care, and the HTML WG shouldn’t care. But you seemed to care to the point of suggesting the HTML WG relax its Design Principles.

At the moment, my opinion on the topics of ubiquity-xforms and RDFa is decidedly mu.  To date, I have found precious little that they “violate” that I care about.

Why propose “relaxing” the Design Principles, then?

So, just to make certain that I have thoroughly and exhaustively answered the question of "what problem I am trying to solve?":  I would like to get back to the business of defining HTML 5.

I don’t see how activities that would reopen the Design Principles would help with getting back to that business.

I do not like the fact that there are two different definitions for the q element.  I’m a bit concerned about the various definitions of the rel attribute.

I don’t like that situation, either, but I like relaxing the Design Principles less.

But beyond that, I would love to see HTML 5 and features like XForms evolve completely independently. Is that clear enough?

Not really. If you want them to evolve independently, what was the proposal to relax the Design Principles about?

Posted by Henri Sivonen at

Cool.  We seem to be making progress.  Now, let’s move on to what I perceive to be the real crux of the issue.

The current HTML charter states:

The Group will define conformance and parsing requirements for ‘classic HTML’

Judging by the success criteria defined on that page, it appears that the term conformance as used on that page is meant to focus specifically and only on conformance as it relates to complete and interoperable implementations, but that is not clear.  Rereading the Design Principles, I see nothing that indicates that they were intended to cover author conformance requirements, but I might have just missed it.

However, looking at the current HTML 5 draft, I see a document chock full of author conformance requirements.  Looking at your validator, it identifies things that are not defined by the specification (with the exception of ARIA) as non-conforming.

If that weren’t the case, there would be no issue.

This leads to a number of related questions:

As I understand it, Rob’s answer to these questions would be no, and mu.  Your use of the word “violate” leads me to conclude that you would answer yes to the first question.  Ian’s path seems to be of striving towards the goal of answering yes to the second question.

My use of the word “relax” was an (apparently unfortunate) shorthand.  If the answers to these questions are yes and no respectively, a reasonable follow-on question would be to what extent these design principles should apply to “foreign” markup.  When I left the AC meeting, where my head was at was that they would apply in full to SVG, and most, but not necessarily all, should apply to RDFa.  I was not, in any way, thinking about this in terms of “endorsement”; instead I was looking to place a clear and explicit limit on the scope of things that I personally cared about.

But it now seems like we (or rather I) are/am getting ahead of ourselves.  Answering the questions listed in the bullet points above is probably a better next step.

Posted by Sam Ruby at

@bruce

has excited people i’ve spoken to on university tours in india, indonesia and the uk

which kind of people? Engineers or Web designers?

Posted by karl dubost at

I think that all this trouble is caused by extensibility, that is, the desire to add new elements to HTML. I know that ubiquity-xforms and Microsoft Office rely on the possibility of adding colons to local names (to fake namespaces).
But why should we ever endorse this practice?

We currently have two serializations for HTML5: XML and HTML.
Why should one use XML?

- Because it can use different languages and technologies, such as XForms, XHTML2, SMIL, XInclude, XLink...
- Because it can use RDFa
- Because it can virtually use any element, anywhere, given a meaningful set of CSS
- Because the DOM will always be consistent; it doesn’t matter if the element is in the HTML namespace or in the example.com namespace (or in the about:foo namespace)
- Because with XML, namespaces have a real existence, while in HTML they’re fake

Ok, using XML is difficult. But it is not impossible, and this blog is the demonstration of it: I can put <, >, quote marks, control characters, even a BOM, and non-UTF-8 content, and everything is sanitized (at least I hope the process is automatic, not done manually by Sam Ruby).

This is where extensibility belongs: otherwise, why should we keep XML? Just for standards geeks?

Giovanni

Posted by Giovanni at

Ok, using XML is difficult.

It is difficult, unforgiving, and not supported by IE, and requires an ability to set the Content-Type of a web page to a degree higher than evidence supports that one can reasonably expect.  We also have plenty of evidence of people evangelizing serving XHTML 1.x as text/html, and of people cross-copy/pasting from and to HTML and XHTML.  Finally, we have plenty of evidence that people don’t seem to take kindly to being told where they “belong”.

Posted by Sam Ruby at

@Henri: I was referring to extending XHTML, not subsetting it. I am aware that many think efforts such as XHTML MP are not very useful, although I can think of other useful applications of subsetting: e.g. if you wanted to use HTML markup as part of a publishing platform but did not want it to contain script, style, font etc., just pure semantic hypertext markup.

As for editing a schema instead of extending it, modularisation allows for easier maintenance as I clearly separate my extensions from the underlying schemas, and can easily update them as changes are being made. It provides a platform for extensibility, e.g. if I would offer my extensions as part of a framework, users of that framework would want to make their own additions as well. From a tag definition language such as XBL and/or documentation annotations, additions can even be auto-generated.

Of course it doesn’t need to be normative if that is a problem for some reason, but if you consider it as a platform, it is useful to at least normatively specify the module structure, conventions and extensibility mechanism. That said, actually using the XHTML Modularisation schemas could be easier. I would say it is fairly complex due in part to the limitations of XML Schema.

@Giovanni: At least for data exchange and storage formats, the syntax error recovery of HTML would not be good; it would silently corrupt your data and make parsing inefficient too. For authoring languages such as HTML, there are those who argue that the error lenience of HTML is more suitable, yet distributed extensibility is still important to have. Myself, I prefer that the user agent not try to guess my intentions when encountering an error, and instead notify me of it, so that errors do not go unnoticed.

Posted by Laurens Holst at

The current HTML charter states

Speaking of the charter, it also says: “This group primarily conducts its technical work on a Public mailing list public-html.”

I notice that Sean Fraser kindly informed the readers of the mailing list about this blog post. I suggest continuing the discussion on what the charter means and what the Design Principles apply to on the WG mailing list.

Posted by Henri Sivonen at

I notice that Sean Fraser kindly informed the readers of the mailing list about this blog post

Look closer at that email, in particular, look at who he quoted, and from where.

Posted by Sam Ruby at

Look closer at that email, in particular, look at who he quoted, and from where.

I had seen your reply to him first, because email arrived out of order to me. Somehow I had completely missed the footnote in your reply to Hixie. My apologies.

Posted by Henri Sivonen at

... ubiquity-xforms ... is implemented entirely by JavaScript and reportedly (I haven’t tried it) works across all popular browsers today with zero changes ...

This example (using the latest release of the library, with an example from their documentation) doesn’t seem to work for me in IE (I tried IE7, and IE8 in its three modes, and it creates the form fields (except in IE8 Standards mode) but doesn’t put the data in them). That could be an easily-fixable bug, though.

But whatever ubiquity-xforms is implementing, it doesn’t seem to be an accurate approximation of XForms, because this example doesn’t work in any browser. ubiquity-xforms seems to expect authors to use an (undocumented, as far as I can see) subset of the polyglot intersection of HTML and XML, e.g. you have to use the prefix “xf”, you have to be careful about self-closing tags, you can’t modify the document dynamically, you can’t use CSS3 Namespaces, you can’t use namespace-aware DOM APIs.

There’s a huge mismatch between XForms syntax and current text/html browser behaviour. Attempts to deal with the mismatch at a purely scripting level, like ubiquity-xforms, are necessarily going to suffer from these problems, and so authors will suffer from weird rules and bugs and inconsistencies. That’s a situation I’d really like to avoid. The mismatch could be partly avoided by changing the syntax to be more compatible with current text/html browser behaviour (e.g. using “xforms-input” instead of approximating XML Namespaces); or it could be partly fixed by changing text/html behaviour (e.g. parsing XML Namespace syntax in an XML-like way), if that can be done to a sufficient extent while supporting existing content.

Regardless of those issues, the script-based approach to implementing XForms is going to be pretty rubbish for users of Opera Mini, ELinks, NoScript, WWW::Mechanize::FormFiller, etc, because the language extension will be defined purely by the behaviour of scripts they can’t execute. The script approach is still useful for prototyping and as a compatibility hack, but the value of XForms on the web is severely limited unless it’s a ubiquitous cleanly-integrated part of the platform, so it needs to be designed in a way that can be cleanly integrated with the platform, not as a “language feature” that ignores the platform’s Design Principles.

Posted by Philip Taylor at

There’s a huge mismatch between XForms syntax and current text/html browser behaviour.

As always, Philip, excellent input.  Sometimes I have the feeling that if there were anybody on this earth who could make a convincing argument that the humble <p> tag could never work in HTML5, you would be that guy.  That being said, this particular input appears to be quite serious.  I’ll try to see if there is somebody in the XForms community who doesn’t happen to read my weblog but would be willing to comment.

Posted by Sam Ruby at

It is difficult, unforgiving, not supported by IE, and requires an ability to set the Content-Type of a web page to a degree higher than the evidence suggests one can reasonably expect.  We also have plenty of evidence of people evangelizing serving XHTML 1.x as text/html, and of people cross-copy/pasting from and to HTML and XHTML.  Finally, we have plenty of evidence that people don’t seem to take kindly to being told where they “belong”.

Difficult: okay.
Not supported by IE: many features of HTML5 are not supported by IE (some are supported by nobody), yet they’re still there and authors are starting to use them. You use <section>, <article>, <nav>, for example (and you had bugs in IE).
Requires the ability to set Content-Type: PHP supports setting Content-Type (using header()), Perl CGI supports (and requires) emitting a Content-Type header (using print()), ASP.NET supports Content-Type, Java supports Content-Type; static files also get the right Content-Type, if their extension is .xhtm or .xhtml (I checked with a clean, up-to-date Apache).
People use XHTML as text/html: that is not XHTML, it is HTML 4.0 with the wrong DOCTYPE declaration (which browsers ignore, btw).
People copy-paste: I don’t copy some JavaScript into a PHP page, or Visual Basic into COBOL sources, and expect them to work. Likewise, people should not expect that copy-pasting HTML into XML “just works”. Remember that here we’re talking about author conformance requirements, not UAs (which still need to parse <xf:input> in HTML).
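The Content-Type point above boils down to ordinary server-side content negotiation. As a rough sketch of the idea (in Python rather than any of the languages named above, with an invented function name, and ignoring Accept q-values for simplicity), a handler serving an XHTML page might choose its media type like this:

```python
# Hypothetical sketch: pick a media type for an XHTML page based on the
# client's Accept header, falling back to text/html for user agents
# (like IE) that do not advertise application/xhtml+xml support.
# Note: a real negotiator would also honor q-values; this one does not.

def negotiate_content_type(accept_header):
    """Return the media type to serve an XHTML document under."""
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    if "application/xhtml+xml" in accepted:
        return "application/xhtml+xml"
    return "text/html"

# A Firefox-style Accept header advertises XHTML support explicitly;
# IE7's "*/*" does not, so it gets the fallback.
print(negotiate_content_type(
    "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(negotiate_content_type("*/*"))
```

The same decision can be expressed in PHP’s header(), a Perl CGI print(), or an Apache AddType directive; the mechanism is not the hard part.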

Philip said

There’s a huge mismatch between XForms syntax and current text/html browser behaviour.


but this is just because nobody really used XForms; they used an invalid extension mechanism for HTML that looked like XForms.

That said, I don’t see any reason why all this mess, although it must be specified (and, if we want, allowed), should ever be defined as “conforming”.

Posted by Giovanni at

Giovanni: you skipped unforgiving.

The fact that XML is difficult and unforgiving means that you will make mistakes and that the consequences of such mistakes are dire.

Visual Basic looks markedly different from COBOL.  The differences between PHP and Ruby aren’t nearly as visually striking, enough so that a common error among PHP refugees is to put a $ sign in front of a variable reference.  To the untrained eye, XML and HTML are indistinguishable.

Add to this the fact that XHTML 1.0 endorses the practice of serving XHTML as text/html, and that a number of web standards activists in blue hats have advocated this practice, and the consequences are not only predictable, they are positively inevitable.

Posted by Sam Ruby at

Not supported by IE: many features of HTML5 are not supported by IE (some are supported by nobody), yet they’re still there and authors are starting to use them. You use <section>, <article>, <nav>, for example (and you had bugs in IE).

Yes, but at least the failure mode is more graceful than “the page is not even rendered at all and a download dialog is shown instead”, which is how application/xhtml+xml is treated.

Posted by Aristotle Pagaltzis at

The fact that XML is difficult and unforgiving means that you will make mistakes and that the consequences of such mistakes are dire.

This is something that should be solved in the UAs, not in the markup spec. A better fault page is surely possible; they could even add a proposed correction.

Visual Basic looks markedly different from COBOL.  The differences between PHP and Ruby aren’t nearly as visually striking, enough so that a common error among PHP refugees is to put a $ sign in front of a variable reference.  To the untrained eye, XML and HTML are indistinguishable.


JavaScript and ActionScript should be the same language (both are implementations of ECMAScript, 3rd/5th Edition), yet I would never put the same code in different contexts.

Add to this the fact that XHTML 1.0 endorses the practice of serving XHTML as text/html, and that a number of web standards activists in blue hats have advocated this practice, and the consequences are not only predictable, they are positively inevitable.


XHTML 1.0 removed that text completely in the PER. XHTML Media Types endorses that practice only for backward compatibility with existing user agents that don’t support XML (Internet Explorer).

Yes, but at least the failure mode is more graceful than “the page is not even rendered at all and a download dialog is shown instead”, which is how application/xhtml+xml is treated.

You can sniff IE and send text/html. Or you can send the appropriate XSLT and a MIME type of application/xml, which is perfectly supported in IE (only application/xhtml+xml is unknown, not every XML MIME type).

Posted by Giovanni at

This is something that should be solved in the UAs

And IE should support application/xhtml+xml.  But I see no evidence that either is likely to happen.

I would never put the same code in different contexts.

If only all authors were as conscientious as you.

XHTML 1.0 removed that text completely in the PER.

Here’s the latest recommendation.

You can sniff IE and send text/html.

I do so.  Have done so for years.  Wouldn’t recommend that practice.

Posted by Sam Ruby at

You can sniff IE and send text/html. Or you can send the appropriate XSLT and a MIME type of application/xml

Yes I can, and in both cases all of the JavaScript and CSS on the page needs to be audited, because there are subtly different rules for how those are processed in application/xhtml+xml, text/html, and application/xml contexts. So it’s actually not a good solution at all.

Worse is that you need to send Vary: User-Agent, which will do a number on the cacheability of your site.

You can improve on that by prepending a cookie-setting routine that gives IE one cookie value and all other browsers another, and then use Vary: Cookie, in effect turning cookies into a custom cache key. But this in turn has its own set of problems and won’t prevent visitors from having to hit the origin server at least once.
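The cookie-as-cache-key scheme described above can be sketched roughly as follows (hypothetical Python with invented names; a real implementation would also have to set the cookie on first contact, which is the origin-server hit mentioned above, and cope with cookie-less clients):

```python
# Sketch of the cookie-as-custom-cache-key scheme: classify the browser
# once, record the class in a cookie, and have cached responses vary on
# that small cookie instead of on the unbounded User-Agent string.

def classify_user_agent(user_agent):
    """Crude first-contact classification; only IE needs the text/html fallback."""
    return "legacy" if "MSIE" in user_agent else "capable"

def response_headers(ua_class):
    """Choose the media type and cache headers from the stored UA class."""
    if ua_class == "legacy":
        content_type = "text/html"               # IE offers to download XHTML
    else:
        content_type = "application/xhtml+xml"   # XHTML-capable browsers
    return {
        "Content-Type": content_type,
        # Caches key on the two-valued cookie, not on User-Agent:
        "Vary": "Cookie",
    }

print(response_headers(classify_user_agent("Mozilla/4.0 (compatible; MSIE 7.0)")))
```

This keeps the cache keyspace to two variants per URL, but, as noted, it trades one set of problems for another.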

On and on the list of issues goes. I have been thinking about it for years and never found a satisfactory solution.

If you think this is an easy issue to solve, think again.

Posted by Aristotle Pagaltzis at

Comic Update: Madness? This is HTML5!

Warning: this post falls into diatribe territory. I strongly feel that important technologies should be determined by consensus and not closed circles, and I’m not convinced that this is currently the case with HTML5. I seriously doubt that Ian...

Excerpt from CSSquirrel at

This example (using the latest release of the library, with an example from their documentation) doesn’t seem to work for me in IE. I tried IE7, and IE8 in its three modes; it creates the form fields (except in IE8 Standards mode) but doesn’t put the data in them. That could be an easily-fixable bug, though.

Unfortunately, the wiki page with that example is a bit out of date in terms of the release it suggests (it points to an October 2008 SVN tag).  You should have better luck running with the current trunk.  In fact, I’d suggest a more aggressive example: loan.html, available at [link], which should run fine in FF3, IE7, and Safari/Chrome (this link does load a bunch of files directly from SVN which are not yet packaged for production use, so be patient and/or hit reload if required).

But whatever ubiquity-xforms is implementing, it doesn’t seem to be an accurate approximation of XForms, because this example doesn’t work in any browser. ubiquity-xforms seems to expect authors to use an (undocumented, as far as I can see) subset of the polyglot intersection of HTML and XML, e.g. you have to use the prefix “xf”, you have to be careful about self-closing tags, you can’t modify the document dynamically, you can’t use CSS3 Namespaces, you can’t use namespace-aware DOM APIs.

We use CSS to dynamically attach the JavaScript functions implementing XForms behaviors to elements in the DOM, whether those elements are loaded from the initial document or created dynamically by script during execution. The latter is certainly allowed, and indeed core to the support for dynamic data-driven lists (“repeats” in XForms).  It’s a known requirement for us to generate these CSS rules dynamically by passing prefix declarations to the library’s namespace manager, rather than looking only for “xf” or “xforms” as the code does now.  The current code does not do this yet, but this is clearly an implementation issue, not something fundamental to the library.

Issues such as being careful about self-closing tags are ones we’re looking to this group to help with in the future, by supporting well-defined behavior for unknown tags.  We do focus considerable effort on unifying scripting behavior across local names with and without prefixes, by providing utility functions to deal with these variations, a strategy not unlike that of other AJAX libraries.  I would hope that the outcome of a more standards-based means of extensibility would be to reduce or eliminate the need for such abstraction functions.

There’s a huge mismatch between XForms syntax and current text/html browser behaviour. Attempts to deal with the mismatch at a purely scripting level, like ubiquity-xforms, are necessarily going to suffer from these problems, and so authors will suffer from weird rules and bugs and inconsistencies. That’s a situation I’d really like to avoid.

Agreed, this is a situation we’d like to avoid, which is why we’re very interested in nailing down preferred mechanisms for extensibility, whatever they wind up being.  ubiquity-xforms is simply one example of a vocabulary that could benefit from more robust mechanisms for dealing with both the syntax and the behavior of language extensions; hence this discussion.

The mismatch could be partly avoided by changing the syntax to be more compatible with current text/html browser behaviour (e.g. using “xforms-input” instead of approximating XML Namespaces); or it could be partly fixed by changing text/html behaviour (e.g. parsing XML Namespace syntax in an XML-like way), if that can be done to a sufficient extent while supporting existing content.

My sense from the discussion at the recent AC meeting is that while namespaces are seen as one required extension mechanism for XHTML, others may be introduced for HTML.  If so, they would almost certainly need to be supported as well in XHTML so that authors could easily move content between these formats...so perhaps the debate comes down to which community needs to support the compatibility mode.  In any event, libraries such as ubiquity-xforms would benefit from having this debate resolved clearly one way or the other.

Regardless of those issues, the script-based approach to implementing XForms is going to be pretty rubbish for users of Opera Mini, ELinks, NoScript, WWW::Mechanize::FormFiller, etc, because the language extension will be defined purely by the behaviour of scripts they can’t execute. The script approach is still useful for prototyping and as a compatibility hack, but the value of XForms on the web is severely limited unless it’s a ubiquitous cleanly-integrated part of the platform, so it needs to be designed in a way that can be cleanly integrated with the platform, not as a “language feature” that ignores the platform’s Design Principles.

I don’t see it as a compatibility hack at all.  Clearly, if a platform doesn’t support scripting, then it can’t support script-based tag libraries.  But I believe it’s a valuable practice to support innovation and evolution in the community by means of script-based tag extensions, which over time might or might not find themselves implemented deeper in the platform.  Indeed, there may be vocabularies targeted at specific industry or vertical application communities that never need platform support, but would nonetheless help raise the level of abstraction of their content by means of well-defined script-based runtime support.

Posted by Charlie Wiecha at

Sam Ruby on HTML Reunification

Sam Ruby, recently appointed as co-chair of the W3C HTML Working Group, is starting to explore directions for unifying HTML 5 and XHTML.  For anyone who cares about Web technology or HTML, Sam’s posting is highly recommended.
For those who don...... [more]

Trackback from Arcane Domain

at

Regarding XForms support on the browser side, it appears that the XSLT+JavaScript approach is much cleaner than a JavaScript-only approach.

I have reengineered the AJAXForms project to perform XForms-to-(X)HTML+JavaScript conversion with a single XSLT 1.0 stylesheet. My project is named XSLTForms.

Browsers support XSLT 1.0 very well (the namespace axis is not yet supported by Firefox, but that could be fixed soon...), and their XSLT engines are fast and stable.

It is actual XML that is sent from the server to the client, with real namespace support.

Adding a processing instruction is no more difficult than adding a <script src="..."> element. With XSLTForms, there is no other change to apply to the XForms document: the xf:model stays in the head element, ...

About namespaces in external CSS stylesheets: XSLT 1.0 can parse CSS instructions only if a document element is added, so that the document() function can be used. XSLTForms implements this.

I’m now convinced that the XSLT approach is a good one for processing XML dialects on the browser side.
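For concreteness, the single change being described looks roughly like this (the stylesheet path and instance data are hypothetical, made up for illustration): instead of pulling in a script library, the XForms document points at the transforming stylesheet with an xml-stylesheet processing instruction.

```xml
<?xml-stylesheet type="text/xsl" href="xsltforms/xsltforms.xsl"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <title>XSLTForms example</title>
    <xf:model>
      <xf:instance>
        <data xmlns=""><name/></data>
      </xf:instance>
    </xf:model>
  </head>
  <body>
    <xf:input ref="name"><xf:label>Name:</xf:label></xf:input>
  </body>
</html>
```

Because the document is served as real XML, the xf: prefix is a genuine namespace binding rather than a naming convention, which is the cleanliness being claimed over the script-only approach.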

Posted by Alain COUTHURES at

Thoughts on Ruby's "HTML Reunification"

I was off the grid when Sam Ruby of IBM, co-chair of the W3C HTML Working Group, wrote an essay on HTML 5 which sparked many comments and follow-ups. Some of it went off into details, but here are some of the higher-level ideas I pulled out of it. Sam...

Excerpt from Adobe Blogs at

Sam,

Typing this up by email because I still don’t trust browsers for such things.

In the somewhat Talmudic exposé that your reunification-of-XHTML2-and-HTML5 essay generated, one of the tributary threads discussed the use of the Ubiquity Forms library to implement XForms in today’s browsers. In the process, it (incorrectly, I believe) concluded that if that were done, then other non-browser consumers, e.g. Perl’s WWW::Mechanize, would be left holding the short end of the stick, because they’d have to consume the Ubiquity Forms library written in JS.

The above conclusion is erroneous. In a world where there is no agreed-upon declarative markup for authoring forms, Perl libraries like WWW::Mechanize are indeed left in a position where they must implement all of JS in order to consume online forms. But when you first create an authoring syntax, along with defining the underlying processing model, independent of any given runtime, then you can later consume content that is authored to that specification in multiple runtime environments, so:

Notice that this is all possible because authors and content creators are creating content, not code, and different implementations consume that content using whatever runtime (JS, Perl, or Python) is appropriate for them.

Please feel free to include this in your blog as appropriate.


Best Regards,
--raman

Posted by Sam Ruby at

If vendors of UAs built on WWW::Mechanize, and any other UAs that don’t have a scripting engine, will have to write their own implementation of the XForms spec in order to successfully process content on the web, then we’ll want it to fit cleanly into the UAs' existing processing models and to not conflict with the processing of existing content and to be clearly specified so we can get interoperable implementations and so on. That sounds like the definition of a ‘platform feature’, not of a ‘language feature’.

Projects like ubiquity-xforms are a useful compatibility bridge for bringing new features to users; but I don’t see how they can be used as an excuse to allow “[d]ifferent burdens of proof and policies”. Violating the HTML5 design principles will result in the same problems regardless of whether it’s possible to write a script library that emulates the feature in browsers, except to the extent that the library mitigates the “degrade gracefully” principle.

Rather than ‘platform feature’ vs ‘language feature’, the issues discussed in the original blog post seem to be more accurately described as ‘features for which we have to cooperate with the major browser vendors’ vs ‘features for which we can ignore the browser vendors’. In that case I can understand how ubiquity-xforms is an example of a feature in the second category. But that doesn’t excuse it from following the same design principles as features in the first category.

As it happens, the highest concentration of people who understand the reasoning behind the design principles are in the HTML WG, and they are often employed by browser vendors. As I see it, those people only want the HTML WG to have the “gatekeeper” role because it’s the best way to discourage features that disregard the design principles. If other groups demonstrated that they could follow the principles without any involvement from the HTML WG, that’d be great. But that currently doesn’t seem to happen much; e.g. XForms in text/html (as currently implemented by ubiquity-xforms, and needing to be implemented independently by non-scripted UAs) requires changes to the processing of namespace syntax (risking “support existing content”), fails to “degrade gracefully” (unlike most of Web Forms 2), fails at “evolution not revolution” by replacing the entire form system with something completely new, and fails at “DOM consistency” by using namespace syntax that is parsed very differently in text/html and XML.

The newer ‘XForms-for-HTML’ draft (which I believe was developed largely as a response to the concerns of the HTML WG) seems to address many of those issues, so that’s promising, and it probably wouldn’t have happened if the ill-fitting XML-centric XForms syntax had been accepted as part of HTML, instead of being argued against by HTML WG members who want it to follow the same design principles as any other feature.

(Hmm, now that I read some of the earlier comments, it looks like Henri already said much the same things in comment #2. Oh well.)

Posted by Philip Taylor at

TV Raman on XForms and Screen-Scraping

TV Raman, who ought to know a thing or two about screen scraping, comments on Sam Ruby’s HTML Reunification and shows that the shibumiscript approach makes things easier for scripts, not harder. Needless to say, we agree....

Excerpt from shibumiscript at

HTML 5

A while ago I posted a pointer to Sam Ruby’s efforts to unify XHTML and HTML 5.  Indeed, lots of people have considered lots of ways of getting the best of both of these technologies, but the net result is that the W3C has now decided to focus...

Excerpt from Arcane Domain at

The HTML5 Equilibrium

HTML5 is a strange character with what appears to be a split personality. Hardly surprising, then, that something so divided would appear to be so divisive. First of all, there’s the spec itself. The HTML5 specification walks a fine line between...

Excerpt from Adactio at

Add your comment