XML 2.0?

Sam Ruby, It’s just data (http://www.intertwingly.net/blog/2485.atom, rubys@intertwingly.net). Last updated 2007-02-05T12:06:57-05:00.

Anne van Kesteren: The time has probably arrived to define graceful error handling for XML, put some IETF, W3C or WHATWG sticker on it, label it XML 2.0 and ship it. Perhaps we can drop this internal subset thing in the process.

I’ve been slowly but steadily prototyping this in the html5lib svn repository. Since this post, I’ve added W3C DOM support, and the parser can now produce SAX2 events (with or without namespaces) from that DOM. A SAX interface with namespaces will make it easier for me to replace sgmllib as the fallback in the event of XML parsing errors in the Universal Feed Parser.
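Producing SAX2 events from a DOM can be sketched with the Python standard library. This is not html5lib’s code: `dom_to_sax` is a hypothetical walker over an `xml.dom.minidom` tree, assuming namespace-aware parsing and ignoring comments and processing instructions. It turns `xmlns` attributes into prefix-mapping events and reports elements through `startElementNS`; `Recorder` is a throwaway handler used only to show the events.

```python
from xml.dom import Node, minidom
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesNSImpl


def dom_to_sax(node, handler: ContentHandler) -> None:
    """Replay a minidom tree as namespace-aware SAX2 events (illustrative sketch)."""
    if node.nodeType == Node.DOCUMENT_NODE:
        handler.startDocument()
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endDocument()
    elif node.nodeType == Node.ELEMENT_NODE:
        declared, values, qnames = [], {}, {}
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            if attr.name == "xmlns" or attr.name.startswith("xmlns:"):
                # Namespace declarations become prefix-mapping events, not attributes.
                prefix = attr.name.partition(":")[2] or None
                declared.append(prefix)
                handler.startPrefixMapping(prefix, attr.value)
            else:
                key = (attr.namespaceURI, attr.localName)
                values[key], qnames[key] = attr.value, attr.name
        name = (node.namespaceURI, node.localName)
        handler.startElementNS(name, node.tagName, AttributesNSImpl(values, qnames))
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endElementNS(name, node.tagName)
        for prefix in reversed(declared):
            handler.endPrefixMapping(prefix)
    elif node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        handler.characters(node.data)


class Recorder(ContentHandler):
    """Collects the events it receives so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        self.events.append(("prefix", prefix, uri))

    def startElementNS(self, name, qname, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElementNS(self, name, qname):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


doc = minidom.parseString('<feed xmlns="http://www.w3.org/2005/Atom"><title>hi</title></feed>')
recorder = Recorder()
dom_to_sax(doc, recorder)
for event in recorder.events:
    print(event)
```

Walking an already-built tree like this means the same SAX consumer can be fed whether the tree came from a strict XML parser or from a more forgiving one.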

2007-01-26T19:05:13-05:00
Bill de hOra (http://dehora.net/journal):

XML 1.7 would be fine, thanks.

- no DTDs
- no default namespaces, less wordfudging around xmlns
- no QNames in content (as WS-* is expiring, I assume no one will need this in 2010)
- a processing model for inherited/assumed attributes
- a processing model for empty vs. not present (good luck with that)
- must-ignore (mI) defaulting for foreign markup
- Hixie as an invited expert

Most crappy XML results from string concat/interpolation (I’m ignoring encoding). Why not allow publishers to mark up elements as 'nofail'?
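A minimal Python sketch of the string-concatenation problem, using only the standard library: pasting a title that contains `&` and `<` into a template yields a document a conforming XML parser rejects, while building a tree and letting `xml.etree.ElementTree` serialize it escapes the data. `is_well_formed` is a hypothetical helper name used for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """True if a strict XML parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


title = "Q&A: is <XML> 2.0 coming?"

# String interpolation: markup characters in the data leak into the markup.
concatenated = f"<entry><title>{title}</title></entry>"

# Tree building: the serializer escapes &, < and > in text content.
entry = ET.Element("entry")
ET.SubElement(entry, "title").text = title
serialized = ET.tostring(entry, encoding="unicode")

print(is_well_formed(concatenated))  # False
print(is_well_formed(serialized))    # True
```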

2007-01-26T20:34:57-05:00
Sam Ruby (http://intertwingly.net/blog/):

“no default namespaces”

WTF?

2007-01-26T21:41:25-05:00
Linked from Megite Scobleizer News: What's Happening Right Now, “XML 2.0?” (http://www.megite.com/scobleizer/1169858260/1#item_14):

Anne van Kesteren: The time has probably arrived to define graceful error handling for XML, put some IETF, W3C or WHATWG sticker on it, label it XML 2.0 and ship it. Perhaps we can drop this internal subset thing in the process...

2007-01-26T21:45:19-05:00

Linked from HotLinks - Level 1, “Sam Ruby: XML 2.0?” (http://dev.upian.com/hotlinks/archives/2007/01/27/#item69165):

wearehugh: Sam Ruby: XML 2.0? Tags: dracos standards xml...

2007-01-27T00:45:20-05:00

Robert Sayre (http://franklinmint.fm):

No DTDs, no namespaces and no URIs used for anything syntactic, no processing models, no working group, no standards organization, no mailing list, no wiki, no blog, no trackbacks, no +1s (do people know how stupid that is?), no conferences, no meetings, no telcons, no charters. Do allow decentralized extensions.

2007-01-27T02:15:05-05:00

Linked from dive into mark, “links for 2007-01-27” (http://diveintomark.org/archives/2007/01/27/links-for-2007-01-27):

Googlebombs Defused? interesting mostly for the list of attempted googlebombs (tags: google)
XML 2.0: XML with graceful error handling? - Anne’s Weblog (tags: xml standards dracos)
Sam Ruby: XML 2.0? (tags: xml standards dracos)
Results of mobile...

2007-01-27T09:15:10-05:00

Bill de hOra (http://dehora.net/journal):

“WTF?”

Here’s an exercise: take the Namespaces Recommendation and remove the default namespace feature. See whether you think the document is technically better or worse, and more or less ambiguous.

“no +1s (do people know how stupid that is?)”

Hi Rob. Nice to see you haven’t lost any of your charm.

2007-01-27T10:07:28-05:00
Mark (http://diveintomark.org/id/):

“no working group, no standards organization, no mailing list”

Just because you’ve been kicked out of every working group you’ve ever disrupted — to the point that the IETF felt compelled to write an RFC explaining how to deal with difficult people who disrupt working groups — doesn’t mean the format is broken.

2007-01-27T10:13:45-05:00
Bill de hOra (http://dehora.net/journal):

“WTF?”

Dammit Sam Ruby, I was about to start on my weblog code, and you stopped me cold with an ugly fact. My feed is now sane. I didn’t use atom: as a prefix (I’ll be curious to see if I get any error reports from downstream).

Back to work.

“ Nice to see you haven’t lost any of your charm.”

No need for that, I suppose. Sorry, Rob.

2007-01-27T10:42:56-05:00
James Holderness (http://www.詹姆斯.com/):

“My feed is now sane. I didn’t use atom: as a prefix (I’ll be curious to see if I get any error reports from downstream).”

You can obtain a partial list of the clients that are now incapable of reading your feed here.

2007-01-27T11:26:13-05:00
Dare Obasanjo (http://www.25hoursaday.com/weblog):

Bill,
I have to agree that default namespaces are probably one of the worst features of the XML namespaces spec [the worst being that namespace names are URIs instead of URLs]. However, I don’t think it would have been politically feasible to do anything else, due to requirements from XHTML [if memory serves me correctly].

James,
Interesting test cases, but they don’t really show which readers have problems with namespaces in XML documents; they show a problem with passing around XML fragments [and the correct XML namespace context for those fragments]. Most of the readers, including RSS Bandit, just send the contents of atom:content to the browser without fixing up the namespace context, which means you end up with some HTML and an island of XML with funky tag names (h:div, h:li) in your reader. Funnily enough, this is an example of where default namespaces make things work “as expected” most of the time.
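The fragment problem described above can be illustrated with a small Python sketch. `naive_fragment` is a hypothetical stand-in for a reader that copies the raw markup of an `xhtml` `atom:content` element without its namespace context. When the XHTML namespace is bound to the `h:` prefix on the feed root, the copied fragment no longer parses on its own; when the fragment declares its own default namespace, it survives the copy.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

# Prefix bound on the feed root, far away from the fragment that uses it.
prefixed = f"""<feed xmlns="{ATOM}" xmlns:h="{XHTML}">
<entry><content type="xhtml"><h:div><h:p>Hello</h:p></h:div></content></entry>
</feed>"""

# Default namespace declared on the fragment's own root element.
defaulted = f"""<feed xmlns="{ATOM}">
<entry><content type="xhtml"><div xmlns="{XHTML}"><p>Hello</p></div></content></entry>
</feed>"""


def naive_fragment(doc: str) -> str:
    """Hypothetical reader behaviour: copy the raw markup inside <content>."""
    start_tag = '<content type="xhtml">'
    start = doc.index(start_tag) + len(start_tag)
    return doc[start:doc.index("</content>")]


def parses_standalone(fragment: str) -> bool:
    """True if the fragment is namespace-well-formed on its own."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Both complete documents resolve the div to XHTML; only the copies differ.
for doc in (prefixed, defaulted):
    content = ET.fromstring(doc).find(f"{{{ATOM}}}entry/{{{ATOM}}}content")
    assert content[0].tag == f"{{{XHTML}}}div"

print(parses_standalone(naive_fragment(prefixed)))   # False: "h" is unbound
print(parses_standalone(naive_fragment(defaulted)))  # True
```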

2007-01-27T12:32:26-05:00
Robert Sayre (http://franklinmint.fm):

Hi Mark,

I am not sure why that brief opinion on the necessity of a WG and its trappings made you feel the need to attack me personally, but I am not surprised.

I have been kicked out of the Atompub WG before, and I don’t regret it. The rest of your comment is incorrect.

It seems to me that the idea is good enough to start implementing and using it, and then to produce an RFC as a by-product. This is how JSON went, and it worked pretty well.

2007-01-27T14:28:44-05:00
Jesper (http://waffle.wootest.net/):

What is it about angle brackets that makes people start insulting each other?

2007-01-27T17:34:16-05:00

Adam Fitzpatrick (http://lurking.org/lurker/):

It’s the pointy corners, Jesper. I reckon we should just move everything to S-expressions, with their soothing round parentheses. Then we could be free of these bitter, divisive arguments about markup forever.

2007-01-27T17:44:30-05:00

Jesper (http://waffle.wootest.net):

Ah, yes, replaced by bitter, divisive arguments about LISP.

2007-01-27T18:10:15-05:00

Linked from del.icio.us/wearehugh, “Sam Ruby: XML 2.0?” (http://del.icio.us/url/2d27dd5ef153cb0bd64c4dc3a27c9225).
2007-01-27T19:15:10-05:00
James Holderness (http://www.詹姆斯.com/):

“Interesting test cases but they don’t really show which readers have problem with namespaces in XML document”

I’m sorry, I should have been clearer. I was referring specifically to test case 1 (Atom namespace mapped to a prefix). While RSS Bandit handles that quite happily, there are many aggregators that do not.

2007-01-28T00:05:32-05:00
Seth Gordon (http://sethg-prime.livejournal.com):

No matter how XML’s error-handling rules are refined, there will be some XML documents where the author intends the document to be interpreted as A, the rules state the document should be interpreted as not-A, and clients that interpret the document as A will win market share over clients that follow the rules.

This will be true for as long as (a) there are widely-used tools that generate XML in error-prone ways, e.g., by string concatenation, and (b) there are widely-used clients for reading XML whose users expect to see nicely formatted text in a browser.

2007-01-29T15:48:21-05:00
Sam Ruby (http://intertwingly.net/blog/):

Seth: the statements you make are true, even for well-formed documents produced by serializing a DOM.

However, the conclusion you draw doesn’t follow.

What the WHATWG is doing is looking at the problem from the other end: what are browsers doing? Can we write that down so that others who wish to consume HTML can at least do it consistently with what the browsers with market share are doing?

I also believe that what the browsers are doing captures years of experience of how best to deal with problems like character encoding issues, unescaped ampersands, unmatched quotes and the like. Many of these same issues apply to poorly produced XML.
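Unescaped ampersands are one concrete case where browser behaviour has been written down. Python’s `html.unescape` is documented as following the HTML5 rules for both valid and invalid character references, so it gives a rough feel for that recovery; the sketch below contrasts it with a strict XML parser, which rejects all three sample strings as element content. `xml_accepts` is a hypothetical helper, and attribute-value rules (which differ slightly in HTML5) are not modelled.

```python
import html
import xml.etree.ElementTree as ET

samples = ["AT&T", "&copy 2007", "fish &amp chips"]


def xml_accepts(text: str) -> bool:
    """True if the text is acceptable as element content to a strict XML parser."""
    try:
        ET.fromstring(f"<p>{text}</p>")
        return True
    except ET.ParseError:
        return False


for text in samples:
    # HTML5-style recovery: unknown references stay literal, while legacy
    # names such as "copy" and "amp" are recognized without the semicolon.
    print(repr(text), "->", repr(html.unescape(text)), "| XML accepts:", xml_accepts(text))
```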

2007-01-29T16:04:56-05:00
Jacques Distler (http://golem.ph.utexas.edu/~distler/blog/):

“What the WHATWG is doing is looking at the problem from the other end: what are browsers doing? Can we write that down so that others who wish to consume HTML can at least do it consistently with what the browsers with marketshare are doing?”

To the extent that browsers have converged on a certain behaviour, it makes sense to standardize that, so that others do not have to reinvent the wheel.

Since HTML-producers are conditioned to expect the behaviour of the dominant browsers (and have correspondingly adjusted their error-laden content), there’s very little competitive advantage to be had in trying to “do better.”

But there clearly isn’t any such consensus in the XML-parsing world. Which makes the argument for adopting a particular error-handling behaviour in XML much less compelling.

2007-01-30T00:25:38-05:00
Jacques Distler (http://golem.ph.utexas.edu/~distler/blog/):

Hmmm. It seems the old OpenID server works. I guess I’ll have to find an OpenID client, for MovableType, that doesn’t suck...

2007-01-30T00:28:25-05:00

Sam Ruby (http://intertwingly.net/blog/):

“But there clearly isn’t any such consensus in the XML-parsing world. Which makes the argument for adopting a particular error-handling behaviour in XML much less compelling.”

I will point out that HTML and XML tend to be produced using similar processes and therefore tend to contain similar errors, so much so that the Universal Feed Parser, upon which Venus is based, falls back to an SGML parser if processing with a “real” XML parser is unsuccessful. While that process consistently produces demonstrably good results, I’ve looked at what has been defined for html5, and it (with a few minor tweaks) would produce even better results.
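The strict-then-lenient strategy described above can be sketched with Python’s standard library. This is not the Universal Feed Parser’s code: `sgmllib` no longer exists in Python 3, so `html.parser.HTMLParser` stands in as the lenient fallback, and `feed_titles` and `TitleCollector` are hypothetical names for a toy that only extracts `<title>` text.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List, Tuple


class TitleCollector(HTMLParser):
    """Lenient fallback: collects text inside <title> elements, tolerating errors."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.titles: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data  # data may arrive in several chunks


def feed_titles(doc: str) -> Tuple[str, List[str]]:
    """Try a strict XML parse first; on failure, fall back to the lenient parser."""
    try:
        root = ET.fromstring(doc)
    except ET.ParseError:
        collector = TitleCollector()
        collector.feed(doc)
        collector.close()
        return "lenient", collector.titles
    # Compare local names so namespaced and plain <title> both match.
    titles = [el.text or "" for el in root.iter() if el.tag.rpartition("}")[2] == "title"]
    return "xml", titles


print(feed_titles("<feed><title>AT&amp;T news</title></feed>"))
print(feed_titles("<feed><title>AT&T news</title></feed>"))
```

Both calls recover the same title; the first tuple element records which path was taken.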

And of course, there’s the distributed extensibility that, at this point, only XML can bring. There is currently no way to define new grammars like MathML and SVG in HTML. And the prospect of requiring draconian error processing for the entire page as a prerequisite for embedding even the smallest amount of either has proven to be a non-starter.
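The cost of draconian handling is that one error anywhere disqualifies everything. In this hypothetical XHTML page, a single unescaped ampersand in a paragraph makes `xml.etree.ElementTree` reject the whole document, including a perfectly good inline SVG island; escaping that one character makes the SVG reachable again. `find_circles` is an illustrative helper, not part of any library.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"

page = f"""<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p>Q&A</p>
<svg xmlns="{SVG}"><circle r="10"/></svg>
</body>
</html>"""


def find_circles(doc: str):
    """Return the SVG circles in doc, or None if the parser rejects the document."""
    try:
        root = ET.fromstring(doc)
    except ET.ParseError as err:
        line, _column = err.position
        print(f"rejected at line {line}: {err}")
        return None
    return root.findall(f".//{{{SVG}}}circle")


print(find_circles(page))                                 # None: the whole page is lost
print(len(find_circles(page.replace("Q&A", "Q&amp;A"))))  # 1
```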

2007-01-30T06:19:11-05:00
Devon (http://devonyoung.com/):

I just hope that if there’s an XML 2.0 with WHATWG-style error handling, people implement it and use it widely enough that XML 1.x becomes like Netscape 4.x... no need to support it.

2007-01-30T14:37:19-05:00

Linked from ongoing, “XML 2.0?” (http://www.tbray.org/ongoing/When/200x/2007/01/30/XML-2):

Anne van Kesteren suggests an XML 2.0 mostly defined by less-Draconian error handling, provoking further discussion over chez Sam Ruby....

2007-01-30T21:07:56-05:00

Antone Roundy (http://antone.geckotribe.com/alpha-gecko/):

“no default namespaces”

I’m for keeping default namespaces, but changing the interpretation of un-prefixed attributes to put them in the default namespace (rather than no namespace). To enable omission of prefixes for attributes within elements that aren’t in the default namespace, we could make a colon (with no prefix before it) shorthand for “same namespace as the parent element”. For example:

<myprefix:foo :bar="asdf" />

could be used as shorthand for:

<myprefix:foo myprefix:bar="asdf" />

That would make for more straightforward procedures for interpreting attributes, and even more so, for adjusting prefixes when combining documents.
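A minimal sketch of the attribute-resolution rules proposed above, in Python. These are hypothetical rules, not Namespaces in XML 1.0: a name like `:bar` is not a legal QName today, so conforming namespace-aware parsers would not produce it. `resolve_attribute` and the `urn:example:` namespace names are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple


def resolve_attribute(name: str,
                      element_ns: Optional[str],
                      default_ns: Optional[str],
                      prefixes: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Return (namespace, local name) for an attribute under the proposed rules.

    Hypothetical: these are the rules suggested in this comment, not the spec.
    """
    prefix, sep, local = name.partition(":")
    if not sep:
        # No colon at all: the attribute takes the in-scope default namespace.
        return default_ns, name
    if prefix == "":
        # Leading colon (":bar"): same namespace as the parent element.
        return element_ns, local
    # Ordinary prefix: look it up in the in-scope declarations.
    return prefixes[prefix], local


scope = {"myprefix": "urn:example:my"}  # hypothetical in-scope declarations
print(resolve_attribute(":bar", "urn:example:my", None, scope))          # ('urn:example:my', 'bar')
print(resolve_attribute("myprefix:bar", "urn:example:my", None, scope))  # ('urn:example:my', 'bar')
```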

2007-01-30T21:53:23-05:00
Bill de hOra (http://dehora.net/journal):

“but changing the interpretation of un-prefixed attributes to put them in the default namespace”

I’ve seen that done in practice; it doesn’t work out. It means you can’t compose markup safely due to scoping issues.

“To enable omission of prefixes [...] could be used as shorthand for:”

I think inventing shorthands, conveniences and assumed values is one of the problems with XML in the field. It always causes problems because markup/declarative types don’t think through the consequences of acquisition and inheritance semantics and lexical scoping the way software types do.

2007-02-01T15:37:03-05:00
Antone Roundy (http://antone.geckotribe.com/alpha-gecko/):

“but changing the interpretation of un-prefixed attributes to put them in the default namespace”

“I’ve seen that done in practice; it doesn’t work out. It means you can’t compose markup safely due to scoping issues.”

Could you give a concrete example of what you mean? I don’t see how it’d be any different from having a default namespace for element names. Even for element names, you always have to know which namespace name is associated with which prefix (or no prefix) in the scope in which the markup is being composed to avoid problems. And you always have to take care to carry the namespace declarations with you when copying markup from one context to another.

“To enable omission of prefixes [...] could be used as shorthand for:”

“I think inventing shorthands, conveniences and assumed values is one of the problems with XML in the field. It always causes problems because markup/declarative types don’t think through the consequences of acquisition and inheritance semantics and lexical scoping the way software types do.”

As a “software type”, it seems pretty straightforward to me:

if (AttributeNameDoesntContainColon())
  ApplyDefaultNamespace()
else if (AttributeNameDoesntHavePrefix())
  ApplyParentElementNamespace()
else
  ApplyNamespaceSpecifiedByPrefix()

...which seems at worst no worse than the current state of things:

if (AttributeNameHasPrefix())
  ApplyNamespaceSpecifiedByPrefix()
else
  AttrIsntInAnyNSInterpretUsingContextProvidedByParentElement()

...which I wouldn’t have so much problem with (a little, but not as much) except for two things:

1) Given that attributes are sometimes prefixed and thus in a namespace, and given that default namespaces exist for elements, it’s not intuitively obvious to someone who hasn’t read and correctly understood the specs that unprefixed attribute names aren’t in the default namespace.

2) In the following example, the same namespace has to be referenced twice if it is to be used as the default namespace:

<foo xmlns="tag:a" xmlns:taga="tag:a" xmlns:tagb="tag:b">
  <tagb:bar taga:asdf="qwerty" />
</foo>
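For comparison, this is how a conforming namespace-aware parser treats the example above today. The sketch uses Python’s `xml.etree.ElementTree` and adds one unprefixed attribute, `plain`, which is not in the original example: the prefixed attribute lands in `tag:a`, while `plain` stays in no namespace even though `tag:a` is also the default namespace for elements.

```python
import xml.etree.ElementTree as ET

# Same document as above, plus a hypothetical unprefixed attribute "plain".
doc = """<foo xmlns="tag:a" xmlns:taga="tag:a" xmlns:tagb="tag:b">
  <tagb:bar taga:asdf="qwerty" plain="x" />
</foo>"""

root = ET.fromstring(doc)
bar = root[0]

print(root.tag)    # {tag:a}foo  (elements pick up the default namespace)
print(bar.tag)     # {tag:b}bar
print(bar.attrib)  # {'{tag:a}asdf': 'qwerty', 'plain': 'x'}
```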

2007-02-05T12:06:56-05:00