Note: the correct display of some characters in the presentation
may depend on what fonts you have installed. Some pages may
display better on Mozilla on large screens, but everything should
mostly work cross browser.
RE: Attractive Nuisance
I didn't understand what you meant in the slide "Escaping in XML is broken". Can you expand on that a little bit.
The fact of the matter is that people will create (and have created!) content in Atom that they declare to be well formed when it is not, and that they declare to be properly double escaped when, again, it is not.
Being a text format, XML is an "attractive nuisance" in that it encourages people to create documents with technologies as simple text based templates. My weblog being a prime example. I've gone to extraordinary lengths in an attempt to compensate, but things still slip through momentarily.
While I wish XML had picked a less error prone syntax, at this point I don't have a suggestion for solving this problem (frankly, the thought of binary XML gives me pause); but the point is (1) this is something that people need to be aware of, and (2) not all the blame can, or should, go to the people who have fallen into this trap - the authors of the spec need to shoulder some of the responsibility.
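To make the template trap and the double escaping problem concrete, here is a minimal Python sketch. The sample title and the helper name is_well_formed are made up for illustration; only the standard library is used, and this is not code from any weblog or feed tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical weblog title containing XML-significant characters.
title = "Q&A: is 1 < 2?"

def is_well_formed(doc: str) -> bool:
    """Return True if doc parses as XML (illustrative helper)."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# A plain text template pastes the title in untouched.
naive = "<title>%s</title>" % title
# Escaping exactly once keeps the document well formed.
escaped = "<title>%s</title>" % escape(title)
# Escaping twice is still well formed, but the content is no longer the title.
twice = "<title>%s</title>" % escape(escape(title))

print(is_well_formed(naive))               # False
print(is_well_formed(escaped))             # True
print(ET.fromstring(escaped).text == title)  # True
print(ET.fromstring(twice).text)           # Q&amp;A: is 1 &lt; 2?
```

The last line is the nasty case: a parser accepts the document without complaint, so nothing flags that the reader will see literal entity text.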
Sam,
So it seems you aren't saying anything is broken per se. Just that merely by being text based and having a similar syntax to HTML it encourages the ViewSourceClan to make mistakes. I buy that; that has been my experience working with XML over the past few years.
Dare: let me make an analogy: the WSDOT periodically identifies some intersections as HALs (High Accident Locations). It then treats this list as bug reports, and schedules projects to correct these problems.
Sam,
This goes back to my original question then. What exactly is broken in XML that you think needs to be fixed to correct these problems? So far it seems the main issue you've pointed out is that it looks deceptively simple to the average web developer. However I don't think coming up with a more complex looking replacement for XML is a feasible suggestion. So what do you think can and should be done?
Via the Attractive Nuisance slides I arrived at a good site with a lot of useful writing about eXP and about programming and planning (design) in general. Also, a very cool (fun, essential, ++, as my colleagues would say) project is Yellow bike. People similar to Mr. Zuokas (but, as it turns out, thriftier and more careful) found old bicycles, repaired them, found someone to paint them, found someone to mark them, and left the bicycles on the street. Having started with 10, at the time the story was written they already had 100 and were preparing another 100. Such stories.......
Sam Ruby’s slides on the pitfalls around Unicode, XML, HTTP etc. The analogy is a box of matches to an 8 year old. Ruby postulates: The accuracy of metadata is inversely proportional to the square of the distance between the data and the...
Is the Planet RDF code available? I would gladly provide a patch.
The problem is not in your input feed. You contain the following code: ’. Planet RDF converts that into binary, which I will express in hex: xC3A2C280C299. The correct formulation would be xE28099. I've got a good idea about what is going on under the covers as
â => xC3A2
€ => xC280
™ => xC299
In other words, for some reason, Planet RDF is effectively doing an iso-8859-1 to utf-8 conversion on data that is already utf-8.
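The byte sequences above can be reproduced in a few lines. This is a sketch in Python 3 rather than the Planet RDF code, which is not shown in this thread; it assumes the character in question is U+2019 (right single quotation mark), whose utf-8 encoding is the xE28099 cited above.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK (assumed to be the character at issue).
quote = "\u2019"

# Correct utf-8 encoding of the character.
correct = quote.encode("utf-8")

# Misread those utf-8 bytes as iso-8859-1 characters, then encode to utf-8 again.
double_encoded = correct.decode("iso-8859-1").encode("utf-8")

print(correct.hex())         # e28099
print(double_encoded.hex())  # c3a2c280c299
```

Each of the three original bytes is above 0x7F, so each one turns into a two-byte sequence on the second pass, which is why three bytes become six.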
I had a funny experience reading these slides, since my browser window isn't that big and I'm using Konqueror; the title was half-hidden, the >> sign was outside the visible area, and the << sign was not to be seen, so I didn't find anything to click on.
That provoked me to think about the only text that I actually saw -- "How did you learn to read HTML?" -- and take it as the first step in a quiz.
And, though it turned out that that was not what you'd intended, "View Source" did eventually help me to figure out how I was supposed to navigate the slides ;-)
tidy emits UTF-8 encoded bytes and python attempts to read them as ASCII
If the code is reading a stream of utf-8 encoded bytes as if it were a stream of characters, and then one attempts to output those characters as utf-8, you would effectively end up with an iso-8859-1 to utf-8 conversion being done on utf-8 encoded characters.
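If that is indeed what happened, the damage can often be reversed after the fact. The following is only a sketch: undo_double_encoding is a hypothetical helper, not the patch discussed in this thread, and it is a heuristic that assumes exactly one extra iso-8859-1 to utf-8 pass was applied.

```python
def undo_double_encoding(data: bytes) -> bytes:
    """Reverse one iso-8859-1 to utf-8 pass applied to utf-8 bytes.

    Hypothetical helper. Returns the input unchanged when it does not
    look double encoded.
    """
    try:
        # Undo the second pass: utf-8 bytes -> characters -> one byte per character.
        candidate = data.decode("utf-8").encode("iso-8859-1")
        # The result must itself be valid utf-8, or this was not double encoding.
        candidate.decode("utf-8")
    except UnicodeError:
        return data
    return candidate

print(undo_double_encoding(bytes.fromhex("c3a2c280c299")).hex())  # e28099
print(undo_double_encoding(bytes.fromhex("e28099")).hex())        # e28099
```

A guess like this can misfire on text that legitimately contains sequences such as "Ã©", so fixing the conversion at the source remains the better option.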
Is the Planet RDF code available? I would gladly provide a patch.
Your patch does utf-8 encoding earlier, making a Python Unicode string (u'foo'). I had to remove a later bit of code, in the part that writes the RSS content body, that tried to utf-8 encode things again. I'm not sure if that's fixed what you thought was broken. There is still a mess in encoding titles from rss:title, and my attempt to fix that just gives a rather unhelpful python error saying that it cannot write Unicode to a file, only ascii.
Python's Unicode support remains user hostile at every turn.
My patch does utf-8 de-coding earlier, making a Python Unicode string. Note: the latest feedparser uses a real XML parser whenever possible, so you would start out with a unicode string for this feed if a current version of the feedparser were used.
Python has two string-like data types: str and unicode. Python's support for unicode data in str objects is very user hostile. The reverse is not true. There, IMHO, the strategy should be to cleanse the data as early as possible, and to encode to utf-8 as late as possible - thereby keeping the data in unicode as long as possible.
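A minimal sketch of that strategy follows. It is written for Python 3, where the two types are named bytes and str instead of str and unicode; the function names and the sample input are invented for illustration and are not taken from the feedparser or Planet RDF.

```python
import io

def read_feed(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode once, at the boundary where bytes enter the program."""
    return raw.decode(encoding)

def normalize_title(title: str) -> str:
    """Internal processing only ever sees text, never encoded bytes."""
    return " ".join(title.split())

def write_feed(text: str, out: io.BufferedIOBase) -> None:
    """Encode once, at the boundary where bytes leave the program."""
    out.write(text.encode("utf-8"))

# Hypothetical input: utf-8 bytes containing U+2019 and stray whitespace.
raw = b"Sam\xe2\x80\x99s   weblog"
buffer = io.BytesIO()
write_feed(normalize_title(read_feed(raw)), buffer)
print(buffer.getvalue())  # b'Sam\xe2\x80\x99s weblog'
```

Because decoding and encoding each happen in exactly one place, there is no later step that can encode a second time.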
The biggest dilemma I see for XML and the related technologies is that their reason for existence is to provide an agreed-upon mechanism for interoperability across platforms and applications, but those agreements are hard to update without...
Attractive Nuisance contains a link to an interesting slideshow about XML encoding problems, starting from charsets going over to attribute order, whitespaces, entities, double escaping etc. The slide titled “QNames” just reads "don’t even get me...
Paraphrasing Sam Ruby: Jon Udell: HTTP toolkits make it easy to do the wrong thing, hard to do the right thing. Dare Obasanjo: del.icio.us, flickr, and Bloglines use GET for edit resources. Sam Ruby: AJAX toolkits must beware of how they use GET....
Oh, look at the date. It’s been a while, hasn’t it? Indulge me if you will in part 6 of the Things Fall Apart series. A slight detour perhaps, or rather a journey into the heart of dar- ... technology, and matrimony. A Social Bookmarking Affair...
A long, slightly rambling, but deeply technical, entry follows, so if you aren't interested in software internationalization, character sets and variable type systems, you might want to skip this entry entirely; you have been warned.
how could Planet RDF do things better? Danny, you can see the problem above. Is the Planet RDF code available? I would gladly provide a patch. The problem is not in your input feed. You contain the following code: ’. Planet RDF converts...
DevCon: Fundamentalism - “The accuracy of metadata is inversely proportional to the square of the distance between the data and the metadata.” ([link])...
So Jon Udell, in the midst of demonstrating the increasing maturity of query languages (XPath and XQuery) on xml databases, generates statistics of bloggers he reads who most frequently cite books on Amazon.com. In analyzing the data, it is clear...
An open note to some of my favourite loosely-coupled people: Phil Wainewright, Jon Udell, Sam Ruby, Tessa Lau and Monsieur Feinberg amongst others (connecting once again). Glue Layer People | Technology Adoption and Systems Design | A lighter...
I give you the slightly updated slides to a talk I gave on Friday to the Lotus Workplace Architecture Board. The topic was REST - The web style An argument about an outlook on technology, on complexity, layering and leverage. (It’s also available a...
Ruby’s postulate: The accuracy of metadata is inversely proportional to the square of the distance between the data and the metadata. Alternatively: Keep your friends close but your meta-data closer...