DontUseXml - Atom Wiki

(moved from SyntaxConsiderations)

[Bigor RefactorOk] XML is about *extensibility* and backward compatibility. The name-value syntax used in the RSS 3.0 proposal is derived from the SMTP and HTTP-like message format, which is applicable in the context of messages. And BTW, the plain text syntax proposed in RSS 3.0 is not any different from XML: it uses assigned names instead of tag names. And it's flat: no hierarchies, just a bunch of assigned names, ":", and values. I suppose ":" has to be escaped in *special* cases. Then what to do if an assigned name is suddenly absent (a.k.a. *optional*)? What to do if the description body includes two blank lines? See RSS 3.0: "An item ends at the first blank line (that is, a line with no characters)".
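The blank-line question can be made concrete with a small sketch. It assumes only the quoted rule that an item ends at the first blank line; the sample feed text and the helper name `split_items` are invented for illustration and are not part of the RSS 3.0 proposal.

```python
# Hypothetical feed text: two items, the first with a blank line
# inside its description.
FEED = (
    "title: First post\n"
    "description: Paragraph one.\n"
    "\n"
    "Paragraph two.\n"
    "\n"
    "title: Second post\n"
    "description: Short.\n"
)


def split_items(text):
    """Split a feed into items at blank lines, as a naive reader would."""
    return [chunk for chunk in text.strip().split("\n\n") if chunk]


items = split_items(FEED)
# The author wrote two items, but the blank line inside the first
# description cannot be told apart from an item boundary.
print(len(items))
for chunk in items:
    print(repr(chunk))
```

The splitter reports three chunks, and the middle one has no assigned name at all, so a format like this needs an explicit rule for blank lines inside values.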

Plain text only *seems* simple as long as parsing and consuming are done as a single transaction.

A bad example of name-value pairs is Microsoft MAPI. MAPI Storage was designed as "object" storage: everything is a Field and every Field can be accessed using a MAPI_TAG. MAPI_TAG is a typical example of an assigned name. And there are multiple HardToRememberTagNames in the MAPI.H file.

I think that plain text is NOT ALWAYS a roadmap to simplicity - take a look at http://www.laputan.org/gabriel/worse-is-better.html - in the section named The Rise of Worse is Better - "Completeness -- the design must cover as many important situations as is practical. All reasonably expected cases must be covered. Simplicity is not allowed to overly reduce completeness". Thanks.

[AaronSw RefactorOk] XML is too complicated; it should be a simple text syntax like RSS 3.0. [MorbusIff] Or YAML. [WhyTheLuckyStiff] A YAML equivalent is presented on EchoExampleYaml for reference.

[DiegoDoval RefactorOk] I think it should be as simple as possible. I have no issue with whether something's difficult to parse or not (and I don't consider XML difficult to parse, and I like its clear structure, so my vote is for XML, although I don't mind something like Aaron's proposal if the i18n issues, boundaries (is the limit of an entry a line? with \n? or \r?, etc), and other things are well defined, which seems to basically duplicate a lot of what XML would do). Parsing is the least of our problems. Building useful, interesting functions that use the data is the problem, and a standard format helps because everyone has the same base of data. That is: in my opinion, creating new "tags" through namespaces isn't the issue, parsing those tags isn't the issue, the issue is who will do anything useful with those tags, either from an information-processing point of view, or from a UI point of view (by giving more richness to the data presented). From an application-writer's POV, one can only spend time supporting features that affect a large number of users: features that can ride on the "network effect" created by the same format being used by all. App writers (particularly small sw developers) will end up targeting the lowest common denominator and so the support of namespaces does become an unnecessary burden. On the other hand, namespaces provide extensibility for custom apps (and for new ideas to emerge), so what Arve said above sounds good to me: "I'd prefer a base specification that is complete enough to make usage of them rare". I'd just qualify that by saying "I'd prefer a base spec that is complete so that namespaces are required for a) niche applications and b) extensions, new functions, ideas not contemplated in the original spec, etc".

[NeilDunn RefactorOk] I think if we want a fresh start we should move away from XML. Plain text is simple, can express the same things, and a parser for it can be written very simply. My general notes:

HTML should be allowed inline as "data"; however, data should be described by a MimeType, so that people can equally use text/plain etc. if that is how their content is released.
A "short description" of the post is pointless, there should be a valid title, followed by the whole data. People who are using aggregators (not me, I'm a browser) don't like reading a stripped version of the post, and want to read the whole thing.
The core should be very small, and namespaces should be allowed to extend the model. However, I think they should all be worked out afresh, in the same way as Echo itself; using previous namespaces could result in the same problems we are trying to get out of at the moment.
When a defined namespace has an element that conflicts with a core element (such as date and time), the core element should always have precedence.

After reading through Aaron Swartz's RSS 3.0 proposal, I shun the "No HTML" statement. I feel an item should be portrayed in the way it was written; if that uses HTML, then HTML should be a valid way of representing the data and should be allowed within the markup.

Another main point to make is that the specification of Echo should be written to be as descriptive as possible for simple users, it seems that everyone shot off making their own style RSS feeds using shit examples from people that haven't properly interpreted the specification.

Lex: I'm in the don't-use-XML camp for one reason: XML is scary. As I see it, the web caught on because HTML caught on, and HTML caught on because it was so damn easy. How did he make that text <blink>? I view the source and learn. XML is a thousand things. It's used for so many different purposes and is difficult to define to a person with no interest in knowing the definition. I propose using the spirit of XML, but not in its strictest sense, so that we can make the rules fit the task.

<echo> 
     <pubDate = "[date-in-any-format]">
     <poster name = "Joe Blogger" email="email@me.you" location="where i am">
     <content>Whatever I have to say, including escaped html</content>
</echo>

It's not perfect, but that's the direction I'd go if I were the boss. Or Dave.
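For reference, here is a small sketch of how an off-the-shelf XML parser treats the sample above. It uses Python's standard `xml.etree.ElementTree`; the `STRICT` variant is just one possible well-formed reading, and its `value` attribute name is an assumption made for illustration, not something proposed on this page.

```python
import xml.etree.ElementTree as ET

# The sample from the comment above, copied as written.
SKETCH = """<echo>
     <pubDate = "[date-in-any-format]">
     <poster name = "Joe Blogger" email="email@me.you" location="where i am">
     <content>Whatever I have to say, including escaped html</content>
</echo>"""

# One possible strict reading: empty elements are self-closed and the
# date moves into an attribute (the name "value" is an assumption).
STRICT = """<echo>
     <pubDate value="[date-in-any-format]"/>
     <poster name="Joe Blogger" email="email@me.you" location="where i am"/>
     <content>Whatever I have to say, including escaped html</content>
</echo>"""


def is_well_formed(text):
    """Return True if the XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(SKETCH))
print(is_well_formed(STRICT))
```

A standard parser rejects the relaxed form, so "the spirit of XML" would in practice mean writing a custom parser for it.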

AsbjornUlsberg

much better off

Enamel

[HenriSivonen RefactorOk] The crucial question is whether application programmers are expected to write a byte stream parser for this new format or whether they are expected to use a ready-made off-the-shelf parser component. If the app programmers are supposed to write a byte stream parser, then XML is way too complicated. Writing XML parsers should be left to experts. OTOH, if app programmers are expected to use ready-made low-level parsers, then XML is excellent, because open-source XML parsers are readily available. If the data structures are complicated, it is easier to work on the SAX or DOM level than to parse complex structures out of a byte stream. However, if XML is chosen, then everyone must use a real XML parser, because regexp hacks would not enforce the XML rules and, thus, would cause interoperability problems. (Also, in order to be able to use ready-made off-the-shelf XML parsers, everyone has to adhere to the XML rules. That is, in order to be able to leverage ready-made parsers, documents must be well-formed and it doesn't make sense to say &auml; to a non-validating parser without declaring the entity in the internal DTD subset.)
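The point about entities can be illustrated with a short sketch using Python's standard `xml.etree.ElementTree`. The `<title>` element and the word "Käse" are made up for the example; the behaviour shown is that of the expat-based parser shipped with Python, and other parsers may report the error differently.

```python
import xml.etree.ElementTree as ET

# &auml; is an HTML entity; XML itself predefines only five entities
# (&lt; &gt; &amp; &apos; &quot;).
UNDECLARED = "<title>K&auml;se</title>"

# Declaring the entity in the internal DTD subset makes it usable.
DECLARED = (
    '<!DOCTYPE title [<!ENTITY auml "&#228;">]>'
    "<title>K&auml;se</title>"
)


def parse_title(text):
    """Return the element text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError:
        return None


print(parse_title(UNDECLARED))
print(parse_title(DECLARED))
```

The first document is rejected as containing an undefined entity; the second parses because the declaration travels with the document.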

PhilWolff

[LeonardoHerrera RefactorOk] To add more to the same: these days, it's far, far easier to write a tool that parses XML than one that parses text/plain with some magic formatting. The key is to reuse existing libraries. A decent XML parser with DOM support is all you need, with little actual coding. The only issue is that we developers have to commit to 'produce valid XML' and forget about the little perl/python/awk hacks that don't use proper tools.
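As a rough illustration of "little actual coding", here is a sketch that pulls entry titles out of a feed with the DOM implementation in Python's standard library. The feed and its element names (`feed`, `entry`, `title`) are hypothetical, since no format had been agreed on at this point.

```python
from xml.dom import minidom

# Hypothetical feed; the element names are invented for illustration.
FEED = """<feed>
  <entry><title>First post</title></entry>
  <entry><title>Second &amp; last</title></entry>
</feed>"""


def entry_titles(xml_text):
    """Collect the text of every <title> inside an <entry>."""
    doc = minidom.parseString(xml_text)
    titles = []
    for entry in doc.getElementsByTagName("entry"):
        for title in entry.getElementsByTagName("title"):
            # An element's text can be split over several child nodes.
            titles.append("".join(
                node.data for node in title.childNodes
                if node.nodeType == node.TEXT_NODE
            ))
    return titles


print(entry_titles(FEED))
```

Escaping, character encoding and entity handling are all done by the parser, which is the part that ad hoc text formats have to reinvent.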

HenriSivonen

RefactorOk

well-formed

validity

http://iki.fi/hsivonen/blog-server.html

[AlexSchroeder RefactorOk] I prefer the simple plain-text RSS 3.0 mentioned above. If we decide to use RSS 3.0, then libraries for Perl, Python, Ruby and other languages will appear within days, so that is not a problem. Note that on Aaron's weblog entry for RSS 3.0 http://www.aaronsw.com/weblog/000574 there is a one-line Python parser for RSS 3.0 by Sean B. Palmer:

 [dict(re.compile(r'(?s)([^\n:]+): (.*?)(?=\n[^ \t]|\Z)').findall(item)) for item in s.split('\n\n')]

If you care about I18N, this might be a better one-liner:

 [dict(re.compile(r'(?s)([^\n:]+): (.*?)(?=\n[^ \t]|\Z)').findall(item)) for item in s.decode("utf-8").split('\n\n')]
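For readers who find the one-liner dense, here is the same idea unpacked into a function with a usage example. The regular expression is the one quoted above; the names `FIELD`, `parse_feed` and `SAMPLE` and the sample data are assumptions added for illustration.

```python
import re

# Same pattern as the one-liner: a name, ": ", then a value that runs
# until the next line that does not start with a space or tab.
FIELD = re.compile(r"(?s)([^\n:]+): (.*?)(?=\n[^ \t]|\Z)")


def parse_feed(text):
    """Return one dict of name/value pairs per blank-line-separated item."""
    return [dict(FIELD.findall(item)) for item in text.split("\n\n")]


# Hypothetical sample; the second line of the description is indented,
# which marks it as a continuation of the value.
SAMPLE = (
    "title: Hello\n"
    "description: first line\n"
    " second line\n"
    "\n"
    "title: Bye"
)

for item in parse_feed(SAMPLE):
    print(item)
```

Note that the sketch inherits the one-liner's limits: it does no validation and relies entirely on blank lines to separate items.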

XML is complex because it is generic. RSS 3.0 is simple because the problem set is constrained. I like that. I actually implemented RSS 3.0 for my wiki engine tonight http://www.emacswiki.org/cgi-bin/oddmuse.pl?action=rc&raw=1.

[KennethLeFebvre RefactorOk] This is a funny page, right? I can understand concern that XML uses up bandwidth... but, "XML is too complicated" and "XML is scary"?! Regex is scary, but that doesn't stop me from using it, because it's also cool. XML is anything but scary, and it's only complicated when you're trying to do complicated things with it. It's simpler than HTML, in my opinion.

I think RSS is useable because it's simple XML. It's simple to transform into anything else I want with XSLT. I don't have to build parsers or anything, I can do everything I want in the ubiquitous web browser... in addition to whatever dedicated aggregators are out there.

[MovGP0 RefactorOk] XML is better because:

Backward-Compatibility
Software like Stylus Studio and XML Spy does (with a proper schema) a lot of auto-completion for the user, so that:

XML is easier to use than other standards
XML leads to fewer errors

Existing XML parsers for testing validity
Only one parser and only one check run is needed to check the Atom XML and the XML-based message body.
XSLT engines as needed, e.g. for news readers and for showing feeds directly in the browser
Includes support for namespaces as needed for RDF/OWL, which addresses:

Future needs of next-generation web services
Custom extensions at the user's will

Developers don't need to reinvent the wheel (new syntax, classes and mappers)

XML is worse because:

People who have never used XML might be confused at first sight. Editing tools for blogging will be needed. (I think that problem remains with other syntaxes too, until there is a human-like software parser.)
Some overhead, e.g. closing tags (which is not a strong argument for users with broadband access)

I also want to say that we can save more overhead by defining an (optional) web service standard, e.g. sending a SOAP message with

<request>
  <since>2005-08-20T14:22:03Z</since>
  <author>Otto Example</author>
</request>

returns a feed showing only entries since the given date. This saves a lot of transfer volume, especially when there is a big blog which is not updated very often. Another idea would be to use a relation like:

<link rel="next" title="Next 20 Entries" href="previous-month-posts.xml" />
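On the server side, the `since` request above could be answered along these lines. This is only a sketch: the in-memory `ENTRIES` list, the `modified` field name and both helper functions are hypothetical, and it assumes "since" means strictly later than the given UTC timestamp.

```python
from datetime import datetime, timezone

# Hypothetical in-memory entries; a real server would read these
# from its own storage.
ENTRIES = [
    {"title": "Old post", "modified": "2005-07-01T09:00:00Z"},
    {"title": "New post", "modified": "2005-08-21T10:30:00Z"},
]


def parse_timestamp(text):
    """Parse a UTC timestamp such as 2005-08-20T14:22:03Z."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def entries_since(entries, since):
    """Keep only the entries modified after the given timestamp."""
    cutoff = parse_timestamp(since)
    return [e for e in entries if parse_timestamp(e["modified"]) > cutoff]


print(entries_since(ENTRIES, "2005-08-20T14:22:03Z"))
```

Only the newer entry is returned, so a client that polls a rarely updated blog would usually receive an empty or very short feed.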

[MovGP0 RefactorOk] Currently there is the RestEchoSearchApi, which solves this like:

GET /search?search-entry.author.name-contains=James&search-entry.modified-gt=2003-08-05

or by POSTing an XQuery:

<?xml version="1.0" encoding="utf-8"?>
<search-entries xmlns="http://purl.org/atom/ns#search-query-centric">
  <query>//entry[fn:contains(author/name, "James") and modified gt "2003-08-05"]</query>
</search-entries>
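A client could assemble the GET form of the search request shown above with the standard library, as in the following sketch. The parameter names and the `/search` path are copied from the example request; no host is assumed, and the variable names are invented for illustration.

```python
from urllib.parse import urlencode

# Parameter names are taken from the GET request shown above.
params = {
    "search-entry.author.name-contains": "James",
    "search-entry.modified-gt": "2003-08-05",
}

# urlencode escapes any characters that are not safe in a query string.
request_target = "/search?" + urlencode(params)
print(request_target)
```

Building the query through `urlencode` rather than string concatenation keeps values with spaces or ampersands from breaking the request.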

CategoryArchitecture, CategorySyntax, CategoryRss