It’s just data

xml5.js

I’ve posted the rough beginnings of an implementation of xml5 for node.js.  The core of this work is the tokenizer, for which I wrote a simple script that converts Anne van Kesteren’s implementation of the parse state methods to the style that Aria Stewart used for html5.  Pretty much everything else was “borrowed” from html5.

While this is not yet complete, you can see how it parses and dumps simple files with the following command:

node parse.js filename

Plenty still needs to be done.  In particular:


That message is coming from jsdom. You need to explicitly pass an instance of Aria’s parser to jsdom’s HtmlToDom method.

Posted by Edward O'Connor at

Edward: thanks!  Verified and committed.

Posted by Sam Ruby at

Why is it Node.js specific? It seems like the parser itself would be pure JavaScript.

Posted by Tom Robinson at

Wow. This is actually pretty neat stuff. I like the graceful error-handling, any-input-will-work style. Beats XML conformance issues any day.

Posted by Aria Stewart at

Tom: it may not be all that Node.js specific.  At a minimum, I’m currently using the CommonJS Module System, the EventEmitter, and some peripheral usage of the file system.

As the core dependency is jsdom, which implements the W3C DOM standard, it should be possible to strip out the require statements and the usage of Events, concatenate the rest, minify the result, and use it in the browser of your choice.
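
To make those Node.js dependencies concrete, here is a minimal sketch of a state-method tokenizer that reports tokens through an EventEmitter. It is an illustration only, not code from xml5.js or html5: the name TinyTokenizer, its two states, and the token shapes are all hypothetical, and real XML5 tokenization has many more states.

```javascript
// Hypothetical sketch: NOT the xml5.js tokenizer, just the general shape of
// "one method per parse state" plus EventEmitter-based token delivery.
var EventEmitter = require('events').EventEmitter; // CommonJS module system
var util = require('util');

function TinyTokenizer() {
  EventEmitter.call(this);
  this.state = this.dataState; // the current state is simply a method
  this.buffer = '';
}
util.inherits(TinyTokenizer, EventEmitter);

// Outside of a tag: collect text until '<'.
TinyTokenizer.prototype.dataState = function (ch) {
  if (ch === '<') {
    this.flushText();
    this.state = this.tagState;
  } else {
    this.buffer += ch;
  }
};

// Inside a tag: collect the name until '>'.
TinyTokenizer.prototype.tagState = function (ch) {
  if (ch === '>') {
    this.emit('token', {type: 'tag', name: this.buffer});
    this.buffer = '';
    this.state = this.dataState;
  } else {
    this.buffer += ch;
  }
};

// Emit any pending characters as a text token.
TinyTokenizer.prototype.flushText = function () {
  if (this.buffer) {
    this.emit('token', {type: 'text', data: this.buffer});
    this.buffer = '';
  }
};

// Feed the input one character at a time to whichever state is current.
TinyTokenizer.prototype.tokenize = function (input) {
  for (var i = 0; i < input.length; i++) {
    this.state(input.charAt(i));
  }
  this.flushText(); // an unterminated tag is flushed as text, not rejected
  this.state = this.dataState;
  this.emit('end');
};

var tokens = [];
var tokenizer = new TinyTokenizer();
tokenizer.on('token', function (token) { tokens.push(token); });
tokenizer.tokenize('<a>hi</a>');
console.log(tokens);
```

In a sketch like this, only the two require calls and the emit/on plumbing are tied to Node.js; the state methods themselves are plain JavaScript, which is why a browser port looks feasible.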

Posted by Sam Ruby at

lib/jsdom/level1/core.js:

  get nodeName() {
    var name = this._nodeName || this._tagName;
    if (this.nodeType === this.ELEMENT_NODE &&
        this.ownerDocument &&
        this.ownerDocument.doctype &&
        this.ownerDocument.doctype.name.indexOf("html") !== -1)
    {
      return name.toUpperCase();
    }
    return name;
  },

Based on this, it looks like the workaround is to create a DOCTYPE for every XML document that doesn’t have one.  Either that, or monkey-patch jsdom.  Or switch to using localName, which brings up another potential issue: if I’m reading this code correctly, simple names without a colon are treated as being a prefix, and localName will not be set in such cases.
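
The effect of that getter can be reproduced without jsdom. The sketch below copies its logic into a standalone function and runs it against plain mock objects; nodeNameOf and makeElement are hypothetical helpers for illustration, not part of jsdom.

```javascript
// Standalone copy of the getter's logic, run against mock objects.
// These are NOT real jsdom nodes; only the fields the getter reads exist.
var ELEMENT_NODE = 1;

function nodeNameOf(node) {
  var name = node._nodeName || node._tagName;
  var doc = node.ownerDocument;
  if (node.nodeType === ELEMENT_NODE &&
      doc &&
      doc.doctype &&
      doc.doctype.name.indexOf("html") !== -1) {
    return name.toUpperCase();
  }
  return name;
}

// Hypothetical helper: an element whose document has the given doctype name.
function makeElement(tagName, doctypeName) {
  return {
    nodeType: ELEMENT_NODE,
    _tagName: tagName,
    ownerDocument: { doctype: { name: doctypeName } }
  };
}

var results = {
  // A doctype name containing "html" forces upper case.
  htmlDoctype: nodeNameOf(makeElement("feed", "html")),
  // Any other doctype name leaves the case alone.
  feedDoctype: nodeNameOf(makeElement("feed", "feed"))
};
console.log(results);
```

With these mocks, the element under an "html" doctype reports FEED while the one under a "feed" doctype reports feed, which is the case-preserving behavior an XML parser needs.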

Posted by Sam Ruby at

I was under the impression that jsdom was terrifyingly slow... have you done any quick perf testing yet?

Posted by Jeff Waugh at

My quick performance testing has not shown any issues.  In fact, due to the way events are pipelined, I’ve yet to see a case where the completion of parsing is more than 1 millisecond after the completion of the http request.  That may not be the case for disk access.

var http  = require('http'),
    html5 = require('html5'),
    jsdom = require('jsdom'),
   window = jsdom.jsdom().createWindow(null, null, {parser: html5});

var rubix = http.createClient(80, 'planet.intertwingly.net');
var request = rubix.request('GET', '/', {'host': 'planet.intertwingly.net'});
request.end();
var start = new Date();
request.on('response', function (response) {
  response.on('end', function () {
    console.log('Response complete: ' + (new Date() - start));
  });
  var parser = new html5.Parser({document: window.document});
  parser.parse(response);
  parser.on('end', function() {
    console.log('Parse complete: ' + (new Date() - start));
  });
});

Posted by Sam Ruby at

http://intertwingly.net/blog/2011/01/13/xml5-js #nodejs

Excerpt from Topsy - http://twitter.com/ryah at

karlpro: XML5 implementation for node.js http://www.intertwingly.net/blog/2011/01/13/xml5-js

karlpro’s status on Thursday, 13-Jan-11 19:00:43 UTC...

Excerpt from mparienti and friends at

13yo: RT @ryah: http://intertwingly.net/blog/2011/01/13/xml5-js #nodejs

Excerpt from Twitter / 13yo at

jsdom issue 24: localName and prefix are not set correctly and consistently

Posted by Sam Ruby at

Web Standards Links: 10 January 2011 to 16 January 2011

Promoting interoperability is at the heart of Open The Web. In case you missed it, the topic of the week is Google planning to drop support for H.264 in Chrome. I’ll spare you the links; there were plenty all over Twitter. Web standards...

Excerpt from karlCOW at

First post to the newly minted jsdom mailing list: xmlns issues.

Posted by Sam Ruby at

MicroLark

John Cowan:  I’ll openly admit at this point that I’m skeptical about the prospects of MicroXML.  It doesn’t contain enough of XML to correctly parse feeds (be they RSS 1.0, RSS 2.0, or Atom).  It doesn’t... [more]

Trackback from Sam Ruby at

samruby: @tmpvar http://intertwingly.net/blog/2011/01/13/xml5-js

Excerpt from Twitter / samruby at

This is some nifty work you’re doing with Node.js here, Sam.  The possibilities for expanding upon this are swirling around in my head right now.  They seem limitless!

Posted by Scott Johnson at
