It’s just data

Nokogiri

Yesterday, I was looking to move some code that I have running on one machine to a server which has Ruby 1.8.6 installed.  Once again, I ran into a difference in the version of REXML bundled with that version of Ruby.  This time, instead of looking for a monkey patch, I looked for alternatives.

As of 2.3, Rails continues to default to REXML, but supports Nokogiri and LibXML as faster alternatives.  In addition to being faster, both are more spec compliant, and both have HTML parsing capabilities (albeit ones that are not tracking HTML5).  Nokogiri also has a superior API (based on Hpricot) and support for CSS3 selectors.

Installation on Ubuntu and configuration of Rails:

sudo apt-get install ruby1.8-dev libxml2-dev libxslt1-dev
sudo gem install nokogiri
ActiveSupport::XmlMini.backend='Nokogiri'

Locating an element based on an id, using REXML:

node.elements['//*[@id="sidebar"]']
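
To try the REXML lookup outside of a larger program, here is a minimal sketch. The sample markup and the variable names (xml, doc, sidebar) are made up for illustration and are not from the code I was porting.

```ruby
require 'rexml/document'

# Hypothetical sample document, invented for this sketch.
xml = <<XML
<html>
  <body>
    <div id="content">main</div>
    <div id="sidebar">links</div>
  </body>
</html>
XML

doc  = REXML::Document.new(xml)
node = doc.root

# elements[] takes an XPath expression and returns the first match.
sidebar = node.elements['//*[@id="sidebar"]']

puts sidebar.name   # div
puts sidebar.text   # links
```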

Alternatives using Nokogiri:

node.at('//*[@id="sidebar"]')
node.at('#sidebar')
node/'#sidebar'

Extracting an attribute given a node, using REXML:

node.attributes['href']

Using Nokogiri:

node['href']

Individually, the differences don’t seem major, but the effects are cumulative.  Which would you rather write:

REXML::Document.new('<a b="c"/>').elements['//a'].attributes['b']

or

Nokogiri::XML('<a b="c"/>').at('a')['b']

Suffice it to say that Nokogiri is now a part of my toolbox, and likely the first tool I will reach for when dealing with XML/XHTML/HTML content in ways beyond the ability of simple regular expressions.


And (a significant advantage over libxml) it now works on JRuby. Many thanks to Mike Dalessio.

Posted by Damian at

