It’s just data

MonkeyPatch for Ruby 1.8.6

One of the joys of Ruby and HTML5 is that one can easily extract data from a web page with an XPath expression.  For example, the following extracts the URI of the RSD document from a weblog that supports RSD:

require 'open-uri'
require 'html5/html5parser'
doc = HTML5::HTMLParser.parse(open(ARGV[0]))
rsd = doc.elements['//link[@type="application/rsd+xml"]/@href'].to_s
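
To try the same expression without a network connection or the html5 library, here is a minimal sketch that feeds an inline page straight to REXML. The sample markup and the example.com URL are my own placeholders, and using REXML directly instead of the HTML5 parser is an assumption made only to keep the example self-contained:

```ruby
require 'rexml/document'

# Hypothetical sample page; the href is a placeholder, not a real RSD endpoint.
PAGE = <<-XHTML
<html>
  <head>
    <title>Sample weblog</title>
    <link rel="EditURI" type="application/rsd+xml" href="http://example.com/rsd.xml"/>
  </head>
  <body/>
</html>
XHTML

doc = REXML::Document.new(PAGE)

# elements[] with an XPath string returns the first matching node,
# which here is the href attribute of the matching link element.
rsd = doc.elements['//link[@type="application/rsd+xml"]/@href'].to_s
puts rsd
```

Note that this sample page deliberately has no default namespace, so it does not depend on whether the REXML in use handles one correctly.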

Unfortunately, a bug in Ruby 1.8.6 prevents attribute names that are not namespace-qualified from working in XPath expressions whenever the document has a default namespace (even a vestigial one, like those sported by WordPress weblogs).

The following monkey-patch fixes this:

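require 'rexml/document'
# Probe document: a default namespace plus an unqualified attribute
doc = REXML::Document.new '<doc xmlns="ns"><item name="foo"/></doc>'
# Patch only when the XPath attribute test fails, i.e. the bug is present
if not doc.root.elements["item[@name='foo']"]
  class REXML::Element
    def attribute( name, namespace=nil )
      prefix = nil
      prefix = namespaces.index(namespace) if namespace
      # The default namespace is listed under 'xmlns'; its attributes carry no prefix
      prefix = nil if prefix == 'xmlns'
      attributes.get_attribute( "#{prefix ? prefix + ':' : ''}#{name}" )
    end
  end
end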
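
The heart of the patch is how it builds the attribute name to look up. As a rough illustration, the sketch below pulls that step out into a standalone function that works on a plain Hash, so it can be run without touching REXML. The helper name `qualified_attribute_name` and the sample namespace table are hypothetical, not part of REXML:

```ruby
# Hypothetical helper mirroring the name-building step of the patch.
# `namespaces` maps prefix => namespace URI, with the default namespace
# stored under the key 'xmlns' (the shape the patch itself relies on).
def qualified_attribute_name(namespaces, name, namespace = nil)
  prefix = nil
  if namespace
    # Hash#index is the Ruby 1.8 spelling; later versions call it Hash#key.
    prefix = if namespaces.respond_to?(:key)
               namespaces.key(namespace)
             else
               namespaces.index(namespace)
             end
  end
  # Attributes in the default namespace are written without a prefix.
  prefix = nil if prefix == 'xmlns'
  prefix ? "#{prefix}:#{name}" : name
end

# Sample namespace table (an assumption for illustration only).
namespaces = { 'xmlns' => 'ns', 'dc' => 'http://purl.org/dc/elements/1.1/' }

puts qualified_attribute_name(namespaces, 'name', 'ns')
puts qualified_attribute_name(namespaces, 'creator', 'http://purl.org/dc/elements/1.1/')
puts qualified_attribute_name(namespaces, 'name')
```

The first and last calls both yield the bare name, which is the point: an attribute asked for in the default namespace is looked up the same way as one asked for with no namespace at all.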

As I am bound to hit this issue frequently, I’ve added the patch to my monkey_patches file, which RUBYOPT loads on every Ruby run:

export RUBYOPT='-rubygems -r/home/rubys/bin/monkey_patches'