It’s just data

MonkeyPatch for Ruby 1.8.6

One of the joys of Ruby and HTML5 is that one can easily extract data from a web page with an XPath expression.  For example, the following extracts the URI of the RSD document from a weblog that supports RSD:

require 'open-uri'
require 'html5/html5parser'
doc = HTML5::HTMLParser.parse(open(ARGV[0]))
rsd = doc.elements['//link[@type="application/rsd+xml"]/@href'].to_s
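The same XPath lookup can be tried offline with REXML alone. This is only a minimal sketch: the inline markup and the example.org URL are made-up stand-ins for a parsed page, and the document deliberately has no default namespace.

```ruby
require 'rexml/document'

# Hypothetical, well-formed stand-in for a weblog page (no default namespace).
page = <<XML
<html>
  <head>
    <link rel="EditURI" type="application/rsd+xml" href="http://example.org/rsd.xml"/>
  </head>
</html>
XML

doc = REXML::Document.new(page)

# The XPath selects the href attribute node itself, not the link element.
href = REXML::XPath.first(doc, '//link[@type="application/rsd+xml"]/@href')
rsd = href.to_s
puts rsd
```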

Unfortunately, a bug in Ruby 1.8.6 prevents non-namespace-qualified attribute names from working in XPath expressions whenever the document has a default namespace (even a vestigial one, like those sported by WordPress weblogs).

The following monkey-patch fixes this:

require 'rexml/document'
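# Probe: on an affected Ruby, this unprefixed attribute test finds nothing,
# so the patch below is applied only where it is needed.
doc = REXML::Document.new '<doc xmlns="ns"><item name="foo"/></doc>'
if not doc.root.elements["item[@name='foo']"]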
  class REXML::Element
    def attribute( name, namespace=nil )
      prefix = nil
      prefix = namespaces.index(namespace) if namespace
      prefix = nil if prefix == 'xmlns'
      attributes.get_attribute( "#{prefix ? prefix + ':' : ''}#{name}" )
    end
  end
end

As I am bound to hit this issue frequently, I’ve added it to my monkey_patches file, which RUBYOPT loads on every run:

export RUBYOPT='-rubygems -r/home/rubys/bin/monkey_patches'

Or put another way:

“There is a bug in Ruby 1.8.6. Until a version is released which fixes this, here you go.”

Interesting concept.

Posted by James Abley at

Thanks for steering me in the right direction.

In addition to the problem you’ve described, I found that this method does not work if there are phantom namespaces left over after an XSLT transformation.

Basically, the following test case does not work:

require 'rexml/document'
doc = REXML::Document.new(
  '<doc xmlns="ns" xmlns:phantom="ns"><item name="foo">text</item></doc>'
)
p doc.text( "/doc/item[@name='foo']" )
p doc.root.elements["item"].attribute("name", "ns")
p doc.root.elements["item[@name='foo']"]

These are the test results in Ruby 1.8.6:

$ ruby test.rb 
nil
nil
nil
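A plausible reading of why the first patch misses this case (an interpretation of the code, not a confirmed diagnosis): two prefixes are bound to the same URI, so the reverse lookup `namespaces.index(namespace)` has more than one candidate and may hand back `phantom` rather than `xmlns`. The sketch below only prints the prefix map REXML reports for the test document.

```ruby
require 'rexml/document'

doc = REXML::Document.new(
  '<doc xmlns="ns" xmlns:phantom="ns"><item name="foo">text</item></doc>'
)

# Element#namespaces maps each declared prefix to its URI;
# the default namespace is stored under the key 'xmlns'.
ns_map = doc.root.namespaces

# Collect every prefix bound to the URI "ns": the reverse lookup is ambiguous.
prefixes = ns_map.select { |_prefix, uri| uri == 'ns' }.keys.sort
p prefixes
```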

With the following monkey-patch...

require 'rexml/document'
doc = REXML::Document.new(
    '<doc xmlns="ns" xmlns:bar="ns"><item name="foo"/></doc>'
)
if not doc.root.elements["item[@name='foo']"]
  class REXML::Element
    def attribute( name, namespace=nil )
      prefix = nil
      prefix = namespaces.index(namespace) if namespace
      prefix = nil if prefix == 'xmlns'

      ret_val =
        attributes.get_attribute( "#{prefix ? prefix + ':' : ''}#{name}" )

      return ret_val unless ret_val.nil?

      return nil if prefix.nil?

      # fall back to the unprefixed attribute only when the prefix is
      # bound to the same namespace as the default namespace
      return nil unless ( namespaces[ prefix ] == namespaces[ 'xmlns' ] )

      attributes.get_attribute( name )
    end
  end
end

... the test produces the expected results:

$ ruby test.rb
"text"
name='foo'
<item name='foo'/>
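The fallback logic can also be sketched without REXML, using plain hashes. `lookup_attribute` and its hash arguments are hypothetical stand-ins invented for illustration, not part of the REXML API, and `Hash#key` is used here where Ruby 1.8 has `Hash#index`.

```ruby
# attrs:      attribute name (possibly "prefix:name") => value
# namespaces: prefix => URI, with the default namespace under 'xmlns'
def lookup_attribute(attrs, namespaces, name, namespace = nil)
  prefix = namespace ? namespaces.key(namespace) : nil
  prefix = nil if prefix == 'xmlns'

  found = attrs[prefix ? "#{prefix}:#{name}" : name]
  return found unless found.nil?
  return nil if prefix.nil?

  # Fall back to the unprefixed name only when the prefix is bound
  # to the same URI as the default namespace.
  return nil unless namespaces[prefix] == namespaces['xmlns']
  attrs[name]
end

attrs = { 'name' => 'foo' }

# 'phantom' is listed first so the reverse lookup returns it, as in the failing case.
phantom_ns = { 'phantom' => 'ns', 'xmlns' => 'ns' }
same_uri = lookup_attribute(attrs, phantom_ns, 'name', 'ns')

# A prefix bound to a different URI must not fall back.
other_ns = { 'xmlns' => 'ns', 'other' => 'ns2' }
different_uri = lookup_attribute(attrs, other_ns, 'name', 'ns2')

p same_uri
p different_uri
```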

I’ve submitted a bug report and a proposed fix to REXML.

Posted by Alexander Pogrebnyak at

Living In a Fool's Paradise

Why I hate REXML....

Excerpt from Musings at
