MonkeyPatch for Ruby 1.8.6
One of the joys of Ruby and HTML5 is that one can easily extract data from a web page with an XPath expression. For example, the following extracts the URI of the RSD document from a weblog that supports RSD:
require 'open-uri'
require 'html5/html5parser'
doc = HTML5::HTMLParser.parse(open(ARGV[0]))
rsd = doc.elements['//link[@type="application/rsd+xml"]/@href'].to_s
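The same kind of lookup can be tried offline against a literal string, which is handy for experimenting. This is only a sketch: it uses REXML directly rather than the HTML5 parser, the sample markup and the helper name `rsd_href` are made up for illustration, and the sample deliberately declares no default namespace. It also reads the attribute from the matched element instead of selecting `@href` in the XPath itself.

```ruby
require 'rexml/document'

# Hypothetical sample page; note that it declares no default namespace.
SAMPLE = <<EOS
<html>
  <head>
    <title>Example weblog</title>
    <link rel="EditURI" type="application/rsd+xml" href="http://example.com/rsd.xml"/>
  </head>
  <body/>
</html>
EOS

# Hypothetical helper: returns the href of the first RSD link, or nil
# when the page has no such link.
def rsd_href(xml)
  doc = REXML::Document.new(xml)
  link = doc.elements['//link[@type="application/rsd+xml"]']
  link && link.attributes['href']
end

puts rsd_href(SAMPLE)
```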
Unfortunately, a bug in Ruby 1.8.6 prevents attribute names that are not namespace-qualified from working in XPath expressions. It affects any document with a default namespace, even a vestigial one like those sported by WordPress weblogs.
The following monkey-patch fixes this:
require 'rexml/document'
doc = REXML::Document.new '<doc xmlns="ns"><item name="foo"/></doc>'
if not doc.root.elements["item[@name='foo']"]
  class REXML::Element
    def attribute( name, namespace=nil )
      prefix = nil
      prefix = namespaces.index(namespace) if namespace
      prefix = nil if prefix == 'xmlns'
      attributes.get_attribute( "#{prefix ? prefix + ':' : ''}#{name}" )
    end
  end
end
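To see whether a given interpreter needs the patch, or whether the patch took effect, the probe from the guard can be pulled out into a method. A small sketch; the name `unprefixed_attribute_xpath_works?` is hypothetical and not part of REXML:

```ruby
require 'rexml/document'

# Hypothetical helper: true when an unprefixed attribute predicate matches
# inside a document that declares a default namespace.
def unprefixed_attribute_xpath_works?
  doc = REXML::Document.new('<doc xmlns="ns"><item name="foo"/></doc>')
  !doc.root.elements["item[@name='foo']"].nil?
end

if unprefixed_attribute_xpath_works?
  puts 'attribute predicates work: no patch needed, or patch already applied'
else
  puts 'attribute predicates are broken: load the monkey-patch'
end
```

Running it once before and once after requiring the patch file shows whether the patch changed anything on that installation.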
As I am bound to hit this issue frequently, I’ve added it to my monkey_patches file:
export RUBYOPT='-rubygems -r/home/rubys/bin/monkey_patches'
Thanks for steering me in the right direction.
In addition to the problem you’ve described, I found that this method does not work if there are phantom namespaces left over after an XSLT transformation.
Basically, the following test case does not work:
require 'rexml/document'
doc = REXML::Document.new(
'<doc xmlns="ns" xmlns:phantom="ns"><item name="foo">text</item></doc>'
)
p doc.text( "/doc/item[@name='foo']" )
p doc.root.elements["item"].attribute("name", "ns")
p doc.root.elements["item[@name='foo']"]
These are the test results in Ruby 1.8.6:
$ ruby test.rb
nil
nil
nil
With the following monkey-patch...
require 'rexml/document'
doc = REXML::Document.new(
'<doc xmlns="ns" xmlns:bar="ns"><item name="foo"/></doc>'
)
if not doc.root.elements["item[@name='foo']"]
  class REXML::Element
    def attribute( name, namespace=nil )
      prefix = nil
      prefix = namespaces.index(namespace) if namespace
      prefix = nil if prefix == 'xmlns'
      ret_val =
        attributes.get_attribute( "#{prefix ? prefix + ':' : ''}#{name}" )
      return ret_val unless ret_val.nil?
      return nil if prefix.nil?
      # fall back to the unprefixed attribute only when the prefix is
      # bound to the same namespace as the default namespace
      return nil unless ( namespaces[ prefix ] == namespaces[ 'xmlns' ] )
      attributes.get_attribute( name )
    end
  end
end
... the test produces the expected results:
$ ruby test.rb
"text"
name='foo'
<item name='foo'/>
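The three lookups from the test case can also be wrapped so they are easy to rerun on other Ruby versions. This is a rough sketch: the helper name `phantom_namespace_probe` and the constant `PHANTOM_DOC` are hypothetical, and the script only reports whether each lookup found something, since the exact results depend on the REXML version installed.

```ruby
require 'rexml/document'

# Same document shape as the test case: the default namespace and the
# "phantom" prefix are both bound to "ns".
PHANTOM_DOC =
  '<doc xmlns="ns" xmlns:phantom="ns"><item name="foo">text</item></doc>'

# Hypothetical helper: runs the three lookups and returns their results.
def phantom_namespace_probe(xml)
  doc = REXML::Document.new(xml)
  item = doc.root.elements['item']
  {
    :text      => doc.text("/doc/item[@name='foo']"),
    :attribute => item && item.attribute('name', 'ns'),
    :element   => doc.root.elements["item[@name='foo']"]
  }
end

phantom_namespace_probe(PHANTOM_DOC).each do |label, value|
  puts "#{label}: #{value.nil? ? 'nil (affected)' : 'found'}"
end
```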
I’ve submitted a bug report and a proposed fix to REXML.
Posted by Alexander Pogrebnyak at
Or put another way:
“There is a bug in Ruby 1.8.6. Until a version is released which fixes this, here you go.”
Interesting concept.
Posted by James Abley at