Ruby HTML5 Parser
I got enough of this running to demonstrate a proof of concept:
    require 'open-uri'
    require 'html5lib/html5parser'

    uri = 'http://www.whatwg.org/'
    doc = HTML5lib::HTMLParser.parse(open(uri))

    doc.elements.each('//p[@class="what-to-do"]/a') {|link|
      link.elements.each('em') {|title| print title.children}
      puts ":\t#{link.attribute('href')}"
    }
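REXML is used for the TreeBuilder, so the parsed result can be queried with XPath through REXML's usual `elements.each`, as the example above does.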
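To show the REXML side on its own, here is a minimal sketch that runs the same XPath query against a small hand-written document. It assumes the tree the parser builds behaves like an ordinary `REXML::Document`; the `SAMPLE` markup, its `example.org` URLs, and the `links` array are made up for illustration and do not come from the real page or from html5lib.

```ruby
require 'rexml/document'

# Hypothetical, well-formed stand-in for the tree the parser would build.
SAMPLE = <<~XML
  <html>
    <body>
      <p class="what-to-do"><a href="http://example.org/spec"><em>Read</em> the spec</a></p>
      <p class="other"><a href="http://example.org/skip">Skip</a></p>
    </body>
  </html>
XML

doc = REXML::Document.new(SAMPLE)

# Same XPath and traversal as the proof of concept, collected into
# [title, href] pairs instead of being printed directly.
links = []
doc.elements.each('//p[@class="what-to-do"]/a') do |link|
  titles = []
  link.elements.each('em') { |title| titles << title.text }
  links << [titles.join, link.attribute('href').value]
end

links.each { |title, href| puts "#{title}:\t#{href}" }
```

Collecting the pairs first keeps the query separate from the output, which makes it easier to check what the XPath actually matched.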
I’m looking for help. Interested? Join the group.
Update: First patch