intertwingly

It’s just data

FeedValidator.rb?


This started out as a Random Thought (RT).

background

The Feed Validator is organized as a recursive descent parser for various feed formats.  It is implemented in an object-oriented fashion, where each element ‘knows’ what its possible children are.

This was all well and good when the vocabulary was relatively small and stable.  But now some rather large new extensions are being defined.  Some even change the validation rules for existing elements.

The problem is that the current design requires each element to know every child element that can potentially occur, even those from the most obscure and rarely used namespaces.

What would be better is a more modular approach, one where the loading of additional definitions is triggered by the xmlns attribute itself.
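To make the idea concrete, here is a minimal sketch of namespace-triggered loading.  The ModuleRegistry class and its register and declare methods are hypothetical names invented for this illustration, not part of the Feed Validator; a real loader would more likely require a file of definitions than append to an array.

```ruby
# Hypothetical registry: namespace URI => block that installs definitions.
class ModuleRegistry
  attr_reader :loaded

  def initialize
    @loaders = {}
    @loaded = []
  end

  # Associate a namespace URI with the code that defines its elements.
  def register(uri, &loader)
    @loaders[uri] = loader
  end

  # Called when an xmlns declaration is seen; loads each module at most once.
  def declare(uri)
    return false if @loaded.include?(uri) || !@loaders.key?(uri)
    @loaders[uri].call
    @loaded << uri
    true
  end
end

dc_ns = 'http://purl.org/dc/elements/1.1/'
known_children = %w[title link]          # stand-in for the core vocabulary

registry = ModuleRegistry.new
registry.register(dc_ns) { known_children << 'dc:creator' }

first  = registry.declare(dc_ns)         # definitions are loaded here
second = registry.declare(dc_ns)         # already loaded, nothing happens
```

Elements from namespaces that never appear in a feed are never loaded, so the core classes do not need to know about them.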

Modifying existing classes is impossible in statically compiled languages like Java.  It is possible in dynamic languages like Python, but difficult enough that it is rarely done.  In Ruby, it is trivial and commonplace.
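As an illustration of how little ceremony this takes in Ruby, the sketch below reopens a class after it has been defined.  The Entry class and its children method are made up for this example and are not taken from the validator.

```ruby
# Hypothetical core class, as it might ship with the validator.
class Entry
  def children
    %w[title link]
  end
end

# Later, an extension reopens the very same class and extends it.
class Entry
  alias_method :core_children, :children   # keep the original behavior

  def children
    core_children + %w[dc:creator]
  end
end

entry_children = Entry.new.children
```

Every Entry instance, including ones created before the class was reopened, sees the extended list.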

listener

The design starts with a SAX2 listener.  For prototyping purposes, I started with REXML, but the more I use it, the more I am convinced that it is not a suitable base for building a validator.  My current nemesis: SAX character events receive the text data in a partially digested form.  But that’s why I chose SAX2, as that permits me to plug in another parser with relative ease.
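For readers unfamiliar with REXML's SAX2 interface, a minimal listener might look like the sketch below.  SketchListener and its instance variables are hypothetical; only REXML::Parsers::SAX2Parser and the REXML::SAX2Listener mixin come from REXML itself, and this is not the validator's actual listener.

```ruby
require 'rexml/parsers/sax2parser'
require 'rexml/sax2listener'

# Hypothetical listener: records namespace declarations and element names.
class SketchListener
  include REXML::SAX2Listener   # supplies no-op defaults for unused events

  attr_reader :namespaces, :elements

  def initialize
    @namespaces = {}
    @elements = []
  end

  # Fired for each xmlns declaration encountered on an element.
  def start_prefix_mapping(prefix, uri)
    @namespaces[prefix] = uri
  end

  # Fired for each start tag, with the resolved namespace URI.
  def start_element(uri, localname, qname, attributes)
    @elements << [uri, localname]
  end
end

xml = '<feed xmlns="http://www.w3.org/2005/Atom"><title>x</title></feed>'

listener = SketchListener.new
parser = REXML::Parsers::SAX2Parser.new(xml)
parser.listen(listener)
parser.parse
```

Because the listener only depends on the SAX2 event names, another parser that emits the same events could be swapped in without touching it.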

The Listener's job is pretty easy:

element

The Element's job is also straightforward:

But the real work is in the Element metaclass, which defines methods for defining rules for attributes and elements, and methods for retrieving these rules.
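The original listing is not reproduced here, but the general shape of such a metaclass can be sketched as follows.  The method names element, attribute, element_rule and attribute_rule, and the symbols used as rule names, are all assumptions made for this illustration.

```ruby
class Element
  class << self
    # Declare a permitted child element and the rule that validates it.
    def element(name, rule = nil)
      own_elements[name] = rule
    end

    # Declare a permitted attribute and the rule that validates it.
    def attribute(name, rule = nil)
      own_attributes[name] = rule
    end

    def own_elements
      @own_elements ||= {}
    end

    def own_attributes
      @own_attributes ||= {}
    end

    # Retrieve a rule, searching this class first and then its superclasses.
    def element_rule(name)
      lookup(:own_elements, name)
    end

    def attribute_rule(name)
      lookup(:own_attributes, name)
    end

    private

    def lookup(table, name)
      klass = self
      while klass.respond_to?(table)
        rules = klass.public_send(table)
        return rules[name] if rules.key?(name)
        klass = klass.superclass
      end
      nil
    end
  end
end

# Declarations read like a small grammar.
class Feed < Element
  element   'title',    :text
  element   'updated',  :date
  attribute 'xml:lang', :language
end

# A subclass inherits the rules and can add its own.
class ExtendedFeed < Feed
  element 'dc:creator', :text
end
```

Since the rules live in per-class tables, a module loaded later can reopen Feed and call element again to add children from its own namespace.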

Several specialized subclasses are defined:

modules and rules

Modules effectively make use of a domain-specific grammar for defining elements, attributes, and their associated validation rules.  This is largely declarative, with the ability to seamlessly drop down into code where necessary.

Rules typically involve a regular expression or a table lookup.
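The two kinds of rule can be sketched as plain lambdas.  The names below are hypothetical; the table of type values is the set that Atom permits on a text construct.

```ruby
# Regular-expression rule: the value must be a non-negative integer.
non_negative_integer = ->(value) { value.match?(/\A\d+\z/) }

# Table-lookup rule: the value must be one of a fixed set of strings.
text_types = %w[text html xhtml].freeze
text_type  = ->(value) { text_types.include?(value) }

integer_ok  = non_negative_integer.call('42')
integer_bad = non_negative_integer.call('-1')
type_ok     = text_type.call('html')
type_bad    = text_type.call('markdown')
```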

The split between elements and rules seemed to make sense initially, but as implementation has proceeded the distinction has become less and less self-evident.  Ultimately, it may need to be refactored away.

test

test overrides the logging and comment mechanisms of the listener to check if the test was successful.  It also initializes an xml:base value.

Ultimately, this would be converted to use Test::Unit.  For the moment, I want to stop on the first error.

overall

Overall, I’m impressed by how clean and simple a Ruby implementation could be.  If I do proceed further with this (at the moment, there probably is only about 20% test coverage), I will definitely need to look into converting to libxml2.

At the moment, there is essentially no UI, but this could easily be provided by Rails.  Rails would also make it trivial to add an HTTP Test Suite.