FeedValidator.rb?
This started out as a Random Thought (RT).
background
The Feed Validator is organized as a recursive descent parser for various feed formats. It is implemented in an object oriented fashion, where each element ‘knows’ what the possible children are for that element.
This was all well and good when the vocabulary was relatively small and stable. But now we are seeing some rather large new extensions being defined. Some even change the validation rules for existing elements.
The problem is that the current design requires each element to know all potential child elements that can occur, even those from the most obscure and rarely used namespaces.
What would be better is a more modular approach. One where the loading of additional definitions were triggered by the xmlns attribute itself.
Modifying existing classes is impossible in statically compiled languages, like Java. Modifying existing classes is possible in dynamic languages like Python, but difficult enough to be rarely used. Modifying existing classes is trivial and commonplace in Ruby.
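As a minimal sketch of what "modifying existing classes" looks like in Ruby: a class is defined once, then reopened later to extend it. The class name Feed and the children method are hypothetical, chosen only for illustration; they are not taken from the validator's code.

```ruby
# A stand-in for an element class defined by the core validator (hypothetical).
class Feed
  def children
    %w[title updated entry]
  end
end

# Later, an extension reopens the same class. No subclass is needed:
# every existing and future Feed instance picks up the change.
class Feed
  alias_method :core_children, :children

  def children
    core_children + %w[dc:creator]
  end
end

puts Feed.new.children.inspect
```

The second class Feed block does not replace the first; it adds to it, which is what lets an extension module register new children without the core definition knowing about them.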
listener
The design starts with a SAX2 listener. For prototyping purposes, I started with REXML, but the more I use it, the more I am convinced that it is not a suitable base for building a validator. My current nemesis: SAX character events receive the text data in a partially digested form. But that’s why I chose SAX2, as that permits me to plug in another parser with relative ease.
The Listener's job is pretty easy:
- initialize name, stack, and parser
- define a default log action of writing to STDERR
- start_prefix_mapping looks up the xmlns, and does a require on that name in the module directory. Subsequent calls to require have no effect, which is exactly what we want.
- for all other methods, method_missing simply forwards the message to the rules on the top of the stack
- start_element calls method_missing and then pushes all child element rules on the stack, and directly executes all attribute rules.
- end_element also calls method_missing and then pops the stack.
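The steps above can be sketched roughly as follows. This is not the actual listener: it is driven by hand rather than by REXML, it leaves out attribute rules, and the TraceRule class, the child_rules method, and the scheme for turning a namespace URI into a file name are all assumptions made for the example.

```ruby
require 'tmpdir'

# Hypothetical rule: records the events it receives and hands out child rules.
class TraceRule
  attr_reader :seen

  def initialize(children = {})
    @children = children # local name => array of rules for that child
    @seen = []
  end

  def start_element(_uri, localname, _qname, _attrs)
    @seen << "start #{localname}"
  end

  def end_element(_uri, localname, _qname)
    @seen << "end #{localname}"
  end

  def characters(text)
    @seen << "text #{text}"
  end

  def child_rules(localname)
    @children.fetch(localname, [])
  end
end

class Listener
  FORWARDED = %i[characters start_element end_element].freeze
  attr_reader :stack

  def initialize(module_dir, root_rules)
    @module_dir = module_dir
    @stack = [root_rules]
    @log = ->(message) { $stderr.puts(message) } # default log action
  end

  # Assumed naming scheme: non-word characters in the URI become underscores.
  def start_prefix_mapping(_prefix, uri)
    path = File.join(@module_dir, uri.gsub(/\W+/, '_'))
    File.exist?("#{path}.rb") ? require(path) : false
  end

  def start_element(uri, localname, qname, attrs)
    parents = @stack.last
    method_missing(:start_element, uri, localname, qname, attrs)
    @stack.push(parents.flat_map { |rule| rule.child_rules(localname) })
  end

  def end_element(uri, localname, qname)
    method_missing(:end_element, uri, localname, qname)
    @stack.pop
  end

  # Everything else is forwarded to the rules on top of the stack.
  def method_missing(name, *args)
    return super unless FORWARDED.include?(name)
    @stack.last.each { |rule| rule.public_send(name, *args) }
  end

  def respond_to_missing?(name, include_private = false)
    FORWARDED.include?(name) || super
  end
end

Dir.mktmpdir do |dir|
  uri = 'http://example.org/ns/demo'
  File.write(File.join(dir, uri.gsub(/\W+/, '_') + '.rb'), "DEMO_MODULE_LOADED = true\n")

  title = TraceRule.new
  feed = TraceRule.new('title' => [title])
  listener = Listener.new(dir, [feed])

  p listener.start_prefix_mapping('demo', uri) # first call loads the module file
  p listener.start_prefix_mapping('demo', uri) # second call is a no-op
  listener.start_element(nil, 'title', 'title', {})
  listener.characters('Hello')
  listener.end_element(nil, 'title', 'title')
  p feed.seen, title.seen
end
```

Note the asymmetry this design implies: the start event for an element goes to the parent's rules, while its text and end event go to the element's own rules, because the push happens after the forward and the pop happens after it too.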
element
The Element's job is also straightforward:
- initialize various stuff
- log adds attribute/element/parent name information to the log message and delegates upwards to the parent element
- Three “rule” methods do some minor housekeeping
- Include the SAX2Listener mixin to define default (null) behavior for all SAX2 events
But the real work is in the Element metaclass, which defines methods for defining rules for attributes and elements, and methods for retrieving these rules.
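One way such class-level rule methods could look is sketched below. The method names element, attribute, and rules_for_element, along with the Entry, Title, and Creator classes, are hypothetical; the real metaclass may be organized quite differently.

```ruby
class Element
  class << self
    # Rules live on each class, so a class reopened later can add more.
    def element_rules
      @element_rules ||= {}
    end

    def attribute_rules
      @attribute_rules ||= {}
    end

    # Declare a child element and the class that validates it.
    def element(name, klass)
      element_rules[name] = klass
    end

    # Declare an attribute and a block that checks its value.
    def attribute(name, &check)
      attribute_rules[name] = check
    end

    # Retrieve a rule, falling back to the superclass when this class has none.
    def rules_for_element(name)
      element_rules.fetch(name) do
        superclass.rules_for_element(name) if superclass.respond_to?(:rules_for_element)
      end
    end
  end
end

class Title < Element; end
class Creator < Element; end

class Entry < Element
  element 'title', Title
  attribute('xml:lang') { |value| value.match?(/\A[a-z]{2,3}(-[A-Za-z0-9]+)*\z/) }
end

# An extension module, loaded later, reopens Entry and registers its own child.
class Entry
  element 'dc:creator', Creator
end

p Entry.rules_for_element('title')
p Entry.rules_for_element('dc:creator')
```

Because the declarations are ordinary method calls made while the class body runs, a module loaded in response to an xmlns attribute can add rules to Entry without the original definition being touched.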
Several specialized subclasses are defined:
- TextElement captures the character value for a given element, useful for elements like title
- DataElement extends TextElement, but also throws an error if there is extra whitespace, useful for elements like updated.
- Cardinality is stubbed out right now, ultimately it will be used to implement REQUIRED and MANY; the latter will allow multiples of elements like category
- DiscriminatedUnion is a fancy name for elements whose definition depends on the value of an attribute. Useful for elements like summary, and amazingly easy to implement.
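A rough sketch of three of these subclasses follows, written as standalone classes so it runs on its own. Several details are assumptions: problems are collected in an errors list rather than raised, the union is keyed on the type attribute with Atom's default of text, and the XhtmlDiv class and the rule_for method are invented for the example.

```ruby
class TextElement
  attr_reader :value, :errors

  def initialize
    @value = String.new
    @errors = []
  end

  # A parser may deliver text in several chunks, so append rather than assign.
  def characters(text)
    @value << text
  end

  def end_element(*)
  end
end

class DataElement < TextElement
  # Once all text has arrived, complain about leading or trailing whitespace.
  def end_element(*)
    super
    @errors << 'unexpected whitespace' unless @value == @value.strip
  end
end

# Placeholder for whatever would validate inline XHTML content (hypothetical).
class XhtmlDiv; end

class DiscriminatedUnion
  def initialize(attribute, choices, default)
    @attribute = attribute
    @choices = choices
    @default = default
  end

  # Pick the rule class from the attribute value; nil means an unknown value.
  def rule_for(attrs)
    @choices[attrs.fetch(@attribute, @default)]
  end
end

SUMMARY = DiscriminatedUnion.new(
  'type',
  { 'text' => TextElement, 'html' => TextElement, 'xhtml' => XhtmlDiv },
  'text'
)

updated = DataElement.new
updated.characters(" 2005-07-31T12:29:29Z\n")
updated.end_element
p updated.errors
p SUMMARY.rule_for('type' => 'xhtml')
```

The union itself is little more than a hash lookup, which fits the remark that it is easy to implement.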
modules and rules
Modules effectively make use of a domain specific grammar for defining elements, attributes, and their associated validation rules. This is largely declarative, with the ability to seamlessly drop down into code in the instances where it is necessary.
Rules typically involve a regular expression or a table lookup.
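For illustration, here is one rule of each kind. The constant and method names are hypothetical, and the date pattern is deliberately simplified: it checks the general shape of an RFC 3339 date-time but does not range-check months, days, or hours.

```ruby
# Regular-expression rule: the overall shape of a date-time, as used by updated.
DATE_TIME = /\A\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})\z/

# Table-lookup rule: the values Atom allows for the type attribute of a text construct.
TEXT_TYPES = %w[text html xhtml].freeze

# Each check returns nil when the value is fine, or a message describing the problem.
def check_updated(value)
  DATE_TIME.match?(value) ? nil : "invalid date-time: #{value.inspect}"
end

def check_type(value)
  TEXT_TYPES.include?(value) ? nil : "unknown type: #{value.inspect}"
end

p check_updated('2005-07-31T12:29:29Z')
p check_updated('July 31, 2005')
p check_type('markdown')
```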
The split between elements and rules seemed to make sense at first, but as the implementation has proceeded, the distinction has become less and less self-evident. Ultimately, it may need to be refactored away.
test
test overrides the logging and comment mechanisms of the listener to check if the test was successful. It also initializes an xml:base value.
Ultimately, this would be converted to use Test::Unit. For the moment, I want to stop on the first error.
overall
Overall, I’m impressed by how clean and simple a Ruby implementation could be. If I do proceed further with this (at the moment, there probably is only about 20% test coverage), I will definitely need to look into converting to libxml2.
At the moment, there is essentially no UI, but this could easily be provided by Rails. Rails would also make it trivial to add an HTTP Test Suite.