intertwingly

It’s just data

Last Line of Defense


GIGO.  It is easier to produce correct XML output if you have correct XML input.  One way to achieve this is to ensure that data that is not well-formed XML can never be stored.  With Ruby on Rails, this can be enforced with validation rules that invoke a parser and record an error when parsing fails, thus:

require 'xml/parser'
 
class Entry < ActiveRecord::Base

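  # Run the same well-formedness check on every field that may hold markup.
  validates_each :title, :summary, :content do |model, attr, value|
    # One parser instance is shared by all validations and reset after each use.
    @@xmlparser ||= XML::Parser.new
    begin
      # Wrap the value in a div so plain text and mixed content have a single root.
      @@xmlparser.parse "<div>#{value}</div>" if value
    rescue
      model.errors.add attr, 'is not well formed XML'
    ensure
      @@xmlparser.reset
    end
  end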

end

And tests such as these can verify the correct operation:

class EntryTest < Test::Unit::TestCase
  fixtures :entries

  def setup
    @entry = Entry.find(:first)
  end

  def test_title_not_wellformed
    @entry.title = "AT&amp;T"
    assert @entry.save, "well formed title can't be saved"

    @entry.title = "AT&T"
    assert !@entry.save, "not well formed title saved"
    assert_equal "is not well formed XML", @entry.errors.on(:title)
  end

end
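
The same wrap, parse, and rescue pattern can also be exercised without Rails or a database.  What follows is only a minimal sketch under stated assumptions: xml_field_errors and TOY_PARSE are hypothetical names, and TOY_PARSE is a deliberately crude stand-in that looks for bare ampersands and nothing else.  It is not an XML parser; in practice the callable would wrap a real one.

```ruby
# Hypothetical helper: returns a hash of attribute => message for every
# field whose value the supplied parser rejects.
def xml_field_errors(fields, parse)
  errors = {}
  fields.each do |attr, value|
    next if value.nil?                    # mirrors the `if value` guard
    begin
      parse.call("<div>#{value}</div>")   # single root around the fragment
    rescue StandardError
      errors[attr] = 'is not well formed XML'
    end
  end
  errors
end

# Stand-in for a real parser (assumption, for demonstration only): raises
# on any "&" that does not begin an entity or character reference.
TOY_PARSE = lambda do |xml|
  if xml =~ /&(?!(?:[A-Za-z][\w.-]*|#\d+|#x\h+);)/
    raise ArgumentError, 'bare ampersand'
  end
  xml
end

fields = { title: 'AT&T', summary: 'AT&amp;T', content: nil }
errors = xml_field_errors(fields, TOY_PARSE)
# Only :title is flagged; :summary is escaped and :content is nil.
```

Passing the parser in as a callable keeps the rule itself independent of whichever XML library ends up doing the real work.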

As a footnote, the verification logic took three attempts to get right.  My first attempt was to use REXML.  While it is certainly the most elegant Ruby XML API, it seems to accept a variety of ill-formed XML fragments.  For example, the following produces no error, even though the div is never closed and the ampersand is unescaped:

require 'rexml/document'
REXML::Document.new("<div>at&t")

Next, I tried libxml2.  While the following correctly reports the errors, it also prints them on STDERR.

require 'xml/libxml'
p = XML::Parser.new
p.string = "<div>at&t"
p.parse
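
One conceivable workaround would be to point the process's stderr at the null device for the duration of the parse.  The sketch below assumes the library writes straight to file descriptor 2, which is why reassigning $stderr alone would not help.  with_silenced_stderr is a hypothetical helper name, not part of libxml or Rails.

```ruby
# Hypothetical helper: runs the block with file descriptor 2 redirected to
# the null device, then restores the original stderr even if the block raises.
def with_silenced_stderr
  saved = STDERR.dup                  # keep a handle on the real stderr
  begin
    STDERR.reopen(File::NULL, 'w')    # fd 2 now discards everything
    yield
  ensure
    STDERR.reopen(saved)              # put the real stderr back on fd 2
    saved.close
  end
end

result = with_silenced_stderr do
  STDERR.puts 'this line is discarded'
  :parsed
end
```

The cost is that anything else written to stderr in that window, by any thread, is discarded as well.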

My third attempt, the xml/parser code shown at the top, uses Expat and serves my needs just fine.