Last Line of Defense
GIGO. It is easier to produce correct XML output if you have correct XML input. One way to achieve this is to ensure that data that is not well-formed XML can never be stored. With Ruby on Rails, this can be enforced with validation rules that invoke a parser and report an error upon failure, thus:
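require 'xml/parser'

class Entry < ActiveRecord::Base
  validates_each :title, :summary, :content do |model, attr, value|
    # Reuse a single parser instance across validations
    @@xmlparser ||= XML::Parser.new
    begin
      # Wrap the value in a <div> so that a fragment has a single root element
      @@xmlparser.parse "<div>#{value}</div>" if value
    rescue
      model.errors.add attr, 'is not well formed XML'
    ensure
      # Clear parser state so the next value starts fresh
      @@xmlparser.reset
    end
  end
end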
And tests such as these can verify the correct operation:
class EntryTest < Test::Unit::TestCase
  fixtures :entries

  def setup
    @entry = Entry.find(:first)
  end

  def test_title_not_wellformed
    # An escaped ampersand is well formed and must be accepted
    @entry.title = "AT&amp;T"
    assert @entry.save, "well formed title can't be saved"

    # A bare ampersand is not well formed and must be rejected
    @entry.title = "AT&T"
    assert ! @entry.save, "not well formed title saved"
    assert_equal "is not well formed XML", @entry.errors.on(:title)
  end
end
As a footnote, the verification logic took three attempts to get right. My first attempt was to use REXML. While it is certainly the most elegant Ruby XML API, it seems to accept a variety of ill-formed XML fragments; for example, the following produces no error:
require 'rexml/document'
REXML::Document.new("<div>at&t")
Next, I tried libxml2. While the following correctly reported the errors, it also did so on STDERR.
require 'xml/libxml'
p = XML::Parser.new
p.string = "<div>at&t"
p.parse
My third attempt, the validation code shown at the top, uses Expat and serves my needs just fine.
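Since the parser changed twice before settling, one option is to keep the check behind a small helper that takes the parser as an argument, so that swapping parsers touches only one place. The following is a minimal sketch of that idea, not the code used above: `WellFormed`, `fragment?`, and `NAIVE_PARSER` are hypothetical names, and the stand-in parser only looks for bare ampersands, where a real parser such as Expat checks far more.

```ruby
# Hypothetical helper: wraps a fragment in a <div> and reports whether the
# supplied parser accepts it.
module WellFormed
  # parser is assumed to be any object responding to #call(xml_string)
  # that raises an exception on ill-formed input.
  def self.fragment?(value, parser)
    return true if value.nil?
    parser.call("<div>#{value}</div>")
    true
  rescue StandardError
    false
  end
end

# Stand-in parser for demonstration only: rejects any "&" that does not
# begin an entity reference or a character reference.
NAIVE_PARSER = lambda do |xml|
  raise ArgumentError, 'bare ampersand' if xml =~ /&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x\h+);)/
  xml
end

WellFormed.fragment?('AT&amp;T', NAIVE_PARSER)  # => true
WellFormed.fragment?('AT&T', NAIVE_PARSER)      # => false
```

With a helper of this shape, the body of the validation block reduces to a single call, and each candidate parser can be tried against the same test cases.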