Last Line of Defense
GIGO. It is easier to produce correct XML output if you have correct XML input. One way to achieve this is to ensure that data that is not well-formed XML can never be stored. With Ruby on Rails, this can be enforced with validation rules that invoke a parser and record a validation error upon failure, thus:
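    require 'xml/parser'

    class Entry < ActiveRecord::Base
      validates_each :title, :summary, :content do |model, attr, value|
        # one Expat parser, created lazily and shared across validations
        @@xmlparser ||= XML::Parser.new
        begin
          # wrap the fragment so bare text or sibling elements share one root
          @@xmlparser.parse "<div>#{value}</div>" if value
        rescue
          model.errors.add attr, 'is not well formed XML'
        ensure
          # reset the parser so it can be reused for the next value
          @@xmlparser.reset
        end
      end
    end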
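The core idea of the validator (wrap the value in a `<div>` so that bare text or several sibling elements form a single root, then reject it if it does not parse) can be sketched in plain Ruby without Rails or Expat. This is a minimal sketch under stated assumptions: `well_formed_fragment?` and `xml_errors` are hypothetical names, and the checker is a deliberately simplified stand-in for a real parser that ignores comments, CDATA, processing instructions, DTD-defined entities, attribute syntax and `>` inside attribute values. For real data, a real XML parser, as in the validator above, remains the right tool.

```ruby
require 'strscan'

# Hypothetical, deliberately simplified well-formedness check for
# illustration only. It is NOT a real XML parser: comments, CDATA,
# processing instructions, DTD entities, attribute syntax and '>'
# inside attribute values are not handled.

# The five predefined XML entities, plus decimal and hex character refs
ENTITY_REF = /&(?:amp|lt|gt|quot|apos|#\d+|#x\h+);/

# A start tag, end tag, or empty-element tag:
#   1: "/" for an end tag   2: element name
#   3: attribute text       4: "/" for an empty-element tag
TAG = /<(\/?)([A-Za-z_:][\w.:\-]*)((?:\s+[^<>]*?)?)(\/?)>/

def well_formed_fragment?(value)
  # Wrap the fragment so bare text or sibling elements share one root,
  # as the Rails validator does before handing it to the parser.
  s = StringScanner.new("<div>#{value}</div>")
  open_tags = []

  until s.eos?
    if s.scan(/[^<&]+/)
      # plain character data: nothing to check
    elsif s.scan(ENTITY_REF)
      # a valid entity or character reference
    elsif s.scan(TAG)
      end_tag, name, attrs, empty = s[1], s[2], s[3], s[4]
      if end_tag == '/'
        # an end tag has no attributes and must close the innermost open element
        return false unless attrs.strip.empty? && empty.empty? && open_tags.pop == name
      elsif empty.empty?
        open_tags.push(name) # start tag: remember it until it is closed
      end
      # an empty-element tag such as <br/> needs no bookkeeping
    else
      return false # a stray '<' or an '&' that does not start a reference
    end
    # once the wrapper <div> is closed, nothing may follow it
    return false if open_tags.empty? && !s.eos?
  end

  open_tags.empty?
end

# Hypothetical stand-in for model.errors: maps attribute => message,
# skipping nil values like the validator's `if value`.
def xml_errors(fields)
  fields.each_with_object({}) do |(attr, value), errors|
    next if value.nil?
    errors[attr] = 'is not well formed XML' unless well_formed_fragment?(value)
  end
end
```

For example, `xml_errors(title: 'AT&amp;T', summary: 'AT&T', content: nil)` flags only `:summary`: the escaped ampersand is accepted, the bare one is rejected, and the `nil` value is skipped.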
And tests such as these can verify the correct operation:
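    # Rails-generated helper: loads the environment and fixture support
    require File.dirname(__FILE__) + '/../test_helper'

    class EntryTest < Test::Unit::TestCase
      fixtures :entries

      def setup
        @entry = Entry.find(:first)
      end

      def test_title_not_wellformed
        # an escaped ampersand is well formed and must be accepted
        @entry.title = "AT&amp;T"
        assert @entry.save, "well formed title can't be saved"

        # a bare ampersand is not well formed and must be rejected
        @entry.title = "AT&T"
        assert !@entry.save, "not well formed title saved"
        assert_equal "is not well formed XML", @entry.errors.on(:title)
      end
    end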
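The two titles in the test differ only in how the ampersand is written: `AT&amp;T` uses the predefined `&amp;` entity and is well formed, while a bare `&` must begin an entity reference, so `AT&T` is not. When a field is meant to hold plain text rather than markup, one option (an assumption here, not part of the approach above) is to escape it before storing, for instance with Ruby's standard `ERB::Util.html_escape`:

```ruby
require 'erb'

# Plain text typed by a user (hypothetical example value)
raw_title = "AT&T"

# Escape &, <, >, " and ' so the result is well-formed character data
escaped_title = ERB::Util.html_escape(raw_title)

puts escaped_title # prints AT&amp;T
```

Escaping only suits fields that should never contain markup; fields such as `content` that legitimately carry XHTML still need the validator as the last line of defense.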
As a footnote, the verification logic took three attempts to get right. My first attempt used REXML. While it is certainly the most elegant Ruby XML API, it seems to accept a variety of ill-formed XML fragments; for example, the following produces no error:
    require 'rexml/document'

    # ill-formed: unclosed <div> and an unescaped '&'
    REXML::Document.new("<div>at&t")
Next, I tried libxml2. While the following correctly reported the errors, it also printed them on STDERR.
    require 'xml/libxml'

    parser = XML::Parser.new
    parser.string = "<div>at&t"
    parser.parse
My third attempt, the Expat-based xml/parser library used in the validator above, serves my needs just fine.