Genshi Filters for Venus

2007-04-14T21:06:13Z

Joe Gregorio recently IM’ed me and asked me if I had looked into Genshi, suggesting that Genshi might interest me because it seemed to use XPath expressions in templates to manipulate templates. I said that I had seen it, but it didn’t seem useful for most of my purposes. But the more I thought about the XPath remark, the more obvious it became that I didn’t yet fully understand what Genshi could do.

Until this point, my impression was that it was yet another templating engine: one that, given a dictionary and a template, would perform variable substitution for you, handle simple conditionals and loops, and provide a limited ability to shell out to the host language, much like Velocity or Cheetah.

Yes, Genshi can do all that. But that doesn’t explain where XPath fits in, even when you factor in that the templating language has an XML grammar.

So, I took a look at Genshi again. And this time it clicked.

Genshi markup templates are XML (there’s another, more Velocity/Cheetah kind of template too, but let’s not digress). These XML documents are processed as a stream of what amounts to SAX events.
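
To make the "stream of events" idea concrete, here is a minimal sketch that uses only the Python standard library (xml.dom.pulldom), not Genshi's own API. The function name event_stream and the START/END/TEXT labels are my own, chosen for illustration; the point is only that a document can be handled as a flat sequence of events rather than as a tree.

```python
from xml.dom import pulldom


def event_stream(xml_text):
    """Yield (kind, value) pairs for the elements and text in xml_text.

    The START/END/TEXT labels are illustrative names, not a real API.
    """
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            yield ("START", node.tagName)
        elif event == pulldom.END_ELEMENT:
            yield ("END", node.tagName)
        elif event == pulldom.CHARACTERS:
            yield ("TEXT", node.data)


events = list(event_stream("<p>Hello <b>world</b></p>"))
for kind, value in events:
    print(kind, repr(value))
```

Anything that consumes such a sequence can drop, rewrite, or add events as they go by, which is roughly the position a filter sits in.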

The first twist is that if the value of a variable to be substituted is a Genshi Stream object, then the stream itself gets injected into the template, not the string representation of the same. This means that the events in the stream get processed.

The second twist is that certain elements in the Genshi namespace, or elements that contain certain attributes in that namespace, are treated as templates. In other words, there is no strict separation between templates and documents, like there is in XSLT, and both kinds of data can be mixed together.

By itself, that’s not all that useful, but when combined with the ability to inject one or more streams, you have a text-substitution-based templating system that can do double duty as a rule-based substitution markup language (like XSLT). And to top it off, such a markup language can also provide you with access to the underlying host language (in this case, Python). And in the process, this neatly explains where XPath fits in.

Sweet.

Once I got the concept, I set out to apply it. I took an existing XSLT template that I use — one that would benefit from access to the richer library of functions that Python provides as compared to XSLT — and set out to convert it.

The original stylesheet has some pro-forma stuff at the top (and a line at the bottom) which declares namespaces and the like, and a total of five templates.

The first matches a div element with an id of 'sidebar' and appends a <h2> and a <form> with a single input named 'q'.
The second implements a library function which returns a baseuri for a given string, using recursion.
The third matches the head element and appends an opensearch autodiscovery link, using the baseuri template defined in the step above.
The fourth ensures that script tags don’t use the empty tag syntax, in order to accommodate browsers like IE.
The fifth is a standard catch-all that passes through everything else.
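
As an aside on the fourth template: the difference between the empty tag syntax and an explicit close tag is easy to see outside of XSLT. The sketch below uses the standard library's ElementTree rather than XSLT or Genshi, and the file name search.js is made up for the example.

```python
import xml.etree.ElementTree as ET

# A script element with no content; "search.js" is a hypothetical file name.
script = ET.Element("script", {"src": "search.js"})

# The default XML serialization collapses an empty element to the short form.
short_form = ET.tostring(script, encoding="unicode")

# Asking for an explicit end tag gives the form that browsers like IE expect.
expanded_form = ET.tostring(script, encoding="unicode", short_empty_elements=False)

print(short_form)
print(expanded_form)
```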

Here’s the Genshi equivalent.

The pro-forma stuff at the top is actually an idiom which declares the namespace then causes the element itself to be stripped. There is no need for the identity catch-all in Genshi, but in its place, there is a need to inject the input document stream into the template.

In the remaining four templates, the translation from XSLT to Genshi markup is straightforward. And generally, the Genshi markup is both more compact and more powerful. Key points:

In general, templates are named by their output element. This optimizes for a common case. To consume an element and produce no output, you would either need to use the py:match element (as opposed to the attribute) or make use of the py:skip attribute.
Instead of defining a baseuri template myself, I can simply import python’s urljoin.
While one can easily tunnel out to Python in order to evaluate expressions, and from there tunnel back into Genshi to evaluate an XPath expression, there are only two quote characters to choose from, so one will quickly need to escape quotes.
The result of evaluating an XPath expression is actually a stream. This will usually be handled as you expect, but if you want to pass the results of evaluating an XPath expression as an argument to a function expecting a string, you will need to convert it yourself first. No biggie, but it surprised me at first.
In general, I prefer the way whitespace is handled in XSLT. Genshi will try to intelligently remove blank lines in the serialized output, whereas XSLT will not serialize text nodes consisting entirely of whitespace (unless xml:space="preserve" is defined in this or an enclosing scope). This, combined with xslt:text, gives you complete control of the output.
I still don’t completely understand the scoping rules. For example, if the import statement is moved inside the head template then the urljoin symbol won’t be resolved.
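
For the urljoin point above, this is roughly what the function buys you, shown in plain Python outside of any template. The URLs are placeholders I made up for the example, and note that in current Python the function lives in urllib.parse (at the time of writing it was in the urlparse module).

```python
from urllib.parse import urljoin

# Hypothetical page address and a relative reference found in that page.
page = "http://example.org/planet/index.html"
relative = "opensearchdescription.xml"

# urljoin resolves the reference against the page's base, so there is
# no need to hand-roll a recursive "strip the last path segment" helper.
resolved = urljoin(page, relative)
print(resolved)  # http://example.org/planet/opensearchdescription.xml

# Absolute references pass through unchanged.
unchanged = urljoin(page, "http://example.com/other.xml")
print(unchanged)  # http://example.com/other.xml
```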

I’m sure that I’ve only scratched the surface of what Genshi can do, but it was enough to convince me to rough in the ability for people to use Genshi as a markup language for Venus filters and templates.

In the process, I added another function to Venus: template filters, i.e., filters that are used to post-process the output of a template. The templates presented above, which add a search form and autodiscovery to a properly constructed HTML page, are but one example of what can be done with a post-processing template. Ultimately, I hope to bang on my mememes logic until it too can be executed as a filter.

What I’ve done for Genshi templates is very limited. For input filters, the input is an Atom element and the output is an Atom element, so XML in and XML out is appropriate. For templates, input is an Atom feed, and output can be pretty much anything you want, so XML to XML may work, but other options are available: in particular, an HTML serializer is a possibility. But for filters that post-process the output of a template, well, the input can be pretty much anything you like. For this case, an HTML parser may be handy, if for no other reason than that it will allow you to post-process HTMLTmpl output.

Additionally, at the moment I’ve not done all I can to enable the “simple template” approach. In the case of templates and input filters, parsing the data into dictionaries would be helpful. I could make use of the variables defined for htmltmpl usage, but those only expose a subset of the data and are engineered around some limitations of htmltmpl itself. I’m inclined to simply pass the data through the feedparser and be done with it.

But these need to be filter options, and they need testcases. If anybody is interested, here’s where the unit tests go, and here’s the interfacing code for Genshi.

Of course, you will need to install Genshi first. But by now, you probably want to anyway, don’t you? :-)