Genshi Filters for Venus
Joe Gregorio recently IM’ed me and asked me if I had looked into Genshi, suggesting that Genshi might interest me because it seemed to use XPath expressions in templates to manipulate templates. I said that I had seen it, but it didn’t seem useful for most of my purposes. But the more I thought about the XPath remark, the more obvious it became that I didn’t yet fully understand what Genshi could do.
Until this point, my impression was that it was yet another templating engine that, given a dictionary and a template, would perform variable substitution for you, handle simple conditionals and loops, and provide a limited ability to shell out to the host language, much like Velocity or Cheetah.
Yes, Genshi can do all that. But that doesn’t explain where XPath fits in. Even when you factor in that the templating language has an XML grammar.
So, I took a look at Genshi again. And this time it clicked.
Genshi markup templates are XML (there’s another, more Velocity/Cheetah kind of template too, but let’s not digress). These XML documents are processed as a stream of what amounts to SAX events.
The first twist is that if the value of a variable to be substituted is a Genshi Stream object, then the stream itself gets injected into the template, not the string representation of the same. This means that the events in the stream get processed.
The second twist is that certain elements in the Genshi namespace, or elements that contain certain attributes in that namespace, are treated as templates. In other words, there is no strict separation between templates and documents, like there is in XSLT, and both kinds of data can be mixed together.
By itself, that’s not all that useful, but when combined with the ability to inject in one or more streams, you have a text substitution based templating system that can do double duty as a rule based substitution markup language (like XSLT). And to top it off, such a markup language can also provide you with access to the underlying host language (in this case, Python). And in the process, this neatly explains where XPath fits in.
Sweet.
Once I got the concept, I set out to apply it. I took an existing XSLT template that I use — one that would benefit from access to the richer library of functions that Python provides as compared to XSLT — and set out to convert it.
The original stylesheet has some pro-forma stuff at the top (and a line at the bottom) which declares namespaces and the like, and a total of five templates.
- The first matches a div element with an id of 'sidebar' and appends an <h2> and a <form> with a single input named 'q'.
- The second implements a library function which returns a baseuri for a given string, using recursion.
- The third matches the head element and appends an opensearch autodiscovery link, using the baseuri template defined in the step above.
- The fourth ensures that script tags don’t use the empty tag syntax, in order to accommodate browsers like IE.
- The fifth is a standard catch-all that passes through everything else.
The pro-forma stuff at the top is actually an idiom which declares the namespace then causes the element itself to be stripped. There is no need for the identity catch-all in Genshi, but in its place, there is a need to inject the input document stream into the template.
In the remaining four templates, the translation from XSLT to Genshi markup is straightforward. And generally, the Genshi markup is both more compact and more powerful. Key points:
- In general, templates are named by their output element. This optimizes for a common case. To consume an element and produce no output, you would either need to use the py:match element (as opposed to the attribute) or make use of the py:skip attribute.
- Instead of defining a baseuri template myself, I can simply import Python’s urljoin.
- While one can easily tunnel out to Python in order to evaluate expressions, and from there tunnel back into Genshi to evaluate an XPath expression, there are only two quote characters to choose from, so one will quickly need to escape quotes.
- The result of evaluating an XPath expression is actually a stream. This will usually be handled as you expect, but if you want to pass the results of evaluating an XPath expression as an argument to a function expecting a string, you will need to convert it yourself first. No biggie, but it surprised me at first.
- In general, I prefer the way whitespace is handled better in XSLT. Genshi will try to intelligently remove blank lines in the serialized output, whereas XSLT will not serialize text nodes consisting entirely of whitespace (unless xml:space="preserve" is defined in this or an enclosing scope). This combined with xslt:text gives you complete control of the output.
- I still don’t completely understand the scoping rules. For example, if the import statement is moved inside the head template then the urljoin symbol won’t be resolved.
I’m sure that I’ve only scratched the surface of what Genshi can do, but it was enough to convince me to rough in the ability for people to use Genshi as a markup language for Venus filters and templates.
In the process, I added another function to Venus: template filters, i.e., filters that are used to post-process the output of a template. The templates presented above, which add a search form and autodiscovery to a properly constructed HTML page, are but one example of what can be done with a post-processing template. Ultimately, I hope to bang on my mememes logic until it too can be executed as a filter.
What I’ve done for Genshi templates is very limited. For input filters, the input is an Atom element and the output is an Atom element, so XML in and XML out is appropriate. For templates, input is an Atom feed, and output can be pretty much anything you want, so XML to XML may work, but other options are available: in particular, an HTML serializer is a possibility. But for filters that post-process the output of a template, well, the input can be pretty much anything you like. For this case, an HTML parser may be handy, if for no other reason than that it will allow you to post-process the output of HTMLTmpl.
Additionally, at the moment I’ve not done all I can to enable the “simple template” approach. In the case of templates and input filters, parsing the data into dictionaries would be helpful. I could make use of the variables defined for htmltmpl usage, but those only expose a subset of the data and are engineered around some limitations of htmltmpl itself. I’m inclined to simply pass the data through the feedparser and be done with it.
But these need to be filter options, and they need testcases. If anybody is interested, here’s where the unit tests go, and here’s the interfacing code for Genshi.
Of course, you will need to install Genshi first. But by now, you probably want to anyway, don’t you. :-)
It looks like I have some hacking to do this weekend. This looks like fun!
Posted by Scott Johnson at
Wow, I’m glad you took a look at this, Sam. I’d been meaning to put a bug in your ear about it myself. Many of the concepts Genshi borrowed from Kid were conceived developing a system similar to Venus. It was called “Splice”. I was trying to build an aggregator that brought in different feed formats, normalized them to some common format (I think Atom was still called Echo), and then used XSLT to transform them. I kept on running into problems with XSLT and got a nasty itch for a simple alternative.
The aggregator project was a horrible mess. Looking at Venus, I can see now many of the problems in my approach. But I’ve always thought Kid, and especially the refinements added in Genshi (XPath!!), was something people ought to have a look at.
I wrote a piece a while back that goes into more detail on why Kid’s design is the way it is. I think you’d find it interesting: [link]
The first half is pretty blathering so skip down about half way or so and there’s some meat.
Posted by Ryan Tomayko at
Looking at Venus, I can see now many of the problems in my approach.
You know, there are two different ways to parse that statement. Now, come to think about it, I won’t ask. :-)
Posted by Sam Ruby at
Sam Ruby: Genshi Filters for Venus
“In the remaining four templates, the translation from XSLT to Genshi markup is straightforward. And generally, the Genshi markup is both more compact and more powerful.”...
Excerpt from del.icio.us/tag/python at
Yes. That was poorly stated. I don’t think you’re a cynic, so parse it the first way :)
Posted by Ryan Tomayko at
Neat! Hadn’t seen Genshi before now, but I like Kid quite a bit. I rather suspect I will end up stealing some of Genshi’s concepts in the future.
Posted by Bob Aman at
People loving Genshi and HTML5 might like genshihtml5.
I don’t think the “HTML5 Template Language” is really usable yet, but HTML5 input and output surely are (and serialization now omits optional tags to save some bytes on the wire).
Posted by Thomas Broyer at
Re: Genshi Filters for Venus; Genshi + Trac-AtomPP
This news is excellent. One of my side projects (although, it was pretty low on my list) was to figure out how to use Genshi templates in Venus. I started out by copying the Django template code/unit tests and adapted them for Genshi. However, I...
Excerpt from Egocity: Developer's Edition at
Genshi Templates for Venus
Earlier, I explored Genshi Filters for Venus, and compared them to XSLT. Today, I implemented Genshi Templates. Let’s compare them to htmltmpl. Excerpt from genshi_fancy: <h3 py:if="entry.new_feed"><a href="$entry.link" title="$entry.s... [more]
Trackback from Sam Ruby at
This is why I mention things in an off-hand manner to Sam in IM.
Posted by Joe at