intertwingly

It’s just data

libxml2 screams


Inside the libxml2 Python distribution are a few tests.  One of them is named, simply enough, xpath.  The purpose of this test apparently is to parse a small file, evaluate an XPath expression against it, clean up, and repeat this in a loop one thousand times.  This runs in subsecond time on my machine.
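For anyone who wants to try the same shape of loop without the libxml2 bindings installed, here is a rough sketch. It is not the xpath test itself: it swaps in the standard library's xml.etree.ElementTree (which only supports a limited subset of XPath) as a stand-in, parses a string rather than a file, and has no explicit cleanup step because ElementTree leaves that to the garbage collector. The SAMPLE document and the names parse_and_query and run_loop are made up for illustration.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the small file the test parses.
SAMPLE = "<doc><foo>bar</foo><foo>baz</foo></doc>"


def parse_and_query(xml_text, path):
    """Parse the document from scratch and evaluate one path expression."""
    root = ET.fromstring(xml_text)
    return root.findall(path)


def run_loop(iterations=1000):
    """Repeat parse + query, returning the last match count and elapsed seconds."""
    start = time.perf_counter()
    count = 0
    for _ in range(iterations):
        count = len(parse_and_query(SAMPLE, ".//foo"))
    return count, time.perf_counter() - start


if __name__ == "__main__":
    matches, elapsed = run_loop()
    print(f"{matches} matches per pass, {elapsed:.3f}s for 1000 passes")
```

Timings from this stand-in say nothing about libxml2 itself; it only shows the parse, query, repeat pattern the test exercises.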

What this leads me to conclude is that libxml2 is optimized for parsing lots of small files.  So I tested the theory by running a more realistic query against all of the weblog entries on my site.  The result was still subsecond.

Sweet.

That does not mean that I shouldn't migrate to an XML database, but merely that I don't need to do so today.

What it does mean is that I can spend my time thinking about what I want my URL space to look like and designing the schema I choose to expose.  There are some obvious things: it makes sense to have all of the structure exposed instead of obscured, and to use a date format that can easily be collated.
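To make the collation point concrete, here is a small sketch. It assumes an ISO 8601 style timestamp (year first, fixed width, no time zone offset), which is my assumption for illustration and not a format decision stated above. With that layout, plain string sorting already gives chronological order. The sample timestamps and the helper name sorts_chronologically are invented.

```python
from datetime import datetime

# Invented sample timestamps in an assumed ISO 8601 style layout.
STAMPS = [
    "2003-11-02T09:15:00",
    "2003-01-20T18:30:00",
    "2002-12-31T23:59:59",
]


def sorts_chronologically(stamps):
    """True if sorting the raw strings matches sorting by the parsed dates."""
    by_text = sorted(stamps)
    by_date = sorted(stamps, key=datetime.fromisoformat)
    return by_text == by_date


if __name__ == "__main__":
    print(sorted(STAMPS))
    print(sorts_chronologically(STAMPS))
```

A format like "Nov 2, 2003" would fail the same check, which is the practical reason to prefer a collatable one.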

As far as the URL space goes, I want to make sure that the results are readily cacheable.  Thinking about the usage pattern, what I am likely to find is:

Given this usage pattern, it would seem that my existing cache exactly fits this requirement.  Sweet.

I'll probably play with this for a few days before I deploy it publicly.