intertwingly

It’s just data

Data Flow


Les Orchard types in this comment without needing to worry about formatting.  It is stored here in blosxom format as well-formed XHTML.  The index page is regenerated with an updated comment count.  The HTTP request is then complete, but processing continues.  First the parent blog entry page for this comment is decached, to be regenerated the first time somebody clicks on it.
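To make the decache step concrete, here is a minimal sketch in Python.  It assumes the cache is simply one file per page; the names decache, serve, and render are hypothetical and are not taken from the actual weblog code.

```python
from pathlib import Path


def decache(cache_file: Path) -> None:
    """Drop the cached page; a missing file is not an error (Python 3.8+)."""
    cache_file.unlink(missing_ok=True)


def serve(cache_file: Path, render) -> str:
    """Return the cached page, regenerating it only if it was decached.

    `render` is a hypothetical callable that builds the page text.
    """
    if not cache_file.exists():
        cache_file.write_text(render(), encoding="utf-8")
    return cache_file.read_text(encoding="utf-8")
```

The point of the design is that posting a comment only pays for a file delete; the cost of rebuilding the page is deferred to the first reader who asks for it.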

Then the blog entry is atomized into here using a number of regular expressions that are unique to my weblog.  This one page is then indexed using swish-e, and then all non-stale queries received within the past 24 hours are either rerun against this index or applied using XPath against the Atom feed directly, to see if the results would have changed.  Any queries whose results would have changed are marked stale.
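The stale-marking pass can be sketched roughly as follows.  This is an illustration only: the shape of the cached query records and the run_query callable are assumptions, standing in for whatever swish-e or XPath lookup is actually used.

```python
from datetime import datetime, timedelta


def mark_stale(cached, run_query, now, window=timedelta(hours=24)):
    """Mark cached queries stale if rerunning them gives different results.

    cached: dict mapping query -> {"results": list, "stale": bool,
            "received": datetime}  (hypothetical record shape)
    run_query: hypothetical callable that runs a query against the
            freshly built single-page index and returns its results.
    Returns the list of queries newly marked stale.
    """
    newly_stale = []
    for query, entry in cached.items():
        # Skip queries that are already stale or older than the window.
        if entry["stale"] or now - entry["received"] > window:
            continue
        if run_query(query) != entry["results"]:
            entry["stale"] = True
            newly_stale.append(query)
    return newly_stale
```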

Then all of the Atom feeds are indexed, again using swish-e.  This means that you can now search for words that I personally don't use but that appear all too frequently in my comments.  Furthermore, you will get to see the full context in which they appear.

This entire process typically completes in 10 seconds, with 80% of the time being spent creating the full text index.

What does all this mean to you?  An example is personalized feeds.  In your choice of syndication formats.  With ETag, Last-Modified, and gzip support.
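For readers unfamiliar with what ETag, Last-Modified, and gzip support buy you, here is a rough sketch of the server side using only the Python standard library.  The respond function and its arguments are hypothetical, and the If-Modified-Since check is simplified to an exact string match rather than a real date comparison.

```python
import gzip
import hashlib
from email.utils import formatdate


def respond(body: bytes, mtime: float, request_headers: dict):
    """Return (status, headers, body) for a feed request.

    body: the uncompressed feed; mtime: its modification time in
    seconds since the epoch; request_headers: the client's headers.
    """
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    last_modified = formatdate(mtime, usegmt=True)
    headers = {"ETag": etag, "Last-Modified": last_modified}

    # If-None-Match takes precedence over If-Modified-Since when present.
    if "If-None-Match" in request_headers:
        unchanged = request_headers["If-None-Match"] == etag
    else:
        unchanged = request_headers.get("If-Modified-Since") == last_modified
    if unchanged:
        return 304, headers, b""  # client already has the current feed

    if "gzip" in request_headers.get("Accept-Encoding", ""):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return 200, headers, body
```

An aggregator that echoes back the ETag it last saw gets an empty 304 response when nothing has changed, so polling a personalized feed costs very little bandwidth.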

What does this feed contain?  Simply the last 20 comments received within the last 30 days against any blog entry that Les has commented on.
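Expressed as code, the selection rule looks something like the sketch below.  The comment records (dicts with entry, author, and posted keys) and the function name are assumptions made for illustration; the real feed is produced by the search queries described above.

```python
from datetime import datetime, timedelta


def personalized_feed(comments, author, now, limit=20, window_days=30):
    """Pick the most recent comments on entries `author` has commented on.

    comments: list of dicts with "entry", "author", and "posted"
    (datetime) keys -- a hypothetical record shape.
    """
    # Every entry the author has ever commented on.
    entries = {c["entry"] for c in comments if c["author"] == author}
    cutoff = now - timedelta(days=window_days)
    recent = [
        c for c in comments
        if c["entry"] in entries and c["posted"] >= cutoff
    ]
    # Newest first, capped at `limit` comments.
    recent.sort(key=lambda c: c["posted"], reverse=True)
    return recent[:limit]
```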