LRU caching weblogs

CornerHost provides servers more than capable of keeping up with the load of dynamically generating every page on this site on every request, with more than enough cycles to spare for other bloggers on the same machine. Despite this, caching provides better responsiveness on the most frequently accessed pages and insurance against the unlikely event that I get slashdotted.

My expectations were like Phil's, but the reality is subtly different. Upon generating a new post, index.rss is often regenerated faster than I can ping weblogs.com. Then come index.html, comments.rss, index.rdf, and index.rss2, generally in roughly that order. What's more interesting to me is what happens to the other pages. By running the following script every night, I can keep my cache small:

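 # Remove files last accessed more than a day ago.
 find . -type f -atime +0 -exec rm -f {} +
 # Remove files last changed more than a week ago.
 find . -type f -ctime +6 -exec rm -f {} +
 # Remove directories left empty by the steps above.
 find . -type d -empty -exec rmdir {} +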

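This deletes all files that haven't been accessed in over a day and all files that were created over a week ago (ensuring that even minor template changes get propagated), then removes any directories left empty as a result. Strictly speaking, -ctime tests the last status change rather than creation, but for cache files that are written once and never modified the two coincide.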

Not surprisingly, most of the files that remain are html files, despite all the various alternative formats I support. Even so, the cumulative effect of all the various bots running throughout the day is to touch only 20 to 50% of my blog entries. This works out to be approximately 1 to 2 a minute, though the reality is considerably more bursty than that. Only 2 to 3% of all my entries (currently this number is 28 out of 1207) are touched more than once in a day, many of them by actual humans, typically by following inbound links or Google queries.

All of this data could have been obtained by analyzing my Apache logs, but it is much more readily apparent by looking at my cache.
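
As a rough illustration of what "looking at my cache" can mean, here is a minimal sketch that counts cached files by extension. It assumes the cache is an ordinary directory tree with one file per cached response; the function name cache_census is made up for this example.

```shell
#!/bin/sh
# cache_census DIR (hypothetical helper): print how many cached files
# exist per file extension, most common extension first.
cache_census() {
    # Only consider regular files whose name contains a dot.
    find "$1" -type f -name '*.*' |
        # Strip everything up to and including the last dot.
        sed 's/.*\.//' |
        # Count identical extensions and sort by count, descending.
        sort | uniq -c | sort -rn
}
```

Running `cache_census .` from the top of the cache prints one line per extension, each with a count in front of it.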


I remember thinking, when reading over Phil's post, "What this guy needs is a Squid httpd accelerator," but I got distracted and forgot to post.

So, I'll mention it here. If you're wanting the flexibility of "fried" publishing, why not simply use Squid to proxy frequently requested pages, and serve them out of memory?

kellan

Posted by Kellan at

Kellan - at the moment, I appear to have all the advantages of Apache (e.g., ETags, high speed serving of "static" pages, htaccess and htpasswd, etc.) with all the advantages of dynamic pages (e.g., posting a comment on any blog entry is instantly visible).

The times I have checked, the machine is typically 98% idle.

Posted by Sam Ruby at

I am using a technique similar to Sam's instead of Squid for a few reasons:

- Cache invalidation is easy. All I need to do is delete some files.

- This technique is a lot easier to get going than installing and configuring Squid. It only requires a couple of Apache configuration directives and a few extra lines of code in the CGI script.

- My goal was to eliminate the latency due to invoking a CGI script. The performance of Apache's default handler is more than adequate for my little site.

- It's possible that I will move my site to a host where I cannot install Squid.
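
To make the "few extra lines of code in the CGI script" concrete, here is a minimal sketch of the write-through idea. This is not the actual code from either site. CACHE_ROOT, render_page, serve_and_cache and invalidate are all hypothetical names, and the Apache side (checking for the cached file before falling back to the CGI script) is not shown.

```shell
#!/bin/sh
# Hypothetical cache location; a real site would point this at a
# directory that the web server serves directly.
CACHE_ROOT=${CACHE_ROOT:-/tmp/blog-cache}

# Stand-in for the real page generator (an assumption for this sketch).
render_page() {
    printf '<html><body>Entry %s</body></html>\n' "$1"
}

# serve_and_cache PATH: generate the page for PATH, store a copy under
# CACHE_ROOT, and print the page body. A real CGI script would also
# emit HTTP headers and reject paths containing "..".
serve_and_cache() {
    target="$CACHE_ROOT/$1"
    mkdir -p "$(dirname "$target")"
    # Write to a temporary name first so a partly written page is
    # never visible under the final name.
    tmp="$target.$$.tmp"
    render_page "$1" > "$tmp"
    mv "$tmp" "$target"
    cat "$target"
}

# invalidate PATH: drop the cached copy so the next request
# regenerates it.
invalidate() {
    rm -f "$CACHE_ROOT/$1"
}
```

With something like this in place, invalidating a page after a new comment is a single `invalidate` call, which is the "delete some files" step from the first point above.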

Posted by Gary Burd at

Fried AxKit

This is one of the nice things about AxKit <http://www.axkit.org>.
The default is a "fried" approach since you get a lot of dynamic control
along with AxKit's caching.

Of course, because of the way caching is done, you can't (easily)
analyze the site stats in the way that Sam has done with his fried
content.

Emailed by Mark A. Hershberger at




