CornerHost provides servers
more than capable of keeping up with the load of dynamically
generating every page on this site on every request, with more than
enough cycles to spare for other
bloggers on the same machine. Despite this, caching provides
better responsiveness on the most frequently accessed pages and
insurance against the unlikely event that I get slashdotted.
My expectations were like Phil's, but the reality is subtly different. Upon generating a new post, index.rss is often regenerated faster than I can ping weblogs.com. Then come index.html, comments.rss, index.rdf, and index.rss2, generally in roughly that order. What's more interesting to me is what happens to the other pages. By running the following script every night, I can keep my cache small:
find . -type f -atime 1 | xargs rm -f
find . -type f -ctime 7 | xargs rm -f
find . -type d -empty | xargs rmdir
This deletes all files that haven't been accessed in over a day and
all files that were created over a week ago (ensuring that even minor
template changes get propagated), then removes any directories that
have been left empty as a result.
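The same nightly cleanup can be sketched a little more defensively. This is only an illustration, not the script I actually run: the function name `prune_cache` and its directory argument are made up for the example, and it assumes GNU find, where `-atime +0` means "last accessed at least 24 hours ago" and `-ctime +6` means "status last changed at least 7 days ago". Note that ctime is the inode change time, which is only an approximation of when a cached file was created.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): prune_cache DIR
# Assumes GNU find semantics for -atime/-ctime, plus -mindepth and -empty.
prune_cache() {
    cache_dir=${1:-.}

    # Files last accessed 24 hours ago or more.
    # "-exec ... {} +" passes names directly to rm, so spaces are safe.
    find "$cache_dir" -type f -atime +0 -exec rm -f {} +

    # Files whose status last changed 7 or more days ago.
    find "$cache_dir" -type f -ctime +6 -exec rm -f {} +

    # Directories left empty, deepest first so that a parent emptied by
    # removing its children is caught in the same pass. -mindepth 1 keeps
    # the cache root itself.
    find "$cache_dir" -mindepth 1 -depth -type d -empty -exec rmdir {} \;
}
```

Called as `prune_cache /path/to/cache` from cron, this does nothing (rather than complaining) on a night when there is nothing to delete, and it still catches stale files if a nightly run is missed.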
Not surprisingly, most of the files that remain are html files,
despite all the various alternative formats I support. Even so, the
total cumulative effect of all the various bots running throughout
the day is to only touch 20 to 50% of my blog entries. This works
out to be approximately 1 to 2 a minute, though the reality is
considerably more bursty than that. Only 2 to 3% of all my entries
(currently this number is 28 out of 1207) are touched more than
once in a day, many of them by actual humans, typically by
following inbound links or Google queries.
All of this data could have been obtained by analyzing my Apache
logs, but it is much more readily apparent by looking at my
cache.
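As a rough sketch of what "looking at my cache" can mean in practice, the function below tallies cached files by extension. The name `cache_summary` is hypothetical, and it assumes file names without embedded newlines; it is one way to get such a count, not necessarily how the numbers above were produced.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): cache_summary DIR
# Prints one line per file extension with the number of cached files,
# most common extension first.
cache_summary() {
    cache_dir=${1:-.}

    # -name '*.*' keeps only files whose base name contains a dot, so the
    # greedy sed pattern strips everything up to the last dot in the name.
    find "$cache_dir" -type f -name '*.*' \
        | sed 's/.*\.//' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Run right after the nightly cleanup, the output shows at a glance which formats are actually being requested.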
I remember thinking, when reading over Phil's post, "What this guy needs is a Squid httpd accelerator," but I got distracted and forgot to post.
So, I'll mention it here. If you're wanting the flexibility of "fried" publishing, why not simply use Squid to proxy frequently requested pages, and serve them out of memory?
Kellan - at the moment, I appear to have all the advantages of Apache (e.g., ETags, high speed serving of "static" pages, htaccess and htpasswd, etc.) with all the advantages of dynamic pages (e.g., a comment posted on any blog entry is instantly visible).
The times I have checked, the machine is typically 98% idle.
I am using a technique similar to Sam's instead of Squid for a few reasons:
- Cache invalidation is easy. All I need to do is delete some files.
- This technique is a lot easier to get going than installing and configuring Squid. It only requires a couple of Apache configuration directives and a few extra lines of code in the CGI script.
- My goal was to eliminate the latency due to invoking a CGI script. The performance of Apache's default handler is more than adequate for my little site.
- It's possible that I will move my site to a host where I cannot install Squid.
This is one of the nice things about AxKit <http://www.axkit.org>. The default is a "fried" approach, since you get a lot of dynamic control along with AxKit's caching.
Of course, because of the way caching is done, you can't (easily) analyze the site stats in the way that Sam has done with his fried content.