Caching with mod_proxy
Caching can greatly decrease Zope's load, but unfortunately there is not much documentation about how to set it up.
In this How-To I put together what I found out. Everything was done on OpenBSD but should be transferable to other flavors of Unix or Linux. I would be very thankful to get comments and hear experiences. You can read more about caching in another How-To: HTTP Caching and Zope. And there even is another How-To about configuring Squid as an accelerator for Zope instead of using mod_proxy.
There are (at least) two ways to do caching: Apache's module mod_proxy and Squid. The official Squid can so far only tunnel https, and mod_proxy is convenient because it's one of the default configurations for running Zope with https. So I stuck with mod_proxy. For information on how to set up mod_proxy as a "frontend" to ZServer please refer to these How-Tos:
I configured my server following these How-Tos. To set up caching I put the following lines in the main server configuration section:
    CacheRoot "/var/www/proxy"
    CacheSize 5
    CacheGcInterval 4
    CacheMaxExpire 24
    CacheLastModifiedFactor 0.1
    CacheDefaultExpire 1
    CacheForceCompletion 100

The CacheForceCompletion 100 seems to be important to keep Apache from caching broken images and other incomplete stuff.
Then I created the proxy directory and made it readable and writable by the user the server-CHILDREN are running as. This is "www" for OpenBSD and I think it's "nobody" for Linux:
    cd /var/www
    mkdir proxy
    chown www proxy
To actually enable caching I still had to put the following line in each virtual server section:
Now Apache knows that it CAN cache stuff. But to make it actually DO it I had to make some changes to the standard_html_header. Here is a sample:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <dtml-call "RESPONSE.setHeader('Expires', _.DateTime(_.DateTime().timeTime() + 3600).toZone('GMT').rfc822())">
    <dtml-call "RESPONSE.setHeader('Last-Modified', bobobase_modification_time().toZone('GMT').rfc822())">
    <HTML>
    <HEAD><TITLE><!--#var title_or_id--></TITLE></HEAD>
    <BODY>
There are two dtml-calls in it: one to set the Expires header and one to set the Last-Modified header. From what I have tried so far, the Last-Modified header is what's important for Apache to calculate how long to cache the page. For this calculation Apache uses the "CacheLastModifiedFactor 0.1" set earlier. The mod_proxy documentation didn't help me much in understanding the exact calculation, so if anybody knows the details: please send me an email. The Expires header seems to be important for the browser to know when to reload the page.
At this point my Athlon 650MHz with 256MB RAM served more than 500 requests per second, whereas before it was 67 rps, and 47 rps using pcgi.
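To get a feeling for the kind of values the two dtml-calls put into the headers, here is a small sketch in plain Python, outside of Zope. The function name caching_headers and its parameters are made up for illustration, and the exact string that Zope's rfc822() produces may differ slightly from what the standard library prints here.

```python
from email.utils import formatdate


def caching_headers(last_modified_ts, now_ts, lifetime_seconds=3600):
    """Build Expires and Last-Modified values as GMT date strings.

    Hypothetical helper: timestamps are seconds since the epoch, and the
    3600 second lifetime mirrors the "+ 3600" in the DTML sample.
    """
    return {
        # Expires = now + lifetime, like the first dtml-call
        "Expires": formatdate(now_ts + lifetime_seconds, usegmt=True),
        # Last-Modified = modification time of the object, like the second
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
    }


if __name__ == "__main__":
    import time

    now = time.time()
    # Pretend the page was last changed two days ago.
    for name, value in caching_headers(now - 2 * 24 * 3600, now).items():
        print(f"{name}: {value}")
```

Both values are expressed in GMT, which is what the toZone('GMT') calls in the header are there for.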
O.k., what else? I don't want everything to be cached so I put together another header that should tell everybody "Don't cache me!":
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <dtml-call "RESPONSE.setHeader('Expires', '-1')">
    <dtml-call "RESPONSE.setHeader('Cache-Control', 'no-cache')">
    <dtml-call "RESPONSE.setHeader('Pragma', 'no-cache')">
    <HTML>
    <HEAD><TITLE><!--#var title_or_id--></TITLE></HEAD>
    <BODY>
That's it. The lack of documentation almost killed me and I feel that I still haven't completely understood caching with mod_proxy. So if you can contribute any information or have suggestions or wishes how to improve this How-To please let me know. Here is my email: firstname.lastname@example.org
-- Ragnar Beer
P.S.: Thanks to Digital Creations! Zope is great!
Here is some information I meanwhile got from the file "proxy_cache.c" in the Apache distribution. It will later be integrated into this document:
    so we now have the expiry date (= from the expires header if there is one)
    if no expiry date then
        if lastmod (= Is there a LastModified header?)
            expiry date = now + min((date - lastmod) * factor, maxexpire)
        else
            expire date = now + defaultexpire
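The rule from proxy_cache.c can be tried out with a few numbers. The following is only a sketch of the pseudocode above in Python, not Apache's actual code. The function name proxy_expiry is made up, all times are plain seconds, and I assume that CacheMaxExpire and CacheDefaultExpire are given in hours in httpd.conf, so the values 24 and 1 from my configuration are converted to seconds.

```python
HOUR = 3600  # seconds


def proxy_expiry(now, date, expires=None, last_modified=None,
                 factor=0.1, max_expire=24 * HOUR, default_expire=1 * HOUR):
    """Return the time (in seconds) at which a cached page would expire.

    Hypothetical helper following the comment in proxy_cache.c:
    an explicit expiry date wins, otherwise Last-Modified is used,
    otherwise the default applies.
    """
    if expires is not None:
        # An Expires header was sent, so there is nothing to calculate.
        return expires
    if last_modified is not None:
        # The older the page, the longer it is kept, up to max_expire.
        return now + min((date - last_modified) * factor, max_expire)
    return now + default_expire


if __name__ == "__main__":
    now = 1_000_000
    # Page changed 10 hours ago: kept for roughly 0.1 * 10 hours.
    print((proxy_expiry(now, now, last_modified=now - 10 * HOUR) - now) / HOUR)
    # Page changed 30 days ago: 0.1 * 720 hours is capped at CacheMaxExpire.
    print((proxy_expiry(now, now, last_modified=now - 30 * 24 * HOUR) - now) / HOUR)
```

With the factor 0.1, a page that was changed ten hours ago would be kept for about one hour, and a page that has not been touched for a month runs into the 24 hour limit.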
There is a security problem in that if you set up your server just like this, it will be a public proxy that people can use to cover their traces when accessing other websites. To disable this behavior you could e.g. add a LocationMatch directive to your httpd.conf that denies every request that doesn't start with a slash.
    <LocationMatch "^[^/]">
        Deny from all
    </LocationMatch>
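To illustrate what the pattern ^[^/] catches, here is a small Python sketch that applies the same regular expression to a few request targets. It only demonstrates the pattern itself. The function name is_denied is made up, and the sketch assumes, as the fix implies, that Apache matches a proxy request against its full URL (e.g. http://...), while a normal request is matched against a path that starts with a slash.

```python
import re

# Same pattern as in the LocationMatch directive:
# the first character is anything but a slash.
NOT_STARTING_WITH_SLASH = re.compile(r"^[^/]")


def is_denied(request_target):
    """Return True if the LocationMatch pattern would match this target."""
    return NOT_STARTING_WITH_SLASH.search(request_target) is not None


if __name__ == "__main__":
    # Hypothetical request targets, not taken from a real access_log.
    for target in ("/index_html", "/images/logo.gif", "http://www.example.com/"):
        print(target, "denied" if is_denied(target) else "allowed")
```

Requests for your own pages start with a slash and are left alone, while proxy-style requests for other sites start with a scheme name and are denied.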
To try out the problem and see the effect of the fix, configure your browser to use www.mysite.org port 80 as a proxy (assuming the name of your site is www.mysite.org). Then try to access a page on another website from your browser and look at your access_log.