HTTP Caching and Zope
Isn't Caching Automatic?
There are several layers of caching built in to Zope:
These caches reduce the time taken to execute any Zope query; however, they are all ultimately limited. They cannot reduce the time for a single request below a hard limit that is made up of:
Zope itself cannot do anything about these overheads. They can only be eliminated by something outside Zope, and that means exploiting the HTTP caching mechanisms.
This HowTo is a brief guide to using caching in Zope. For the full story, see RFC 2616.
Why Isn't Caching Automatic?
Caching isn't a win-win solution. It risks making your Zope system more complicated than it needs to be, and clients may see outdated information. Without care, caching can waste more time than it saves (through calculating and checking the HTTP caching headers).
Think of caching as an optimisation technique. Like any other optimisation, it's only worth using after performance measurements show that there is a problem worth solving.
Who Provides the Caches?
The closer a cache is to the client, the more effective it is for that client. The first few techniques presented here target caches built in to browsers, or caches shared by a group of users.
The technique of using a cache close to the Zope server (possibly on the same machine) is discussed further below.
What can Benefit from Caching?
Frequently accessed documents
Any document that is accessed many more times than it changes is a good candidate for caching. Examples are:
Users browsing through the site should, ideally, have to download each of these items only once. This can be achieved by using DTML to set the Cache-Control header. At its simplest:
<dtml-call "RESPONSE.setHeader('Cache-Control','max-age=3600')">
The 3600 is the number of seconds for which a client (or intermediate cache) should not re-request the object. You need to tune the number for your application. In HTTP terms, the response is considered fresh until the specified time has elapsed. In general, an object will not be re-requested while the client (or intermediate cache) knows it is fresh.
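As a rough illustration of the freshness rule, the following Python sketch shows how a cache might decide whether a stored response is still fresh. The function names parse_max_age and is_fresh are hypothetical, and the sketch assumes max-age is the only freshness information available; real caches also consider Expires, Age and other directives.

```python
def parse_max_age(cache_control):
    """Return the max-age value (in seconds) from a Cache-Control
    header value, or None if no usable max-age directive is present."""
    for directive in cache_control.split(','):
        name, _, value = directive.strip().partition('=')
        if name.lower() == 'max-age' and value.isdigit():
            return int(value)
    return None


def is_fresh(cache_control, age_seconds):
    """A stored response is fresh while its age is below max-age."""
    max_age = parse_max_age(cache_control)
    return max_age is not None and age_seconds < max_age
```

With 'max-age=3600', a response stored ten seconds ago is fresh, while one stored a full hour ago is not and has to be re-requested or revalidated.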
The Expires header is an alternative to max-age if your documents expire at a specific time, for example daily news that is always updated at 9AM:
<dtml-call "RESPONSE.setHeader('Expires','Tue, 01 Aug 2000 09:00:00 GMT')">
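Rather than hard-coding the date, the Expires value could be computed on each request. The following is a minimal Python sketch, assuming a daily update at 09:00 GMT; the helper name next_expiry is hypothetical and is not part of Zope.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def next_expiry(now, hour=9):
    """Return an HTTP-date string for the next occurrence of
    hour:00 GMT strictly after `now` (a timezone-aware UTC datetime)."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's update has already happened; expire at tomorrow's.
        candidate += timedelta(days=1)
    return format_datetime(candidate, usegmt=True)
```

The returned string could then be passed to RESPONSE.setHeader('Expires', ...) from a Python method or external method.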
Using this with images on HTML pages is particularly effective because it allows the browser to display the image (using the cached data) as soon as the HTML is loaded.
Unfortunately, Zope's standard Image and File objects do not provide an easy way to set this header (as of version 2.2; this may be added in a later version). A cheat is to provide a DTML method that does just that, put it somewhere in the acquisition context, and name it in the Precondition field. This method will then be called before the File or Image data is returned.
Documents that are not secret
A browser will send authorization information for every request after the first one that needs it. This authorization information prevents subsequent responses from being stored in a shared cache. If your objects would be publicly visible anyway, then this can result in underusing a shared cache.
This can be avoided by adding a public directive to the Cache-Control header:
<dtml-call "RESPONSE.setHeader('Cache-Control','public, max-age=3600')">
If you are developing a reusable product (ZClass or Python product) then you may not know whether your object should be public. The following Python code can be used to determine whether an anonymous user can call a given method, and therefore whether the public directive is appropriate.
    class MyObject:  # base classes and the rest of the class are elided

        def mymethod(self, REQUEST, RESPONSE):
            """Something that gets used a lot by authenticated users,
            but probably isn't private.
            """
            # Only mark the response public if an anonymous user
            # would be allowed to call this method.
            if anonymous_access(self.REQUEST, self, self.mymethod__roles__):
                RESPONSE.setHeader('Cache-Control', 'max-age=60, public')
            else:
                RESPONSE.setHeader('Cache-Control', 'max-age=60')

    def anonymous_access(REQUEST, ob, roles):
        """Check whether an anonymous user would be able to access
        the given object, with the given roles.
        """
        # Walk up the acquisition chain, trying each object's
        # __allow_groups__ validator without credentials.
        while 1:
            if hasattr(ob, '__allow_groups__'):
                other_user = ob.__allow_groups__.validate(REQUEST, None, roles)
                if other_user is not None:
                    return 1
            if hasattr(ob, 'aq_parent'):
                ob = ob.aq_parent
            else:
                break
        # No validator accepted an anonymous request.
        return 0
If a document is sufficiently large that the overhead of transmitting its content is a problem, then using the If-Modified-Since header can be effective. This allows the server to return a small response (including a 304 status code) if the document is unchanged since the client last retrieved it.
This technique is used by the standard File and Image objects, and that code is easy to steal if you are developing a product that behaves in a similar way. Beware that this code has a flaw: The Last-Modified, Content-Type and Cache-Control headers really should be set in the response even when returning a 304 status code.
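The If-Modified-Since handling can be sketched in plain Python as follows. This is not the code from Zope's File and Image objects; the function name conditional_get and its arguments are hypothetical, and last_modified is assumed to be a time in seconds since the epoch. The sketch keeps the Last-Modified and other caching headers on the 304 response, as recommended above.

```python
from email.utils import formatdate, mktime_tz, parsedate_tz


def conditional_get(if_modified_since, last_modified, body, extra_headers):
    """Return (status, headers, body) for a GET request.

    if_modified_since is the raw request header value, or None.
    extra_headers holds headers such as Content-Type and Cache-Control
    that should accompany both 200 and 304 responses.
    """
    headers = dict(extra_headers)
    headers['Last-Modified'] = formatdate(int(last_modified), usegmt=True)
    if if_modified_since:
        parsed = parsedate_tz(if_modified_since)
        # An unparsable date is ignored and the full document is sent.
        if parsed is not None and int(last_modified) <= mktime_tz(parsed):
            return 304, headers, b''
    return 200, headers, body
```

Comparing whole seconds matters here, because HTTP dates carry no sub-second precision.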
The real complication with use of this technique is calculation of a last-modified time for each document. For objects that store their data in one ZODB object the last-modified time is easily obtained from the
For most complex documents the answer involves a kludgey process of determining which ZODB objects and files make up the document, and finding the maximum of all the individual modification times. This might be a problem to consider at an early stage when designing a cache-aware product.
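The "maximum of all the individual modification times" step might look like the sketch below. It assumes each component's modification time is already available as a number of seconds since the epoch (how to obtain it depends on the object type); the function name document_last_modified is hypothetical.

```python
def document_last_modified(component_mtimes):
    """Return the newest modification time among a document's
    components, or None if it cannot be determined reliably."""
    mtimes = list(component_mtimes)
    if not mtimes or any(m is None for m in mtimes):
        # If any component's time is unknown, the document's
        # last-modified time is unknown too.
        return None
    return max(mtimes)
```

Returning None when any component is unknown is a deliberate choice in this sketch: sending no Last-Modified header is safer than sending one that is too old, which could cause a cache to keep serving a stale document.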
Documents that are a little bit secret
It is possible to allow documents which normally require authorization to be stored in a shared cache, by including a cache-control header that forces the cache to revalidate the response every time (allowing Zope to check authorization). This is only a benefit when combined with If-Modified-Since processing. This is not a high-security solution, since it relies on the shared caches being well behaved.
This can be achieved using the following mix of cache-control directives:
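As an illustration only, one combination that RFC 2616 (sections 14.8 and 14.9.4) allows for this purpose is built by the Python sketch below. The choice of public, max-age=0 and must-revalidate is an assumption made for this example, and the function name revalidating_cache_control is hypothetical.

```python
def revalidating_cache_control():
    """Build a Cache-Control value that lets a shared cache store an
    authenticated response but revalidate it on every request."""
    directives = [
        'public',           # allow storage despite the Authorization header
        'max-age=0',        # the stored response is stale immediately
        'must-revalidate',  # a stale response must be revalidated before use
    ]
    return ', '.join(directives)
```

The resulting string would be passed to RESPONSE.setHeader('Cache-Control', ...) in the same way as the earlier examples.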
Caching is also useful for documents that are expensive to generate, since it is possible to avoid recalculating the document on subsequent requests. In this case caching is used to reduce server load, rather than to improve performance for individual users.
The Zope logic is no different from that already described; however, the same techniques are more effective when an external HTTP proxy cache is used close to the Zope server. This might be:
Note that an external cache is only useful if your expensive documents are not confidential. If they require authorization then the external cache will not store them.
When Caching is a Problem
Some caches can be configured to return documents that have not been proved to be fresh. If it is a problem for users of your application to ever see out-of-date information then add the
Note, however, that they might still see old pages by pressing the Back button.
HTTP 1.1 includes a mechanism to cover documents that depend on the value of a request header (perhaps user-agent, or accept-language). Unfortunately the
The presence of Cookie headers in requests or Set-Cookie headers in responses does not affect whether or not an HTTP reply can be cached. If the content of a document depends on a cookie then you should treat the document as described for 'Dynamic Content'.
Although a response that included a
If your documents are intended only for a single user, include the
If the documents contain sensitive information that should not even be kept in a private cache (where it might escape onto a backup tape, for example) then include