Prevent And Debug Memory Leaks

Prevent And Solve Memory Leak Problems

As you probably know, Zope is an extensible framework written almost entirely in Python. Certain coding patterns in Python can lead to memory "leaks", in which RAM is consumed and never given back to the operating system. Continued exercise of the leaking code eventually leads to an out-of-memory condition. Unless the operating system imposes a "hard" limit on the resource consumption of the process, or the Python/Zope process is restarted frequently, the out-of-memory condition can become a serious problem. It will almost certainly cause Zope to fail, and it is quite likely to cause other services on the same box to fail as well. This hurts both the perceived and the actual stability of your codebase.

I'm sorry to say that these errors are usually caused by... well... you, the programmer. Poor programming, or programming without knowing the bounds of the system, can cause memory leaks. Because Zope is a "long-running" process (as opposed to a "one-shot" CGI script), poor programming practices cause more damage than you may be used to if you're an old hand at CGI coding.

In this how-to, we'll explore how to code defensively so that you don't introduce memory leaks into your code in the first place. We'll also explore how to track down and fix these types of errors in your Zope programs in case they do occur.

A Garden-Variety Memory Leak

One example of a common source of memory leaks is an operation which rapidly increases the size of a global list or dictionary without any sort of bound or any sort of "clear" operation at a checkpoint.

As an example, here's a bit of code that leaks like a sieve:

     l = []
     def leak(self, item):
         l.append(item)

Note that every time the leak method is called, it appends an item to the global l list. What happens when it's called a hundred thousand times? A hundred thousand items are added to the l list. If this method is part of your code, it's called frequently, and there is no point in the code at which the list is cleared, you will eventually run out of memory. It's not a question of if, it's a question of when, and the "when" depends on how large the objects added to l are and how frequently the leak method is called.

How do you solve this? Well... don't do this. Don't append to global lists or dictionaries without eventually clearing them. As a matter of fact, it's wise to use globals very sparingly. If you need to use a global, reconsider your application design. If you've reconsidered your application design and you still need to use a global, make sure your code cleans up after itself by removing items from the global (or by deleting the global itself).
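
One way to keep a global from growing forever is to give it a hard bound and an explicit "clear" operation. The following is a minimal sketch in modern Python, not something taken from Zope itself; the names MAX_ITEMS, recent_items, remember and checkpoint are made up for illustration, and the cap of 1000 is an arbitrary assumption.

```python
from collections import deque

# Hypothetical cap; pick a limit that suits your application.
MAX_ITEMS = 1000

# A deque created with maxlen silently discards its oldest entry
# whenever a new one is appended to a full deque.
recent_items = deque(maxlen=MAX_ITEMS)


def remember(item):
    """Record an item without letting the global grow past MAX_ITEMS."""
    recent_items.append(item)


def checkpoint():
    """Empty the global at a known point, e.g. at the end of a batch."""
    recent_items.clear()


# Simulate the "hundred thousand calls" scenario from the text.
for i in range(100000):
    remember(i)

size_after_burst = len(recent_items)       # bounded by MAX_ITEMS
checkpoint()
size_after_checkpoint = len(recent_items)  # emptied by the checkpoint
```

Whether a bound, a periodic clear, or no global at all is the right answer depends on what the data is for; the point is only that something in the code must eventually let the items go.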

This is a common sort of memory leak, but it's by no means the only kind. Another more insidious memory leak is caused by a nonobvious leaking of references.

What Is An Object Reference?

Central to the Python object model is the concept of references.

Creating a Python object via Zope code is possible and happens all the time. For example, this code creates a Zope DateTime object and stuffs it into the REQUEST:

     <dtml-call "REQUEST.set('date', ZopeTime())">

We can also pass an object we create along to other functions and methods. For example, we can pass the DateTime object we just created to a parse_date function if we make this call somewhere else in the same set of code:

     <dtml-var "parse_date(REQUEST['date'])">

When Python first creates an object, it increments a reference count related to that object by one. A fancy way of saying the same thing is that when an object is first created, it has a "reference count" of one. As you pass the object around, or assign it in different ways as belonging to other objects, its reference count gets incremented.

So for instance, if the parse_date function we're passing our newly-created Zope object to assigns it to another object, its reference count goes up by one. Let's see this in action. We'll assume that parse_date is an external method in this case:

     def parse_date(self, date):
         self.date = date

There, we just created another reference to the DateTime object we created by assigning it to the name self.date.

When an object's reference count dwindles to zero, Python frees the memory consumed by the state of the object. This is termed a "reference counting garbage collection strategy". It's perfectly normal for reference counts to grow. After all, it's completely normal to assign an object to a name in a function. But a problem surfaces when references to an object begin to grow out of control, beyond the expectation of the programmer.
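
To watch a reference count move, CPython offers sys.getrefcount. The sketch below is written for modern CPython and uses two stand-in classes, Date and Holder, which are assumptions made for illustration (they are not Zope's DateTime or a real external method container). Because getrefcount itself holds a temporary reference while it runs, only the differences between readings are meaningful, not the absolute numbers.

```python
import sys


class Date:
    """Hypothetical stand-in for the DateTime object in the text."""


class Holder:
    """Hypothetical stand-in for the object that parse_date assigns to."""


def parse_date(holder, date):
    # Assigning the object to an attribute creates one more reference.
    holder.date = date


date = Date()
holder = Holder()

base = sys.getrefcount(date)          # baseline reading
parse_date(holder, date)
after_assign = sys.getrefcount(date)  # one higher than the baseline

del holder.date                       # drop the extra reference again
after_delete = sys.getrefcount(date)  # back to the baseline
```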

Each reference requires a finite amount of memory, and an object cannot be freed while any reference to it remains. If an object's reference count keeps growing, or simply never returns to zero, its memory is never released. Such a phenomenon is often caused by "circular" references. In Python versions before 2.0 (and Python 2.0+ without garbage collection turned on), circular references are a problem and can be the source of memory leaks.

What Is A Circular Reference?

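A circular reference is a chain of references that leads from an object back to itself. The simplest case is a list that contains itself:

     >>> a = [1,2,3]
     >>> a.append(a)
     >>> a
     [1, 2, 3, [...]]

If an object is part of such a cycle, its reference count can never drop to zero, so under pure reference counting it keeps hanging around in memory even after the rest of the program can no longer reach it.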
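
A cycle between two objects can be observed with the standard gc and weakref modules. This is a sketch for modern CPython, with a hypothetical Node class; automatic cycle collection is switched off first to imitate an interpreter where only reference counting is at work.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at a partner."""

    def __init__(self):
        self.partner = None


def make_cycle():
    a = Node()
    b = Node()
    a.partner = b   # a refers to b
    b.partner = a   # b refers back to a: a circular reference
    # Return only a weak reference, so nothing outside the cycle
    # keeps the two nodes alive once this function returns.
    return weakref.ref(a)


gc.disable()                          # imitate "no cycle collector"
probe = make_cycle()
alive_before = probe() is not None    # the cycle keeps both nodes alive

found = gc.collect()                  # run the cycle collector by hand
alive_after = probe() is not None     # the nodes have now been freed
gc.enable()
```

With the collector disabled, the two nodes would have stayed in memory for the life of the process, which is exactly the leak described above.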

Tracking Down A "Live" Memory Leak

  • Isolate the server.
  • Use a load-testing tool like "ab" or OpenSTA.
  • Use exclusion via binary search in conjunction with manage_debug and the load tester.
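
The steps above boil down to "exercise the suspect code many times and see which kinds of objects pile up". The sketch below shows that idea with the standard gc module in modern Python. It is not how manage_debug works and it is not a replacement for a real load tester; count_by_type, growth, Item and suspect_request are hypothetical names, and suspect_request deliberately leaks so there is something to find.

```python
import gc
from collections import Counter


def count_by_type():
    """Count the objects the collector tracks, grouped by type name."""
    gc.collect()  # clear out collectable garbage so it isn't counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=5):
    """Return the types whose object counts grew the most."""
    # Counter subtraction keeps only positive differences.
    return (after - before).most_common(top)


class Item:
    """Hypothetical object created by the code under test."""


_leaky = []


def suspect_request():
    # Stand-in for one request; it leaks one Item per call on purpose.
    _leaky.append(Item())


before = count_by_type()
for _ in range(1000):        # the "load test"
    suspect_request()
after = count_by_type()

report = growth(before, after)
```

A type whose count grows in step with the number of requests is a good candidate for the binary-search exclusion described in the list.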

Test Your Code Before You Ship It

  • Use "ab" or (better) OpenSTA.

Dos and Don'ts

DO use global variables sparingly.

DO use REQUEST.set and REQUEST.__setitem__ sparingly.

DO use SESSION.set and SESSION.__setitem__ sparingly (if you've installed a session manager).

DO test your code for leaks before you release it using a load-testing tool.

DON'T stuff things into a global variable without later clearing or deleting the variable.

DON'T stuff things that are acquisition-wrapped into REQUEST or into a session, or into any other object that is transient.

DON'T intentionally create circular references unless you understand exactly what you're doing. If you have an application where you need to do so, instead try the new "weak references" feature of Python 2.1. Additionally, turn on garbage collection under Python 2.0+.
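
As an illustration of the weak-reference advice, here is a minimal sketch using the standard weakref module in modern CPython. The Parent and Child classes are hypothetical. The child needs to find its parent, which would normally create a cycle; holding the parent weakly avoids it, so plain reference counting can free the parent.

```python
import gc
import weakref


class Child:
    def __init__(self, parent):
        # A weak reference does not raise the parent's reference count.
        self._parent = weakref.ref(parent)

    @property
    def parent(self):
        # Returns None once the parent has been freed.
        return self._parent()


class Parent:
    def __init__(self):
        self.children = []

    def add_child(self):
        child = Child(self)
        self.children.append(child)   # strong reference parent -> child
        return child


gc.disable()                  # show that no cycle collector is needed
parent = Parent()
child = parent.add_child()
had_parent = child.parent is parent

del parent                    # last strong reference to the parent
parent_gone = child.parent is None
gc.enable()
```

The immediate cleanup relies on CPython's reference counting; the trade-off is that the child must be prepared for its parent to be None.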

DON'T write "tricky" code. There's usually a better, easier, clearer way to do it.

See also Sam Rushing's "Tracking Down Memory Leaks In Python".