Report into Solaris Zope Performance - The Postscript
Last updated 3rd September 2002
This report (March 2002) painted a rather depressing picture for "poor little rich kids" obliged to run Zope on Solaris. The following is offered by way of an update. The main remaining problem seems to be what happens when you put a database adapter in the mix. We're not yet sure whether the various tweaks we've applied cure Zope applications that make heavy use of back-end Oracle databases.
For advice on how to optimise Python on multi-processor boxes in general see here.
> On Tue, 28 May 2002, Matthew T. Kromer wrote:
> I do *not* recommend running Zope on multiprocessor machines without an
> ability to restrict Zope to execution on a single CPU.
> The reason for this is that the Python Global Interpreter Lock is shared
> inside a Zope process. However, threads in Python are backed by
> underlying OS threads. Thus, Zope will create multiple threads, and
> each thread is likely to be assigned to a different CPU by the OS
> scheduler. However, all CPUs but one which are dispatching any given
> Zope process will have to then wait and attempt to acquire the GIL; this
> process introduces significant latency into Python and thus into Zope.
> Linux has no native mechanism for processor binding. In fact, there is
> a CPU dispatch mask for processes, but there is no facility to set the
> mask that I know of. Solaris can use a system command like 'pbind' to
> bind a process to a particular CPU.
>Thats what Solaris is for ;-) Processor affinity, processor sets, fair
The downside of Solaris is that Python executes more or less at the same speed on all CPUs, primarily determined by clock speed. Thus, a 2 GHz Pentium IV or Athlon blows away expensive SPARC CPUs for running Zope.
>Hey Matt, while we are on the subject, I just wanted to make sure I
As long as *each* Python process (and thus each GIL) is processor bound, you'll be OK. Any Python process that isn't pbound (on Solaris) or doesn't have its processor dispatch mask set (on Linux) on a multiprocessor machine will end up slowing down when the machine is partially idle.
A machine that's going full-throttle isn't as bad, curiously enough -- because the other CPUs are busy doing real work, the GIL doesn't have as much opportunity to get shuffled between CPUs. On an MP box it's very important to set sys.setcheckinterval() up to a fairly large number; I recommend pystones / 50 or so. You can make the number higher still without much concern.
There ought to be a practical upper limit for sys.setcheckinterval() (because eventually you'll enter a routine that yields the lock anyway) but I don't know what that is. I want to say the minimum bytecode path through Zope is about 24,000 bytecodes, but that's just a WAG -- but that would represent the "insanely high" outer bound for sys.setcheckinterval().
Why mention the check interval? On a busy enough system, increasing the value passed to sys.setcheckinterval() reduces how often the GIL is released, and as such, helps so that you can run more than one Zope on a multiprocessor machine *without* processor binding.
I found, when I did tests about 2 years ago (wow, long time ago!) that I got diminishing returns above pystones/50, but below that I got good results. You may discover that you can get performance close enough to processor binding that you choose not to bind.
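The "pystones / 50" rule of thumb above can be sketched in Python as follows. This is only an illustration: the function names and the example benchmark figure are hypothetical, and the pystones number has to be measured on your own machine. Note also that sys.setcheckinterval() was removed in Python 3.9 (later interpreters use the time-based sys.setswitchinterval() instead), so the sketch applies the setting only when the call exists.

```python
import sys


def recommended_check_interval(pystones_per_second, divisor=50):
    """Return the 'pystones / 50' rule of thumb as a whole number of bytecodes."""
    return max(1, int(pystones_per_second // divisor))


def apply_check_interval(interval):
    """Apply the interval if this interpreter still has sys.setcheckinterval().

    Returns True when the setting was applied, False otherwise.
    """
    setter = getattr(sys, "setcheckinterval", None)
    if setter is None:
        # Newer interpreters have no bytecode-count check interval.
        return False
    setter(interval)
    return True


# Hypothetical benchmark figure, not a measurement from this report.
EXAMPLE_PYSTONES = 6000
interval = recommended_check_interval(EXAMPLE_PYSTONES)
applied = apply_check_interval(interval)
```

In a Zope of this era the call would go somewhere that runs once at startup; where exactly is left to the reader, since the report does not say.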
In a multicpu environment you can create a set of 1 or more cpu's and
You can even force things like the NFS kernel threads to occupy a single
The big difference is with pbind you bind a process to a CPU, but other
Processor sets have really replaced pbind. You would only use pbind
try docs.sun.com and search for psrset.
Solaris 9 introduces a Resource manager and even finer grain scheduling
People really shouldn't be afraid of running python/zope on solaris,
Inserted the following line into the Zope startup script:
pbind -b 0 $$ > /dev/null
This confines the script and any subsequent children to processor '0'.
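For anyone who prefers to do the binding from Python rather than from the shell startup script, the same pbind call could be sketched like this. The helper names are hypothetical, and the sketch only builds the command unless dry_run is switched off, since pbind exists only on Solaris.

```python
import os
import subprocess


def pbind_command(cpu_id, pid):
    """Build the argument list equivalent to 'pbind -b <cpu_id> <pid>'."""
    return ["pbind", "-b", str(cpu_id), str(pid)]


def bind_to_cpu(cpu_id, pid=None, dry_run=True):
    """Bind a process (default: the current one) to a single CPU.

    With dry_run=True the command is only returned, not executed.
    """
    if pid is None:
        pid = os.getpid()
    argv = pbind_command(cpu_id, pid)
    if not dry_run:
        # Raises CalledProcessError if pbind exits with a non-zero status.
        subprocess.check_call(argv)
    return argv


# Show what would be run to bind the current process to processor 0.
print(" ".join(bind_to_cpu(0)))
```

Calling bind_to_cpu(0, dry_run=False) early in startup would be the rough equivalent of the shell line above.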
We are also running with a nice of -1.
We got an apparent two- to three-fold increase in the performance of Zope when we increased the Cache Parameters from the default of 400 objects to 10,000 objects. It is hard to quantify, but CPU usage went from a range of 21-28% to 7-14%.
We also recompiled Python and most of Zope using the Sun C compiler, which gave us a gain of approximately 10% over gcc.