Report into Solaris Zope Performance - The Postscript

Last updated 3rd September 2002

The original report (March 2002) painted a rather depressing picture for "poor little rich kids" obliged to run Zope on Solaris. The following is offered by way of an update. The residual crunch issue seems to be what happens when you put a database adapter in the mix: we're not yet sure whether the various tweaks we've applied cure Zope applications that make heavy use of back-end Oracle databases.

For advice on how to optimise Python on multi-processor boxes in general, see here.

Various snippets culled from mailing list archives

-----------------------------------
> On Tue, 28 May 2002, Matthew T. Kromer wrote:
>
> I do *not* recommend running Zope on multiprocessor machines without an
> ability to restrict Zope to execution on a single CPU.
>
> The reason for this is that the Python Global Interpreter Lock is shared
> inside a Zope process.  However, threads in Python are backed by
> underlying OS threads.  Thus, Zope will create multiple threads, and
> each thread is likely to be assigned to a different CPU by the OS
> scheduler.  However, of the CPUs dispatching any given Zope process,
> all but one will then have to wait, attempting to acquire the GIL; this
> introduces significant latency into Python and thus into Zope.
>
> Linux has no native mechanism for processor binding.  In fact, there is
> a CPU dispatch mask for processes, but there is no facility to set the
> mask that I know of.  Solaris can use a system command like 'pbind' to
> bind a process to a particular CPU.
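
(By way of illustration: binding an already-running Zope to a CPU on Solaris looks something like the following, where the PID 4242 is hypothetical.)

    pbind -b 0 4242     # bind process 4242 to processor 0
    pbind -q 4242       # query the binding to confirm it took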

-----------------------------------
Tim Hoffman wrote:

>That's what Solaris is for ;-)  Processor affinity, processor sets, fair
>share scheduler, etc....
>
>Rgds
>
>Tim

The downside of Solaris is that Python executes at more or less the same speed on all CPUs, primarily determined by clock speed.  Thus, a 2 GHz Pentium 4 or Athlon blows away expensive SPARC CPUs for running Zope.

--
Matt Kromer
Zope Corporation  http://www.zope.com/
-----------------------------------
[email protected] wrote:

>Hey Matt, while we are on the subject, I just wanted to make sure I had
>something right (and figured you might know):
>
>I know that affinity helps a single Zope instance on an SMP box, but how
>about 2?  With Python's global interpreter lock (or whatever makes
>Python/Zope best suited to bind to a given CPU) - am I still going to run
>into problems if I am using the same Python to run 2 Zope instances in
>different software/instance homes, with all respective processes for each
>given Zope bound (via affinity) to respective CPUs?  Or is this okay?  I
>have some existing 2P AthlonMP and Xeon boxes, and I'd like to get as much
>zoom out of their horsepower as possible...
>
>Any thoughts are greatly appreciated.
>
>Thanks,
>Sean

As long as *each* Python process (and thus each GIL) is processor-bound, you'll be OK.  Any Python that isn't bound with pbind (on Solaris) or doesn't have its processor dispatch mask set (on Linux) on a multiprocessor machine will end up slowing down when the machine is partially idle.

A machine that's going full throttle isn't as bad, curiously enough: because the other CPUs are busy doing real work, the GIL doesn't have as much opportunity to get shuffled between CPUs.  On an MP box it's very important to set sys.setcheckinterval() to a fairly large number; I recommend pystones / 50 or so.  You can make the number higher still without much concern.

There ought to be a practical upper limit for sys.setcheckinterval() (because eventually you'll enter a routine that yields the lock anyway), but I don't know what that is.  I want to say the minimum bytecode path through Zope is about 24,000 bytecodes, but that's just a WAG; that would represent the "insanely high" outer bound for sys.setcheckinterval().

Why mention the check interval?  On a busy enough system, increasing sys.setcheckinterval() reduces how often the GIL is released and, as such, makes it possible to run more than one Zope on a multiprocessor machine *without* processor binding.

I found, when I did tests about 2 years ago (wow, long time ago!), that I got diminishing returns above pystones / 50, but below that I got good results.  You may discover that you can get performance close enough to processor binding that you choose not to bind.

--
Matt Kromer
Zope Corporation  http://www.zope.com/
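
To turn the pystones / 50 rule of thumb into code: run something like the following sketch early in Zope's startup, before the worker threads exist. The test.pystone module ships with Python's standard test suite; the rest is illustrative.

    import sys
    from test import pystone

    # Benchmark this machine: pystones() returns (elapsed time, pystones/sec).
    benchtime, stones = pystone.pystones()

    # Matt Kromer's rule of thumb: check interval = pystones / 50.
    # A larger interval means the GIL is released (and fought over) less often.
    sys.setcheckinterval(int(stones / 50))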
-----------------------------------
Tim Hoffman <[email protected]> wrote:
Processor sets have been in Solaris since around 2.6 and are very
useful if you have multiple CPUs.

In a multi-CPU environment you can create a set of one or more CPUs, and
nothing else will run on them except processes assigned to the set.

You can even force things like the NFS kernel threads to occupy a single
CPU (if that is something you really want to do ;-) by starting nfsd
in a processor set: the nfsd process is the schedulable entity for the
kernel threads, and if it runs in a processor set that has a single CPU,
then the kernel threads will get scheduled to run there.

The big difference is that with pbind you bind a process to a CPU, but
other stuff can and will run on that CPU. If you use processor sets, you
can exclude all other processes from running on the CPU, other than those
processes that have been assigned to the set.

Processor sets have really replaced pbind. You would only use pbind if
you want to force processor affinity and limit the task to a single CPU,
but let other tasks use that CPU.

Try docs.sun.com and search for psrset.

Solaris 9 introduces a resource manager and even finer-grained scheduling
of resources, using resource pools and projects and new scheduling
classes such as fixed priority and fair share scheduling.

People really shouldn't be afraid of running Python/Zope on Solaris,
especially in multi-CPU environments, as long as they understand how
Python works: you actually have much better resource control than on
pretty well any other OS out there. But, as they say, use the right
tool for the job, and make informed decisions ;-)
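
(By way of illustration, the psrset incantation for the scheme Tim describes might look like this; the CPU and process IDs are hypothetical, and the commands need root.)

    psrset -c 1          # create a processor set containing CPU 1; prints the new set id
    psrset -b 1 4242     # bind process 4242 (e.g. a Zope instance) to set 1
    psrset -q 4242       # confirm which set the process is bound to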
-----------------------------------

What we've tried, apparently successfully, in Bristol

  1. Use pbind

     We inserted the following line into the Zope startup script:

        pbind -b 0 $$ > /dev/null

     This confines the script and any subsequent children to processor '0'.
     

  2. Use nice

     We are also running with a nice of -1 (see the example after this list).
     
  3. Tweaked Zope's cache parameters

     We got an apparent two- to three-fold increase in Zope's performance when we increased the cache size from the default of 400 objects to 10,000. It is hard to quantify, but CPU usage went from a range of 21-28% down to 7-14%. (A sketch of the change follows this list.)
     
  7. Build Python using Sun C compiler

  8. We also recompiled Python and most of Zope using the Sun C compiler which gave us approximately 10% gain over gcc.
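
For the nice tweak, something like the following should do it (illustrative: the exact start command varies by installation, and negative nice increments require root):

    nice -n -1 ./start      # launch Zope's start script at nice -1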
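
For the cache tweak, the cache size is normally raised through the ZMI (Control_Panel -> Database Management -> Cache Parameters). A rough programmatic sketch, assuming a custom_zodb.py and a ZODB3 DB constructor that accepts a cache_size argument (parameter names varied between releases), might look like this:

    # custom_zodb.py -- illustrative only; check your ZODB release
    import ZODB.FileStorage, ZODB.DB

    Storage = ZODB.FileStorage.FileStorage('var/Data.fs')
    # raise the per-connection object cache from the 400-object default
    DB = ZODB.DB.DB(Storage, cache_size=10000)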
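
And the Sun C compiler build was along these lines (a sketch: exact flags depend on your Python release and Workshop version):

    # from the Python source tree, with the Sun Workshop cc on the PATH
    CC=cc ./configure --without-gcc
    make
    make test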
     

Thoughts of Chairman Matt (Hamilton not Kromer)

"....  I have yet to see a Zope site that needs multi-processors.  I think that there are many things (like caching) that can be done to improve the performance of a site before you need to consider multiple processors. "