
Report into Solaris Zope Performance

Matt Hamilton, Paul Browning and Tony McDonald
March 2002


Please see this Postscript.

Summary

Problems with the performance of Zope under Solaris have been reported in various forums for at least a year. The problem is not universal - some sites report that Zope runs well under Solaris. To date, however, a clear identification of the issues and a well-publicised fix have not been forthcoming.

The University of Bristol was first exposed to the problem in April 2001. A year on, the issue is now threatening the viability of a growing number of Zope-hosted applications at the institution. Moreover, the problem does nothing by way of reassuring decision-makers that Python offers a serious and practical alternative to the oncoming Java Juggernaut.

One of the challenges for the Zope community in pinning down the problem has been the number of variables involved. Another has been the absence of a generally accepted test of "Am I threading and, if so, How well am I threading?".

Initial suggestions for an empirical test focused on whether Python on Solaris (built with pthreads) was threading properly (as it appeared that Zope requests were only being served one at a time). It was reported that threading appeared to work on a small Sun Ultra 5 but not on a larger Sun Enterprise. This investigation confirms that multi-processor machines have trouble with the test and block all other threads from running whilst the test runs, but that uni-processor machines work without problems.

The initial test showed Solaris creating and running each thread in turn, whilst on Linux, FreeBSD and NT, the threads were created and run in an intermingled, non-deterministic way - as would be expected if threading was working correctly. It was initially held that threads were not working correctly with Python on Solaris. This is not actually correct - it turns out to be a characteristic of the Solaris threading implementation.

Solaris has a different way of handling threads compared to Linux, FreeBSD and Windows - a two level implementation in which multiple threads are multiplexed onto fewer Light Weight Processes. This gives the advantage that fewer system resources are needed to manage a large number of threads, as only enough LWPs are created as are needed to ensure the threads do not block. This structure also allows better use of multiple processors as separate LWPs can run on separate processors.

Under Solaris it is possible to create bound threads. These are threads which are mapped one-to-one with LWPs, i.e. each thread runs on its own LWP and in essence is scheduled by the kernel and not the user-level thread manager. This has the advantage that each Zope thread will be independent and never cause other threads to block.

Solaris 8 offers a clean way of doing one-to-one mapping of threads by linking to an alternative threading library at runtime. The same result can be achieved with earlier versions of Solaris by recompiling the Python binary with an extra two lines of code to change the scheduling scope of the threads. Python 2.1.2 now includes this code.

This report documents tests aimed at assessing the performance of the latest Python code with and without LWP on both uni- and multi-processor machines. The results indicate that under simulated loading of a 'real' site, the Solaris platform can perform adequately. Additionally, the tests suggest that Zope on a multi-processor Solaris server can indeed benefit from the additional processors.

The report also investigates in a cursory way the effects of adding a database adapter to the equation. A test was carried out to compare the effect of changing the threading model on a site doing queries against an Oracle database using DCOracle2. It was found that queries that involved writes on the database were forced to be completed in serial (this is unsurprising), whilst read requests could be carried out in parallel. In addition, ZSQL Methods have the ability to cache the rows they fetch from a database for a given query. The results show that caching of SQL queries can produce a significant performance boost.


Introduction

This report documents an investigation carried out into the performance of Zope on Solaris. The investigation was intended to understand better the cause of several performance problems with Zope (or more accurately Python) on Solaris and, if possible, to recommend solutions.

This investigation was funded under the VIOLET Project, University of Bristol and carried out by Matt Hamilton of Netsight Internet Solutions.

Environment

The following machines were used in the testing:

Name       Hardware             Processors                 OS            Pystones
Serena     Sun Blade 1000       2 x UltraSparc III 750MHz  Solaris 8     8,400
Venus      Sun Ultra 60         1 x UltraSparc II 450MHz   Solaris 8     4,830
FSA        Sun Enterprise 250   2 x UltraSparc II 400MHz   Solaris 7     4,570
Uberdark   Dell Poweredge 2300  2 x Pentium II 400MHz      FreeBSD 4.5   3,870
Lib-srvr3  Viglen Genie 2+      1 x Pentium II 400MHz      Windows NT 4  4,820

Python 1.5.2 and 2.1 were used with Zope 2.3.2 and Zope 2.4.3 respectively. Both Python and Zope were compiled from source with gcc. No attempt was made to compare Python binaries compiled with Sun's C compiler (reports indicate that it can provide up to a 10% increase in pystone performance).

Background

At the start of 1999 Paul Browning wrote a Zope application to help students of the University of Bristol find accommodation in the city. The site queried an Oracle database for information maintained by the University's Accommodation Office. The original application used Zope 1 and ZODBCDA to make a database connection to Oracle via SQLnet.

The application comes under heavy load for one or two days each year during the spring term when the new information on accommodation available in the forthcoming academic year is first released.

The application first went live on Lib-srvr3 on 22nd April 1999 at 10:00. By 10:15 it was in serious trouble. During the first hour Paul had to re-start the Zope server three times. At peak load it was serving about 2 or 3 hits per second. It then ran happily for the next year.

In 2000, having upgraded to Zope 2 but with Lib-srvr3 otherwise unchanged, the new information went live at 10:00 a.m. on 16th March. In the first 20 minutes the site had 3171 hits (just under 3 per second). The server might have been slow at times but it never melted down. Over the next 24 hours the site had 28827 hits. There were 117 hits midnight - 01.00, 56 hits 01.00-02:00 and no hits 02:31 - 06:56 (it is good to know students sleep sometimes).

By 2001 the University was evaluating Zope as a centrally supported service and Zope 2 had been installed on FSA complete with ZOracleDA-DCOracle as a database adapter. Paul also had ZOracleDA-DCOracle installed and working on Lib-srvr3 and had modified the application to work with this. The application was migrated to FSA and he waited for the big day without a care in the world - the mighty Solaris box would deal effortlessly with the expected peak in demand and provide an exceptional new level of service for students ...

On 7 March 2001 at 10:00 information about rented accommodation in the private sector for the next academic session went live - and the Zope component of the central web server (including that providing a gateway to electronic journals) ground to a halt minutes later.

Some hasty footwork saw the application migrated back to Lib-srvr3 and by 11:20 things were back on-line. From 11:20-13:00 the Lib-srvr3 server took 6953 hits on the search pages, giving an average rate of just over one database query per second. Over the next 21 hours the server took 17960 hits giving a total of about 25000 over the first day. The server ran without interruption until the application was returned to FSA later in the year where it seems able to cope with the typical background load.

The Accommodation application has not stood still. Originally Accommodation Office staff maintained the Oracle database via Microsoft Access. During 2000 Phil Harris was engaged to write a set of Web-based administration screens to sidestep our reliance on Access. Paul was also keen to incorporate a "shopping basket" feature for students browsing the various properties available and Phil developed a prototype for this using SQL Session (which worked fine against a local Access database but was too slow to be useable against Oracle).

In 2002 Tim Hicks re-developed the administration screens (in the light of additional functionality requested by the Accommodation Office) and also implemented a "shopping basket" using Core Session Tracking. This went live at 10:00 on 13th February - but on Lib-srvr3 and not FSA as the Solaris performance issues remained unresolved. As in previous years a feeding frenzy then followed. Here are some statistics for the Web site:

Period  Hits    Percentage over 24 hours
10-11   27933   19.5
11-12   19883   13.9
12-16   51487   36.0
17-10   43747   30.6
Total   143050

Is this busy by University of Bristol standards? On a typical day in February 2002 the central Apache-hosted Web server had something over 500000 hits.

The Accommodation application is now but one of many Zope applications in use at the University of Bristol. Inspired by the pioneering work of UWCM and Newcastle, Zope is being used to build bespoke Virtual and Managed Learning Environments (see the VIOLET Project). A major concern, therefore, is the performance of "mission critical" applications on our current standard hardware platform - namely, Solaris on Sparc. Tony McDonald has also reported problems with Zope and Solaris at Newcastle University and has repeatedly raised these issues on the Zope mailing lists but without a satisfactory resolution of the problem.

The motivation for this investigation, therefore, was to provide some more data points which might help in seeing these outstanding issues being resolved.

Original Findings

The original tests focused on whether Python on Solaris (built with pthreads) was threading properly as it appeared that Zope requests were only being served one at a time. Tony suggested two benchmarking tools:

  • Threader - a Python Script run within Zope that used ZopeFind to do multiple traverses across the ZODB.
  • test_thread.py - A test script found in the Python distribution that creates multiple threads that run for a random amount of time and then exit.

Tony found that the Threader code appeared to work on a small Sun Ultra 5 but not on a larger Sun Enterprise. Similar reports on Usenet and our own findings showed the multi-processor machines had trouble with the test and blocked all other threads from running whilst the test ran, but that uni-processor machines worked without problems.

The test_thread.py test showed Solaris creating and running each thread in turn, whilst on Linux, FreeBSD and NT, the threads were created and run in an intermingled, non-deterministic way - as would be expected if threading was working correctly. It was believed that this meant that threads were not working correctly with Python on Solaris. This is not actually correct - it is a characteristic of the Solaris threading implementation.

Whilst Linux, FreeBSD and NT context switch quite often and do time-slicing amongst threads, Solaris does not. Threads on Solaris will not yield unless they cannot continue or are explicitly told to yield. By adding time.sleep(0.1) calls in the loops of the thread bodies to force the threads to yield it was shown that the Solaris threads were indeed working as they should.
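The effect of the added sleep calls can be illustrated with a minimal sketch. It is not the test_thread.py script itself: it assumes a modern Python with the threading module (the tests in this report used the older interpreters listed above), and the names worker and run_demo are hypothetical. Each thread records a step and then sleeps, which gives the scheduler a point at which another thread can run.

```python
import threading
import time


def worker(name, steps, log, lock):
    """Do `steps` units of pretend work, sleeping after each one."""
    for step in range(steps):
        with lock:
            log.append((name, step))
        # Sleeping gives up the CPU, so another runnable thread can be
        # scheduled. This plays the role of the time.sleep(0.1) calls
        # described in the text (shortened here to keep the demo quick).
        time.sleep(0.01)


def run_demo(num_threads=3, steps=5):
    """Start the workers, wait for them all, and return the event log."""
    log = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=("t%d" % i, steps, log, lock))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for name, step in run_demo():
        print(name, step)
```

If the entries for different threads appear intermingled in the printed log, the threads are being switched at the sleep calls. The exact ordering is up to the scheduler and will vary from run to run.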

Solaris Thread Implementation - LWP

Solaris has a different way of handling threads compared to Linux, FreeBSD and Windows - a two level implementation in which multiple threads are multiplexed onto fewer Light Weight Processes. This gives the advantage that fewer system resources are needed to manage a large number of threads, as only enough LWPs are created as are needed to ensure the threads do not block. This structure also allows better use of multiple processors as separate LWPs can run on separate processors.

In applying the Threader test it was found that Zope running on multi-processor Solaris servers would block and no other requests would be served whilst the threader process was running. Strangely uni-processor Solaris machines did not exhibit this behavior and worked as normal.

On further investigation it was found that Solaris does not do any timeslicing of threads within an LWP. This means that once a thread has started then it will not be interrupted until it has finished. Should it block, the OS would create another LWP and move any threads waiting to be run onto this LWP. As shown above in practice this does not always happen on a multi-processor machine and a thread blocked on IO will hold up all other threads in the same LWP (in practice, all Zope threads).

One explanation for this could be that on a uni-processor machine the LWP is swapped on and off the processor as other processes require CPU time. When the LWP is put back on the CPU the user-level thread manager picks a thread ready to run from the runnable queue. These interruptions prevent an IO bound thread from blocking the entire LWP.

Under Solaris it is possible to create bound threads. These are threads which are mapped one-to-one with LWPs, i.e. each thread runs on its own LWP and in essence is scheduled by the kernel and not the user-level thread manager. This has the advantage that each Zope thread will be independent and never cause other threads to block. The downside is that it may consume slightly more system resources and in theory take slightly longer to switch between threads - it is more costly to switch processes (LWPs) than it is to switch threads within the same LWP. In reality the overhead should be negligible for the small number of threads Zope uses (typically four).

Solaris 8 offers a clean way of doing one-to-one mapping of threads by linking to an alternative threading library at runtime (see man threads). The same result can be achieved with earlier versions of Solaris by recompiling the Python binary with an extra two lines of code to change the scheduling scope of the threads. Python 2.1.2 now includes this code. A description of these modifications can be found in the appendix. The revised code will detect whether the host system supports it at build time and will automatically enable it if supported.

Reports on Usenet indicate that this approach can provide a 10-20% performance increase in BEA Weblogic (a Java-based application server). Sun themselves describe this approach to threading for Java applications and indicate it might give performance increases on Solaris.

'Real Life' Load Test

In order to properly evaluate the effects of various changes in setup and to compare performance, Zope 2.4.3 was installed on Serena and Venus and a copy of a site designed by Netsight was imported into each of the Zope instances. Several months of log files were then taken from Netsight's web server and run through a sed script to change the host name and replayed against Serena and Uberdark to assess the performance differences. The tests were performed using http_load invoked as:

./http_load -p X -s 10 url_file

where X is the number of parallel requests and url_file is the file of URLs to fetch from the server.
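As an illustration of how a URL file of this kind might be produced, here is a rough Python sketch. The report used a sed script, which is not reproduced here; this is a substitute technique, not the original. It assumes the access log is in Common Log Format, keeps only GET requests, and uses a hypothetical host name; log_to_urls is an illustrative helper.

```python
import re

# Matches the request part of a Common Log Format line,
# e.g. "GET /some/path HTTP/1.0" (assumed log format).
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def log_to_urls(log_lines, host):
    """Return one URL per logged GET request, pointed at `host`."""
    urls = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            urls.append("http://%s%s" % (host, match.group(1)))
    return urls


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [10/Mar/2002:10:00:00 +0000] "GET /index_html HTTP/1.0" 200 1234',
        '10.0.0.2 - - [10/Mar/2002:10:00:01 +0000] "POST /login HTTP/1.0" 302 0',
    ]
    # "zope.example.org:8080" is a placeholder, not one of the test machines.
    for url in log_to_urls(sample, "zope.example.org:8080"):
        print(url)
```

Writing the returned URLs one per line to a file gives the kind of input http_load expects as url_file.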

Http_load attempts to make as many requests as possible within a given time (10 seconds in this case) and reports back the number of fetches per second. The tests were run 5 times each against each host, the highest and lowest results discarded, and the mean of the remaining results taken. On the Solaris machines tests were done using both the standard threading implementation and the alternative implementation (one-to-one mapping of threads to LWPs); the latter results are shown on the graph as "-lwp".
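The averaging step described above (discard the highest and lowest of the runs, then take the mean of the rest) can be written as a short Python function. This is a sketch only: the name trimmed_mean is hypothetical and the figures in the example are invented, not measurements from this investigation.

```python
def trimmed_mean(results):
    """Mean of the results after dropping one highest and one lowest value."""
    if len(results) < 3:
        raise ValueError("need at least three results to trim both ends")
    ordered = sorted(results)
    kept = ordered[1:-1]  # drop the lowest and the highest
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Five invented fetches/sec figures for one host at one concurrency level.
    runs = [10.0, 12.0, 11.0, 30.0, 2.0]
    print(trimmed_mean(runs))
```

Dropping the extremes makes the reported figure less sensitive to a single unusually fast or slow run, for example one disturbed by other traffic on the network.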

In each case Zope was started with the default setting of 4 threads.

The investigation was constrained by geography and time. Uberdark is on the same local network as the workstation used to simulate the client requests, whilst Serena and Venus are on another network 15ms away from the testing host. This may account for the relatively high request rates of Uberdark at low concurrency compared to Serena and Venus.

The results above indicate that under simulated loading of a 'real' site, the Solaris platform performs fairly well. However, Uberdark with a pystone of 3,870 achieves almost double the requests/sec of Venus with a higher pystone of 4,830. This leads to the conclusion that raw Python speed is not the only factor that affects Zope's performance. FreeBSD's threads are only user-level and hence Python on a multi-processor FreeBSD server does not take advantage of the multiple processors, which results in flat scaling. Serena, having a pystone result 2.2 times greater than Uberdark, achieves over 3 times the requests/sec of Uberdark. This, along with the increased throughput at higher levels of concurrency, suggests that Zope on a multi-processor Solaris server does indeed benefit from the additional processors.
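The pystone ratios quoted in this comparison can be checked against the figures in the Environment table. The snippet below is only a worked calculation; the dictionary and the function name pystone_ratio are illustrative.

```python
# Pystone figures taken from the Environment table.
PYSTONES = {"Serena": 8400, "Venus": 4830, "Uberdark": 3870}


def pystone_ratio(machine, baseline):
    """How many times faster `machine` is than `baseline` on pystone."""
    return PYSTONES[machine] / PYSTONES[baseline]


if __name__ == "__main__":
    print(round(pystone_ratio("Serena", "Uberdark"), 1))
    print(round(pystone_ratio("Venus", "Uberdark"), 2))
```

If throughput tracked raw Python speed alone, Serena would be expected to manage roughly 2.2 times Uberdark's request rate and Venus roughly 1.25 times. The measured results depart from both expectations, which is the basis for the argument above.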

The -lwp tests show that the alternative threading model does not provide any benefit in this application, and if anything in this instance degrades the server performance slightly.

Oracle Test

A test was carried out to compare the effect of changing the threading model on a site doing queries against an Oracle database using DCOracle2. It was found that queries that involved writes on the database were forced to be completed in serial (this is unsurprising), whilst read requests could be carried out in parallel.

The graph below shows the number of requests/sec achieved by Lib-srvr3 and Serena serving the original Accommodation application (search01) as written by Paul Browning. Both servers are using DCOracle2 and connecting to the same database server (hprod). Both normal threading and the alternative threading implementation (shown as -lwp) were tested on Serena. The tests were carried out using the same methodology as the previous tests (http_load and a list of actual urls).

In addition, ZSQL Methods have the ability to cache the rows they fetch from a database for a given query. This can be found under the 'Advanced' tab of the ZSQL Method in the Zope Management Interface. We set the cache timeout to 20 seconds for each of the ZSQL Methods used in the site.
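The general idea of caching query results for a fixed time can be sketched in Python as follows. This is not Zope's ZSQL Method implementation, whose internals are not described in this report; the class TimedQueryCache and its parameters are hypothetical, and the 20 second default simply mirrors the timeout used in the test.

```python
import time


class TimedQueryCache:
    """Cache the rows returned for a query and reuse them for `max_age` seconds."""

    def __init__(self, run_query, max_age=20.0, clock=time.time):
        self._run_query = run_query  # callable that really hits the database
        self._max_age = max_age
        self._clock = clock          # injectable so the cache can be tested
        self._entries = {}           # query text -> (time fetched, rows)

    def fetch(self, query):
        now = self._clock()
        cached = self._entries.get(query)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]         # still fresh: no database call
        rows = self._run_query(query)
        self._entries[query] = (now, rows)
        return rows


if __name__ == "__main__":
    def fake_database(query):
        print("querying database:", query)
        return [("row",)]

    cache = TimedQueryCache(fake_database)
    cache.fetch("select * from properties")
    cache.fetch("select * from properties")  # served from the cache
```

With a cache of this kind, identical searches arriving within the timeout cost one database round trip instead of many, at the price of results being up to 20 seconds stale.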

The results show that caching of SQL queries produces quite a significant performance boost. Interestingly, Serena benefits from caching much more than Lib-srvr3, possibly because the caching of the queries reduces the need to call the database driver which could be a contention point for a multi-processor system.

The threading implementation seems to have little effect on the overall performance.

Conclusions

The main findings of this investigation are:

  • Python (and therefore Zope) on Solaris Sparc can be made to work adequately, but better price/performance is available via other operating systems and platforms.

  • If you do use Solaris then use Python 2.1.2 or greater and if possible Solaris 8 or greater.

  • There are many other factors (such as SQL Query caching) that will have far more of an impact on the performance of a Zope application than the platform alone.

The actions Solaris sites should consider are summarised in the table below:

                Python < 2.1.2                                   Python >= 2.1.2
Processor       Uni   Multi                                      Uni   Multi
Solaris < 8     OK    Either: Upgrade to Python 2.1.2            OK    OK (provided you build Python from
                      Or: Recompile with modifications                 source on the target server)
                      described in Appendix
Solaris >= 8          Either: Upgrade to Python 2.1.2
                      Or: Recompile with modifications
                      described in Appendix
                      Or: Put /usr/lib/lwp into LD_LIBRARY_PATH
                      before starting Zope

Python bundled with Zope binary installations*

Zope <= 2.3.3   Python 1.5.2
Zope <= 2.4.3   Python 2.1
Zope >= 2.5.0   Python 2.1.2
*You are recommended to always build Python from source under Solaris

Recommendations for further investigation are:

  • Audit the state of Solaris system patches on the various machines used in the study and, if warranted, apply the latest patches that may impact on threading performance. Re-run the tests to assess whether performance has been affected.

  • Experiment with building Python using Sun's C compiler.

  • Explore the performance of adapters for other databases.

References

Solaris Threads and BEA performance
Java and Solaris Threading
Scheduling in the user threads library

Appendices

Threader

The code below should be put in a DTML Method. It causes lots of traversals across the ZODB. Try running it in one window, whilst at the same time trying to navigate the Zope Management Interface with another window.

<dtml-var standard_html_header> 
<dtml-var ZopeTime> 
  <dtml-in "[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]"> 
    <dtml-in "ZopeFind(this(), '', '', '', '', '' , '', '', '', 1)" 
       sort=bobobase_modification_time reverse> 
       * 
    </dtml-in> 
    <br> 
  </dtml-in> 
<dtml-var ZopeTime> 
<dtml-var standard_html_footer> 

test_thread.py

test_thread.py is installed with any Python distribution and can be found in the 'test' directory of a Python install. For Python 2.1 on Unix this is something like:

/usr/local/lib/python2.1/test/test_thread.py

Building Python

Python versions 2.1.2 or greater automatically detect Solaris and by default use the alternative threading implementation. However, previous versions running on Solaris 8 can be forced to use the alternative implementation by linking to a different thread library. To do so, simply set the environment variable LD_LIBRARY_PATH to /usr/lib/lwp, e.g. with /bin/sh:

LD_LIBRARY_PATH=/usr/lib/lwp; export LD_LIBRARY_PATH
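Where Zope is started from a Python wrapper rather than directly from a shell, the same setting could be applied along the following lines. This is a sketch under assumptions: env_with_lwp is a hypothetical helper, and it places /usr/lib/lwp ahead of any existing LD_LIBRARY_PATH entries rather than replacing them.

```python
import os

LWP_LIB_DIR = "/usr/lib/lwp"  # alternative thread library directory on Solaris 8


def env_with_lwp(base_env=None):
    """Return a copy of the environment with /usr/lib/lwp first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Keep any other entries, but avoid listing the LWP directory twice.
    others = [p for p in existing.split(":") if p and p != LWP_LIB_DIR]
    env["LD_LIBRARY_PATH"] = ":".join([LWP_LIB_DIR] + others)
    return env


if __name__ == "__main__":
    print(env_with_lwp()["LD_LIBRARY_PATH"])
```

The returned dictionary can be passed as the env argument of subprocess.Popen when launching whatever command starts Zope on the system in question, so that the setting only applies to the Zope process.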

Python versions prior to 2.1.2 on Solaris versions prior to 8 require the Python binary to be recompiled with the following line in Python/thread_pthread.h (around line 106) changed from:

pthread_create(&thread1, NULL, (void *) _noop, &dummy); 

to the following lines:

pthread_attr_t attr; 
pthread_attr_init(&attr); 
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM); 
pthread_create(&thread1, &attr, (void *) _noop, &dummy); 

You then need to run 'make' again from the root of the source to re-build Python.