STABILITY HOWTO

This document describes the actions you can take to identify and diagnose problems with your Zope site, in particular how to recognize when a crash has occurred and how to report it.

AUDIENCE

This document is for all Zope administrators: people who have installed Zope and run it on their servers.

AUTHOR

Matthew T. Kromer (matt@zope.com)

REVISION

1.3

DATE

April 1, 2002

OVERVIEW

Zope can crash for a number of reasons. Most of them are very esoteric; the average Zope administrator will be very frustrated trying to do blind diagnosis on Zope.

Generally, a crash occurs when Zope or Python attempts to execute a machine instruction that is not legitimate for the current state of the processor. Examples include dereferencing a NULL pointer and overwriting memory that should not be written to.

Most user code cannot directly cause a crash, but it can trigger bugs in the underlying implementation.

HOW TO KNOW ZOPE IS CRASHING

Zope is usually two processes working in combination. One process is a controller, and is responsible for restarting the other process, which is where the Zope application work is performed. If the normal work process crashes, it should be automatically restarted by the controlling process.

On the control panel, there are several pieces of important diagnostic information. These are:

  • Zope Version
  • Python Version
  • System Platform
  • SOFTWARE_HOME
  • INSTANCE_HOME
  • CLIENT_HOME
  • Process Id
  • Running For

In particular, observing the Running For value will identify the "uptime" of the current Zope process. If this time is much less than what an administrator knows it should be, then a stability issue is causing Zope to restart.
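As a rough illustration of this check, the sketch below counts restarts of the work process from a series of Process Id values noted at intervals. How the values are collected (for example, by reading the control panel periodically) is left out, and the function name count_restarts is hypothetical, not part of Zope.

```python
def count_restarts(pid_samples):
    """Count how many times the process ID changed between samples."""
    restarts = 0
    previous = None
    for pid in pid_samples:
        # A different PID than last time means the work process
        # was restarted somewhere between the two observations.
        if previous is not None and pid != previous:
            restarts += 1
        previous = pid
    return restarts


# Example: the Process Id was noted five times and changed twice.
samples = [4121, 4121, 4388, 4388, 4502]
print(count_restarts(samples))
```

A restart that happens and is followed by another restart between two samples is only counted once, so the result is a lower bound. The Running For value remains the more direct indicator.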

WHAT THE MOST COMMON CRASHES ARE

There were three recent causes of crashes in the Zope 2.4 and 2.5 series of code, all of which have been addressed (at the time of this writing) by Zope 2.5.1b1 and Python 2.1.2. Those causes were:

  • Incorrect byte code compilation for run-time compiled scripts (such as PythonScripts)
  • A reversed argument pair inside Python's main loop for a particular error condition
  • A reference counting bug in the accelerated C access control machinery in Zope 2.5

Each of these problems is currently resolved to the best of our knowledge.

Python 2.1.2 contains all known fixes to the Python run-time system, and also contains checks to identify known problems with the Python compiler package which were present in earlier versions of Python.

Zope 2.5.1b1 contains all fixes to Zope, including an updated Zope compiler package, and a fix to the security machinery. Zope 2.5+ should be run with Python 2.1.2.

Zope 2.4.4b1 contains all backports of known fixes. Zope 2.4.4+ should be run with Python 2.1.2.

OTHER KNOWN ISSUES

Other problems exist, often related to specific systems. Sometimes there is a workaround; sometimes there is not. For example, the following conditions are known problems on some systems:

  • FreeBSD threads default to a 64K stack size, causing stack overflow in Zope when more than 64K is required. The known solution is to modify Python's Python/thread_pthread.h file so that thread initialization requests a larger stack (see the zope-dev mailing list archives for January 4, 2002).
  • MySQL is being used in a non-threadsafe manner. The latest (beta at this time) version of ZMySQLDA corrects this problem.
  • Zope "freezes" when the controlling terminal it was started from is disconnected. This occurs when Zope is running with the -D switch for debugging, and Zope tries to write output to a console which is no longer there. Remove the -D switch from the Zope start parameters. Alternatively, direct stdout and stderr to files, so that the writes do not block, or use the nohup command or similar.
  • Python 2.1.2 includes a known crash with deeply nested structures being deallocated (via a special mechanism known as the trashcan). This will be fixed in Python 2.1.3.
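To make the last item above more concrete, the sketch below builds the kind of deeply nested structure whose deallocation exercises the trashcan mechanism. It is only an illustration: the helper name build_nested and the depth of 100000 are arbitrary choices, and on a Python release with a working trashcan the final del simply completes.

```python
def build_nested(depth):
    """Wrap an empty list in `depth` enclosing lists."""
    nested = []
    for _ in range(depth):
        nested = [nested]
    return nested


deep = build_nested(100000)
# Dropping the last reference deallocates every level in one chain:
# freeing the outer list frees the list inside it, and so on.
# The trashcan exists to keep that chain from overflowing the C stack.
del deep
```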

THE LIKELY SUSPECTS

Usually in a crash, the problem is caused by a C module operating on erroneous data. Most of the time, this means doing things like releasing memory, then continuing to access it.

Any extra modules or components loaded into Zope which have compiled components COULD be suspect when there is a crash. Often, these include database adapters, or other special purpose modules.

Occasionally, normal Python can cause some recursion errors which consume all available memory. This is highly unusual, but it can happen.

EASY WORKAROUNDS

There are a number of things to try when Zope is crashing to see if the crashes can be contained, or parameterized to assist diagnosis and corrective action. These things are:

  • Start Zope single threaded. By specifying the argument -t 1 to Zope at startup, it will run with a single thread. This can ease symptoms where C libraries which are not thread safe are invoked by installed Zope products. If the crashes stop, a C extension is probably the culprit.
  • Set the environment variable ZOPE_SECURITY_POLICY=PYTHON. If the crashes stop, the problem is in the accelerated C security module.
  • Change the Script_magic constant to a higher number (just add 1) in lib/python/Products/PythonScripts/PythonScript.py to force all PythonScripts to be recompiled in memory at runtime. If this stops the crashing, please report it to us, as it means something unexpected is present in the Zope release you are using.
  • Modify z2.py to import gc then call gc.disable(). This will turn off runtime garbage collection. The process of garbage collection usually finds faulty data left around by other modules. This can make the problem go away in the short term, at the expense of leaking storage and having potential memory corruption occur. If this results in enhanced stability, file a bug report with Zope Corporation.
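The garbage collection workaround in the last item amounts to two lines of Python. The sketch below wraps them in a hypothetical helper (disable_cyclic_gc is not a Zope function). Where the call goes in z2.py is an assumption here: somewhere near the top, before Zope starts serving requests.

```python
import gc


def disable_cyclic_gc():
    """Turn off the cyclic garbage collector; return whether it was on."""
    was_enabled = gc.isenabled()
    gc.disable()
    return was_enabled


was_enabled = disable_cyclic_gc()
print("collector was enabled:", was_enabled)
print("collector is enabled now:", gc.isenabled())
```

Reference counting still frees most objects after this call. Only objects caught in reference cycles are left uncollected, which is where the leaked storage comes from.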

EASY DIAGNOSTIC SWITCHES

Enabling the following will allow you to capture supplemental information about what Zope was doing when it crashed:

  • Run with -D as a startup option to enable debugging mode.
  • Set the environment variable STUPID_LOG_FILE=file and watch that log file for additional messages.
  • Run with the -M flag to see the "big M log" file. This file contains additional information about how Zope is handling the requests.
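The three switches can be combined in one launch. The sketch below only assembles the command line and environment; it does not start anything. The helper name, the log file paths, and the location of z2.py are all hypothetical, and passing a file name after -M is an assumption about how that flag takes its argument, so check the usage text of your z2.py before relying on it.

```python
import os


def build_diagnostic_launch(python_exe, z2_path, event_log, detailed_log):
    """Return (argv, env) for starting Zope with diagnostics enabled."""
    # -D turns on debugging mode; -M is assumed to take the name of
    # the "big M log" file as its argument.
    argv = [python_exe, z2_path, "-D", "-M", detailed_log]
    # Copy the current environment and add the log file setting.
    env = dict(os.environ)
    env["STUPID_LOG_FILE"] = event_log
    return argv, env


argv, env = build_diagnostic_launch(
    "python", "z2.py", "/tmp/zope-event.log", "/tmp/zope-detailed.log"
)
print(" ".join(argv))
print("STUPID_LOG_FILE=" + env["STUPID_LOG_FILE"])
```

The resulting pair could be handed to subprocess.Popen(argv, env=env), or the same values can simply be typed into the start script by hand.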

ATTACHING GDB

Sometimes the easy workarounds don't fix the problem, and it becomes necessary to attach a debugger to Zope to find out where it is crashing. On Unix systems, gdb is often installed and available for this kind of diagnosis.

To attach gdb to a running Zope instance, first start Zope with the parameters -t 1 -Z '' to run in single threaded mode without a separate monitor process. Obtain the process ID via ps, or by looking for the process ID reported in the log file named by STUPID_LOG_FILE.

Attaching gdb is a matter of issuing gdb python processid and then hitting RETURN until you get a gdb prompt. Type c and press RETURN to allow Zope to resume execution.

Use Zope until a crash occurs. When this happens, gdb will return to its prompt and identify where the Zope process was at the point the failure occurred. Use the "where" command (or its equivalent, "bt") to find out where the program was at the time of failure.

Copy and paste the output of your gdb session to a file, and report it to Zope Corporation via the Zope Collector.
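For unattended use, the same session can be approximated without typing at the gdb prompt. The sketch below only builds the argument list. It assumes a gdb recent enough to support the -batch and -ex options, and the helper name build_gdb_command is hypothetical. It is a sketch of the steps above, not a tested procedure for any particular Zope installation.

```python
def build_gdb_command(pid, python_exe="python"):
    """Return an argv list that attaches gdb to `pid` and waits for a crash."""
    return [
        "gdb", "-batch",
        "-ex", "continue",  # let Zope resume, like typing "c" at the prompt
        "-ex", "where",     # print the stack once the process stops
        python_exe, str(pid),
    ]


# Example with a made-up process ID.
print(" ".join(build_gdb_command(1234)))
```

Redirecting the output of the resulting command to a file yields the text to attach to a Collector report.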

REPORTING VIA THE COLLECTOR

If Zope Corporation lacks knowledge about a problem, it will not be able to provide remedies in a timely manner. To that end, the Zope Collector exists to collect problem data.

When issuing a report to the collector, it is useful to use an editor to compose a problem report. This problem report should contain:

  • The version information from the Control Panel
  • The products information from the Products page of the Control Panel
  • The gdb output (if applicable)
  • Any workarounds you have tried (if applicable) and your solutions
  • Your operating system and maintenance level
  • YOUR NAME
  • YOUR EMAIL ADDRESS

We may ask you to include portions of your system log or the "big M log."

It is very important to include your name and email address in the issue itself if you file an anonymous report to the collector. Without this information we cannot correspond with you.
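One way to avoid leaving out an item is to assemble the report from a fixed set of fields. The sketch below is a hypothetical helper, not a Collector API: it only produces text to paste into the issue form, and every value in the example is made up.

```python
def format_problem_report(name, email, versions, products, os_info,
                          gdb_output="", workarounds=""):
    """Build a plain-text problem report; empty sections are omitted."""
    sections = [
        ("Name", name),
        ("Email", email),
        ("Version information", versions),
        ("Installed products", products),
        ("Operating system and maintenance level", os_info),
        ("gdb output", gdb_output),
        ("Workarounds tried", workarounds),
    ]
    lines = []
    for title, body in sections:
        if body:
            lines.append(title + ":")
            lines.append(body.strip())
            lines.append("")
    return "\n".join(lines)


report = format_problem_report(
    name="A. Administrator",
    email="admin@example.com",
    versions="(paste from the Control Panel)",
    products="(paste from the Products page)",
    os_info="(operating system and patch level)",
    workarounds="Started with -t 1; crashes continued.",
)
print(report)
```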

Go to http://collector.zope.org/Zope/ and click "New Issue" in the left hand actions box. If you have a Zope.org membership, log in before submitting the issue.

PAID SUPPORT

Zope Corporation does not require customers to have a paid support contract in order to resolve reported defects in the product. Without a paid support contract in place, however, it does not commit to meeting a customer's requirements for a solution. If it is critical to you that the problem be addressed on your schedule, you should examine the offerings listed at http://www.zope.com/Services/SupportContracts to see if your needs may be best addressed by purchasing a support contract.

The purchase of a support contract guarantees that engineering resources will be assigned to your problem when you report it, and may also entitle you to specific engineering work on a priority basis for problem resolution.