You are not logged in Log in Join
You are here: Home » Members » Tres Seaver's Zope.org Site » Various Projects » HighlyAvailableZope » MarcusCollins » wikipage_view

Log in
Name

Password

 
 

MarcusCollins

Intro

Following suggestions from Tres Seaver and Michel Pelletier, I am going to begin posting more detailed information here concerning my experiences with threads hanging. Hopefully, this will eventually preserve a solution to which the community can refer in the future when this is forgotten on the mailing list.

The first thing I'd like to do is give some further ideas on how to use DebugLogger to provide the list with neccessary information for determining the source of these hangings.

Case History

Tres has done a great job of summarising my hanging problems in TwistingInTheWindProblems, so I'll present some of the details here.

2000-04-14 I've been attempting to determine, from the calling patterns of PCGIServer? which I've logged, where this hanging occurs. So far, I've caught calls from send_response() that fail to return. At least once, the hanging has occurred when a call to handle() failed to return. Before I added some additional logging, I caught three hangings that occurred inside send_response(), in either HTTPRequest? or in handle(). At these times, ZServer? was running with 10 threads.

After adding additional logging, it has proved rather difficult to hang anything... perhaps this is an indication of a timing problem which is masked by the latency involved in logging?

2000-04-15 Our live Zope server ended up completely hung. The following is the first entry from the logs:

    request 141805568 did not end
    I 141805568 2000-04-15T01:13:48 found_terminator: using StringIO; size=1730
    B 141805568 2000-04-15T01:13:48 GET /Sunesi/health/interactive/list_forum_mid
    I 141805568 2000-04-15T01:13:48 0
    X 141805568 2000-04-15T01:13:48 send_response: before PCGIResponse
    X 141805568 2000-04-15T01:13:48 send_response: before HTTPRequest
    X 141805568 2000-04-15T01:13:48 send_response: before handle 

Note that I have added additional logging instructions to found_terminator. Only one line of debugging should be emitted by found_terminator. It looks like something weird is going on here, as found_terminator seems to be called twice (two inputs). Also, the first input occurs before the begin. Could it be that two requests are trying to use the same thread? This same scenario takes place on numerous other unfinished requests. Here's another example:

    request 145377824 did not end
    I 145377824 2000-04-15T07:21:53 found_terminator: using StringIO; size=2057
    B 145377824 2000-04-15T07:21:53 POST /ContentInput/Work/Generate/ConfirmSubmitContent
    I 145377824 2000-04-15T07:21:53 found_terminator: using StringIO; size=7300
    I 145377824 2000-04-15T07:21:53 0
    X 145377824 2000-04-15T07:21:53 send_response: before PCGIResponse
    X 145377824 2000-04-15T07:21:53 send_response: before HTTPRequest
    X 145377824 2000-04-15T07:21:53 send_response: before handle 

Note the two different input sizes in this request. Note also that there is no further logging after the call to handle().

Another strange factor in these logs is that quite a few requests were handled completely during this timespan, while others failed to complete. Eventually, no further access to the site was possible.

Tres Seaver suggested that there were a few commonalities that needed to be examined.

  • IE 5 seems to be involved

This doesn't appear to be the case. From apache's logs of frameset pages which caused hanging:

    * Mozilla/4.0 (compatible; MSIE 5.0; AOL 5.0; Windows 98; DigExt)
    * Mozilla/4.0 (compatible; MSIE 4.01; Windows NT)
    * Mozilla/4.7 [en] (Win98; I)

  • The management interface seems to trigger the problem

This is sometimes, but not neccessarily, the case. No management activity was involved between the time Zope was last restarted, and our latest hangings.

  • Framesets may be involved

This certainly seems to be valid. Although other requests have also failed to complete, it seems it's always a frameset page that sets this off.

  • Premature request cancellation

Quite possibly.