What's new in ZODB 4.0 alpha 2?

Release date: 20-Jun-2003

General

The code was synchronized with ZODB3 3.2. This includes a variety of bug fixes and new features.

Fixed bug in setup.py that prevented the system from installing.

Fixed several memory leaks involving database connections, persistent caches, and BTree objects.

Database

Invalidations are now processed atomically. Each transaction will see all the changes caused by an earlier transaction or none of them. Before this patch, it was possible for a transaction to see invalid data because it saw only a subset of the invalidations. This is the most likely cause of reported BTrees corruption, where keys were stored in the wrong bucket. When a BTree bucket splits, the bucket and the bucket's parent are both modified. If a transaction sees the invalidation for the bucket but not the parent, the BTree in memory will be internally inconsistent and keys can be put in the wrong bucket.
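The idea behind atomic invalidation processing can be sketched in a few lines of pure Python. This is only an illustration: ToyConnection, invalidate() and begin_transaction() are hypothetical names, not the actual zodb classes or methods. It assumes that each committed transaction hands over all of its changed oids in one batch, and that batches are applied only at a transaction boundary.

```python
import threading


class ToyConnection:
    """Hypothetical stand-in for a connection's invalidation handling."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []        # one frozenset of oids per committed transaction
        self.invalidated = set()  # oids whose cached state must be discarded

    def invalidate(self, oids):
        # Called once per committed transaction with *all* oids it changed.
        with self._lock:
            self._pending.append(frozenset(oids))

    def begin_transaction(self):
        # Apply only whole batches, and only at a transaction boundary, so a
        # transaction never observes half of another transaction's changes.
        with self._lock:
            batches, self._pending = self._pending, []
        for batch in batches:
            self.invalidated |= batch


conn = ToyConnection()
conn.invalidate({"bucket-oid", "parent-oid"})  # a bucket split touches both
seen_before = set(conn.invalidated)            # nothing is visible yet
conn.begin_transaction()
seen_after = set(conn.invalidated)             # both oids appear together
```

With this arrangement the bucket and its parent are always invalidated together, which is the property the BTree split case above depends on.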

Fixed a bug in conflict resolution that failed to ghostify an object if it was involved in a conflict. (This code may be redundant, but it has been fixed regardless.)

The database and storages can now be configured using ZConfig. The schemas, described in component.xml files, are in zodb and zodb.storages. The zodb.config module can be used to parse the config files.

Storages

Added three new storages: two memory storages and DemoStorage.

The FileStorage index uses fsBuckets instead of fsBTrees. This should reduce the amount of memory used by an index a little bit.

The FileStorage module was changed to a package that includes several modules. The basic client interface is the same, but it is easier to navigate the code when everything isn't in one big file.

FileStorage has a new pack() implementation that fixes several reported problems that could lead to data loss.

Fixed bug in pack for Berkeley storage.

The user and description metadata attributes of a transaction now support Unicode. They are stored as UTF-8 encoded strings.
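As a small illustration of the UTF-8 encoding step, the following sketch round-trips a non-ASCII user name. The helper names encode_metadata() and decode_metadata() are hypothetical and are not part of the storage API; the example is written for a current Python, where text strings are Unicode by default.

```python
def encode_metadata(text):
    """Encode a transaction's user or description text as UTF-8 bytes."""
    return text.encode("utf-8")


def decode_metadata(data):
    """Decode stored UTF-8 bytes back into text."""
    return data.decode("utf-8")


user = "Jos\u00e9"                 # "José"
stored = encode_metadata(user)     # the bytes a storage would write
restored = decode_metadata(stored)
```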

Storage API added getSerial() to support ZEO cache verification.

Storage API extended to include lastObjectId().

Storage API pack() changed to have optional gc boolean argument.

Fixed storages so that Unicode strings can be passed as the user and description arguments to tpcBegin().

ZEO

ZEO now supports authenticated client connections. The default authentication protocol uses a hash-based challenge-response protocol to prove identity and establish a session key for message authentication. The architecture is pluggable to allow third parties to develop better authentication protocols.
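A generic hash-based challenge-response exchange can be sketched with the standard library as follows. This is not ZEO's actual wire protocol: the function names, the use of HMAC-SHA256, and the way the session key is derived are all assumptions made for illustration.

```python
import hashlib
import hmac
import os


def make_challenge():
    # Server side: a fresh random nonce for each connection attempt.
    return os.urandom(16)


def respond(secret, challenge):
    # Client side: prove knowledge of the secret without sending it.
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def verify(secret, challenge, response):
    # Server side: recompute the expected response, compare in constant time.
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


def session_key(secret, challenge):
    # Both sides can derive the same per-session key for signing messages.
    return hmac.new(secret, b"session:" + challenge, hashlib.sha256).digest()


secret = b"shared secret"
challenge = make_challenge()
ok = verify(secret, challenge, respond(secret, challenge))
bad = verify(secret, challenge, respond(b"wrong secret", challenge))
```

Because the challenge is random for every connection, a captured response cannot be replayed against a later challenge.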

Fixed critical race conditions in ZEO's cache consistency code that could cause invalidations to be lost or stale data to be written to the cache. These bugs can lead to data loss or data corruption. These bugs are relatively unlikely to be provoked in sites with few conflicts, but the possibility of failure existed any time an object was loaded and stored concurrently.

The ZEO server was fixed so that it does not perform any I/O until all of a transaction's invalidations are queued. If it performed I/O in the middle of sending invalidations, it would be possible to overlap a load from a client with the invalidation being sent to it.

Much work was done to improve zdaemon's zdctl.py and zdrun.py scripts. (In the alpha 1 release, zdrun.py was called zdaemon.py, but installing it in <prefix>/bin caused much breakage due to the name conflict with the zdaemon package.) Together with the new mkzeoinst.py script, this makes controlling a ZEO server a breeze.

A ZEO client will not read from its cache during cache verification. This fix was necessary to prevent the client from reading inconsistent data.

The isReadOnly() method of a ZEO client was fixed to return false when the client is connected to a read-only fallback server.

The sync() method of ClientStorage and the pending() method of a zrpc connection now do both input and output.

The short_repr() function used to generate log messages was fixed so that it does not blow up creating a repr of very long tuples.
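One way to avoid building the full repr of a huge tuple is to assemble it element by element and stop once a size limit is reached. The sketch below shows that approach; it is an illustration under that assumption and not the actual ZEO implementation, and the limit parameter is hypothetical.

```python
def short_repr(obj, limit=100):
    """Return a repr of obj truncated to roughly `limit` characters."""
    if isinstance(obj, tuple):
        # Build the repr one element at a time and stop early, instead of
        # creating the repr of the whole tuple and slicing it afterwards.
        parts = []
        size = 0
        for item in obj:
            text = repr(item)
            parts.append(text)
            size += len(text) + 2
            if size >= limit:
                break
        text = "(" + ", ".join(parts) + ")"
    else:
        text = repr(obj)
    if len(text) > limit:
        text = text[:limit] + " ..."
    return text


small = short_repr((1, 2))
large = short_repr(tuple(range(100000)))
```

The output is meant for log messages only, so it does not try to reproduce Python's exact tuple syntax (for example the trailing comma of a one-element tuple).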

Transaction

The signature of prepare() in transaction.interfaces.IDataManager changed. The manager should raise an exception in its prepare() method rather than returning a boolean to indicate failure. Rationale: The transaction manager can't raise a reasonable exception, because it doesn't know what the data manager couldn't prepare.
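The difference can be illustrated with a toy data manager. Everything in this sketch is hypothetical (PrepareError, ToyDataManager and the commit() helper are not part of the transaction package); it only shows why an exception raised by the data manager carries more information than a returned boolean.

```python
class PrepareError(Exception):
    """Hypothetical exception type; the release notes do not name one."""


class ToyDataManager:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def prepare(self, txn):
        # Raise with a specific reason instead of returning a false value.
        if not self.healthy:
            raise PrepareError("%s: cannot prepare, storage is read-only" % self.name)


def commit(txn, managers):
    # The transaction manager simply lets the data manager's exception
    # propagate; it does not have to invent an error message of its own.
    for manager in managers:
        manager.prepare(txn)
    return "committed"


try:
    result = commit("txn-1", [ToyDataManager("a"), ToyDataManager("b", healthy=False)])
except PrepareError as exc:
    result = str(exc)
```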

BTrees

Trying to store an object of a non-integer type into an IIBTree or OIBTree could leave the bucket in a variety of insane states. For example, trying

b[obj] = "I'm a string, not an integer"

where b is an OIBTree. This manifested as a refcount leak in the test suite, but could have been much worse (most likely in real life is that a seemingly arbitrary existing key would "go missing").
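The general remedy for this class of bug is to validate the value before touching any internal state. The following pure-Python sketch shows the pattern; ToyIntValueBucket is a hypothetical stand-in and does not reflect how the C implementation of OIBTree is organized.

```python
class ToyIntValueBucket:
    """Hypothetical stand-in for a bucket that only accepts integer values."""

    def __init__(self):
        self._keys = []
        self._values = []

    def __setitem__(self, key, value):
        # Check the value *before* modifying anything, so that a failed
        # store leaves the bucket exactly as it was.
        if not isinstance(value, int):
            raise TypeError("expected integer value")
        if key in self._keys:
            self._values[self._keys.index(key)] = value
        else:
            self._keys.append(key)
            self._values.append(value)

    def __len__(self):
        return len(self._keys)


b = ToyIntValueBucket()
b["a"] = 1
try:
    b["b"] = "I'm a string, not an integer"
    rejected = False
except TypeError:
    rejected = True
```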

When deleting the first child of a BTree node with more than one child, a reference to the second child leaked. This could cause the entire bucket chain to leak (not be collected as garbage despite not being referenced anymore).

Other minor BTree leak scenarios were also fixed.

Fixed garbage collection logic for BTrees and Buckets. If the object is a ghost, don't unghostify it or access any of the data. But if the object is not a ghost and is registered with the database, it's still useful to let Python's GC visit this object. The object may be involved in a collectible cycle.

Persistence

Refactored the persistence API to use _p_changed only to mark an object as changed. Use _p_deactivate() to turn an object into a ghost, and use the keyword argument force=1 if you want to turn a modified object into a ghost. Several occurrences of the old interface have been updated.
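The intended division of labour between _p_changed and _p_deactivate() can be modelled with a toy class. ToyPersistent below is a hypothetical model, not the real Persistent base class; the state names and the data attribute are assumptions, and the sketch ignores assignments of false values to _p_changed.

```python
class ToyPersistent:
    """Hypothetical model of the object states; not the real base class."""

    def __init__(self):
        self._state = "UPTODATE"   # one of GHOST, UPTODATE, CHANGED
        self.data = {"x": 1}

    def _get_changed(self):
        return self._state == "CHANGED"

    def _set_changed(self, value):
        # Assigning a true value only marks the object as changed; it is
        # not a way to turn the object into a ghost.
        if value:
            self._state = "CHANGED"

    _p_changed = property(_get_changed, _set_changed)

    def _p_deactivate(self, force=0):
        # A modified object is ghostified only when force is true.
        if self._state == "CHANGED" and not force:
            return
        self.data = None
        self._state = "GHOST"


obj = ToyPersistent()
obj._p_changed = 1
obj._p_deactivate()
state_without_force = obj._state   # the modified object is left alone
obj._p_deactivate(force=1)
state_with_force = obj._state      # now it is a ghost
```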

This refactoring uncovered a number of subtle bugs in the persistence C API. The two chief problems were that the load function in the C API struct did not set the state and that the functions return 0 for error and 1 for success. Regardless of whether these APIs are doing the right thing, fix the code to use them correctly.

One downside of the new API is that the C objects (BTrees) that override _p_deactivate() have to deal with all the cruft for keyword arguments. Since BTrees only add a single line of extra code to _p_deactivate(), it seems useful to provide a hook in the persistence

The _p_serial attribute of persistent objects is not stored in the C struct that defines the persistent object. Instead, it is stored in the regular state of the object, usually __dict__.

Fixed the last two bugs from Zope3-Dev Collector #86: you couldn't mix Persistent and a base class that used a Python metaclass under Python 2.2, and Persistent subclasses wouldn't have instance dictionaries if a parent class defined '__slots__', even though for normal Python classes, subclassing a class with '__slots__' produces a class with instance dictionaries unless the subclass also defines '__slots__'. Added test cases for both scenarios. Note that these bugs only existed under Python 2.2, where the custom C metaclass 'PersistentMetaClass' is used.
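The behaviour of normal Python classes that Persistent subclasses are expected to match can be checked directly. This example uses only plain Python classes and no persistence machinery; the class names are made up for the illustration.

```python
class Base(object):
    __slots__ = ("x",)


class WithDict(Base):
    # No __slots__ here, so instances get an instance dictionary.
    pass


class WithoutDict(Base):
    # Defining __slots__ again keeps instances dictionary-free.
    __slots__ = ("y",)


a = WithDict()
a.anything = 1            # stored in a.__dict__

b = WithoutDict()
try:
    b.anything = 1
    slots_only = False
except AttributeError:
    slots_only = True
```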

Fixed object state change bugs reported in Zope3 collector #107. In the presence of a broken DM 'setstate()' or 'register()' method, persistent objects could end up in CHANGED state when they should have stayed in their original state. This is perhaps not a complete solution to the issue of what an object's state should be in the presence of a failure, but it prevents the two current "silent failure" conditions where an object thinks it's CHANGED, but is not actually registered with the DM. Added test cases to demonstrate the old behavior and ensure it doesn't come back.
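The register() half of this fix amounts to restoring the previous state when the data manager fails. A minimal sketch, assuming a hypothetical BrokenDM whose register() always raises and a hypothetical ToyObject with a mark_changed() method (neither is part of the persistence package):

```python
class BrokenDM:
    """Hypothetical data manager whose register() always fails."""

    def register(self, obj):
        raise RuntimeError("register failed")


class ToyObject:
    def __init__(self, dm):
        self._dm = dm
        self.state = "UPTODATE"

    def mark_changed(self):
        previous = self.state
        self.state = "CHANGED"
        try:
            self._dm.register(self)
        except Exception:
            # Restore the original state so the object never claims to be
            # CHANGED while the data manager does not know about it.
            self.state = previous
            raise


obj = ToyObject(BrokenDM())
try:
    obj.mark_changed()
    failed = False
except RuntimeError:
    failed = True
final_state = obj.state
```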

Revised persistence.interfaces.ICache API. There is only a single implementation at the moment, but the new API explains more clearly what is expected of an implementation.

Changed the __getattribute__ hook in Persistent so that a ghost's state is not loaded to check for a __del__ attribute. This works around a bug in Python where the garbage collector looks for finalizers by invoking the __getattribute__ hook. Future versions of Python will not invoke the hook from the garbage collector.
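The shape of this workaround can be shown with a toy class that loads its state lazily. ToyGhost and the NO_LOAD tuple are hypothetical; the real hook is implemented in C and handles more names. The sketch only demonstrates that a probe for __del__ can be answered without loading anything, while ordinary attribute access still triggers a load.

```python
# Names that can be looked up without loading the object's state.
NO_LOAD = ("__del__", "__dict__", "__class__", "loaded")


class ToyGhost(object):
    loaded = False   # class-level default: a new instance starts as a ghost
    value = None

    def __getattribute__(self, name):
        if name not in NO_LOAD and not object.__getattribute__(self, "loaded"):
            # Simulate fetching the state from the database.
            object.__setattr__(self, "loaded", True)
            object.__setattr__(self, "value", 42)
        return object.__getattribute__(self, name)


ghost = ToyGhost()
has_finalizer = hasattr(ghost, "__del__")   # goes through __getattribute__
still_ghost = not ghost.loaded              # the probe did not load anything
value = ghost.value                         # ordinary access loads the state
```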

Persistent modules

Fixed a bug where a newline was added to a module's source every time the module was updated.

The _p_oid and _p_jar attributes are ignored in _p_newstate() for classes.

Added support for the SimpleDescriptor missing value in MethodMixin.

In persistent_id(), ignore "empty" descriptors, where empty means the special descriptor returned by __get__() when a class has no value associated with the descriptor.