Storage Benchmarks

Test Description

This test involves several application-specific benchmarks against four different storage implementations to compare performance, memory usage and disk usage. The four storages being tested are (a sketch of opening each follows the list):

  1. FileStorage, the conventional implementation
  2. FileStorage, the fsIndex branch designed to reduce memory usage
  3. BerkeleyStorage Full
  4. BerkeleyStorage Packless
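
For orientation, the sketch below shows how these storages might be opened directly from Python. FileStorage is opened exactly as shown; the import paths and constructor arguments for the two Berkeley storages are assumptions about the bsddb3Storage packaging of this era rather than details taken from the test setup.

from ZODB.FileStorage import FileStorage
# The Berkeley module paths below are assumptions about the
# bsddb3Storage packaging of this era.
from bsddb3Storage.Full import Full
from bsddb3Storage.Packless import Packless

# 1 and 2: the conventional and fsIndex FileStorages share the same API;
# the fsIndex branch only changes the in-memory index implementation.
file_storage = FileStorage('var/Data.fs')

# 3 and 4: the Berkeley storages keep their data in a BerkeleyDB
# environment directory rather than a single file (paths are placeholders,
# and the constructor arguments are assumptions).
full_storage = Full('var/full-env')
packless_storage = Packless('var/packless-env')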

The tests use databases that are as large as possible given the time constraints; however, they are still not 'very large'.

These tests were performed by Toby Dickenson, [email protected], between the 13th and 17th of December 2001. The permanent URL for this document is http://www.zope.org/Members/htrd/benchmarks/storages.

Conclusions

If you want to skip ahead, the conclusions are at the end of this document.

Availability

The test data, test scripts, and test application are derived from a closed-source product, and are not publicly available.

Test Equipment

  1. Red Hat Linux 7.1, on an Athlon 1700 with 512MB memory.
  2. Storage data on an ATA100 IDE disk, formatted with the ReiserFS filesystem.
  3. Python 2.1
  4. Zope 2.4.1 (with many custom patches)
  5. BerkeleyStorage 1.0beta5, db 3.3.11, pybsddb 3.3.2
  6. The storage was hosted in a ZEO server (1.0b3) so that I could isolate storage memory usage from memory used by Zope and the application logic. Note that this test machine is well endowed with memory; I therefore record memory usage as the size of the ZEO process virtual address space (VSZ). This may not be representative of the amount of memory needed before performance is affected, but I believe it is close enough.
  7. The ZODB cache configuration parameters were: 60 seconds, 1000 objects (a sketch of this configuration follows the list).
  8. Zope was started with one worker thread. All test scripts are also single-threaded.
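
To illustrate items 6 and 7, here is a minimal sketch of the client-side connection. The address and port are placeholders, and the cache keyword names reflect my understanding of the ZODB 3 DB constructor of this era, so treat them as assumptions rather than the exact configuration used in the test.

import ZODB
from ZEO.ClientStorage import ClientStorage

# Connect to the ZEO server hosting the storage under test
# ('localhost' and 9999 are placeholders).
storage = ClientStorage(('localhost', 9999))

# A 1000-object connection cache, deactivated after 60 seconds,
# matching item 7 above (keyword names assumed).
db = ZODB.DB(storage,
             cache_size=1000,
             cache_deactivate_after=60)

connection = db.open()
root = connection.root()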

BerkeleyStorage Configuration

I didn't spend too much effort optimising BerkeleyStorage for this test. The DB_CONFIG file was:

set_lk_max_locks 2000
set_lk_max_objects 2000
set_lk_max_lockers 10
set_cachesize 0 1048576 0
set_lg_bsize 262144

The 1M cachesize gives a cache hit rate of 98% during the bmadd script, compared to 80% with the default 256k cache. This was a significant performance improvement (I didn't measure how much).

The 256k log buffer is something the BerkeleyDB documentation says should improve throughput, since it reduces the 'writes due to overflow' listed by db_stat -l to about 2% of the total writes. In practice I couldn't measure a difference.

It looked like the default number of locks and lock objects would have been enough for this test (due to having no concurrency?); however, I doubled the defaults 'just in case'.

During the test db_checkpoint -v -p 5 was running to checkpoint the logfile every 5 minutes. An additional db_checkpoint -1 was run manually shortly before the end of the two long test scripts to ensure that the checkpointing cost was included in the elapsed time measurement.

Test Scripts

I had originally planned to test using two scripts to exercise the storages, but due to time constraints the second test has not yet been performed.

The first is write-heavy, the second read-heavy. In both scripts the application logic is expected to use much more processor time than the storage.

bmadd.py

This script transfers roughly 1000 documents into the ZODB over HTTP, where they are indexed (ZCatalog-style indexing plus an application-specific indexing process). This corresponds to roughly 18000 ZODB objects.
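
bmadd.py itself is not available (see Availability above), but its general shape is a simple upload loop like the hypothetical Python 2 sketch below; the URL, form field names and source directory are placeholders rather than the real script.

import glob, urllib

# Placeholder URL for the Zope method that stores and indexes one document.
ZOPE_URL = 'http://localhost:8080/documents/upload_document'

for path in glob.glob('source-docs/*.txt'):
    body = open(path, 'rb').read()
    params = urllib.urlencode({'id': path, 'text': body})
    # Passing data to urlopen turns the request into an HTTP POST;
    # the response body is read and discarded.
    urllib.urlopen(ZOPE_URL, params).read()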

calc.py

This script traverses all of the documents added by bmadd.py, performing several memory-intensive calculations on each document. Note that this test has not yet been performed.
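
Although this script has not yet been run, its intended shape is roughly the hypothetical loop below. The 'documents' container and the recalculate() method are placeholder names, not the real application API, and the ZEO address is a placeholder.

import ZODB
from ZEO.ClientStorage import ClientStorage

# Open a connection through ZEO, as Zope does (address is a placeholder).
db = ZODB.DB(ClientStorage(('localhost', 9999)))
root = db.open().root()

# Walk every document added by bmadd.py and run the memory-intensive
# application calculation on each one (placeholder names).
for doc_id, doc in root['documents'].items():
    doc.recalculate()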

Test Procedure

  1. Restore the nearly empty 'preadd' database.

  2. If this test uses a FileStorage: delete any index file, start ZEO to create a new index, then stop ZEO to leave behind a clean index file of the correct type.
  3. Delete any ZEO client cache. Start ZEO and Zope.
  4. Run the bmadd.py script. When the first indexing operation is complete (that is, the first one out of roughly 1000), measure the VSZ of the ZEO process; one way to read the VSZ is sketched at the end of this section. At this point any delayed initialization should have occurred.
  5. Measure the elapsed time to perform bmadd.py.
  6. Record the VSZ of the ZEO process soon after that script terminates (checking occasionally during the run that this is indeed the maximum).
  7. Record the size of the filestorage file, or size of the Berkeley database files (exclude log files).
  8. Restart both ZEO and Zope processes. Record the ZEO VSZ.
  9. Pack the storage. Record the largest VSZ during the pack (it changes quickly, so I may not always catch the highest peak).
  10. Repeat the file size measurements.
  11. Note: The remainder of this test has not yet been performed.
  12. Restart both ZEO and Zope processes. Record the ZEO VSZ.
  13. Run the calc.py script, and measure the elapsed time.
  14. Repeat the VSZ measurement.

That procedure was repeated for the four storage implementations listed above.
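
All VSZ figures were taken from the ZEO server process. A minimal Python 2 sketch of one way to read that figure on Linux is below; the pid file path is a placeholder, and ps -o vsz= -p <pid> reports the same number.

def zeo_vsz(pid):
    # VmSize in /proc/<pid>/status is the process virtual address space
    # in kilobytes, the same figure ps reports as VSZ.
    for line in open('/proc/%d/status' % pid).readlines():
        if line.startswith('VmSize:'):
            return int(line.split()[1])
    return None

# Placeholder location for the ZEO server pid file.
pid = int(open('var/ZEO_SERVER.pid').read().strip())
print 'ZEO VSZ: %dk' % zeo_vsz(pid)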

Results

                               Conventional   fsIndex        Berkeley   Berkeley
                               FileStorage    FileStorage    Full       Packless   (2a)     (2b)

ZEO VSZ before bmadd           6480k          6448k          8336k      8764k
Time for bmadd                 2979s          2990s          2999s      3064s (3)
ZEO VSZ after bmadd            9096k          6516k (1)      8364k      13096k
ZEO VSZ growth during bmadd    2616k          68k            28k        4332k (4)
Disk space                     132M           132M           188M       134M
ZEO VSZ after restarting       7272k          7452k (2)      8336k      8604k      7536k    6420k
ZEO VSZ peak during packing    12876k         9560k          14468k     11316k     13028k   9210k
ZEO VSZ growth during pack     5604k          2180k          6132k      2106k      5492k    2790k
Disk space after packing       111M           111M           188M       135M

Numbers in parentheses refer to the notes below. Columns (2a) and (2b) are the repeated packing runs described in note 2. The two FileStorage variants write identical Data.fs files, so they share the disk space figures.

1. Toward the end of bmadd.py, the virtual memory size of the fsIndex FileStorage process exhibited short-lived peaks above its initial value of 6448k, but always returned to that original value. I guess those tiny BTree nodes were just on the point of filling up the fragments of free memory left after initialisation.

2. Why is the VSZ of the ZEO process so much larger after restarting? Suspecting that there may be a memory leak or similar problem with de-persisting the BTree index, I repeated the pack test after deleting the index file. These results are in column 2a for the conventional FileStorage, and 2b for the fsIndex. The post-packing numbers suggest that memory was not leaked, just fragmented.

3. During this test the machine was running some other low-load processes, which may account for the longer run time.

4. Looking at the source, I see no good reason why Packless should use increasing amounts of memory as this test proceeds. Is there a memory leak?

Conclusions

  1. There was less than 1% difference in elapsed time between the storages, once the Packless run that was affected by other load (note 3) is set aside. The extra overhead of BerkeleyDB appears to be negligible.
  2. The packing test caused the BerkeleyStorage process to grow by 6M for Full, and 2M for Packless. This growth is even larger than for FileStorage. This does not live up to BerkeleyStorage's high-scalability reputation.
  3. In this test the fsIndex branch of FileStorage reduces memory growth during bmadd by an impressive factor of 38 (2616k against 68k). During packing the factor is a less impressive 2.5 (5604k against 2180k). Of all the FileStorage variants, the fsIndex FileStorage uses the least memory when packing.