You are not logged in Log in Join
You are here: Home » Members » anthony » Choosing to store data in SQL vs ZODB

Log in
Name

Password

 

Choosing to store data in SQL vs ZODB

A reasonably common question that comes up on the Zope lists is "when should I store data in an SQL database instead of Zope?" This how-to lists some of the things to think about when making this decision.

This document should not take the place of your proper design process, but will hopefully give you a few other things to consider.

feedback, suggestions, to anthony@interlink.com.au

Will the data be frequently changing?

If the data is going to be frequently changing, then it's not a good idea to put it in the ZODB. Every time the data changes, a new record will be appended to the Data.fs file - this could become a problem very quickly.

Phillip Eby noted that this is actually a limitation of the standard Zope FileStorage implementation of a backend data store. Other packages, such as Ty Sarna's BerkeleyStorage do not have this limitation (although they don't provide the Undo and Version support, as a consequence. (but then, neither does an SQL database))

Does the data need to be accessed by other tools?

Do you have a requirement that other tools need to be able to access the data (e.g. various reporting tools). Unless you're willing to rewrite these tools to all work off the ZODB, again, SQL will be the way to go.

How large are the objects?

Largish blobs of data might be better in the ZODB than in an SQL database. Depending on the database, you may find that the size limitations of objects that you can insert will constrain you - for example, my reading of the PostgreSQL docs is that nothing larger than 32k can be inserted (although this is being addressed)

How does your data fit into an SQL database?

Tres Seaver writes:
Another point to consider is how well the data "fits" the storage model. As an example, in a former life I built a system storing "standard operating procedures" into an RDBMS. Each procedure had one or more steps, each of which might be recursively decomposed into substeps -- they also might reference diagrams, illustrations, and multimedia content.
Although I managed to make this work in SQL, the impedance mismatch was very high, compared with doing it in a hierarchical database like ZODB. RDBMS's excel at allowing "slice and dice" queries, at the expense of requiring expensive joins to implement parent-child links; hierarchical DB's optimize the parent-child relationships, making "cross-family" queries difficult or even infeasible.

This brings up a good point - if you are actually storing objects, with subobjects, methods, attributes, and the like, you may find that the work to get the objects into the database in a useful form is high, for not a lot of gain. You might also want to look at some sort of hybrid solution, where some of the attributes or data goes in the DB while other data goes into the ZODB.

Do you need full text indexing of the data?

Most RDBMSs do not provide any way to full text index data that is stored in them - if this is a requirement for you, consider using ZODB objects together with ZCatalog to allow this. (noted by Butch Landingin).

Some more notes on indexing and the like, loosely structured...
ZCatalogs are really quite powerful little things - you can do a lot with them, including many things where your first thought would be to use an SQL database. Just be aware of some of the limitations. The primary one (for me) is the lack of a proper query language. Sometimes expressing what you want to do in a call to ZCatalog.searchResults() can be a little tricky. ZCatalogs also lack a lot of the bells and whistles you might be used to in a traditional RDBMS, such as referential integrity, decent schema management, and the like.