You are not logged in Log in Join
You are here: Home » Members » jim » ZODB » ReverseDiffVersioning » wikipage_view

Log in
Name

Password

 
 
FrontPage »

ReverseDiffVersioning

Problem Space

The current ZODB stores a full copy of every change, making it inappropriate for high-write situations, particularly where only small portions of objects change frequently.

An Approach

One obvious approach is to use only differences, where the difference between the "original" version and the current one is what is stored. The problem is that this requires building the current page from diffs each display, and Zope is read-heavy.

A better approach would be to take the new version, and have the old version become the difference between itself and the new version. This way, undo or reversing transactions becomes more expensive, and perhaps a commit (adding something new) becomes slightly more expensive, but reads shouldn't cost anything more than they currently do.

-- mindlace

Jim
I think that this is a very cool idea. It would be too hard to implement in FileStorage (althought maybe it could be a variation during packing). It would be really attractive in a relational storage.
Michael Strasser
Can it be extended to allow redo as well? It seems to me that Zope could really do with a redo facility. Instead of the complete version being the latest, it is one signified somehow as the current version that can be undone by going backwards or redone by going forwards. The versions in the DB going in both directions are diffs from the versions nearer the current version.

Now I'm thinking that the undo-redo list should not actually be linear but be some kind of "map" with a "strand" for each object that is changed. Where one object refers to another, a link could be recorded between one version of an object and the version of the the other to which it refers. Am I getting too far off the track?

~runyaga
this is key functionality to what I'm using ZOPE for - a Offline Content Management System. the idea of Versions (previously edited copies) of a object and being able to undo or redo to a particular version is very important. the beautiful part of this would be if ZOPE could undo/redo at a object instance - it is ahead of one of the main CMS, we use at work - Interwoven (very expensive CMS) can only undo/redo at a branch level. Tres and I were talking about undo/redo but how does this play w/ objects that are apart of tradtional ZOPE versions? I dont see diffs being that important right now, compared to the undo/redo functionality. runyaga

Need pickle diffs

We need a way of diffing pickles to make this work. Any volunteers?

Are there any known generic binary differs?

lalo
there is xdelta, approaching 2.0. A Python interface to libxdelta and libxdfs is in the works (by me, actually). The library includes xdfs, a RCS-like version database (ok, that was the short explanation, not a good one actually), which already stores versions as reverse diffs; redo (and perhaps Zope Versions) could be emulated or perhaps even replaced by branches, which it also handles beautifully. Look for Joshua's paper (xdfs.ps.gz) for details. (/me dreams about a XDFSStorage again)

Barring a generic binary differ, I think it might work along these lines:

  • Split a pickles into sequences of operations. Pickles are basically series of op codes with arguments.
  • Use Tim Peters' sequence differ (Tools/scripts/ndiff.py) to diff the operations sequences.

    I've never used ndiff, I don't know how well it works. I've been meaning to try it form some time.

Guido
If you use text pickles instead of binary pickles, I think you don't even need to use ndiff. A regular line-based diff utility will do just fine, because the text pickle format consists of many short lines. The pickle format includes newlines at the end of almost every object. The only problem would be that under certain circumstances, the tags for object references may be renumbered, and that would cause major changes throughout the pickle. But it's worth a try! Another downside is that if objects are low on structure and their content is a single long string, most changes will realistically change that single string, which is a single line (in text pickle format), and then you'd need ndiff instead. The proper solution in any case would be to ask Tim to craft a diffing algorithm specifically for binary data -- he loves a challenge like that!


jcea (Aug 1, 2001 8:33 am; Comment #1)
Python 2.1 (mandatory for current Zope version, anyway) includes a standard "diff" module. Try http://python.planetmirror.com/doc/current/lib/module-difflib.html.

This implementation is, probably, slow.

I'd vote for xdelta, too...