History for ReverseDiffVersioning

??changed:
-
Problem Space

The current ZODB stores a full copy of every change, making it
inappropriate for high-write situations, particularly where only
small portions of objects change frequently.

An Approach

One obvious approach is to use only differences, where the difference
between the "original" version and the current one is what is stored.
The problem is that this requires building the current page from diffs
each display, and Zope is read-heavy.

A better approach would be to take the new version, and have the old
version become the difference between itself and the new version.
This way, undo or reversing transactions becomes more expensive, and
perhaps a commit (adding something new) becomes slightly more expensive,
but reads shouldn't cost anything more than they currently do.

-- "mindlace":http://www.zope.org/Members/mindlace

Jim -- I think that this is a very cool idea. It would be too hard
to implement in FileStorage (althought maybe it could be a variation
during packing). It would be really attractive in a relational storage.

Michael Strasser -- Can it be extended to allow *redo* as well? It seems to me that
Zope could really do with a redo facility. Instead of the complete
version being the latest, it is one signified somehow as the current
version that can be undone by going backwards or redone by going
forwards. The versions in the DB going in both directions are diffs from
the versions nearer the current version.

Now I'm thinking that the undo-redo list should not actually be linear
but be some kind of "map" with a "strand" for each object that is changed.
Where one object refers to another, a link could be recorded between
one version of an object and the version of the the other to which it
refers. Am I getting too far off the track?

~runyaga -- this is key functionality to what I'm using ZOPE for - a Offline Content Management System. the idea of 'Versions' (previously edited copies) of a object and being able to 'undo' or 'redo' to a particular version is very important. the beautiful part of this would be if ZOPE could undo/redo at a object instance - it is ahead of one of the main CMS, we use at work - "Interwoven":http://www.interwoven.com (very expensive CMS) can only undo/redo at a 'branch' level. Tres and I were talking about undo/redo but how does this play w/ objects that are apart of tradtional ZOPE versions? I dont see 'diffs' being that important right now, compared to the undo/redo functionality. "runyaga":http://www.zope.org/Members/runyaga

Need pickle diffs

We need a way of diffing pickles to make this work. Any volunteers?

Are there any known generic binary differs?

lalo -- there is "xdelta":http://www.cs.berkeley.edu/~jmacd/xdelta.html, approaching 2.0. A Python
interface to libxdelta and libxdfs is in the works (by me, actually). The library includes xdfs, a RCS-like
version database (ok, that was the short explanation, not a good one actually), which already stores
versions as reverse diffs; redo (and perhaps Zope Versions) could be emulated or perhaps even
replaced by branches, which it also handles beautifully. Look for
"Joshua's paper":http://www.cs.berkeley.edu/~jmacd/xdfs.ps.gz (xdfs.ps.gz) for details.
(/me dreams about a XDFSStorage again)

Barring a generic binary differ, I think it might work along these lines:

- Split a pickles into sequences of operations. Pickles are basically
series of op codes with arguments.

- Use Tim Peters' sequence differ (Tools/scripts/ndiff.py) to
diff the operations sequences.

I've never used ndiff, I don't know how well it works. I've
been meaning to try it form some time.

Guido -- If you use *text* pickles instead of binary pickles, I
think you don't even need to use ndiff. A regular line-based diff
utility will do just fine, because the text pickle format consists
of many short lines. The pickle format includes newlines at the
end of almost every object. The only problem would be that under
certain circumstances, the 'tags' for object references may be
renumbered, and that would cause major changes throughout the
pickle. But it's worth a try! Another downside is that if
objects are low on structure and their content is a single long
string, most changes will realistically change that single string,
which is a single line (in text pickle format), and then you'd
need ndiff instead. The proper solution in any case would be to
ask Tim to craft a diffing algorithm specifically for binary data
-- he loves a challenge like that!

jcea (Aug 1, 2001 8:33 am; Comment #1) --
Python 2.1 (mandatory for current Zope version, anyway) includes a standard "diff" module. Try http://python.planetmirror.com/doc/current/lib/module-difflib.html.

This implementation is, probably, slow.

I'd vote for xdelta, too...