ReverseDiffVersioning
Problem Space
The current ZODB stores a full copy of every change, making it inappropriate for high-write situations, particularly where only small portions of objects change frequently.
An Approach
One obvious approach is to use only differences, where the difference between the "original" version and the current one is what is stored. The problem is that this requires building the current page from diffs each display, and Zope is read-heavy.
A better approach would be to take the new version, and have the old version become the difference between itself and the new version. This way, undo or reversing transactions becomes more expensive, and perhaps a commit (adding something new) becomes slightly more expensive, but reads shouldn't cost anything more than they currently do.
-- mindlace
- Jim
- I think that this is a very cool idea. It would be too hard to implement in FileStorage (althought maybe it could be a variation during packing). It would be really attractive in a relational storage.
- Michael Strasser
- Can it be extended to allow redo as well? It seems to me that
Zope could really do with a redo facility. Instead of the complete
version being the latest, it is one signified somehow as the current
version that can be undone by going backwards or redone by going
forwards. The versions in the DB going in both directions are diffs from
the versions nearer the current version.
Now I'm thinking that the undo-redo list should not actually be linear but be some kind of "map" with a "strand" for each object that is changed. Where one object refers to another, a link could be recorded between one version of an object and the version of the the other to which it refers. Am I getting too far off the track?
- ~runyaga
- this is key functionality to what I'm using ZOPE for - a Offline Content Management System. the idea of
Versions
(previously edited copies) of a object and being able toundo
orredo
to a particular version is very important. the beautiful part of this would be if ZOPE could undo/redo at a object instance - it is ahead of one of the main CMS, we use at work - Interwoven (very expensive CMS) can only undo/redo at abranch
level. Tres and I were talking about undo/redo but how does this play w/ objects that are apart of tradtional ZOPE versions? I dont seediffs
being that important right now, compared to the undo/redo functionality. runyaga
Need pickle diffs
We need a way of diffing pickles to make this work. Any volunteers?
Are there any known generic binary differs?
- lalo
- there is xdelta, approaching 2.0. A Python interface to libxdelta and libxdfs is in the works (by me, actually). The library includes xdfs, a RCS-like version database (ok, that was the short explanation, not a good one actually), which already stores versions as reverse diffs; redo (and perhaps Zope Versions) could be emulated or perhaps even replaced by branches, which it also handles beautifully. Look for Joshua's paper (xdfs.ps.gz) for details. (/me dreams about a XDFSStorage again)
Barring a generic binary differ, I think it might work along these lines:
- Split a pickles into sequences of operations. Pickles are basically series of op codes with arguments.
- Use Tim Peters' sequence differ (Tools/scripts/ndiff.py) to
diff the operations sequences.
I've never used ndiff, I don't know how well it works. I've been meaning to try it form some time.
- Guido
- If you use text pickles instead of binary pickles, I
think you don't even need to use ndiff. A regular line-based diff
utility will do just fine, because the text pickle format consists
of many short lines. The pickle format includes newlines at the
end of almost every object. The only problem would be that under
certain circumstances, the
tags
for object references may be renumbered, and that would cause major changes throughout the pickle. But it's worth a try! Another downside is that if objects are low on structure and their content is a single long string, most changes will realistically change that single string, which is a single line (in text pickle format), and then you'd need ndiff instead. The proper solution in any case would be to ask Tim to craft a diffing algorithm specifically for binary data -- he loves a challenge like that!
- jcea (Aug 1, 2001 8:33 am; Comment #1)
- Python 2.1 (mandatory for current Zope version, anyway) includes a standard "diff" module. Try http://python.planetmirror.com/doc/current/lib/module-difflib.html.
This implementation is, probably, slow.
I'd vote for xdelta, too...