You are not logged in Log in Join
You are here: Home » Members » jim » ZODB » ReverseDiffVersioning

Log in
Name

Password

 
 

History for ReverseDiffVersioning

??changed:
-
Problem Space

  The current ZODB stores a full copy of every change, making it
  inappropriate for high-write situations, particularly where only
  small portions of objects change frequently.


An Approach

  One obvious approach is to use only differences, where the difference
  between the "original" version and the current one is what is stored.
  The problem is that this requires building the current page from diffs
  each display, and Zope is read-heavy.

  A better approach would be to take the new version, and have the old
  version become the difference between itself and the new version.
  This way, undo or reversing transactions becomes more expensive, and
  perhaps a commit (adding something new) becomes slightly more expensive,
  but reads shouldn't cost anything more than they currently do.

  -- "mindlace":http://www.zope.org/Members/mindlace 

Jim -- I think that this is a very cool idea. It would be too hard
  to implement in FileStorage (althought maybe it could be a variation
  during packing). It would be really attractive in a relational storage.

Michael Strasser -- Can it be extended to allow *redo* as well? It seems to me that 
  Zope could really do with a redo facility. Instead of the complete
  version being the latest, it is one signified somehow as the current
  version that can be undone by going backwards or redone by going
  forwards. The versions in the DB going in both directions are diffs from
  the versions nearer the current version.

  Now I'm thinking that the undo-redo list should not actually be linear
  but be some kind of "map" with a "strand" for each object that is changed.
  Where one object refers to another, a link could be recorded between
  one version of an object and the version of the the other to which it
  refers. Am I getting too far off the track?

~runyaga -- this is key functionality to what I'm using ZOPE for - a Offline Content Management System.  the idea of 'Versions' (previously edited copies) of a object and being able to 'undo' or 'redo' to a particular version is very important.  the beautiful part of this would be if ZOPE could undo/redo at a object instance - it is ahead of one of the main CMS, we use at work - "Interwoven":http://www.interwoven.com (very expensive CMS) can only undo/redo at a 'branch' level.  Tres and I were talking about undo/redo but how does this play w/ objects that are apart of tradtional ZOPE versions?  I dont see 'diffs' being that important right now, compared to the undo/redo functionality.  "runyaga":http://www.zope.org/Members/runyaga 

Need pickle diffs

  We need a way of diffing pickles to make this work. Any volunteers?

  Are there any known generic binary differs?

  lalo -- there is "xdelta":http://www.cs.berkeley.edu/~jmacd/xdelta.html, approaching 2.0. A Python
  interface to libxdelta and libxdfs is in the works (by me, actually). The library includes xdfs, a RCS-like
  version database (ok, that was the short explanation, not a good one actually), which already stores
  versions as reverse diffs; redo (and perhaps Zope Versions) could be emulated or perhaps even
  replaced by branches, which it also handles beautifully. Look for
  "Joshua's paper":http://www.cs.berkeley.edu/~jmacd/xdfs.ps.gz (xdfs.ps.gz) for details.
  (/me dreams about a XDFSStorage again)

  Barring a generic binary differ, I think it might work along these lines:

     - Split a pickles into sequences of operations. Pickles are basically
       series of op codes with arguments.

     - Use Tim Peters' sequence differ (Tools/scripts/ndiff.py) to
       diff the operations sequences.

       I've never used ndiff, I don't know how well it works. I've
       been meaning to try it form some time.

  Guido -- If you use *text* pickles instead of binary pickles, I
    think you don't even need to use ndiff.  A regular line-based diff
    utility will do just fine, because the text pickle format consists
    of many short lines.  The pickle format includes newlines at the
    end of almost every object.  The only problem would be that under
    certain circumstances, the 'tags' for object references may be
    renumbered, and that would cause major changes throughout the
    pickle.  But it's worth a try!  Another downside is that if
    objects are low on structure and their content is a single long
    string, most changes will realistically change that single string,
    which is a single line (in text pickle format), and then you'd
    need ndiff instead.  The proper solution in any case would be to
    ask Tim to craft a diffing algorithm specifically for binary data
    -- he loves a challenge like that!


<hr solid id=comments_below>


jcea (Aug 1, 2001 8:33 am; Comment #1)  --
 Python 2.1 (mandatory for current Zope version, anyway) includes a standard "diff" module. Try http://python.planetmirror.com/doc/current/lib/module-difflib.html.
 
 This implementation is, probably, slow.
 
 I'd vote for xdelta, too...