home contents changes options help subscribe edit (external edit)

External Garbage Collection

Most ZODB storages support garbage collection (GC) as part of packing. GC can be problematic when multiple databases are used, because a storage doesn't have information about external references to objects in the storage. For this reason, GC will be optional in ZODB 3.9 for many storages, most notably FileStorage. Multi-database applications can be designed so that this is not a problem, but many multi-database applications would benefit from a multi-database GC mechanism.

At Zope Corporation, we have some databases that are too big to perform GC on. Garbage collection using the standard FileStorage packer takes too long and thrashes the disk. GC using the packer in zc.FileStorage uses too much memory. I believe that we can address this by separating GC from packing and implementing GC in a way that spreads GC over a longer period of time to reduce the impact on the server.

The new DemoStorage? can't perform GC during packing if it has a base storage because the necessary information is spread across both the changes and base storages. An external GC API could make this possible, although this probably isn't a high priority.

Proposal

I propose to add an external GC api for ZODB storages, to be implemented initially (in 3.9) for FileStorage, and ClientStorage? (and MappingStorage? and DemoStorage? if time permits):

class IExternalGC(IStorage):

   def deleteObject(oid, serial, transaction):
       """Mark an object as deleted

       This method marks an object as deleted via a new object
       revision.  Subsequent attempts to load current data for the
       object will fail with a POSKeyError, but loads for
       non-current data will suceed if there are previous
       non-delete records.  The object will be removed from the
       storage when all not-delete records are removed.

       The serial argument must match the most recently committed
       serial for the object. This is a seat belt.

       This method can only be called in the first phase of 2-phase
       commit.
       """

Sample (naive) multi-database garbage collection implementation

Assume we have a mapping {storage_name -> storage} and a function, get_references, that returns object references from a storage, where object references are either oids, or storage_name and oid pairs for cross-database references.

Here is a high-level sample multi-database pack implementation. It is optimized for simplicity and readability, not scalability or efficiency. The intent is to show that external GC is possible with the given APIs?. Actual implementations will need lots of optimizations to be scalable.

import transaction

def pack(storages):

    # Step 1: collect references
    refs = {}
    tids = {}
    for name, storage in storages.items():
        for trans in storage.iterator():
            for record in trans:
                oid = name, record.oid
                tids[oid] = record.tid
                orefs = refs.setdefault(oid, set())
                for roid in get_references(record.data):
                    if isinstance(roid, str):
                        roid = name, roid
                    orefs.add(roid)

    # Step 2: Do gc.  Traverse objects starting from roots.
    # As we reverse references, we remove from tids.
    # At end, the tids contain garnage to be removed.
    to_do = set((name, '\0'*8) for name in storages)
    while to_do:
        oid = to_do.pop()
        if oid in tids:
            _ = tids.pop(oid)
            for roid in refs[oid]:
                if roid in tids:
                    to_do.add(roid)

    # Step 3: Take out the trash
    txn = transaction.begin()
    for storage in storages.values:
        storage.tpc_begin(txn)
    for (name, oid), tid in tids.items():
        for oid in oids[name]:
            storages[name].deleteObject(oid, tid, txn)

    for storage in storages.values:
        storage.tpc_vote(txn)
    for storage in storages.values:
        storage.tpc_finish(txn)

It's reasonable to ask whether scalable implementations are possible with the given implementation. I think they are, as I've thought of some. :) I suggest, however, that we view the API as experimental until scalable efficient implementations are available. An implementation will be judged to be scalable and efficient enough if it is at least as scalable and efficient as current non-external gc implementations. Of course, this will depend to some degree on the storages considered. FileStorage is certainly an important case.



subject:
  ( 13 subscribers )