home contents changes options help subscribe edit (external edit)

The various flavors of BTrees (including Buckets, Sets and TreeSets) are ubiquitous in Zope, and in many other ZODB applications. ZODB's BTrees are a form of what's commonly called B+ trees in the literature: all the data (keys and values) lives in leaf nodes (Buckets or Sets), and interior nodes layered on top of the leaf nodes form an overall tree structure. The interior nodes contain references to a small subset of the keys, and serve to guide searches to the correct leaf node efficiently.

BTrees support efficient computation in several ways. The way explained here is their support for automatic conflict resolution: if multiple transactions modify a BTree at the same time, it's possible that the sets of individual interior and leaf nodes they change don't intersect. Then there's no conflict, and all transactions can commit successfully.

However, even if the sets of modified nodes do intersect, it's still possible that the conflicts can be resolved "by magic". While any fixed set of conflict resolution rules will be incorrect for some app, the rules used by BTree conflict resolution are pretty conservative, and should be correct for most apps (if they're not correct for your app, don't use ZODB BTrees!).

Here are the rules. Under the covers, Buckets and Sets are very similar (Buckets have a value associated with each key, and Sets do not), so only Buckets are mentioned below.

If two transactions modify the same interior BTree node, that's always a write conflict (note that because these are B+ trees, interior nodes can change only when leaf Buckets split or vanish).

If a transaction splits or empties a Bucket, and another transaction makes any modification to that Bucket, that's also always a write conflict.

Else we're left with two transactions modifying the same Bucket without splitting or emptying it. In the following, s1 is the Bucket state at the time the current transaction began, s2 is the currently committed Bucket state, and s3 is the Bucket state the current transaction is trying to commit. It's easier to explain what the code allows than what it forbids; any input condition not allowed raises a conflict error:

  • Leaving things alone: it's OK if both s2 and s3 leave a piece of s1 alone (don't delete the key, and don't change the value).
  • Key deletion: a transaction (s2 or s3) can delete a key (from s1), but only if the other transaction (of s2 and s3) doesn't delete the same key. However, it's not OK for s2 and s3 to, between them, end up deleting all the keys. This is a higher-level constraint, due to that the caller of bucket_merge() doesn't have enough info to unlink the resulting empty Bucket from its BTree correctly. It's also not OK if s2 or s3 are empty, because the transaction that emptied the bucket unlinked the bucket from the tree, and nothing we do here can get it linked back in again.
  • Key insertion: s2 or s3 can add a new key, provided the other transaction doesn't insert the same key. It's not OK even if they insert the same (key, value) pair.
  • Mapping value modification: s2 or s3 can modify the value associated with a key in s1, provided the other transaction doesn't make a modification of the same key to a different value. It's OK if s2 and s3 both give the same new value to the key (this doesn't seem consistent with that it's not OK for both to add a new key mapping to the same value, but that is how it works).

In all the above, "same" and "different" are determined by the result of calling Python's cmp() on two objects. If the result is 0, the objects are "the same", else they're different.



subject:
  ( 11 subscribers )