
Motivation

Currently, a ZODB cache can only be limited by the number of objects it contains. Because object sizes can vary without bound, this gives very imprecise control over the main memory used by the cache. Either we are conservative and assume the worst-case object size, at the cost of poor resource utilization in the typical case, or we are optimistic and assume small objects, at the risk of overwhelming our memory resources.

Being able to limit the cache size based on the memory actually used would give us much better control over memory utilization.

Unfortunately, it is not easy to determine the true memory size of Python objects. Therefore, we use the pickle size as a (very) rough estimate of the memory size. This estimate is often very optimistic, but it is already much better than counting each object as 1.
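
As a rough illustration of the idea, the sketch below uses the length of a pickle as the size estimate. It calls the standard pickle module directly on plain dictionaries; ZODB serializes object state with its own machinery, so the helper name estimate_size and the sample data are assumptions made only for this example.

```python
import pickle


def estimate_size(state):
    """Return the pickle length of `state` as a rough size estimate.

    Hypothetical helper: the real estimate would come from the pickle
    that ZODB produces when it loads or stores the object.
    """
    return len(pickle.dumps(state, protocol=2))


# Two objects that an object-count limit would treat as equal.
small_state = {"title": "a"}
large_state = {"title": "a", "body": "x" * 100000}

small_estimate = estimate_size(small_state)
large_estimate = estimate_size(large_state)
```

Counting objects would weigh both states as 1, whereas the pickle-based estimate separates them by several orders of magnitude.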

Currently, the cache and persistent implementations are tightly coupled: each uses internal implementation details of the other. In order to be able to choose between different cache implementations (the current one, one based on memory size estimates, one which collects garbage not only during incrgc but as soon as the limits are exceeded, ...), the proposal gives the cache an abstract API used by the persistent implementation. Any cache implementation that implements this API can be used together with persistent.

Solution Outline

With respect to persistent, the cache is described by a generic cache object header and the following (C-level) API functions, accessible via the header:

    void (*access)(struct ccobject_head_struct *, struct cPersistentObject*);
    int (*is_ghost)(struct ccobject_head_struct *, struct cPersistentObject*);
    void (*ghostify)(struct ccobject_head_struct *, struct cPersistentObject*);
    void (*unghostify)(struct ccobject_head_struct *, struct cPersistentObject*);
    void (*unreferenced)(struct ccobject_head_struct *, struct cPersistentObject*);

access reports an object access to the cache, is_ghost checks whether the object is a ghost, ghostify ghostifies the object, unghostify unghostifies it, and unreferenced informs the cache that the last reference to the object has been deleted.
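
The five operations can be mirrored in Python to make their roles concrete, for example as a starting point for a simple test cache. This is only a sketch: FakePersistent, SimpleCache, the load_state callable and the list-based usage record are hypothetical names and choices, not part of the proposed C API.

```python
class FakePersistent:
    """Hypothetical stand-in for a persistent object (not the real class)."""

    def __init__(self, oid):
        self.oid = oid
        self.state = None  # None marks a ghost: state not loaded


class SimpleCache:
    """Illustrative Python analogue of the five C-level operations."""

    def __init__(self, load_state):
        self.load_state = load_state  # hypothetical callable: oid -> state
        self.recently_used = []       # oids, least recently used first
        self.unreferenced_oids = []   # oids reported via unreferenced()

    def access(self, obj):
        # Move the object to the most-recently-used end.
        if obj.oid in self.recently_used:
            self.recently_used.remove(obj.oid)
        self.recently_used.append(obj.oid)

    def is_ghost(self, obj):
        return obj.state is None

    def ghostify(self, obj):
        # Drop the state and forget the object's usage record.
        obj.state = None
        if obj.oid in self.recently_used:
            self.recently_used.remove(obj.oid)

    def unghostify(self, obj):
        # Load the state only if the object is currently a ghost.
        if self.is_ghost(obj):
            obj.state = self.load_state(obj.oid)
        self.access(obj)

    def unreferenced(self, obj):
        # The last reference is gone, so stop tracking the object.
        if obj.oid in self.recently_used:
            self.recently_used.remove(obj.oid)
        self.unreferenced_oids.append(obj.oid)
```

Because persistent would only call these five entry points, a different replacement policy could be substituted without touching the object implementation.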

Towards Python applications, the cache gets a new method updateObjectSizeEstimation(obj, new_estimate), which informs the cache about a new size estimate for the object. This method is called by "ZODB.Connection.Connection" when the object is unghostified or stored. In both cases, the pickle size is passed as the size estimate. updateObjectSizeEstimation allows the cache to keep track of the total size of its objects and avoids computing this size dynamically (during garbage collection).
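
A minimal sketch of this bookkeeping follows, assuming a least-recently-used eviction order. Only the method name updateObjectSizeEstimation comes from the proposal; StubObject, MemoryLimitedCacheSketch, max_estimated_bytes and the body of incrgc are hypothetical, and treating an update as an access is an assumption of the example.

```python
from collections import OrderedDict


class StubObject:
    """Hypothetical stand-in for a persistent object."""

    def __init__(self, oid):
        self._p_oid = oid
        self.ghost = False


class MemoryLimitedCacheSketch:
    """Keeps a running total of size estimates instead of recomputing it."""

    def __init__(self, max_estimated_bytes):
        self.max_estimated_bytes = max_estimated_bytes
        self.total_estimated_size = 0
        # oid -> (object, estimate), least recently used first
        self._entries = OrderedDict()

    def updateObjectSizeEstimation(self, obj, new_estimate):
        # Adjust the total by the difference to the previous estimate.
        _, old_estimate = self._entries.get(obj._p_oid, (None, 0))
        self._entries[obj._p_oid] = (obj, new_estimate)
        # Assumption: a load or store also counts as the latest access.
        self._entries.move_to_end(obj._p_oid)
        self.total_estimated_size += new_estimate - old_estimate

    def incrgc(self):
        # Ghostify least recently used objects until within the limit.
        while (self.total_estimated_size > self.max_estimated_bytes
               and self._entries):
            _, (obj, estimate) = self._entries.popitem(last=False)
            obj.ghost = True
            self.total_estimated_size -= estimate
```

Since the total is maintained incrementally, garbage collection only has to compare one number against the limit rather than sum the estimates of all cached objects.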

The cache stores the size estimate in a read-only persistent object attribute, _p_estimated_size. As a result, persistent objects get larger by 4 bytes.
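
The read-only behaviour can be pictured in Python with a property that has no setter. In the proposal the value would live in a C-level field of the object; the class SizedObjectSketch and the private hook _set_estimated_size are hypothetical and serve only to illustrate that applications can read but not assign the attribute.

```python
class SizedObjectSketch:
    """Hypothetical object exposing a read-only size estimate."""

    __slots__ = ("_estimated_size",)

    def __init__(self):
        self._estimated_size = 0

    @property
    def _p_estimated_size(self):
        # Readable by anyone; no setter is defined, so assignment fails.
        return self._estimated_size

    def _set_estimated_size(self, new_estimate):
        # Hypothetical hook reserved for the cache implementation.
        self._estimated_size = new_estimate


def try_assign(obj, value):
    """Return True if direct assignment to the attribute is rejected."""
    try:
        obj._p_estimated_size = value
    except AttributeError:
        return True
    return False
```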

ZODB.Connection.Connection.__init__ gets a new parameter, cache_factory, used to create the connection's cache. Its default is PickleCache, which preserves the current behaviour. We also implement a factory that creates a memory-size-limited cache.
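
The factory mechanism might look roughly like the sketch below. The proposal does not specify which arguments the factory receives; this example assumes it is called with the connection itself. SketchConnection, ObjectCountCache, MemoryLimitedCache and memory_limited_factory are hypothetical stand-ins, not the real ZODB classes.

```python
class ObjectCountCache:
    """Stand-in for the current cache, limited by object count."""

    def __init__(self, jar, target_size=400):
        self.jar = jar
        self.target_size = target_size


class MemoryLimitedCache:
    """Stand-in for a cache limited by estimated memory size."""

    def __init__(self, jar, max_bytes):
        self.jar = jar
        self.max_bytes = max_bytes


def memory_limited_factory(max_bytes):
    """Return a factory that builds a memory-limited cache."""
    def factory(jar):
        return MemoryLimitedCache(jar, max_bytes)
    return factory


class SketchConnection:
    """Hypothetical connection showing only the cache_factory parameter."""

    def __init__(self, cache_factory=ObjectCountCache):
        # Assumption: the factory is called with the connection (the jar).
        self._cache = cache_factory(self)


default_conn = SketchConnection()
limited_conn = SketchConnection(
    cache_factory=memory_limited_factory(64 * 1024 * 1024))
```

Binding the limit in a closure keeps the connection unaware of cache-specific configuration, which is one possible answer to how factories could be configured.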

Risks

  • It is not yet clear how to optimally construct and configure interesting cache factories.

Deliverables

  • A simple cache implementation in Python -- mainly for testing purposes
  • PickleCache and MemoryLimitedPickleCache implementations, providing the current and the above-mentioned memory-size-based cache replacement policies, respectively.
  • Unit tests
  • Documentation update


