Motivation
Currently, a ZODB cache can only be limited by the number of objects it contains. Because object sizes can vary without bound, this gives very imprecise control over the main memory used by the cache. Either we are conservative and assume the worst-case object size, at the cost of poor resource utilization in the typical case, or we are optimistic and assume small objects, at the risk of overwhelming our memory resources.
Being able to limit the cache size based on the memory actually used would give us much better control over memory utilization.
Unfortunately, it is not easy to determine the true memory size of Python objects. Therefore, we use the pickle size as a (very) rough estimate of the memory size. This estimate is often very optimistic, but it is already much better than counting each object as 1.
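To make the difference between counting objects and estimating their size concrete, here is a minimal sketch. It assumes that a plain pickle.dumps call is an acceptable stand-in for the database record that ZODB actually writes; the helper name estimated_size is hypothetical and not part of the proposal.

```python
import pickle


def estimated_size(obj):
    """Rough size estimate (hypothetical helper): length of the pickle in bytes."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


small = {"title": "note"}
large = {"title": "report", "body": "x" * 100_000}

# An object-count limit treats both objects as equally expensive ...
object_count = len([small, large])

# ... while the pickle-based estimate tells them apart.
small_size = estimated_size(small)
large_size = estimated_size(large)
```

The estimate ignores per-object interpreter overhead, which is one reason it tends to be optimistic.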
Currently, the cache and persistent implementations are
tightly coupled: each one uses internal implementation
details of the other. In order to be able to choose between
different cache implementations (the current one, one
based on memory size estimates, one which collects garbage
not only during incrgc but as soon as the limits are exceeded, ...),
the proposal gives the cache an abstract API used by
the persistent implementation. Any cache implementation that implements
the API can be used together with persistent.
Solution Outline
The cache is described with respect to persistent
by a generic cache object header and
the following (C-level) API functions accessible via the header:
void (*access)(struct ccobject_head_struct *, struct cPersistentObject*);
int (*is_ghost)(struct ccobject_head_struct *, struct cPersistentObject*);
void (*ghostify)(struct ccobject_head_struct *, struct cPersistentObject*);
void (*unghostify)(struct ccobject_head_struct *, struct cPersistentObject*);
void (*unreferenced)(struct ccobject_head_struct *, struct cPersistentObject*);
access reports an object access to the cache,
is_ghost checks whether the object is a ghost,
ghostify ghostifies the object, unghostify unghostifies it,
and unreferenced informs the cache about the deletion of
the last reference to the object.
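The five hooks above are C function pointers reached through the cache header. As an illustration only, the same contract can be sketched in Python. The names AbstractCache, ToyObject and ToyLRUCache, the loader callable and the list-based LRU bookkeeping are all assumptions made for this sketch, not part of the proposal.

```python
from abc import ABC, abstractmethod


class AbstractCache(ABC):
    """Hypothetical Python analogue of the five C-level cache hooks."""

    @abstractmethod
    def access(self, obj):
        """Report an access to obj."""

    @abstractmethod
    def is_ghost(self, obj):
        """Return True if obj is a ghost."""

    @abstractmethod
    def ghostify(self, obj):
        """Turn obj into a ghost."""

    @abstractmethod
    def unghostify(self, obj):
        """Give obj its state back."""

    @abstractmethod
    def unreferenced(self, obj):
        """Note that the last reference to obj has gone away."""


class ToyObject:
    def __init__(self, oid):
        self.oid = oid
        self.state = None  # None stands for "ghost" in this toy model


class ToyLRUCache(AbstractCache):
    def __init__(self, loader):
        self.loader = loader       # assumed callable: oid -> state
        self.recently_used = []    # oids, least recently used first

    def _forget(self, obj):
        if obj.oid in self.recently_used:
            self.recently_used.remove(obj.oid)

    def access(self, obj):
        self._forget(obj)
        self.recently_used.append(obj.oid)

    def is_ghost(self, obj):
        return obj.state is None

    def ghostify(self, obj):
        obj.state = None
        self._forget(obj)

    def unghostify(self, obj):
        if self.is_ghost(obj):
            obj.state = self.loader(obj.oid)
        self.access(obj)

    def unreferenced(self, obj):
        self._forget(obj)


cache = ToyLRUCache(loader=lambda oid: {"oid": oid})
a, b = ToyObject("a"), ToyObject("b")
cache.unghostify(a)
cache.unghostify(b)
cache.access(a)      # a becomes the most recently used object
cache.ghostify(b)    # b loses its state and leaves the LRU list
```

Because the objects only ever talk to the cache through these five methods, a different replacement policy can be swapped in without touching ToyObject, which is the decoupling the proposal is after.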
For Python applications, the cache gets a new method
updateObjectSizeEstimation(obj, new_estimate). It is used to inform the
cache about a new size estimate for the object. This method
is called by "ZODB.Connection.Connection"
when the object is unghostified or stored. In both cases,
the pickle size is passed as the size estimate.
updateObjectSizeEstimation allows the cache to keep track
of the total size of its objects and avoids computing this
size dynamically (during garbage collection).
The cache stores the size estimate in a read-only persistent
object attribute _p_estimated_size. As a result, each persistent
object grows by 4 bytes.
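The bookkeeping described above can be sketched as follows. This is a toy model under stated assumptions: ToyPersistent and ToySizeTrackingCache are hypothetical classes, _p_estimated_size is an ordinary writable attribute here (the proposal makes it read-only for applications), and the attribute total_estimated_size is an invented name for the running total.

```python
class ToyPersistent:
    """Stand-in for a persistent object (hypothetical)."""

    def __init__(self, oid):
        self._p_oid = oid
        self._p_estimated_size = 0  # no estimate reported yet


class ToySizeTrackingCache:
    """Keeps a running total so no pass over all objects is needed."""

    def __init__(self):
        self.total_estimated_size = 0

    def updateObjectSizeEstimation(self, obj, new_estimate):
        # Adjust the total by the difference between new and old estimate,
        # then remember the new estimate on the object itself.
        self.total_estimated_size += new_estimate - obj._p_estimated_size
        obj._p_estimated_size = new_estimate


cache = ToySizeTrackingCache()
doc, index = ToyPersistent("doc"), ToyPersistent("index")

cache.updateObjectSizeEstimation(doc, 120)    # doc unghostified, pickle is 120 bytes
cache.updateObjectSizeEstimation(index, 50)   # index unghostified
cache.updateObjectSizeEstimation(doc, 300)    # doc stored again with a larger pickle
```

Each call is constant-time, so the total is always available when the cache needs to decide whether it is over its limit.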
ZODB.Connection.Connection.__init__ gets a new parameter
cache_factory used to create the connection's
cache. Its default is PickleCache to get the current behaviour.
We also implement a factory that creates a memory-size-limited cache.
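How a factory might plug in can be sketched with toy classes. Everything below is an assumption for illustration: the proposal does not say which arguments the factory receives (this sketch calls it with none), and ToyConnection, ToyMemoryLimitedCache, max_estimated_size and the least-recently-used eviction policy are hypothetical.

```python
from collections import OrderedDict
from functools import partial


class ToyObject:
    def __init__(self, name):
        self.name = name
        self.ghost = False


class ToyMemoryLimitedCache:
    """Ghostifies least recently updated objects once the limit is exceeded."""

    def __init__(self, max_estimated_size):
        self.max_estimated_size = max_estimated_size
        self._sizes = OrderedDict()  # obj -> estimate, oldest entry first
        self.total_estimated_size = 0

    def updateObjectSizeEstimation(self, obj, new_estimate):
        old = self._sizes.pop(obj, 0)
        self._sizes[obj] = new_estimate  # re-inserted as the newest entry
        self.total_estimated_size += new_estimate - old
        self._evict()

    def _evict(self):
        # Evict immediately instead of waiting for a later GC pass,
        # but always keep the entry that was just updated.
        while (self.total_estimated_size > self.max_estimated_size
               and len(self._sizes) > 1):
            victim, size = self._sizes.popitem(last=False)
            victim.ghost = True
            self.total_estimated_size -= size


class ToyConnection:
    def __init__(self, cache_factory):
        # Assumption: the factory is called without arguments.
        self.cache = cache_factory()


conn = ToyConnection(
    cache_factory=partial(ToyMemoryLimitedCache, max_estimated_size=1000)
)
a, b, c = ToyObject("a"), ToyObject("b"), ToyObject("c")
conn.cache.updateObjectSizeEstimation(a, 400)
conn.cache.updateObjectSizeEstimation(b, 500)
conn.cache.updateObjectSizeEstimation(c, 300)  # total would be 1200: a is evicted
```

Binding the limit with functools.partial keeps the connection unaware of any policy-specific parameters, which is one possible answer to the configuration question raised under Risks.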
Risks
- It is not yet clear how best to construct and configure useful cache factories.
Deliverables
- A simple cache implementation in Python -- mainly for testing purposes
- PickleCache and MemoryLimitedPickleCache implementations, implementing the current and the above-mentioned memory-size-based cache replacement policy, respectively
- Unit tests
- Documentation update