Releases
This page is old. It was last updated before work began on ZODB 3.1. It's best viewed as a historical view of the future.
Zope
ZODB will be integrated with Zope. We need to coordinate the Zope and ZODB development and release schedules.
StandaloneZODB
We will also release ZODB plus select components for use in Python programs that don't use Zope.
Major Components
- ZODB: persistence + transactions
- Storages: FileStorage, bsddb3Storage, ReplicatedStorage, OracleStorage
- ZEO: client-server protocol (1.0)
Development Projects
This is a mostly unordered list of development projects. This document should be extended with ballpark time estimates (perhaps just small, medium, or large), an assessment of which goals each project addresses, and eventually assigned priorities.
Documentation
We need documentation for end users, and also for developers, who need to know how proposed changes affect user-visible APIs. New features, such as transactional undo and conflict resolution, are not well documented.
- End-user documentation
- Reference/API documentation
Instrumentation
To assess ZODB scalability, we need to understand how important applications actually use ZODB, and to produce application traces and benchmarks that can guide development.
The first step is to provide the right instrumentation for ZODB and ZEO. We probably want to measure things like the ZEO cache hit rate, the size of committed objects, and the frequency of conflict errors. The project ought to decide which things are important to measure and provide the code to measure them.
There are plans underway to develop some Zope benchmarks, which could be run with instrumentation turned on and then used to guide optimization.
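A minimal sketch of the bookkeeping such instrumentation might do. The `Stats` class and its method names are hypothetical, not part of ZODB or ZEO; the idea is that the cache lookup, commit, and conflict-handling paths would each call one method, and a benchmark run would read the summary numbers at the end.

```python
from collections import Counter


class Stats:
    """Hypothetical instrumentation counters for a ZEO client (illustrative only)."""

    def __init__(self):
        self.counts = Counter()
        self.committed_sizes = []  # size in bytes of each object committed

    def cache_lookup(self, hit):
        # Record one ZEO cache lookup as a hit or a miss.
        self.counts["cache_hit" if hit else "cache_miss"] += 1

    def object_committed(self, data):
        self.committed_sizes.append(len(data))

    def conflict_error(self):
        self.counts["conflict"] += 1

    def hit_rate(self):
        lookups = self.counts["cache_hit"] + self.counts["cache_miss"]
        return self.counts["cache_hit"] / lookups if lookups else 0.0

    def mean_object_size(self):
        sizes = self.committed_sizes
        return sum(sizes) / len(sizes) if sizes else 0.0


stats = Stats()
for hit in (True, True, True, False):
    stats.cache_lookup(hit)
stats.object_committed(b"x" * 100)
stats.object_committed(b"y" * 300)
stats.conflict_error()
print(stats.hit_rate(), stats.mean_object_size(), stats.counts["conflict"])
# prints 0.75 200.0 1
```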
Testing
We have a minimal test suite for storages, and developing it uncovered many bugs. A good test suite also speeds development of new projects by catching bugs quickly. Much of the work on a good test suite can proceed as part of individual project development, but we should devote some thought to the parts of ZODB that are currently untested.
Multiversion Concurrency Control
Improve the ZEO infrastructure for handling invalidations, to avoid read conflict errors in cases where the read would not lead to inconsistency. MVCC would exploit the fact that we keep multiple revisions, letting read-only transactions operate on old revisions.
This change has implications for caching in ZODB and ZEO, because each cache must associate a timestamp or revision number with each object it stores. It also affects the storage API, because clients will need to ask for the most recent revision before time T.
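The storage-API change can be illustrated with a toy in-memory store. This is a sketch under stated assumptions: `RevisionStore`, `store` and `load_before` are hypothetical names, timestamps are plain integers, and commits arrive in increasing timestamp order. A read-only transaction remembers its start time and reads every object as of that time, so a later commit never invalidates what it has already read.

```python
import bisect


class RevisionStore:
    """Toy multiversion store that keeps every revision of every object."""

    def __init__(self):
        self._tids = {}  # oid -> ascending list of commit timestamps
        self._data = {}  # oid -> revision data, parallel to self._tids[oid]

    def store(self, oid, tid, data):
        # Assumes commits arrive in increasing tid order, like a serial commit log.
        self._tids.setdefault(oid, []).append(tid)
        self._data.setdefault(oid, []).append(data)

    def load_before(self, oid, t):
        """Return (tid, data) for the newest revision of oid committed before t, or None."""
        tids = self._tids.get(oid, [])
        i = bisect.bisect_left(tids, t)  # count of revisions with tid < t
        if i == 0:
            return None
        return tids[i - 1], self._data[oid][i - 1]


store = RevisionStore()
store.store("A", 10, "a1")
store.store("B", 10, "b1")
start = 15                  # a read-only transaction begins at time 15
store.store("A", 20, "a2")  # a writer commits a new revision of A afterwards
# The reader still sees a consistent snapshot instead of getting a conflict error.
print(store.load_before("A", start), store.load_before("B", start))
# prints (10, 'a1') (10, 'b1')
```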
Separate persistence from transactions
Make a cleaner separation between the components of ZODB, namely persistence and transactions. This project will probably make ZODB useful to more people, and it provides an opportunity to clean up the overall architecture.
ORMapping
The chief goal here would be interoperation with other enterprise systems that use an RDBMS to store data. The current Connection object allows a database to use ZODB storages as a repository for persistent objects. The ORMapping project would use a different connection that uses an RDBMS as the repository for persistent objects.
Shane Hathaway has an ORMappingDB proposal in the fishbowl.
Query and index support
Allow automatic indexing of objects, a feature frequently requested by ZODB users. Perhaps the concrete form of the project would be to integrate the catalog more closely with the transaction mechanism.
A fair amount of research is needed to figure out how to attack this problem and to see what previous work we can build on. It might be good, for example, to build on OQL, which is part of the ODMG Object Data Standard, but I don't know enough about it to make an informed decision.
Custom packing policies
It is expensive to maintain revisions for every update to every object. A mixed-mode storage would allow different objects to have different revision policies. For example, individual objects could use policies such as: keep all revisions (the current policy), keep one revision, keep enough revisions to support undo, or keep landmark revisions.
There are already some notes on MixedModeStorage.
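Per-object revision policies can be modelled as functions from an object's revision history to the revisions a pack should keep. A sketch under stated assumptions: revisions are identified by integer commit timestamps, "enough revisions to undo" is interpreted as everything at or after a chosen undo horizon plus the revision current just before it, and the policy functions and `pack` helper are illustrative names, not the MixedModeStorage design.

```python
from typing import Callable, Dict, List

# A revision history is a list of integer commit timestamps, oldest first.
Policy = Callable[[List[int], int], List[int]]


def keep_all(tids, horizon):
    # The current policy: pack never discards a revision.
    return list(tids)


def keep_one(tids, horizon):
    # Only the current revision survives a pack.
    return tids[-1:]


def keep_undoable(tids, horizon):
    # Keep every revision at or after the undo horizon, plus the revision that
    # was current just before it, so any transaction since the horizon can be undone.
    older = [t for t in tids if t < horizon]
    newer = [t for t in tids if t >= horizon]
    return older[-1:] + newer


def make_keep_landmarks(landmarks):
    # Keep designated landmark revisions plus the current revision.
    def policy(tids, horizon):
        kept = [t for t in tids if t in landmarks]
        if tids and tids[-1] not in landmarks:
            kept.append(tids[-1])
        return kept
    return policy


def pack(histories: Dict[str, List[int]], policies: Dict[str, Policy],
         horizon: int, default: Policy = keep_all) -> Dict[str, List[int]]:
    # Each object is packed under its own policy; unlisted objects use the default.
    return {oid: policies.get(oid, default)(tids, horizon)
            for oid, tids in histories.items()}


histories = {"counter": [1, 2, 3, 4, 5], "doc": [1, 3, 5],
             "log": [2, 4, 6], "page": [1, 2]}
policies = {"counter": keep_one, "doc": keep_undoable,
            "log": make_keep_landmarks({2})}
result = pack(histories, policies, horizon=4)
print(result)
# prints {'counter': [5], 'doc': [3, 5], 'log': [2, 6], 'page': [1, 2]}
```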
Allow use of foreign TP monitor
Allow ZODB transactions to be controlled by an external TP monitor, such as Tuxedo.
Concurrent garbage collection
The current garbage collection scheme (pack) uses an explicit pack operation that scans the entire storage. Two important changes that could improve performance and ease management are automatic storage packing (perhaps combined with custom packing policies) and incremental collection that scans part of the storage at each pack.
Some or all of this has already been implemented for Berkeley storage, but I'm not clear on the integration and testing status.
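Incremental collection can be illustrated as a mark-and-sweep that does a bounded amount of marking per call. This toy sketch assumes the storage's current object graph fits in a dict mapping each oid to the oids it references, and it ignores two hard parts: objects written while a cycle is in progress (a real collector would need to treat them as live, e.g. with a write barrier) and removal of old revisions, which pack also performs. `IncrementalCollector` and `step` are hypothetical names.

```python
class IncrementalCollector:
    """Toy incremental mark-and-sweep over an object graph (oid -> referenced oids)."""

    def __init__(self, graph, root):
        self.graph = graph
        self.root = root
        self._start_cycle()

    def _start_cycle(self):
        self.marked = set()
        self.pending = [self.root]

    def step(self, budget):
        """Do at most `budget` units of marking work. When marking is complete,
        sweep unreachable objects, start a new cycle, and return the removed oids."""
        while self.pending and budget > 0:
            oid = self.pending.pop()
            budget -= 1
            if oid in self.marked:
                continue
            self.marked.add(oid)
            self.pending.extend(self.graph.get(oid, ()))
        if self.pending:
            return set()  # marking not finished; resume on the next call
        garbage = set(self.graph) - self.marked
        for oid in garbage:
            del self.graph[oid]
        self._start_cycle()
        return garbage


graph = {"root": ["a"], "a": ["b"], "b": [], "orphan": ["b"]}
collector = IncrementalCollector(graph, "root")
first = collector.step(budget=2)   # marks root and a, then stops
second = collector.step(budget=2)  # marks b, finishes, and sweeps
print(first, second)               # prints set() {'orphan'}
```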
Replicated Storage
Improve availability by replicating the storage using the ZEO client-server protocol. There is an existing fishbowl project, which needs to be finished.
ZEO2
There are two phases to the ZEO2 project. Phase 1 should be finished ASAP. Phase 2 planning still needs to be done. The following items apply to ZEO2 and should be integrated into the ZEO2 proposal.
ZEO cache management
The ZEO cache management scheme can cause some objects to be discarded from the cache even though they are frequently used. (Details: the cache uses two files and periodically throws one of them away, even if objects in that file are used frequently.) The storage server does not have any caching: each request is served by requesting the data from the storage, often involving a disk access. We may be able to improve performance and fault tolerance by developing better cache schemes on the client and adding a second level of caching on the server.
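The problem can be reproduced with a simplified model. This sketch assumes that a hit in the older file is not copied into the current one and that both files hold the same number of objects; the real ZEO client cache stores data on disk and differs in detail. An LRU cache of the same total size is shown as one possible alternative, not as a scheme ZEO has adopted. `TwoFileCache` and `LRUCache` are illustrative names.

```python
from collections import OrderedDict


class TwoFileCache:
    """Simplified two-file cache: new entries go to the current file; when it
    fills, the other file is discarded wholesale and the roles swap."""

    def __init__(self, capacity_per_file):
        self.capacity = capacity_per_file
        self.current, self.other = {}, {}

    def get(self, oid):
        if oid in self.current:
            return self.current[oid]
        return self.other.get(oid)  # a hit here is NOT copied forward

    def put(self, oid, data):
        if len(self.current) >= self.capacity:
            # Flip: drop the older file, however hot its objects are.
            self.other, self.current = self.current, {}
        self.current[oid] = data


class LRUCache:
    """Alternative policy: evict the least recently used object instead."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, oid):
        if oid not in self.entries:
            return None
        self.entries.move_to_end(oid)  # mark as most recently used
        return self.entries[oid]

    def put(self, oid, data):
        self.entries[oid] = data
        self.entries.move_to_end(oid)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used


two = TwoFileCache(capacity_per_file=2)  # 4 objects in total
lru = LRUCache(capacity=4)
for cache in (two, lru):
    cache.put("root", "R")
    for i in range(4):
        cache.get("root")  # "root" is read constantly...
        cache.put(f"cold{i}", i)  # ...while cold objects stream through
print(two.get("root"), lru.get("root"))  # prints None R
```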
Increased ZEO concurrency
Allow multiple transactions to commit at one time, increasing overall throughput.
ZEO2 backwards compatibility
The ZEO2 project (which already has a project page) makes several simplifications to ZEO to ease future development and to anticipate the needs of replicated storage. One outstanding issue is backwards compatibility with the ZEO1 protocol. This could be tackled as a separate project to serve users who want to migrate incrementally from ZEO1 to ZEO2.
There may be few users who really care about this problem, which may mean that we don't get to it.
Integrated temporary buffers and storage
In several cases, a transaction's data is stored in a temporary file (often called a log in ZODB) because the transaction may generate a very large update that would be expensive to keep in memory. In ZEO, this can involve multiple copies of the data on disk, because the ZEO server and the underlying storage may both use a temp file. We could optimize transactions by using a single temporary buffer that is shared between server and storage, and by keeping the data in memory for small transactions.
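Python's standard `tempfile.SpooledTemporaryFile` already provides the "keep small data in memory, spill large data to disk" behaviour. A minimal sketch, assuming a hypothetical `TransactionBuffer` that the ZEO server would fill and the storage would read at commit time, so the data is written at most once; the record layout and the threshold are illustrative choices, not ZODB's actual transaction log format.

```python
import tempfile


class TransactionBuffer:
    """Hypothetical buffer for one transaction's object records, shared by
    the ZEO server and the storage instead of each keeping its own temp file."""

    def __init__(self, max_in_memory=1 << 20):
        # Kept in memory until more than max_in_memory bytes are written,
        # after which SpooledTemporaryFile transparently moves it to disk.
        self._file = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
        self._count = 0

    def append(self, oid, data):
        # Record layout: 8-byte oid, 4-byte big-endian length, then the data.
        if len(oid) != 8:
            raise ValueError("oid must be 8 bytes")
        self._file.write(oid)
        self._file.write(len(data).to_bytes(4, "big"))
        self._file.write(data)
        self._count += 1

    def records(self):
        # Replay all records, e.g. when the storage applies the transaction.
        self._file.seek(0)
        out = []
        for _ in range(self._count):
            oid = self._file.read(8)
            size = int.from_bytes(self._file.read(4), "big")
            out.append((oid, self._file.read(size)))
        self._file.seek(0, 2)  # back to the end so later appends don't overwrite
        return out

    def close(self):
        self._file.close()


buf = TransactionBuffer(max_in_memory=64)
buf.append(b"\0" * 7 + b"\x01", b"small")
buf.append(b"\0" * 7 + b"\x02", b"x" * 100)  # total now exceeds 64 bytes: spills to disk
recs = buf.records()
buf.close()
print([(oid[-1], len(data)) for oid, data in recs])  # prints [(1, 5), (2, 100)]
```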
ZODB-CORBA integration
John Heintz of Isogen is working on an integration project for ZODB and CORBA. We're not sure of the details, but this is a project where our stewardship and occasional development support may help. It is an important project to support.
Per-transaction isolation levels
Most commercial databases provide multiple isolation levels to allow users to trade off consistency against performance. ZODB has only one isolation level. It should be possible to specify different isolation levels on a per-transaction basis to change the policy.
Python 2.1/2.2 integration *
If Python 2.2 includes the type-class unification, we'll need to do some work to bring ExtensionClass and ZODB-based extension classes up to speed with Python 2.2.
Zope 2.4 will require Python 2.1, which gives us an opportunity to consider using Python 2.x-isms in future releases of ZODB. We need to assess how many users still need Python 1.5.2 support.
ZEO connection security
A ZEO server allows clients to connect using TCP sockets, but has no way to authenticate the connecting entity. As a result, an attacker could connect to the server and modify the database. Current practice is to use ZEO in a carefully controlled environment, e.g. behind a firewall.
Strong authentication of ZEO clients would allow more flexible use of ZEO. Clients in different locations, e.g. on the east and west coasts, could use the same server to serve content. Even inside a firewall, authentication could eliminate accidental misuse of a server.
There are a variety of authentication approaches. SSL would allow client and server to authenticate each other. Within a single administrative domain, Kerberos could be useful.
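For the SSL approach, mutual authentication means the server requires a certificate from every client and the client verifies the server's certificate. A minimal sketch using Python's standard `ssl` module, assuming certificates issued by a private CA; the file-name arguments are placeholders, and hooking these contexts into ZEO's socket handling is not shown.

```python
import ssl


def make_server_context(certfile=None, keyfile=None, client_ca_file=None):
    """ZEO server side: present our certificate and require one from every client."""
    # A bare PROTOCOL_TLS_SERVER context trusts no CAs until told to, so only
    # clients signed by client_ca_file will be accepted.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails for clients without a valid cert
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)  # CA that signs client certs
    return ctx


def make_client_context(certfile=None, keyfile=None, server_ca_file=None):
    """ZEO client side: verify the server (hostname included) and present our own cert."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_ca_file)
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


# In real use, pass paths such as "server.pem", "server.key", "clients-ca.pem".
server_ctx = make_server_context()
client_ctx = make_client_context()
print(server_ctx.verify_mode == ssl.CERT_REQUIRED, client_ctx.check_hostname)  # prints True True
```

A server would then wrap each accepted socket with `server_ctx.wrap_socket(sock, server_side=True)`, and a client would use `client_ctx.wrap_socket(sock, server_hostname=host)`.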
ZODB database security
This complements ZEO connection security, but provides security services for applications; it is probably more useful for StandaloneZODB than for Zope. This security system would allow per-user restrictions on access to objects.
ZODB quotas
There are two versions of this project. One is a fast, easy-to-implement version that gives coarse control over database use, sufficient for preventing gross abuse. The other would track object ownership and account for resources more accurately.
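The coarse version could be as simple as a per-user cap on the bytes each user has committed. A sketch, assuming the server knows which user is committing and the size of each object record; `CoarseQuota` and `QuotaExceeded` are hypothetical names. It never credits space back after a pack, which is what makes it coarse; the more accurate version would instead charge each object's current size to its owner.

```python
class QuotaExceeded(Exception):
    pass


class CoarseQuota:
    """Hypothetical coarse quota: a per-user cap on total bytes committed."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = {}  # user -> bytes committed so far (never decreases)

    def check_commit(self, user, object_sizes):
        # Called before a commit is accepted; rejects the whole transaction if over quota.
        total = self.used.get(user, 0) + sum(object_sizes)
        if total > self.limit:
            raise QuotaExceeded(f"{user}: {total} bytes exceeds limit of {self.limit}")
        self.used[user] = total


q = CoarseQuota(limit_bytes=1000)
q.check_commit("alice", [400, 300])
try:
    q.check_commit("alice", [400])
except QuotaExceeded as e:
    print(e)  # prints alice: 1100 bytes exceeds limit of 1000
```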
Better version support
It's not clear what the project is, but many people complain that versions are hard to use.
Enhanced file storage
Berkeley storage seems to be the platform for new features and development. The basic file storage, however, is two or three times faster (and could probably be optimized further). Berkeley-based solutions also have limitations, such as the limit on the number of objects any one transaction can commit. It may be useful to spend time on a FileStorage solution that provides the enterprise-class feature set of Berkeley storage without these limitations.
This is a high-risk project, because it took a lot of development work and debugging to get FileStorage to its current state. There is some benefit to pushing reliability concerns down to the BerkeleyDB layer.
asyncore enhancements
The use of asyncore with ZEO is a primary performance bottleneck; a rough guess is that 50% of the time is spent in asyncore and smac handlers. Investigate improving the performance of asyncore, e.g. by using the efficient poll() implementation in Python 2.0 and/or moving some functions to C.
We need to find out what the current state of medusa development is.
- michel (Jun 1, 2001 4:09 pm; Comment #1)
- Good job, very comprehensive roadmap. One comment on the documentation section: conflict resolution (and avoidance techniques) are documented in the persistence chapter of the dev guide. This venue should be considered for the other missing features noted (such as transactional undo) and for future features. There may need to be other venues for topics that are very advanced and ZODB-specific (ones not typically a concern for the day-to-day Zope developer).
- jheintz (Jul 25, 2001 2:13 pm; Comment #2)
- Update on the ZODB-CORBA integration. I have released ZCF 0.5 which contains our latest code and an example server for people to take a look at.
This release exposes a Server and multiple Sessions where the Sessions expose CORBA requests to navigate ZODB objects from a DB Connection.
See http://www.zope.org/Members/jheintz/ZODB_CORBA_Connection for the release.