The basic update operations occur during the two-phase commit. The details described here are based on the prototype implementation.

The primary can deliver the basic update operations to the replicas as reliable, FIFO-ordered messages. The primary applies each operation locally; if it succeeds, the primary also broadcasts it to the replicas before the operation completes. Each replica applies operations in the order it receives them. Normally each operation should succeed at a replica, because it has already been validated by the primary.
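A minimal sketch of the apply-then-broadcast rule, assuming a direct method call can stand in for Spread's reliable FIFO broadcast. `Primary`, `Replica`, `LogStorage`, `perform`, and `deliver` are hypothetical names for illustration, not the prototype's code.

```python
class LogStorage:
    """Toy storage (hypothetical) that records the operations applied to it."""

    def __init__(self):
        self.log = []

    def store(self, oid, data):
        if not isinstance(data, bytes):
            raise TypeError("data must be bytes")
        self.log.append(("store", oid, data))


class Replica:
    """Hypothetical replica: applies operations in the order it receives them."""

    def __init__(self, storage):
        self.storage = storage

    def deliver(self, method, args):
        # With FIFO delivery, calls arrive in the primary's order.
        getattr(self.storage, method)(*args)


class Primary:
    """Hypothetical primary: apply locally first, broadcast only on success."""

    def __init__(self, storage, replicas):
        self.storage = storage
        self.replicas = replicas

    def perform(self, method, *args):
        # If the local call raises, nothing is sent to the replicas.
        result = getattr(self.storage, method)(*args)
        # Stand-in for a Spread broadcast; it finishes before perform() returns.
        for replica in self.replicas:
            replica.deliver(method, args)
        return result


primary_storage, replica_storage = LogStorage(), LogStorage()
primary = Primary(primary_storage, [Replica(replica_storage)])
primary.perform("store", 1, b"data")
try:
    primary.perform("store", 2, "not bytes")  # fails locally, never broadcast
except TypeError:
    pass
print(replica_storage.log)  # [('store', 1, b'data')]
```

A replica's `deliver` can still raise even though the primary succeeded; that is where the failures described under "Failures at the replica" would surface.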

There are several issues to resolve.

Message format. We must decide on a message format to identify the operation and its arguments.

Very large messages. Spread limits message size to 100K, but a store operation can include much more data than that. The message format must therefore support splitting a single application message into multiple chunks.

Failures at the replica. While an operation that succeeded at the primary should normally succeed at a replica, it may sometimes fail. The likely causes are programming errors and local resource problems that differ among servers (like disk space). A transient error in a single storage could cause a failure. If different servers run different storage implementations, an error in one implementation could also cause a failure.

Blocking operations. Most networking code in Zope uses asyncore, which puts sockets in non-blocking mode and uses select/poll to avoid making blocking network calls. It is probably not possible to avoid blocking calls with Spread. We can get file descriptors (filenos) to pass to select, but we cannot avoid reading or writing more data than the socket is ready to handle.
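The limitation can be illustrated without Spread itself. In this sketch a `socket.socketpair()` and a 4-byte length prefix stand in for a Spread daemon connection and its framing (both are assumptions, not Spread's wire protocol), and a socket timeout stands in for "this call would block". `select` reports the descriptor readable as soon as any bytes arrive, yet a library that must read a whole message still blocks when only part of it is available.

```python
import select
import socket


def recv_exactly(sock, n):
    """Read exactly n bytes, looping as a message-oriented client library must."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed mid-message")
        buf += chunk
    return buf


def select_then_read_whole_message(timeout=0.5):
    """Return (select_said_readable, whole_message_read_blocked)."""
    sender, receiver = socket.socketpair()
    try:
        # The length prefix announces 10 bytes, but only 3 are sent.
        sender.sendall((10).to_bytes(4, "big") + b"abc")
        readable, _, _ = select.select([receiver.fileno()], [], [], 1.0)
        receiver.settimeout(timeout)  # stand-in for "this call would block"
        try:
            length = int.from_bytes(recv_exactly(receiver, 4), "big")
            recv_exactly(receiver, length)
            blocked = False
        except socket.timeout:
            blocked = True
        return bool(readable), blocked
    finally:
        sender.close()
        receiver.close()


print(select_then_read_whole_message())  # (True, True)
```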

Message format

An application message consists of one or more message fragments. Messages are fragmented if they do not fit into a single Spread message.

Each fragment (that is, each Spread message) has a five-byte header:

- byte 0: protocol version number
- bytes 1-2: fragment offset (starts at zero)
- bytes 3-4: total number of fragments (starts at one)
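A minimal sketch of packing and unpacking this header with the standard `struct` module. The byte order is not specified above, so big-endian (network order) is an assumption, and `PROTOCOL_VERSION = 1` is a hypothetical value.

```python
import struct

# version (1 byte), fragment offset (2 bytes), fragment count (2 bytes);
# ">" (big-endian, no padding) is an assumption of this sketch.
HEADER = struct.Struct(">BHH")
PROTOCOL_VERSION = 1  # hypothetical


def pack_header(offset, count, version=PROTOCOL_VERSION):
    """Build the 5-byte header for fragment `offset` of `count`."""
    if not 0 <= offset < count <= 0xFFFF:
        raise ValueError("need 0 <= offset < count <= 65535")
    return HEADER.pack(version, offset, count)


def unpack_header(fragment):
    """Split a fragment into (version, offset, count, body)."""
    version, offset, count = HEADER.unpack_from(fragment)
    return version, offset, count, fragment[HEADER.size:]


print(HEADER.size)                                 # 5
print(unpack_header(pack_header(2, 3) + b"body"))  # (1, 2, 3, b'body')
```

Because the count field is two bytes, a single application message can span at most 65,535 fragments.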

The application message is the concatenation of the fragment bodies, in offset order. Its format is a pickled two-tuple: the first element is a method name (a string) and the second element is the argument tuple for the method.
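A sketch of the full encode/fragment/reassemble cycle, under the same assumptions as the header sketch (big-endian header, hypothetical version 1). It also assumes "100K" means 100 * 1024 bytes; the exact Spread limit should be checked. `encode` and `decode` are hypothetical names.

```python
import pickle
import struct

HEADER = struct.Struct(">BHH")   # assumed big-endian: version, offset, count
VERSION = 1                      # hypothetical protocol version
MAX_SPREAD_MESSAGE = 100 * 1024  # "100K"; exact limit is an assumption


def encode(method, args, limit=MAX_SPREAD_MESSAGE):
    """Pickle (method, args) and split it into fragments, each with a header."""
    payload = pickle.dumps((method, tuple(args)))
    room = limit - HEADER.size
    count = max(1, -(-len(payload) // room))  # ceiling division
    if count > 0xFFFF:
        raise ValueError("message needs more fragments than the header allows")
    return [
        HEADER.pack(VERSION, i, count) + payload[i * room:(i + 1) * room]
        for i in range(count)
    ]


def decode(fragments):
    """Reassemble fragments by offset and unpickle (method, args)."""
    parts, count = {}, None
    for fragment in fragments:
        version, offset, total = HEADER.unpack_from(fragment)
        if version != VERSION:
            raise ValueError("unsupported protocol version %d" % version)
        count = total
        parts[offset] = fragment[HEADER.size:]
    if count is None or sorted(parts) != list(range(count)):
        raise ValueError("missing fragments")
    # Only safe because the pickle comes from the trusted primary.
    return pickle.loads(b"".join(parts[i] for i in range(count)))


data = b"x" * 250_000
fragments = encode("store", (b"\0" * 8, data))
print(len(fragments))  # 3
```

Spread's FIFO delivery should already present fragments in order; indexing by offset simply keeps `decode` from depending on that.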

XXX This format is inefficient for large updates, because the object pickle (the data argument to the store call) is copied when the message is pickled.
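The copy is easy to observe: the data bytes appear verbatim inside the pickled message, so while the message is being built both full-size buffers are in memory. This sketch uses a hypothetical `store_message` helper and a hypothetical 8-byte oid.

```python
import pickle


def store_message(data):
    """Build the (method, args) message for a store call (hypothetical helper)."""
    oid = b"\0" * 8  # hypothetical 8-byte object id
    return pickle.dumps(("store", (oid, data)))


data = b"x" * 10_000_000  # stands in for a large object pickle
message = store_message(data)

# The data is embedded verbatim: a second full-size copy of it.
print(data in message)                 # True
# Pickling overhead is small; the duplicated data is what costs memory.
print(len(message) - len(data) < 100)  # True
```

Slicing the message into fragments with ordinary `bytes` slices copies the data once more.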


