FileSystemSynchronizationProposal

File-System Synchronization

Status: IsProposal (much is now implemented; see status below)

Author

JimFulton

Problem

Zope 2 has two different development models: through-the-web (TTW) and file-system (FS). Each has its advantages and disadvantages.

TTW development is convenient. Code can be developed from anywhere with just a web browser or an FTP or WebDAV client. It isn't necessary to restart Zope to see the effect of changes. Software changes are automatically replicated transactionally using ZEO. All TTW code is untrusted, which is a blessing and a curse. It's difficult to ignore security, but security checks slow down execution and development. A major problem with TTW development is that it's difficult or impossible to use third-party development tools that rely on file-system storage of source code.

FS development makes it easy to leverage traditional development tools. Because file-system code is trusted, it's easier to ignore security, and FS code runs faster because it skips many security checks. For Python programmers, FS development provides a more familiar, module-based model.

It is difficult to choose which model to use. Making matters worse, it is extremely difficult to switch between TTW and FS development.

Proposed solution

A key aspect of TTW development is ZODB management of the software. ZODB management of the software and TTW development can be decoupled. It isn't necessary for ZODB-managed software to be untrusted if it isn't edited TTW. The level of trust of TTW code might vary depending on the nature of the web connection. The level of trust should be based on a flexible policy.

The main idea of this proposal is to provide a unified development model based on software that executes from the ZODB but that can be easily moved between the ZODB and the file-system through a synchronization process. This development model includes some key features:

  • Lossless synchronization between the ZODB and the file system
  • Flexible policies for TTW editability and execution trust. Note that Zope 3's component architecture already provides the former. Security proxies, for better or worse, make the difference between trusted and untrusted execution less significant.
  • Persistent modules are Python modules that can be stored in the ZODB. These will provide a familiar development model. They will provide transactional update semantics, making it unnecessary to restart Zope when module source is updated.

This proposal focusses on file-system synchronization. Persistent modules are an ongoing project at PythonLabs.

Actor, Goals, and Use Cases

Actor: Developer (this is an abstraction of Site Developer and Component Developer)

Goals and use cases

  • Use familiar development tools

    These tools may include version control systems, editors, and file-analysis utilities (e.g. grep). Some of these tools, especially the version control system, may be mandated by the developer's organization.

    • Copy Zope objects to the file system

      Use case: Check out one or more Zope objects to a directory

      If the objects had been checked out before, merge changes from the object database into the checked out files.

      Provide a simple report that shows affected files.

    • Copy files from the file system to Zope objects

      Use case: Check in one or more Zope objects from files in a directory

      If any of the objects in the database have changed since the last checkout, then report conflicts and abort the checkin.

      An option can be used to force data on the file system to override the database.

    • Identify and resolve conflicting updates.

      Use case: Find and resolve conflicts

      A tool can be used to identify files that contain unresolvable conflicts.

    • Find out what's changed in Zope or on the file system since the last synchronization.

      Use case: Generate summary and detailed status reports
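
The check-in rule described above (report conflicts and abort unless forced) might be sketched as follows. This is only an illustration under stated assumptions: the names checkin and ConflictError, the dictionary standing in for the object database, and the (original, working) pairs are all hypothetical and are not part of the proposed tool.

```python
class ConflictError(Exception):
    """Raised when the database changed since the last checkout (hypothetical)."""


def checkin(entries, database, force=False):
    """Copy working bodies into the database, all or nothing.

    entries maps a name to an (original, working) pair of bodies, where
    original is the body recorded at the last checkout.
    database is a plain dict standing in for the object database.
    Returns the sorted list of names that were in conflict.
    """
    # A conflict exists when the database no longer matches the original.
    conflicts = sorted(
        name
        for name, (original, _working) in entries.items()
        if database.get(name) != original
    )
    if conflicts and not force:
        # Abort before anything is written.
        raise ConflictError(conflicts)
    for name, (_original, working) in entries.items():
        database[name] = working
    return conflicts
```

With force=True the file-system data overrides the database even for the conflicting names; without it, nothing is written when any conflict is found.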

Approach

A tool will be written that uses IObjectFile and IObjectDirectory interfaces to serialize objects as files:

      class IObjectEntry(Interface):
        """File-system object representation
        """

        def extra():
            """Return extra data for the entry.

            The data are returned as a mapping object that allows *both*
            data retrieval and setting.  The mapping is from names to
            objects that will be serialized to or from the file system.

            """

        def typeIdentifier():
            """Return a dotted name that identifies the object type.

            This is typically a dotted class name.

            This is used when synchronizing from the file system to the
            database to decide whether the existing object and the new
            object are of the same type.

            """

        def factory():
            """Return the dotted name of a factory to recreate an empty entry.

            The factory will be called with no arguments. It is usually
            the dotted name of the object class.

            """

      class IObjectFile(IObjectEntry):
        """File-system object representation for file-like objects
        """

        def getBody():
            """Return the file body"""

        def setBody(body):
            """Change the file body"""

      class IObjectDirectory(IObjectEntry):
        """File-system object representation for directory-like objects
        """

        def contents():
            """Return the contents

            A sequence of (name, value) tuples is returned.
            The value in each pair will be synchronized.
            """

Normally, an object's IObjectFile or IObjectDirectory implementation is provided via an adapter.
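
As a rough illustration of such an adapter, the sketch below provides the IObjectFile methods for a simple file-like content object. The File, AttributeMapping, and FileAdapter classes are hypothetical, and the sketch uses plain Python classes rather than declaring the interfaces, so that it stays self-contained.

```python
class File:
    """Hypothetical content object: a body plus a content type."""

    def __init__(self, data=b"", contentType="application/octet-stream"):
        self.data = data
        self.contentType = contentType


class AttributeMapping:
    """Mapping over selected attributes; supports retrieval and setting."""

    def __init__(self, context, names):
        self._context = context
        self._names = tuple(names)

    def __getitem__(self, name):
        if name not in self._names:
            raise KeyError(name)
        return getattr(self._context, name)

    def __setitem__(self, name, value):
        if name not in self._names:
            raise KeyError(name)
        setattr(self._context, name, value)

    def keys(self):
        return list(self._names)


class FileAdapter:
    """Hypothetical adapter offering the IObjectFile methods for a File."""

    def __init__(self, context):
        self.context = context

    def extra(self):
        # The content type does not fit in the file body, so it is extra data.
        return AttributeMapping(self.context, ["contentType"])

    def typeIdentifier(self):
        cls = type(self.context)
        return "%s.%s" % (cls.__module__, cls.__qualname__)

    def factory(self):
        # File() can be called with no arguments, so the class is the factory.
        return self.typeIdentifier()

    def getBody(self):
        return self.context.data

    def setBody(self, body):
        self.context.data = body
```

Because extra() returns a mapping that writes through to the adapted object, the same adapter can serve both directions of synchronization.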

The synchronization mechanism uses an administration directory, named @@Zope, in much the same way that CVS uses a directory named CVS and Subversion uses a directory named .svn.

As currently envisioned, the @@Zope directory will have:

  • An Entries.xml file. Akin to CVS's Entries file and Subversion's entries file, this contains information about the files or directories stored in the directory containing the @@Zope directory.
  • An Annotations directory that contains annotation data. Annotations are meta-data, such as security assertions or Dublin Core meta-data, that are associated with an object. Each object that has annotations will have a subdirectory in Annotations containing the object's annotation data.

    The subdirectories in Annotations contain synchronized data, so they have @@Zope directories too.

    The synchronization tool uses the IAnnotations interface to get annotation data for objects.

  • An Extra directory that contains content data that doesn't fit in a useful file-system representation. As with annotation data, there will be a directory in Extra for each object that has extra data.

    The subdirectories in Extra contain synchronized data, so they have @@Zope directories too.

    The extra method allows data to be stored that doesn't fit well into the file-system representation. The data returned by the extra method will be saved as files in the @@Zope administration directory.

  • An Original directory that contains copies of all files (not directories) as of the last synchronization time. The originals are used to figure out if anything has changed in the object database or on the file system since the last synchronization and to show and merge differences.
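
One way to picture how the originals support change detection is a three-way comparison of the original copy, the working file, and the current database body. The sketch below is a minimal illustration; the classify function and its status labels are assumptions and do not describe the tool's actual report format.

```python
def classify(original, working, database):
    """Classify one file from three copies of its body (bytes or None).

    original: copy saved in @@Zope/Original at the last synchronization
    working:  current contents of the file on the file system
    database: current body of the object in the object database
    """
    fs_changed = working != original
    db_changed = database != original
    if not fs_changed and not db_changed:
        return "unchanged"
    if fs_changed and not db_changed:
        return "modified on file system"
    if db_changed and not fs_changed:
        return "modified in database"
    if working == database:
        # Both sides changed, but to the same content.
        return "same change on both sides"
    return "conflict"
```

For example, classify(b"a", b"b", b"a") returns "modified on file system", while classify(b"a", b"b", b"c") returns "conflict" and would need a merge against the original.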

To see how all of this fits together, consider the following example. A Zope folder, folder, contains two objects, a file, data.xml, and a page template, index.html. The folder and the file have security assertions. Folders and page templates don't have extra data, but files do. Files have to store a content type in addition to their body data.

After synchronizing the folder object to the directory tmp, the tmp directory will contain the following files and subdirectories (shown using a variant on Unix's ls format):

      tmp:
      ---------------------------------------
      drwxrwxr-x  4096  folder
      drwxrwxr-x  4096  @@Zope

      tmp/@@Zope:
      ---------------------------------------
      drwxrwxr-x  4096  Annotations
      -rw-rw-r--   406  Entries.xml

      tmp/@@Zope/Annotations:
      ---------------------------------------
      drwxrwxr-x  4096  folder

      tmp/@@Zope/Annotations/folder:
      ---------------------------------------
      drwxrwxr-x  4096  @@Zope
      -rw-rw-r--    20  zope.app.security.AnnotationPrincipalRoleManager
      -rw-rw-r--    57  zope.app.security.AnnotationRolePermissionManager

      tmp/@@Zope/Annotations/folder/@@Zope:
      ---------------------------------------
      -rw-rw-r--  1009  Entries.xml
      drwxrwxr-x  4096  Original

      tmp/@@Zope/Annotations/folder/@@Zope/Original:
      ---------------------------------------
      -rw-rw-r--    20  zope.app.security.AnnotationPrincipalRoleManager
      -rw-rw-r--    57  zope.app.security.AnnotationRolePermissionManager

      tmp/folder:
      ---------------------------------------
      -rw-rw-r--    30  data.xml
      -rw-rw-r--    86  index.html
      drwxrwxr-x  4096  @@Zope

      tmp/folder/@@Zope:
      ---------------------------------------
      drwxrwxr-x  4096  Annotations
      -rw-rw-r--   780  Entries.xml
      drwxrwxr-x  4096  Extra
      drwxrwxr-x  4096  Original

      tmp/folder/@@Zope/Annotations:
      ---------------------------------------
      drwxrwxr-x  4096  data.xml

      tmp/folder/@@Zope/Annotations/data.xml:
      ---------------------------------------
      drwxrwxr-x  4096  @@Zope
      -rw-rw-r--   106  zope.app.security.AnnotationRolePermissionManager

      tmp/folder/@@Zope/Annotations/data.xml/@@Zope:
      ---------------------------------------
      -rw-rw-r--   548  Entries.xml
      drwxrwxr-x  4096  Original

      tmp/folder/@@Zope/Annotations/data.xml/@@Zope/Original:
      ---------------------------------------
      -rw-rw-r--   106  zope.app.security.AnnotationRolePermissionManager

      tmp/folder/@@Zope/Extra:
      ---------------------------------------
      drwxrwxr-x  4096  data.xml

      tmp/folder/@@Zope/Extra/data.xml:
      ---------------------------------------
      -rw-rw-r--    87  contentType
      drwxrwxr-x  4096  @@Zope

      tmp/folder/@@Zope/Extra/data.xml/@@Zope:
      ---------------------------------------
      -rw-rw-r--   338  Entries.xml
      drwxrwxr-x  4096  Original

      tmp/folder/@@Zope/Extra/data.xml/@@Zope/Original:
      ---------------------------------------
      -rw-rw-r--    87  contentType

      tmp/folder/@@Zope/Original:
      ---------------------------------------
      -rw-rw-r--    30  data.xml
      -rw-rw-r--    86  index.html

Notes on the example:

  • The security assertions show up as files in Annotations subdirectories. For example, role-permission assignments made on data.xml show up in the file zope.app.security.AnnotationRolePermissionManager, in the directory tmp/folder/@@Zope/Annotations/data.xml.
  • The data.xml content type is stored in the file contentType in the Extra directory for data.xml, tmp/folder/@@Zope/Extra/data.xml.
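
The part of this layout that concerns a single file object can be reproduced with a short sketch. It writes only the body, the Original copies, and the Extra data; Entries.xml and Annotations are deliberately left out because their formats are not described here. The function names are hypothetical, and extra values are assumed to be already serialized to bytes.

```python
import os
import tempfile

ADMIN = "@@Zope"  # name of the administration directory


def write_file(path, data):
    """Write bytes to path, creating parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)


def checkout_file(directory, name, body, extra):
    """Write one file-like object into directory (hypothetical helper).

    body is the file body as bytes; extra maps names to serialized bytes.
    """
    admin = os.path.join(directory, ADMIN)
    write_file(os.path.join(directory, name), body)
    # Keep a pristine copy for later comparison and merging.
    write_file(os.path.join(admin, "Original", name), body)
    for key, value in extra.items():
        extra_dir = os.path.join(admin, "Extra", name)
        write_file(os.path.join(extra_dir, key), value)
        # Extra data is synchronized too, so it gets its own originals.
        write_file(os.path.join(extra_dir, ADMIN, "Original", key), value)


def listing(root):
    """Return the sorted relative paths of all files below root."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

Calling checkout_file on a directory named folder with the name data.xml and an extra mapping containing contentType produces folder/data.xml, folder/@@Zope/Original/data.xml, folder/@@Zope/Extra/data.xml/contentType, and the matching Original copy of contentType, as in the listing above.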

The initial tool will be a command-line tool modelled loosely on CVS and Subversion and using the ZODB to access objects for synchronization. Later efforts might provide graphical user interfaces or utilize network protocols (other than ZEO) to access data.

A default adapter is provided that saves data via an XML pickle. (Note that a much-improved XML pickler, relative to the XML pickler in Zope 2, will be provided that is fairly friendly to tools like diff and CVS.)

Risks

  • Lossy serialization due to incomplete adapters

    For the most part, the solution relies on specialized adapters provided for specific content types. A poorly-written adapter could cause data loss or even improperly constructed content that violates its invariants. The risk is mitigated in part by the fact that the annotation framework frees the content-type developer from dealing with meta-data such as security assertions.

  • Hierarchical model

    File systems are hierarchical, and, for the most part, so is the Zope object system. Cyclical or other non-hierarchical references can cause problems without special care. For example, when we fall back to XML pickles for objects that don't have specialized adapters, and those instances have persistent classes, we'll need to be careful not to include the classes in the serialization.

  • Interface adapters are too brittle

    Because of this, a separate service was implemented that maps classes to adapters that implement IObjectEntry (by way of one of its sub-interfaces, IObjectFile or IObjectDirectory). The service does not automatically apply a class's adapter to its subclasses.
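
The exact-class lookup can be illustrated with a small registry. This is a sketch of the idea only: the AdapterRegistry class and its method names are hypothetical and are not the API of the implemented service.

```python
class AdapterRegistry:
    """Maps classes to adapter factories by exact class only (hypothetical)."""

    def __init__(self):
        self._factories = {}

    def register(self, cls, factory):
        """Associate an adapter factory with exactly this class."""
        self._factories[cls] = factory

    def lookup(self, obj, default=None):
        """Return an adapter for obj, or one built by default, or None.

        Only type(obj) itself is consulted. Base classes are deliberately
        not searched, so a subclass never silently inherits an adapter
        that may not know about its additional state.
        """
        factory = self._factories.get(type(obj), default)
        return None if factory is None else factory(obj)
```

A caller would typically pass a general-purpose fallback, such as an XML pickle adapter, as the default, so that unregistered subclasses are serialized completely rather than through an adapter written for their base class.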

Status

  • A new implementation exists; the command line tools are checked in as .../src/zope/fssync/, and a README.txt file there explains the project status in more detail. For usage documentation, run the main.py script with a -h option. I don't plan to keep this Wiki up to date. (Guido van Rossum)


andym (Sep 30, 2002 1:25 am; Comment #1)
I followed this with a proposal: CommonFileSystemAPI
deb_h (Sep 30, 2002 7:26 am; Comment #2)
file-system synchronization model diagram


