Zope Configuration Processing and Side Effects

Status: IsInformational

Author

JimFulton

Problem

The Zope Configuration system uses a three-step processing model:

  1. Collect Actions and Meta Configuration

    In the first phase, some sort of configuration language is used to compute configuration actions or to customize the configuration system (meta configuration). Currently, an XML-based language is used. (A simpler text-based language was used initially.)

    A configuration context is used in the first phase to keep track of configuration state. Meta-configuration directives can modify the configuration context to add features, such as support for new configuration directives. The context acts as a stack and multiple active configuration directives can exchange information through the stack.

    The main purpose of the first phase is to compute configuration actions to be analyzed and possibly executed later.

  2. Resolve conflicts.

    In the second phase, actions are analyzed to determine if any conflict and to try to resolve those conflicts. The mechanism used is outside the scope of this document.

  3. Execute actions.

    In the final phase, the actions remaining after the second phase are executed.

Ideally, the first phase of processing should have no effects outside the configuration context. Meta configuration processing may modify the configuration context to extend the configuration language. Other processing should have no side effects at all.

In practice, our implementations of configuration handlers, used in the first phase, have numerous side effects, including importing modules and, sometimes, creating new module global variables to be used by other handlers. For example, a handler may define a global that is referenced and validated by later handlers. These side effects are generally pretty innocent, but, with a bit more discipline, they can be eliminated.

Almost all of the side effects are a result of efforts to validate configuration input during the first phase. This is generally not necessary. Validation and error reporting can, as easily, be done in the second phase.

Some side effects are unavoidable, however. In particular, it is usually necessary to perform some imports during the first phase. Configuration files are almost always (and probably should always be) found as package data, meaning that it is necessary to import packages to retrieve configuration data. Furthermore, for convenience, we often inspect components to determine configuration data needed to compute action discriminators needed during the second phase, requiring the import of modules containing the components. We should never modify the imported modules during the first phase.

Why should side effects be avoided during the first phase? There are at least 2 reasons:

  • During the second phase, we may decide to discard many actions. The result should be as if the actions were never created in the first place.
  • Generally, these side effects are the result of processing that is best deferred to the second phase for performance reasons. Any processing for actions discarded in the second phase is wasted, so first-phase processing should be minimized.

The main goal of this proposal is to explain, briefly, how processing works and to state the goal that there should be no side effects other than import during the first phase of processing and that even imports should be avoided where practical.

Proposal

We will set as a goal the elimination of first-phase side effects and minimization of first-phase processing. Further, we will refactor core configuration directives to conform to this goal. This will include the adapter, utility, require and allow directives.


comments:

... --TresSeaver?, Mon, 23 Oct 2006 05:04:23 -0700 reply
Other reasons for avoiding side effects include:

  • Being able to introspect a given (even possibly broken) configuration without modifying the global state of the system.
  • Being able to save the parsed configuration (e.g., to speed application restarts), in the same mode as .pyc files.
  • Our "eager" error checking is essentially an attempt to do "compile-time" checking; Python shows us that "run-time" checking for anything by syntax errors can be much more productive.

The eager checking typically happens while trying to validate that "dotted names" can be resolved. We might be able to make this change transparent to downstream code by making that code create a "lazy proxy" for the target object instead; actually using the object in the action code would then trigger the resolution (and therefore any error).



( 97 subscribers )