The Process/Interpreter Model

Copyright © Graham Dumpleton

Apache can operate in a number of different modes depending on the platform being used and the way in which it is configured. These range from multiple processes with only one request being handled at a time within each process, to one or more processes with concurrent requests being handled in distinct threads executing within the same process or in distinct processes.

A handler or application which is not written to take into consideration the different process models and modes of operation of Apache will likely not be portable and potentially will not be robust. It may work for one configuration, but fail in unexpected ways when used in another setup. Although you may not need a handler to work under all modes initially, it is highly recommended that any handler still be designed to work under any of the different operating modes.

This article provides some background information on how Apache makes use of processes to handle requests and how Python interpreters are used by mod_python. Knowing this information should assist in making a correctly functioning handler.

The UNIX "prefork" Mode

This mode is the most commonly used. It was the only mode of operation available in Apache 1.3 and is still the default mode on UNIX systems in Apache 2.0 and 2.2. In this setup, the main Apache process will at startup create multiple child processes. When a request is received by the parent process, it will be handed off to one of the child processes to be handled.

Each child process will only handle one request at a time. If another request arrives at the same time, it will be handed off to the next available child process. If there are no more child processes available to handle requests, additional child processes will be created on demand. If a limit is specified as to the number of child processes which may be created and the limit is reached, an error response may instead be returned to the client from which the request was received.

Where additional child processes have to be created due to a peak in the number of requests arriving and the number of requests has subsequently dropped off, the excess child processes may be shut down and killed off. Child processes may also be shut down and killed off after they have handled some set number of requests.

Although threads are not used to service individual requests, this does not preclude a handler from creating separate threads to perform some specific task. The ability for a handler to create such threads will however be dependent on threading support being compiled into Python and Apache.

The UNIX "worker" Mode

The "worker" mode is similar to "prefork" mode except that within each child process there will exist a number of worker threads. Instead of a request being handed off to the next available child process with the handling of the request being the only thing the child process is doing, the request will be handed off to a specific worker thread within a child process with other worker threads in the same child process potentially handling other requests at the same time.

It is possible that a handler could be executed at the same time from multiple worker threads within the one child process. This means that multiple worker threads may want to access common shared data at the same time. As a consequence, such common shared data must be protected in a way that will allow access and modification in a thread-safe manner. Normally this would necessitate the use of some form of synchronisation mechanism to ensure that only one thread at a time accesses and/or modifies the common shared data.
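As an illustration of such a synchronisation mechanism, the following is a minimal sketch using the standard Python threading module. The names record_hit, hit_count, _hits and _hits_lock are hypothetical and are not part of mod_python; the point is only that every read and write of the shared dictionary happens while the lock is held.

```python
import threading

# Data shared by every worker thread in this process (hypothetical example).
_hits = {}
# A single lock guarding all access to _hits.
_hits_lock = threading.Lock()


def record_hit(uri):
    """Add one to the hit count for uri and return the new count."""
    with _hits_lock:
        # The read-modify-write below is only safe while the lock is held.
        _hits[uri] = _hits.get(uri, 0) + 1
        return _hits[uri]


def hit_count(uri):
    """Return the current hit count for uri (0 if never recorded)."""
    with _hits_lock:
        return _hits.get(uri, 0)
```

A handler could call record_hit() with the URI of each request. The count is still private to the child process in which the handler ran.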

If all worker threads within a child process are busy when a new request arrives and the maximum number of worker threads has not been exceeded, a new worker thread will be created. Alternatively, the request may be handed off to an idle worker thread in another child process. Apache may still create new child processes on demand if necessary. Apache may also still shut down and kill off excess child processes, or child processes that have handled more than a set number of requests.

Overall, use of "worker" mode will result in fewer child processes needing to be created, but resource usage of individual child processes will be greater.

The Windows "winnt" Mode

When the "winnt" mode is used, multiple worker threads within a child process are used to handle all requests. The "winnt" mode is different to the "worker" mode however in that there is only one child process. At no time are additional child processes created, nor is that one child process shut down and killed off, except where Apache as a whole is being stopped or restarted.

In order to cater for an increased number of requests, additional worker threads are created. Because there is only one child process, the maximum number of allowed threads is much greater. When there are excess worker threads that are no longer required, the individual worker threads are shut down and killed off. This is different to "worker" mode whereby the only way in which worker threads could be killed was to shut down and kill off the whole child process.

Other Operating Modes

Apache supports a number of other modes specific to particular operating systems. These include modes specific to BeOS, Netware and OS/2. Each of these modes is implemented in a similar way to one of the main variants already described above and as such, provided your handler is written to work with the main variants, it should work with these less common modes.

Sharing Of Global Data

When "winnt" mode is being used, there is only one child process and all requests are handled within the context of that single process. This means that all request handlers will have direct access to the same global data. This global data will persist in memory until Apache is shut down or restarted.

This ability to access the same global data from any request handler and for that data to persist for the lifetime of that instance of Apache is not present when either of the "prefork" or "worker" modes is used. This is because handlers can be executing within the context of distinct child processes, each with its own set of global data unique to that child process. Further, the global data present within a particular child process only exists for the lifetime of that child process and Apache could shut down and kill off a child process at any time.
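The effect can be seen outside of Apache with a small sketch that forks a child process in roughly the way "prefork" mode does. It assumes a UNIX platform where os.fork() is available, and the names request_count, handle_request, get_count and count_seen_by_child are hypothetical. The child starts with a copy of the parent's global data, and any change it makes stays in that copy.

```python
import os

# Module-level global, analogous to global data held by a handler module.
request_count = 0


def handle_request():
    """Pretend to handle a request by bumping the per-process counter."""
    global request_count
    request_count += 1
    return request_count


def get_count():
    """Return the counter as seen by the calling process."""
    return request_count


def count_seen_by_child():
    """Fork a child, handle one request there, and return the child's counter."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: works on its own copy of the globals.
        os.close(read_fd)
        value = handle_request()
        os.write(write_fd, str(value).encode("ascii"))
        os.close(write_fd)
        os._exit(0)  # exit at once without running the parent's cleanup code
    # Parent process: read what the child reported, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_value = int(pipe.read())
    os.waitpid(pid, 0)
    return child_value


if __name__ == "__main__":
    handle_request()
    print("child saw:", count_seen_by_child())
    print("parent still has:", get_count())
```

The child reports a counter one higher than the parent's, while the parent's own counter is unchanged after the child exits.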

The consequence of this is that you cannot assume that separate invocations of a handler will have access to the same global data if that data only resides within the memory of the child process. If some set of global data must be accessible by all invocations of a handler, that data will need to be stored in a way that it can be accessed from multiple child processes. Such sharing could be achieved by storing the global data within an external database, the filesystem or in shared memory accessible by all child processes.

Since the global data will be accessible from multiple child processes at the same time, there must be adequate locking mechanisms in place to prevent distinct child processes from trying to modify the same data at the same time. The locking mechanisms also need to be able to deal with the case of multiple threads within one child process accessing the global data at the same time, as will be the case for the "worker" and "winnt" modes.
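One possible arrangement, shown here only as a sketch, keeps a counter in a file and combines a thread lock with an advisory file lock. It assumes a UNIX platform, since the fcntl module is not available on Windows and "winnt" mode would need a different locking primitive. The function name increment_shared_counter and the file layout are hypothetical. Taking the thread lock first is a conservative choice so that threads in the same process are serialised regardless of how file locks behave on a given platform.

```python
import fcntl
import os
import threading

# Serialises threads inside this process (the "worker" case).
_counter_thread_lock = threading.Lock()


def increment_shared_counter(path):
    """Add one to the integer stored in the file at path and return it."""
    with _counter_thread_lock:
        # Open for reading and writing, creating the file on first use.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        with os.fdopen(fd, "r+") as counter_file:
            # Blocks until no other process holds the exclusive lock.
            fcntl.flock(counter_file, fcntl.LOCK_EX)
            try:
                text = counter_file.read().strip()
                value = (int(text) if text else 0) + 1
                # Overwrite the old value in place.
                counter_file.seek(0)
                counter_file.truncate()
                counter_file.write(str(value))
                counter_file.flush()
                return value
            finally:
                fcntl.flock(counter_file, fcntl.LOCK_UN)
```

Every child process would need to be given the same path, and all code touching the file would have to go through this function, because advisory locks only protect against other code that also takes the lock.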

Multiple Python Interpreters

The default behaviour of mod_python is to create distinct interpreters within which requests for different virtual hosts are managed. The name used to identify each interpreter instance will be the name of the virtual host. Multiple such interpreters corresponding to different virtual hosts can exist within the one child process.

Each interpreter instance will have its own set of loaded Python modules. In other words, a change to the global data within the context of one interpreter instance for one virtual host will not be seen from the interpreter corresponding to a different virtual host, even where an interpreter instance for each exists in the same process.
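To see which process and interpreter served a request, a handler along the following lines could be used. This is a sketch only: it relies on the interpreter attribute of the mod_python request object, and the fallback for apache.OK exists purely so that the module can be imported outside of Apache, which is an assumption made for this example rather than normal practice.

```python
import os

try:
    from mod_python import apache
    OK = apache.OK
except ImportError:
    # Outside Apache (for example when testing), use the numeric value of apache.OK.
    OK = 0


def handler(req):
    """Report the process ID and interpreter name that served this request."""
    req.content_type = "text/plain"
    req.write("pid=%d interpreter=%s\n" % (os.getpid(), req.interpreter))
    return OK
```

With the default configuration, requests to two different virtual hosts may report the same pid but will report different interpreter names.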

If it is necessary for two different virtual hosts to use the same interpreter instance so that global data is shared, it will be necessary to explicitly specify the name of the interpreter to use. This can be done using the PythonInterpreter directive. It should be set for one of the virtual hosts to be the name of the other virtual host. Alternatively, for both virtual hosts it could be set to be a name common to both but different to the name of either virtual host.
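As a sketch of the second approach, the configuration fragment below gives both virtual hosts the same interpreter name. The host names and the name shared_interpreter are hypothetical, and all other directives needed to enable a handler are omitted.

```apache
# Both virtual hosts name the same interpreter, so within any one
# child process they share loaded modules and global data.
<VirtualHost *:80>
    ServerName www.example.com
    PythonInterpreter shared_interpreter
</VirtualHost>

<VirtualHost *:80>
    ServerName intranet.example.com
    PythonInterpreter shared_interpreter
</VirtualHost>
```

The sharing only applies within a single child process. Under "prefork" or "worker" mode, each child process still has its own copy of the named interpreter.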

The only other way of sharing data between the interpreters within the one child process would be to use an external data store, or a third party extension for Python which allows communication between multiple interpreters within the same process.

The PythonInterpPerDirectory and PythonInterpPerDirective directives can also be used to control creation of Python interpreters. In general though, they are used to cause greater separation by creating more distinct interpreters as opposed to using a common interpreter. One would usually be better off explicitly naming the interpreter to use rather than using these directives.

Building A Portable Handler

Taking into consideration the different process models used by Apache and the manner in which interpreters are used by mod_python, building a portable and robust handler requires that the following be satisfied.



ModPython/Articles/TheProcessInterpreterModel (last edited 2006-09-07 07:08:08 by GrahamDumpleton)