GRAHAM DUMPLETON

Replacement Module Importer

Author: Graham Dumpleton
Contact: grahamd@dscpl.com.au
Updated: 11/03/2006

This article documents a proposed replacement for the current module importing system provided with mod_python. This replacement system is intended to address the bulk of currently known issues, as well as provide additional features that will make it easier to use. The material presented here is based on a working implementation of the replacement system and is not purely theoretical.

Note

This document is a work in progress and thus is incomplete. The exact details of how the importer will work may change over time. If this document raises any issues, or problems are perceived in what is being proposed, either bring it up on the mod_python developers mailing list, or log a comment against MODPYTHON-143.

Backward Compatibility

In respect of programming interfaces, ensuring backward compatibility is vital. Any replacement cannot be a completely new system which stands distinct from the existing system, thus requiring existing code to be changed significantly. Any replacement must still preserve the current functionality of the "import_module()" function as much as possible. Ideally this should remain the only entry point into the system, even if it means the purpose of the function may need to be overloaded to some degree.

First and foremost any replacement system should strive for correctness and robustness. Inevitably this may mean some reduction in performance. Tuning of the system to improve the performance should be seen as a secondary task, only carried out after the functionality of the system is found to be correct and adequate to replace the existing system.

Module Import Function

The module importing system is all based upon the function "import_module()" contained in the "mod_python.apache" module. This function is used to import modules which have been specified by any of the "Python*Handler", "PythonHandlerModule", "Python*Filter" or "PythonConnectionHandler" directives. The "import_module()" function may also be used explicitly from within any Python code executing within the context of mod_python.

Note

The "PythonImport" directive does not currently use the "import_module()" function but needs to be changed so that it does. The underlying reasons for such a change are described under [ISSUE 17].

The prototype for the "import_module()" function in the existing implementation is as follows:

import_module(module_name,autoreload=1,log=0,path=None)

The only required parameter is that of the module name. In supplying only the module name, a search will be made of the Python module search path as specified by "sys.path". If the "path" parameter is supplied, it should be a list of directories to search for the module. To load a module from a particular location, the "path" parameter would be a list containing just the directory containing the desired module.

The "log" argument controls whether debugging information is output to the Apache error log file pertaining to when modules are loaded. The "autoreload" argument is consulted when the module is already cached and controls whether the module should be reloaded if the version on disk has changed.

When any of the directives is the trigger for the "import_module()" function being called to load a module, the "log" and "autoreload" arguments are explicitly set based on the values of the "PythonDebug" and "PythonAutoReload" directives as may be specified in the Apache configuration. If the "import_module()" is called explicitly, these arguments will use their default values or as otherwise provided by the caller.

In the replacement for the current module importing system, the number of arguments to the "import_module()" function will remain the same. The default values for the "log" and "autoreload" arguments will however change, and those two arguments should in nearly all cases not need to be supplied. The revised prototype for the "import_module()" function will therefore be as follows:

import_module(module_name,autoreload=None,log=None,path=None)

The manner in which the "module_name" and "path" arguments are interpreted will also change subtly. This will be to allow modules to also be specified by their full pathname. A more in-depth discussion of the proposed changes is undertaken in the following sections.

Configuration Of Logging

As described under [ISSUE 1], in the existing implementation the only way to enable logging when using the "import_module()" function explicitly is by supplying the "log" argument to the function itself; there is no effective way of enabling logging of imports globally in one place.

This problem is solved in the replacement implementation by having the "import_module()" function directly access the appropriate request or server configuration in order to query the value associated with the "PythonDebug" setting. Such a direct lookup of the configuration will occur when the "log" argument to the "import_module()" function is "None".

As "None" is now the default value for the "log" argument, in practically all cases it will therefore no longer be necessary to supply the "log" argument. The "import_module()" function will automatically pick up the setting from the Apache configuration. The only time that the "log" argument would be required is where a user wanted to specifically override the setting in a particular case.

In order to achieve this, the implementation of the top level mod_python "CallBack" object through which handler dispatch occurs, has been modified to cache the configuration object obtainable from a request object. The "import_module()" function is then able to directly access the configuration object from the cache.

Where the "import_module()" function is called from within the context of a module imported using the "PythonImport" directive, there will be no request object to obtain the configuration from. In that case, the server level configuration object cached by the "CallBack" object when mod_python is first initialised is instead used.
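The lookup described above can be sketched roughly as follows. This is only an illustration and not the actual implementation: the "resolve_option()" helper is hypothetical, plain dictionaries stand in for the cached request and server configuration objects, and it is assumed that directive values are held as strings such as "1" and "0".

```python
def resolve_option(explicit, name, default,
                   request_config=None, server_config=None):
    """Return the effective value of a flag such as PythonDebug.

    An explicit (non-None) argument always wins. Otherwise the cached
    request configuration is consulted, falling back to the server level
    configuration when there is no request, as would be the case for a
    module loaded using PythonImport.
    """
    if explicit is not None:
        return bool(int(explicit))
    # Hypothetical stand-ins for the configuration objects cached by the
    # "CallBack" object; here they are ordinary dictionaries.
    config = request_config if request_config is not None else server_config
    if config is None:
        return bool(default)
    return bool(int(config.get(name, default)))
```

With this arrangement a caller that leaves "log" as "None" picks up whatever "PythonDebug" is set to, while a caller that passes a value overrides the configuration for that one call.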

Disabling Of AutoReload

As described under [ISSUE 7], a similar problem to that of enabling logging globally in one place, exists for disabling of the autoreload feature for modules. This problem is solved in the same way, with the "import_module()" function directly accessing the cached configuration object to obtain the value associated with the "PythonAutoReload" setting. As with the "log" argument, in practically all cases it will therefore no longer be necessary to supply the "autoreload" argument. The only time that the "autoreload" argument would be required is where a user wanted to specifically override the setting in a particular case.

Note that this doesn't, and can't, solve the problems associated with a user consciously setting the value of "PythonAutoReload" to different values for different parts of the document tree in the context of the same interpreter and then importing a common module. To at least lessen the risk of rogue instances of the "PythonAutoReload" setting in a production environment causing problems, the new function "freeze_modules()" is provided. This function can be called from the context of a module imported using the "PythonImport" directive and will have the same effect of disabling module reloading. When used though, any settings for "PythonAutoReload" or the "autoreload" argument to "import_module()" will subsequently be ignored.
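The precedence between "freeze_modules()", the "autoreload" argument and the "PythonAutoReload" setting might be pictured as below. This is a sketch under stated assumptions only: the module level "_frozen" flag and the "effective_autoreload()" helper are hypothetical, and a plain dictionary stands in for the cached configuration object.

```python
_frozen = False  # hypothetical flag, set once and never cleared


def freeze_modules():
    """Disable module reloading for the lifetime of the interpreter."""
    global _frozen
    _frozen = True


def effective_autoreload(autoreload_arg, config):
    """Decide whether a cached module may be reloaded."""
    # Once frozen, both the argument and the configuration are ignored.
    if _frozen:
        return False
    # An explicit argument overrides the configuration for this call.
    if autoreload_arg is not None:
        return bool(int(autoreload_arg))
    # Otherwise fall back to the PythonAutoReload setting (assumed to be
    # stored as a string, with reloading enabled by default).
    return bool(int(config.get("PythonAutoReload", 1)))
```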

Note

With mod_python 3.3 implementing "req.server.get_options()", a global "PythonOption" setting could be used as well as, or in place of, the "freeze_modules()" function.

How Modules Are Found

Where the only argument supplied to the "import_module()" function is the module name, in the existing implementation a search is made of the Python module search path for a module or package of the specified name. The Python module search path is that as defined by "sys.path". In order that modules located in the document tree are found, the top level mod_python "CallBack" object through which handler dispatch occurs, inserts the directory for which a "Python*Handler", "PythonHandlerModule" or "Python*Filter" directive was specified into "sys.path". Where a search path is specified by supplying the "path" argument, only those directories are searched but this is done without inserting the directories specified by the "path" argument into "sys.path".

The manner in which the existing implementation modifies "sys.path", the fact that the modules are still stored in "sys.modules" and the mechanisms used to support modules of the same name in different locations, cause a number of problems, including [ISSUE 14], [ISSUE 9], [ISSUE 10], [ISSUE 12] and [ISSUE 13]. To avoid these problems, the replacement implementation does not modify "sys.path" and does not store modules in "sys.modules", but keeps them in a separate cache.

Because "sys.path" would no longer be modified in the replacement implementation, the mechanism used to perform a search for a module is different. Specifically, it no longer uses "imp.find_module()" and a customised search algorithm is instead used. The steps performed by this algorithm where a module name and optionally a search path are supplied to the "import_module()" function are as follows:

  1. If a search path is specified using the "path" argument, each of those directories will be searched for the module code stored as a file with the same name as the requested module, but with a ".py" extension.
  2. If the "import_module()" function is being used within the context of handling a request, the "PythonPath" directive has not been specified, and the "Python*Handler", "PythonHandlerModule" or "Python*Filter" directive was specified inside a "Directory" directive or in a ".htaccess" file, the directory for which the handler or filter directive was specified will be searched for the module code stored as a file with the same name as the requested module, but with a ".py" extension.
  3. If the module could not be found, importing of the module is instead handed off to the standard Python import mechanism by calling the "__import__" function. The standard Python import mechanism will only search directories found in "sys.path".
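The three steps above can be sketched as a small function. This is a simplified illustration rather than the actual implementation: the name "find_module_file()" and its parameters are hypothetical, and "handler_dir" is assumed to be "None" whenever the conditions of step 2 do not apply (no request, or the directive was not inside a "Directory" directive or ".htaccess" file).

```python
import os


def find_module_file(module_name, path=None, handler_dir=None,
                     python_path_set=False):
    """Return the pathname of the code file for module_name, or None.

    A result of None means the caller should hand the import off to the
    standard Python import mechanism (step 3), which only searches
    the directories listed in sys.path.
    """
    search = []
    # Step 1: directories supplied explicitly using the "path" argument.
    if path:
        search.extend(path)
    # Step 2: the directory the handler or filter directive applies to,
    # but only when the PythonPath directive has not been specified.
    if handler_dir is not None and not python_path_set:
        search.append(handler_dir)
    for directory in search:
        candidate = os.path.join(directory, module_name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None
```

Note that only plain ".py" files are considered, so a package of the requested name is never matched and always falls through to step 3.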

Where the code file corresponding to a module is found in steps 1 or 2, the code is loaded into an empty module created using the "imp.new_module()" function. The actual loading of code into the module is done using "execfile()", where globals is set to the builtin "__dict__" of the newly created module. The module is stored into a distinct module cache for later use and returned. This module cache is keyed by the pathname to the module's code file and not by the name of the module, thereby keeping distinct modules of the same name residing at different locations. Obviously, if a module corresponding to a specific pathname were already in the cache, the module could be obtained from the cache rather than loading it again.
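A minimal sketch of loading a code file into a fresh module and caching it by pathname is given below. It is not the actual implementation. The names "_module_cache" and "load_module_from_file()" are hypothetical, and because "imp.new_module()" and "execfile()" are not available in current Python versions, the sketch substitutes "types.ModuleType" and "exec()", which play the same roles.

```python
import os
import types

_module_cache = {}  # hypothetical cache, keyed by absolute pathname


def load_module_from_file(pathname):
    """Load the code file at pathname, reusing a cached module if present."""
    pathname = os.path.abspath(pathname)
    if pathname in _module_cache:
        return _module_cache[pathname]
    name = os.path.splitext(os.path.basename(pathname))[0]
    # Create an empty module and execute the file's code with the
    # module's own __dict__ as the global namespace.
    module = types.ModuleType(name)
    module.__file__ = pathname
    with open(pathname) as source:
        code = compile(source.read(), pathname, "exec")
    exec(code, module.__dict__)
    # The module is deliberately not stored in sys.modules.
    _module_cache[pathname] = module
    return module
```

Because the cache key is the pathname, two files both called "utils.py" in different directories would yield two independent module objects, neither of which appears in "sys.modules".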

The use of a separate module cache is to ensure that there is a clear separation between modules imported directly by mod_python and those located in one of the directories specified in "sys.path" and imported by the standard Python import mechanism. Only modules held in the mod_python cache will be candidates for automatic module reloading. Because mod_python will only load file based Python modules and not packages, packages must be located in a directory listed in "sys.path", will be loaded using the standard Python import mechanism and will therefore not be candidates for automatic reloading.

Use Of Import Statement

In the existing implementation, where used within the context of the "Directory" directive or ".htaccess" file, the directory associated with a "Python*Handler", "PythonHandlerModule" or "Python*Filter" directive will be inserted into "sys.path" prior to the "import_module()" function being called to load the module specified by the directive. This was done instead of the directory being supplied to the "import_module()" function using the "path" argument, as the latter would preclude the use of modules which had been installed into the Python "site-packages" directory.

For example, it would not have been possible to specify "mod_python.publisher" or "mod_python.psp" as the handler module. Any module specified would have had to reside in the directory the directive applied to. The changes to the search algorithm in the replacement implementation mean that when the "path" argument is supplied, if the module cannot be found in those directories it will fall through to using the standard Python import mechanism to search the "sys.path" directories.

The replacement implementation therefore is now able to pass the directory a directive applies to in the "path" argument to the "import_module()" function, avoiding any need for it to be added to "sys.path". At the same time though, not adding the directory into "sys.path" may cause some existing code to stop working. Code which would be affected would be where a user has used the Python "import" statement or even the "import_module()" function, without specifying a "path" argument, to import modules from the directory the directive applied to.

It should be said though, that such code may have been unstable and prone to random failures anyway. Such instability could have arisen due to problems such as described under [ISSUE 9] and also the unpredictable order that directories are added to "sys.path" when there are multiple handler or filter directives specified within the context of a single Python interpreter, the latter resulting in the wrong module being imported where the same name was used for a module in multiple locations.

The issues with using the "import_module()" function in this case have been eliminated through the changes to the algorithm it uses to search for modules. To combat the issues arising from the mixing of the "import" statement and the "import_module()" function to load the same module and to provide more certainty as to which directories are searched for a module when the "import" statement is used, the replacement implementation uses import hooks as first outlined in PEP 302, to intercept use of the "import" statement and customise how it also searches for and loads modules.

Note

PEP 302 was first implemented in Python 2.3a1. This means that were the module importer to use this feature, it would set a minimum requirement that you must have Python 2.3 or later to use mod_python. Currently mod_python still supports the use of Python 2.2.

Because of how import hooks work, all use of the "import" statement will be intercepted. However, the customised search algorithm will only be applied where the "import" statement is being used in a module which was previously loaded and stored in the module cache. This means that how the "import" statement works for modules found in directories specified in "sys.path" is unchanged. The steps performed by this algorithm when it is applied are:

  1. Search in the same directory as the code file corresponding to the module the import is being performed in, for the module code stored as a file with the same name as the requested module, but with a ".py" extension.
  2. If an additional module search path has been embedded within the module the import is being performed in, each of those directories will be searched for the module code stored as a file with the same name as the requested module, but with a ".py" extension.
  3. If the import is occurring within the context of handling a request, the "PythonPath" directive has not been specified, and the "Python*Handler", "PythonHandlerModule" or "Python*Filter" directive was specified inside a "Directory" directive or in a ".htaccess" file, the directory for which the handler or filter directive was specified will be searched for the module code stored as a file with the same name as the requested module, but with a ".py" extension.
  4. If the module could not be found, importing of the module is instead handed back to the standard Python import mechanism, which will only search directories found in "sys.path".
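The order in which directories are consulted in the steps above can be sketched as follows. This shows the search order only and does not show how the import hook itself is installed. The function names and parameters are hypothetical, "embedded_path" stands for whatever additional search path has been embedded in the importing module, and "handler_dir" is assumed to be "None" when the conditions of step 3 do not apply.

```python
import os


def import_search_dirs(importer_file, embedded_path=(), handler_dir=None,
                       python_path_set=False):
    """Return the directories searched, in order, before falling back."""
    # Step 1: the directory containing the module doing the import.
    dirs = [os.path.dirname(importer_file)]
    # Step 2: any search path embedded within the importing module.
    dirs.extend(embedded_path)
    # Step 3: the directory the handler or filter directive applies to,
    # but only when the PythonPath directive has not been specified.
    if handler_dir is not None and not python_path_set:
        dirs.append(handler_dir)
    return dirs


def find_for_import(module_name, importer_file, **options):
    """Return the code file for module_name, or None (step 4)."""
    for directory in import_search_dirs(importer_file, **options):
        candidate = os.path.join(directory, module_name + ".py")
        if os.path.isfile(candidate):
            return candidate
    # Step 4: not found, so the standard import mechanism takes over.
    return None
```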

As was the case when the "import_module()" function is used, any code file identified in steps 1, 2 or 3 would be explicitly loaded as necessary and stored into the module cache. This is the same module cache as used by the "import_module()" function. Thus if the module had already been loaded previously using the "import_module()" function, the module in the cache could be used instead of loading it again. Such modules are still candidates for later automatic reloading. Also as before, packages will be handed off to be loaded by the standard Python import mechanism.

Setting Of PythonPath

The purpose of the "PythonPath" setting is to allow the default Python module search path as specified by "sys.path" to be modified. It could be used for example to specify a directory outside of the document tree where modules specific to the web application are stored. The "PythonPath" setting can still be used in this way, however any modules or packages in the additional directories will be loaded by the standard Python module import mechanism and will not be candidates for automatic module reloading.

Although the directories added to "sys.path" using the "PythonPath" setting could include directories inside the document tree, this was always problematic in the existing implementation due to issues arising from the overlapping use of the "import" statement and the "import_module()" function. To avoid problems, no directory within the document tree should be added to "sys.path" using "PythonPath" or directly. To highlight this potentially bad practice, the replacement implementation will log a warning to the Apache error log when there is an attempt to explicitly load a module from a location which is also listed in "sys.path". Other than the warning however, no other action is taken and it is up to the user to rectify their code so as to avoid this scenario.
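The check behind such a warning might look something like the sketch below. The function name is hypothetical, and the standard "logging" module is used purely as a stand-in for writing to the Apache error log. The real check may well differ in detail.

```python
import logging
import os
import sys


def warn_if_on_sys_path(pathname, logger=logging.getLogger("importer")):
    """Warn when a module is loaded from a directory listed in sys.path.

    Returns True if a warning was issued. No other action is taken.
    """
    directory = os.path.dirname(os.path.abspath(pathname))
    # Empty entries in sys.path (the current directory) are skipped here.
    listed = [os.path.abspath(entry) for entry in sys.path if entry]
    if directory in listed:
        logger.warning(
            "Module %s is being loaded from a directory which is also "
            "listed in sys.path", pathname)
        return True
    return False
```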

If the "PythonPath" setting is specified, the directory associated with a handler or filter directive is not checked in either of the search algorithms underlying the "import_module()" function and the "import" statement. Further, once the "PythonPath" setting has been specified, this behaviour can't be undone and even applies to a distinct interpreter specified for just a part of the document tree below where the "PythonPath" setting was specified. This is the case, as that is how the existing implementation behaves.

Note

It is debatable whether this behaviour actually makes sense. Certainly in the context of how the replacement implementation works, it may be better to not preserve the existing behaviour. This would certainly simplify the rules as to how the search algorithm works and make the behaviour consistent across all cases. For now though the existing implementation behaviour is retained, but feedback is requested as to what should be done in this particular case.

More Stuff To Come Later

This document is not yet finished ....