Issues With Session Objects
Copyright © Graham Dumpleton
This article describes various limitations on what data can be stored in a session object. These limitations should be considered when writing a handler to ensure its portability.
Types Of Session Database
Where Apache is configured to run with multiple child processes, by default an external DBM file database is used to store session information between the times it is required by a request handler. In the case where there is at most a single child process, an in memory database consisting of a global Python dictionary is used to store session information.
Where the session information is stored in memory as native Python data, a session object could contain any type of Python object. Where an external database is used however, the session object needs to be stored in a serialised form. At the present time this serialisation is performed using the C variant of the "pickle" module.
Because the "pickle" serialisation mechanism is used, and because of how mod_python is implemented, there are certain restrictions on what data can be stored in the session object. In mod_python 3.1.4 and earlier, storing certain types of data could result in intermittent problems with saving sessions in certain circumstances. From mod_python 3.2 however, these types of data cannot be stored at all, and any attempt to do so fails consistently.
Pickle And Module Loading
The source of the problems and limitations is how the operation of the "pickle" serialisation routine is affected by the module reloading mechanism implemented by mod_python, that is, the module loading mechanism which underlies the Python*Handler directives and which is implemented by the apache.import_module() function. The particular types of data which are known to be affected are function objects and class objects.
To illustrate the problems and where they arise, consider the following output from an interactive Python session.
>>> import pickle
>>> def a(): pass
...
>>> pickle.dumps(a)
'c__main__\na\np0\n.'
>>> z = a
>>> pickle.dumps(z)
'c__main__\na\np0\n.'
As can be seen, it is possible to pickle a function object. This can be done even through a copy of the function object by reference, although in that case the pickled object still refers to the original function object.
If now the original function object is deleted however, and the copy of the function object is pickled, a failure will occur.
>>> del a
>>> pickle.dumps(z)
Traceback (most recent call last):
... <deleted>
pickle.PicklingError: Can't pickle <function a at 0x612b0>: it's not found as __main__.a
The exception has been raised because the original function object was deleted from where it was created. It occurs because the copy of the original function object is still internally identified by the name which it was assigned at the point of creation. The "pickle" serialisation routine will check that the original object as identified by the name still exists. If it doesn't exist, it will refuse to serialise the object.
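The behaviour described above, where a function is pickled by name rather than by value, can be reproduced on Python 3 with a short sketch. It is an illustration only and not mod_python code: the module name session_demo_byname is hypothetical, and types.ModuleType with exec() is assumed as a stand-in for an ordinary module defining the function.

```python
import pickle
import sys
import types

# Hypothetical module name, used only for this illustration.
MODULE_NAME = "session_demo_byname"

# Build a throwaway module holding a function "a" and register it.
module = types.ModuleType(MODULE_NAME)
exec("def a():\n    return 1\n", module.__dict__)
sys.modules[MODULE_NAME] = module

z = module.a  # second reference to the same function object

# The pickle records only where to find the function, not its code.
payload = pickle.dumps(z, protocol=0)
names_only = MODULE_NAME.encode() in payload

# Remove the original name; "z" still identifies itself as MODULE_NAME.a.
del module.a
try:
    pickle.dumps(z)
    failed_after_delete = False
except (pickle.PicklingError, AttributeError):
    failed_after_delete = True

print(names_only, failed_after_delete)
```

The serialised form holds the module name and the function name, so the lookup by name has to succeed at the time the object is pickled.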
Creating a new function object in place of the original function object does not eliminate the problem, although it does result in a different sort of exception.
>>> def a(): pass
...
>>> pickle.dumps(z)
Traceback (most recent call last):
... <deleted>
pickle.PicklingError: Can't pickle <function a at 0x612b0>: it's not the same object as __main__.a
In this case, the "pickle" serialisation routine recognises that "a" exists but realises that it is actually a different function object from the one from which the "z" copy was originally made.
Problems can start occurring in mod_python 3.1.4 and earlier if the function object being saved in the session object is a copy, held outside of the module in which the function was defined, of the original function object. If the module holding the original function object is then reloaded by the automatic module reloading mechanism implemented by the apache.import_module() function, an attempt to pickle the session object will fail. This is because the original function object which was copied from will have been overwritten by a new one when the module was reloaded.
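The effect of a reload on a previously saved function reference can be approximated on Python 3 as follows. This is a sketch under stated assumptions: the load() helper and the module name session_demo_reload are hypothetical, and re-executing the source into a fresh module object merely imitates what a reloading mechanism does. It is not the implementation of apache.import_module().

```python
import pickle
import sys
import types

# Hypothetical module name and source, used only for this illustration.
MODULE_NAME = "session_demo_reload"
SOURCE = "def handler():\n    return 'ok'\n"


def load(name, source):
    """Create a module from source and register it, replacing any old one."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def can_pickle(obj):
    """Return True if pickle.dumps() accepts the object."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, ImportError, AttributeError):
        return False


first = load(MODULE_NAME, SOURCE)
saved = first.handler              # reference kept outside the module
before_reload = can_pickle(saved)

load(MODULE_NAME, SOURCE)          # imitate a reload of the module
after_reload = can_pickle(saved)   # "saved" is no longer MODULE_NAME.handler

print(before_reload, after_reload)
```

The saved reference is unchanged between the two checks. Only the object found under the same name has been replaced, which is enough for "pickle" to refuse it.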
This sort of problem, although it will not occur for an instance of a class, will occur for the class object itself.
>>> class B: pass
...
>>> b = B()
>>> pickle.dumps(b)
'(i__main__\nB\np0\n(dp1\nb.'
>>> del B
>>> pickle.dumps(b)
'(i__main__\nB\np0\n(dp1\nb.'
>>> class B: pass
...
>>> pickle.dumps(B)
'c__main__\nB\np0\n.'
>>> C = B
>>> pickle.dumps(C)
'c__main__\nB\np0\n.'
>>> del B
>>> pickle.dumps(C)
Traceback (most recent call last):
... <deleted>
pickle.PicklingError: Can't pickle <class __main__.B at 0x53ab0>: it's not found as __main__.B
Note though that for the case of a class instance, an appropriate class object must exist at the same location when the serialised object is being restored.
>>> class B: pass
...
>>> b = B()
>>> pickle.loads(pickle.dumps(b))
<__main__.B instance at 0x41e40>
>>> del B
>>> pickle.loads(pickle.dumps(b))
Traceback (most recent call last):
... <deleted>
AttributeError: 'module' object has no attribute 'B'
Module Importing Changes
In mod_python 3.2, changes have been made to how modules are imported when mod_python.publisher is used. These changes are restricted to mod_python.publisher, but in mod_python 3.3 the way the apache.import_module() function works will also be changed. These changes are necessary to overcome shortcomings in the existing module importing mechanism.
The main difference with how the new module importing mechanism works is that it no longer makes use of the standard Python module importing mechanism. This is necessary as the standard Python module importing mechanism requires every loaded module to have a unique module name, with each module residing in sys.modules.
The new module importing mechanism avoids placing modules in sys.modules thereby allowing multiple modules with the same name and avoiding some of the issues that currently exist with mod_python. The consequence though of modules not residing in sys.modules is that function objects and class objects within such a module cannot be serialised using "pickle". This is because "pickle" wants to be able to locate the objects through sys.modules.
The problem can be seen in the following output from an interactive Python session.
>>> import new
>>> import pickle
>>> m = new.module("m")
>>> def a(): pass
...
>>> m.f = new.function(a.func_code,m.__dict__,"f")
>>> pickle.dumps(m.f)
Traceback (most recent call last):
... <deleted>
pickle.PicklingError: Can't pickle <function f at 0x67530>: it's not found as m.f
That the issue is due to whether or not the module resides in sys.modules is evident when it is actually added to the global set of modules.
>>> import sys
>>> sys.modules["m"] = m
>>> pickle.dumps(m.f)
'cm\nf\np0\n.'
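The sessions above rely on the Python 2 "new" module, which is not available in Python 3. A rough Python 3 equivalent is sketched below, assuming types.ModuleType and exec() as replacements for new.module() and new.function(). The module name session_demo_unregistered and the can_pickle() helper are hypothetical.

```python
import pickle
import sys
import types

# Hypothetical module name, used only for this illustration.
MODULE_NAME = "session_demo_unregistered"


def can_pickle(obj):
    """Return True if pickle.dumps() accepts the object."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, ImportError, AttributeError):
        return False


# A module object that exists but is not listed in sys.modules.
module = types.ModuleType(MODULE_NAME)
exec("def f():\n    pass\n", module.__dict__)
sys.modules.pop(MODULE_NAME, None)

outside_sys_modules = can_pickle(module.f)

# Registering the module makes the function locatable by name.
sys.modules[MODULE_NAME] = module
inside_sys_modules = can_pickle(module.f)

print(outside_sys_modules, inside_sys_modules)
```

The function object is identical in both checks. The only difference is whether "pickle" can find its module through sys.modules.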
Issues also arise for instances of class objects. In this case, although the class instance will be able to be serialised by the "pickle" module, it will not be able to be turned back into an object. This again is due to the module not residing in sys.modules and means that instances of classes also cannot be stored in sessions.
>>> exec "class C: pass" in m.__dict__
>>> c = m.C()
>>> pickle.dumps(c)
'(im\nC\np0\n(dp1\nb.'
>>> pickle.loads(pickle.dumps(c))
<m.C instance at 0x9a0d0>
>>> del sys.modules["m"]
>>> pickle.loads(pickle.dumps(c))
Traceback (most recent call last):
... <deleted>
ImportError: No module named m
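The same restore failure can be sketched on Python 3. The module name session_demo_classes is hypothetical, and types.ModuleType with exec() again stands in for a module kept outside sys.modules. One difference from the Python 2 session above is that on Python 3 the class of an instance is itself looked up by name when pickling, so the dump step also fails once the module is unregistered.

```python
import pickle
import sys
import types

# Hypothetical module name, used only for this illustration.
MODULE_NAME = "session_demo_classes"

module = types.ModuleType(MODULE_NAME)
exec("class C:\n    pass\n", module.__dict__)

# With the module registered, an instance survives a round trip.
sys.modules[MODULE_NAME] = module
instance = module.C()
payload = pickle.dumps(instance)
restored_ok = isinstance(pickle.loads(payload), module.C)

# Once the module is unregistered, the stored data cannot be restored.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:
    load_error = exc

# On Python 3, pickling the instance now fails as well.
try:
    pickle.dumps(instance)
    dump_after_removal = True
except (pickle.PicklingError, ImportError, AttributeError):
    dump_after_removal = False

print(restored_ok, type(load_error).__name__, dump_after_removal)
```

For session data the restore step matters most, since data written by one request has to be read back by a later one, possibly in a different process.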
Summary Of Limitations
In practice what this means is that function objects, class objects and instances of classes should not be stored in session objects, unless you are absolutely guaranteed that the module in which the original function object or class object resides was imported using the standard import statement, that the import mapped through to the standard Python import mechanism, and that the module always resides in sys.modules and is never reloaded.
In order to ensure that no strange problems at all are likely to occur, it is suggested that only basic builtin Python types, i.e. scalars, tuples, lists and dictionaries, be stored in session objects. That is, avoid any type of object which has user defined code associated with it.
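One way to apply this suggestion is to check a value before placing it in a session. The helper below is a minimal sketch written for Python 3. The name is_session_safe() is hypothetical and not part of mod_python, and the set of scalar types it accepts is an assumption about what counts as "basic". It compares exact types so that subclasses carrying user defined code are rejected, and it does not guard against self-referencing containers.

```python
# Scalar types assumed to be "basic" for the purposes of this sketch.
BASIC_SCALARS = (type(None), bool, int, float, complex, str, bytes)


def is_session_safe(value):
    """Return True if value is built only from basic builtin types."""
    kind = type(value)
    if kind in BASIC_SCALARS:
        return True
    if kind in (tuple, list):
        return all(is_session_safe(item) for item in value)
    if kind is dict:
        return all(
            is_session_safe(key) and is_session_safe(item)
            for key, item in value.items()
        )
    # Functions, classes, instances and subclasses of builtins end up here.
    return False


class Profile:
    """Example of a user defined class that should not be stored."""


good = {"user": "fred", "visits": 3, "tags": ["a", ("b", 1.5)]}
bad = {"user": "fred", "profile": Profile(), "callback": len}

print(is_session_safe(good), is_session_safe(bad))
```

A handler could call such a check on each value before assigning it to the session, and convert anything rejected into plain data, for example by storing the name of a function as a string rather than the function itself.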
