SetHandler Versus AddHandler

Copyright © Graham Dumpleton

This document explores the use of SetHandler and AddHandler to trigger response handlers.

The Handler Directives

When using mod_python, a handler directive needs to be listed in the Apache configuration in order for a user-supplied Python function to be called. A different handler directive is used depending on the phase of request processing in which the function needs to be called. The handler directives currently understood by mod_python are as follows.

Directive

Purpose

PythonPostReadRequestHandler

This handler is called after the request has been read but before any other phases have been processed. This is useful to make decisions based upon the input header fields.

PythonTransHandler

This handler allows for an opportunity to translate the URI into an actual filename, before the server's default rules (Alias directives and the like) are followed.

PythonHeaderParserHandler

This handler is called to give the module a chance to look at the request headers and take any appropriate specific actions early in the processing sequence.

PythonInitHandler

This handler is actually an alias to two different handlers. When specified in the main config file outside any directory tags, it is an alias to PythonPostReadRequestHandler. When specified inside a directory section (where PythonPostReadRequestHandler is not allowed), it aliases to PythonHeaderParserHandler.

PythonAccessHandler

This routine is called to check for any module-specific restrictions placed upon the requested resource.

PythonAuthenHandler

This routine is called to check the authentication information sent with the request (such as looking up the user in a database and verifying that the [encrypted] password sent matches the one in the database).

PythonAuthzHandler

This handler runs after AuthenHandler and is intended for checking whether a user is allowed to access a particular resource. But more often than not it is done right in the AuthenHandler.

PythonTypeHandler

This routine is called to determine and/or set the various document type information bits.

PythonFixupHandler

This routine is called to perform any module-specific fixing of header fields, et cetera.

PythonHandler

This is the main response handler and is where the content of the response to a request is typically generated.

PythonLogHandler

This routine is called to perform any module-specific logging activities.

The value provided as the argument to the directive gives the name of the Python module containing the handler function. For example:

PythonFixupHandler my_web_application
PythonHandler my_web_application

The default name of the handler function which will be used from the specified module is determined from the directive name by dropping the Python prefix and then making the remainder of the name lower case. For example, for the PythonHandler directive, the default name of the handler function is handler. If a different name needs to be used for the handler function, it should be listed after the name of the module, separated from the module name by "::". In other words, the default for the PythonHandler directive would be the same as having explicitly said:

PythonHandler my_web_application::handler
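The naming rule described above can be illustrated with a short sketch. The function names default_handler_name and split_handler_argument are hypothetical and exist only for this illustration; they are not part of mod_python, and the sketch ignores any extension qualifier that may follow the module name.

```python
def default_handler_name(directive):
    """Derive the default handler function name from a directive name.

    Drops the leading 'Python' and lower-cases the remainder.
    """
    prefix = "Python"
    if not directive.startswith(prefix):
        raise ValueError("not a mod_python handler directive: %r" % directive)
    return directive[len(prefix):].lower()


def split_handler_argument(directive, argument):
    """Split 'module' or 'module::function' into (module, function)."""
    module, separator, function = argument.partition("::")
    if not separator:
        # No explicit function given, so fall back to the default name.
        function = default_handler_name(directive)
    return module, function
```

Under this model, the argument my_web_application given to PythonHandler resolves to the same module and function pair as my_web_application::handler.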

The Response Handler

Unlike all the other phases, a response handler is not called unless something specifies that mod_python is actually responsible for generating the content of a response for a specific request. In other words, only specifying the PythonHandler directive is not enough to ensure that your response handler is called.

mod_python is designated as being responsible for providing the content of a response when a handler executing in a phase earlier than the response handler sets the handler attribute of the Apache request object. For a registered fixup handler implemented in Python, this would be written as:

from mod_python import apache

def fixuphandler(req):
    # Tell Apache that mod_python will generate the response content.
    req.handler = 'mod_python'
    return apache.OK

The result of doing this would be that any function specified by the PythonHandler directive is executed as the response handler for any request falling within the same part of the URL namespace for which the fixup handler is configured to run.

The decision to use mod_python for the response handler could also be made selectively, based on the extension of the file matched by Apache as the target of the request. Because mod_python allows the handler function for a later phase to be specified from code rather than by a directive, the response handler can also be registered within the same fixup handler.

import posixpath
from mod_python import apache, psp

def fixuphandler(req):
    # Only hand .psp files over to mod_python for the response phase.
    extension = posixpath.splitext(req.filename)[1]
    if extension == '.psp':
        # Register the PSP handler function directly (mod_python 3.3+).
        req.add_handler('PythonHandler', psp.handler)
        req.handler = 'mod_python'
    return apache.OK

Note that assigning a value to req.handler in this way is not possible prior to mod_python version 3.3. Passing an actual function when registering a handler is also not possible with versions of mod_python prior to 3.3. In these older versions, the handler is registered using the same string that would be given as the argument to the directive in the Apache configuration.
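For versions of mod_python prior to 3.3, the registration step might look like the following sketch. It passes the string mod_python.psp instead of a function and does not assign req.handler, so it assumes that a directive in the Apache configuration has already designated mod_python for the response phase. The ImportError fallback is an assumption added only so that the sketch can be exercised outside Apache.

```python
import posixpath

try:
    from mod_python import apache
    OK = apache.OK
except ImportError:
    # Assumption for running this sketch outside Apache: apache.OK is 0.
    OK = 0


def fixuphandler(req):
    # Name the handler module as a string, exactly as it would appear as
    # the argument to the PythonHandler directive.
    extension = posixpath.splitext(req.filename)[1]
    if extension == '.psp':
        req.add_handler('PythonHandler', 'mod_python.psp')
    return OK
```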

The SetHandler Directive

Although a fixup handler could be written to set the value of req.handler, thus indicating that mod_python should be used for the response handler phase, more traditionally, special directives supplied by Apache are used to trigger the use of mod_python. The first of these is the SetHandler directive.

SetHandler mod_python

This has exactly the same effect as having set req.handler to be the value "mod_python". Once again, the result of doing this would be that any function specified by the PythonHandler directive would be executed as the response handler for any request falling within the same part of the URL namespace for which the SetHandler directive was defined.

Requests Against A Directory

Where the SetHandler directive is used within a Directory directive or a .htaccess file, requests which match the directory itself will also be processed by the specified response handler. Note though that this will only be for requests on which Apache has already performed any necessary trailing slash redirection. That is, where the match is against the directory, the req.uri value will have a trailing slash on the end. This is the case because the Apache mod_dir module performs such trailing slash redirection automatically at the end of the fixup handler phase.

Although the response handler does not have to deal with the issue of trailing slash redirection where a Directory directive, or .htaccess file is used, it does however have to deal with what it means to receive a request which matches to the directory and not some resource contained within the directory. This is because when SetHandler is used, the DirectoryIndex directive is ignored and it is instead the responsibility of the handler to deal with any request against the directory.

As an example, when mod_python.publisher is used in conjunction with the SetHandler directive, it will internally view the request against the directory as being equivalent to a request against the index.py file contained in that directory. Presuming that the index.py file contains a function called index(), that function would then be executed by mod_python.publisher to handle the request.

If the response handler needs to distinguish when the request is against a directory, as opposed to some resource held within the directory, the value of req.content_type can be queried. For a request against a directory, this will have the value httpd/unix-directory. The handler should always ensure that the req.content_type is reset to some more appropriate value, otherwise a user's browser is likely to expect the response to be saved to a file.
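The check described above could be wrapped in a pair of small helpers, as in the following sketch. The helper names and the default of text/html are assumptions chosen for illustration; the only attribute relied upon is req.content_type.

```python
DIRECTORY_CONTENT_TYPE = 'httpd/unix-directory'


def is_directory_request(req):
    """Return True when Apache matched the request to a directory."""
    return req.content_type == DIRECTORY_CONTENT_TYPE


def reset_directory_content_type(req, content_type='text/html'):
    """Replace the directory marker with a real content type.

    Returns True if the request was against a directory, so the caller
    can still branch on that fact after the content type has been reset.
    """
    if is_directory_request(req):
        req.content_type = content_type
        return True
    return False
```

A response handler might call reset_directory_content_type(req) first and then generate an index page when it returns True.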

In the case of a Location directive, no trailing slash redirection is performed by the Apache mod_dir module as there is nothing to reference against in order to determine when such a redirection should be forced. As such, any response handler would have to deal with such things as appropriate when the Location directive is used.
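As a rough sketch of what dealing with this might involve, a handler used under a Location directive could decide for itself when a trailing slash is missing. The function below and its collections argument are hypothetical: the application has to supply its own notion of which URIs are directory-like, since there is no filesystem for Apache to consult.

```python
def trailing_slash_target(uri, collections, args=None):
    """Return the URI to redirect to, or None if no redirect is needed.

    `collections` is a hypothetical set of URIs, written without a
    trailing slash, that the application treats as directory-like.
    """
    if uri.endswith('/') or uri not in collections:
        return None
    target = uri + '/'
    if args:
        # Preserve the query string across the redirect.
        target += '?' + args
    return target
```

A handler could pass req.uri and req.args to this function and, when a target is returned, issue a redirect to it, for instance by means of the redirect function in mod_python.util.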

Note that this internal reinterpretation of a request against the directory to be a request against the index.py file is not present in mod_python.publisher in version 2.7 of mod_python, only in version 3.0 and later.

Disabling Of MultiViews

When the SetHandler directive is used, because the response handler is responsible for processing all requests falling within that part of the URL namespace, the handler can dictate whether or not it requires some sort of file extension to appear within the URL. In mod_python.publisher for example, the .py extension is treated in a special way, it being the extension used for Python code files. Specifically, whether one uses the .py extension in the URL is optional with mod_python.publisher when the SetHandler directive is being used.

For all such URL interpretation to be managed exclusively by the response handler though, any MultiViews matching performed by the Apache mod_negotiation module should be disabled. If this is not done, Apache can decide to fail a request even before the response handler is executed, if it cannot in some way associate a URL which does not contain a file extension with some file within the directory. That this is occurring can be determined by looking in the Apache error log file for error messages of the form:

Negotiation: discovered file(s) matching request: /some/path/index (None could be negotiated).

If this is found to be occurring, perhaps because MultiViews was enabled at some higher scope within the Apache configuration, it can be disabled by using the Options directive.

Options -MultiViews

Mixing In Of Static Files

If a directory for which the SetHandler directive is being used also needs to contain static files, it will be necessary to override, for that specific file type, which response handler is used to deliver the content for the request. In the case of static files, the response handler should be set to default-handler. For example, if the directory also contained JPEG image files, the following would be used:

<Files *.jpg>
SetHandler default-handler
</Files>

The default-handler will use the contents of the file unmodified as the content of the response. If the static files need to be processed in some way, it may be more appropriate to reset the handler back to None. This will allow other Apache modules to automatically take control of the response handler for the file types they specifically understand. For example, if needing to mix in PHP files within the same directory, None would be used.

<Files *.php>
SetHandler None
</Files>

Multiple PythonHandlers

Where a directory has multiple resource types that need to be processed by distinct handler functions implemented using mod_python, it is necessary to qualify each non-primary handler with the resource extension for which it should be used. The primary handler is the one which handles any requests against the directory itself, plus requests for which a specific target couldn't be identified. For example, to mix PSP files along with code files for mod_python.publisher in the same directory, where mod_python.publisher is the primary handler, the following would be used.

SetHandler mod_python
PythonHandler mod_python.publisher
PythonHandler mod_python.psp | .psp
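The effect of this configuration can be pictured with a small model. This is only a sketch of the behaviour described above, not mod_python's own code; the names QUALIFIED, PRIMARY and choose_handler are made up for the illustration.

```python
import posixpath

# Handlers restricted to an extension, as in "PythonHandler mod_python.psp | .psp".
QUALIFIED = {'.psp': 'mod_python.psp'}

# The unqualified handler, used whenever no qualified handler matches.
PRIMARY = 'mod_python.publisher'


def choose_handler(filename):
    """Return the handler module that would serve the matched file."""
    extension = posixpath.splitext(filename)[1]
    return QUALIFIED.get(extension, PRIMARY)
```

In this model a request matching page.psp goes to mod_python.psp, while a request for index.py, or for the directory itself, falls through to mod_python.publisher.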

The AddHandler Directive

The second directive after SetHandler that can be used to specify that mod_python should be used for the response handler phase, is the AddHandler directive. The difference when using the AddHandler directive is that whether mod_python would be invoked is conditional based on the file extension. To have mod_python be invoked for only PSP files, the required configuration would be:

AddHandler mod_python .psp
PythonHandler mod_python.psp

If a directory contained multiple resource types that need to be processed by distinct handler functions implemented using mod_python, it will be necessary to qualify for which file extension each handler function should be used.

AddHandler mod_python .psp
PythonHandler mod_python.psp | .psp

AddHandler mod_python .py
PythonHandler mod_python.publisher | .py

If there are static files in the same directory with different resource extensions, whether or not they need to be processed, they will be handled as normal without any further configuration.

DirectoryIndex Directive

Because only requests which match resources with specific extensions will be processed by mod_python when AddHandler is used, the response handlers do not need to deal with requests which target the actual directory. If you still require some sort of response from a request against the directory, you will need to use the DirectoryIndex directive to specify a target resource to which requests against the directory should be redirected. For example, if there is a mod_python.publisher file called index.py containing an appropriate index() function, list index.py against the DirectoryIndex directive.

DirectoryIndex index.py

When the response handler is executed, it will perceive the request as being targeted at the listed resource and not the directory.

Resource Extensions

If you still wish to use a URL which doesn't list the extension for the resource when AddHandler is used, you will need to enable and properly configure the content negotiation features of Apache, which are provided by the mod_negotiation module working together with mod_mime. The first two parts to this are enabling the MultiViews option and indicating that extensions associated with handlers should be considered as part of the content negotiation process.

Options MultiViews
MultiviewsMatch Handlers

This will allow a URL to be used which doesn't list the resource extension, however without further configuration it may not always work properly. Problems can arise where a URL does not include the resource extension and there are multiple possible candidates which it could match against. In the absence of any further information, the one Apache chooses may not be the one you intended.

Part of the information that is used in deciding which of two resources should be used when no resource extension is given comes from the HTTP client. This is in the form of the Accept header in an HTTP request.

Accept: text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6, image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1

What this indicates is the HTTP client's preference to receive a response of one content type over another when there is more than one choice. Unless additional configuration is supplied however, this preference cannot be applied to resources which have non-standard extensions for which there is no standard content type associated.

Two basic possibilities exist for what could occur, although there are probably others as well. Where there is a choice between a resource with a known content type and one without a known content type, that with the known content type will always be given preference. Where there is a choice between two resources where neither has a known content type, Apache will choose that where the target file is smaller in size. Where both are of the same size, which will be chosen will be undefined.
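The two cases just described can be expressed as a small model. This is a sketch of the behaviour as stated above, not of Apache's actual negotiation code; the function name and the tuple layout are assumptions, and the sketch does not attempt to model how Apache chooses between several candidates that all have known content types.

```python
def pick_variant(candidates):
    """Pick a file name from (name, content_type, size) tuples.

    content_type is None when no content type is known for the extension.
    """
    known = [c for c in candidates if c[1] is not None]
    # A candidate with a known content type is preferred over one without.
    pool = known or candidates
    # Otherwise the smaller file wins. On equal sizes min() returns the
    # first one listed, whereas the real outcome is undefined.
    return min(pool, key=lambda c: c[2])[0]
```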

To avoid problems, it is necessary to indicate a content type for non standard extension types. If necessary, a server side preference for which should be used over another can be specified as well. This is done using the AddType directive.

AddType text/html;qs=1.0 .py
AddType text/html;qs=0.9 .psp
AddType text/html;qs=0.8 .html
AddType text/plain;qs=0.7 .txt

Although this will influence the outcome, it does not dictate what the outcome will be. This is because the source quality factors specified here are combined with those from the HTTP client. Thus, if the client has a very high quality factor associated with a plain text file and a very low factor associated with HTML, the plain text content type can still be chosen as the one that should be returned.
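To see how the two sets of factors interact, the following sketch multiplies the client quality factor by the server side qs value for each candidate and picks the highest product. It covers only the media type dimension of negotiation; the function names and dictionary layouts are assumptions made for the illustration.

```python
def client_quality(content_type, accept):
    """Look up the client's q value, trying the most specific entry first."""
    major = content_type.split('/')[0]
    for pattern in (content_type, major + '/*', '*/*'):
        if pattern in accept:
            return accept[pattern]
    return 0.0


def negotiate(variants, accept):
    """Return the variant name with the highest q * qs product.

    variants maps a file name to (content_type, qs); accept maps a media
    range from the Accept header to its q value.
    """
    def score(name):
        content_type, qs = variants[name]
        return client_quality(content_type, accept) * qs
    return max(variants, key=score)


variants = {
    'index.py': ('text/html', 1.0),
    'index.txt': ('text/plain', 0.7),
}

# A client preferring HTML gets index.py.
html_first = negotiate(variants, {'text/html': 1.0, 'text/*': 0.8, '*/*': 0.1})

# A client strongly preferring plain text gets index.txt despite the lower qs.
text_first = negotiate(variants, {'text/plain': 1.0, 'text/html': 0.1})
```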

This approach also has the downside that the same precedence order applies to all resources within the same directory. If it is necessary to affect the choices for individual resources, then it would be necessary to also utilise the type maps feature of the Apache mod_negotiation module.

The Location Directive

The SetHandler directive may be used equally well with the Location directive as with the Directory directive. Because there is no corresponding physical directory or files when the Location directive is used, whether MultiViews is enabled or not is not an issue.

<Location /some/path>
SetHandler mod_python
PythonHandler my_web_application
</Location>

The absence of a physical directory or files to match against does however present issues with the AddHandler directive. This is because there is no matching process by which an extension can be derived to match against that supplied to the AddHandler directive. To get around this, it is necessary to direct the mod_mime module to take into consideration the value of the path info left after matching the URL against that defined by the Location directive. This is done using the ModMimeUsePathInfo directive.

<Location /some/path>
AddHandler mod_python .py
PythonHandler my_web_application
ModMimeUsePathInfo On
</Location>

This would allow a URL of /some/path/dir1/dir2/xxx.py to match and for the handler to be triggered. The extension must however be at the very end of the URL. A URL of the form /some/path/dir1/xxx.py/action will not be matched.
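The restriction can be illustrated with a simplified model that looks only at the final extension of the URL. The function name is hypothetical, and the real matching is done by mod_mime rather than by anything resembling this code.

```python
import posixpath


def handler_triggered(uri, extension='.py'):
    """Return True if the URL ends with the extension given to AddHandler."""
    # splitext only considers a dot in the last path segment, so an
    # extension followed by further path components is not seen.
    return posixpath.splitext(uri)[1] == extension
```

With this model, /some/path/dir1/dir2/xxx.py triggers the handler, while /some/path/dir1/xxx.py/action does not.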



ModPython/Articles/SetHandlerVersusAddHandler (last edited 2006-11-05 22:25:23 by GrahamDumpleton)