Servlet Framework
The HTTP servlet framework can be used to provide a window into your application. A number of predefined servlets are provided, or you can create your own. You can also create your own server objects to map the servlets to appropriate parts of the URL namespace. Alternatively, a number of predefined server objects can be used for common tasks such as serving up files from the filesystem or providing RPC over HTTP services. Basic user authentication is implemented, and clients can also be blocked based on their address.
The major classes in the OSE C++ class library involved in providing this functionality are the OTC_HttpDaemon, OTC_HttpServer and OTC_HttpServlet classes, plus the various derived servlet and server classes. The implementation of the HTTP servlet framework is based on the event system and multiple HTTP requests can be handled concurrently.
Although the framework is quite powerful, you should still keep in mind that its main purpose is to provide a means of interacting with an application. If you are after a general purpose web server, you would probably be better off using a product like Apache. If you need the web site and the application to appear as one, use the "mod_proxy" module for Apache to forward only a portion of the URL namespace to the application. This has the added benefit that Apache can be used to set up an SSL connection with the client over any insecure network, with communication between Apache and the application on the secure network being normal HTTP.
Framework Overview
When a HTTP client makes a connection to the server process, a session manager is created to parse any requests made by that client. For each request, an attempt is made to find a server object which manages the part of the URL namespace that the request falls under. This server object is then asked to provide a servlet to handle the actual request. If no server object is found corresponding to that portion of the URL namespace, or the server object is not able to provide a servlet to handle the request, a HTTP error response is returned to the client indicating that the resource corresponding to the supplied URL could not be found.
Where an appropriate servlet to handle the request is found, the session manager will initially pass the details of the request to the servlet. These include the type of request, the URL and the contents of any HTTP headers, but not any content associated with the request. Such content is subsequently passed to the servlet as it arrives, though only if the servlet wasn't able to process the request based on the initial information and actually requires the content.
In the majority of cases a request will not have any associated content and a servlet will be able to process the request straight away. Even if there is no content, however, the servlet isn't obliged to send a response immediately. This may be the situation if the servlet needs to wait until information from another source arrives before it can form the response. In this scenario, the servlet might send a request using the messaging framework to a remote service to obtain the information. When the response from the remote service arrives, the servlet can then generate the response.
When the action of the servlet does depend on the content supplied with the request, the servlet would accumulate the content as it arrives until the amount of content matches that given in the content length header, or until some appropriate boundary is encountered. Once it has all the content associated with the request, the servlet can process the request and send a response. Alternatively, it could again delay the response if it first needs to send the content received to some remote service and wait for a reply.
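The accumulation step can be illustrated in plain Python. This is a minimal sketch only: `ContentAccumulator` and its `process_content()` method are hypothetical names, not part of netsvc, and the sketch covers only the content length case, not boundary detection.

```python
class ContentAccumulator:
    """Hypothetical helper: collects request content until Content-Length is reached."""

    def __init__(self, content_length):
        self.expected = content_length
        self.chunks = []
        self.received = 0

    def process_content(self, chunk):
        """Add a chunk as it arrives; return the full body once complete, else None."""
        self.chunks.append(chunk)
        self.received += len(chunk)
        if self.received >= self.expected:
            # All content announced in the headers has arrived. Any bytes
            # beyond the announced length are simply dropped in this sketch.
            return b"".join(self.chunks)[: self.expected]
        return None


acc = ContentAccumulator(11)
first = acc.process_content(b"hello ")   # None: still waiting for more content
body = acc.process_content(b"world")     # b'hello world'
```

Returning `None` until the body is complete mirrors the idea that the servlet only acts once it holds all the content it needs.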
When a servlet sends a response to the HTTP client, any request by the HTTP client to keep the session alive will be honoured, as long as the servlet includes a content length in the HTTP headers of the response. This allows the HTTP client to submit additional requests using the same connection if desired. In general the servlet framework adheres to the HTTP/1.0 protocol.
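The keep-alive rule above can be expressed as a small predicate. This is a sketch under assumptions: `honour_keep_alive` is a hypothetical helper taking header dictionaries, and it assumes the HTTP/1.0 convention of a "Connection: keep-alive" request header; it does not reproduce the framework's own code.

```python
def honour_keep_alive(request_headers, response_headers):
    """Hypothetical helper: decide whether to keep an HTTP/1.0 connection open.

    The connection stays open only if the client asked for it and the
    response declares its length, so the client can tell where it ends.
    """
    # Header names are case-insensitive in HTTP, so normalise them first.
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}
    wants_keep_alive = req.get("connection", "").lower() == "keep-alive"
    return wants_keep_alive and "content-length" in resp
```

Without a content length the client can only detect the end of the response when the connection closes, which is why the length is a precondition.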
The HTTP Daemon
The Python class which listens for connection requests from HTTP clients is called HttpDaemon. When creating the HTTP daemon, you need to tell it which port to listen on and also register with it any HTTP server objects. When registering a HTTP server object, you need to identify which part of the URL namespace it manages. Finally, you need to start the daemon so that once the dispatcher is run it will actually listen and handle the requests.
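import os
import signal

import netsvc

dispatcher = netsvc.Dispatcher()
# Stop the dispatcher when SIGINT (Ctrl-C) is received.
dispatcher.monitor(signal.SIGINT)

# Serve files from the current directory at the root of the URL namespace.
daemon = netsvc.HttpDaemon(8000)
filesrvr = netsvc.FileServer(os.getcwd())
daemon.attach("/", filesrvr)
daemon.start()

dispatcher.run()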
When a HTTP server object is registered, the first argument to the attach() member function should be the path under which resources made available by the HTTP server object are accessible. Except for the root directory, the path should not include a trailing "/". The path should also be in normalised form. That is, it should not include consecutive instances of "/" within the path, or include the path components ".." or ".".
If the path isn't normalised in this respect, it will never match any request, as request URLs are always normalised before a match is attempted. Request URLs are normalised to prevent malicious requests from accessing file resources outside the available directory tree.
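The normalisation described above can be sketched as follows. This is an illustration of the rules stated in the text (no repeated "/", no "." or ".." components, no trailing "/" except for the root), not the framework's actual routine; `normalise_path` is a hypothetical name. Note that ".." can never climb above the root.

```python
def normalise_path(path):
    """Hypothetical sketch: collapse '//', drop '.', resolve '..' without escaping '/'."""
    parts = []
    for component in path.split("/"):
        if component in ("", "."):
            # Empty components come from repeated, leading or trailing slashes.
            continue
        if component == "..":
            # Never climb above the root of the URL namespace.
            if parts:
                parts.pop()
            continue
        parts.append(component)
    return "/" + "/".join(parts)


print(normalise_path("/docs//./guide/../index.html"))  # /docs/index.html
print(normalise_path("/../../etc/passwd"))             # /etc/passwd
```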
If desired, a single HTTP server object may be registered multiple times within the one URL namespace. Registrations may also be done hierarchically. That is, one registration may nest within the URL namespace of another. In this situation a request will match against the HTTP server object with the most deeply nested path.
filesrvr1 = netsvc.FileServer(os.path.join(os.getcwd(), "info"))
filesrvr2 = netsvc.FileServer(os.path.join(os.getcwd(), "logs"))
# Requests under "/logs" go to filesrvr2; everything else goes to filesrvr1.
daemon.attach("/", filesrvr1)
daemon.attach("/logs", filesrvr2)
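The "most deeply nested path wins" rule amounts to a longest-prefix match over whole path components. The sketch below assumes registrations are held in a plain dictionary and that the request path is already normalised; `find_server` is a hypothetical helper, not a netsvc function.

```python
def find_server(registrations, path):
    """Return (registered_path, server) for the most deeply nested match, or None.

    `registrations` maps normalised paths such as "/" or "/logs" to server
    objects; `path` is an already-normalised request path.
    """
    best = None
    for root, server in registrations.items():
        if root == "/":
            matches = True
        else:
            # Match whole path components so "/logs" does not match "/logsfile".
            matches = path == root or path.startswith(root + "/")
        # Every match is a prefix of `path`, so the longest one is the deepest.
        if matches and (best is None or len(root) > len(best[0])):
            best = (root, server)
    return best


registrations = {"/": "filesrvr1", "/logs": "filesrvr2"}
print(find_server(registrations, "/logs/today.txt"))  # ('/logs', 'filesrvr2')
```

A result of `None` corresponds to the case where no server object manages that part of the namespace and a "not found" response would be returned.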
Normally the port which the HTTP daemon is to listen on will be fixed. If you require a dynamically allocated port, you should use "0" as the port number. The actual port number which is allocated can then be queried using the port() member function. Obviously, this port number would then need to be displayed somewhere or otherwise accessible so it is known which port to connect to.
daemon = netsvc.HttpDaemon(0)
port = daemon.port()
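The port 0 convention comes from the operating system's socket layer. The standard `socket` module shows the same behaviour directly; this is only an illustration of the mechanism that a daemon created with port 0 presumably relies on, not netsvc code.

```python
import socket

# Binding to port 0 asks the operating system to pick any free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)

# getsockname() reports the address actually bound, including the chosen port.
host, port = listener.getsockname()
print("Listening on port", port)
listener.close()
```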
The File Server
The FileServer class is a predefined HTTP server object for serving up files from the file system in response to HTTP GET requests. This server object is suitable for providing access to documentation related to an application, configuration files or application log files. A plugin mechanism for handling special file types is also included.
In the case of files resident in the file system, the server is able to handle files of any size, with the corresponding servlet only sending data back to the HTTP client as fast as the client is able to receive it. That is, transmission of a large file will neither blow out the memory footprint of the application nor cause the application to block if the client is slow at reading the contents of the file.
When an instance of the FileServer class is created, it must be supplied with the filesystem directory from which files are to be served. The server object utilises the Python "mimetypes" module for determining file types. The file type associated with an extension can be overridden, or knowledge of additional file types can be added using the map() member function.
filesrvr = netsvc.FileServer("/home/httpd")
filesrvr.map(".py", "text/plain")
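The lookup order implied above (explicit mappings first, then the `mimetypes` module) can be sketched as follows. `TypeMap` is a hypothetical class, and the fallback to "application/octet-stream" for unknown extensions is an assumption of this sketch, not documented FileServer behaviour.

```python
import mimetypes
import os


class TypeMap:
    """Hypothetical sketch of extension-to-type lookup with local overrides."""

    def __init__(self, default="application/octet-stream"):
        self._overrides = {}
        self._default = default  # assumed fallback for unknown extensions

    def map(self, extension, content_type):
        # Local mappings take precedence over the mimetypes module.
        self._overrides[extension.lower()] = content_type

    def lookup(self, filename):
        extension = os.path.splitext(filename)[1].lower()
        if extension in self._overrides:
            return self._overrides[extension]
        guessed, _encoding = mimetypes.guess_type(filename)
        return guessed or self._default


types = TypeMap()
types.map(".py", "text/plain")
print(types.lookup("script.py"))  # text/plain
print(types.lookup("page.html"))  # text/html
```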
Note that the HTTP client is expected to know the name of the file it is trying to access, as there is no builtin support for directory browsing. It is however possible to define the names of one or more index files to try when a request identifies a directory as opposed to a file. When more than one index file is specified, those declared later take precedence.
filesrvr.index("index.htm")
filesrvr.index("index.html")
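The precedence rule for index files can be sketched by trying the declared names in reverse order. `resolve_index` is a hypothetical helper written for illustration; with the two declarations above, "index.html" would be tried before "index.htm".

```python
import os
import tempfile


def resolve_index(directory, index_names):
    """Return the path of the first existing index file, or None.

    `index_names` is in declaration order; later declarations take
    precedence, so they are tried first.
    """
    for name in reversed(index_names):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


declared = ["index.htm", "index.html"]
with tempfile.TemporaryDirectory() as docroot:
    for name in declared:
        with open(os.path.join(docroot, name), "w") as f:
            f.write("<p>hello</p>")
    chosen = os.path.basename(resolve_index(docroot, declared))
    # With the later declaration removed, the earlier one is used instead.
    os.remove(os.path.join(docroot, "index.html"))
    fallback = os.path.basename(resolve_index(docroot, declared))

print(chosen, fallback)  # index.html index.htm
```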
Editor backup files, temporary files generated by an application, or any other files which should not in any way be accessible from a HTTP client, can be hidden from view so long as they have a distinct extension.
filesrvr.hide(".bak")
filesrvr.hide(".html~")
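The hiding rule reduces to a suffix test on the file name. `is_hidden` is a hypothetical helper; this sketch assumes a case-sensitive comparison, which may differ from what the FileServer class actually does.

```python
def is_hidden(filename, hidden_extensions):
    """Hypothetical sketch: True if the file name ends with a hidden extension."""
    # Case-sensitive suffix match (an assumption of this sketch).
    return any(filename.endswith(ext) for ext in hidden_extensions)


hidden = [".bak", ".html~"]
print(is_hidden("notes.bak", hidden))   # True
print(is_hidden("index.html", hidden))  # False
```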
When it comes to the actual task of serving up a single file from the file system, the FileServlet class is used. This is a wrapper around the corresponding servlet class from the OSE C++ class library used to handle the request for a single file. The servlet may be used directly from a custom HTTP server object.
Client Authorisation
If the HTTP servlet framework is being used to provide an administrative interface into an application, it may be desirable to block access from all but a few selected client hosts. This can be useful where the application is otherwise intentionally accessible over the Internet, or may inadvertently become accessible from a broader range of hosts than intended. This may result from misconfigured firewalls, or the addition of new subnets to a corporate network.
If you wish to control who can access the application through the port used by the HTTP daemon, it is necessary to create a derived version of the HttpDaemon class and override the authorise() member function. For each client connection, this member function will be called with the IP address of the client host. Your code can thereby block requests from any undesirable hosts.
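import netsvc

class HttpDaemon(netsvc.HttpDaemon):
    def __init__(self, port, hosts=()):
        netsvc.HttpDaemon.__init__(self, port)
        # IP addresses of the client hosts permitted to connect.
        self._allow = list(hosts)

    def authorise(self, host):
        # Called with each client's IP address; returning False drops the connection.
        return host in self._allow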
If access to a particular client is disallowed, the connection will be dropped immediately. The client will not receive any form of specific HTTP error response indicating why the connection has been closed. Note that this mechanism blocks a client from accessing any part of the URL namespace for that HTTP daemon. If you wish to only block client access to specific resources, you would need to customise each HTTP server object or servlet, or use multiple HTTP daemon objects on separate ports.
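A fixed list of addresses becomes awkward when whole subnets should be admitted. The sketch below uses the standard `ipaddress` module to test membership of allowed networks. `SubnetPolicy` is a hypothetical class; its `authorise()` method could be called from an overridden `authorise()` in a derived HttpDaemon, assuming the host is supplied as a dotted address string.

```python
import ipaddress


class SubnetPolicy:
    """Hypothetical helper: decide whether a client IP falls in an allowed network."""

    def __init__(self, networks):
        self._networks = [ipaddress.ip_network(n) for n in networks]

    def authorise(self, host):
        try:
            address = ipaddress.ip_address(host)
        except ValueError:
            # Anything that is not a valid IP address is refused.
            return False
        return any(address in network for network in self._networks)


policy = SubnetPolicy(["127.0.0.1/32", "192.168.1.0/24"])
print(policy.authorise("192.168.1.57"))  # True
print(policy.authorise("10.0.0.1"))      # False
```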
User Authorisation
A further level of authorisation beyond that of blocking specific client hosts is to individually authenticate each user. User authentication is performed by the HTTP server objects. That is, the URL namespace managed by each HTTP server object can be individually protected using a different user database.
To add user authentication to a particular HTTP server object, you should derive from the class and override the authorise() member function. If building your own HTTP server object, you could embed the member function directly in your class.
import netsvc

class FileServer(netsvc.FileServer):
    def __init__(self, directory, users=None):
        netsvc.FileServer.__init__(self, directory)
        # Mapping of login name to password.
        self._allow = dict(users or {})

    def authorise(self, login, password):
        # Called with the credentials supplied via HTTP basic authentication.
        return login in self._allow and self._allow[login] == password
If you need to control user access at the level of individual URLs within the URL namespace managed by a particular HTTP server object, that functionality would need to be embedded into any servlets created by that HTTP server object, or managed at the point that the servlets are created by the HTTP server object. Note that only the HTTP basic authentication mechanism is supported. There is no support for secure sockets (SSL).
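Because the framework hands `authorise()` an already-decoded login and password, the encoding itself is never visible to user code. The sketch below shows how HTTP basic credentials are carried (RFC 7617: "Basic " followed by base64 of "login:password") and a constant-time comparison one might use inside an `authorise()` override. `parse_basic_auth` and `check_password` are hypothetical helpers, not part of netsvc. Since base64 is not encryption, the credentials are effectively sent in the clear without SSL.

```python
import base64
import binascii
import hmac


def parse_basic_auth(header_value):
    """Decode a Basic 'Authorization' header value into (login, password), or None."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    login, sep, password = decoded.partition(":")
    if not sep:
        return None
    return login, password


def check_password(users, login, password):
    """Compare against a login-to-password mapping in constant time."""
    expected = users.get(login)
    if expected is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), password.encode("utf-8"))
```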
HTTP Server Objects
When a HTTP request is received, it is a HTTP server object which will dictate the type of HTTP servlet created to handle the request. If you wish to implement a customised mapping between request URLs and the available HTTP servlets, or introduce a new type of HTTP servlet, you will need to define your own HTTP server object by deriving from the HttpServer class and overriding the servlet() member function.
import netsvc

class HttpServer(netsvc.HttpServer):
    def servlet(self, session):
        # The servlet path is relative to where this server object was attached.
        servletPath = session.servletPath()
        if servletPath == "echo":
            return netsvc.EchoServlet(session)
        elif servletPath == "motd":
            return netsvc.FileServlet(session, "/etc/motd", "text/plain")
        # Anything else is reported as not found.
        return netsvc.ErrorServlet(session, 404)

daemon = netsvc.HttpDaemon(8000)
server = HttpServer()
daemon.attach("/test", server)
daemon.start()
The job of the servlet() member function is to create an instance of a HTTP servlet capable of handling a request made against a specific URL. When the servlet() member function is called it is supplied with the HTTP session object. The session object provides access to details of the request, including the server root and servlet path. The server root corresponds to the path under which the HTTP server object was registered. The servlet path is the remainder of the path expressed relative to that server root.
As an example, if the request used the path "/test/echo" and the HTTP server object was registered with the path "/test", the server root would be "/test" and the servlet path would be "echo". In the case that a HTTP server object is registered with path "/", the server root will still be "/". This is the only case where the trailing "/" isn't removed.
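The split between server root and servlet path can be sketched as a small function. `split_request_path` is hypothetical; it assumes the request path is normalised and has already matched the registration. Returning an empty servlet path when the request names the server root exactly is an assumption of this sketch.

```python
def split_request_path(request_path, registered_path):
    """Hypothetical sketch: return (server_root, servlet_path) for a server
    registered at `registered_path`."""
    if registered_path == "/":
        # A root registration keeps its "/"; the servlet path is the remainder.
        return "/", request_path[1:]
    if request_path == registered_path:
        return registered_path, ""
    # Drop the registered prefix and the "/" separating it from the rest.
    return registered_path, request_path[len(registered_path) + 1:]


print(split_request_path("/test/echo", "/test"))  # ('/test', 'echo')
print(split_request_path("/index.html", "/"))     # ('/', 'index.html')
```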
Under normal circumstances the HTTP server object would determine the type of HTTP servlet to create and the resource being referenced based only on the servlet path. If necessary, however, it can query other information related to a request, for example to look for the presence of cookies used to implement a user session mechanism.
When a HTTP servlet is created, it will need to be passed the handle to the HTTP session object. All the predefined HTTP servlets accept this as the first argument when the servlet is created. If you are defining your own servlets, it is recommended you follow this convention.
If the HTTP server object isn't able to map a request to a particular type of HTTP servlet, the "servlet()" member function should return None, or should indicate a specific type of HTTP error response using the ErrorServlet class. A HTTP client can be redirected to a different resource using the RedirectServlet class.
The Error Servlet
The error servlet as implemented by the ErrorServlet class is provided as a quick way for a custom HTTP server object to return a HTTP error response. In addition to the HTTP session object, the error servlet needs to be supplied with an appropriate HTTP error response code. Text to be included in the body of the response can also be provided if desired. Such text may include any relevant HTML markup, but should not include the opening and closing "body" tags.
class HttpServer(netsvc.HttpServer):
    def servlet(self, session):
        return netsvc.ErrorServlet(session, 501, "Not implemented.")
The Redirect Servlet
The redirect servlet as implemented by the RedirectServlet class would be used when it is necessary to redirect a HTTP client to an alternate resource. In addition to the HTTP session object, it should be supplied with the URI of the resource to which the HTTP client is to be directed. By default, the HTTP response code will be 302, indicating the resource has been temporarily moved. This can be explicitly indicated by using the value "REDIRECT_TEMPORARY". If the resource has been permanently moved, the value "REDIRECT_PERMANENT" can be used instead.
class HttpServer(netsvc.HttpServer):
    def servlet(self, session):
        # Send the client to the same path on another host, permanently.
        url = "http://hostname/" + session.servletPath()
        code = netsvc.REDIRECT_PERMANENT
        return netsvc.RedirectServlet(session, url, code)
If the URI doesn't start with "/", it is assumed to be a complete URI and will be passed on as is. If the URI starts with "/", it will be assumed to be an absolute path on the current server host and will automatically be expanded to include the details of the server host.
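The expansion rule can be sketched as follows. `absolute_redirect` is a hypothetical helper; how the real servlet obtains the host name and port is not described here, so they are passed in explicitly, and omitting the default port 80 is a choice of this sketch.

```python
def absolute_redirect(uri, host, port=80):
    """Hypothetical sketch: expand URIs starting with "/" against the server host."""
    if not uri.startswith("/"):
        # Treated as a complete URI and passed on unchanged.
        return uri
    # Omit the port when it is the HTTP default.
    authority = host if port == 80 else "%s:%d" % (host, port)
    return "http://%s%s" % (authority, uri)


print(absolute_redirect("/docs/", "example.com", 8000))  # http://example.com:8000/docs/
```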
The Echo Servlet
The echo servlet as implemented by the EchoServlet class is useful for debugging. When used to service a HTTP request, it will generate a HTML document which provides details about the request.
class HttpServer(netsvc.HttpServer):
    def servlet(self, session):
        return netsvc.EchoServlet(session)
The File Servlet
The file servlet, as implemented by the FileServlet class, is used to deliver to a HTTP client the contents of a file stored in the operating system's filesystem. This is the same servlet which is used internally by the FileServer class. When this servlet is created it needs to be supplied, in addition to the HTTP session object, with the name of the file and the file type. The latter corresponds to the MIME content type included in the HTTP response.
If the path supplied to the FileServlet class actually describes a directory, the servlet will generate a response indicating that access is forbidden. If you wish to implement directory browsing you will need to implement a separate HTTP servlet to generate an appropriate response and map the request to it. If you want to redirect the request to an index file, your HTTP server should determine if such an index file exists and if it does, create the file servlet against it instead.
When the FileServlet class is used, files of any size can be handled without the memory footprint of the application growing and without the application blocking as a result of a slow HTTP client. This is achieved by sending the file in blocks, with the servlet waiting if the connection to the HTTP client becomes congested. Although the servlet may be forced to wait before it can send more data, any other jobs in the event system will still be serviced, including other HTTP requests.
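Reading a file in fixed-size blocks is what keeps memory use independent of file size. The generator below is a sketch of that idea only: `file_blocks` and the 4096-byte block size are assumptions, and the event-driven part (pulling the next block only when the connection can accept more data) is left to the caller.

```python
import os
import tempfile


def file_blocks(path, block_size=4096):
    """Yield a file's contents one block at a time (hypothetical sketch).

    Only one block is held in memory at once; an event-driven sender would
    request the next block only when the client connection can take more.
    """
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                return
            yield block


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10000)
sizes = [len(block) for block in file_blocks(tmp.name)]
os.remove(tmp.name)
print(sizes)  # [4096, 4096, 1808]
```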
Logging of Requests
By default no information is logged about requests. If you wish to log what requests are being made against your application using the HTTP servlet framework, you need to set the environment variable "OTCLIB_HTTPLOGCHANNEL" to the name of the log channel to record the information on. The environment variable needs to be set prior to the first request being received by the application through any instance of the HttpDaemon class.
import os
import signal

import netsvc

dispatcher = netsvc.Dispatcher()
dispatcher.monitor(signal.SIGINT)

# Name the log channel for request logging; this must be set before the first request arrives.
netsvc.mergeEnviron("OTCLIB_HTTPLOGCHANNEL", "")

daemon = netsvc.HttpDaemon(8000)
filesrvr = netsvc.FileServer(os.getcwd())
daemon.attach("/", filesrvr)
daemon.start()

dispatcher.run()
The format of the logged messages is the same as the Apache web server common log file format, except that no matter what version of HTTP is used, the URL component of the request is always expanded to its complete form. That is, it will be prefixed with "http://hostname:port" as appropriate. Normally this would only be the case if the request originated from a client supporting the HTTP/1.1 protocol and the client had supplied a full URL.
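For reference, a common log format line has the shape `host ident authuser [date] "request" status bytes`. The sketch below formats one such line with the expanded URL described above. `common_log_line` is a hypothetical helper written for illustration; the ident field is always "-" here, and the values in the usage example are made up.

```python
from datetime import datetime, timedelta, timezone


def common_log_line(client, when, method, url, version, status, size, user=None):
    """Format one request in Apache common log format (sketch).

    `url` is the full form, e.g. "http://hostname:port/path"; `user` is the
    authenticated login, if any; `size` is None when no body was sent.
    """
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    size_field = "-" if size is None else str(size)
    return '%s - %s [%s] "%s %s %s" %d %s' % (
        client, user or "-", timestamp, method, url, version, status, size_field)


line = common_log_line(
    "192.168.1.20",
    datetime(2000, 10, 10, 13, 55, 36, tzinfo=timezone(timedelta(hours=-7))),
    "GET", "http://hostname:8000/index.html", "HTTP/1.0", 200, 2326)
print(line)
```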
If you do not want information about requests appearing in the default log file, but want to split out the logged messages into a distinct log file, or otherwise treat them in a special way, use a hidden log channel and create a user defined log channel to capture them.
