Servlet Objects

To make the most of the HTTP servlet framework it will be necessary to create your own servlets for interacting with your application. Servlets can be written to handle basic requests against a resource, or requests where form data is supplied. Special purpose servlets which process arbitrary content associated with a request may also be created. Once created, a servlet can be integrated into an application by defining a custom HTTP server object, or by storing it as a file and loading it through a plugin in association with the file server object.

As the HTTP servlet framework is implemented on top of an event system and doesn't rely upon threads, it is necessary to be mindful of how servlets are implemented to avoid a situation where the code blocks. If the code does block it will effectively stop the whole application. The event system should therefore be used as appropriate where concurrency is required. If communication with service objects in a remote process is required to obtain data to satisfy a request, this will be essential.
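The effect of dividing work up rather than blocking can be seen in a toy event loop. This is a minimal sketch in plain Python, not the OSE event system: run_loop(), count_in_batches() and the queue of callbacks are hypothetical names used only for illustration. A long job is split into batches which reschedule themselves, so that other queued callbacks get a turn in between.

```python
from collections import deque

def run_loop(queue):
    """Run queued callbacks one at a time until none remain."""
    while queue:
        callback = queue.popleft()
        callback()

def count_in_batches(queue, log, total, batch, start=0):
    """Do at most `batch` units of work, then reschedule the remainder."""
    end = min(start + batch, total)
    log.append("work %d-%d" % (start, end))
    if end < total:
        # Hand control back to the loop instead of finishing in one go.
        queue.append(lambda: count_in_batches(queue, log, total, batch, end))

def demo():
    queue = deque()
    log = []
    queue.append(lambda: count_in_batches(queue, log, total=6, batch=2))
    queue.append(lambda: log.append("other request"))
    run_loop(queue)
    return log
```

Had the job processed all six units in a single callback, the other request would have had to wait until the whole job was finished.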

Processing a Request

In order to implement your own HTTP servlet, you need to create a new class which derives from the HttpServlet class. If the request your HTTP servlet is to handle does not have any content associated with it, you will only need to override the processRequest() member function.

The processRequest() member function will be called immediately after the HTTP server object has returned a valid HTTP servlet. If the HTTP servlet doesn't need to process any content associated with a request, it will typically be able to generate a response straight away and the servlet can then be destroyed.

    import netsvc

    class HttpServlet(netsvc.HttpServlet):
        def processRequest(self):
            if self.requestMethod() != "GET":
                # Only GET is handled; anything else is rejected.
                self.sendError(400)
            else:
                self.sendResponse(200)
                self.sendHeader("Content-Type", "text/plain")
                self.endHeaders()
                self.sendContent("Hi there.")
                self.endContent()

The major member functions of the HTTP servlet class used to interrogate the details of the HTTP request are requestMethod(), requestPath() and queryString(). It is these methods you would use to determine what the HTTP servlet is to do. It may be the case however that it is only necessary to validate the type of request method. This would occur where the HTTP server object had already identified a resource against which a request was being made and supplied the handle for that resource when the HTTP servlet was created.

Having determined the validity or otherwise of a request, the HTTP servlet can do a number of things. In the event of an error the HTTP servlet can use the sendError() member function to generate an error response. The first argument to sendError() should be the appropriate HTTP response code. An optional second argument may also be supplied consisting of valid HTML text. This text will be included within the body of the HTML document generated by the sendError() member function.

If the request is valid, the HTTP servlet might instead generate its own response including any appropriate content. To start the response the sendResponse() member function must be called. The first argument to sendResponse() would typically be 200, indicating a successful response. A HTTP servlet may, if it wishes, supply any valid HTTP response code here. In fact, the sendError() member function is merely a shorthand method for generating an error response and underneath actually uses the same functions as described here.

The HTTP servlet may now include any HTTP headers by calling sendHeader(). The arguments to sendHeader() should be the name of the header and its string value. Whether or not any HTTP headers are included, the member function endHeaders() must now be called.

To include content in a response the sendContent() member function is used. This may be called multiple times. When all content has been sent, the endContent() member function should be called. Calling the endContent() member function will have the effect of closing off the response, and once the processRequest() member function returns, the servlet can be destroyed.
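The order in which these member functions are called can be sketched with a stand-in class that records each call and rejects calls made out of sequence. RecordingResponse is hypothetical and is not part of netsvc. The member function names follow the text above, but the state checks and the HTML produced by sendError() here are assumptions made only for illustration.

```python
class RecordingResponse:
    """Hypothetical stand-in that records calls and enforces their order."""

    def __init__(self):
        self.calls = []
        self.status = None
        self._state = "start"

    def _advance(self, name, allowed, new_state):
        if self._state not in allowed:
            raise RuntimeError("%s() called in state %r" % (name, self._state))
        self.calls.append(name)
        self._state = new_state

    def sendResponse(self, status):
        self._advance("sendResponse", ("start",), "headers")
        self.status = status

    def sendHeader(self, name, value):
        self._advance("sendHeader", ("headers",), "headers")

    def endHeaders(self):
        self._advance("endHeaders", ("headers",), "content")

    def sendContent(self, content):
        self._advance("sendContent", ("content",), "content")

    def endContent(self):
        self._advance("endContent", ("content",), "done")

    def sendError(self, status, text=""):
        # Shorthand built from the same primitives as a normal response.
        self.sendResponse(status)
        self.sendHeader("Content-Type", "text/html")
        self.endHeaders()
        self.sendContent("<html><body><p>%d</p>%s</body></html>" % (status, text))
        self.endContent()
```

Calling sendError(404) on an instance leaves calls holding the five names in the order sendResponse, sendHeader, endHeaders, sendContent, endContent, while calling sendContent() before sendResponse() raises an exception.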

Persistent Connections

A persistent connection is one whereby the connection to the client can be maintained after a response has been sent. This allows a HTTP client to submit additional requests without the need to create a new connection. Negotiation of persistent connections between the HTTP client and server is managed by using special HTTP request and response headers.

Where possible the session manager will undertake to maintain persistent connections without you needing to take any special actions. This is done as a result of the session manager inserting on your behalf the special headers as appropriate when you call the endHeaders() member function.

If the client has requested a persistent connection and supplied a valid content length in the request headers, and you include a valid content length header in the response headers, the session manager will aim to maintain the connection. If you do not include a valid content length header in the response headers, or sendError() was used to generate a response, the connection will always be shut down.
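The conditions in the previous paragraph can be restated as a small predicate. This is only a restatement of the rules as described here, not the session manager's actual code, and the function and parameter names are hypothetical.

```python
def connection_may_persist(client_requested, request_length, response_length,
                           used_send_error=False):
    """Return True when the rules above allow the connection to be kept.

    Lengths are integers, or None when no valid content length header
    was present.
    """
    if used_send_error:
        # A response generated by sendError() always closes the connection.
        return False
    if not client_requested:
        return False
    return request_length is not None and response_length is not None
```

In practice this means that a servlet which wants the connection to be reused needs to know the size of its response before calling endHeaders(), so that it can send a content length header.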

Note that when a HTTP client does send an additional request over the same connection, it will not be the same HTTP servlet instance that handles the request. Each request received will always be separately parsed, with the appropriate HTTP server object and servlet used each time.

Delaying a Response

The servlet framework is implemented on top of the event system. As a result, it is not mandatory that a complete response be generated by the processRequest() member function. Instead, the servlet could execute some action which would result in a callback at a later point in time. When that callback occurs, then it might complete the response.

    import netsvc

    class HttpServlet(netsvc.HttpServlet, netsvc.Agent):
        def __init__(self, session):
            netsvc.HttpServlet.__init__(self, session)
            netsvc.Agent.__init__(self)
        def processRequest(self):
            if self.requestMethod() != "GET":
                self.sendError(400)
            else:
                self.sendResponse(200)
                self.sendHeader("Content-Type", "text/plain")
                self.endHeaders()
                # Finish the response later, from a timer callback.
                self.startTimer(self.completeResponse, 10, "timeout")
        def completeResponse(self, tag):
            self.sendContent("Hi there.")
            self.endContent()

This is useful where the servlet needs to wait until data needed to formulate a response is available or where some form of time dependent server push mechanism is being implemented. Note however that special steps may be required in these situations to cope with a HTTP client prematurely closing the connection.

Destruction of Servlets

The destruction of a servlet can come about as a result of two situations. The first situation is where a servlet handles a request and generates a response, whether that be successful or otherwise. The second situation is where the HTTP client closes the connection before the servlet has sent a complete response.

The fact that the actions of a servlet may need to be aborted before it has finished complicates the destruction of a servlet. This is because any callback which may have been set up will result in a reference count against the servlet object. The existence of such references will actually prevent the immediate destruction of the servlet object. If that reference is never deleted, the servlet object may never be destroyed.
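The way a registered callback keeps an object alive can be demonstrated in plain Python. In this sketch the Timers and Servlet classes are hypothetical stand-ins for the event system and a servlet. A weak reference is used to observe whether the object still exists, and the result shown in the note after the code relies on the reference counting behaviour of CPython.

```python
import gc
import weakref

class Timers:
    """Hypothetical registry standing in for pending event system callbacks."""

    def __init__(self):
        self.pending = {}

    def start(self, tag, callback):
        # A bound method holds a reference to the object it belongs to.
        self.pending[tag] = callback

    def cancel(self, tag):
        self.pending.pop(tag, None)

class Servlet:
    def __init__(self, timers):
        self._timers = timers
        timers.start("timeout", self.complete)

    def complete(self):
        pass

    def destroy(self):
        # Cancelling the callback removes the last outside reference.
        self._timers.cancel("timeout")

def demo():
    timers = Timers()
    servlet = Servlet(timers)
    probe = weakref.ref(servlet)
    del servlet                     # the framework drops its own reference
    gc.collect()
    alive_before = probe() is not None
    probe().destroy()               # cancel the callback explicitly
    gc.collect()
    alive_after = probe() is not None
    return alive_before, alive_after
```

Under CPython, demo() returns (True, False): the object survives the loss of the framework's reference and only goes away once the pending callback has been cancelled.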

All this means that it isn't sufficient for the servlet framework to delete its own reference to an instance of a HTTP servlet and expect that it will be destroyed. Instead, it is necessary to introduce a special member function to the HttpServlet class and require that any derived class extend it as appropriate to cancel any callbacks or otherwise cause external or circular references to the servlet to be deleted.

The name of this member function is destroyServlet(). The member function will be called when a HTTP client prematurely closes the connection. So that only one mechanism is employed to ensure a servlet is destroyed, the member function is also called subsequent to a servlet generating a complete response.

    import netsvc

    class HttpServlet(netsvc.HttpServlet, netsvc.Agent):
        def __init__(self, session):
            netsvc.HttpServlet.__init__(self, session)
            netsvc.Agent.__init__(self)
        def processRequest(self):
            if self.requestMethod() != "GET":
                self.sendError(400)
            else:
                self.sendResponse(200)
                self.sendHeader("Content-Type", "text/plain")
                self.endHeaders()
                self.startTimer(self.completeResponse, 10, "timeout")
        def completeResponse(self, tag):
            self.sendContent("Hi there.")
            self.endContent()
        def destroyServlet(self):
            # Base class first, then release anything holding a reference.
            netsvc.HttpServlet.destroyServlet(self)
            netsvc.Agent.destroyAgent(self)

The first action of the derived version of the member function destroyServlet() should be to call the base class version of the function in the HttpServlet class. The member function should then do whatever is necessary to ensure that references to the servlet are deleted. If the servlet had been derived from the Agent class, this would include calling the destroyAgent() member function.

Processing Content

If the function of a HTTP servlet entails that the content associated with a request be processed in some way, it will be necessary to override the processContent() member function. The processContent() member function will only be called subsequent to processRequest() being called, and only provided that processRequest() hadn't already dealt with the request and sent a complete response.

As the means to determine how much content to expect is dependent on the specifics of a request, no attempt is made to first accumulate the content into one block. Instead, the processContent() member function will be called multiple times if appropriate, once for each block of data which is read in. It is up to the processContent() member function to accumulate the data or otherwise process it, until it determines that all content has been received.

Typically, how much content is expected will be dictated by the presence of a HTTP content length header, or by a MIME multipart message boundary string as specified in a HTTP content type header. Either way, it is up to the specific implementation of a HTTP servlet to know what to expect and deal with it appropriately.

    import cgi
    import StringIO

    import netsvc

    class FormServlet(netsvc.HttpServlet):
        def __init__(self, session):
            netsvc.HttpServlet.__init__(self, session)
            self._content = []
            self._contentLength = 0
            self._environ = {}
        def processRequest(self):
            if self.requestMethod() not in ["GET", "POST"]:
                self.sendError(501, "Request method type is not supported.")
            elif self.requestMethod() == "POST" \
                    and self.contentLength() < 0:
                self.sendError(400, "Content length required for POST.")
            elif self.requestMethod() == "GET":
                # A GET request carries its form data in the query string.
                self._headers = self.headers()
                self._environ["REQUEST_METHOD"] = self.requestMethod()
                self._environ["QUERY_STRING"] = self.queryString()
                self._headers["content-type"] = \
                        "application/x-www-form-urlencoded"
                try:
                    form = cgi.FieldStorage(headers=self._headers,
                            environ=self._environ, keep_blank_values=1)
                    self.processForm(form)
                except:
                    netsvc.logException()
                    self.shutdown()
            elif self.contentLength() == 0:
                # POST with an empty body, so no content will arrive later.
                self.processContent("")
        def processContent(self, content):
            # Accumulate blocks until the declared content length is reached.
            self._content.append(content)
            self._contentLength = self._contentLength + len(content)
            if self._contentLength >= self.contentLength():
                self._headers = self.headers()
                self._environ["REQUEST_METHOD"] = self.requestMethod()
                self._content = "".join(self._content)
                self._content = self._content[:self.contentLength()]
                fp = StringIO.StringIO(self._content)
                try:
                    form = cgi.FieldStorage(headers=self._headers,
                            environ=self._environ, keep_blank_values=1, fp=fp)
                    self.processForm(form)
                except:
                    netsvc.logException()
                    self.shutdown()
        def processForm(self, form):
            # Default behaviour; derived classes override this.
            self.sendError(501)

Member functions which a HTTP servlet may find useful here are contentLength() and contentType(). The contentLength() member function returns an integer value corresponding to that defined by the HTTP content length header, or -1 if no such header was provided. The contentType() member function returns the value of the HTTP content type header. Note that this will include any supplied parameters, so you will need to extract these yourself.
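One way of separating the media type from its parameters is to hand the header value to the Python standard library. The following sketch uses the email package for this and is independent of netsvc. The function name split_content_type() is hypothetical, and the value passed in is assumed to be a string of the kind returned by contentType().

```python
from email.message import Message

def split_content_type(value):
    """Split a content type header value into (media type, parameters)."""
    message = Message()
    message["Content-Type"] = value
    media_type = message.get_content_type()
    # The first entry returned by get_params() is the media type itself.
    params = message.get_params(header="content-type")[1:]
    return media_type, dict((name.lower(), text) for name, text in params)
```

For a multipart request this yields the boundary string as a parameter, for example ("multipart/form-data", {"boundary": "xyz"}) for the value 'multipart/form-data; boundary="xyz"'.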

A HTTP servlet may also interrogate arbitrary headers using the member functions containsHeader() and header(). These respectively indicate if a header exists and return its value. The name of a header should always be given as a lower case string. All headers may be obtained as a Python dictionary using headers().

If a HTTP servlet encounters an internal error at any time, it may call the shutdown() member function to abort all processing of the request. This will cause the connection to the HTTP client to be closed immediately, discarding any data which hadn't yet been sent. The instance of the HTTP servlet will then subsequently be destroyed.

The Form Servlet

As processing of form data will be a common situation, an implementation of a form servlet is provided. This is called FormServlet. The implementation of this servlet is similar to the previous example except that it does additional processing to translate data from the types used by the FieldStorage class into standard Python lists and dictionaries. The name of the member function which you need to override to process the form is handleRequest().

    import netsvc

    class LoginServlet(netsvc.FormServlet):
        def handleRequest(self):
            if (self.containsField("user") and
                    self.containsField("password")):
                user = self.field("user")
                password = self.field("password")
                if self.authenticateUser(user, password):
                    # Successful login; redirect to the server root.
                    self.sendResponse(netsvc.REDIRECT_TEMPORARY)
                    self.sendHeader("Location", self.serverRoot())
                    self.endHeaders()
                    self.endContent()
                else:
                    self.sendError(400)
            else:
                self.sendError(400)
        def authenticateUser(self, user, password):
            # ...

The existence of a field can be determined by calling the containsField() member function. The member function field() can then be called to retrieve the value for the field. All fields which have been set can be obtained as a dictionary using the fields() member function.
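To give a feel for what a dictionary of fields might look like, the following sketch decodes URL encoded form data using the Python standard library. It does not use netsvc, and decode_fields() is a hypothetical name. How FormServlet itself represents a field which is supplied more than once is not described here, so the rule used below, a string for a single value and a list for repeated values, is an assumption.

```python
from urllib.parse import parse_qs

def decode_fields(encoded):
    """Decode URL encoded form data into a plain dictionary."""
    fields = {}
    for name, values in parse_qs(encoded, keep_blank_values=True).items():
        # Assumption: a single value is stored as a string, repeats as a list.
        fields[name] = values[0] if len(values) == 1 else values
    return fields
```

Blank values are kept, in the same way that the earlier example passes keep_blank_values to the FieldStorage class.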

Slow HTTP Clients

The HTTP servlet framework does not use multithreading but is layered on top of an event system. This means that a HTTP servlet must never block, as doing so would block the whole process and stop anything else from running. For this reason, a HTTP servlet does not have direct access to the socket connection associated with a HTTP client. Instead, when a HTTP servlet sends data back to a HTTP client, it is effectively queueing the data for delivery.

If the HTTP client is slow in reading data from a socket connection, the server side of the socket connection could effectively block. The underlying framework used to manage a socket connection will detect this, and will only send data over a socket connection when such a condition would not occur. A consequence of the queuing mechanism however is that any data will first be added to a queue and will only be sent after the servlet has returned.

For a small response this is not a problem, but if the content associated with a response is large, the size of the process would grow dramatically if all the data were queued at once. To avoid this, large responses should be sent in parts. Further, a HTTP servlet should suspend sending of further data while the socket connection would block, as continuing would only add to the amount of queued data and thus the size of the process.
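The benefit of sending in parts and pausing while the client is behind can be shown with a small simulation. This is a sketch only: BoundedSender, deliver() and the high water mark are hypothetical and do not correspond to anything in netsvc. Content is drawn from an iterator only while the amount of queued data is below the limit, so the queue never holds more than a few parts at a time.

```python
class BoundedSender:
    """Queue parts of a response without buffering the whole of it."""

    def __init__(self, chunks, high_water):
        self._chunks = iter(chunks)
        self._high_water = high_water
        self.queued = []          # data waiting to be read by the client
        self.peak = 0             # largest amount ever queued at once
        self.finished = False

    def queued_bytes(self):
        return sum(len(chunk) for chunk in self.queued)

    def produce(self):
        """Queue parts until the limit is reached or the input runs out."""
        while not self.finished and self.queued_bytes() < self._high_water:
            chunk = next(self._chunks, None)
            if chunk is None:
                self.finished = True
            else:
                self.queued.append(chunk)
        self.peak = max(self.peak, self.queued_bytes())

    def client_reads(self, count=1):
        """Simulate a slow client draining up to `count` parts."""
        drained = self.queued[:count]
        del self.queued[:count]
        return drained

def deliver(chunks, high_water):
    """Drive producer and client in turn; return (bytes delivered, peak)."""
    sender = BoundedSender(chunks, high_water)
    delivered = 0
    while True:
        sender.produce()
        if sender.finished and not sender.queued:
            break
        delivered += sum(len(chunk) for chunk in sender.client_reads(1))
    return delivered, sender.peak
```

With fifty parts of 100 characters each and a limit of 300, deliver() returns (5000, 300): everything is delivered, but no more than 300 characters are ever queued at one time.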

To monitor changes in the state of the socket connection, a HTTP servlet should call the member function monitorCongestion(), passing a callback function. The callback function supplied will be called when writing data to a socket connection would effectively block and also subsequently when the socket has cleared. These changes in state can be used to suspend and subsequently resume sending of data.

    import netsvc

    class TestServlet(netsvc.FormServlet):
        def __init__(self, session):
            netsvc.FormServlet.__init__(self, session)
            self._batch = None
            self._total = None
            self._count = 0
            self._job = netsvc.Job(self.generateContent)
        def destroyServlet(self):
            netsvc.FormServlet.destroyServlet(self)
            # Cancel the job so it no longer references this servlet.
            self._job.cancel()
            self._job = None
        def handleRequest(self):
            if not self.containsField("batch") or \
                    not self.containsField("total"):
                self.sendError(400)
            else:
                try:
                    self._batch = int(self.field("batch"))
                    self._total = int(self.field("total"))
                except:
                    self.sendError(400)
                else:
                    self.sendResponse(200)
                    self.sendHeader("Content-Type", "text/plain")
                    self.endHeaders()
                    self.monitorCongestion(self.clientCongestion)
                    self._job.schedule(netsvc.IDLE_JOB)
        def generateContent(self):
            # Send one batch of lines, then reschedule for the next batch.
            content = []
            for i in range(0, self._batch):
                self._count = self._count + 1
                content.append(str(self._count).zfill(60))
            content.append("")
            self.sendContent("\n".join(content))
            self._total = self._total - 1
            if self._total <= 0:
                self.ignoreCongestion()
                self.endContent()
            else:
                self.flushContent()
                self._job.schedule(netsvc.IDLE_JOB)
        def clientCongestion(self, status, pending):
            if status == netsvc.CONNECTION_CLEARED:
                # The socket can accept data again; resume sending.
                self._job.reset()
                self._job.schedule(netsvc.IDLE_JOB)
            elif status == netsvc.CONNECTION_BLOCKED:
                # Writing would block; stop generating content for now.
                self._job.cancel()

When a HTTP servlet no longer wishes to monitor the status of the socket connection the member function ignoreCongestion() can be called. Although not absolutely necessary, it is good practice to always call this just prior to calling the member function endContent() to close off the response.

Note that the Python wrapper around the C++ implementation of the HTTP servlet class performs buffering of content and will only pass content on to the C++ implementation when a set amount has been exceeded or the end of content has been indicated. If you suspend sending of further data and want the HTTP client to see the content produced so far, you may wish to flush out any buffered data by calling the flushContent() member function.
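The buffering behaviour described above can be pictured with the following sketch. BufferedOutput is a hypothetical class written for illustration and is not the netsvc wrapper. The threshold is an assumed parameter, as the amount at which the real wrapper passes content on is not stated here.

```python
class BufferedOutput:
    """Hypothetical buffer that forwards content only in larger pieces."""

    def __init__(self, forward, threshold):
        self._forward = forward        # callable receiving each forwarded block
        self._threshold = threshold    # assumed limit, in characters
        self._pending = []
        self._size = 0

    def sendContent(self, content):
        self._pending.append(content)
        self._size += len(content)
        if self._size > self._threshold:
            self.flushContent()

    def flushContent(self):
        # Pass on whatever has been buffered so far, if anything.
        if self._pending:
            self._forward("".join(self._pending))
            self._pending = []
            self._size = 0

    def endContent(self):
        self.flushContent()
```

Content below the threshold stays in the buffer until flushContent() or endContent() is called, which is why a servlet that pauses part way through a response may want to flush first.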



OSE/Python/ServletObjects (last edited 2006-09-14 10:59:08 by GrahamDumpleton)