[svlug] Apache internals: persistence of classes

Ian Kluft ikluft at thunder.sbay.org
Thu Jan 1 21:27:21 PST 2004


>From: Ivan Sergio Borgonovo <mail at webthatworks.it>
>I'd like to write some C++ classes (or C) to be able to write
>something like this in my PHP code
[... keep persistent user data across requests, track number of open
browser windows...]

>Classes should be persistent. So I don't want Apache load my classes
>(code and data) for each request.
>
>Something like this:
>http://search.cpan.org/~jbaker/Apache-Session-1.54/Session.pm
>but without the need to serialize, place in a DB, retrieve.
>
>What should I read to know:
>a) if it is worth to write/use my proposed system
>b) if I'm reinventing the wheel
>c) how to write such thing
>?
>
>I read Apache modules docs but it seems that Apache memory management
>will destroy allocated structures as soon as the request has been
>satisfied.

I can't say whether it's worth writing.  I'm not aware of anything out
there that does exactly what you're looking for.  But there are some
similar systems.

Apache is one system made up of many server processes, answering lots of
requests.  It divides itself into processes and threads in order to handle
requests
concurrently.  It does not store "state" information between requests -
that's simply the nature of HTTP.  You can't escape that aspect of it.

If you want to work on the server side of an HTTP request, you're going
to have to save any state-tracking data where concurrent processes can
reach it.

Using a database as in the Apache::Session Perl module that you found
is one approach, and probably the most reliable.
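
A minimal sketch of the database approach, assuming SQLite and JSON as the
storage and serialization layers (both are illustrative choices, not what
Apache::Session itself uses).  The table name and the helper names
open_store, save_session and load_session are hypothetical.

```python
import json
import sqlite3


def open_store(path):
    """Open (or create) a session table that every server process can reach."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "id TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    db.commit()
    return db


def save_session(db, session_id, data):
    """Serialize the per-user state and store it under the session id."""
    db.execute(
        "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
        (session_id, json.dumps(data)),
    )
    db.commit()


def load_session(db, session_id):
    """Return the stored state, or an empty dict for an unknown session."""
    row = db.execute(
        "SELECT data FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}
```

Each request handler would call load_session at the start and save_session
at the end, so no process relies on memory surviving between requests.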

The other way would be to have a daemon (background server) process which
all the Apache processes contact when dealing with that resource.  That's
how Java servlets are handled.  And that seems to be more what you're
looking for.
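
A rough sketch of the daemon idea, assuming a Unix domain socket and a
made-up one-line JSON protocol.  This is not the servlet protocol, and the
names StateServer, StateHandler and ask are hypothetical.  The daemon keeps
the data in its own memory; each web server process connects, sends one
request, and reads one reply.

```python
import json
import os
import socket
import socketserver
import threading


class StateHandler(socketserver.StreamRequestHandler):
    """Handle one request line: {"op": "get" or "set", "key": ..., "value": ...}."""

    def handle(self):
        line = self.rfile.readline()
        if not line:
            return  # client went away without asking for anything
        req = json.loads(line)
        with self.server.lock:
            if req["op"] == "set":
                self.server.state[req["key"]] = req["value"]
            value = self.server.state.get(req["key"])
        self.wfile.write((json.dumps({"value": value}) + "\n").encode())


class StateServer(socketserver.ThreadingMixIn, socketserver.UnixStreamServer):
    """Long-lived process that owns the persistent data."""

    daemon_threads = True

    def __init__(self, path):
        if os.path.exists(path):
            os.unlink(path)  # remove a stale socket from a previous run
        super().__init__(path, StateHandler)
        self.state = {}
        self.lock = threading.Lock()


def ask(path, request):
    """Client side: what each web server process would call per request."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall((json.dumps(request) + "\n").encode())
        with s.makefile("r") as reply:
            return json.loads(reply.readline())["value"]
```

The daemon would run StateServer(path).serve_forever(); the state lives as
long as that process does, with no database round trip.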

Be careful of security issues when making a separate server process that
the web server can connect to - you then have to deal with making sure
only the web server does connect to it.  (It's a big deal to make sure
no one is spoofing your intended users.)  Take a look at
Apache's Jakarta (Java-based projects) site for ideas.
   http://jakarta.apache.org/

You may be able to use the existing protocols for the servlet engines.
I haven't kept up with this for some time.  At the 1998 Perl Conference
I presented a Perl module which acted as a back-end for the Apache JServ
protocol.  It presented itself to the Apache HTTPD as if it were the Java
servlet engine, except that it ran Perl code.  (It had grown out of code
used to test the Apache JServ protocol.)  But they've changed the protocol
many times since then so such an effort would have to be started again
from scratch now.

Things to watch for:
* Try to use as much existing software as your application can.  It all
  saves work for you.
* If you're thinking of making a product out of this in the future, make
  your design accommodate not just what you're doing now, but wherever you
  can imagine it might later be installed.  Then again, if you're
  just experimenting to learn this, take the shortest route to working code.
* If you use Unix sockets, you've limited your connections to the local
  machine.  It protects your server but won't work on a system of multiple
  web server hosts.
* If you use IP sockets, you have to make sure your connections are from
  valid clients.  (Careful on the terminology here - *your* client is a
  web server.  But the fact that it's a server is irrelevant for your
  protocol definition.)  To verify your client, you have traditional
  network security choices:
  * Simply limiting which IP addresses can connect to you is not enough -
    it can be spoofed.  However, if you know you'll run this on a separate
    network that only your client and server machines are on, that would
    allow you to relax some restrictions.
  * You may have a shared secret, such as a password.  But it shouldn't be
    sent in the clear where a sniffer can (and will) get it.  Instead, use
    it and information about each connection to create a hash which the
    server can verify without passing the secret itself.  Include a timestamp
    inside and outside the hash so that valid packets have expiration times
    and can't be re-used by an attacker.  If you don't need to encrypt your
    connection, this digest method is preferable because of lower overhead.
  * You may use public key cryptography to encrypt or sign messages.
    This wasn't available to us for Open Source code in the late 90's
    since certain patents hadn't expired yet.  But it's definitely a
    choice now.  This avoids needing to pass along your authentication
    secrets as part of the connection.  If you need to encrypt your
    connection, this is probably your only choice.
* Even though you would seem to have re-serialized the control stream,
  you still have to keep in mind that you're answering requests to a
  stateless protocol.  After each request, you don't know that there will
  ever be another one for that connection.  You don't know that requests
  will come in chronological order.  Failure to heed this warning will
  result in hung servers and lots of additional frustration beyond normal
  debugging.
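
The shared-secret digest described in the list above could look roughly
like this.  It is a sketch only, assuming HMAC-SHA256 as the hash
construction and a 30-second validity window; the names sign, verify and
MAX_AGE are hypothetical.  The timestamp travels in the clear next to the
message and is also mixed into the digest, so it can't be altered without
invalidating the digest.

```python
import hashlib
import hmac
import time

MAX_AGE = 30  # seconds a signed message stays valid (assumed value)


def sign(secret, payload, now=None):
    """Return (timestamp, digest); the secret itself is never transmitted."""
    ts = str(int(time.time() if now is None else now))
    mac = hmac.new(secret, ts.encode() + b"|" + payload, hashlib.sha256)
    return ts, mac.hexdigest()


def verify(secret, payload, ts, digest, now=None):
    """Accept only a fresh message whose digest matches the shared secret."""
    now = time.time() if now is None else now
    try:
        age = now - int(ts)
    except ValueError:
        return False  # timestamp field is not a number
    if abs(age) > MAX_AGE:
        return False  # expired, or claims to come from the future
    mac = hmac.new(secret, ts.encode() + b"|" + payload, hashlib.sha256)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(mac.hexdigest().encode(), digest.encode())
```

The client sends the payload, ts and digest; the server calls verify with
its own copy of the secret.  A captured message can still be replayed
inside the validity window unless the server also remembers the digests it
has recently accepted.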
