[volunteers] [revival] apache dead on svlug, restarting after this dumped state

Daniel Gimpelevich daniel at gimpelevich.san-francisco.ca.us
Thu May 5 16:01:26 PDT 2016

On Wed, 2016-05-04 at 18:25 -0700, Rick Moen wrote:
> Quoting Daniel Gimpelevich (daniel at gimpelevich.san-francisco.ca.us):
> > My interpretation of your intention is to wait until that system suffers
> > a catastrophic software failure without intervention, then build a new
> > system from scratch with significant downtime. This may be the path of
> > least resistance, or it may not.
> I've not been heroically stepping forward to build a replacement in
> advance, because frankly I have my own priorities, and I got tired of
> everyone else sitting around and letting me do everything.  In
> particular, I got tired of warning SVLUG people for many years that the
> host was in a perilous state and hearing zero response.  
> If nobody else cared enough to do _anything_, then when the machine
> failed and the mailing lists had 1-2 days of downtime while a
> replacement got built, I would have zero sympathy with anyone who
> complained.  I would say, 'I warned you for years, and you did
> absolutely fsck-all, so now there is downtime, and you will just have to
> live with it.  If you want to prevent this from happening again, get
> involved and stop punting all responsibility to
> someone-else-nobody-special-just-not-me.'
> In fact, by telling people I was backing up the mailing list rosters and
> mbox files (and the best of the local scripts) and doing nothing else,
> I pretty much announced publicly that this would be the eventual
> course of action.
> > Sarah installed a fully functioning system outside the chroot, but the
> > public-facing services run within the chroot.
> Sounds right.  As I said, I took only a brief amount of time to look at
> the results of her work after she did it.  I've mostly forgotten what I
> saw.  I was just grateful for the work, and lacked time and energy to do
> much further.
> Earlier, I believe you said that only Mailman was in the chroot.  This
> did not accord with my (uncertain) memory, which was that Apache,
> Mailman, exim / sa-exim, and spamd (all public-facing services except
> sshd) all are now running in the chroot.  I'm not sure which of the two 
> things you'd said is correct, just Mailman in the chroot or all
> public-facing services in the chroot (which could not include sshd).
> I'm thinking it's the latter -- but the point is that you've said both
> at different times.  Maybe you should be careful about what you touch
> until (among other things) you are sure.
> All of this has been yet another time sink for me.  I don't know how I
> can be clearer about this:  I cannot handle yet more gratuitous time
> sinks. 
> > This presents an opportunity to establish sandboxed replacement
> > services on the host outside the chroot without affecting the
> > public-facing ones and then gracefully transition them over when they
> > are ready. There's no need to do this right away, but the possibility is
> > there.
> I don't know what you mean by 'sandboxed replacement services', and 
> why and how such replacement services should be sandboxed.  I'm really
> not clear on what you have in mind, at all.
> Here's my nightmare scenario, Daniel:  The lists.svlug.org host goes
> mysteriously nonfunctional following a period of time when you've been
> messing around with it.  I ask you for a precise
> account of what you did and when, and it turns out none exists.  I ask
> if you have backout procedures for everything you did, and you don't
> have those either.  Everything devolves into a long discussion where
> nothing is clear and nothing gets resolved.  After a huge amount of
> wasted time, I end up having to construct a replacement host from backup
> data.  
> You keep making strange suggestions (like a retroactive ChangeLog for
> lists.svlug.org constructed from shell history) and saying strange
> things I cannot parse (like 'sandboxed replacement services on the host
> outside the public-facing ones'), and doing alarming things like
> screwing with /etc/hosts (or /etc/hostname, or whatever it was) 
> but not telling me that until after you've chewed up my time trying to
> diagnose a problem you probably _caused_ by doing that.
> You suggest that Apache falling over was unrelated to your screwing with
> /etc/hosts or /etc/hostname because Apache received a USR1 signal a few
> minutes before, even though that's a non sequitur, as USR1 merely signals
> Apache to do a graceful restart.
> All of this, and I end up spending more and more time talking to you
> about these things with little to show for it except increased anxiety
> that you are at risk of blowing up the machine.
> Here's a possible solution:  I'll go totally hands-off of
> lists.svlug.org.  You manage it.   If the machine falls over or starts
> doing screwball things, I'll refer people to you and say you've promised
> to handle it.  If you want that, say yes, and I'll stop worrying about
> the state of the machine, as it's no longer my problem.
> Either way, I can't keep sinking more and more time into this.
> _______________________________________________
> revival mailing list
> revival at linuxmafia.com
> http://linuxmafia.com/mailman/listinfo/revival

Moving from the out-of-band list back here. Here is the script currently
starting services on that host:
> #!/bin/sh
> CHROOT=/var/old-svlug-rfs/
> # TODO(tim): Can this be put in /etc/fstab or will that conflict with startup
> # of udev/systemd/whatever?
> mount --bind /dev $CHROOT/dev
> for service in spamassassin exim4 mailman apache cron; do
>   chroot $CHROOT /etc/init.d/$service start
> done
This tells us exactly what's being used from the original system. The
Exim is from 2005, which is pretty suboptimal, but the Apache is worse:
not only is it from 2004, it's from the 1.3 branch, which I doubt many
people still run. I won't even bother to check what version of PHP is
still hooked into that.
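The script shows what is supposed to start inside the chroot. To confirm what is actually running there, which would settle the "only Mailman, or everything public-facing" question, Linux exposes each process's root directory as the symlink /proc/PID/root. A minimal sketch, assuming a Linux /proc, readlink(1), and root privileges (list_chrooted is a made-up name, not an existing tool):

```shell
#!/bin/sh
# Sketch: list running processes whose root directory is a given chroot.
# Assumes Linux, where /proc/PID/root is a symlink to each process's root.
# Run as root; reading another user's /proc/PID/root is normally denied.
# list_chrooted is a hypothetical helper name, not an existing tool.

list_chrooted() {
    # Drop a trailing slash (e.g. /var/old-svlug-rfs/) so the path matches
    # what readlink reports, but keep "/" itself intact.
    case $1 in
        /) target=/ ;;
        *) target=${1%/} ;;
    esac
    for dir in /proc/[0-9]*; do
        # Skip processes that have exited or whose root we cannot read.
        root=$(readlink "$dir/root" 2>/dev/null) || continue
        if [ "$root" = "$target" ]; then
            # Print the PID and the short command name from /proc/PID/comm.
            printf '%s %s\n' "${dir#/proc/}" "$(cat "$dir/comm" 2>/dev/null)"
        fi
    done
}
```

Called as `list_chrooted /var/old-svlug-rfs/`, it should print one PID and command name per chrooted process; anything started outside the chroot (sshd, for instance) will not appear.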
Proof-of-concept time: I've now gone ahead and installed lighttpd
outside the chroot, listening on port 8080, and hooked it into the
existing Mailman instance with a few lines of C code. I'd like to invite
folks to try it out and kick the tires. If it proves robust, Apache can
be decommissioned and lighttpd moved to ports 80 and 443, which would be
"better, faster, cheaper", with no more PHP. Once that's done, the other
services could get the same treatment…
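One quick way to kick the tires is to request the same URLs from both servers and compare the HTTP status codes. A minimal sketch, assuming curl is installed and that the list pages live at the stock Mailman 2 paths /mailman/listinfo and /pipermail/ (neither is verified against this host); LISTS_HOST, URL_PATHS, status_of, and compare_ports are illustrative names:

```shell
#!/bin/sh
# Sketch: request the same paths from Apache (port 80) and the trial
# lighttpd (port 8080) and report any path whose HTTP status differs.
# The default paths are assumed Mailman 2 locations; override
# LISTS_HOST and URL_PATHS in the environment as needed.

LISTS_HOST="${LISTS_HOST:-lists.svlug.org}"
URL_PATHS="${URL_PATHS:-/mailman/listinfo /pipermail/}"

status_of() {
    # Print just the HTTP status code; curl prints 000 if nothing answered.
    # "|| true" keeps a failed connection from aborting under set -e.
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true
}

compare_ports() {
    mismatches=0
    # URL_PATHS is deliberately unquoted so it splits on whitespace.
    for path in $URL_PATHS; do
        old=$(status_of "http://$LISTS_HOST$path")
        new=$(status_of "http://$LISTS_HOST:8080$path")
        if [ "$old" = "$new" ]; then
            printf 'same  %s (%s)\n' "$path" "$old"
        else
            printf 'DIFF  %s apache=%s lighttpd=%s\n' "$path" "$old" "$new"
            mismatches=$((mismatches + 1))
        fi
    done
    # Succeed only if every path returned the same status on both ports.
    [ "$mismatches" -eq 0 ]
}
```

Running `compare_ports` prints one line per path and exits non-zero on any mismatch. Matching status codes are only a first pass: the pages themselves still need eyeballing, and this says nothing about HTTPS on 443.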
