[volunteers] SVLUG Mail Working?
Rick Moen
rick at linuxmafia.com
Tue Sep 18 00:25:28 PDT 2007
Quoting Mark Weisler (mark at weisler-saratoga-ca.us):
> So, would all "production" [A] be on Linode/gruyere with replication to brie
> so that, in the event of a problem, we could re-point SVLUG activities to
> brie?
>
> [A] Production services being, I think, mail, web, and DNS.
To make a really long story short, despite then-VP Micah Dowty's opinion
that all of SVLUG's Internet services can fit within the original Linode
limits (3072 MB disk, 80 MB RAM), upon reflection that struck me and
other sysadmins as utterly impossible. But it _did_ seem quite feasible
if you kept mail services (SMTP, Mailman + related Web host) elsewhere
(DNS names "lists.svlug.org" and "svlug.org"). Linode/gruyere would
still serve everything else, that being primary DNS nameservice + main
HTML stuff (DNS names "www.svlug.org" and "ns1.svlug.org").
I really _tried_ to fit everything into 3072 disk + 80 RAM; thus our use
of the RAM-thrifty NSD nameserver and lighttpd w/FastCGI Moin. But my
guess is that Micah had simply no idea how much competent anti-spam
add-ons have bloated modern MTAs. We recently saw a _runaway_ spamd effect
(last Wed/Thu), but even a well-running Exim4 + spfd + spamd setup
consumes a couple hundred MB. I know this because it does exactly that,
chez moi.[0] If you were to tell me the RAM consumption can be cut even
further, I would believe you, but would maintain it's difficult until
shown otherwise.
So, Linode bumping gruyere up to 8192 disk + 256 RAM helps, but is not
enough to run _all_ services on gruyere including Exim4/Mailman -- and
you yourself (as Mr. Backup) know about SVLUG's disk usage, 90%+ of
which is the mailing list archives, which also grow monotonically.
I'm not sure if we still fit under 8GB disk (including a sparse Ubuntu
Server software load), but, even if we do, the list archives would
eventually outgrow that.
Fortunately, past SVLUG sysadmins were careful to divide up functionality
by DNS names! This is a major win, because it means we _can_ move only
the SMTP and Mailman + related Web host functions wherever we want
without interrupting service: Just repoint "lists.svlug.org" and
"svlug.org" in our DNS. Done. Near-instant switchover, modulo some lag
(up to 3 days) at sites that use outdated cached DNS data.[1]
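
To put rough numbers on that switchover lag, here is a minimal sketch in Python. The helper name and the one-hour TTL are assumptions for illustration only, not SVLUG's actual zone settings; the 3-day worst case is the figure given above.

```python
from datetime import datetime, timedelta

# Hypothetical helper, not part of any SVLUG tooling.
def switchover_window(repointed_at, ttl_seconds, worst_case=timedelta(days=3)):
    """Return two datetimes: when caches that honor the TTL have dropped
    the old record, and when the assumed worst-case stragglers have."""
    honoring = repointed_at + timedelta(seconds=ttl_seconds)
    stragglers = repointed_at + worst_case
    # Stragglers can never be done before the well-behaved caches are.
    return honoring, max(honoring, stragglers)

if __name__ == "__main__":
    changed = datetime(2007, 9, 18, 0, 25)  # example repoint time
    # ttl_seconds=3600 is an assumed TTL, chosen only for the example.
    honoring, stragglers = switchover_window(changed, ttl_seconds=3600)
    print("TTL-honoring caches current by:", honoring)
    print("Worst-case stragglers current by:", stragglers)
```

Lowering the TTL ahead of a planned move shortens only the first figure; the second is outside our control.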
Migration stage 3 could logically be to arrange at least one failover
location (site) to house replicas of the main Web content. "brie" would
certainly be the first obvious place for that. Again, in the event of
Linode/gruyere becoming unavailable, just repoint the DNS.
(As Heather and I have mentioned previously, MoinMoin now supports live
replication between Moin instances, so future replica sites can be
configured to remain current without manual maintenance.)
Stage 4 (no longer "migration") could be to invent a better recovery
plan for loss of the SMTP and Mailman + related Web host. Currently and
through the near future, our plan is: build or borrow a new Debian box,
copy your most recent data backup to it, adjust system + services
configuration, point DNS to it. We lose permanently all mail + archived
mailing list posts since your most recent backup.
There are ways to improve on that, but we should solve our more pressing
problems first. We're still on Stage 1.
[0] Current SMTP-related estimated RAM usage on uncle-enzo.linuxmafia.com:
Exim4: 32 MB. spfd: 14 MB. spamd: 124 MB. Total: 170 MB. (Total
system RAM is 256 MB.) This is the sum of each process's VSZ (virtual
size) less RSS (resident set size), plus in each case an estimate of the
shared RSS common to all of them, e.g., shared by all running instances
of Exim4. RAM usage does tend to spike sometimes, especially with spamd
(SpamAssassin). Note: My site has spamd's max-child parameter set to 2
child instances, as opposed to 32 on www.svlug.org as set by Marc
Merlin. I'm able to get away with only two -- barely -- because I have
better front-end spam rejection within Exim4 (smaller, faster) before
the handoff to spamd. Hence, _much_ less spam makes it through to
spamd. Hence, two instances suffice.
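
The arithmetic above can be checked with a short Python sketch. The per-daemon figures are the estimates quoted in this footnote; the function name is hypothetical, and the comparison against the original 80 MB Linode limit is only a rough illustration, since these numbers come from a different host.

```python
# Per-daemon RAM estimates in MB, as quoted in the footnote above.
usage_mb = {"Exim4": 32, "spfd": 14, "spamd": 124}

# Hypothetical helper for illustration.
def headroom(usage, total_ram_mb):
    """Return (total MB used by the listed daemons, MB left over)."""
    used = sum(usage.values())
    return used, total_ram_mb - used

if __name__ == "__main__":
    for label, ram in (("256 MB host", 256), ("original 80 MB Linode", 80)):
        used, free = headroom(usage_mb, ram)
        print(f"{label}: mail stack uses {used} MB, leaving {free} MB")
```

A negative remainder means the mail stack alone would not fit, before counting DNS, Web, or the OS itself.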
[1] Many ISPs' nameservers are set to deliberately ignore domains' DNS
TTL (time to live) values, to force use of cached data even when it's
marked obsolete, in order to save bandwidth on DNS lookups. Also, some
other software such as some versions of nscd (nameservice caching
daemon) do likewise automatically, on account of bugs.