[volunteers] What broke mail to 'president@svlug.org'
Rick Moen
rick at linuxmafia.com
Wed Aug 3 02:33:45 PDT 2016
Marc pointed out that mail to president at svlug.org broke because the
lists.svlug.org host's hostname had been changed.
While looking for that (and this is a digression from discussion
of lists.svlug.org), I found this similar matter:
On May 6, I wrote on this mailing list (addressing Daniel):
3. Looking at another of your recent gruyere site-docs/ChangeLog
entries:
Tu 2016-05-03 Daniel Gimpelevich <daniel at gimpelevich.san-francisco.ca.us>
Removed vestiges of via.net from nameserver, and updated hostnames.
Rebooted host and added IPv6 access.
What does 'updated hostnames' mean? What does 'added IPv6 access' mean?
Please amend your 2016-05-03 entry so that intelligent readers can
understand what the hell you did.
Daniel, seriously, you need to document what you changed, not just
handwave about your having changed _something_.
I'm having to chew up a lot of time just figuring out what you've been
doing, and this really is not good.
I note with partial approval that Daniel _somewhat_ improved the 2016-05-03
site-docs/ChangeLog entry on gruyere (www.svlug.org), to say:
Tu 2016-05-03 Daniel Gimpelevich <daniel at gimpelevich.san-francisco.ca.us>
The A records for {ftp,mail,svlug}.svlug.{org,net,com} and
lists.svlug.{net,com} still had the via.net address. Updated all
records to reflect the current state of affairs, introducing
explicit A records for the cheese-themed hostnames. Corrected
/etc/hosts not to rely on the public IP address, since this is a
DHCP client. In the Linode web UI, set RDNS for IPv4 and
provisioned IPv6, rebooting host. Made lighttpd listen on IPv6.
For the record, this _still_ does not say what you did, Daniel. Your
log entries need to suffice to tell other admins how to reverse your
changes. Above _still_ doesn't do that.
Getting back to host lists.svlug.org:
On May 5, I'd said to Daniel:
Here's my nightmare scenario, Daniel: The lists.svlug.org host goes
mysteriously nonfunctional following a period of time when you've been
messing around with it. I inquire with you about a precise
account of what you did and when, and it turns out none exists. I ask
if you have backout procedures for everything you did, and you don't
have those either. Everything devolves into a long discussion where
nothing is clear and nothing gets resolved. After a huge amount of
wasted time, I end up having to construct a replacement host from
backup data.
You keep making strange suggestions (like a retroactive ChangeLog for
lists.svlug.org constructed from shell history) and saying strange
things I cannot parse (like 'sandboxed replacement services on the host
outside the public-facing ones'), and doing alarming things like
screwing with /etc/hosts (or /etc/hostname, or whatever it was)
but not telling me that until after you've chewed up my time trying to
diagnose a problem you probably _caused_ by doing that.
You suggest that Apache falling over was unrelated to your screwing
with /etc/hosts or /etc/hostname because Apache received a USR1 signal a
few minutes before, even though that's a non sequitur, as USR1 merely
signals Apache to do a graceful restart.
All of this, and I end up spending more and more time talking to you
about these things with little to show for it except increased anxiety
that you are at risk of blowing up the machine.
Earlier that day, Daniel said:
> Daniel then dropped into conversation the missing context. He said
> he had felt that the main entry in /etc/hosts was wrong and had edited it
> to what he felt was the correct value. (I think he said the 'new'
> value was lists.svlug.org, while the 'old' server had had svlug.svlug.org.)
> ISTR he said the failed message had followed his doing that, but that
> he'd been able to restart the process (which one, can't remember; this
> was late at night) after changing the /etc/hosts entry back.
No. The hostname file carried over from the retired box remains
svlug.svlug.org, but the equivalent file Sarah put on the new host had
lists.svlug.org, which mailman now requires regardless of what's in the
chroot.
What's in the /var/old-svlug-rfs/ chroot is the significant bit here,
as that's what Exim and Mailman see.
There, /etc/hosts is an appalling, obsolete mess that ought to be
severely pruned and fixed. I'm going to defer that work for now,
and just mention that among the wrong things in it is:
157.22.20.228 brie.svlug.org brie
157.22.20.227 svlug.org svlug
157.22.20.227 svlug.svlug.org www
157.22.20.227 www.svlug.org www
157.22.20.227 lists.svlug.org lists
Those were via.net IPs.
lists:/# cat /etc/hostname
svlug.svlug.org
lists:/#
hostname(1) returns 'lists'.
The 'host' command, which queries the DNS directly, returns a correct result.
lists:/# host svlug.org
svlug.org A 64.62.190.98
lists:/#
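The 'ping' command goes through the system resolver, which (presumably
with the usual 'hosts: files dns' order in /etc/nsswitch.conf) consults
/etc/hosts before the DNS, so it picks up the wrong entry and goes to
the obsolete IP.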
lists:/# ping svlug.org
PING svlug.org (157.22.20.227): 56 data bytes
[times out]
I don't know who thought it was a good idea to put a bunch of static
mappings into /etc/hosts that are redundant to the DNS, but doing
that and then _never updating_ is obviously bad.
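To make the ping/host disagreement concrete, here is a minimal Python sketch of a "files, then DNS" lookup. It assumes the resolver order is the common 'hosts: files dns' from /etc/nsswitch.conf (not verified on this host), and the DNS dictionary is a hypothetical stand-in for a real query; only the two addresses come from the output above.

```python
def parse_hosts(text):
    """Map each hostname/alias to the address on the first line that lists it."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name, address)  # first matching line wins
    return table


def resolve(name, hosts_text, dns):
    """Consult hosts entries first; fall back to the (stand-in) DNS only on a miss."""
    hosts = parse_hosts(hosts_text)
    if name in hosts:
        return hosts[name]
    return dns.get(name)


HOSTS = """\
127.0.0.1 localhost
157.22.20.227 svlug.org svlug
"""

# Hypothetical stand-in for what a real DNS query returns.
DNS = {"svlug.org": "64.62.190.98"}

print(resolve("svlug.org", HOSTS, DNS))  # the stale static mapping shadows the DNS
print(resolve("svlug.org", "#157.22.20.227 svlug.org svlug\n", DNS))  # commented out: DNS answer
```

The point of the sketch is that a stale static mapping is never "wrong enough" to fall through: as long as the name appears in the file, the DNS is not asked.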
I commented out the '157.22.20.227 svlug.org svlug' line, and
'ping svlug.org' now does the correct thing (based on the DNS).
I am going back and commenting out all of these:
64.62.190.98 gruyere.svlug.org gruyere
157.22.20.228 brie.svlug.org brie
157.22.20.227 svlug.org svlug
157.22.20.227 svlug.svlug.org www
157.22.20.227 www.svlug.org www
157.22.20.227 lists.svlug.org lists
12.234.173.250 gargamel.merlins.org
12.234.173.251 gargamel.merlins.org
12.234.173.252 gargamel.merlins.org
12.234.173.253 gargamel.merlins.org
12.234.173.254 gargamel.merlins.org
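A minimal Python sketch of that clean-up, assuming the rule is simply "prefix '#' to every active line whose address field is in a given set". The helper name and the address-based matching are hypothetical, not a record of how the edit was actually made; note that the gruyere line is redundant to the DNS rather than obsolete, so it goes in the set too.

```python
def comment_out(hosts_text, addresses):
    """Return hosts_text with '#' prefixed to each active line whose address is listed."""
    out = []
    for line in hosts_text.splitlines():
        fields = line.split()
        # Blank lines, existing comments, and unlisted addresses are left untouched.
        if fields and not fields[0].startswith("#") and fields[0] in addresses:
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


BEFORE = """\
127.0.0.1 localhost
64.62.190.98 gruyere.svlug.org gruyere
157.22.20.227 svlug.org svlug
"""

# Hypothetical set: addresses whose static mappings should give way to the DNS.
REDUNDANT = {"64.62.190.98", "157.22.20.227"}

print(comment_out(BEFORE, REDUNDANT), end="")
```

Commenting out rather than deleting keeps the old values visible, which is also what makes the change easy to back out.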
Until a couple of days ago, /etc/mail/domains/localdomains contained:
# Domains that are accepted locally
svlug.net
lists.svlug.net
svlug.com
lists.svlug.com
I amended that to:
# Domains that are accepted locally
svlug.net
lists.svlug.net
svlug.com
lists.svlug.com
svlug.org
lists.svlug.org
Absence of the line 'svlug.org', _combined with_ the inability
to find 'svlug.org' at its correct address on account of the erroneous
/etc/hosts entry, is what sabotaged inbound mail to 'president at svlug.org',
and doubtless also other point-of-contact aliases such as
'webmaster at svlug.org'.
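The localdomains half of that can be sketched in a few lines of Python. This assumes, as the message implies, that the Exim configuration in the chroot reads its local-domains list from this file; the helper names are hypothetical, and domain comparison is done case-insensitively.

```python
def load_domains(text):
    """Return the set of domains listed in a localdomains-style file."""
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if line:
            domains.add(line.lower())
    return domains


def is_local(address, domains):
    """True if the domain part of `address` is one of the accepted local domains."""
    _, _, domain = address.rpartition("@")
    return domain.lower() in domains


OLD = """\
# Domains that are accepted locally
svlug.net
lists.svlug.net
svlug.com
lists.svlug.com
"""
NEW = OLD + "svlug.org\nlists.svlug.org\n"

print(is_local("president@svlug.org", load_domains(OLD)))  # not a local domain
print(is_local("president@svlug.org", load_domains(NEW)))  # accepted after the amendment
```

With the old list, nothing at svlug.org is treated as local, so the aliases file is never consulted for it.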
So, I've now tested that the 'webmaster' alias works (indeed, over-tested:
I was puzzled that my test messages weren't in the admin queue, which
was because they went _through_ to mailing list web-team; d'oh!).
Other aliases in /etc/exim4/aliases (in the chroot):
postmaster: svlug-admin-folks
root: svlug-admin-folks
publicity: publicity-team
publicity-team: volunteers
president: volunteers
vice-president: volunteers
speaker: speakers
speakers: volunteers
webmaster: web-team
webteam: web-team
cvs-watcher: web-team
web-team: web-team at lists.svlug.org
rsvp: officers
info: officers
av-crew: volunteers
svcs: :fail: not here anymore
socialnet-spam: socialnet-spam at lists.svlug.org
all-volunteers: volunteers
admin: svlug-admin-folks
svlug-admin-folks: dmarti,rick
mailman: owner-mailman
mailer-daemon: postmaster
owner-mailman: svlug-admin-folks
dmarti: dmarti at zgp.org
rick: rick at linuxmafia.com
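A simplified Python sketch of how these entries chain together, assuming plain "name: target[,target...]" lines expanded recursively until a name has no entry of its own (such as 'volunteers', presumably a Mailman list) or is a full address. It does not model Exim's special items such as ':fail:', and the dmarti and rick lines are omitted here, so those names show up as final destinations. The function names are hypothetical.

```python
def parse_aliases(text):
    """Map each alias name to its list of comma-separated targets."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, targets = line.partition(":")
        table[name.strip()] = [t.strip() for t in targets.split(",") if t.strip()]
    return table


def expand(name, table, seen=frozenset()):
    """Follow aliases recursively; names with no entry (or containing '@') are final."""
    if "@" in name or name not in table:
        return [name]
    if name in seen:
        raise ValueError(f"alias loop at {name!r}")
    result = []
    for target in table[name]:
        for dest in expand(target, table, seen | {name}):
            if dest not in result:  # drop duplicates, keep first-seen order
                result.append(dest)
    return result


ALIASES = """\
postmaster: svlug-admin-folks
president: volunteers
webmaster: web-team
web-team: web-team@lists.svlug.org
svlug-admin-folks: dmarti,rick
mailer-daemon: postmaster
"""

table = parse_aliases(ALIASES)
print(expand("president", table))
print(expand("webmaster", table))
print(expand("mailer-daemon", table))
```

Tracing 'webmaster' this way shows why the test messages landed on the web-team list rather than in an admin queue.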
Anyway, _that_ was what broke mail to all $FOO at svlug.org addresses
(but not $FOO at lists.svlug.org ones).
I am updating site-docs/RecentChanges on lists.svlug.org
to account for recent fixes.