[volunteers] What broke mail to 'president@svlug.org'

Rick Moen rick at linuxmafia.com
Wed Aug 3 02:33:45 PDT 2016


Marc pointed out that mail to president at svlug.org broke because the
lists.svlug.org host's hostname had been changed.


While looking into that (and this is a digression from discussion
of lists.svlug.org), I found this similar matter:


May 6, on this mailing list, I wrote (addressing Daniel):

  3.  Looking at another of your recent gruyere site-docs/ChangeLog
  entries:

  Tu 2016-05-03   Daniel Gimpelevich <daniel at gimpelevich.san-francisco.ca.us>
             Removed vestiges of via.net from nameserver, and updated hostnames.
             Rebooted host and added IPv6 access.

  What does 'updated hostnames' mean?  What does 'added IPv6 access' mean?
  Please amend your 2016-05-03 entry so that intelligent readers can
  understand what the hell you did.

  Daniel, seriously, you need to document what you changed, not just
  handwave about your having changed _something_.

  I'm having to chew up a lot of time just figuring out what you've been
  doing, and this really is not good.


I note with partial approval that Daniel _somewhat_ improved the 2016-05-03
site-docs/ChangeLog entry on gruyere (www.svlug.org), to say:

Tu 2016-05-03   Daniel Gimpelevich <daniel at gimpelevich.san-francisco.ca.us>
           The A records for {ftp,mail,svlug}.svlug.{org,net,com} and
           lists.svlug.{net,com} still had the via.net address.  Updated all
           records to reflect the current state of affairs, introducing
           explicit A records for the cheese-themed hostnames.  Corrected
           /etc/hosts not to rely on the public IP address, since this is a
           DHCP client.  In the Linode web UI, set RDNS for IPv4 and
           provisioned IPv6, rebooting host.  Made lighttpd listen on IPv6.

For the record, this _still_ does not say what you did, Daniel.  Your
log entries need to tell other admins enough to reverse your
changes.  The above _still_ doesn't do that.



Getting back to host lists.svlug.org:


May 5, I'd said to Daniel:

    Here's my nightmare scenario, Daniel:  The lists.svlug.org host goes
    mysteriously nonfunctional following a period of time when you've been
    messing around with it.  I inquire with you about a precise
    account of what you did and when, and it turns out none exists.  I ask
    if you have backout procedures for everything you did, and you don't
    have those either.  Everything devolves into a long discussion where
    nothing is clear and nothing gets resolved.  After a huge amount of
    wasted time, I end up having to construct a replacement host from
    backup data.  

   You keep making strange suggestions (like a retroactive ChangeLog for
   lists.svlug.org constructed from shell history) and saying strange
   things I cannot parse (like 'sandboxed replacement services on the host
   outside the public-facing ones'), and doing alarming things like
   screwing with /etc/hosts (or /etc/hostname, or whatever it was) 
   but not telling me that until after you've chewed up my time trying to
   diagnose a problem you probably _caused_ by doing that.

   You suggest that Apache falling over was unrelated to your screwing
   with /etc/hosts or /etc/hostname because Apache received a USR1 signal a
   few minutes before, even though that's a non sequitur, as USR1 merely
   signals Apache to do a graceful restart.

   All of this, and I end up spending more and more time talking to you
   about these things with little to show for it except increased anxiety
   that you are at risk of blowing up the machine.


Earlier that day, Daniel said:

   > Daniel then dropped into conversation the missing context.  He said
   > he had felt that the main entry in /etc/hosts was wrong and had edited it
   > to what he felt was the correct value.  (I think he said the 'new'
   > value was lists.svlug.org, while the 'old' server had had svlug.svlug.org.)
   > ISTR he said the failed message had followed his doing that, but that
   > he'd been able to restart the process (which one, can't remember; this
   > was late at night) after changing the /etc/hosts entry back.

   No. The hostname file carried over from the retired box remains
   svlug.svlug.org, but the equivalent file Sarah put on the new host had
   lists.svlug.org, which mailman now requires regardless of what's in the
   chroot.


What's in the /var/old-svlug-rfs/ chroot is the significant bit here,
as that's what Exim and Mailman see.

There, /etc/hosts is an appalling, obsolete mess that ought to be
severely pruned and fixed.  I'm going to defer that work for now
and just mention that among the wrong things in it are these lines:

157.22.20.228 brie.svlug.org	brie
157.22.20.227 svlug.org	svlug
157.22.20.227 svlug.svlug.org	www
157.22.20.227 www.svlug.org	www
157.22.20.227 lists.svlug.org	lists


Those were via.net IPs.

lists:/# cat /etc/hostname 
svlug.svlug.org
lists:/# 

hostname(1) returns 'lists'.


The 'host' command, like 'dig', queries the DNS directly and returns a
correct result:

lists:/# host svlug.org
svlug.org           	A	64.62.190.98
lists:/# 

The 'ping' command relies on the (wrong) information in /etc/hosts, and 
goes to the obsolete IP.

lists:/# ping svlug.org
PING svlug.org (157.22.20.227): 56 data bytes
[times out]
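Why 'host' and 'ping' disagree: 'host' and 'dig' send their queries straight to the DNS servers, whereas 'ping' resolves names through the C library, which (assuming the common 'hosts: files dns' line in the chroot's /etc/nsswitch.conf) checks /etc/hosts first and stops at the first matching line. The Python sketch below mimics that files-then-DNS order. The helper names and the dict standing in for DNS replies are hypothetical, and real glibc behaviour has more options than shown.

```python
"""Sketch of a 'hosts: files dns' lookup order (hypothetical helper names)."""


def parse_hosts(text):
    """Map each hostname/alias in hosts-file text to its address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name.lower(), addr)  # first matching line wins
    return table


def resolve_files_then_dns(name, hosts_text, dns_answers):
    """Return (address, source), consulting the hosts file before DNS.

    dns_answers is a plain dict standing in for real DNS replies.
    """
    hosts = parse_hosts(hosts_text)
    key = name.lower()
    if key in hosts:
        return hosts[key], "files"
    if key in dns_answers:
        return dns_answers[key], "dns"
    return None, "not found"


hosts_text = """\
157.22.20.227 svlug.org\tsvlug
157.22.20.227 www.svlug.org\twww
"""
dns = {"svlug.org": "64.62.190.98"}

# The stale static entry shadows the correct DNS answer.
print(resolve_files_then_dns("svlug.org", hosts_text, dns))
# ('157.22.20.227', 'files')

# With the svlug.org line commented out, the lookup falls through to DNS.
print(resolve_files_then_dns("svlug.org", "#" + hosts_text, dns))
# ('64.62.190.98', 'dns')
```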


I don't know who thought it was a good idea to put a bunch of static
mappings into /etc/hosts that are redundant to the DNS, but doing
that and then _never updating_ them is obviously bad.

I commented out the '157.22.20.227 svlug.org svlug' line, and 
'ping svlug.org' now does the correct thing (based on the DNS).
I am going back and commenting out all of these:

64.62.190.98  gruyere.svlug.org gruyere
157.22.20.228 brie.svlug.org    brie
157.22.20.227 svlug.org        svlug
157.22.20.227 svlug.svlug.org   www
157.22.20.227 www.svlug.org     www
157.22.20.227 lists.svlug.org   lists

12.234.173.250  gargamel.merlins.org
12.234.173.251  gargamel.merlins.org
12.234.173.252  gargamel.merlins.org
12.234.173.253  gargamel.merlins.org
12.234.173.254  gargamel.merlins.org
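Commenting lines out rather than deleting them keeps the change trivially reversible, which is exactly the property asked for earlier in this thread. Below is a small Python sketch of that edit, operating on the file's text. The helper name is hypothetical, and the set of addresses to retire is supplied by the caller (the example uses the two via.net addresses). On a real host one would keep a backup copy and record the change in site-docs.

```python
"""Comment out /etc/hosts lines whose address is in a given set (sketch)."""


def comment_out_addresses(text, retired):
    """Return (new_text, n_changed); matching lines get a leading '#'.

    Lines already commented, blank lines, and non-matching lines are
    left byte-for-byte unchanged, so the edit is easy to reverse.
    """
    out = []
    changed = 0
    for line in text.splitlines(keepends=True):
        fields = line.split()
        if fields and not fields[0].startswith("#") and fields[0] in retired:
            out.append("#" + line)
            changed += 1
        else:
            out.append(line)
    return "".join(out), changed


retired = {"157.22.20.227", "157.22.20.228"}   # the old via.net addresses
sample = (
    "127.0.0.1 localhost\n"
    "157.22.20.228 brie.svlug.org    brie\n"
    "157.22.20.227 svlug.org        svlug\n"
)
new_text, n = comment_out_addresses(sample, retired)
print(n)  # 2
print(new_text, end="")
```

Running it a second time over its own output changes nothing, since already-commented lines are skipped.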

Until a couple of days ago, /etc/mail/domains/localdomains contained:

# Domains that are accepted locally

svlug.net
lists.svlug.net
svlug.com
lists.svlug.com

I amended that to:

# Domains that are accepted locally

svlug.net
lists.svlug.net
svlug.com
lists.svlug.com
svlug.org
lists.svlug.org


Absence of the line 'svlug.org' _combined with_ the inability
to find 'svlug.org' at all on account of the erroneous /etc/hosts 
entry is what sabotaged inbound mail to 'president at svlug.org' --
and doubtless also other point-of-contact aliases such as 
'webmaster at svlug.org'.
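A minimal Python sketch of the missing-domain half of the problem, assuming (not confirmed above) that the chroot's Exim configuration reads /etc/mail/domains/localdomains as its list of local domains, one per line with '#' comments. The helper names are hypothetical; the point is only that 'president@svlug.org' is not recognized as local until 'svlug.org' appears in the file.

```python
"""Is a recipient's domain in a localdomains-style list? (sketch)"""


def load_domains(text):
    """One domain per line; '#' comments and blank lines are ignored."""
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip().lower()
        if line:
            domains.add(line)
    return domains


def is_local(address, domains):
    """True if the part after the last '@' is a listed local domain."""
    domain = address.rsplit("@", 1)[-1].strip().lower()
    return domain in domains


before = """# Domains that are accepted locally

svlug.net
lists.svlug.net
svlug.com
lists.svlug.com
"""
after = before + "svlug.org\nlists.svlug.org\n"

for text in (before, after):
    print(is_local("president@svlug.org", load_domains(text)))
# False
# True
```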

So, I've now tested that the 'webmaster' alias works (indeed, over-tested:
I was puzzled that my test messages weren't in the admin queue, which
was because they went _through_ to mailing list web-team; d'oh!).

Other aliases in /etc/exim4/aliases (in the chroot):

postmaster: svlug-admin-folks
root: svlug-admin-folks
publicity:      publicity-team
publicity-team: volunteers
president: volunteers
vice-president: volunteers
speaker:        speakers
speakers:       volunteers
webmaster:      web-team
webteam:        web-team
cvs-watcher:    web-team
web-team:       web-team at lists.svlug.org
rsvp:           officers
info:           officers
av-crew:        volunteers
svcs:   :fail: not here anymore
socialnet-spam: socialnet-spam at lists.svlug.org
all-volunteers: volunteers
admin:          svlug-admin-folks
svlug-admin-folks: dmarti,rick
mailman: owner-mailman
mailer-daemon: postmaster
owner-mailman: svlug-admin-folks
dmarti: dmarti at zgp.org
rick: rick at linuxmafia.com
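To trace where a given address ends up, the alias chains above can be followed mechanically. The Python sketch below is a simplified, hypothetical model of Exim's alias redirection, not Exim's actual code: it splits each line at the first colon, treats comma-separated targets as separate recipients, keeps ':fail:' entries as failures, and treats any name that is not itself an alias (such as 'volunteers', presumably a Mailman list) as a final destination. It also shows why the 'webmaster' test messages landed on the web-team list.

```python
"""Expand Exim-style 'name: target, target' aliases recursively (sketch)."""


def parse_aliases(text):
    """Map alias -> list of targets; ':fail:' entries keep their text."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, rhs = line.split(":", 1)
        rhs = rhs.strip()
        if rhs.startswith(":fail:"):
            aliases[name.strip()] = [rhs]
        else:
            aliases[name.strip()] = [t.strip() for t in rhs.split(",") if t.strip()]
    return aliases


def expand(name, aliases, seen=None):
    """Return the set of final recipients for name, following alias chains.

    A name that is not itself an alias (a list name or a full address)
    is treated as final. Alias loops are cut off via 'seen'.
    """
    seen = set() if seen is None else seen
    if name in seen:
        return set()
    seen.add(name)
    if name not in aliases:
        return {name}
    result = set()
    for target in aliases[name]:
        if target.startswith(":fail:"):
            result.add(target)
        else:
            result |= expand(target, aliases, seen)
    return result


sample = """\
president: volunteers
webmaster:      web-team
web-team:       web-team@lists.svlug.org
postmaster: svlug-admin-folks
svlug-admin-folks: dmarti,rick
dmarti: dmarti@zgp.org
rick: rick@linuxmafia.com
svcs:   :fail: not here anymore
"""
table = parse_aliases(sample)
for who in ("president", "webmaster", "postmaster", "svcs"):
    print(who, "->", sorted(expand(who, table)))
# president -> ['volunteers']
# webmaster -> ['web-team@lists.svlug.org']
# postmaster -> ['dmarti@zgp.org', 'rick@linuxmafia.com']
# svcs -> [':fail: not here anymore']
```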


Anyway, _that_ was what broke mail to all $FOO at svlug.org addresses
(but not $FOO at lists.svlug.org ones).

I am updating site-docs/RecentChanges on lists.svlug.org 
to account for recent fixes.




