[volunteers] (forw) Re: [revival] exim4 dead on svlug, restarting after this dumped state

Rick Moen rick at linuxmafia.com
Sat Jul 9 14:48:43 PDT 2016


OK, as Daniel already revealed on this mailing list a few months ago, 
linuxmafia.com also hosts a mailing list called 'revival' that is
used as an out-of-band discussion forum for, e.g., times when SVLUG's
own forums don't work.

Further to what is mentioned below, I am documenting in
site-docs/RecentChanges the backing out (for now) of Sarah's suggested
fix to Decorate.py.

And now I am completely out of time on this for today.


----- Forwarded message from Rick Moen <rick at linuxmafia.com> -----

Date: Sat, 9 Jul 2016 14:44:34 -0700
From: Rick Moen <rick at linuxmafia.com>
To: revival at linuxmafia.com
Subject: Re: [revival] exim4 dead on svlug,
	restarting after this dumped state
Organization: If you lived here, you'd be $HOME already.

Quoting Daniel Gimpelevich (daniel at gimpelevich.san-francisco.ca.us):

> OK, but currently, posts to test at lists.svlug.org do not go out, nor into
> the archive. AFAICT, all mailing list functionality has been AWOL since
> at least yesterday. My cursory intervention was after the fact and did
> not address this. It was an emergency reaction to the fact that even SSH
> was malfunctioning, which would make any investigation much more
> difficult.

Well, shit.  I honestly, seriously don't have time to deal with this.

You could try backing out the change to a Python module I made at Sarah
Newman's suggestion.  You could try rebooting the host.  Other than
that, I don't know how to proceed.

What does 'SSH was malfunctioning' mean?  FFS, Daniel, that doesn't tell
me anything.  You are giving me useless, non-information.  Instead of 
attempting diagnosis and attempting to take useful steps, you are just
making cryptic statements.  That is not helpful.

I've just sshed into the host.  I immediately notice that Mailman's
qrunner process isn't running at all.

Partial shell session, starting with most of "ps auwx" output:

root      1463  0.0  0.0   2412   516 hvc0     Ss+  Jul08   0:00 /sbin/getty -L hvc0 9600 linux
root      1465  0.0  0.0   4640   580 tty1     Ss+  Jul08   0:00 /sbin/getty -8 38400 tty1
root      1500  0.0  0.0      0     0 ?        S    Jul08   0:00 [kauditd]
111       2414  0.0  0.0   8688  1456 ?        Ss   08:01   0:00 /usr/sbin/exim4 -bd -q30m
root      2725  0.0  0.1   2328  2324 ?        SLs  08:19   0:03 /usr/sbin/ntpd
root      8489  0.0  0.0      0     0 ?        S    19:59   0:00 [kworker/2:2]
nobody    9059  0.0  1.2  27972 25364 ?        Ss   21:05   0:00 /usr/sbin/spamd --username=nobody --max-children 24 --helper-home-dir=/var/spool/spamassassin/ --n
nobody    9060  0.0  1.2  28736 25984 ?        S    21:05   0:00 spamd child
nobody    9061  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9062  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9063  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9064  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9065  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9066  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9067  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9068  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9069  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9070  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9071  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9072  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9073  0.0  1.1  27972 23492 ?        S    21:05   0:00 spamd child
nobody    9074  0.0  1.1  27972 23492 ?        S    21:05   0:00 spamd child
nobody    9075  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9076  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9077  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9078  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9079  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9080  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9081  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9082  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
nobody    9083  0.0  1.1  27972 23496 ?        S    21:05   0:00 spamd child
root      9140  0.3  0.1  11144  3656 ?        Ss   21:14   0:00 sshd: rick [priv]
rick      9158  0.0  0.0  11144  1852 ?        S    21:14   0:00 sshd: rick at pts/0
rick      9159  0.7  0.1   6628  2880 pts/0    Ss   21:14   0:00 -bash
rick      9178  0.0  0.0   5184  1088 pts/0    R+   21:15   0:00 ps auxw
root     26897  0.0  0.0      0     0 ?        S    04:34   0:01 [kworker/u8:0]
root     27233  0.0  0.0      0     0 ?        S    05:01   0:00 [kworker/3:0]
root     28656  0.0  0.0      0     0 ?        S    05:08   0:00 [kworker/u8:1]
syslog   28664  0.0  0.2  33536  4176 ?        Ssl  05:08   0:01 rsyslogd
root     28686  0.0  0.0      0     0 ?        S    05:08   0:00 [kworker/0:2]
root     28796  0.0  0.0      0     0 ?        S    05:17   0:00 [kworker/2:0]
www-data 29233  0.0  0.1   6524  3172 ?        S    05:28   0:06 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
rick at lists:~$ su -
Password: 
root at lists:~# chroot /var/old-svlug-rfs/
lists:/# /etc/init.d/mailman status
lists:/# /etc/init.d/mailman start 
lists:/# ps auxw | grep qrunner
mailman   9243  5.2 99.9  8088 589505315 ?   S    14:20   0:00 /usr/bin/python /var/local/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
mailman   9244  0.3 99.9  6640 589505315 ?   S    14:20   0:00 /usr/bin/python /var/local/mailman/bin/qrunner --runner=BounceRunner:0:1 -s
mailman   9245  0.5 99.9  6796 589505315 ?   S    14:20   0:00 /usr/bin/python /var/local/mailman/bin/qrunner --runner=CommandRunner:0:1 -s
mailman   9246  2.4 99.9  7908 589505315 ?   S    14:20   0:00 /usr/bin/python /var/local/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
mailman   9247  0.3 99.9  6596 589505315 ?   S    14:20   0:00 /usr/bin/python /var/local/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
mailman   9248  6.7 99.9  8256 589505315 ?   S    14:20   0:01 /usr/bin/python /var/local/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
mailman   9249  1.2 99.9  8584 589505315 ?   S    14:20   0:00 /usr/bin/python /var/local/mailman/bin/qrunner --runner=VirginRunner:0:1 -s
mailman   9250  0.3 99.9  6516 589505315 ?   S    14:20   0:00 /usr/bin/python /var/local/mailman/bin/qrunner --runner=RetryRunner:0:1 -s
root      9531  0.0 99.9  1672 589505315 ?   S    14:20   0:00 grep qrunner
lists:/#

Instantly, exim starts grinding out mail (see the 2313 processes further
down in this shell transcript), and the host starts going into swap:

lists:/# free -m
             total       used       free     shared    buffers     cached
Mem:          2012       1244        767          0        127         67
-/+ buffers/cache:       1049        962
Swap:          511          0        511
lists:/# free -m
             total       used       free     shared    buffers     cached
Mem:          2012       1309        703          0          1         63
-/+ buffers/cache:       1244        768
Swap:          511          6        505
lists:/# free -m
             total       used       free     shared    buffers     cached
Mem:          2012       1329        683          0          0         64
-/+ buffers/cache:       1264        747
Swap:          511         21        490
lists:/#

Obviously, exim is having some problem, and it's related to what's being fed to it by Mailman.
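The repeated `free -m` checks above are a manual way of watching the host slide into swap. Below is a minimal sketch of doing the same thing in a loop. It assumes bash and a procps `free` whose output has a "Swap:" row listing total, used, and free in that order (as in the transcript). The function names and the default 5-second interval are hypothetical illustrations, not anything that was actually run on the SVLUG host.

```shell
#!/bin/bash
# Hypothetical helper: print the "used" swap figure (MiB) from
# `free -m` output read on stdin. Assumes the "Swap:" row lists
# total, used, free in that order, as in the transcript.
swap_used_mb() {
  awk '$1 == "Swap:" { print $3 }'
}

# Hypothetical watch loop: print a timestamped swap-usage figure every
# few seconds, so a runaway (e.g. thousands of exim4 processes) shows
# up as a steadily rising number. Interrupt with Ctrl-C.
watch_swap() {
  local interval="${1:-5}"
  while true; do
    printf '%s swap used: %s MiB\n' "$(date +%H:%M:%S)" "$(free -m | swap_used_mb)"
    sleep "$interval"
  done
}
```

Usage would be something like `watch_swap 5` in a second ssh session while restarting Mailman; the parsing is kept in its own function so it can be checked against saved `free -m` output.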

lists:/# /etc/init.d/exim stop
Stopping MTA: exim4.
lists:/# ps auxw | grep exim | wc -l
   2313
lists:/# killall exim
exim: no process killed
lists:/# killall exim4
lists:/# ps auxw | grep exim | wc -l
      1
lists:/# /etc/init.d/spamassassin stop
Stopping SpamAssassin Mail Filter Daemon: spamd.
lists:/# /etc/init.d/mailman stop
lists:/# 

Backing out the patch to Mailman.

lists:/# cd /var/local/mailman/Mailman/Handlers
lists:/var/local/mailman/Mailman/Handlers# ls -al Decorate.*
ls: unrecognized prefix: rs
ls: unparsable value for LS_COLORS environment variable
-rw-r--r--    1 root     mailman      9576 Jul  8 00:17 Decorate.py
-rw-r--r--    1 root     mailman      9476 Jul  8 00:15 Decorate.py.ORIG
-rw-r--r--    1 mailman  mailman      5739 Jul  8 00:20 Decorate.pyc
lists:/var/local/mailman/Mailman/Handlers# mv Decorate.py Decorate.py.ATTEMPTED-2016-07-09
lists:/var/local/mailman/Mailman/Handlers# mv Decorate.py.ORIG Decorate.py
lists:/var/local/mailman/Mailman/Handlers# 

Restart services mailman, spamassassin, and exim.

lists:/var/local/mailman/Mailman/Handlers# /etc/init.d/mailman start
lists:/var/local/mailman/Mailman/Handlers# /etc/init.d/spamassassin start
Starting SpamAssassin Mail Filter Daemon: /espamd.
lists:/var/local/mailman/Mailman/Handlers# /etc/init.d/exim start
Starting MTA: /usr/sbin/exim4 already running.
lists:/var/local/mailman/Mailman/Handlers# ps auxw | grep exim | wc -l
      2
lists:/var/local/mailman/Mailman/Handlers# ps auxw | grep exim        
111      12392  0.0 99.9  8688 589505315 ?   S    14:28   0:00 /usr/sbin/exim4 -bd -q30m
root     12766  0.0 99.9  1672 589505315 ?   S    14:40   0:00 grep exim
lists:/var/local/mailman/Mailman/Handlers# killall exim4
lists:/var/local/mailman/Mailman/Handlers# ps auxw | grep exim 
root     12769  0.0 99.9  1668 589505315 ?   S    14:40   0:00 grep exim
lists:/var/local/mailman/Mailman/Handlers# /etc/init.d/exim start
Starting MTA: 
exim4.
lists:/var/local/mailman/Mailman/Handlers# 

See if exim's blowing up.

lists:/var/local/mailman/Mailman/Handlers# ps auxw | grep exim | wc -l
      5
lists:/var/local/mailman/Mailman/Handlers#

So far, so... at least not blowing up.
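One caveat when reading the counts in this transcript: `ps auxw | grep exim | wc -l` also counts the `grep exim` process itself. So the "1" after `killall exim4` meant no exim was left, and the "2" meant a single exim4 daemon, which is what the `ps auxw | grep exim` listing right after it shows. Below is a sketch of two common ways to count only the real processes, assuming bash and procps. The function names are hypothetical, and `pgrep -c` needs a pgrep that supports the count option (procps-ng does; a very old chroot might not).

```shell
#!/bin/bash
# Hypothetical helper: count lines of `ps` output (on stdin) that
# mention exim, without matching this grep's own command line. The
# pattern "[e]xim" matches the text "exim", but the grep process's
# argument is the literal "[e]xim", which the pattern does not match.
count_exim_lines() {
  grep -c '[e]xim'
}

# Alternative using procps' pgrep: count processes whose name is
# exactly "exim4". pgrep never reports itself.
count_exim4_procs() {
  pgrep -c -x exim4
}
```

For example, `ps auxw | count_exim_lines` or simply `count_exim4_procs` would have reported 0 and 1 where the transcript shows 1 and 2.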


Bigger problem:  I'm not kidding when I say I really just don't have time for this.
My time is fully committed, and I do not have time to also babysit problems with
the SVLUG servers.  

How about you step up to the plate some more?



_______________________________________________
revival mailing list
revival at linuxmafia.com
http://linuxmafia.com/mailman/listinfo/revival

----- End forwarded message -----


