[volunteers] Runaway spamassassin ran the system out of RAM, again
Rick Moen
rick at linuxmafia.com
Tue Jan 5 18:13:58 PST 2010
Here's one thing that can happen when the lists.svlug.org machine
suffers RAM-exhaustion. Message continues following the quoted
advisory from cron (reporting that one of Mailman's gateway scripts
could not run for lack of allocatable memory).
----- Forwarded message from Cron Daemon <root at svlug.org> -----
From: Cron Daemon <root at svlug.org>
To: mailman at svlug.org
Date: Tue, 05 Jan 2010 17:39:47 -0800
X-Spam-Status: No, score=-1.1 required=4.0 tests=AWL,BAYES_40,SPF_HELO_PASS,
SPF_PASS autolearn=ham version=3.2.5
Subject: Cron <mailman at svlug> /usr/bin/python -S
/var/local/mailman/cron/gate_news
Traceback (most recent call last):
  File "/var/local/mailman/cron/gate_news", line 284, in ?
    main()
  File "/var/local/mailman/cron/gate_news", line 264, in main
    process_lists(lock)
  File "/var/local/mailman/cron/gate_news", line 199, in process_lists
    mlist = MailList.MailList(listname, lock=0)
  File "/var/local/mailman/Mailman/MailList.py", line 129, in __init__
    self.Load()
  File "/var/local/mailman/Mailman/MailList.py", line 625, in Load
    dict, e = self.__load(file)
  File "/var/local/mailman/Mailman/MailList.py", line 598, in __load
    dict = loadfunc(fp)
IOError: [Errno 12] Cannot allocate memory
----- End forwarded message -----
root at svlug:~# uptime
18:01:01 up 94 days, 18:25, 1 user, load average: 20.43, 16.16, 14.24
root at svlug:~# ps auxw | grep spamd
nobody 3192 0.0 0.5 28556 2824 ? S 17:05 0:01 /usr/sbin/spamd --username=nobody --max-children 24 --helper-home-dir=/var/spool/spamassassin/ --nouser-config -d --pidfile=/var/spool/spamassassin/spamd.pid
nobody 3193 1.0 3.0 29308 15596 ? R 17:05 0:36 spamd child
nobody 3194 1.4 14.7 92164 75924 ? R 17:05 0:47 spamd child
nobody 3197 0.2 0.6 30024 3556 ? S 17:05 0:08 spamd child
nobody 3199 0.0 0.7 29936 3604 ? S 17:05 0:01 spamd child
nobody 3200 1.7 22.9 169144 117988 ? D 17:05 0:59 spamd child
nobody 3201 1.9 17.1 168608 88228 ? R 17:05 1:04 spamd child
nobody 3203 0.0 0.6 29264 3576 ? S 17:05 0:01 spamd child
nobody 3205 0.0 0.6 29784 3412 ? S 17:05 0:01 spamd child
nobody 3206 0.0 3.1 29568 16036 ? S 17:05 0:02 spamd child
nobody 3208 0.0 3.0 30036 15700 ? S 17:05 0:01 spamd child
nobody 3210 0.7 14.1 87112 72572 ? R 17:05 0:23 spamd child
nobody 3211 0.0 0.6 29700 3536 ? S 17:05 0:01 spamd child
nobody 3213 0.0 2.9 29232 14968 ? S 17:05 0:01 spamd child
nobody 3214 1.7 0.7 168612 3616 ? S 17:05 1:00 spamd child
nobody 3216 1.7 0.7 168468 3616 ? S 17:05 0:58 spamd child
nobody 4182 0.8 3.1 28956 16188 ? R 17:38 0:11 spamd child
nobody 4183 0.8 3.2 28964 16504 ? R 17:38 0:11 spamd child
nobody 4184 0.8 3.1 28956 16096 ? R 17:38 0:11 spamd child
nobody 4195 0.8 3.0 30064 15908 ? R 17:39 0:10 spamd child
nobody 4236 0.8 3.1 28956 16012 ? R 17:40 0:10 spamd child
nobody 4237 0.9 3.3 31372 17184 ? R 17:40 0:11 spamd child
nobody 4238 0.0 0.4 28556 2536 ? S 17:40 0:00 spamd child
nobody 4239 0.0 0.4 28556 2540 ? S 17:40 0:00 spamd child
nobody 4437 0.0 0.5 28556 3076 ? S 17:52 0:00 spamd child
root 4755 40.0 0.1 1828 952 pts/0 S 18:01 0:00 grep spamd
root at svlug:~# /etc/init.d/exim4 stop
Stopping MTA: exim4.
root at svlug:~# /etc/init.d/cron stop
Stopping periodic command scheduler: cron.
root at svlug:~#
root at svlug:~# killall -9 spamd
root at svlug:~# uptime
18:05:20 up 94 days, 18:29, 1 user, load average: 2.26, 11.59, 13.18
root at svlug:~#
Notice that spamd was grabbing huge chunks of RAM and system load was rising
rapidly until I killed the spamd processes, at which point the system
load dropped like a rock.
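For anyone who wants a quick number rather than eyeballing the listing, here is a minimal sketch that totals the RSS column for spamd processes. It assumes the procps-style "ps auxw" column layout shown above (RSS in KiB as the sixth field); the function name sum_spamd_rss is made up for illustration and is not part of SpamAssassin or of anything installed on the machine.

```shell
#!/bin/sh
# Sketch only: total the RSS column (6th field of "ps auxw", in KiB)
# for spamd lines. Reads ps-style lines on stdin, so it also works
# on saved output. sum_spamd_rss is a hypothetical helper name.
sum_spamd_rss() {
    # Skip the "grep spamd" line and any wrapped fragment whose
    # sixth field is not a plain number.
    awk '/spamd/ && !/grep/ && $6 ~ /^[0-9]+$/ { total += $6; n++ }
         END { printf "%d processes, %d KiB resident\n", n, total }'
}

# Live use (needs the same column layout as the listing above):
#   ps auxw | sum_spamd_rss
```

Bear in mind that RSS counts shared pages once per process, so the total overstates real memory use somewhat; it is still a handy way to see the trend.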
Stopping the cron daemon is a necessary procedure in these cases, because
Marc Merlin has a cronjob that checks frequently to ensure that
essential daemons are running and respawns them if they are not -- ordinarily
A Good Thing, but I want to have manual control at the moment.
root at svlug:~# /etc/init.d/spamassassin start
Starting SpamAssassin Mail Filter Daemon: spamd.
root at svlug:~#
I sit back and watch "top" for a while. Seems pretty stable, for now.
So, we start up the other stuff, again.
root at svlug:~# /etc/init.d/exim4 start
Starting MTA: exim4.
root at svlug:~# /etc/init.d/cron start
Starting periodic command scheduler: cron.
root at svlug:~#
And, just to make sure that Mailman's qrunner is still running:
root at svlug:~# ps auxw | grep qrunner
mailman 22911 0.0 1.2 9340 6216 ? S 00:02 0:31 /usr/bin/python /var/local/mailman/bin/qrunner --runner=BounceRunner:0:1 -s
mailman 22916 0.0 1.2 9404 6524 ? S 00:02 0:44 /usr/bin/python /var/local/mailman/bin/qrunner --runner=VirginRunner:0:1 -s
mailman 22917 0.0 0.3 7116 1940 ? S 00:02 0:00 /usr/bin/python /var/local/mailman/bin/qrunner --runner=RetryRunner:0:1 -s
mailman 29703 0.2 1.3 9336 6876 ? S 12:47 0:39 /usr/bin/python /var/local/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
mailman 31527 0.1 0.4 7172 2372 ? S 14:22 0:17 /usr/bin/python /var/local/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
mailman 31560 0.2 1.5 10012 7756 ? S 14:23 0:40 /usr/bin/python /var/local/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
mailman 4240 0.3 1.3 9012 7084 ? S 17:40 0:06 /usr/bin/python /var/local/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
mailman 4241 0.2 0.5 7100 2956 ? S 17:40 0:03 /usr/bin/python /var/local/mailman/bin/qrunner --runner=CommandRunner:0:1 -s
root 5414 0.0 0.1 1868 952 pts/0 S 18:09 0:00 grep qrunner
root at svlug:~#
Though, it would be prudent to restart it, given that it was yelping about
being unable to run its gateway scripts for lack of RAM.
root at svlug:~# /etc/init.d/mailman stop
root at svlug:~# ps auxw | grep qrunner
root 6017 0.0 0.1 1824 924 pts/0 S 18:11 0:00 grep qrunner
root at svlug:~# /etc/init.d/mailman start
root at svlug:~#
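For reference, the whole sequence from this message can be collected into one place. What follows is a rough sketch, not a tested tool: the init script paths are the ones used in the transcript, while the names run, recover_spamd and DRY_RUN are invented here for illustration. It defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the manual recovery order used in this message.
# DRY_RUN, run and recover_spamd are hypothetical names.
# With DRY_RUN=1 (the default here) commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_spamd() {
    run /etc/init.d/exim4 stop          # MTA stopped first, as in the transcript
    run /etc/init.d/cron stop           # keeps the watchdog cronjob from respawning daemons
    run killall -9 spamd                # kill the runaway spamd processes
    run /etc/init.d/spamassassin start  # bring spamd back by hand
    run /etc/init.d/exim4 start
    run /etc/init.d/cron start
    run /etc/init.d/mailman stop        # restart Mailman's qrunners as well
    run /etc/init.d/mailman start
}

# Print the plan:
#   recover_spamd
```

Running it for real would mean setting DRY_RUN=0 as root. The sketch leaves out the pause to watch "top" after spamd comes back, which is the step where a human decides whether it is safe to restart the rest.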