[volunteers] Runaway spamassassin ran the system out of RAM, again

Rick Moen rick at linuxmafia.com
Tue Jan 5 18:13:58 PST 2010


Here's one thing that can happen when the lists.svlug.org machine
suffers RAM exhaustion.  My message continues after the quoted
advisory from cron, which reports that one of Mailman's gateway scripts
could not run for lack of allocatable memory.

----- Forwarded message from Cron Daemon <root at svlug.org> -----

From: Cron Daemon <root at svlug.org>
To: mailman at svlug.org
Date: Tue, 05 Jan 2010 17:39:47 -0800
X-Spam-Status: No, score=-1.1 required=4.0 tests=AWL,BAYES_40,SPF_HELO_PASS,
	SPF_PASS autolearn=ham version=3.2.5
Subject: Cron <mailman at svlug> /usr/bin/python -S
	/var/local/mailman/cron/gate_news

Traceback (most recent call last):
  File "/var/local/mailman/cron/gate_news", line 284, in ?
    main()
  File "/var/local/mailman/cron/gate_news", line 264, in main
    process_lists(lock)
  File "/var/local/mailman/cron/gate_news", line 199, in process_lists
    mlist = MailList.MailList(listname, lock=0)
  File "/var/local/mailman/Mailman/MailList.py", line 129, in __init__
    self.Load()
  File "/var/local/mailman/Mailman/MailList.py", line 625, in Load
    dict, e = self.__load(file)
  File "/var/local/mailman/Mailman/MailList.py", line 598, in __load
    dict = loadfunc(fp)
IOError: [Errno 12] Cannot allocate memory

----- End forwarded message -----


root at svlug:~# uptime
 18:01:01 up 94 days, 18:25,  1 user,  load average: 20.43, 16.16, 14.24
root at svlug:~# ps auxw | grep spamd
nobody    3192  0.0  0.5 28556 2824 ?        S    17:05   0:01 /usr/sbin/spamd --username=nobody --max-children 24 --helper-home-dir=/var/spool/spamassassin/ --nouser-config -d --pidfile=/var/spool/spamassassin/spamd.pid
nobody    3193  1.0  3.0 29308 15596 ?       R    17:05   0:36 spamd child
nobody    3194  1.4 14.7 92164 75924 ?       R    17:05   0:47 spamd child
nobody    3197  0.2  0.6 30024 3556 ?        S    17:05   0:08 spamd child
nobody    3199  0.0  0.7 29936 3604 ?        S    17:05   0:01 spamd child
nobody    3200  1.7 22.9 169144 117988 ?     D    17:05   0:59 spamd child
nobody    3201  1.9 17.1 168608 88228 ?      R    17:05   1:04 spamd child
nobody    3203  0.0  0.6 29264 3576 ?        S    17:05   0:01 spamd child
nobody    3205  0.0  0.6 29784 3412 ?        S    17:05   0:01 spamd child
nobody    3206  0.0  3.1 29568 16036 ?       S    17:05   0:02 spamd child
nobody    3208  0.0  3.0 30036 15700 ?       S    17:05   0:01 spamd child
nobody    3210  0.7 14.1 87112 72572 ?       R    17:05   0:23 spamd child
nobody    3211  0.0  0.6 29700 3536 ?        S    17:05   0:01 spamd child
nobody    3213  0.0  2.9 29232 14968 ?       S    17:05   0:01 spamd child
nobody    3214  1.7  0.7 168612 3616 ?       S    17:05   1:00 spamd child
nobody    3216  1.7  0.7 168468 3616 ?       S    17:05   0:58 spamd child
nobody    4182  0.8  3.1 28956 16188 ?       R    17:38   0:11 spamd child
nobody    4183  0.8  3.2 28964 16504 ?       R    17:38   0:11 spamd child
nobody    4184  0.8  3.1 28956 16096 ?       R    17:38   0:11 spamd child
nobody    4195  0.8  3.0 30064 15908 ?       R    17:39   0:10 spamd child
nobody    4236  0.8  3.1 28956 16012 ?       R    17:40   0:10 spamd child
nobody    4237  0.9  3.3 31372 17184 ?       R    17:40   0:11 spamd child
nobody    4238  0.0  0.4 28556 2536 ?        S    17:40   0:00 spamd child
nobody    4239  0.0  0.4 28556 2540 ?        S    17:40   0:00 spamd child
nobody    4437  0.0  0.5 28556 3076 ?        S    17:52   0:00 spamd child
root      4755 40.0  0.1  1828  952 pts/0    S    18:01   0:00 grep spamd
root at svlug:~# /etc/init.d/exim4 stop
Stopping MTA: exim4.
root at svlug:~# /etc/init.d/cron stop
Stopping periodic command scheduler: cron.
root at svlug:~#
root at svlug:~# killall -9 spamd
root at svlug:~# uptime                   
 18:05:20 up 94 days, 18:29,  1 user,  load average: 2.26, 11.59, 13.18
root at svlug:~#

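Notice that spamd was grabbing huge chunks of RAM (four of its children
were each holding 14-23% of physical memory, with resident sizes of
72,572 to 117,988 KB), and the one-minute load average had climbed past 20.
Once I killed the spamd processes, the system load dropped like a rock:
by the next uptime check, about four minutes later, it was down to 2.26.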
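The triage above can also be done mechanically from saved `ps auxw` output. Below is a minimal Python sketch, assuming the procps column layout shown in the listing (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND, with sizes in KB); the names `parse_ps`, `report` and `Proc` are hypothetical, not part of SpamAssassin or procps. Run over the listing above, it should count 24 children, which is exactly the `--max-children 24` ceiling, and the RSS/%MEM ratio of the biggest child puts total RAM at roughly half a gigabyte.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    mem_pct: float   # %MEM column
    rss_kib: int     # RSS column, in KB
    command: str


def parse_ps(text: str, needle: str = "spamd") -> List[Proc]:
    """Pick the processes whose COMMAND mentions `needle` out of `ps auxw` text."""
    procs = []
    for line in text.splitlines():
        fields = line.split(None, 10)  # 10 splits, so COMMAND keeps its spaces
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header, blank line, or a wrapped continuation line
        command = fields[10]
        if needle not in command or command.startswith("grep"):
            continue  # unrelated process, or the grep that produced the listing
        procs.append(Proc(int(fields[1]), float(fields[3]), int(fields[5]), command))
    return procs


def report(procs: List[Proc], top: int = 3) -> None:
    """Print the child count and the `top` largest processes by RSS."""
    children = [p for p in procs if p.command == "spamd child"]
    print(f"{len(children)} spamd children")
    for p in sorted(procs, key=lambda p: p.rss_kib, reverse=True)[:top]:
        # RSS / (%MEM / 100) gives a rough estimate of total physical RAM;
        # %MEM is rounded to one decimal, so treat this as approximate.
        total_mib = p.rss_kib / (p.mem_pct / 100) / 1024 if p.mem_pct else 0.0
        print(f"pid {p.pid}: {p.rss_kib} KB RSS ({p.mem_pct}% of RAM, "
              f"implies ~{total_mib:.0f} MiB total)")
```

It ranks processes rather than summing them: shared pages are counted in every process's RSS, which is why the %MEM column in the listing above adds up to more than 100%.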

Stopping the cron daemon is necessary in these cases, because
Marc Merlin has a cronjob that frequently checks that essential
daemons are running and respawns any that are not.  Ordinarily that's
A Good Thing, but at the moment I want manual control.
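A respawn watchdog of the kind described above boils down to "for each required daemon, if nothing by that name is running, start it again." The sketch below is hypothetical and is not Marc Merlin's actual cronjob: the `REQUIRED` mapping, the function names, and the use of procps `pgrep -x` to test for a running process are all assumptions, and `to_respawn` only reports what would be restarted.

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical mapping: daemon process name -> init script that starts it.
REQUIRED: Dict[str, str] = {
    "spamd": "/etc/init.d/spamassassin",
    "exim4": "/etc/init.d/exim4",
}


def is_running(name: str) -> bool:
    """True if some process name matches `name` exactly (procps `pgrep -x`)."""
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0  # pgrep exits 0 when at least one process matched


def to_respawn(required: Dict[str, str],
               check: Callable[[str], bool] = is_running) -> List[str]:
    """Return the init scripts whose daemon has no running process."""
    return [script for name, script in required.items() if not check(name)]
```

A cron-driven version would call `subprocess.run([script, "start"])` for each returned script. Because such a job runs from cron, stopping cron is the blunt but reliable way to keep it from restarting spamd or exim4 while someone is working on the box by hand.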


root at svlug:~# uptime                   
 18:05:20 up 94 days, 18:29,  1 user,  load average: 2.26, 11.59, 13.18
root at svlug:~# /etc/init.d/spamassassin start
Starting SpamAssassin Mail Filter Daemon: spamd.
root at svlug:~# 
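Watching "top" is a judgment call, but the uptime lines already tell the story: at 18:01 the one-minute load average (20.43) sat above the fifteen-minute one (14.24), and by 18:05 it (2.26) had fallen well below it (13.18). A minimal Python sketch of that comparison follows; `load_averages` and `settling` are hypothetical names, and treating "one-minute below fifteen-minute" as "settling down" is only a rough heuristic, not a standard test.

```python
import re
from typing import Tuple

# Matches "load average: 2.26, 11.59, 13.18" (and the "load averages:" variant).
_LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def load_averages(uptime_line: str) -> Tuple[float, ...]:
    """Extract the 1-, 5- and 15-minute load averages from `uptime` output."""
    m = _LOAD_RE.search(uptime_line)
    if m is None:
        raise ValueError("no load average found in: " + uptime_line)
    return tuple(float(x) for x in m.groups())


def settling(uptime_line: str) -> bool:
    """Heuristic: load is falling if the 1-minute figure is below the 15-minute one."""
    one, _five, fifteen = load_averages(uptime_line)
    return one < fifteen
```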


I sit back and watch "top" for a while.  It seems pretty stable, for now,
so we start up the other stuff again.

root at svlug:~# /etc/init.d/exim4 start
Starting MTA: exim4.
root at svlug:~# /etc/init.d/cron start
Starting periodic command scheduler: cron.
root at svlug:~#

And, just to make sure that Mailman's qrunner is still running:

root at svlug:~# ps auxw | grep qrunner
mailman  22911  0.0  1.2  9340 6216 ?        S    00:02   0:31 /usr/bin/python /var/local/mailman/bin/qrunner --runner=BounceRunner:0:1 -s
mailman  22916  0.0  1.2  9404 6524 ?        S    00:02   0:44 /usr/bin/python /var/local/mailman/bin/qrunner --runner=VirginRunner:0:1 -s
mailman  22917  0.0  0.3  7116 1940 ?        S    00:02   0:00 /usr/bin/python /var/local/mailman/bin/qrunner --runner=RetryRunner:0:1 -s
mailman  29703  0.2  1.3  9336 6876 ?        S    12:47   0:39 /usr/bin/python /var/local/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
mailman  31527  0.1  0.4  7172 2372 ?        S    14:22   0:17 /usr/bin/python /var/local/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
mailman  31560  0.2  1.5 10012 7756 ?        S    14:23   0:40 /usr/bin/python /var/local/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
mailman   4240  0.3  1.3  9012 7084 ?        S    17:40   0:06 /usr/bin/python /var/local/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
mailman   4241  0.2  0.5  7100 2956 ?        S    17:40   0:03 /usr/bin/python /var/local/mailman/bin/qrunner --runner=CommandRunner:0:1 -s
root      5414  0.0  0.1  1868  952 pts/0    S    18:09   0:00 grep qrunner
root at svlug:~# 


Still, it would be prudent to restart Mailman, given that it was yelping
about being unable to run its gateway scripts for lack of RAM.

root at svlug:~# /etc/init.d/mailman stop
root at svlug:~# ps auxw | grep qrunner
root      6017  0.0  0.1  1824  924 pts/0    S    18:11   0:00 grep qrunner
root at svlug:~# /etc/init.d/mailman start
root at svlug:~# 




More information about the volunteers mailing list