[volunteers] lists.svlug.org ran out of disk again

Rick Moen rick at linuxmafia.com
Sun Feb 3 01:53:34 PST 2019


lists.svlug.org was declining inbound mail with a 45x SMTP reject
message saying (paraphrased) 'Low on disk, try later.'
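A mail server that answers with a 45x code when its spool filesystem is nearly full is, in effect, checking free space before accepting each message. Below is a minimal Python sketch of that decision. The 50 MB threshold, the function names and the reply text are all made up for illustration; this is not how Exim itself is implemented or configured.

```python
import shutil

# Hypothetical threshold: refuse new mail when less than 50 MB is free.
MIN_FREE_BYTES = 50 * 1024 * 1024


def smtp_reply_for_free_space(free_bytes: int, min_free: int = MIN_FREE_BYTES) -> str:
    """Return a 4xx (temporary) reply when space is short, else 250."""
    if free_bytes < min_free:
        # 452 is the RFC 5321 code for "insufficient system storage".
        return "452 Low on disk, try later"
    return "250 OK"


def check_spool(path: str = "/") -> str:
    # shutil.disk_usage(path).free is the space available to ordinary users
    # on the filesystem that holds path.
    return smtp_reply_for_free_space(shutil.disk_usage(path).free)
```

Because the reply is 4xx rather than 5xx, well-behaved senders queue the message and retry later, so mail is delayed rather than lost.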

~ $ ssh lists.svlug.org
Enter passphrase for key '/home/rick/.ssh/id_dsa':

Last login: Sun Feb  3 06:30:29 2019 from linuxmafia.com
rick at lists:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       48G   45G   16M 100% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
/dev           1005M  4.0K 1005M   1% /var/old-svlug-rfs/dev
tmpfs           202M  348K  202M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1008M     0 1008M   0% /run/shm
none            100M     0  100M   0% /run/user
rick at lists:~$


Well, that's not good.
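Size minus Used (roughly 3G) is much larger than Avail (16M). The difference is most likely space the filesystem reserves for root: ext2/3/4 reserve 5% by default. df leaves those reserved blocks out of Avail, and GNU df computes Use% from used and available space only, rounding up. That is why the root filesystem shows 100% here. Below is a small sketch of that arithmetic using os.statvfs. The function names are invented for this example.

```python
import os


def df_use_percent(used: int, avail: int) -> int:
    """Use% as GNU df reports it: used / (used + avail), rounded up."""
    total = used + avail
    if total == 0:
        return 0
    return -(-used * 100 // total)  # ceiling division


def df_for(path: str) -> dict:
    st = os.statvfs(path)
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize  # what non-root users may still use
    return {
        "size": st.f_blocks * st.f_frsize,
        "used": used,
        "avail": avail,
        # Free blocks usable only by root (often 5% on ext2/3/4).
        "reserved": (st.f_bfree - st.f_bavail) * st.f_frsize,
        "use_pct": df_use_percent(used, avail),
    }
```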

rick at lists:/var$ su -
Password:
root at lists:~# cd /var/log
root at lists:/var/log# /tmp/largest20
  53012066 /var/log/lighttpd/access.log.1
  45070384 /var/log/lighttpd/access.log
  31918941 /var/log/mail.log.1
  13318382 /var/log/installer/cdebconf/templates.dat
  12225408 /var/log/btmp.1
   8804208 /var/log/lighttpd/access.log.11.gz
   8601521 /var/log/kern.log.1
   6834427 /var/log/kern.log
   6072242 /var/log/lighttpd/access.log.3.gz
   5829453 /var/log/auth.log.1
   5426044 /var/log/lighttpd/access.log.4.gz
   5271133 /var/log/lighttpd/access.log.5.gz
   5123176 /var/log/lighttpd/access.log.2.gz
   5060890 /var/log/mail.log
   4941869 /var/log/lighttpd/access.log.7.gz
   4936195 /var/log/lighttpd/access.log.6.gz
   4731806 /var/log/lighttpd/access.log.9.gz
   4401558 /var/log/auth.log
   4300787 /var/log/mail.log.4.gz
   4296428 /var/log/lighttpd/access.log.12.gz
root at lists:/var/log#


To get a little immediate relief, I deleted /var/log/lighttpd/access.log.1
(53,012,066 bytes), though that obviously was not the source of the
problem.
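Deleting a log frees its space only once no process still has it open. An unlinked file keeps its blocks until the last descriptor on it is closed. On Linux, such files show up under /proc/PID/fd as symlinks whose target ends in " (deleted)". Below is a minimal sketch for spotting them, assuming a Linux-style /proc. The function name is invented, and the proc_root parameter exists only so the function can be tested.

```python
import os


def deleted_but_open(proc_root: str = "/proc"):
    """Yield (pid, fd, target) for descriptors that point at unlinked files."""
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip entries like "self", "meminfo", ...
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or we lack permission
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target.endswith(" (deleted)"):
                yield int(pid), int(fd), target
```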

root at lists:/var/old-svlug-rfs/var# cd ../home
root at lists:/var/old-svlug-rfs/home# /tmp/largest20
 167796374 /var/old-svlug-rfs/home/httpd/html/html-2008-06-16.tar.gz
  58235301 /var/old-svlug-rfs/home/lisa/web.tgz
  10834155 /var/old-svlug-rfs/home/log/apache/access.log.1
   9496492 /var/old-svlug-rfs/home/log/procmeter/cpu-idle
   9487209 /var/old-svlug-rfs/home/log/procmeter/cpu-user
   8905533 /var/old-svlug-rfs/home/log/procmeter/cpu-sys
   8810474 /var/old-svlug-rfs/home/log/exim4/mainlog.1
   7945242 /var/old-svlug-rfs/home/log/apache/access.log.0
   7717515 /var/old-svlug-rfs/home/log/procmeter/cpu-nice
   7717501 /var/old-svlug-rfs/home/log/procmeter/context
   7152248 /var/old-svlug-rfs/home/log/exim4/mainlog
   6855403 /var/old-svlug-rfs/home/log/exim4/rejectedembeddedmimeattachement.log
   5674930 /var/old-svlug-rfs/home/log/xntpd
   4349844 /var/old-svlug-rfs/home/log/ippl/all.log.2.gz
   2821825 /var/old-svlug-rfs/home/log/exim4/rejectlog.10.gz
   2721516 /var/old-svlug-rfs/home/log/exim4/rejectlog.9.gz
   2646661 /var/old-svlug-rfs/home/log/exim4/rejectlog.8.gz
   2473860 /var/old-svlug-rfs/home/log/exim4/mainlog.10.gz
   2398333 /var/old-svlug-rfs/home/log/apache/access.log.18.gz
   2133174 /var/old-svlug-rfs/home/log/apache/access.log.33.gz
root at lists:/var/old-svlug-rfs/home# rm /var/old-svlug-rfs/home/httpd/html/html-2008-06-16.tar.gz
root at lists:/var/old-svlug-rfs/home# rm /var/old-svlug-rfs/home/lisa/web.tgz
root at lists:/var/old-svlug-rfs/home#

Pruned /var/old-svlug-rfs/var/local/mailman/backup of pre-2019 logs.
(That's one tree we know grows like Topsy.)

root at lists:/var/old-svlug-rfs# /tmp/largest20

2097152000 /var/old-svlug-rfs/var/tmp/swap
1069543424 /var/old-svlug-rfs/dev/core
1069543424 /var/old-svlug-rfs/proc/kcore
 333295335 /var/old-svlug-rfs/var/local/mailman/archives/private/mailman-owner.mbox/mailman-owner.mbox
 215058314 /var/old-svlug-rfs/var/local/mailman/archives/private/svlug.mbox/svlug.mbox
 166354944 /var/old-svlug-rfs/var/spool/spamassassin/auto-whitelist
  48810367 /var/old-svlug-rfs/var/lib/apt/lists/http.us.debian.org_debian_dists_unstable_main_source_Sources
  46863606 /var/old-svlug-rfs/var/lib/apt/lists/http.us.debian.org_debian_dists_unstable_main_binary-i386_Packages
  44615098 /var/old-svlug-rfs/var/lib/apt/lists/http.us.debian.org_debian_dists_testing_main_binary-i386_Packages
  41857024 /var/old-svlug-rfs/var/spool/spamassassin/bayes_seen
  40473778 /var/old-svlug-rfs/var/lib/apt/lists/http.us.debian.org_debian_dists_testing_main_source_Sources
  37011283 /var/old-svlug-rfs/var/local/mailman/logs/post
  37011283 /var/old-svlug-rfs/proc/1343/fd/8
  37011283 /var/old-svlug-rfs/proc/1343/task/1343/fd/8
  32857266 /var/old-svlug-rfs/var/local/mailman/archives/private/volunteers.mbox/volunteers.mbox
  29141370 /var/old-svlug-rfs/var/src/kernel-source-2.4.22.tar.bz2
  23187274 /var/old-svlug-rfs/var/local/mailman/logs/smtp
  23187274 /var/old-svlug-rfs/proc/1343/task/1343/fd/7
  23187274 /var/old-svlug-rfs/proc/1343/fd/7
  22956114 /var/old-svlug-rfs/var/lib/dpkg/available
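Several of those entries are not disk usage at all:

* proc/kcore and the proc/1343 entries come from the proc filesystem mounted inside the chroot.
* The fd entries are evidently symlinks to the Mailman post and smtp logs, so those logs are counted more than once.
* dev/core is the same size as kcore, so it is presumably a link to it.

This happens because largest20's -f and -s tests follow symlinks, and File::Find walks into mounted filesystems. Below is a sketch of a walk that skips symlinks and, like find -xdev, stays on the starting filesystem. The function names are invented for this example.

```python
import os
import stat


def _on_device(path: str, dev: int) -> bool:
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False  # vanished while walking


def regular_file_sizes(top: str):
    """Yield (size, path) for regular files under top, like find -xdev -type f.

    lstat is used, so symlinks (such as proc/PID/fd/N) are skipped rather
    than followed, and directories on another device (such as a proc mount
    inside a chroot) are pruned.
    """
    top_dev = os.lstat(top).st_dev
    for dirpath, dirnames, filenames in os.walk(top):
        # Editing dirnames in place stops os.walk descending into them.
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), top_dev)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if stat.S_ISREG(st.st_mode):
                yield st.st_size, path
```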

Deleted /var/old-svlug-rfs/var/tmp/swap
Truncated /var/old-svlug-rfs/var/local/mailman/archives/private/mailman-owner.mbox/mailman-owner.mbox
Removed package catalogues from /var/old-svlug-rfs/var/lib/apt/lists/
Deleted /var/old-svlug-rfs/var/src/kernel-source-2.4.22.tar.bz2
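Truncating the mailman-owner mbox, rather than deleting it, frees its blocks at once even if some process has it open. It also leaves the file, with its owner and permissions, in place for later appends. Below is a sketch of truncating in place in Python. The function name is invented for this example.

```python
import os


def truncate_in_place(path: str) -> int:
    """Empty path without replacing it; return its previous length in bytes.

    Ownership, permissions and the inode number are unchanged. A process
    that already has the file open keeps writing at its current offset
    (or at the new end of file, if it opened the file with O_APPEND).
    """
    before = os.stat(path).st_size
    os.truncate(path, 0)
    return before
```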


root at lists:/var/old-svlug-rfs/var# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       48G   42G  3.5G  93% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
/dev           1005M  4.0K 1005M   1% /var/old-svlug-rfs/dev
tmpfs           202M  360K  202M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1008M     0 1008M   0% /run/shm
none            100M     0  100M   0% /run/user
root at lists:/var/old-svlug-rfs/var#

Ah, I just figured out where one problem is:
root at lists:/home/www-backup# du -sh .
3.7G    .
root at lists:/home/www-backup#

I have no recollection of exactly how /home/www-backup is supposed to be
used.  I'm going to disregard that for now, as I'm time-limited and 
looking for bigger fish.

There's 23GB in /var/old-svlug-rfs/var, divided up as:
root at lists:/var/old-svlug-rfs/var# du -sh *
28K     account
2.1M    backups
25M     cache
4.0K    dhcp
71M     lib
2.4G    local
4.0K    lock
0       log
4.0K    lost+found
0       mail
180K    run
21G     spool
24M     src
16K     state
268K    tmp
0       www
724K    yp
root at lists:/var/old-svlug-rfs/var#
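du -sh * measures the space each entry actually occupies on disk, not the sum of file lengths that ls shows. It counts allocated blocks, and counts hard-linked files only once. Below is a rough Python approximation of du -s for each non-hidden entry of a directory. It reports bytes rather than du -h's human-readable units. The function names are invented for this sketch.

```python
import os
import stat


def du_bytes(path, seen):
    """Approximate `du -s path` in bytes: allocated blocks, hard links once."""
    st = os.lstat(path)
    key = (st.st_dev, st.st_ino)
    if key in seen:
        return 0
    seen.add(key)
    total = st.st_blocks * 512  # st_blocks counts 512-byte units
    if stat.S_ISDIR(st.st_mode):
        with os.scandir(path) as it:
            for entry in it:
                total += du_bytes(entry.path, seen)
    return total


def du_star(directory):
    """Like `du -s *` run in directory: (bytes, name) pairs, largest first."""
    seen = set()
    sizes = [(du_bytes(os.path.join(directory, name), seen), name)
             for name in os.listdir(directory)
             if not name.startswith(".")]  # the shell's * skips dotfiles
    return sorted(sizes, reverse=True)
```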

Oh, really?  21GB in a spool directory?  Tell me more!

root at lists:/var/old-svlug-rfs/var/spool# du -sh *
0       cron
24K     cron.debian
0       exim
22M     exim4
4.0K    lintian
19M     mail
21G     sa-exim
202M    spamassassin
4.0K    ttysnoop
root at lists:/var/old-svlug-rfs/var/spool# 


root at lists:/var/old-svlug-rfs/var/spool/sa-exim# du -sh *
156M    old
108K    SAerrorsave
5.4G    SApermreject
999M    SAspamaccept
1.2G    SAteergrube
5.3G    SAtempreject
38M     SAtimeoutsave
7.1G    tuplets
root at lists:/var/old-svlug-rfs/var/spool/sa-exim#



https://serverfault.com/questions/46526/can-i-remove-emails-from-var-spool-sa-exim-sapermreject

Q:  I have a lot of emails in /var/spool/sa-exim/SApermreject. Can I
remove these emails or will this lead to some kind of nasty spam
problem? Or is this just a history of all the emails that were never
received by anyone?

A: They are just saved copies of incoming mails that have reached your
specified permreject score. You can safely delete these. If you don't
want to save them, see the setting SApermrejectSavCond in your
sa-exim.conf.


root at lists:/var/old-svlug-rfs/var/spool/sa-exim# cd SApermreject
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SApermreject# ls -al
total 41568
drwxrwx---  5 111 111     4096 Mar  3  2005 .
drwxr-xr-x 10 111 111     4096 Dec  8  2007 ..
drwxrwx---  2 111 111     4096 Mar  3  2005 cur
drwxrwx---  2 111 111 42500096 Feb  3 08:40 new
drwxrwx---  2 111 111     4096 Mar  3  2005 tmp
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SApermreject# du -sh *
4.0K    cur
5.4G    new
4.0K    tmp
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SApermreject# cd new
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SApermreject/new# ls -al

(ls output showed a very, very large number of exim message spool files
going back to the year 2005.  This command was a foolish one for me to
type, in context.  I should have known better.  After wasted time, I
finally regained control.)
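Before printing anything, ls -al has to read every entry, stat every file, and sort the whole list. With an enormous directory, that can take a very long time. GNU ls -f skips the sorting. The 42500096-byte size of new itself was the warning sign: a directory's own size grows with the number of entries it has held. As the listing after the cleanup shows, it does not shrink again.

When all you want is a count or a peek, a streaming read is much cheaper. Below is a sketch using os.scandir, which yields entries lazily and needs no stat call just to count them. The function names are invented for this example.

```python
import os
from itertools import islice


def count_entries(path: str) -> int:
    """Count directory entries without sorting or stat-ing them."""
    n = 0
    with os.scandir(path) as it:
        for _ in it:
            n += 1
    return n


def sample_names(path: str, k: int = 10):
    """Return up to k entry names in on-disk order: a safe 'peek'."""
    with os.scandir(path) as it:
        return [e.name for e in islice(it, k)]
```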

The quickest way to delete the contents of a directory holding a massive
number of files is with Perl:

root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SApermreject/new#
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SApermreject/new# perl -e 'for(<*>){((stat)[9]<(unlink))}'
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SApermreject/new# ls -al
total 41556
drwxrwx--- 2 111 111 42500096 Feb  3 08:48 .
drwxrwx--- 5 111 111     4096 Mar  3  2005 ..
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SApermreject/new#
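In that one-liner, unlink with no arguments removes the file named in $_. The stat call and the comparison add only an extra system call per file, and <*> expands the whole glob into a list before the loop starts. Below is a rough Python counterpart that streams entries with os.scandir and removes only regular files. The function name is invented, and this is a sketch, not a claim about which approach is faster.

```python
import os


def delete_files_in(path: str) -> int:
    """Unlink every regular (non-symlink) file directly inside path.

    Entries are read lazily with os.scandir, so the full list of names is
    never held in memory. Subdirectories are left alone. Returns the count.
    """
    removed = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                try:
                    os.unlink(entry.path)
                    removed += 1
                except FileNotFoundError:
                    pass  # already gone, e.g. removed by another process
    return removed
```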


root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SApermreject/new# cd ../../SAtempreject
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAtempreject# ls -al
total 47404
drwxrwx---  5 111 111     4096 Mar  3  2005 .
drwxr-xr-x 10 111 111     4096 Dec  8  2007 ..
drwxrwx---  2 111 111     4096 Mar  3  2005 cur
drwxrwx---  2 111 111 48472064 Feb  3 08:50 new
drwxrwx---  2 111 111     4096 Mar  3  2005 tmp
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAtempreject# cd new
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAtempreject/new# perl -e 'for(<*>){((stat)[9]<(unlink))}'
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAtempreject/new#


root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAtempreject/new# cd ../../SAteergrube/
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAteergrube# ls -al
total 9220
drwxrwx---  5 111 111    4096 Mar  3  2005 .
drwxr-xr-x 10 111 111    4096 Dec  8  2007 ..
drwxrwx---  2 111 111    4096 Mar  3  2005 cur
drwxrwx---  2 111 111 9408512 Nov 23 03:48 new
drwxrwx---  2 111 111    4096 Mar  3  2005 tmp
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAteergrube# cd new
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAteergrube/new# perl -e 'for(<*>){((stat)[9]<(unlink))}'
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAteergrube/new#

root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAteergrube/new# cd ../../tuplets/
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/tuplets# ls -al | more

Hmm, directories with numbers for names.  This is used by spamd for greylisting.
The docs for sa-exim include a prototype greylistclean.cron file that demonstrates 
how one might automatically clean this mess.  For now, I'm following one of the 
example cron lines and just deleting all tuples older than two weeks.  (This command
took about 15 minutes, understandably.)

root at lists:/var/old-svlug-rfs/var/spool/sa-exim/tuplets# find . -type f -mtime +14 -print0 | xargs -r0 rm
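find -mtime +14 matches files whose age, counted in whole 24-hour periods, exceeds 14. find discards the fractional part, so +14 effectively means at least 15 days old. Below is a sketch of the same pruning in Python. The function name and the dry_run and now parameters are illustrative additions.

```python
import os
import stat
import time

DAY = 24 * 60 * 60  # seconds


def prune_older_than(top, days=14, dry_run=False, now=None):
    """Delete regular files under top whose age in whole days exceeds days.

    Intended to mirror `find top -type f -mtime +days -print0 | xargs -r0 rm`:
    the age is truncated to whole days before comparing, as find does.
    Returns the paths removed (or, with dry_run, the paths that would be).
    """
    now = time.time() if now is None else now
    victims = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # like find's default: don't follow symlinks
            except FileNotFoundError:
                continue
            age_days = int((now - st.st_mtime) // DAY)
            if stat.S_ISREG(st.st_mode) and age_days > days:
                victims.append(path)
                if not dry_run:
                    os.unlink(path)
    return victims
```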


I also ran another command from greylistclean.cron, to remove empty
directories; it again took a long time:


root at lists:/var/old-svlug-rfs/var/spool/sa-exim/tuplets# find . -type d -print0 | xargs -r0 rmdir 2>/dev/null
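find lists a directory before its contents, so that rmdir pass removes only directories that were already empty. A parent emptied during the same pass survives until the next run; find's -depth option reverses the order. A bottom-up walk removes nested empty directories in one pass. Below is a sketch of that approach. The function name is invented, and the function never removes top itself.

```python
import os


def remove_empty_dirs(top: str) -> int:
    """Remove empty directories below top, deepest first; keep top itself."""
    removed = 0
    for dirpath, _dirnames, _filenames in os.walk(top, topdown=False):
        if dirpath == top:
            continue
        try:
            os.rmdir(dirpath)  # fails harmlessly if the directory is not empty
            removed += 1
        except OSError:
            pass
    return removed
```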

I also did the Perl mass-delete in SAspamaccept.

/var/old-svlug-rfs/var/spool/sa-exim/SAspamaccept
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAspamaccept# ls -al
total 10484
drwxrwx---  5 111 111     4096 Mar  7  2005 .
drwxr-xr-x 10 111 111     4096 Dec  8  2007 ..
drwxrwx---  2 111 111     4096 Mar  7  2005 cur
drwxrwx---  2 111 111 10702848 Jan 30 16:32 new
drwxrwx---  2 111 111     4096 Mar  7  2005 tmp
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAspamaccept# cd new
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAspamaccept/new# perl -e 'for(<*>){((stat)[9]<(unlink))}'
root at lists:/var/old-svlug-rfs/var/spool/sa-exim/SAspamaccept/new#


This now looks _significantly_ better:

root at lists:/var/old-svlug-rfs/var/spool/sa-exim# du -sh *
156M    old
108K    SAerrorsave
41M     SApermreject
11M     SAspamaccept
9.1M    SAteergrube
49M     SAtempreject
38M     SAtimeoutsave
1019M   tuplets
root at lists:/var/old-svlug-rfs/var/spool/sa-exim# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       48G   23G   23G  51% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
/dev           1005M  4.0K 1005M   1% /var/old-svlug-rfs/dev
tmpfs           202M  360K  202M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1008M     0 1008M   0% /run/shm
none            100M     0  100M   0% /run/user
root at lists:/var/old-svlug-rfs/var/spool/sa-exim#




I don't know how happy spamd (SpamAssassin) will be after that mass
deletion, so I'd better restart it inside the chroot.

root at lists:/var/old-svlug-rfs/var/spool/sa-exim# chroot /var/old-svlug-rfs/
lists:/# /etc/init.d/spamassassin stop
Stopping SpamAssassin Mail Filter Daemon: spamd.
lists:/# ps auxw | grep spamd
Unknown HZ value! (90) Assume 100.
root     17916  0.0 99.9  1664 589505315 ?   S    01:45   0:00 grep spamd
lists:/# /etc/init.d/spamassassin start
Starting SpamAssassin Mail Filter Daemon: spamd.
lists:/#
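The only spamd line that ps | grep turned up was the grep itself, which is how that idiom signals that the daemon really stopped. pgrep, or grepping for [s]pamd, avoids that self-match. Below is a minimal sketch of the same check in Python, reading each process's command line from /proc. It is Linux-specific, and the substring match is crude. The function name and the proc_root parameter are inventions for this sketch; proc_root exists only so the function can be tested.

```python
import os


def pids_matching(name: str, proc_root: str = "/proc"):
    """Return sorted PIDs whose command line contains name, skipping our own."""
    me = os.getpid()
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or int(entry) == me:
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # Arguments are NUL-separated; join them with spaces.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:  # process exited between listdir and open
            continue
        if name in cmdline:
            pids.append(int(entry))
    return sorted(pids)
```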


I'm going to stop now; I hadn't expected to have to surrender hours
of my time to this.

Oh, if anyone wants /tmp/largest20, it's one of my favourite little tricks:



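#!/usr/bin/perl
# largest20:  print the 20 largest plain files under the named
# directories (default: the current one), biggest first.
# You can alternatively just do:  
# find . -xdev -type f -print0 | xargs -r0 ls -l | sort -rn -k +5 | head -20
# Sometimes also handy:  du -cks * | sort -rn
use strict;
use warnings;
use File::Find;

my %size;
@ARGV = $ENV{ PWD } unless @ARGV;
# -f stats through symlinks; -s _ reuses that stat (0 for empty files).
find ( sub { $size{ $File::Find::name } = ( -s _ ) || 0 if -f; }, @ARGV );
my @sorted = sort { $size{ $b } <=> $size{ $a } } keys %size;
splice @sorted, 20 if @sorted > 20;
printf "%10d %s\n", $size{$_}, $_ for @sorted;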
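For anyone who would rather not use Perl, here is a rough Python counterpart to largest20. The function name largest() is invented for this sketch. heapq.nlargest keeps only the top n entries in memory instead of sorting every file. Like the Perl version, it does not descend into symlinked directories, while os.path.isfile and os.path.getsize follow symlinks to files.

```python
import heapq
import os


def largest(tops, n=20):
    """Return the n largest (size, path) pairs under the tops, biggest first."""
    def sizes():
        for top in tops:
            for dirpath, _dirnames, filenames in os.walk(top):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    try:
                        if os.path.isfile(path):  # follows symlinks, like -f
                            yield os.path.getsize(path), path
                    except OSError:
                        continue  # vanished or unreadable
    return heapq.nlargest(n, sizes())
```

For example, `for size, path in largest(["/var/old-svlug-rfs/home"]): print(f"{size:10d} {path}")` prints the same two columns as largest20.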



