[svlug] Procmail (spam) rule for junk tag gaps?

Karsten M. Self kmself at ix.netcom.com
Wed May 14 11:54:02 PDT 2003

I'm getting spam slipping through SpamAssassin that's formatted as HTML
email with junk tags, e.g.:

    P<k0kaymf1bqxcoz2>en<kh544d1za2fui>is En<kzemikh20yhnqa2>larg<krv7w8h3p9
    maz>eme<kbmd107sgr8u>nt Pi<kagxhc6btb55l1n>ll On The Ma<kz69gfa28awdh>rk

...that's a typical "enlargement" spam message.

Are there any procmail geniuses who could offer a tip on how to filter
these?  The mail has a very high tag-to-text ratio, and the tags seem to
have little or no whitespace.  Hmm...
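As a rough starting point, here is a minimal sketch (in Python, not procmail) of the heuristic described above. It assumes a "junk tag" is any <...> tag with no whitespace inside it; the names JUNK_TAG, junk_tag_ratio and strip_junk_tags are hypothetical. It measures what fraction of the non-whitespace text sits inside such tags, and strips them to recover the words the spammer split apart.

```python
import re

# Hypothetical definition: a junk tag is "<", one or more characters that
# are neither angle brackets nor whitespace, then ">", e.g. <k0kaymf1bqxcoz2>.
# Real HTML tags with attributes (<font color="red">) contain spaces and
# are not matched.
JUNK_TAG = re.compile(r"<[^<>\s]+>")


def junk_tag_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that lie inside junk tags."""
    tag_chars = sum(len(m.group(0)) for m in JUNK_TAG.finditer(text))
    total = len(re.sub(r"\s", "", text))
    return tag_chars / total if total else 0.0


def strip_junk_tags(text: str) -> str:
    """Delete junk tags so the hidden words read normally again."""
    return JUNK_TAG.sub("", text)
```

For the second line of the sample, strip_junk_tags("Pi<kagxhc6btb55l1n>ll On The Ma<kz69gfa28awdh>rk") gives "Pill On The Mark", and junk_tag_ratio returns about 0.71 for the same string. Short legitimate tags such as <b> or </p> also match, and a tag broken across a line (as one in the sample is) does not, so any cutoff would need tuning against real mail.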

One source of inspiration is the Chinese character filters I'm using
(the original site is now offline), example:

# To allow _more_ high-bit chars, *decrease* the weight for high-bit lines.
# To allow _fewer_ high-bit chars, *increase* the weight for high-bit lines.
# Weight is 1/(percent high-bit), e.g.:  1/(0.05) = 20.
# Arbitrarily require message to be at least 3200 bytes to trip filter
# (to exclude short messages w/funky sigs).  This is about 4 lines of
# text.

* > 3200
* -1^1 .
*  2^1 =[0-9A-F][0-9A-F]
* 10^1 [ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿]
* 10^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 10^1 =[A-F][0-9A-F]

(The above includes extended charset characters and may not display
correctly.)

This uses weighted scoring to count characters in the message, and only
trips if high-bit (extended charset) characters make up more than a set
fraction of the total.
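To see why the weights work as a percentage threshold, here is a small Python model of the recipe's scoring (a sketch, not procmail itself). It assumes procmail's rule that a "w^1" condition adds w once per match and that the recipe trips when the total score is greater than zero, and it reads the blank-looking first character of the first class as a non-breaking space (0xA0). The =XX patterns are matched case-insensitively, as procmail does by default; case-folding of the accented characters is not modeled. The names score and trips_filter are hypothetical, and the size check uses string length rather than procmail's byte count.

```python
import re

# Weights copied from the recipe; with exponent 1, procmail adds
# weight * (number of matches) for each condition.
DOT = re.compile(r".")  # any character except newline, as in procmail
QP_ANY = re.compile(r"=[0-9A-F][0-9A-F]", re.IGNORECASE)
HIGH_1 = re.compile(r"[\xa0-\xbf]")  # NBSP through inverted question mark
HIGH_2 = re.compile(r"[\xe0-\xff]")  # a-grave through y-diaeresis
QP_HIGH = re.compile(r"=[A-F][0-9A-F]", re.IGNORECASE)

MIN_SIZE = 3200  # corresponds to the "* > 3200" condition


def score(text: str) -> int:
    """Sum of weight * match-count over the recipe's scored conditions."""
    return (-1 * len(DOT.findall(text))
            + 2 * len(QP_ANY.findall(text))
            + 10 * len(HIGH_1.findall(text))
            + 10 * len(HIGH_2.findall(text))
            + 10 * len(QP_HIGH.findall(text)))


def trips_filter(text: str) -> bool:
    """True when the text is long enough and the total score is positive."""
    return len(text) > MIN_SIZE and score(text) > 0
```

Ignoring the =XX patterns, every high-bit character is also counted once by `.`, so it nets +9 while every other character nets -1. A message of n characters with h high-bit ones therefore trips only when 9h > n - h, i.e. when more than 10% of its characters are high-bit, which matches the 1/(percent) rule in the recipe's comments for a weight of 10.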


