[svlug] Procmail (spam) rule for junk tag gaps?

Karsten M. Self kmself at ix.netcom.com
Wed May 14 11:54:02 PDT 2003


I'm getting spam slipping through SpamAssassin, formatted as HTML email
with junk tags breaking up the words, e.g.:

    P<k0kaymf1bqxcoz2>en<kh544d1za2fui>is En<kzemikh20yhnqa2>larg<krv7w8h3p9
    maz>eme<kbmd107sgr8u>nt Pi<kagxhc6btb55l1n>ll On The Ma<kz69gfa28awdh>rk
    e<kzk2vrrpjp65k1

...that's a typical "enlargement" spam message.

Are there any procmail geniuses who could give a tip on how to filter
same?  The mail has a very high tag-to-text ratio, and the tags seem to
contain little or no whitespace.  Hmm...

One source of inspiration is the Chinese character filters I'm using
(the original site is now offline), example:


# To allow _more_ high-bit chars, *decrease* the weight for high-bit lines.
# To allow _fewer_ high-bit chars, *increase* the weight for high-bit lines.
# Weight is 1/(percent high-bit), e.g.:  1/(0.05) = 20.
# Arbitrarily require message to be at least 3200 bytes to trip filter
# (to exclude short messages w/funky sigs).  This is about 40 lines of
# text.

:0BD
* > 3200
* -1^1 .
*  2^1 =[0-9A-F][0-9A-F]
* 10^1 [ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿]
* 10^1 [ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß]
* 10^1 [àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ]
* 10^1 =[A-F][0-9A-F]


(The above includes extended charset characters and may not display
properly).

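This uses weighting to count characters in a message: every character in
the body scores -1 and every high-bit (or quoted-printable encoded)
character scores positive, so the recipe only trips if the proportion of
high-bit characters exceeds a threshold (with a weight of 10, roughly
one character in ten).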
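
The same counting trick might carry over to the junk tags.  What follows
is an untested sketch, not a known-good recipe: the weight of 100, the
eight-character minimum tag length, and the "junk-tags" mailbox name are
all assumptions picked for illustration.  Every body character scores -1
and every long, whitespace-free tag glued directly onto a letter scores
+100, so the recipe should only fire when such tags turn up more often
than about once per 100 characters of body.

```procmail
# Sketch only: weight, minimum tag length and mailbox name are guesses.
# -1 per body character; +100 per tag of 8 or more letters/digits that
# follows a letter with no whitespace, e.g. "P<k0kaymf1bqxcoz2>".
# The character class is spelled out eight times rather than relying on
# interval syntax.
:0 B
* -1^1 .
* 100^1 [a-z]<[a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9]+>
junk-tags
```

Without the D flag the match is case-insensitive, so "P<k0k...>" and
"p<k0k...>" count alike.  Legitimate HTML such as "text<blockquote>"
would also score, so the weight would likely need tuning against real
mail before trusting it.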

Peace.

-- 
Karsten M. Self <kmself at ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
   Sick of mal-formed websites?  A stylesheet to override poor design:
     http://twiki.iwethey.org/twiki/bin/view/Main/UserContentCSS



