[volunteers] Before I clobber that spam from our archive

Rick Moen rick at linuxmafia.com
Tue Dec 12 23:46:09 PST 2006


Let's look at some aspects of that spam that just hit the list:

> To: <web-team at lists.svlug.org>

Hmm, unlike the usual junk, it wasn't addressed to "webmaster" exactly,
but rather to the equivalent address that is advertised at the bottom of
every one of our Web pages.

> X-Spam-Status: No, score=2.9 required=5.0

Yep.  Therein lies the tragedy.  Despite this being a particularly tough
type of garbage-mail to catch (gibberish text intended to untrain
Bayesian filters), we actually came close to programmatically detecting
it.  Let's see the significant parts of the SpamAssassin score:

> X-Spam-Report: * -3.3 ALL_TRUSTED Did not pass through any untrusted
> hosts

ALL_TRUSTED is, in effect, the SA test that killed us.

> 3.5 BAYES_80 BODY: Bayesian spam probability is 80 to 95%

This part, on the other hand, nearly put us over our 5.0 score
threshold, especially along with

> 0.5 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words

and 

> 1.8 HTML_MESSAGE BODY: HTML included in message
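To put numbers on "nearly": here's a back-of-the-envelope check, sketched in Python.  The rule scores are the ones quoted from the headers above; the variable names are just mine, and I'm assuming SA's usual behaviour of tagging mail once its total reaches the required score.  Note that the quoted rules don't add up to the full 2.9, because I only quoted the significant ones.

```python
# Rule scores quoted from the X-Spam-Report header above.
# Only the significant rules were quoted, so they don't sum to the total.
quoted = {
    "ALL_TRUSTED": -3.3,
    "BAYES_80": 3.5,
    "HTML_IMAGE_ONLY_24": 0.5,
    "HTML_MESSAGE": 1.8,
}
total = 2.9       # score from X-Spam-Status
required = 5.0    # required score from X-Spam-Status

# Whatever the unquoted rules contributed.
unquoted = total - sum(quoted.values())

# The same message with the ALL_TRUSTED bonus taken away.
without_trusted = total - quoted["ALL_TRUSTED"]

print(f"rules not quoted contribute {unquoted:.1f}")
print(f"score without ALL_TRUSTED: {without_trusted:.1f}")
print("would have been tagged as spam:", without_trusted >= required)
```

In other words, without ALL_TRUSTED's -3.3 the message would have scored 6.2, comfortably over 5.0.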

The ALL_TRUSTED rule appears (or so I gather without spending time
reading SA docs) to mean:  mail was received as a single-hop delivery
directly from the original MTA, and that MTA's IP isn't in any DNS
blocklist.  Presumably, that _usually_ correlates strongly with mail
not being junk -- but not this time.

And where did it come from?  Ah, yes, this virus-infected Windows box
on a cable modem in the South:

> Received: from cpe-71-67-163-78.midsouth.res.rr.com
> ([71.67.163.78]:2851)

As I said, these are _really_ difficult to detect programmatically.

Which leaves non-programmatic detection.  OK, boys'n'girls, here's what
we're going to do for a while:  We're going to disable the ability for
non-subscribed addresses to post directly to this list.  Instead,
non-subscriber posts will be held for the listadmin.  Which means that
you won't be bothered by this stuff in the future, but I'll get to spend
additional time dealing with it.  NOTE:  Non-subscribed posts have been
deliberately allowed here until now, in order for SVLUG members and 
members of the public to reach us without being hassled.  I've explained
this many times here, but an offlist query suggests to me that I've
somehow failed to get through.

And yes, we _can_ improve the programmatic detection and rejection of
spam -- but _not until after system rebuild_.  (Nth explanation on that
point, too.)