SpamAssassin is rules based, on a known corpus with rule modifications sent out ...

draebek · on Jan 22, 2011

> SpamAssassin is rules based, tuned on a known corpus, with rule modifications sent out from time to time. To get better detection, you can run something like dspam or crm-114 alongside it; these are statistics based and, after a short training period, can get close to Gmail's accuracy.

FWIW I do train my SA installation (actually amavisd-new) with sa-learn.

> The other benefit of Gmail is shared inoculation.

A very good point. My mail server has very, very few users.

On the other hand, I am using DNSBLs, Razor, and Pyzor through SA, so you might think I'd benefit more from that. (Which is not to say SA isn't blocking a ton of spam--it's just not doing as well as Google: roughly one missed spam every six weeks with Gmail, versus a handful every day with SA.)
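For anyone wondering what a DNSBL check amounts to under the hood, here is a minimal sketch, not what SA itself runs. The client reverses the octets of the connecting IPv4 address, appends the list's zone, and does an ordinary DNS lookup; an answer means "listed". The zone `dnsbl.example` and both function names are made up for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL zone returns an A record for this address.

    Needs network access; any resolution failure is treated as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# 192.0.2.99 is a documentation address; dnsbl.example is a hypothetical zone.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))
```

A real setup would query several zones and weight the answers rather than treating any single hit as final.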

cd34 · on Jan 22, 2011

Greylisting can also help.

I run greylisting -> dspam (TOE, i.e. train-on-error) -> tmda for anything dspam flags, and have had 1 missed spam this week. Looking through the tmda queue, I see no false positives. I have had false positives in the past, but once the sender replies to the challenge (similar to spamarrest), it lets them through. I do run a few DNSBLs, but really haven't seen much need to add more. I'd say 80% of the spam I used to receive was eliminated by greylisting. It's probably closer to 60% now, as spammers are starting to hack actual servers, which will retry, rather than sending mail from botnets.

LeonidasXIV · on Jan 22, 2011

I personally hate greylisting. I use IMAP IDLE to get notified of new mail immediately, so I don't want the MTAs to bounce mail back and forth. Especially since SpamAssassin works so well.

What I do is the following: SpamAssassin rules + Bayes (currently about 200k mails trained, 25% of them spam) + URIBL. I think I could even tune that with Razor and/or Pyzor, but I get too little spam to actually care.

Since then, I've given up hiding my email address. On a long enough timescale, the probability that spammers get your email address is 1, so why bother?
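To make the "Bayes" part concrete, here is a toy naive Bayes scorer. It is only a sketch of the general technique and not SpamAssassin's actual Bayes implementation; the class name `TinyBayes`, the tokenizer, and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy naive Bayes filter: counts how many messages contain each token."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Each distinct lowercase word counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, text, label):
        """Train on one message; label is 'spam' or 'ham'."""
        self.messages[label] += 1
        self.tokens[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """Combine per-token evidence as log-odds, then map back to 0..1."""
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in self.tokenize(text):
            # Add-one smoothing so unseen tokens never produce a zero.
            p_spam = (self.tokens["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.learn("cheap pills buy now", "spam")
bayes.learn("buy cheap watches now", "spam")
bayes.learn("meeting notes attached", "ham")
bayes.learn("lunch at noon tomorrow", "ham")
print(bayes.spam_probability("buy cheap pills"))
print(bayes.spam_probability("meeting at noon"))
```

With only four training messages the numbers mean little; the point is that every trained mail shifts the per-token counts, which is why a large trained corpus matters.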

cd34 · on Jan 23, 2011

Greylisting only delays the first message from a given sender/recipient pair, and after that only once every 31 days if the pair hasn't refreshed the 'seen' timer.

As for the email I receive, 99% is from people I've already corresponded with. Sometimes when I'm on the phone and someone says "I just sent you my contact information", it is a little disconcerting to have to wait 5-10 minutes the first time.
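The behaviour described above can be sketched roughly as follows. This is a simplified illustration and not any particular greylisting daemon: the 5-minute minimum delay is an assumption, keying on the client IP in addition to sender and recipient is an assumption (many implementations do this), and the class and method names are made up. Only the 31-day expiry comes from the comment.

```python
from dataclasses import dataclass, field

DELAY = 5 * 60            # assumed minimum wait before a retry is accepted
EXPIRY = 31 * 24 * 3600   # 31 days without mail resets the pair


@dataclass
class Greylist:
    first_seen: dict = field(default_factory=dict)  # key -> first attempt time
    last_pass: dict = field(default_factory=dict)   # key -> last accepted time

    def check(self, client_ip, sender, recipient, now):
        """Return 'accept' or 'tempfail' (an SMTP 4xx) for one delivery attempt."""
        key = (client_ip, sender.lower(), recipient.lower())

        # Known pair: accept and refresh the 'seen' timer.
        passed = self.last_pass.get(key)
        if passed is not None and now - passed <= EXPIRY:
            self.last_pass[key] = now
            return "accept"

        # New or expired pair: remember the first attempt, accept a late retry.
        first = self.first_seen.setdefault(key, now)
        if now - first >= DELAY:
            self.last_pass[key] = now
            del self.first_seen[key]
            return "accept"
        return "tempfail"


g = Greylist()
args = ("192.0.2.1", "alice@example.org", "bob@example.net")
print(g.check(*args, now=0))      # first attempt
print(g.check(*args, now=600))    # retry after the delay
```

A legitimate MTA retries after the temporary failure and gets through; a fire-and-forget botnet sender usually does not, which is where the savings come from.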

I know spammers have my email address. I don't want to have to waste the time looking through the spam folder any more than I have to. The way it is now, I have a high probability that every email in my inbox is an actionable email, and I'm not stuck shuffling through a junk folder.

But you do have a very good point regarding one of the downsides of greylisting.

pavel_lishin · on Jan 22, 2011

Aren't there alternatives that share the learned rules?

And if not... maybe someone should make one.

The obvious downside is that someone could poison the well. I don't know how to go about fixing that off the top of my head, but I've been drinking. :)

eru · on Jan 22, 2011

How about statistical filtering, to see how well your newly imported rules do?
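One way to read that suggestion: before trusting a shared rule, measure it against your own hand-labeled mail and reject it if it fires on ham. A rough sketch, assuming rules are plain regular expressions; the function names and the thresholds are hypothetical, not taken from any existing tool.

```python
import re


def evaluate_rule(pattern, labeled_messages):
    """Measure how often a rule fires on local spam versus local ham.

    labeled_messages is an iterable of (text, is_spam) pairs.
    """
    rule = re.compile(pattern, re.IGNORECASE)
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for text, is_spam in labeled_messages:
        totals[is_spam] += 1
        if rule.search(text):
            hits[is_spam] += 1
    return {
        "spam_hit_rate": hits[True] / totals[True] if totals[True] else 0.0,
        "ham_hit_rate": hits[False] / totals[False] if totals[False] else 0.0,
    }


def accept_rule(stats, min_spam_rate=0.05, max_ham_rate=0.01):
    """Keep a rule only if it is useful on spam and nearly silent on ham.

    Both thresholds are arbitrary placeholders.
    """
    return stats["spam_hit_rate"] >= min_spam_rate and stats["ham_hit_rate"] <= max_ham_rate


corpus = [
    ("cheap pills, buy now", True),
    ("you have won a prize", True),
    ("meeting moved to 3pm", False),
    ("notes from the meeting", False),
]
print(accept_rule(evaluate_rule(r"cheap pills", corpus)))  # useful rule
print(accept_rule(evaluate_rule(r"meeting", corpus)))      # poisoned rule
```

This only catches poisoned rules that misfire on mail you have already seen and labeled, so it limits the damage rather than solving the trust problem.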