> SpamAssassin is rules-based, working from a known corpus, with rule updates sent out from time to time. For better detection, you can run something like dspam or crm-114 alongside it; those are statistics-based and, after a short training period, can get close to Gmail's accuracy.
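To illustrate what "statistics-based" means here, below is a minimal two-class naive Bayes token filter. It is a generic textbook technique, not dspam's or crm114's actual algorithm (both use more elaborate tokenization and scoring), and the class and method names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word-ish tokens; real filters use much richer tokenizers.
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Toy two-class naive Bayes filter; NOT dspam's or crm114's actual algorithm."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"; counts every token in the message.
        self.token_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        tokens = tokenize(text)
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"]) | set(tokens)
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Laplace-smoothed log prior for the class.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_tokens = sum(self.token_counts[label].values())
            for tok in tokens:
                # Laplace-smoothed log likelihood of each token given the class.
                score += math.log((self.token_counts[label][tok] + 1) / (total_tokens + len(vocab)))
            log_score[label] = score
        # P(spam | text) = 1 / (1 + exp(logP(ham) - logP(spam))), clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

The "short training period" corresponds to calling `train()` on a few hundred of your own messages; the filter then adapts to your mail rather than to a shared rule corpus.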
FWIW I do train my SA installation (actually amavisd-new) with sa-learn.
> The other benefit to gmail is shared inoculation.
A very good point. My mail server has very, very few users.
On the other hand, I am using DNSBLs, Razor, and Pyzor through SA, so you might expect me to get some of that shared benefit anyway. (That's not to say SA isn't blocking a ton of spam. It's just not doing as well as Google: roughly one missed spam every six weeks from Gmail, versus a handful every day from SA.)
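For anyone curious how the DNSBL part works: the client IP's octets are reversed and prepended to the blocklist zone, and that name is looked up as an ordinary A record. An answer (conventionally in 127.0.0.0/8) means "listed"; NXDOMAIN means "not listed". A minimal standard-library sketch, assuming IPv4 only; `zen.spamhaus.org` is just an example zone, and treating every resolution failure as "not listed" is a simplifying assumption. Real setups also inspect the returned code, since some lists signal query errors with special return values.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: IPv4 octets reversed, followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="zen.spamhaus.org"):
    """Return True if the DNSBL answers with any A record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN (or any other resolution failure) is treated as "not listed".
        return False
```

For example, `dnsbl_query_name("192.0.2.1", "zen.spamhaus.org")` builds `1.2.0.192.zen.spamhaus.org`.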
I run greylisting -> dspam (toe) -> tmda for anything dspam flags, and I've had one missed spam this week. Looking through the tmda queue, I see no false positives. I have had false positives in the past, but once the sender replies to the challenge (similar to spamarrest), tmda lets them through. I do run a few DNSBLs, but I really haven't seen much need to add more. I'd say greylisting used to eliminate about 80% of the spam I received; now it's probably closer to 60%, as spammers are starting to hack actual servers that will retry rather than sending mail from botnets.
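The last stage of that pipeline is challenge-response: flagged mail is held, the sender gets a challenge, and a reply releases the message and whitelists the sender. A minimal sketch of that idea in the spirit of TMDA (not TMDA's actual code or config); `ChallengeResponseGate`, `receive`, and `confirm` are hypothetical names, and actually emailing the challenge is left out.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ChallengeResponseGate:
    """Toy challenge-response hold queue in the spirit of TMDA (hypothetical API)."""
    whitelist: set = field(default_factory=set)
    pending: dict = field(default_factory=dict)  # token -> (sender, message)
    inbox: list = field(default_factory=list)

    def receive(self, sender, message, flagged_as_spam):
        """Deliver clean or whitelisted mail; hold flagged mail and return a challenge token."""
        if not flagged_as_spam or sender in self.whitelist:
            self.inbox.append(message)
            return None
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        # A real system would email the sender a challenge containing this token.
        return token

    def confirm(self, token):
        """Sender replied to the challenge: release the held message and whitelist them."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return False
        sender, message = entry
        self.whitelist.add(sender)
        self.inbox.append(message)
        return True
```

This is why a false positive only costs the sender one reply: after `confirm()`, later messages from the same address skip the hold queue even if dspam flags them.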
I personally hate greylisting. I use IMAP IDLE to get notified of new mail immediately, so I don't want MTAs bouncing mail around, especially since SpamAssassin works so well.
What I do is the following: SpamAssassin rules + Bayes (currently about 200k mails trained, 25% of them spam) + URIBL. I could probably tune that further with Razor and/or Pyzor, but I get too little spam to actually care.
Since then, I've given up on hiding my email address. On a long enough timescale, the probability that spammers get your address is 1, so why bother?
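For context on how those pieces combine: SpamAssassin's decision is additive. Each rule that fires (the Bayes rules such as BAYES_99 and URIBL rules such as URIBL_BLACK included) contributes a score, and the message is tagged as spam when the total reaches `required_score`, which defaults to 5.0. A minimal sketch of that logic; the per-rule scores below are made-up placeholders, since real scores vary by SA version and local config.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

# Hypothetical per-rule scores for illustration only; real values come from SA's rule files.
RULE_SCORES = {
    "BAYES_99": 3.5,
    "URIBL_BLACK": 1.7,
    "HTML_MESSAGE": 0.001,
    "DKIM_VALID": -0.1,
}


def score_message(hits):
    """Sum the scores of the rules a message triggered and compare against the threshold."""
    total = sum(RULE_SCORES.get(rule, 0.0) for rule in hits)
    return total, total >= REQUIRED_SCORE
```

The point of the design is that no single signal has to be decisive: a strong Bayes hit plus a URIBL hit can cross the threshold together even when neither would alone.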
Greylisting only affects the first message from a given sender/recipient pair, and after that only if 31 days pass without the pair refreshing its 'seen timer'.
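A minimal sketch of that mechanism, keyed on the sender/recipient pair as described (classic greylisting also includes the connecting client's IP, making it a triplet). The 31-day expiry comes from the comment; the 5-minute minimum retry delay and 24-hour pending expiry are assumptions, and the `Greylist` class is hypothetical. `check()` returns "defer" where a real MTA would send a temporary 4xx SMTP failure.

```python
from dataclasses import dataclass, field

MIN_RETRY_DELAY = 5 * 60        # assumed: retries sooner than this are deferred again
PENDING_EXPIRY = 24 * 3600      # assumed: forget first attempts that never retried
SEEN_EXPIRY = 31 * 24 * 3600    # from the comment: re-greylist pairs idle for 31 days


@dataclass
class Greylist:
    first_attempt: dict = field(default_factory=dict)  # pair -> time of first deferred attempt
    last_seen: dict = field(default_factory=dict)      # pair -> time of last accepted message

    def check(self, sender, recipient, now):
        """Return "accept" or "defer" (SMTP 4xx temporary failure) for this delivery."""
        pair = (sender.lower(), recipient.lower())
        seen = self.last_seen.get(pair)
        if seen is not None:
            if now - seen <= SEEN_EXPIRY:
                self.last_seen[pair] = now  # known pair: accept and refresh the 'seen timer'
                return "accept"
            del self.last_seen[pair]        # idle too long: treat the pair as new again
        first = self.first_attempt.get(pair)
        if first is None or now - first > PENDING_EXPIRY:
            self.first_attempt[pair] = now  # first contact: remember it and defer
            return "defer"
        if now - first < MIN_RETRY_DELAY:
            return "defer"                  # retried too soon
        del self.first_attempt[pair]        # legitimate retry: promote to the seen list
        self.last_seen[pair] = now
        return "accept"
```

The delay a new correspondent experiences is the minimum retry delay plus however long their MTA waits before retrying, which is where waits of several minutes come from.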
As for the email I receive, 99% of it is from people I've corresponded with before. Sometimes, though, when I'm on the phone and they say "I just sent you my contact information," it is a little disconcerting to have to wait 5-10 minutes the first time.
I know spammers have my email address. I don't want to waste time looking through the spam folder any more than I have to. The way it is now, there's a high probability that every email in my inbox is actionable, and I'm not stuck shuffling through a junk folder.
But you do make a very good point about one of the downsides of greylisting.