Hacker News

SpamAssassin is rules-based, built on a known corpus, with rule updates sent out from time to time. For better detection, you can run something like dspam or crm-114 alongside it. Those are statistics-based and, after a short training period, can get close to Gmail's accuracy.
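
To make "statistics-based" concrete, here's a minimal naive Bayes sketch. It's a generic illustration of the idea: learn per-token frequencies from spam and ham you've labeled, then score new mail. It is not how dspam or crm-114 actually work, since both use their own, more elaborate tokenizers and classifiers. `NaiveBayesFilter` and its methods are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word-ish tokens; real filters tokenize far more carefully.
    return re.findall(r"[a-z0-9$!']+", text.lower())


class NaiveBayesFilter:
    """Toy multinomial naive Bayes spam filter (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # label must be "spam" or "ham".
        self.token_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab_size = len(set(self.token_counts["spam"]) | set(self.token_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Log prior from the training mix (add-one smoothed).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            label_tokens = sum(self.token_counts[label].values())
            for tok in tokenize(text):
                # Laplace smoothing: an unseen token can't zero out the product.
                count = self.token_counts[label][tok]
                score += math.log((count + 1) / (label_tokens + vocab_size + 1))
            log_scores[label] = score
        # Normalize the two log scores into P(spam | text), avoiding overflow.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)
```

Training is just feeding it labeled messages, which is the "short training period" part; the smoothing keeps a single never-seen word from wiping out either score.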

The other benefit of Gmail is shared inoculation. By the time a spam message reaches your mailbox, a thousand other users may have reported it, so it is already marked as spam in your inbox. With a small personal mail server, or even a company-wide deployment, you may not have enough users to get much benefit from shared inoculation.
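
Shared inoculation can be pictured as a shared table of message fingerprints. This is a hypothetical sketch, assuming a central store that every user's reports go to and a threshold of three distinct reporters. Digest-based networks such as Razor and Pyzor work along broadly similar lines, but their actual digest algorithms and protocols are different; `fingerprint` and `SharedReports` are made-up names.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(body: str) -> str:
    # Normalize case and whitespace so trivial variations hash identically.
    normalized = re.sub(r"\s+", " ", body.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SharedReports:
    """Hypothetical shared store of spam reports keyed by message fingerprint."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.reporters: dict[str, set[str]] = defaultdict(set)

    def report(self, user_id: str, body: str) -> None:
        # A set means one user reporting the same message repeatedly counts once.
        self.reporters[fingerprint(body)].add(user_id)

    def is_known_spam(self, body: str) -> bool:
        # .get() avoids creating empty entries for messages nobody reported.
        return len(self.reporters.get(fingerprint(body), set())) >= self.threshold
```

The catch for a small server shows up in `threshold`: with only a handful of users, few messages ever collect enough independent reports to trip it.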



> SpamAssassin is rules-based, built on a known corpus, with rule updates sent out from time to time. For better detection, you can run something like dspam or crm-114 alongside it. Those are statistics-based and, after a short training period, can get close to Gmail's accuracy.

FWIW I do train my SA installation (actually amavisd-new) with sa-learn.

> The other benefit of Gmail is shared inoculation.

A very good point. My mail server has very, very few users.

On the other hand, I am using DNSBLs, Razor, and Pyzor through SA, all of which rely on shared data, so you might think I'd get some of that shared benefit. (Which is not to say SA isn't blocking a ton of spam; it's just not doing as well as Google. With Gmail, one spam slips through every six weeks or so; with SA, a handful get through every day.)
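
For reference, a DNSBL check is an ordinary DNS lookup: reverse the octets of the connecting IPv4 address, append the blocklist's zone, and see whether the resulting name resolves. A minimal sketch, where `dnsbl.example.org` is a placeholder zone rather than a real list and the helper names are hypothetical:

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    # 192.0.2.1 checked against "dnsbl.example.org" becomes 1.2.0.192.dnsbl.example.org.
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    # Any A record means "listed"; a failed lookup (e.g. NXDOMAIN) means "not listed".
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

Listed addresses conventionally resolve to something in 127.0.0.x, and some lists use the specific value to say why an address is listed, so a real check would inspect that value and also cope with resolver timeouts.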


Greylisting can also help.

I run greylisting -> dspam (in TOE, train-on-error, mode) -> TMDA for anything dspam flags, and have had one missed spam this week. Looking through the TMDA queue, I see no false positives. I have had false positives in the past, but once the sender replies to the challenge (similar to Spam Arrest), TMDA lets them through. I do run a few DNSBLs, but really haven't seen much need to add more. I'd say greylisting eliminated 80% of the spam I used to receive. It's probably closer to 60% now, as spammers are starting to use compromised real mail servers, which will retry, rather than sending mail directly from botnets.
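
The TMDA step in that chain is challenge-response: mail the filter flags is held until the sender answers a challenge, after which they're whitelisted and their held mail is released. A toy sketch of just that queueing logic, assuming an in-memory whitelist; `ChallengeQueue` is a hypothetical name, and real TMDA also has to generate, send, and verify the challenge messages themselves.

```python
from dataclasses import dataclass, field


@dataclass
class ChallengeQueue:
    """Hypothetical TMDA-style hold queue for messages the spam filter flagged."""

    whitelist: set[str] = field(default_factory=set)
    held: dict[str, list[str]] = field(default_factory=dict)

    def deliver_or_hold(self, sender: str, message: str, flagged: bool) -> str:
        # Unflagged mail, and anything from an already-confirmed sender, goes straight through.
        if not flagged or sender in self.whitelist:
            return "deliver"
        # Otherwise hold it; a real system would send a challenge to the sender here.
        self.held.setdefault(sender, []).append(message)
        return "held"

    def confirm(self, sender: str) -> list[str]:
        # The sender answered the challenge: whitelist them and release their held mail.
        self.whitelist.add(sender)
        return self.held.pop(sender, [])
```

This is why a false positive costs only a delay here: the first flagged message waits in the queue, and everything from that sender afterwards skips it.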


I personally hate greylisting. I use IMAP IDLE to get notified of new mail immediately, so I don't want legitimate mail held up while sending MTAs wait to retry. Especially since SpamAssassin works so well.

What I do is the following: SpamAssassin rules + Bayes (currently about 200k mails trained, 25% of them spam) + URIBL. I could probably tune that further with Razor and/or Pyzor, but I get too little spam to actually care.

Since then, I've given up hiding my email address. On a long enough timescale, the probability that spammers get your address is 1, so why bother?


Greylisting only affects the first message a given sender/recipient pair sends you, and after that only if 31 days pass without the pair refreshing its 'seen' timer.
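
A minimal greylisting sketch, assuming an in-memory table and a 5-minute minimum retry delay (real implementations make both configurable and persist the table). It keys on the common (client IP, sender, recipient) triplet; to match the sender/recipient pair described here, drop `ip` from the key. The 31-day expiry follows the comment; `Greylist` and `check` are hypothetical names.

```python
import time
from typing import Optional

DAY = 24 * 60 * 60
Triplet = tuple[str, str, str]


class Greylist:
    """In-memory greylisting sketch keyed on (client IP, sender, recipient)."""

    def __init__(self, retry_delay: float = 300, expiry: float = 31 * DAY) -> None:
        self.retry_delay = retry_delay  # seconds a new triplet must wait before a retry is accepted
        self.expiry = expiry            # forget triplets not seen for this long
        self.first_seen: dict[Triplet, float] = {}
        self.last_seen: dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        key = (ip, sender, recipient)
        last = self.last_seen.get(key)
        if last is None or now - last > self.expiry:
            # New or expired triplet: remember it and tell the MTA to retry later.
            self.first_seen[key] = now
            self.last_seen[key] = now
            return "450 try again later"
        if now - self.first_seen[key] < self.retry_delay:
            # Retried too soon: keep deferring without resetting the clock.
            return "450 try again later"
        # Passed: refresh the 'seen' timer so regular correspondents never wait again.
        self.last_seen[key] = now
        return "250 accept"
```

The first attempt gets a temporary 450, which a legitimate MTA retries on its own queue schedule. That retry interval, more than the greylist delay itself, is usually why a first message from a new contact takes 5-10 minutes to show up.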

As for the email I receive, 99% of it is from people I've corresponded with before. Sometimes, when I'm on the phone with someone and they say "I just sent you my contact information", it is a little disconcerting to have to wait 5-10 minutes for that first message.

I know spammers have my email address. I don't want to waste time looking through a spam folder any more than I have to. The way it is now, there's a high probability that every email in my inbox is actionable, and I'm not stuck shuffling through a junk folder.

But you do make a very good point about one of the downsides of greylisting.


Aren't there alternatives that share the learned rules?

And if not... maybe someone should make one.

The obvious downside is that someone could poison the well. I don't know how to go about fixing that off the top of my head, but I've been drinking. :)
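
One way to blunt well-poisoning is to stop counting reports equally and weight them by how much you trust the reporter, so a pile of fresh throwaway accounts can't push a message over the line on their own. A hypothetical sketch, assuming the shared service assigns each reporter a trust value between 0 and 1 and identifies messages by some digest (for example a hash of the normalized body); how a reporter earns trust is the hard part and is left out here.

```python
from collections import defaultdict


class WeightedReports:
    """Hypothetical shared-report store where each vote is weighted by reporter trust."""

    def __init__(self, threshold: float = 2.0, new_user_trust: float = 0.1) -> None:
        self.threshold = threshold
        self.new_user_trust = new_user_trust  # weight for reporters with no track record
        self.trust: dict[str, float] = {}
        # digest -> {user: +1 for "spam", -1 for "not spam"}
        self.votes: dict[str, dict[str, int]] = defaultdict(dict)

    def set_trust(self, user: str, value: float) -> None:
        self.trust[user] = max(0.0, min(1.0, value))

    def vote(self, user: str, digest: str, is_spam: bool) -> None:
        # A later vote from the same user replaces the earlier one, so votes can't be stacked.
        self.votes[digest][user] = 1 if is_spam else -1

    def score(self, digest: str) -> float:
        return sum(self.trust.get(user, self.new_user_trust) * v
                   for user, v in self.votes.get(digest, {}).items())

    def is_spam(self, digest: str) -> bool:
        return self.score(digest) >= self.threshold
```

With the defaults, ten brand-new accounts voting together only reach a score of about 1.0, half the threshold, while two fully trusted reporters are enough on their own. This raises the cost of an attack; it doesn't stop someone patient enough to build up trusted accounts.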


How about statistical filtering, to see how well your newly imported rules perform?
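
Concretely, that could mean running each imported rule against a corpus of your own mail that you've already labeled, and checking how often it fires on spam versus ham before trusting it. A minimal sketch, where a rule is just any Python function from message text to bool and `evaluate_rule`/`RuleStats` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RuleStats:
    spam_hits: int
    ham_hits: int
    spam_total: int
    ham_total: int

    @property
    def spam_rate(self) -> float:
        # Fraction of known spam the rule catches.
        return self.spam_hits / self.spam_total if self.spam_total else 0.0

    @property
    def ham_rate(self) -> float:
        # Fraction of legitimate mail the rule hits: its false-positive rate.
        return self.ham_hits / self.ham_total if self.ham_total else 0.0

    @property
    def spam_share_of_hits(self) -> float:
        # Of all messages the rule fired on, how many were spam (1.0 = never hit ham).
        hits = self.spam_hits + self.ham_hits
        return self.spam_hits / hits if hits else 0.0


def evaluate_rule(rule: Callable[[str], bool], corpus: list[tuple[str, bool]]) -> RuleStats:
    """Run one rule over a hand-labeled corpus of (message, is_spam) pairs."""
    spam_hits = ham_hits = spam_total = ham_total = 0
    for message, is_spam in corpus:
        hit = bool(rule(message))
        if is_spam:
            spam_total += 1
            spam_hits += hit
        else:
            ham_total += 1
            ham_hits += hit
    return RuleStats(spam_hits, ham_hits, spam_total, ham_total)
```

A rule with a nonzero `ham_rate` is the one to worry about: even a small false-positive rate adds up over a large volume of legitimate mail.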



