I’ve been meaning to do something about my spam situation, which rather sucks at the moment. Now that my email address appears occasionally on the Linux kernel mailing list and elsewhere, I’ve, not surprisingly, got a signal-to-noise ratio of something like 1:200. I’ve had SpamAssassin in the loop for a while, but its Bayesian analysis always invokes the OOM killer on my memory-poor firewall, so it’s not running at full effectiveness. Thus I took a little time out to add a whitelist to the front end: if you’re in the whitelist, your mail skips all the filters. Meanwhile SA will get cranked up to consider practically everything spam, and I’ll be more likely to insert one-off regexps that trash the spams I get over and over again.
To make the whitelist, I went to my existing mail folders (in mbox format, of course). This was easy enough:
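```sh
#! /bin/sh
# Collect From:/To: addresses from every mbox folder except junk (and the whitelist files)
for i in *; do
    case $i in *junk*|whitelist|whitelist.uns) continue ;; esac
    [ -f "$i" ] || continue
    # Split each mbox into messages; keep only the address inside <...> when there is one
    formail -s formail -zx "From: " -zx "To: " < "$i" |
        sed -e 's/^.*<\(.*\)>.*$/\1/' >> whitelist.uns
done
sort -u whitelist.uns > whitelist
rm whitelist.uns
```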
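The heart of the script is the sed substitution that strips a header value down to the bare address. A quick way to sanity-check it is to feed it a few header values by hand. The sketch below does that with made-up sample addresses and a hypothetical `extract_addr` helper, assuming a POSIX sed with basic regular expressions (where the capture group is written `\(...\)` and referenced as `\1`).

```shell
#!/bin/sh
# Sketch: exercise the address-extraction sed on sample header values
# instead of real mbox files. extract_addr and the addresses are hypothetical.

extract_addr() {
    # If the value contains "<...>", keep only what is inside the angle
    # brackets; a bare address passes through unchanged.
    sed -e 's/^.*<\(.*\)>.*$/\1/'
}

printf '%s\n' \
    'Jane Doe <jane@example.org>' \
    'bob@example.com' \
    '"Smith, Al" <al@example.net> (work)' |
    extract_addr |
    sort -u
```

Because `.*` is greedy, a single header line listing several bracketed recipients keeps only the last one, so multi-recipient `To:` lines contribute just one entry to the whitelist.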
Working out the proper procmail recipe to use it took a while, but I eventually came up with this:
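```
MAILDIR=Mail
FGREP=/bin/fgrep

# Filter only non-list mail whose sender is NOT whitelisted:
# fgrep -v exits 0 only when the From: line matches no whitelist entry.
:0:
* !^List-Id:.*
* ? formail -zx "From:" | ${FGREP} -F -v -i -w -f whitelist
{
    # Let SpamAssassin tag anything under 100000 bytes
    :0fw
    * <100000
    | spamassassin

    :0:
    * ^X-Spam-Status: Yes
    junk

    # One-off regexps for spams that keep coming back
    :0:
    * ^Subject:.*Corel Draw.*
    junk

    :0:
    * ^Subject: *[^ ]* new(s)?$
    junk

    :0D:
    * ^Subject:.*(O|0)EM *
    junk

    :0D:
    * ^Subject:.*[A-Z]+[a-z][a-z][a-z][A-Z]+
    junk
}

# Mail that bypassed the block above (whitelisted senders, and list mail) gets a marker header
:0Ecf
| formail -A"X-Whitelist-Passed: OK"

# kill html email
:0
* ^Content-type: text/html
{
    :0bfw
    | (echo "[html stripped]"; lynx -dump -force_html -stdin)

    :0ahfw
    | formail -i"Content-type: text/plain"
}
```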
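The whitelist test in the recipe leans on an inverted exit status: with `-v`, fgrep succeeds only when it prints a line, that is, when the From: line matches no whitelist entry, so the filter block runs for strangers and is skipped for known senders. The shell sketch below reproduces that check outside procmail so addresses can be tried against a whitelist before trusting it with real mail. The temporary whitelist, the `needs_filtering` helper, and the sample addresses are hypothetical; it uses `grep -F` (the modern spelling of `fgrep`) plus `-q` to discard the output.

```shell
#!/bin/sh
# Sketch: emulate the procmail "?" condition to see which senders would
# go through the spam filters. Whitelist contents and addresses are made up.

WHITELIST=$(mktemp) || exit 1
trap 'rm -f "$WHITELIST"' EXIT
printf '%s\n' 'jane@example.org' 'bob@example.com' > "$WHITELIST"

# Exit 0 ("run the filters") when the From: value matches no whitelist
# entry; exit 1 ("skip the filters") when it does. With -v, grep succeeds
# only if it selects (here: would print) at least one non-matching line.
needs_filtering() {
    printf '%s\n' "$1" | grep -F -v -i -w -q -f "$WHITELIST"
}

for from in 'Jane Doe <JANE@example.org>' 'Spammer <deals@spam.example>'; do
    if needs_filtering "$from"; then
        echo "filter: $from"
    else
        echo "whitelisted: $from"
    fi
done
```

The `-w` flag keeps an entry like `bob@example.com` from whitelisting `notbob@example.com`, and `-i` makes the comparison case-insensitive, matching the flags in the recipe.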