The epic spam battle: from SpamAssassin (10+ year user) to rspamd

For many system administrators who run public-facing mail servers, it is an ongoing battle: spam. Since there is money to be made, it will never go away, but we can try to mitigate it.

Introduction on my usage of anti-spam products:

For many moons I have used SpamAssassin in various forms: simply as a client that checks every email on delivery, as a daemon where multiple servers query one instance, and as part of MailScanner, where a single (replicated) database was responsible for storing all the bits and pieces, combined with additional local rules. This worked fine for years, but our external MX servers are not the most powerful machines in the world, so we need to be selective about what we load on them. The ever-increasing spam battle means that the demand for memory and processing power grows faster than the systems can keep up with. More rules, more anti-virus, more regular expressions, more downloading, parsing and re2c'ing of files: it all gets harder on the systems every time the number of rules increases.

As I already mentioned, this worked fine for years. I switched to MailScanner for our MXes not too long ago, and I am happy with it, except that it puts additional load on the machines and only judges mail once it is already in. I contributed to MailScanner, and specifically to the MailWatch project, for LDAP authentication and a few other things where I saw room for improvement. Even though I like the system very much, it is not how I want to keep spam from coming in. It might be a good fit for you though: it offers a quarantine where users can selectively release emails or mark them as spam, and it can send reports listing the number of potentially missed emails with a link to them. Some of our users were happy with that as well, and so was I.

Limitations of our handling of email:

But resources were becoming a problem. Yes, I can of course upgrade my external MXes and load them with more memory and CPU power, but that costs money. Money is hard earned in the hosting world, because there are plenty of providers to choose from. Even if we offer the best prices around, it still takes multiple additional customers to justify the higher bills (and that is not taking into account that some profit would be nice for additional investments in the company, so that our users can get even better products).

So, given the saturated market, I was not going to spend additional money on our machines just yet. I also wanted to keep spam from coming into the machine in the first place: reject it at the border where possible, so I do not have to deal with it afterwards. (See it as border patrol: it is easier to prevent things from coming in than to handle them once they are in.) I noticed that several email servers were already doing that when we forward mail for our domains to, let's say, Gmail or other providers that people are happy to use. Those servers either rate limit you or simply refuse the emails before you are able to send them, leaving you with the problem instead of the Gmail user. Magnificent.

But how does that work? For Postfix, which I use, it means using a milter, in this case rmilter, which hooks into Postfix at the SMTP level. The message is checked against stored signatures, its content is scanned, and Bayes and a neural network are used to verify whether it is OK or not. The message is then either rejected before it is accepted, greylisted when it seems spammy, or given an extra header and forwarded to its final destination. If we are the final destination, the header is taken into account and the message is automatically put in the Spam folder; for Gmail/Hotmail users this is the ‘unwanted email’ folder, or whatever it is called nowadays. I have also put filters in place that learn from your behaviour: if a message lands in the Spam folder but is not spam and you move it back to, for example, the INBOX, the system learns that it should not have marked it as spam and tries to do better next time.
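
The reject / greylist / add-header decision comes down to comparing a message score against a few thresholds. Here is a minimal sketch of that idea in shell. The function name action_for_score is made up for this example, and the thresholds (15, 6 and 4) are illustrative assumptions, not values taken from this setup; check your own configuration for the real ones.

```sh
#!/bin/sh
# Hypothetical helper: map a spam score to the action taken at SMTP time.
# The thresholds below are illustrative assumptions; tune them for your setup.
action_for_score() {
    awk -v s="$1" 'BEGIN {
        s = s + 0                       # force a numeric comparison
        if (s >= 15)     print "reject"
        else if (s >= 6) print "add header"
        else if (s >= 4) print "greylist"
        else             print "no action"
    }'
}
```

With these assumed thresholds, action_for_score 7.3 prints "add header": the message is delivered, but with a header that the final destination can use to file it under Spam.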

The product: rspamd

But what product delivers that? After talking with a member of the FreeBSD postmaster team, I found out about rspamd, and that its author is a fellow FreeBSD committer as well. I implemented it (it took some time to get up the learning curve, but essentially it is rather easy, try it!). It causes less load than the various SpamAssassin products and the additional applications that support them (like MailScanner and MailWatch), and it does not need a web server of its own. It reduced my memory footprint by around 400 MB, continuously. That is a whole lot if you would rather have megabytes to spare than hand them out.

How does it work, globally?

I configured rspamd to behave as follows:

  • Both our external MXes have a local Bayes classifier and various other local databases. I used the suggested three-database setup on each machine, and I set up stunnel on both machines so that each one can reach the other's Redis databases through the tunnel. I changed all configuration options from servers = "localhost"; to servers = "localhost,localhost:26379"; and applied that to every Redis line I could find. I then restarted rspamd on both machines and noticed that there is a lot going on: it seems that everything is written and read on both machines. In the web interface you will sometimes get errors (I am not sure why), and the history is not always consistent, but it is for management purposes only, so that is not very problematic in this case. Both MXes check on their localhost and “also_check” the remote machine over an internal private network that I have set up.
  • Our internal machines that handle the delivery of email use both MXes as rspamd instances, as configured in rmilter. They do not scan anything themselves, except for virus scanning (which is done on the MXes too, and on the local machine only for email that did not come in through the MXes, such as outgoing email). That means less overhead for those machines, and the work is done only on the two machines where we know it works. I also changed these machines to use Redis on the MXes instead of locally and listed both in the configuration, again using stunnel. rmilter uses the Redis databases to store the messages that we have sent, the replies we get, and such. In the future, if rspamd is capable of handling this by itself, rmilter will be taken out and only rspamd will run.
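
To check that both Redis endpoints from the setup above actually answer, something like the following can help. It is only a sketch. It assumes that the local Redis listens on the default port 6379, that the stunnel endpoint to the other MX is localhost:26379 as in the servers line above, and that redis-cli is installed. The function name check_redis_endpoints and the REDIS_CLI override are made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: ping each Redis endpoint and report which ones answer.
# REDIS_CLI can be overridden (for example in tests); it defaults to redis-cli.
REDIS_CLI=${REDIS_CLI:-redis-cli}

check_redis_endpoints() {
    rc=0
    for endpoint in "$@"; do
        host=${endpoint%%:*}
        port=${endpoint##*:}
        # An endpoint without ":port" falls back to the Redis default port.
        [ "$port" = "$endpoint" ] && port=6379
        if [ "$("$REDIS_CLI" -h "$host" -p "$port" ping 2>/dev/null)" = "PONG" ]; then
            echo "$endpoint: ok"
        else
            echo "$endpoint: FAILED"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling check_redis_endpoints localhost localhost:26379 on an MX prints one line per endpoint and returns non-zero if any of them did not answer, which makes it easy to hook into monitoring.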

Learning spam/ham messages:

For now this seems to work very well. I have implemented a Dovecot script that triggers when someone moves a message from the Spam folder to the inbox (‘learn-ham.sh’) and from the inbox or other mailboxes to the Spam folder (‘learn-spam.sh’).

The contents of the files look like the following, with learn_spam and learn_ham in the appropriate places of course.

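#!/bin/sh

# Read the message from stdin once, so it can be sent to both MXes.
data=$(cat)

# printf is used instead of echo so backslashes in the message are left alone.
printf '%s\n' "$data" | /usr/local/bin/rspamc -h MX1 -P <secret password for MX1> learn_spam

printf '%s\n' "$data" | /usr/local/bin/rspamc -h MX2 -P <secret password for MX2> learn_spam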

Of course it takes additional understanding of how email works, how your environment works, and what is acceptable or not. Over the course of just a few days we processed more than 10k emails (yes, there are many providers handling more email; everyone has their own perks ;-)), and more than 60 emails were learned in just one day after enabling users to do their own training.

One note:

A little note about rejecting spam: we only reject a message when it is really spammy and cannot easily be anything else. Most emails that I have seen so far are forwarded with an additional header instead of being rejected, and the emails that are rejected really are spam. Users will never see them, which is good enough for my environment but might be different for yours. Please do a dry run first to see how it fits your environment.
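
For such a dry run, one option is to feed a folder of saved messages to rspamc and count which action each message would get. The sketch below assumes that rspamc prints one line of the form "Action: <name>" per scanned message (check the output of your version) and that the samples sit in a hypothetical directory called samples. The function name tally_actions is made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: count the "Action:" lines in rspamc output on stdin
# and print the totals, most frequent action first.
tally_actions() {
    awk -F': ' '/^Action: / { count[$2]++ }
        END { for (a in count) printf "%d %s\n", count[a], a }' | sort -rn
}

# Example (not run automatically): scan every saved message against MX1.
# rspamc -h MX1 samples/*.eml | tally_actions
```

If the tally shows more rejects than you are comfortable with, raise the reject threshold before switching rejection on for real.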

References:

The script for learning spam under Dovecot comes from a comment by user Alex at: https://kaworu.ch/blog/2014/03/25/dovecot-antispam-with-rspamd/#comment-2436333602

The documentation I used for rspamd comes from http://www.rspamd.com itself.

The sieve filters that I use for Dovecot come from Dovecot itself: https://wiki2.dovecot.org/HowTo/AntispamWithSieve

Custom blacklisting of domains and such comes from: https://gist.github.com/kvaps/25507a87dc287e6a620e1eec2d60ebc1