Combatting SPAM in ISP Mail Servers

Fighting SPAM is not easy. Here’s an example of how we configured an ISP mail server cluster to reduce the SPAM load…

Introduction

ISPs must provide a mail relay service to their end users. However, mail relay isn’t a revenue generating service and ISPs can’t allocate unlimited staff resources to these services. The problem administrators face is trying to find a balance between unobstructively allowing their clients to send legitimate mail but to stop them from spamming.

The main source of SPAM from clients (SMTP clients in the ISPs IP space) is typically computers infected with bots / malware behind the users router. It may be their own or even a neighbor taking advantage of an unsecured wireless network. There is a smaller secondary source als0 – spammers who have someone gotten hold of a customers SMTP authentication details and are logging in from outside of the ISPs IP space to relay mail through.

Now, the problem this causes is that when the ISP mail servers pass on this SPAM to its destination such as a Yahoo! or GMail recipient, these companies examine the ratio of SPAM to HAM coming from the ISPs servers. When this ratio goes the wrong way, they legitimately block the mail servers meaning that customers of the ISP cannot get their mail delivered to these destinations. This is a big problem.

In this article, I’ll describe a new mail cluster set-up for an ISP to try and ensure their SPAM:HAM ratio stays on the right side.

Solution  Overview

The mail cluster comprises a number of SMTP servers all configured identically (in fact all booting from the same image using PXE and an NFS root file system with dedicated local /var partitions). They are running Debian Stable (Lenny at the time of writing) and the mail server is the excellent Postfix.

Postfix is configured to accept authentication using SASL with Dovecot (configured only for authentication, no POP3 or IMAP services) over TLS on port 25 or SSL on port 465. Users on the ISP’s address space do not need to authenticate to relay mail. This is actually a legacy configuration and quite common among ISPs. If we were to go back many years starting all over again, I’d certainly try and evaluate the merits of only allowing relay after authentication irrespective of your ISP.

Mail server server selection is made via DNS round robin but this could easily be changed to a load balancer. As part of this retrofit, we took the opportunity to IPv6 enable all the servers.

We now need to look at securing these boxes to ensure the level of SPAM is controller. There are three points at which we examine incoming mail:

  1. before it hits the SMTP daemon at all;
  2. after the SMTP headers (MAIL FROM, RCPT TO) have been passed to the daemon but before the mail has been accepted (i.e. before the SMTP DATA command);
  3. after the daemon has accepted the mail.

We’ll examine those in each of the sections below.

Rate Limiting Connections

In order to limit repeated connections to the mail server – such as from a SPAM bot that creates a new SMTP session for every mail (rather than sending many mails in one session), we use iptables recent match module.

This module keeps count of the number of connections in a given time period allow us to take alternative action if the number is above a threshold. In our case, we just drop additional connections until the number of connections drops below that threshold. For example:

-A INPUT -p tcp -m state --state NEW -m multiport
        --dports 25,465 -j SMTP_IN

-A SMTP_IN -p tcp --syn -m recent --name SMTP_RATE_LIMIT --set
-A SMTP_IN -p tcp --syn -m recent --name SMTP_RATE_LIMIT --update
        --seconds 60 ! --hitcount 10 -j SMTP_IN_LIMIT
-A SMTP_IN -j ACCEPT

-A SMTP_IN_LIMIT -j LOG --log-prefix "SMTP_IN: LIMIT EXCEEDED:: "
-A SMTP_IN_LIMIT -j DROP

We have some additional rules where, for example, we have whitelisted core infrastructure IP addresses and also mail from non ISP IP addresses (the purpose here is to stop SPAM bots from within the network).

Checks Before Accepting the Mail

Once the client connects to Postfix we check a number of other items.

We of course use some of Postfix’s own policy checks (such as reject_non_fqdn_sender, reject_non_fqdn_recipient, etc) but that’s pretty par for the course and not discussed below.

The ISP uses a commercial subscription to the Spamhaus data feed service. This is a very reliable database of IP addresses that are known to be sending SPAM (and other classifications). This is the first thing we check – customer or not. If you’re on this list, we do not accept mail from you.

If you’re not coming from the ISP address space, we then initiate gray listing using a Debian package called postfix-gld (see this site) which uses a MySQL database as its storage engine. See greylisting.org for more details on this useful anti-SPAM measure.

Now, if you’ve gotten this far, we use a package called postfix-policyd (see this site) to initiate rate limiting. Clients are limited in an hourly window on how many mails they can send and on how many recipients in total these mails can reach. The exact numbers are a moving goal post for now while we tweak it.

We could of course have used policyd for gray listing also but this posed a significant problem. Our Postfix configuration looks like this:

smtpd_recipient_restrictions =
    ...
    check_policy_service inet:127.0.0.1:10031,
    permit_mynetworks,
    permit_sasl_authenticated,
    check_policy_service inet:127.0.0.1:2525
    ...

As you can see, the issue is that we do not want to gray list our own IP range or people with a valid username and password but we do want to rate limit these. Using policyd for both is not possible unfortunately.

At this point, your mail will be accepted for relay. We not start the next batch of checks.

Content Filtering

We now need to examine these mails to discard SPAM and mails infected with a virus. For this we use the excellent amavisd-new package with SpamAssassin and ClamAV installed on the servers.

We’re still tweaking the rules for best effect but as it stands, if a virus is detected in the mail it’s silently discarded. A mail with a reasonable SPAM score is marked as SPAM in the subject but mails with a high score are also just discarded. DSNs are not sent.

If a mail passes all these checks, it’ll be relayed onwards.

Fighting SPAM is far from easy but the above will make a big dent. Unfortunately we can’t stop just there. We also have scripts to analyse the logs and rate limiting database to pull out the bad guys so the ISP’s tech support team can chase these up retroactively and instruct them on protecting their machines.

Testing SPAM and Virus Filters

I’ve recently performed a complete upgrade of Open Solutions’ mail servers and I’ve now moved onto doing likewise for one of our ISP customers with a lot of users.

These retrofits include installing virus and SPAM filters to protect both ourselves and the ISP customers but also to stop customers who have infected computers from spewing these emails out.

When everything’s up and appears to be working, I like to test both filtering systems to ensure they’re working. Quoting from eicar:

Using real viruses for testing in the real world is rather like setting fire to the dustbin in your office to see whether the smoke detector is working. Such a test will give meaningful results, but with unappealing, unacceptable risks.

Fortunately, test files exist for virus checkers and SpamAssassin:

  • The EICAR standard anti-virus test file can be found here.
  • SpamAssassin created the GTUBE (Generic Test for Unsolicited Bulk Email) for the same purpose and this can be found here.

Asterisk SIP Brute Force Attacks on the Rise

See my article on the company blog for a discussion on this, and a how to on using Fail2ban to help stop these attacks.

SIP Brute Force Attacks on the Increase

On our own Asterisk PBX server for our office and on some customer boxes with open SIP ports, we have seen a dramatic rise in brute force SIP attacks.

They all follow a very common pattern – just over 41,000 login attempts on common extensions such as 200, 201, 202, etc. We were even asked to provide some consultancy about two weeks ago for a company using an Asterisk PBX who saw strange (irregular) calls to African countries.

They were one example of such a brute force attack succeeding because of a common mistake:

  1. an open SIP port with:
  2. common extensions with:
  3. a bad password.

(1) is often unavoidable. (2) can be mitigated by not using the predictable three of four digit extension as the username. (3) is inexcusable. We’ve even seen entries such as extension 201, username 201, password 201. The password should always be a random string mixing alphanumeric characters. A good recipe for generating these passwords is to use openssl as follows:

openssl rand -base64 12

Users should never be allowed to choose their own and dictionary words should not be chosen. The brute force attack tried >41k common passwords.

Preventing or Mitigating These Attacks

You can mitigate against these attacks by putting external SIP users into dedicated contexts which limit the kinds of calls they can make (internal only, local and national, etc); ask for a PIN for international calls; limit time and cost; etc.

However, the above might be a lot of work when simply blocking users after a number of failed attempts can be much easier and more effective. Fail2ban is a tool which can scan log files like /var/log/asterisk/full and firewall IP addresses that makes too many failed authentication attempts.

See VoIP-Info.org for generic instructions or below for a quick recipe to get it running on Debian Lenny.

Quick Install for Fail2ban with Asterisk SIP on Debian Lenny

  1. apt-get install fail2ban
  2. Create a file called /etc/fail2ban/filter.d/asterisk.conf with the following (thanks to this page):
  3. Put the following in /etc/fail2ban/jail.local:
  4. Edit /etc/asterisk/logger.conf such that the date format under [general]reads:
    dateformat=%F %T
  5. Also in /etc/asterisk/logger.conf, ensure full logging is enable with a line such as the following under [logfiles]:
    full => notice,warning,error,debug,verbose
  6. Resart / reload Asterisk
  7. Start Fail2ban: /etc/init.d/fail2ban startand check:
    • Fail2ban’s log at /etc/log/fail2ban.logfor:
      INFO   Jail 'asterisk-iptables' started
    • The output of iptables --list -vfor something like:
      Chain fail2ban-ASTERISK (1 references)
       pkts bytes target     prot opt in     out     source               destination
        227 28683 RETURN     all  --  any    any     anywhere             anywhere

You should now receive emails (assuming you replaced the example addresses above with your own and your MTA is correctly configured) when Fail2ban starts or stops and when a host is blocked.

HTTP Streaming with Encryption under Linux

For a customer of ours, we need to mass encode thousands of video files and also segment and encrypt them for use with Apple’s HTTP Streaming.

For a customer of ours, we need to mass encode thousands of video files and also segment and encrypt them for use with Apple’s HTTP Streaming. (using Amazon EC2 instances for the leg work).

On his blog, Carson McDonald, has put together a good over view of how HTTP Streaming can work under Linux a long with a segmenter.

The one piece of the jigsaw we were missing was encryption and after some work ourselves and with the help of a stackoverflow question, we have a working sequence of commands to successfully and compatibly encrypt segments for playback on Safari and other supported HTTP streaming clients:

  1. Create a key file:
    openssl rand 16 > static.key
  2. Convert the key into hex:
    key_as_hex=$(cat static.key | hexdump -e '16/1 "%02x"')
  3. At this point, let’s assume we have segmented a file of 30 seconds called video_low.ts into ten 3 second segments called video_low_X.ts where X is an integer from 1 to 10. We can then encrypt these as follows:
    for i in {0..9}; do
        init_vector=`printf '%032x' $i`
        openssl aes-128-cbc -e -in video_low_$(($i+1)).ts     
            -out video_low_enc_$(($i+1)).ts -p -nosalt        
            -iv $init_vector -K $key_as_hex
    done
    

With a matching m3u8 file such as the following, the above worked fine:

#EXTM3U
#EXT-X-TARGETDURATION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:3,
#EXT-X-KEY:METHOD=AES-128,URI="http://www.example.com/static.key"
http://www.example.com/video_low_enc_1.ts
#EXTINF:3,
http://www.example.com/video_low_enc_2.ts
...
#EXT-X-ENDLIST

What caught us out was the initialisation vector with is described in the draft IETF document as follows:

128-bit AES requires the same 16-octet Initialization Vector (IV) to
be supplied when encrypting and decrypting. Varying this IV
increases the strength of the cipher.

If the EXT-X-KEY tag has the IV attribute, implementations MUST
use the attribute value as the IV when encrypting or decrypting
with that key. The value MUST be interpreted as a 128-bit
hexadecimal number and MUST be prefixed with 0x or 0X.

If the EXT-X-KEY tag does not have the IV attribute,
implementations MUST use the sequence number of the media
file as the IV when encrypting or decrypting that media file.
The big-endian binary representation of the sequence number
SHALL be placed in a 16-octet buffer and padded (on the left)
with zeros.

Encoding Video for the HTC Desire

A useful script to encode all files passed as parameters(s) for viewing on a HTC Desire.

While I’m writing about video encoding, another task I did recently was encode a load of video files for my HTC Desire (a handset I’d strongly recommend for anyone). The main reason being that I like to watch something while pounding the threadmill in the gym.

A useful script to encode all files passed as parameters(s) (must all end in .avi) is:

#! /bin/bash

src="$*"
dst="_${*%%avi}mp4"

echo -en "Encoding $src\t\t\tPASS1"

ffmpeg -b 600kb -i "$src" -v 0 -pass 1 -passlogfile FF -vb 600Kb \
    -r 25 -an -threads 2 -y "$dst" /dev/null

echo -e "\tPASS2"

ffmpeg -b 600kb -i "$src" -v 0 -pass 2 -passlogfile FF -vb 600Kb \
    -r 25 -threads 2 -y -vol 1536 "$dst" /dev/null

rm FF-0.log

Encoding Full HD as FLV (for Gallery3)

I have a full HD camcorder and I wanted to stick some good quality video on my gallery for relatives to view. So, I needed to convert my sample 100MB MP4 full HD file to a suitably sized FLV for the Gallery. Here’s what I did…

I have a very nice Samsung R10 Full HD Camcorder which I bought last year. After a recent family holiday, I wanted to stick some good quality video on my gallery for relatives to view. The gallery is RC2 of the excellent Gallery 3 package which uses another excellent open source tool called Flow Player to play movies.

So, I needed to convert my test 100MB MP4 full HD file to a suitably sized FLV for the Gallery. My initial attempts with ffmpeg worked fine but the quality (sample) was very poor and changing the bit rate in different ways seemed to make no difference:

ffmpeg -i HDV_0056.MP4 -vb 600k -s vga -ar 22050 -y Test.flv
ffmpeg -i HDV_0056.MP4 -b 600k -s vga -ar 22050 -y Test.flv
ffmpeg -i HDV_0056.MP4 -vb 600k -s vga -ar 22050 -y Test.flv

I then turned to x264 and broke the process down to a number of stages:

  1. Extract the raw video to YUV4MPEG (this creates a 7GB file from my 100MB MP4):
    ffmpeg -i HDV_0056.MP4 HDV_0056.y4m
  2. Encode the video component to H.264/FLV at the specified bit rate with good quality:
    x264 --pass 1 --preset veryslow --threads 0 --bitrate 4000 \
            -o HDV_0056.flv HDV_0056.y4m
    x264 --pass 2 --preset veryslow --threads 0 --bitrate 4000 \
            -o HDV_0056.flv HDV_0056.y4m

    Note that I’m using the veryslow preset which is… very slow! You can use other presets as explained in the x264 man page.

  3. Extract and convert the audio component to MP3 (the sample rate is important):
    ffmpeg -i HDV_0056.MP4 -vn -ar 22050 HDV_0056.mp3
  4. Merge the converted audio and video back together:
    ffmpeg -i HDV_0056.flv -i HDV_0056.mp3 -acodec copy \
            -vcodec copy -y FullSizeVideo.flv

    This yields a near perfect encoding at 22MB. It’s still full size though (HD at 1920×1080).

  5. The last step is to then use ffmpeg to resize the video and it now seems to respect bit rate parameters:
    ffmpeg -i FullSizeVideo.flv -s vga -b 2000k \
            -vb 2000k SmallSizeVideo.flv

The resultant video can be seen here.

Robert Swain has a useful guide for ffmpeg x264 encoding.

Linux on a Dell Vostro 200

Following a recent post to ILUG asking about setting Linux up on a Dell Vostro 200, I followed up with my notes from the time I had to do it a few months back.

This is just a copy of my notes rather than a how-to but any competent Linux user should have no problem. Apologies in advance for the brevity; with luck you’ll be using a later version of Linux which will already have solved the network issue…

The two main issues and fixes were:

  • The SATA CD-ROM was not recognised initially and the fix was set the following parameter in the BIOS:

    BIOS -> Integrated Peripherals -> SATA Mode -> RAID

  • The network card is not recognised during a Linux install. Allow install to complete without network and then download Intel’s e1000 driver from http://downloadcenter.intel.com/ or specifically by clicking here. The one I used then was e1000-7.6.9.tar.gz but the current version appears to be e1000-7.6.15.tar.gz (where the above link heads to – check for later versions).

    My only notes of the install just say “essentially follow instructions in README” so I assume they were good enough! Obviously you’ll need Linux kernel headers at least as well as gcc and related tools.

    Once built and installed, a:

    modprobe e1000

    should have you working. Use dmesg to confirm.

Recovering an LVM Physical Volume

Yesterday disaster struck – during a CentOS/RedHat installation, the installer asked (not verbatim): “Cannot read partition information for /dev/sda. The drive must be initialized before continuing.”.

Now on this particular server, sda and sdb were/are a RAID1 (containing the OS) and a RAID5 partition respectively and sdc was/is a 4TB RAID5 partition from an externally attached disk chassis. This was a server re-installation and all data from sda and sdb had multiple snapshots off site. sdc had no backups of its 4TBs of data.

The installer discovered the drives in a different order and sda became the externally attached drive. I, believing it to be the internal RAID1 array, allowed the installer to initialise it. Oh shit…

Now this wouldn’t be the end of the world. It wasn’t backed up because a copy of the data exists on removable drives in the UK. It would mean someone flying in with the drives, handing them off to me at the airport, bringing them to the data center and copying all the data back. Then returning the drives to the UK again. A major inconvenience. And it’s also an embarrassment as I should have ensured that sda is what I thought it was via the installers other screens.

Anyway – from what I could make out, the installer initialised the drive with a single partition spanning the entire drive.

Once I got the operating system reinstalled, I needed to try and recover the LVM partitions. There’s not a whole lot of obvious information on the Internet for this and hence why I’m writing this post.

The first thing I needed to do was recreate the physical volume. Now, as I said above, I had backups of the original operating system. LVM creates a file containing the metadata of each volume group in /etc/lvm/backup in a file named the same as the volume group name. In this file, there is a section listing the physical volumes and their ids that make up the volume group. For example (the id is fabricated):

physical_volumes {
        pv0 {
                id = "fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn"
                device = "/dev/sdc"     # Hint only

                status = ["ALLOCATABLE"]
                pe_start = 384
                pe_count = 1072319      # 4.09057 Terabytes
        }
}

Note that after I realised my mistake, I installed the OS on the correct partition and after booting, the external drive became /dev/sdc* again. Now, to recreate the physical volume with the same id, I tried:

# pvcreate -u fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn /dev/sdc
  Device /dev/sdc not found (or ignored by filtering).

Eh? By turning on verbosity, you find the reason among a few hundred lines of debugging:

# pvcreate -vvvv -u fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn /dev/sdc
...
#filters/filter.c:121         /dev/sdc: Skipping: Partition table signature found
#device/dev-io.c:486         Closed /dev/sdc
#pvcreate.c:84   Device /dev/sdc not found (or ignored by filtering).

So pvcreate will not create a physical volume using the entire disk unless I remove partition(s) first. I do this with fdisk and try again:

# pvcreate -u fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn /dev/sdc
  Physical volume "/dev/sdc" successfully created

Great. Now to recreate the volume group on this physical volume:

# vgcreate -v md1000 /dev/sdc
    Wiping cache of LVM-capable devices
    Adding physical volume '/dev/sdc' to volume group 'md1000'
    Archiving volume group "md1000" metadata (seqno 0).
    Creating volume group backup "/etc/lvm/backup/md1000" (seqno 1).
  Volume group "md1000" successfully created

Now I have an “empty” volume group but with no logical volumes. I know all the data is there as the initialization didn’t format or wipe the drive. I’ve retrieved the LVM backup file called md1000 and placed it in /tmp/lvm-md1000. When I try to restore it to the new volume group I get:

# vgcfgrestore -f /tmp/lvm-md1000 md1000
  /tmp/lvm-md1000: stat failed: Permission denied
  Couldn't read volume group metadata.
  Restore failed.

After a lot of messing, I copied it to /etc/lvm/backup/md1000 and tried again:

# vgcfgrestore -f /etc/lvm/backup/md1000 md1000
  Restored volume group md1000

I don’t know if it was the location, the renaming or both but it worked.

Now the last hurdle is that on a lvdisplay, the logical volumes show up but are marked as:

  LV Status              NOT available

This is easily fixed by marking the logical volumes as available:

#  vgchange -ay
  2 logical volume(s) in volume group "md1000" now active

Agus sin é. My logical volumes are recovered with all data intact.

* how these are assigned is not particularly relevant to this story.

Nagios Plugin to Check the Status of PRI Lines in Asterisk

I have a number of Asterisk implementations that I keep an eye on that have multiple PRI connections. Knowing if and when they ever go down has the obvious benefits of alerting me to a problem in near real time. But besides that, it allows my customers and I to verify SLAs, track and log issues, etc.

To this end, I have written a Nagios plugin which queries Asterisk’s manager interface and executes the pri show spans CLI command (this is Asterisk 1.4 by the way). The script then parses the output to ascertain whether a PRI is up or not.

The actual code to connect to the manager interface and execute the query is simply:

if( ( $astsock = fsockopen( $host, $port, $errno, $errstr, $timeout ) ) === false )
{
    echo "Could not connect to Asterisk manager: $errstr";
    exit( STATUS_CRITICAL );
}

fputs( $astsock, "Action: Login\r\n");
fputs( $astsock, "UserName: $username\r\n");
fputs( $astsock, "Secret: $password\r\n\r\n"); 

fputs( $astsock, "Action: Command\r\n");
fputs( $astsock, "Command: pri show spans\r\n\r\n");

fputs( $astsock, "Action: Logoff\r\n\r\n");

while( !feof( $astsock ) )
{
    $asttext .= fread( $astsock, 8192 );
}

fclose( $astsock );

if( strpos( $asttext, "Authentication failed" ) !== false )
{
    echo "Asterisk manager authentication failed.";
    exit( STATUS_CRITICAL );
}

This plugin is hard coded to English and expects to find Provisioned, Up, Active for a good PRI. For example, the Asterisk implementations that support the pri show spans command that I have access to return one of:

  • PRI span 1/0: Provisioned, In Alarm, Down, Active
  • PRI span 3/0: Provisioned, Up, Active
  • PRI span 2/0: Up, Active

I’m actually running a slightly older version of Nagios at the moment, version 1.3. To integrate the plugin, first add the following command definition to an appropriate existing or new file under /etc/nagios-plugings/config/:

define command{
        command_name    check_asterisk_pri
        command_line    /usr/lib/nagios/plugins/check_asterisk_pri.php \\
             -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -w $ARG3$ \\
             -c $ARG4$ -n $ARG5$
}

where $ARG1$ is the Asterisk manager username and $ARG2$ is the password. $ARG3$ and $ARG4$ are the warning and critical thresholds respectively whereby if the number of available PRIs reaches one of these values, the appropriate error condition will be set. Lastly, $ARG5$ is the number of PRIs the plugin shouldexpect to find.

NB: the command_line line above is split for readability but it should all be on the one line.

Now create a test for a host in an appropriate file in /etc/nagios/config/:

define service{
        use                             core-service
        host_name                       hostname.domain.ie
        service_description             Asterisk PRIs
        check_command                   check_asterisk_pri!user!pass!2!1!4
}

Ensure that your Nagios server has permissions to access the Asterisk server via TCP on the Asterisk manager port (5038 by default). If on a public network, this should be done via stunnel or a VPN for security reasons.

Lastly, you’ll need a user with the appropriate permissions and host allow statements in your Asterisk configuration (/etc/asterisk/manager.conf):

[username]
secret = password
deny=0.0.0.0/0.0.0.0
permit=1.2.3.4/255.255.255.255
read = command
write = command

The next version may include support for BRI and Zap FXO ports also. I also plan on a Cacti plug in to show the channels on each PRI (up – on a call, down, etc). In any case, updates will be posted here.

The plug in can be download from: http://www.opensolutions.ie/misc/check_asterisk_pri.php.txt

UPDATED 20/03/2012: Aterisk 1.8.9 takes out the word “Provisioned” in “pri show spans”. Thanks to Shane O’Cain.