Scalable E-Mail Filtering
Methods and Techniques
UC Berkeley Security SIG

Jon Kuroda

Note: press space to go forward. pageup/down keys work too
This is an (x)html document in the S5 presentation system, so there are lots of links one can follow. Diagrams are in separate SVG documents, which require browser support; recent Firefox and Opera work.

Some Alternate Titles

"Mail Filtering Deconstructed"

"Content Filtering on the Cheap"

"Content Filtering that Works (Better)"

"Virus Scanning and Spam Tagging That Sucks Less"

"Virus Scanning for a Non-Ideal World"

"Mail Filtering for the Masses"

"Anti-Virus Is Hard. Let's Go Shopping!"

What's this all about?

Scalable: Cheap and Easy to get more capacity (Pay what you want as you go); Does More. Costs Less. Doesn't Suck (as much)
Flexible: Useful in situations other than my own; Looking for a modular/toolkit approach
Server Based: Primarily interested in the MTA side; Less so in delivery-time systems such as Sieve, .forward/procmail, or MUA filters
Content-Filtering: Anti-Virus/Spam; Data Scrubbing/Retention; Auto-Spell-Checking / Auto-Translation

Topics (and Non-Topics)

What I will be (or have been) talking about: Background/Historical Information; Personal Caveats; How different filtering systems work; Ways to deploy email filtering - including examples; Crazy Ideas and Odds and Ends
What I will not be talking about as much: My (MTA|OS) is better than your (MTA|OS); Every single implementation detail; Measures outside of a filtering context
What I hope you (and I) will get out of this: Some understanding of how e-mail filters work and how to use them; Some tools and ideas to take home and try; If I am lucky, a good laugh

Caveats

I'm a *nix/sendmail guy who installed anti-virus software: Examples involving these will have the most detail; I don't do MS Exchange, but I will talk about it (a little)
I'm a realist, not an idealist: I don't work in an ideal IT world; I try not to assume one.
There is nothing new here: I actually didn't think this was that novel; No (in my opinion) out of the ordinary ideas; But, as always, the work is in the documentation

There is no spoon. But I have some lovely sporks.

Note for the online readers, I meant to have some plastic sporks to pass out as random prizes for questions, but 1) I forgot to bring them 2) I had too high of a slide/time ratio.

A (Very) Brief History of E-mail Servers

In the beginning ...

One big server - Shell/Mail/FTP/everything
Servers became (more) affordable - Proliferation and decentralization
Whoa Nellie - Consolidation

Results

Viruses, Spam, Server Attacks, SMTP relay abuse, ...
Partially consolidated services
Legacy servers
Market for anti-(bad stuff) software and systems
Market for people who can manage all of this

Mmmmm, Job Security

Once upon a time ...

It's 2003 in 399 Cory Hall ...

20+ supported separate, disjoint systems accepting email
- Numerous legacy research group mailservers - maillists and mailspools
- Research systems that accept input via SMTP
- One Exchange server.
Policy (then a draft policy) requiring anti-virus measures on mailservers

2. Anti-virus software:
Anti-virus software for any particular type of device currently listed on the Approved Software website must be running and up-to-date on every level of device, including clients, file servers, mail servers, and other types of campus networked devices.
Departmental (@EECS / @CS) MX hosts already had it
I was the new guy
"Hey Jon ..." Beware these words.

Filters: An (Over) Simplified Look

Some pseudocode (should read like Perl) describing what a filter does.
Note that there can be outcomes and actions separate from I/O — side-effects.

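while (my $line = <STDIN>) {
    if ($line =~ /PATTERN/) {
        $line =~ s/PATTERN/MANGLED/g;        # mangle the input
        print $line;
        warn "filter: modified a line\n";    # some side-effect (here, logging)
    } else {
        print $line;                         # pass it through untouched
        # some other side-effect would go here
    }
}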

Pretty Diagram

Filters: The Engine

The "brains", does the work of making filtering and other decisions: Anti-virus/spam; "Scrubbing" messages for sensitive data; Anything from rejecting a message to passing it on unmodified
Side-Effects: Logging/Notifications; Updating cached information; Sometimes, all we care about are the side-effects
It may also depend on databases that require periodic* updating: Virus/Spam databases; SpamAssassin Bayesian Analysis (sa-learn); Spam Host lists

* How periodic? How paranoid are you? More on this later.

The Ins and Outs of Filter I/O

Great, we have a filter engine, but how do we get email in and out of the filter?

First, a detour to talk about Pre/Post-queue filtering
Do we have to let people in the door just so we can kick them out?
MTA Plugins
Embrace the MTA extensions
SMTP-aware Filter
It's an MTA, it's a Filter, no wait ... it's both!!
API/Protocol
Good fences make for good filtering.
Network Proxy (Extra Slide)
Black Magic that no one has done, probably

Filter I/O: Pre/Post-queue Filtering, a Slide Without a Home

Filter before or after queueing email? Pre-queue filtering lets one reject mail during an SMTP connection, but it can cause timeouts. Postfix has some good notes on the pros and cons of pre-queue filtering. We may come back to those notes later on.

Pre-queue filtering:

220-mail.example.com ESMTP JavaMail 6.2 Mon, 31 Jul 2006 20:01:23 -0700
ehlo poland.example.com
250-mail.example.com Hello poland.example.com [192.0.34.166]
Mail From: <exile@poland.example.com>
250 OK
Rcpt To: <president@example.com>
250 Accepted
data
354 Enter message, ending with "." on a line by itself
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
.
550 DENIED!! Message contains malware (ClamAV:Eicar-Test-Signature)

Versus post-queue, where filtering occurs but not till after email is accepted

220-post.example.com ESMTP JavaMail 6.2 Mon, 31 Jul 2006 20:02:23 -0700
ehlo siberia.example.com
250-post.example.com Hello siberia.example.com [192.0.34.166]
Mail From: <exile@siberia.example.com>
250 OK
Rcpt To: <president@example.com>
250 Accepted
data
354 Enter message, ending with "." on a line by itself
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
.
250 2.0.0 k713JO1n055810 Message accepted for delivery
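
The difference between the two transcripts comes down to when the filter's verdict is consulted. Here is a toy Python sketch of the end-of-DATA step, not any real MTA's logic: `looks_infected`, `handle_data`, `drain_queue`, and the in-memory `queue` list are all made-up names, and the "scanner" only looks for the EICAR test marker.

```python
# Toy model of the end-of-DATA step in an SMTP session.
# Hypothetical names throughout; a real MTA and scanner are far more involved.

EICAR_MARKER = "EICAR-STANDARD-ANTIVIRUS-TEST-FILE"


def looks_infected(message: str) -> bool:
    """Stand-in for a real scan engine: flags only the EICAR test marker."""
    return EICAR_MARKER in message


def handle_data(message: str, queue: list, pre_queue: bool) -> str:
    """Return the SMTP reply sent after the client finishes DATA."""
    if pre_queue and looks_infected(message):
        # Pre-queue: the verdict is known before we answer, so we can
        # refuse the message while the sender is still connected.
        return "550 Message contains malware"
    # Post-queue (or clean mail): accept now, filter later from the queue.
    queue.append(message)
    return "250 2.0.0 Message accepted for delivery"


def drain_queue(queue: list) -> list:
    """Post-queue filtering: scan accepted mail, keep only clean messages."""
    clean = [m for m in queue if not looks_infected(m)]
    queue.clear()
    return clean
```

In the post-queue case the sender has already seen a 250, so whatever the filter decides later can only turn into a bounce, a tag, or a silent drop.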

Filter I/O: Pre/Post-queue Filtering: Consequences

What implications does this Pre/Post-queue Filtering business have?

Obviously, having a heavy-duty, relatively slow running filter as a pre-queue filter can lead to SMTP timeouts, causing remote MTAs (at least legitimate ones) to retry delivery, causing additional load leading to a downward spiral. So, just because some filter can be used in a pre-queue fashion doesn't mean that it should be used like that.

Rather, think in terms of what one would want as pre-queue filters:

**Candidates for pre-queue filtering**
Fast/Light or Cuts Down Mail Load	"Not So Good" Candidates
DNSBL	Anti-Virus/Spam (CPU intensive)
SPF checks	Most other message body manipulation
Greylisting	CPU- or I/O-intensive side-effects
SMTP compliance	
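
Part of why a DNSBL check counts as "fast/light" is that it boils down to one DNS query: reverse the octets of the client's IPv4 address, append the list's zone, and look up an A record. A sketch of just the name construction in Python; the zone `dnsbl.example.org` is a placeholder rather than a real list, and the DNS query itself is left out so the example runs offline.

```python
import ipaddress


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 client in a DNSBL zone."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# Placeholder zone; substitute whichever list your site actually uses.
print(dnsbl_query_name("192.0.34.166", "dnsbl.example.org"))
# prints "166.34.0.192.dnsbl.example.org"
```

If the lookup returns an address, the client is listed; if the name does not exist, it is not. Either way the answer usually arrives well within SMTP timeout limits, which is what makes this a reasonable pre-queue check.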

Filter I/O: MTA Plug-in

Designed for a particular MTA (such as Microsoft Exchange w/VSAPI?)

Becomes part of the MTA (MS Exchange VSAPI uses DLLs)
Integration can increase performance
Filter can make use of MTA specific features to provide non-SMTP filtering

Essentially yields 'full-featured' Standalone SMTP-aware Filter

Can accept and deliver incoming email
Retains all of the MTA's own features, adding in filtering

Operates in MTA's process space

Often shares access privileges
Problem with filter can affect MTA from within
Even if somehow separate processes, often share system resources

I don't find them that flexible, but these can work well for specific situations.

Filter I/O: SMTP-aware Filter

It speaks SMTP, but do we call it an 'MTA'?

May not have all the features found in your actual MTA:
- DNSBL, RCPT Verification, LDAP, Full SMTP routing, running hot water ...
Can relay to a "Real MTA" or be part of a "Dual MTA" setup, described later

SMTP usually means post-queue filtering only

Postfix has a workaround which is useful for Dual-MTA setups.
Use Postfix as a pre-queue-capable frontend to a post-queue-only filter.

**Examples of COTS SMTP-aware filters**
Hardware Boxes	Software Only
Barracuda Networks Spam Firewall*	TrendMicro Viruswall**
IronPort Mail Appliances	Kaspersky Security Software Products
Borderware "Mxtreme"**	

* denotes what we use in our group or our department, ** what we have used.

Another Pretty Picture

Filter I/O: Transparent (or is it Opaque?) Proxy (Bonus Slide)

In many ways, similar to an SMTP-aware filter except the filter "intercepts" traffic at IP/Layer 3 or lower, not at the SMTP layer.

Imagine the filter as an invisible box that watches the network traffic on port 25 and silently edits and rewrites packets so that no one is the wiser on either end. For bonus points, you can do this even lower, at Layer 2 (à la a bridging firewall).

Other systems get the illusion of connecting directly to the MTA
Filtering system is very stealthy
Can be part of larger IP traffic filtering system
Can be very complex (needs understanding of multiple network layers)

Okay, this all sounds very slick, but I don't know offhand of anything that does this or of anyone who has done this with a homegrown system. Anyone hear of something like this, even homegrown?

Filter: Side Effects and Other Random Bits

Notification E-mails

"You sent a virus" - useless and annoying. Turn it off
"You got a virus" - almost as useless and annoying, maybe amusing
"Virus Deleted" (Cleaned email sent out) - useless/annoying, maybe amusing
- Not a Side-effect but actual bona-fide filter I/O

Saving Viruses/Spam

Potentially amusing, harmful, and profitable

Logging

How else will you know if this works
How else will you fix it when it doesn't?
How else will you know if you need more capacity?
How else will you get that raise?
Right. Raise. UC system

"Virus Deleted" Emails: In EECS, we actually send the cleaned e-mails. A default Sieve rule on our IMAP server auto-files all such cleaned e-mails to a special folder where users can ignore them or be impressed by how much virus-laden email we're catching for them.

Saving Viruses: A user in our department who was doing work with Windows viruses asked if we had any he could get his hands on. We save viruses mostly for our amusement and to run stats, but we were able to give him a CD of viruses and get paid some T&M for it.

Deployment: What to do with these tools

We now have some pieces that can be combined in many ways, how can we use them?

Install it Everywhere
- The Simple (But Stupid?) Life

MX Filter
- The "Big Guy at the Entrance" way.

Network Filter Service (The Other Other NFS)
- It seems slick, but is it really useful?

Deployment: Install It Everywhere

Pretty self-explanatory

Pros

Braindead simple
Fine if you have only one or two systems to manage
But that is not the focus here.

Cons

Multiple points of (mis)management
May conflict with licensing schemes
Filter may not be supported/available on some platforms

This does not count as scalable, mmm'kay?

Deployment: MX Filter in a Nutshell

Essentially an SMTP relay that filters along the way

Major Steps

build (or buy) one or more SMTP-aware Filters
Set up SMTP-aware Filters for handoff to 'client MTAs'
MX all of your 'client MTAs' to the SMTP-aware Filters
Sit back, relax, have a tasty beverage

Optional

IP firewalls to limit access to client MTAs
Collect statistics on performance (that whole raise thing)
RCPT verification

A Pretty Diagram

Deployment: MX Filter - Pro and Cons

Pros

Only need a few machines to run the filter
Redundant backup MXs in case of downtime
- Control over mail queue during/after downtime
We can arbitrarily alter the destination MTA
Filter for many MTAs: Exchange, Sendmail, Postfix, ... but only the SMTP vector
Allows us to firewall off client MTAs from the world-at-large

Cons

One Big One: accepting mail for non-existent@client-mta
- Lots of postmaster mail. We hatesssss that. So, some solutions:
  - Make undergrad students read it
  - Make list of valid addresses available to Filter (cron jobs, LDAP, etc)
  - SnertSoft milter-ahead (thanks to UCI for this; I was going to use milter-cli)

Small SMTP delay introduced
Presence of filter system revealed in Received: headers
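
The "list of valid addresses" fix for the Big Con can be sketched in a few lines of Python. This is only an illustration: `load_valid_recipients` and `rcpt_reply` are hypothetical names, the one-address-per-line input format is an assumption (think of a file pushed out by a cron job), and matching addresses case-insensitively is a simplification.

```python
def load_valid_recipients(lines):
    """Normalize an iterable of addresses, e.g. lines of a cron-exported file."""
    valid = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip().lower()  # drop comments/blanks
        if entry:
            valid.add(entry)
    return valid


def rcpt_reply(address: str, valid: set) -> str:
    """SMTP reply for a RCPT TO command, given the known-good recipients."""
    if address.strip().strip("<>").lower() in valid:
        return "250 2.1.5 Recipient ok"
    # Rejecting here means the filter never queues mail it cannot deliver,
    # so it never has to generate a bounce (or postmaster mail) for it later.
    return "550 5.1.1 User unknown"
```

The weak point of any such list is staleness: a new account on a client MTA bounces until the next export, which is the gap that a call-ahead milter avoids by asking the client MTA directly.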

Deployment: Network Filter Service in a Nutshell

Not a Network File System, nor a Number Field Sieve

Major Steps

Build (or buy) one or more systems running your filter(s)
Make them available via milter or other API
Configure 'protected' MTAs to access filter via API
Sit back, relax, have a tasty beverage

Optional

SMTP-aware Filter(s) as lower priority MXs to queue mail during downtime
IP firewalls to control access to Network Filter Service
Again, collect statistics

A Pretty Diagram

Deployment: Network Filter Service - Pro and Cons

Pros

E-mail goes directly to destination server
- Fewer worries about accepting email for bogus addresses
- One fewer SMTP hop
- Presence of filtering system not announced as openly
Like the MX Filter, it only needs a few systems for filtering, but only protects SMTP

Cons

Requires Filtering API support
May mean more network traffic per filter
MTAs get e-mail filtering but are still open to outside world
Don't get mail queue control during downtime without setting up other MXs

It seems cooler, but maybe not better when supporting disjoint heterogeneous mail servers. It may work better in a more uniform managed environment, say an end-to-end mail-service as opposed to "just" protecting someone else's servers.

Deployment: A Detour for Exchange

Microsoft Exchange is, for better or for worse, not going to go away anytime soon. The question is "How best to keep the viruses away from it?"

First, and perhaps only, relevant thing to remember:
You cannot rely upon SMTP filtering as a sole method of anti-virus for Exchange

For example, users can upload files to Exchange via HTTP, from desktops, PDAs, anything. Where have your users' Crackberrys been?

While it is always a good idea to "pre-filter" mail inbound to an Exchange server, you should also make use of Exchange's VSAPI to provide virus scanning of an item whenever a client requests it, not just when the item (message) is accepted and enqueued. Additionally, items are continually rescanned when virus definitions/signatures are updated.

Implementation: Our version

We went with Filtering MXs for deploying our virus filter.

Our guiding principle was "Free Beer Good".

Our tools:

OS: Solaris 10 on Sparc (Originally 8 and 9)
MTA: Open Source Sendmail
Filter I/O: milter
Filters:
- TrendMicro Viruswall (Sendmail Edition)
- SnertSoft Milter-Ahead (for RCPT Verification)
Perl
Caffeine

The Obligatory Pretty Diagram

Our Way: Hardware/OS (Free Beer)

The Systems: Solaris on Sparc

vw1
- Solaris 8 (Soon 10), Netra X1 (500 MHz, 512 MB, 40 GB)
- originally a general purpose system for our group
- supports jumpstart installs, syslog, staff logins
vw2 - put into place about 6 months later
- Solaris 9 (Now 10), Ultra 2 2300 (2x300 MHz, 1 GB, 8 GB)
- built from spare parts and decommissioned systems
- basically a free system.
- another group gave us a storage array - will soon do syslogging

Note:
We did not have to spend any extra money to obtain hardware in order to provide virus filtering for our customers. Free Beer Good.

Our Way: Sendmail (Cheap Beer)

Roll our own from source, or use the binaries in Solaris?

Solaris 8's sendmail lacked milter support, so obvious choice there
Solaris 10's sendmail has milter support (including libmilter.so)
- Solaris 10 sendmail supports Berkeley DB but no separate library
- Sun took too long to release public patches for sendmail

Complications:
Building from source creates only a static libmilter by default

This is not a problem if milters are also built from source, but:
- Commercial Sendmail provides a shared libmilter.so
- Commercial (binary-only) milters rely upon the shared libmilter.so
- TrendMicro provides instructions on how to build libmilter.so from Open Source Sendmail.
Sometimes the libmilter interface changes slightly (older libmilters included more symbols from sendmail that milters came to expect), but this is easy to deal with.

Aside from our time, we got this for low/no cost and we learned a bit.

Our Way: Dealing with the SMTP parts

Configure sendmail via /etc/mail/relay-domains to relay email

# domains for which we relay mail
# This file is read only when sendmail starts or is sent SIGHUP.
#
cool.EECS.Berkeley.EDU
hot.EECS.Berkeley.EDU
rad.EECS.Berkeley.EDU
here.CS.Berkeley.EDU
there.EECS.Berkeley.EDU

Configure sendmail via /etc/mail/mailertable for [e]smtp/lmtp handoff

# domainname	esmtp:[next-hop-server]
# note use of []'s to suppress MX lookup
# pipe into '/usr/sbin/makemap hash /etc/mail/mailertable' or
# run '/usr/sbin/makemap hash /etc/mail/mailertable < /etc/mail/mailertable'
#
cool.EECS.Berkeley.EDU	esmtp:[cool.EECS.Berkeley.EDU]
hot.EECS.Berkeley.EDU   esmtp:[cool.EECS.Berkeley.EDU]
rad.EECS.Berkeley.EDU   esmtp:[awesome.EECS.Berkeley.EDU]
here.CS.Berkeley.EDU	esmtp:[here.CS.Berkeley.EDU]
there.EECS.Berkeley.EDU esmtp:[here.CS.Berkeley.EDU]

May need to enable mailertable in sendmail.mc and rebuild sendmail.cf:

FEATURE(mailertable, `hash -o /etc/mail/mailertable')

Our Way: Dealing with the SMTP parts - meta-config file

A perl script creates /etc/mail/{mailertable,relay-domains} from a config file.

vw-config.cf:
# sourcefile for /etc/mail/{mailertable,relay-domains} on vw systems
#
# format:
# server SERVER-NAME Freeform comments (can have spaces)
#        CLIENT-NAME
# ^^^^^^ whitespace optional, used for readability
#
# "vw-config" command used to generate /etc/mail/{mailertable,relay-domains}
#
server cool.EECS.Berkeley.EDU The Coolest Server in Town
    cool.EECS.Berkeley.EDU
    hot.EECS.Berkeley.EDU
server awesome.EECS.Berkeley.EDU A Pretty Awesome Server
    rad.EECS.Berkeley.EDU
server here.CS.Berkeley.EDU mail/ftp/webserver for the Nowhere Group
    here.CS.Berkeley.EDU
    there.EECS.Berkeley.EDU
...

Note: This was a minimalist approach to things. Other options include using an SQL database, LDAP, or some other setup to store this info, as long as you can get it into the form your MTA (sendmail here) can use.
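
To make the meta-config idea concrete, here is a Python sketch of the transformation. It is not the Perl script we actually run; the function names are invented for illustration, and the parser assumes that any non-comment line not starting with `server` is a client domain belonging to the most recent `server` line.

```python
def parse_vw_config(text: str) -> dict:
    """Map each client domain to its next-hop server."""
    next_hop = {}
    server = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == "server":
            if len(fields) < 2:
                raise ValueError(f"server line without a name: {raw!r}")
            server = fields[1]  # the rest of the line is a freeform comment
        elif server is None:
            raise ValueError(f"client listed before any server: {raw!r}")
        else:
            next_hop[fields[0]] = server
    return next_hop


def render_mailertable(next_hop: dict) -> str:
    # Square brackets suppress the MX lookup, as in the hand-written file.
    return "".join(f"{client}\tesmtp:[{server}]\n"
                   for client, server in next_hop.items())


def render_relay_domains(next_hop: dict) -> str:
    return "".join(f"{client}\n" for client in next_hop)
```

Writing the two files is only half the job: the mailertable still has to go through makemap, and sendmail has to be restarted or sent SIGHUP before it notices a changed relay-domains.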

Our Way: Milters - a detour

Before going into specific milters, let's look at how to configure the MTA to use them. Sorry, this is currently Sendmail specific, but here are some notes for Postfix. You should first read the Milter Installation and Configuration page, but here are the important bits for the impatient:

Enable milter support in your MTA (sendmail example here). Building from scratch? Add this to your build config. (newer versions of sendmail may not need this)
```
APPENDDEF(`conf_sendmail_ENVDEF', `-DMILTER')
```

Add lines like this to your sendmail.mc (or local moral equivalent):

INPUT_MAIL_FILTER(`f1', `S=unix:/var/run/f1.sock, F=R')
INPUT_MAIL_FILTER(`f2', `S=unix:/var/run/f2.sock, F=T, T=S:1s;R:1s;E:5m')
INPUT_MAIL_FILTER(`f3', `S=inet:999@localhost, T=C:2m')
dnl can set specific order of filters, else go in order of definition
dnl define(`confINPUT_MAIL_FILTERS', `f2,f1,f3')

Rebuild your sendmail.cf and watch the mail flow and get filtered (or not)
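
The second argument of INPUT_MAIL_FILTER packs three things into one string: the socket (S=), what to do when the filter is unavailable (F=R rejects, F=T tempfails, nothing means carry on without it), and timeouts (T=) for connect (C), send (S), read (R), and end-of-message (E). A small Python sketch that unpacks such a string may make the notation easier to read. `parse_filter_spec` is a made-up helper, not part of sendmail, and it only understands the s, m, and h time units.

```python
UNITS = {"s": 1, "m": 60, "h": 3600}
TIMEOUT_NAMES = {"C": "connect", "S": "send", "R": "read", "E": "end-of-message"}
FLAG_NAMES = {"R": "reject", "T": "tempfail"}


def parse_filter_spec(spec: str) -> dict:
    """Split an INPUT_MAIL_FILTER argument string into S=, F=, and T= parts."""
    fields = dict(part.strip().split("=", 1) for part in spec.split(","))
    result = {
        "socket": fields["S"],
        # With no F= flag, mail flows as if the failing filter were absent.
        "on_failure": FLAG_NAMES.get(fields.get("F", ""), "ignore"),
        "timeouts": {},  # in seconds
    }
    for item in filter(None, fields.get("T", "").split(";")):
        name, value = item.split(":", 1)
        seconds = int(value[:-1]) * UNITS[value[-1]]
        result["timeouts"][TIMEOUT_NAMES[name]] = seconds
    return result


print(parse_filter_spec("S=unix:/var/run/f2.sock, F=T, T=S:1s;R:1s;E:5m"))
```

For a filter you depend on, F=T is usually the choice to think about first: mail is deferred rather than delivered unscanned while the filter is down.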

Our Way: Virus Filter Software (Almost Free Beer)

TrendMicro Viruswall (Sendmail Edition)

Has milter support but doesn't know how to use Unix-domain sockets

Can be made to bind to a specific address and port, say 127.0.0.1:2701, in /etc/iscan/intscan.ini:

Open to the world (modulo firewalls):
[ismilter]
svcport=inet:2701

Only available to localhost:
[ismilter]
svcport=inet:2701@127.0.0.1

Has a [stupid] web interface that we turn off:
mv /etc/rc2.d/S99ScanHttpd /etc/rc2.d/_s99ScanHttpd
Installer sets up hourly virus definition updates via cron. We upped this to every 15 minutes, with our two systems overlapping by about half of that.
Only scans for viruses in attachments. Other filters scan entire message

Campus had a license for TrendMicro VirusWall (later, only our department licensed it) which is based on the number of users, not number of systems. So, essentially, the beer was already paid for.

Our Way: RCPT Verification Software (Almost Free Beer)

SnertSoft milter-ahead

It has milter support (duh)
It does pre-queue filtering
Knows how to use Unix-domain sockets
Can be made to bind to socket or specific address and port
- Only available to processes with file level access
  /usr/sbin/milter-ahead unix:/var/run/ahead.sock
- Only available to localhost
  /usr/sbin/milter-ahead inet:2702@127.0.0.1
- Only available to private network
  /usr/sbin/milter-ahead inet:2702@192.168.233.12

This milter used to be Free Beer until around version 1.x. Now it costs a whole 90€ for a site license, which is still Pretty Cheap Beer.

Our Way: RCPT verification milter and Sendmail

SnertSoft's milter-ahead makes use of Sendmail's own environment:

/etc/mail/mailertable
- Information on which server to contact for RCPT verification
- We already maintain this for our MTA configuration — Free Beer.
/etc/mail/access
- Popular for access control information in Sendmail
  We use this already for other purposes.
- Optional for basic milter-ahead usage.
- Milter-ahead can make very complex use of the access table
Berkeley DB
- As mentioned, Solaris 10 doesn't ship devtools for this
- We had to build it and make sure that we kept the compiler flags consistent

Our Way: Connecting Sendmail to the Milters

We have a sendmail with milter support and milters, now to connect them.

This in /etc/mail/sendmail.mc:

... [all the usual stuff]
INPUT_MAIL_FILTER(`milter-ahead',`S=unix:/var/run/milter-ahead.sock,F=T,T=C:1m;S:30s;R:6m;E:5m')
INPUT_MAIL_FILTER(`virus',`S=inet:2701@127.0.0.1,F=T,T=S:2m;R:2m;E:5m')
... [rest of your config]

Results in this in /etc/mail/sendmail.cf:

...
Xmilter-ahead, S=unix:/var/run/milter-ahead.sock, F=T, T=C:1m;S:30s;R:6m;E:5m
Xvirus, S=inet:2701@127.0.0.1, F=T, T=S:2m;R:2m;E:5m
...

With milter-ahead before virus-scanning, we don't have to virus-scan email for bogus recipients, but to change the order, use this in sendmail.mc:

define(`confINPUT_MAIL_FILTERS', `virus,milter-ahead')

to get this in sendmail.cf:

O InputMailFilters=virus,milter-ahead
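
Why the order matters can be shown with a deliberately simplified Python model that treats each filter as one check and stops at the first rejection. Real milters are called at each SMTP stage rather than once per message, so take this as a sketch of the reasoning only; `run_filters`, `rcpt_check`, `virus_check`, and the `VALID` set are hypothetical.

```python
def run_filters(filters, envelope):
    """Apply (name, check) pairs in order; stop at the first rejection.

    Each check returns None to accept, or a string with the reject reason.
    Returns the reason (or None) plus the names of the filters that ran.
    """
    calls = []
    for name, check in filters:
        calls.append(name)
        reason = check(envelope)
        if reason is not None:
            return reason, calls
    return None, calls


VALID = {"president@example.com"}  # stand-in for real RCPT verification


def rcpt_check(envelope):
    return None if envelope["rcpt"] in VALID else "550 5.1.1 User unknown"


def virus_check(envelope):
    # Stand-in for the expensive scan.
    return "550 Message contains malware" if "EICAR" in envelope["body"] else None


chain = [("milter-ahead", rcpt_check), ("virus", virus_check)]
print(run_filters(chain, {"rcpt": "nobody@example.com", "body": "EICAR"}))
# prints "('550 5.1.1 User unknown', ['milter-ahead'])"
```

With the cheap check first, the expensive one never runs for mail that was going to be refused anyway.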

Implementation: The Big Mail Server Farm

You have lots of users, a decent budget, and time to set it up right. Sure.

Some of the big parts:

Big Farm of MTAs accepting from the world at large
A Big Bad Mailstore that can hold it all
A Big IMAP Server Farm so users can get mail
SMTP Relays for all your users to use
Using proper authentication/authorization methods, of course
And, of course, Systems to keep out spam and viruses

And of course, another diagram

The Big Mail Server Farm

Wait a minute, how does the MX Farm talk to the Milter Farm?

Well, you can buy a load balancer, but there is a somewhat clever, likely horrible, yet totally approved* hack.

INPUT_MAIL_FILTER(`milter',`S=inet:2701@milterfarm.cs.berkeley.edu,F=T,T=S:2m;R:2m;E:5m')

# in DNS
milterfarm	IN	A	10.0.0.10
			A	10.0.0.11
			A	10.0.0.12
			A	10.0.0.13
			...

That's right, DNS roundrobin -- the cheap person's load sharing (not balancing!) system. I asked one of the guys (the guy) behind milter, and he said this was how to do it. I don't remember if this made use of something in sendmail (in which case this won't work with Postfix), or it was something internal to milter. There are some alternatives:

Just install the milter on all the MX systems
Write a "meta"-milter to do the roundrobin load sharing explicitly
See if milter gets more configurable failover/load sharing capabilities
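
For the "meta"-milter alternative, the explicit round-robin part is small. A Python sketch under stated assumptions: `RoundRobin` and its `is_up` callback are invented for illustration, and nothing here reflects how sendmail or libmilter actually treat multiple A records.

```python
import itertools


class RoundRobin:
    """Hand out backend addresses in rotation, skipping ones that fail."""

    def __init__(self, addresses):
        self._addresses = list(addresses)
        self._cycle = itertools.cycle(self._addresses)

    def pick(self, is_up):
        """Return the next address for which is_up(address) is true."""
        # Try each backend at most once per call before giving up.
        for _ in range(len(self._addresses)):
            candidate = next(self._cycle)
            if is_up(candidate):
                return candidate
        raise RuntimeError("no filter backend is reachable")


farm = RoundRobin(["10.0.0.10", "10.0.0.11", "10.0.0.12"])
down = {"10.0.0.11"}  # pretend this one is unreachable
print([farm.pick(lambda addr: addr not in down) for _ in range(3)])
# prints "['10.0.0.10', '10.0.0.12', '10.0.0.10']"
```

In practice `is_up` could be a short TCP connect attempt to the milter port. Note that this is still load sharing, not balancing: it knows whether a backend answers, not how busy it is.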

Crazy Ideas 1

Okay, so you can load share among a farm of servers running a milter, but how can one failover from a milter accessed over a local/unix socket to one accessed via a network socket?

Well, it sounds a little crazy, but instead of setting up the local milter to use a local unix socket, set it up to run over the loopback interface on 127.0.0.1 and add 127.0.0.1 to your DNS round robin.

I've spent all this time (or money) on a huge milter farm, but it's not getting used enough! What can I do to justify this to my boss?

Well, if you spent a lot of time or money setting up anti-virus, it makes sense to try to use it for other things. I just hope you chose an anti-virus scanning engine that can be easily used for something other than SMTP, like ClamAV, which can be used for a number of other purposes such as a virus-scanning HTTP proxy.

And, of course, there was that slide back during Filter I/O

Odds and Ends

A check_expn rule I wrote a while ago. This also works for use as a check_vrfy rule. It could probably use a little clean up or canonicalizing into Sendmail Standard Form.

Basic logic:
look up foo.domain.com in /etc/mail/access using the F rule and look for an entry like this:

EXPN:webmaster@foo.domain.com		OK
EXPN:postmaster				OK
EXPN:root@bar.domain.com		DENY

If not, use the A rule to look up the IP address or octet based netblock in /etc/mail/access. (Don't think anyone has done CIDR in sendmail.cf yet ...)

EXPN:127.0.0.1		OK # allow localhost to expn
EXPN:128.32.0.0		OK # allow UCB-ETHER to expn

sendmail.mc:
...
LOCAL_RULESETS
Scheck_expn
R$*			$: $>F <$1> <?> <! Expn> <$1>
R<?> <$*>	    	$: <$&{client_addr}> <$1>
R<$*> <$*>		$: $>A <$1> <?> <! Expn> <$2>
R<OK> <$*>		$@ $1
R$*			$#error $@ 5.7.0 $: "502 sorry, we do not allow this operation."

References/Resources

Standards, Policies, APIs
- UCB Policy
- SMTP aka RFC 2821
- Sendmail Milter
MTAs
MUA/Delivery Time Filters

Vendors
- Commercial
- OpenSource/Free/Cheap/Source
  - SnertSoft aka milter.info
  - SpamAssassin
  - ClamAV

This presentation was done in S5, the xhtml/css presentation system.

Thanks and Acknowledgements

I'd like to thank the following people for their feedback and encouragement:

Alex Brown, my old boss in CUSG for first encouraging me to write this up.
Lars Rohrbach, for giving me the time to continue writing this up.
Emrys Ingersoll, for showing me S5, the xhtml/css presentation system.
Tom Maher, Rob McNicholas, and Mark Kraitchman for their feedback.
Jake Harwood, Chris Ashley, and Karen Eft for getting me to present here.
Paolo Soto, for one final review of the slides.
Murray Kucherawy, for answering my milter questions.
The rest of my co-workers for their patience as I finished this up.
The almighty Caffeine.

Wednesday September 27, 2006

Scalable E-Mail Filtering, © 2006

Jon Kuroda

UC Berkeley EECS, CUSG

jkuroda[at]EECS[dot]Berkeley[dot]EDU

http://www.EECS.Berkeley.EDU/~jkuroda/talks/mailfiltering/
