How I Caught the Spam and What I Did With it When I Caught it
How the Mail Gets Into our Filtering Program

Mark-Jason Dominus
Thursday, October 14, 1999 03:43:35 PM
Our filter program will be run by the mail
transport agent (MTA), which is the program
that is responsible for receiving mail from
the network and for delivering it to the right
place. With sendmail, for example,
you can put a line like this one into your
.forward file:
"| /home/mjd/bin/mailfilter"
When mail arrives, sendmail will run
the mailfilter program and hand
it the mail message on its standard input.
mailfilter can then decide whether
to deliver the message by writing it to the
mailbox, or whether to throw it away, or whether
to do something else. Most MTAs have an option
to deliver mail to a program in this way. My
system was using the superb qmail
MTA, so I would put the line
| /home/mjd/bin/mailfilter
into my ~/.qmail
file. (Actually my filter program is named
deliver.aol.q2. Please don't ask
why, because I don't remember.)
Reading the Message
What does this mailfilter program
need to do? Obviously, the first thing it must
do is read the mail message in from the standard
input. Code to read in an email message is very
simple in Perl:
1 { local $/ = "";
2 $header = <STDIN>;
3 undef $/;
4 $body = <STDIN>;
5 }
This reads the header of the message into
$header and the body into
$body. What's going on here? The
Perl <...> operator reads
a line of input from some filehandle. But
what's a line? Normally, it's any sequence
of characters that's terminated by a newline
character. Why a newline? Because that's the
default setting of the Perl $/
special variable. If you change $/,
that changes Perl's idea of what a line looks
like. If you changed it to contain a period,
then Perl would think that a `line' was any
sequence of characters that ended with a period.
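To make the effect of $/ concrete, here is a minimal sketch that sets it to a period. It reads from an in-memory filehandle rather than a real file; the sample text and the variable names are assumptions made up for the illustration.

```perl
use strict;
use warnings;

# Sample input held in memory so the sketch needs no external file.
my $text = "First sentence. Second sentence. Third";
open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";

my @records;
{
    local $/ = ".";      # a "line" now ends at each period
    @records = <$fh>;    # list context: read every remaining record
}
close($fh);

# Each record keeps its terminating period; the last one has none
# because the input ended first.
print "[$_]\n" for @records;
```

The records come back as "First sentence.", " Second sentence." and " Third", with the leading spaces preserved, because Perl splits only on the terminator and trims nothing else.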
There are two special settings for
$/, however. If you set
$/ to the empty string, as I
did on line 1, the <...>
operator reads by paragraphs instead of lines;
consecutive paragraphs are separated by one or more
blank lines. Each call to <...>
reads in one complete paragraph. Since the
header of a mail message is a paragraph,
separated from the body by a blank line,
line 2 reads the entire header into
$header.
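Paragraph mode can be tried out on a made-up message, as in the sketch below. The message text and the address in it are invented for the example, and the input again comes from an in-memory filehandle instead of STDIN.

```perl
use strict;
use warnings;

# A made-up message: a two-line header, a blank line, then two body paragraphs.
my $message = <<'END';
From: alice@example.com
Subject: hello

First body paragraph.

Second body paragraph.
END

open(my $fh, '<', \$message) or die "Cannot open in-memory handle: $!";

my @paragraphs;
{
    local $/ = "";           # paragraph mode
    @paragraphs = <$fh>;     # one element per paragraph
}
close($fh);

printf "%d paragraphs\n", scalar @paragraphs;
print "Header is:\n$paragraphs[0]";
```

The first element is the whole header, including the blank line that ends it. The filter only needs that first read in paragraph mode, since the rest of the message is treated as one unit.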
The local on line 1 confines the
changes to $/ to the block, so that when control reaches line 5, the original value
of $/ is automatically restored. We might be doing file I/O in other parts of our
program, and if we didn't put $/
back to normal we'd get weird results when the
<...> operator didn't behave
the way we expected. Using local
ensures that we won't forget to put it back the way it was.
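The restoring behavior of local can be checked directly. In this sketch, first_paragraph is a hypothetical helper written for the demonstration; it switches to paragraph mode internally, and the caller's line-at-a-time reading is unaffected afterwards.

```perl
use strict;
use warnings;

my $data = "one\ntwo\n\nthree\n";

# Hypothetical helper: returns the first paragraph of a string.
sub first_paragraph {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";
    local $/ = "";       # paragraph mode, only until this sub returns
    my $para = <$fh>;
    return $para;
}

my $before = $/;
my $para   = first_paragraph($data);
my $after  = $/;

print "separator restored\n" if $before eq $after;

# Ordinary line reading still works after the call.
open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";
my $line = <$fh>;
close($fh);
print "first line: $line";
```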
On line 3 we see the other special setting
of $/. If $/ is
undefined, then there is no line termination
sequence, and Perl's <...>
operator reads the entire rest of the input all
at once. This is sometimes called `slurping'
the input. Line 4 reads the entire message body
into the variable $body.
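Putting the two settings together, the same technique could be wrapped in a subroutine that takes any filehandle. This is only a sketch: read_message is a hypothetical name, the sample message is invented, and lexical variables are used in place of the globals in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a message on a filehandle into header and body.
sub read_message {
    my ($fh) = @_;
    local $/ = "";               # paragraph mode: first read returns the header
    my $header = <$fh>;
    undef $/;                    # slurp mode: next read returns everything left
    my $body = <$fh>;
    # Guard against undef on empty or truncated input.
    $header = "" unless defined $header;
    $body   = "" unless defined $body;
    return ($header, $body);
}

my $sample = <<'END';
From: alice@example.com
Subject: test

Line one.

Line two.
END

open(my $in, '<', \$sample) or die "Cannot open in-memory handle: $!";
my ($header, $body) = read_message($in);
close($in);

print "header is ", length($header), " characters\n";
print "body is ", length($body), " characters\n";
```

In a real filter the call would presumably be read_message(\*STDIN), since the MTA supplies the message on standard input. Note that the blank line inside the body survives, because slurp mode does no splitting at all.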