How I Caught the Spam and What I Did With it When I Caught it
An Unpleasant Surprise

Mark-Jason Dominus
Thursday, October 14, 1999 03:43:35 PM
In May of 1994, while I was reading the phl.food newsgroup, I saw something new. It was a message with this subject:
U.S. Green Card Lottery - New Immigration Opportunity
That was not what I expected to see in phl.food,
so I wrote to the author:
This has nothing at all to do with food, and you posted it to
phl.food. Please be more polite in the future and keep
announcements in relevant and appropriate groups.
And he sent me a reply:
People in your group are interested. Why do you wish to
deprive them of what they consider to be important
information??
I was really startled. Naïvely I had
expected that the author would recognize that
he had done something incorrect once it was
pointed out to him. Gosh, was I wrong!
That was the beginning of my life with spam. It
was the now infamous 'Green Card Spam' from
Lawrence Canter and Martha Siegel, a pair
of incompetent lawyers. But they were on the
leading edge of a big trend. Within two years
the newsgroups were clogged with spam, and at
the same time email spam was becoming common.
Spam, Spam Everywhere
The email spam was really starting to bother
me towards the end of 1996, when I was getting
several junk messages each week. (How quaint
that seems now!) I tried to figure out what
to do about the spam. Some of the plans
worked out well. Some were instant failures.
Some were failures but it took me years to
decide that they didn't work; those are the
most interesting ones.
My basic problem was that it annoyed me to get
spam mail in my inbox every morning. I wanted to
address this problem directly. Complaining back
to the source wasn't what I was looking for,
because I would still be annoyed. One solution
would have been to stop getting annoyed, but I
didn't have much luck with that, so I tried to
think of ways to prevent the spam from getting
into my inbox in the first place. I started
thinking about filtering.
The idea of filtering is simple: write a program
that will examine each incoming message, and
if it looks like spam, throw it away. If not,
the program delivers it to my inbox as usual.
The idea is simple, but the implementation
is hard. What does spam look like? And if I
couldn't recognize it infallibly, what would
happen when I made a mistake?
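The simple version of the idea can be sketched in a few lines of Python. This is only an illustration, not the filter described later in the article: the phrase list `SUSPECT_PHRASES` and the function names `looks_like_spam` and `filter_message` are hypothetical, and checking the subject line for fixed phrases is an assumed stand-in for the hard part, which is deciding what spam looks like.

```python
from email import message_from_string
from email.message import Message

# Hypothetical phrase list; the article does not say what spam looks like.
SUSPECT_PHRASES = ("green card lottery", "make money fast")


def looks_like_spam(msg: Message) -> bool:
    """Guess whether a message is spam from its subject line alone."""
    subject = (msg.get("Subject") or "").lower()
    return any(phrase in subject for phrase in SUSPECT_PHRASES)


def filter_message(raw: str, inbox: list) -> bool:
    """Examine one incoming message; return True if it was delivered."""
    msg = message_from_string(raw)
    if looks_like_spam(msg):
        return False       # looks like spam: throw it away
    inbox.append(msg)      # otherwise deliver it to the inbox as usual
    return True
```

Note that this naive version silently discards anything it flags, so every mistake in `looks_like_spam` costs a real message.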
Failing to recognize spam is called a false
negative, and is only a minor problem. The
only thing that would happen was that my blood
pressure would go up imperceptibly. But the
opposite problem, the false positive,
in which I incorrectly flagged and discarded
some message that wasn't spam, was much more
frightening. If I erroneously identified a
non-spam message as spam, and threw it away,
I might lose important messages from clients,
or from my lawyers, or from someone else
important. So I decided that, whatever happened,
no message must ever be thrown away until I had
had a chance to look at it. And messages that
are rejected mustn't go into an oubliette
somewhere unless someone is notified. I
had nightmares about getting a call from my
lawyer, asking why I hadn't gotten an important
communication the previous month, and having
to say it had gone into my spam box.
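Those two rules (never throw a message away unseen, and never reject one silently) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Mailbox` class, the `held` and `notices` lists, and the `deliver` function are hypothetical names, and here the notice goes to the mailbox owner, which is only one reading of "someone is notified".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Mailbox:
    inbox: List[str] = field(default_factory=list)
    held: List[str] = field(default_factory=list)     # suspected spam, kept for review
    notices: List[str] = field(default_factory=list)  # one notice per held message


def subject_of(raw: str) -> str:
    """Return the Subject header of a raw message, if it has one."""
    for line in raw.splitlines():
        if not line:
            break  # a blank line ends the headers
        if line.lower().startswith("subject:"):
            return line[len("subject:"):].strip()
    return "(no subject)"


def deliver(raw: str, looks_like_spam: Callable[[str], bool], box: Mailbox) -> str:
    """Deliver a message, or hold it and record a notice; never discard it."""
    if looks_like_spam(raw):
        box.held.append(raw)  # a false positive can still be recovered
        box.notices.append("Held for review: " + subject_of(raw))
        return "held"
    box.inbox.append(raw)
    return "delivered"
```

With this shape, a false positive costs a delay rather than a lost message, because the held copy and its notice both survive until someone looks at them.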