How I Caught the Spam and What I Did With it When I Caught it
An Unpleasant Surprise
October 14, 1999
In May of 1994, while I was reading the phl.food newsgroup, I saw something new. It was a message with this subject:
    U.S. Green Card Lottery - New Immigration Opportunity
That was not what I expected to see in phl.food, so I wrote to the author:
    This has nothing at all to do with food, and you posted it to
And he sent me a reply:
    People in your group are interested. Why do you wish to
I was really startled. Naïvely I had expected that the author would recognize that he had done something incorrect once it was pointed out to him. Gosh, was I wrong!
That was the beginning of my life with spam. It was the now-infamous 'Green Card Spam' from Laurence Canter and Martha Siegel, a pair of incompetent lawyers. But they were on the leading edge of a big trend. Within two years the newsgroups were clogged with spam, and at the same time email spam was becoming common.
Spam, Spam Everywhere
The email spam was really starting to bother me towards the end of 1996, when I was getting several junk messages each week. (How quaint that seems now!) I tried to figure out what to do about the spam. Some of the plans worked out well. Some were instant failures. Some were failures, but it took me years to decide that they didn't work; those are the most interesting ones.
My basic problem was that it annoyed me to get spam mail in my inbox every morning. I wanted to address this problem directly. Complaining back to the source wasn't what I was looking for, because I would still be annoyed. One solution would have been to stop getting annoyed, but I didn't have much luck with that, so I tried to think of ways to prevent the spam from getting into my inbox in the first place. I started thinking about filtering.
The idea of filtering is simple: write a program that will examine each incoming message, and if it looks like spam, throw it away. If not, the program delivers it to my inbox as usual.
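As a rough illustration of that idea, here is a minimal Python sketch. It is not the program I actually ran; the names `filter_messages` and `subject_mentions_lottery` are made up for this example, and the one-keyword test stands in for whatever "looks like spam" turns out to mean.

```python
from email import message_from_string
from email.message import Message
from typing import Callable, Iterable, List, Tuple


def filter_messages(
    raw_messages: Iterable[str],
    looks_like_spam: Callable[[Message], bool],
) -> Tuple[List[Message], List[Message]]:
    """Split raw messages into (inbox, discarded) using a spam predicate.

    `looks_like_spam` is a hypothetical, caller-supplied test; the hard
    part of filtering is deciding what goes inside it.
    """
    inbox: List[Message] = []
    discarded: List[Message] = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        if looks_like_spam(msg):
            discarded.append(msg)  # "throw it away"
        else:
            inbox.append(msg)      # deliver as usual
    return inbox, discarded


def subject_mentions_lottery(msg: Message) -> bool:
    """Deliberately crude example predicate: flag subjects containing 'lottery'."""
    subject = msg["Subject"] or ""
    return "lottery" in subject.lower()
```

The predicate is passed in as a parameter so that the delivery logic stays the same no matter how the definition of spam changes.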
The idea is simple, but the implementation is hard. What does spam look like? And if I couldn't recognize it infallibly, what would happen when I made a mistake?
Failing to recognize spam is called a false negative, and is only a minor problem. The only thing that would happen was that my blood pressure would go up imperceptibly. But the opposite problem, a false positive, in which some message that wasn't spam is incorrectly recognized as spam and discarded, was much more frightening. If I erroneously identified a non-spam message as spam and threw it away, I might lose important messages from clients, or from my lawyers, or from someone else important. So I decided that, whatever happened, no message must ever be thrown away until I had had a chance to look at it. And messages that are rejected mustn't go into an oubliette somewhere unless someone is notified. I had nightmares about getting a call from my lawyer, asking why I hadn't gotten an important communication the previous month, and having to say it had gone into my spam box.
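Those two rules, never discard a message unseen and never reject one silently, can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: `route_message`, the `quarantine` list, and the `notify` callback are hypothetical names, and a real system would write to mail folders and send a real notice rather than append to lists.

```python
from email.message import EmailMessage
from typing import Callable, List


def route_message(
    msg: EmailMessage,
    looks_like_spam: Callable[[EmailMessage], bool],
    inbox: List[EmailMessage],
    quarantine: List[EmailMessage],
    notify: Callable[[str], None],
) -> str:
    """Deliver msg, or hold it for review and report that it was held.

    Nothing is ever deleted here: a suspected message is kept in
    `quarantine` (hypothetical holding area) until a person looks at it,
    and `notify` (hypothetical callback) is told about every held message.
    """
    if looks_like_spam(msg):
        quarantine.append(msg)
        notify(f"Held for review: {msg['From']}: {msg['Subject']}")
        return "quarantined"
    inbox.append(msg)
    return "delivered"
```

With this arrangement a false positive costs a delay rather than a lost message, because the held mail is still there and somebody has been told about it.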