February 22, 2019

How I Caught the Spam and What I Did With it When I Caught it

An Unpleasant Surprise

  • October 14, 1999
  • By Mark-Jason Dominus
In May of 1994, while I was reading the phl.food newsgroup, I saw something new. It was a message with this subject:

    U.S. Green Card Lottery - New Immigration Opportunity

That was not what I expected to see in phl.food, so I wrote to the author:

    This has nothing at all to do with food, and you posted it to
    phl.food. Please be more polite in the future and keep
    announcements in relevant and appropriate groups.

And he sent me a reply:

    People in your group are interested. Why do you wish to
    deprive them of what they consider to be important

I was really startled. Naïvely I had expected that the author would recognize that he had done something incorrect once it was pointed out to him. Gosh, was I wrong!

That was the beginning of my life with spam. It was the now infamous 'Green Card Spam' from Lawrence Canter and Martha Siegel, a pair of incompetent lawyers. But they were on the leading edge of a big trend. Within two years the newsgroups were clogged with spam, and at the same time email spam was becoming common.

Spam, Spam Everywhere

The email spam was really starting to bother me towards the end of 1996, when I was getting several junk messages each week. (How quaint that seems now!) I tried to figure out what to do about the spam. Some of the plans worked out well. Some were instant failures. Some were failures, but it took me years to decide that they didn't work; those are the most interesting ones.

My basic problem was that it annoyed me to get spam mail in my inbox every morning. I wanted to address this problem directly. Complaining back to the source wasn't what I was looking for, because I would still be annoyed. One solution would have been to stop getting annoyed, but I didn't have much luck with that, so I tried to think of ways to prevent the spam from getting into my inbox in the first place. I started thinking about filtering.

The idea of filtering is simple: write a program that will examine each incoming message, and if it looks like spam, throw it away. If not, the program delivers it to my inbox as usual.
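
The shape of such a filter can be sketched in a few lines of Python. This is only an illustration, not the program described in this article: the `looks_like_spam` test, its list of suspicious subject phrases, and the `filter_message` function are all hypothetical placeholders.

```python
from email import message_from_string

# Hypothetical phrases; a real filter would need a much better test.
SUSPICIOUS_SUBJECTS = ("green card lottery", "make money fast")


def looks_like_spam(message):
    """Return True if the Subject header contains a suspicious phrase."""
    subject = str(message["Subject"] or "").lower()
    return any(phrase in subject for phrase in SUSPICIOUS_SUBJECTS)


def filter_message(raw_text, inbox):
    """Examine one incoming message; deliver it unless it looks like spam."""
    message = message_from_string(raw_text)
    if looks_like_spam(message):
        return False          # thrown away
    inbox.append(message)     # delivered as usual
    return True
```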

The idea is simple, but the implementation is hard. What does spam look like? And if I couldn't recognize it infallibly, what would happen when I made a mistake?

Failing to recognize spam is called a false negative, and it is only a minor problem. The only thing that would happen is that my blood pressure would go up imperceptibly. But the opposite problem, a false positive, in which the filter incorrectly identifies and discards a message that isn't spam, was much more frightening. If I erroneously identified a non-spam message as spam and threw it away, I might lose important messages from clients, or from my lawyers, or from someone else important. So I decided that, whatever happened, no message must ever be thrown away until I had had a chance to look at it. And messages that are rejected mustn't go into an oubliette somewhere unless someone is notified. I had nightmares about getting a call from my lawyer, asking why I hadn't gotten an important communication the previous month, and having to say it had gone into my spam box.
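
One way to express that rule in code is to make the filter incapable of discarding anything: a message goes either to the inbox or to a holding area, and every message held back adds a line to a report that someone will read. The Python sketch below is an illustration under those assumptions, not the system described here; `Mailbox`, `route`, and the injected `is_spam` test are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """In-memory stand-in for the inbox, the holding area, and a report."""
    inbox: list = field(default_factory=list)
    held: list = field(default_factory=list)
    report: list = field(default_factory=list)


def route(sender, subject, body, mailbox, is_spam):
    """File a message in the inbox or the holding area; never discard it.

    `is_spam` is whatever (fallible) test the filter uses. A false positive
    only costs a look at the report, because the message is still on hand.
    """
    message = {"from": sender, "subject": subject, "body": body}
    if is_spam(message):
        mailbox.held.append(message)
        # Record every held message so none of them vanishes silently.
        mailbox.report.append(f"held: {sender}: {subject}")
        return "held"
    mailbox.inbox.append(message)
    return "inbox"
```

Even with a test that wrongly flags everything, a message from a lawyer ends up in `held` with a matching line in `report`, so it can still be found and read.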
