October 30, 2014
 
 

Charting and Graphing Logfiles for Linux Server Admins

  • March 10, 2011
  • By Akkana Peck
Some Linux server admins are comfortable with wading through text logfiles, but why wade when you can create beautiful charts and graphs that highlight trouble spots? Try the excellent CairoPlot for beautiful, informative visual server log analysis.
As a data junkie, I'm forever looking for better ways to display charts and graphs, especially from Python. There are lots of Python plotting packages available, but if you want output that's pretty enough that even your Mac friends won't laugh, the best bet I've found is CairoPlot.

CairoPlot isn't packaged for most distros, but it's an easy install. The current release, version 1.1, is available from the CairoPlot Launchpad page. You can download cairoplot-1.1.tar.gz from there, or check it out with bzr if you prefer. (Once 1.2 is ready the project may move to SourceForge.)

Extract the tarball:
$ tar xvf cairoplot-1.1.tar.gz
then copy one file, cairoplot-1.1/CairoPlot.py, to the directory where you'll be developing your Python script.

Pie Charts: Who's sending spam?

When playing with plotting, finding a good source of data is always the first step. For this project, let's analyze a Postfix log file, /var/log/mail.info, to look at the sources of one class of spam.

A casual glance through the file reveals that we're getting a lot of mail delivery attempts where the sender announces itself with a host name that doesn't really exist, like this one:

Mar 5 15:05:45 mailserver postfix/smtpd[29764]: NOQUEUE: reject: RCPT from 212.199.94.45.static.012.net.il[212.199.94.45]: 450 4.7.1 <ex02.maccabiworld.org>: Helo command rejected: Host not found; from=<> to= proto=ESMTP helo=

Our Postfix server rejects mail like this, because it's usually spam. Properly configured mail servers shouldn't make up bogus host names -- though a few misconfigured ones do.

But where do these bogus requests come from? Do they come from specific countries? How many from .com or .org versus from specific country domains?

To find out, I'll create a Python dictionary, then use CairoPlot to plot a pie chart. Each key in the dictionary will be a top-level domain, e.g. "com"; the value will be the number of rejected messages seen from that domain.
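As a rough sketch of that counting idea, here's the dictionary being filled from a short list of hostnames. The hostnames are hypothetical, made up just for this illustration; the real ones will come from the log file.

```python
# Hypothetical hostnames, made up for this illustration.
hostnames = ["mail.example.com", "relay.example.org", "mx.example.com"]

# Key: top-level domain; value: how many times we've seen it.
counts = {}
for host in hostnames:
    # Everything after the last "." is the top-level domain.
    tld = host[host.rfind(".") + 1:]
    # get() returns 0 the first time a TLD shows up.
    counts[tld] = counts.get(tld, 0) + 1

print(counts)
```

Using get() with a default is just one way to handle a key that isn't in the dictionary yet; an explicit "if tld in counts" test does the same job.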

Parsing the Log File

Filling out the dictionary means parsing /var/log/mail.info. The host each message really came from shows up right after "RCPT from" in the log line; extract it using Python's re module. Since this is an article about CairoPlot, not Python regular expressions, just take my word for the code that follows.

#! /usr/bin/env python

import CairoPlot, re

MAIL_INFO = "/var/log/mail.info"

# Dictionary to store the results as (domain : number of rejects)
rejected = {}

# Parse mail.info to find all the 'NOQUEUE: reject' lines and
# figure out what top-level domains (TLDs) they're coming from.
with open(MAIL_INFO) as f :
    for line in f :
        if 'status=sent' in line :
            # A successful delivery; nothing to count.
            pass
        elif 'NOQUEUE: reject' in line :
            # An attempt we rejected. Look for a pattern like
            # RCPT from foo.example.com[nnn.nnn.nnn.nnn]
            # (a raw string keeps the backslashes intact for re).
            rcpt = re.search(r"RCPT from ([^\[]*)\[([0-9.]+)\]", line)
            if not rcpt :
                continue
            # Now rcpt.group(1) is the reverse-DNS hostname (if any)
            # from the log file, rcpt.group(2) is the IP address.
            if rcpt.group(1) and rcpt.group(1) != 'unknown' :
                hostname = rcpt.group(1)
            else :
                hostname = None

            # Find the part after the last "."
            tld = "Unknown"   # default if there's no "." in the hostname
            if hostname :
                dot = hostname.rfind(".")
                if dot >= 0 :
                    tld = hostname[dot+1:]
            if tld in rejected :
                # We've seen this TLD before; add 1.
                rejected[tld] += 1
            else :
                # First time we've seen this TLD.
                rejected[tld] = 1
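If you'd rather not take my word for it, here's a quick check of what the regular expression pulls out of the sample log line shown earlier. This is only a sketch: the line is shortened to the part that matters, and the pattern is the same one written as a raw string.

```python
import re

# The sample log line from earlier, cut off after the reject reason.
sample = ("Mar 5 15:05:45 mailserver postfix/smtpd[29764]: NOQUEUE: reject: "
          "RCPT from 212.199.94.45.static.012.net.il[212.199.94.45]: "
          "450 4.7.1 <ex02.maccabiworld.org>: Helo command rejected: "
          "Host not found")

# Group 1: everything between "RCPT from " and the "[" (the hostname).
# Group 2: the digits and dots inside the brackets (the IP address).
rcpt = re.search(r"RCPT from ([^\[]*)\[([0-9.]+)\]", sample)

hostname = rcpt.group(1)
ip_address = rcpt.group(2)
# The top-level domain is whatever follows the last "." in the hostname.
tld = hostname[hostname.rfind(".") + 1:]

print("%s %s %s" % (hostname, ip_address, tld))
```

For this line the hostname ends in .il, so it counts toward the 'il' entry in the dictionary.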

At the end of this, rejected is a dictionary suitable for passing to CairoPlot, like this:

{'ru': 3, 'ch': 1, 'ma': 2, 'rs': 2, 'it': 4, 'hu': 1, 'cz': 1, 'ar': 2, 'il': 35, 'br': 16, 'es': 1, 'co': 2, 'net': 4, 'com': 24, 'pl': 7, 'at': 2}
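Before plotting anything, it can help to eyeball the biggest entries. Here's a minimal sketch that ranks the sample dictionary above by count; the variable names ranked and total are mine, and on a live system rejected would come from the parsing loop rather than being typed in.

```python
# The sample results shown above, typed in so this runs on its own.
rejected = {'ru': 3, 'ch': 1, 'ma': 2, 'rs': 2, 'it': 4, 'hu': 1,
            'cz': 1, 'ar': 2, 'il': 35, 'br': 16, 'es': 1, 'co': 2,
            'net': 4, 'com': 24, 'pl': 7, 'at': 2}

# Sort the (tld, count) pairs with the largest count first.
ranked = sorted(rejected.items(), key=lambda item: item[1], reverse=True)
total = sum(rejected.values())

# Print the top three TLDs with their share of all rejects.
for tld, count in ranked[:3]:
    print("%-4s %3d  %5.1f%%" % (tld, count, 100.0 * count / total))
```

In this sample, 'il', 'com' and 'br' head the list, which are the slices you'd expect to dominate the pie.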