Analyse Your Web Server in 10 Minutes

By: James Andrews
Monday, August 23, 1999 05:42:20 PM EST
URL: http://www.linuxplanet.com/linuxplanet/tutorials/929/1/

Introduction

If you run a web site, the first questions you ask are probably "Where is my traffic coming from? Who are these people and why are they visiting my site?" The first step towards answering them is to examine the server's log files. One of the acknowledged leaders in logfile analysis on Unix-based servers is the analog program. It is available for Linux, of course, and that is what we will look at here. Within 10 minutes you will have:

  • Installed analog from a package
  • Edited an analog configuration file
  • Set up a cron job to do daily reports

    Analog Install and Setup

    Look for 'analog' in your package manager and install it. Red Hat calls the package "analog-3.0-1.i386.rpm"; Debian calls it "analog".

    See our article on package managers for more details.

    Tweaking the configuration

    Analog has a great many configuration options for squeezing all kinds of bizarre statistics out of your server log files. But the clock is ticking, so we won't get into that now. Here is a simple sample file to get you started. Save it as, say, /home/james/analog.ini, which is the path the cron job later in this article expects:


    DNS WRITE
    DNSFILE /tmp/dnsfile.txt
    HOSTNAME "LinuxPlanet"
    HOSTEXCLUDE mordell.ex.ac.uk
    OUTFILE /home/james/public_html/outputfile.html

    The first two lines speed up host name lookup: analog saves the names it resolves in the DNSFILE and reuses them on later runs. HOSTNAME is part of the titles used to pretty up the report. HOSTEXCLUDE is used to ignore hosts we aren't interested in, and OUTFILE is where the report output goes. The report is generated in HTML format, so load it into your browser.
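
To try the configuration before scheduling anything, you can run analog once by hand. The sketch below assumes the sample file is saved as /home/james/analog.ini (the path used by the cron entry in this article) and that analog is on your PATH. run_report is a hypothetical helper name, not part of analog.

```shell
# run_report CONFIG: run analog once with the given configuration file.
# The +g option names the configuration file, as in the cron entry in this article.
run_report() {
    config="$1"
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 1
    fi
    analog +g"$config"
}

# Example (paths are assumptions; adjust them to your system):
#   run_report /home/james/analog.ini
#   then open /home/james/public_html/outputfile.html in your browser
```

Checking that the file is readable first gives a clearer error than a silent empty report if the path is mistyped.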

    Help! The graphics look wrong

    The images in the reports should look fine, because the package installer puts them where analog expects to find them. If they don't appear, look at the image URLs in the report. (In Debian Linux the images are held at /usr/doc/analog/images/, but the HTML says /doc/analog/images. This is because of Debian policy: the /usr/doc tree is supposed to be web accessible.) To fix the problem, look at your web server configuration files, which are in /etc/apache or /var/lib/httpd. Check srm.conf and httpd.conf: one of them will have a section with Aliases, and which one depends on the version of Apache. Add an alias like:


    Alias /doc/analog/images/ /usr/doc/analog/images/

    and restart the server with the command apachectl graceful

    Then it should be fine. If it isn't, you are probably viewing the report through a file:/ URL; use an http:// URL instead and all will be well.

    Making the report run every day

    Next you want the report to run every day; type in crontab -e at your shell prompt and then add this line to the file:

    0 3 * * * /usr/bin/analog +g/home/james/analog.ini

    This tells cron to run the given command at 3:00 a.m. every day.
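
A crontab line is made of five time fields (minute, hour, day of month, month, day of week) followed by the command. As a small illustration, the sketch below assembles the same entry from named shell variables. The variable names are only there for explanation; cron itself never sees them.

```shell
# The five time fields of a crontab entry, in order.
minute=0            # at minute 0 ...
hour=3              # ... of hour 3, i.e. 3:00 a.m.
day_of_month='*'    # on every day of the month
month='*'           # in every month
day_of_week='*'     # on every day of the week

# The command to run; the configuration path is the one used in this article.
command='/usr/bin/analog +g/home/james/analog.ini'

entry="$minute $hour $day_of_month $month $day_of_week $command"
echo "$entry"

# To install the entry without opening an editor (this rewrites your crontab):
#   (crontab -l 2>/dev/null; echo "$entry") | crontab -
```

Changing hour to 4, for example, would move the daily run to 4:00 a.m.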

    For more details, see the tutorial "Time for Linux", which explains time-keeping and scheduling jobs on Linux.

    If You Have More Time

    That's your ten minutes up! You should now have things set up to be able to get basic information, updated daily, about how your web server performs. But there are always other little tweaks you can try.

    If the cron job mails you each time it runs to say that it cannot produce a "referrer report", and you want a list of the sites that link to you, edit your httpd.conf file to make sure that referrer data is being logged.
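
Referrer data only reaches analog if the web server writes it to the access log. In Apache this is usually done by logging in the "combined" format rather than the "common" one. As a rough check, the sketch below tests whether a log line ends with the two quoted fields (referrer and user agent) that the combined format adds. has_referrer is a hypothetical helper, and the log path in the usage comment is an assumption.

```shell
# has_referrer LINE: succeed if LINE looks like Apache "combined" format,
# i.e. the status code and size are followed by two quoted fields.
# This is a rough pattern check, not a full log parser.
has_referrer() {
    printf '%s\n' "$1" |
        grep -Eq '" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"$'
}

# Example: check the most recent entry (log path is an assumption):
#   has_referrer "$(tail -n 1 /var/log/apache/access.log)" && echo "referrers are logged"
```

If the check fails, look for the CustomLog line in your Apache configuration and see which log format it names.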

    If you read the excellent analog docs and start adding features or, more likely, trimming the data in the report, here are a few tips. First, the -settings option dumps analog's state. This is very useful if you have made a typo or picked the wrong option by mistake, because it in effect "reads back" your analog options file. It also shows you the default settings. Second, if you use DNS WRITE and a DNSFILE, analog runs so fast that experimenting is easy. Once all the DNS information is cached, analog will process more than a megabyte of logfile content per second, so you can simply alter the settings and run it again and again.
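
A convenient way to use -settings while experimenting is to read the dump through a pager before generating the real report. The sketch below assumes the configuration lives at /home/james/analog.ini and that -settings can be combined with +g on one command line. show_settings is a hypothetical helper name.

```shell
# show_settings CONFIG: ask analog to print the settings it would use
# for the given configuration file (assumes -settings and +g can be combined).
show_settings() {
    analog -settings +g"$1"
}

# Example (path is an assumption):
#   show_settings /home/james/analog.ini | less
```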

    A more complex configuration file


    MONTHLY OFF
    WEEKLY OFF
    DAILY OFF
    REFREPEXCLUDE http://www.linuxplanet.com/linuxplanet/*
    FILEEXCLUDE /ads/*
    DNS WRITE
    DNSFILE /tmp/dnsfile.txt
    HOSTNAME "LinuxPlanet"
    DOMAINSFILE /root/alli
    REFSITE ON
    DOMAIN ON
    REFERRER ON

    Here are some selected further options for analog. MONTHLY OFF, WEEKLY OFF and DAILY OFF turn off the reports that deal with months, weeks and days. If you are running the report every day you might like to ignore this data. REFREPEXCLUDE http://www.linuxplanet.com/linuxplanet/* means: leave all referrals from that site out of the referring URL report. In this case I do not want to see any internal links on LinuxPlanet, only the links from outside. FILEEXCLUDE /ads/* leaves the adverts out of the stats. As the content editor on LinuxPlanet, I am not much interested in the hit rates on the adverts.

    The other ON directives make sure that the reports I do want are on. REFERRER would be off by default.

    What you do with the statistics this article offers is up to you. Take a look at this excellent guide to web server log analysis at The Web Designers Virtual Library for some ideas.

    Copyright Jupitermedia Corp. All Rights Reserved.