Analyse Your Web Server in 10 Minutes
Introduction

If you are trying to run a web site, perhaps the top questions are "Where is my traffic coming from? Who are these people, and why are they visiting my site?" The first step in answering them is to examine your server's log files. One of the acknowledged leaders in log file analysis on Unix-based servers is the analog program. analog is available on Linux, of course, and that is what we will look at here. Within 10 minutes you will have basic information about your web server, updated daily.
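As a rough illustration of the first question, the Python sketch below tallies requests per client host from lines in the Common Log Format, a standard layout that Apache can write. It is not how analog works internally. It only shows the kind of count that analog's host report gives you. The sample log lines are made up.

```python
from collections import Counter


def top_hosts(lines, n=3):
    """Return the n most frequent client hosts as (host, count) pairs.

    In Common Log Format the client host is the first space-separated
    field of each line; blank lines are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts.most_common(n)


# Made-up sample lines in Common Log Format (documentation address ranges).
sample = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '198.51.100.7 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 1042',
    '192.0.2.10 - - [10/Oct/2000:13:57:12 -0700] "GET /news.html HTTP/1.0" 200 5120',
]

if __name__ == "__main__":
    print(top_hosts(sample))  # [('192.0.2.10', 2), ('198.51.100.7', 1)]
```

A real log can be fed in the same way by passing an open file object instead of the list, because iterating over a file yields its lines.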
Analog Install and Setup

Find the name 'analog' in your package manager and install it. Red Hat calls the package "analog-3.0-1.i386.rpm" and Debian calls it "analog". See our article on package managers for more details.

Tweaking the configuration

Analog has a great many configuration options for squeezing all kinds of bizarre statistics out of your server log files. But the clock is ticking, so we won't get into that now. Here is a simple sample file to get you started:

DNS WRITE
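For reference, a minimal file along these lines would use the five directives that the next paragraph describes. The hedged Python sketch below prints such a file. The directive names come from this article, but every argument after WRITE (the cache path, site title, excluded addresses and report path) is a hypothetical placeholder for you to replace with your own values.

```python
def render_analog_config(settings):
    """Join (directive, argument) pairs into analog configuration text."""
    return "".join(f"{directive} {argument}\n" for directive, argument in settings)


# Directive names come from the article; every argument after WRITE is a
# hypothetical placeholder to replace with your own paths and names.
settings = [
    ("DNS", "WRITE"),
    ("DNSFILE", "/var/cache/analog/dnscache"),
    ("HOSTNAME", '"My Web Site"'),
    ("HOSTEXCLUDE", "192.168.*"),
    ("OUTFILE", "/var/www/analog/report.html"),
]

if __name__ == "__main__":
    print(render_analog_config(settings), end="")
```

Redirect the output into analog's configuration file, wherever your package installs it. Keeping the settings as a list preserves their order, so DNS WRITE and DNSFILE stay on the first two lines.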
The first two lines speed up host name lookups, and HOSTNAME supplies the site name used in the report titles. HOSTEXCLUDE ignores hosts we aren't interested in, and OUTFILE says where the report is written. The report is generated in HTML, so load it into your browser.

Help! The graphics look wrong

The images in the reports should look OK, because the package installer puts them where analog expects to find them. If they don't appear, look at the image URLs. (In Debian Linux the images are held in /usr/doc/analog/images/, but the HTML refers to /doc/analog/images. This follows Debian policy, under which the /usr/doc tree is supposed to be web accessible.) To fix the problem, look at your web server configuration files, which are in /etc/apache or /var/lib/httpd. Check srm.conf and httpd.conf: one of them has a section of Alias directives, and which one depends on your version of Apache. Add an alias like:

Alias /doc/analog/images/ /usr/doc/analog/images/
Then restart the web server, and the images should appear. If they still don't, you are probably viewing the report through a file:/ URL. An Apache Alias only affects requests that go through the web server, so load the report through http:// instead and all will be well.

Making the report run every day

Next you want the report to run every day, so add a crontab entry that runs analog at 3:00 a.m. daily. See the tutorial "Time for Linux", which explains time-keeping and scheduling jobs on Linux, for more details.
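cron does the scheduling for you. In standard crontab syntax the first two time fields are the minute and the hour, so a 3:00 a.m. daily schedule begins with 0 3. The Python sketch below only models the arithmetic behind "3:00 a.m. daily": given the current time, it finds the next run. It uses naive datetimes, so daylight-saving changes are deliberately ignored.

```python
from datetime import datetime, timedelta


def next_daily_run(now, hour=3, minute=0):
    """Return the first time strictly after `now` that falls at hour:minute.

    Uses naive datetimes, so daylight-saving changes are ignored.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime(2000, 10, 10, 13, 55)))  # 2000-10-11 03:00:00
    print(next_daily_run(datetime(2000, 10, 10, 2, 30)))   # 2000-10-10 03:00:00
```

An early-morning slot such as 3:00 a.m. means the report covers a complete previous day, and it runs while the server is usually quiet.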
If You Have More Time

That's your ten minutes up! You should now have things set up to get basic information, updated daily, about how your web server performs. But there are always other little tweaks you can try.

If the cron job mails you each time it runs, saying it cannot produce a "referrer report", and you want a list of which other sites link to you, you will need to edit your httpd.conf file to make sure referrer data is being logged.

If you read the excellent analog docs and start adding features, or more likely trimming the data in the report, here are a few tips. First, there is an option, -settings, that dumps analog's state. This is very useful if you have made a typo or picked the wrong options by mistake, because it in effect "reads back" your analog options file. It also shows you the default settings.

Another tip: if you use DNS WRITE and a DNSFILE, analog runs fast enough that experimenting is easy. Once all the DNS information is cached, analog will process more than a megabyte of log file content per second, so you can simply alter the settings and run it again and again.

A more complex configuration file
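A file along these lines adds the further options described in the next paragraph to the basic directives. A misspelt directive is exactly the kind of slip the -settings tip helps catch, so here is a hedged Python sketch of a similar "read back" check. It lists each directive in a configuration text and flags any name outside a small, hand-picked set taken from this article. It is not a substitute for analog's own -settings output. The set of known names is an assumption, because analog accepts many more directives than the few shown here.

```python
# Directive names mentioned in this article; analog itself accepts many more,
# so treat this set as a hand-picked assumption, not analog's full list.
KNOWN_DIRECTIVES = {
    "DNS", "DNSFILE", "HOSTNAME", "HOSTEXCLUDE", "OUTFILE",
    "MONTHLY", "WEEKLY", "DAILY", "REFREPEXCLUDE", "FILEEXCLUDE", "REFERRER",
}


def read_back(config_text, known=KNOWN_DIRECTIVES):
    """Echo each directive line and collect names missing from `known`.

    Blank lines and lines starting with '#' are skipped. Returns a list of
    (directive, argument) pairs and a list of unrecognised directive names.
    """
    settings, unknown = [], []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # directive, then the rest of the line
        directive = parts[0]
        argument = parts[1] if len(parts) > 1 else ""
        settings.append((directive, argument))
        if directive not in known:
            unknown.append(directive)
    return settings, unknown


# Hypothetical file using the options described in the article,
# with REFERRER deliberately misspelt on the last line.
SAMPLE = """\
MONTHLY OFF
WEEKLY OFF
DAILY OFF
REFREPEXCLUDE http://www.linuxplanet.com/linuxplanet/*
FILEEXCLUDE /ads/*
REFFERER ON
"""

if __name__ == "__main__":
    settings, unknown = read_back(SAMPLE)
    for directive, argument in settings:
        print(directive, argument)
    print("unrecognised:", unknown)  # unrecognised: ['REFFERER']
```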
Here are some selected further options for analog. MONTHLY OFF, WEEKLY OFF and DAILY OFF turn off the reports that deal with months, weeks and days; if you run the report every day, you might like to ignore this data. REFREPEXCLUDE http://www.linuxplanet.com/linuxplanet/* means: in the referring URL report, ignore all referring URLs from that site. In this case I do not want to see internal links within Linuxplanet, only links from outside. FILEEXCLUDE /ads/* leaves the adverts out of the stats; as the content editor at Linuxplanet, I am not much interested in the hit rates on the adverts. The other ON directives make sure that the reports I do want are switched on. The REFERRER report, for example, is off by default.

What you do with the statistics analog gives you is up to you. For some ideas, take a look at this excellent guide to web server log analysis at The Web Designers Virtual Library.
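The two exclusion patterns in the options above both use * as a wildcard. As a hedged illustration of what that matching means, the Python sketch below drops any item that a pattern matches in full, treating * as any run of characters and everything else as literal. This rule is an assumption drawn from the article's two examples. analog's documentation describes its actual matching rules, which may differ in detail (for example, in case sensitivity). The URLs and file names in the demo are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a pattern in which '*' matches any run of characters.

    Every other character is matched literally, and the pattern must
    match the whole string.
    """
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(literal_parts) + r"\Z")


def drop_excluded(items, patterns):
    """Return the items that match none of the exclusion patterns."""
    compiled = [wildcard_to_regex(p) for p in patterns]
    return [item for item in items if not any(rx.match(item) for rx in compiled)]


if __name__ == "__main__":
    # Hypothetical referring URLs and requested files.
    referrers = [
        "http://www.linuxplanet.com/linuxplanet/tutorials/1234/",
        "http://www.example.org/links.html",
    ]
    print(drop_excluded(referrers, ["http://www.linuxplanet.com/linuxplanet/*"]))
    # ['http://www.example.org/links.html']
    print(drop_excluded(["/ads/banner1.gif", "/index.html"], ["/ads/*"]))
    # ['/index.html']
```

Escaping the literal parts matters: without re.escape, the dots in www.linuxplanet.com would match any character.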