Keep Tabs on Network Services with Nagios
Nagios Ain't Gonna Insist On Sainthood
June 12, 2006
Nagios provides an advanced server and device monitoring solution. It has become the de facto standard among open source service monitoring applications, and it is highly competitive with the non-free ones. This article will explain why Nagios is useful, and then cover some installation concepts to help get you started.
Previously known as the NetSaint project, Nagios morphed into its current form about four years ago. Nagios can monitor servers and their services, as well as network devices. More primitive monitoring solutions only allow for a simple ping to detect whether or not a server is still up and running. All too often, administrators find that a server will respond to pings even though none of its services are actually working. Nagios connects to the network services themselves to test that they function. To test a mail server, for example, it will connect and wait for the SMTP greeting before it declares the service operational. Nagios will monitor most common network-based services out of the box, and plug-ins exist for almost anything else.
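As a rough illustration of the difference between a ping and a real service check, here is a minimal Python sketch that connects to a mail server and waits for the SMTP greeting. It is not how Nagios's own SMTP check is implemented; the function names, the 10-second timeout, and the decision to treat anything other than a 220 greeting as a failure are assumptions made for this example.

```python
import socket

# Status codes borrowed from the usual Nagios plug-in convention.
OK, CRITICAL = 0, 2


def classify_greeting(banner: str) -> int:
    """Return OK only if the banner is an SMTP 220 'service ready' greeting."""
    return OK if banner.startswith("220") else CRITICAL


def check_smtp(host: str, port: int = 25, timeout: float = 10.0) -> int:
    """Connect to host:port and judge the service by its first response."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(1024).decode("ascii", errors="replace")
    except OSError:
        # Refused, unreachable, or timed out: no greeting, so not operational.
        return CRITICAL
    return classify_greeting(banner)
```

A host that answers pings but has a dead mail daemon would fail this check, because the connection is refused or the greeting never arrives.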
Plug-ins are where Nagios really shines. People have written countless feature extensions for Nagios, from SNMP-based queries to instant messenger hooks that allow notices to be sent via ICQ. Nagios has the built-in ability to send notifications of outages to a group of administrators, normally done via an email-to-pager gateway. By utilizing the available plug-ins, you can configure Nagios's notifications in many ways. One of the more popular plug-ins is a daemon that receives and generates alerts based on SNMP traps. The Nagios Exchange is a forum for finding and exchanging useful plug-ins. Browse around and see what some creative people have done. They've even created a Nagios live CD that you can use for testing (it comes preinstalled on a Knoppix disk).
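Part of the reason so many plug-ins exist is that the interface is simple: a plug-in is an ordinary executable that prints a line of status text and returns an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below shows that shape for a disk usage check. The 80 and 90 percent thresholds, the function names, and the output wording are assumptions for illustration, not taken from any existing plug-in.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(percent_used: float, warn: float, crit: float) -> int:
    """Map a usage percentage onto a plug-in status code."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def main(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    """Print one status line for `path` and return the matching exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself could not run, which is different from a failure.
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    percent = 100.0 * usage.used / usage.total
    status = evaluate(percent, warn, crit)
    print(f"DISK {LABELS[status]} - {percent:.1f}% used on {path}")
    return status
```

To use this as a standalone script, the file would end by passing the result of main() to sys.exit() so that the status reaches the caller as the process exit code.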
Nagios and its many plug-ins can monitor an astounding number of services. It can check, intelligently, all of the standard ones: Web, SSH, telnet (gasp), ftp, etc. But that's only the backend. Nagios wouldn't be very useful if it didn't notify people of these outages. It does, and it also provides a very intuitive Web page. But we aren't talking about just a simple page that displays some errors, a la syslog output. Nagios presents a control center from which you can monitor, acknowledge, and view the history of all your alerts.
From the Web page you can view all of your hosts, and their status, in an easy-to-read red-equals-bad, green-equals-good interface. But wait, there's more, as the infomercials would say. Nagios also allows users to comment on an event and includes a "schedule downtime" feature. When you comment on an alert, everyone else knows that you're working on it, or at least that you've acknowledged the failure and returned to bed. When you schedule downtime, Nagios suppresses all failure notifications for the affected hosts and services, so that your pager battery isn't depleted during the window of time that you expect things to be broken.
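The idea behind scheduled downtime can be sketched in a few lines of Python. This is only a model of the behavior described above, not Nagios's internal logic; the Downtime class, the should_notify function, and the host names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Downtime:
    """A maintenance window during which alerts for one host are muted."""
    host: str
    start: datetime
    end: datetime


def should_notify(host: str, when: datetime, downtimes: list[Downtime]) -> bool:
    """Return False if `when` falls inside any downtime scheduled for `host`."""
    return not any(
        d.host == host and d.start <= when <= d.end for d in downtimes
    )


# Hypothetical window: mail1 is being patched between 02:00 and 04:00.
window = Downtime("mail1", datetime(2006, 6, 12, 2, 0), datetime(2006, 6, 12, 4, 0))
print(should_notify("mail1", datetime(2006, 6, 12, 3, 0), [window]))  # inside the window
print(should_notify("web1", datetime(2006, 6, 12, 3, 0), [window]))   # different host
```

Failures on mail1 during the window stay quiet, while a failure on any other host, or on mail1 after 04:00, still pages someone.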
All of this software still wouldn't be useful if you had to hire an administrator just to maintain Nagios, as some commercial applications require. As you might have guessed, Nagios is fairly painless to get up and running. The process is the same as for most other open source applications: download, compile, install, configure.
There are a few things to realize, though. For checks that have to run on the monitored machine itself, such as disk space or load, Nagios needs to be able to SSH into your servers, and you probably want to run it as its own user. So use your account management system to create a 'nagios' user on all of your machines.
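To make the SSH requirement concrete, the sketch below runs a check command on a remote machine as the 'nagios' user and hands back its exit code and first line of output. It assumes key-based authentication is already set up for that user; the host name, the remote command path in the usage note, and the function names are hypothetical, and Nagios ships its own plug-in for this job rather than anything resembling this code.

```python
import subprocess


def build_ssh_command(host: str, remote_command: str, user: str = "nagios") -> list[str]:
    """Assemble a non-interactive ssh invocation for one remote check."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never stop to prompt for a password
        "-o", "ConnectTimeout=10",   # give up quickly on unreachable hosts
        f"{user}@{host}",
        remote_command,
    ]


def run_remote_check(host: str, remote_command: str) -> tuple[int, str]:
    """Run the command remotely; return (exit code, first line of output)."""
    try:
        result = subprocess.run(
            build_ssh_command(host, remote_command),
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        return 3, "remote check timed out"  # 3 is UNKNOWN in plug-in terms
    lines = result.stdout.splitlines()
    return result.returncode, lines[0] if lines else ""
```

A call such as run_remote_check("web1.example.com", "/usr/local/nagios/libexec/check_disk -w 20% -c 10%") would run the check on the remote host. Note that ssh itself exits with 255 when the connection fails, which a caller would need to treat as a monitoring failure and not as a check result.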