Friday, January 19, 2007

Server Monitoring: Nagios

As I mentioned last time, I am looking for an easy, free, stable, and highly functional network monitoring system. Nagios is my first stop in this investigation. Nagios has been around for a while, and I believe that I evaluated it a few years ago before throwing my hands up in frustration and hacking together a quick and dirty solution myself. In all fairness, I was incredibly busy at the time, and probably didn't give it a fair shake.

According to the website, it will do exactly what I want:

Nagios is a host and service monitor designed to inform you of network problems before your clients, end-users or managers do. It has been designed to run under the Linux operating system, but works fine under most *NIX variants as well. The monitoring daemon runs intermittent checks on hosts and services you specify using external "plugins" which return status information to Nagios. When problems are encountered, the daemon can send notifications out to administrative contacts in a variety of different ways (email, instant message, SMS, etc.). Current status information, historical logs, and reports can all be accessed via a web browser.
Well, I've never been much of one for believing in the whole truth in advertising thing, so I decided to give it a go on my own, and see how it works.
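For what it's worth, the "plugins" in that blurb are just small external programs: by the Nagios plugin convention, each one prints a single line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Here is a minimal sketch of what one might look like in Python. The disk check itself, the function names, and the 80/90 percent thresholds are my own assumptions for illustration, not anything that ships with Nagios.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style plugin: report disk usage for one path."""
import shutil
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a status code (thresholds are illustrative)."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/"):
    """Return (status code, one-line message) for the given path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check could not run at all, so the state is unknown.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - no capacity reported for {path}"
    percent = 100.0 * usage.used / usage.total
    status = classify(percent)
    return status, f"DISK {LABELS[status]} - {percent:.1f}% used on {path}"


if __name__ == "__main__":
    # Print one status line, then hand the status back as the exit code.
    status, message = check_disk(sys.argv[1] if len(sys.argv) > 1 else "/")
    print(message)
    sys.exit(status)
```

Run by hand, it prints one line and sets its exit code, which is all the monitoring daemon would need to decide whether to raise an alert.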

The network I decided to test it on consists of six machines. In addition, I have two development servers in an external data center that were doing nothing but humming, so I elected to include them in the tests. The machines are a mixture of FreeBSD and a couple of different Linux distros (CentOS and Debian).

Prior to actually installing this package, I did a bit of reading on their website. They have a number of helpful screenshots; here are a few that were of interest to me.

This is the status detail screen (and it would appear that someone is having a bad day with this particular network!). It looks quite helpful, and provides a good "dashboard" view of the various processes on a given machine.

Now this is interesting -- a status map of a network segment. I'm not sure how you define the map (I suspect it is painful), but it is an interesting way of graphically representing the layout of various workstations and servers. Nice touch.

This gives the bird's eye view of all monitored services. Simple, and effective. It seems that you can group services together, which would be very helpful.

After I finished amusing myself with screenshots and the propaganda on their site, I went over to the Wikipedia entry to see what it had to say. It's a short article, and simply lists the services it monitors, has pointers to helpful install guides, and mentions that it took the Nagios name in 2002, having previously been called NetSaint. The talk page had this obscure comment:
I would say this is a very handy application when argumented with Cacti. I just set up one and its really cool to see it in action.
I presume "argumented" was supposed to be "augmented", but there you go. But what is this "Cacti"? I'll have to find out.

I'm going to try a test install of this on the weekend. I'll keep you posted.