Learning NAGIOS 3.0

Storage Space

Making sure that a system is not running out of space is very important. A lack of disk space for basic paths such as /var/spool or /tmp might cause unexpected results throughout the entire system. Quotas that are not properly set up for home directories might also cause disk space to run out in a few minutes under certain circumstances.

Nagios can monitor storage space and warn administrators before such problems happen. It can also monitor shares on remote machines without mounting them. This is useful for monitoring disk space on Windows boxes without installing the dedicated Windows Nagios tools described in Chapter 10, Advanced Monitoring.

Virtual Memory Monitoring

Making sure that a system is not running out of swap space is essential to the system's correct behavior. Many operating systems have mechanisms that kill the most resource-intensive processes when the system is running out of memory. This usually leads to many services not functioning properly, because many vital processes are not respawned in such cases. It is, therefore, a good idea to monitor swap space usage in order to be able to handle low memory issues on critical systems. Nagios offers a plugin to monitor each swap device independently, as well as the ability to monitor cumulative values. The syntax and description of these options are as follows:

check_swap [-a] [-v] -w limit -c limit

Values for the -w and -c options can be supplied in the form <value>%, in which case at least <value> percent of swap space must be free; otherwise, the check reports the corresponding warning or critical state. They can also be supplied in the form <value><unit> (for example, 1000k, 100M, 1G), in which case the test fails if less than the specified amount of swap space is available.

A sample definition of a check is as follows:

  define command
  {
    command_name  check_swap
    command_line  $USER1$/check_swap -w $ARG1$ -c $ARG2$
  }
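
As a hedged illustration of how the command above might be attached to a host, the following service definition passes 20% and 10% as the warning and critical limits. The host name localhost, the generic-service template, and the threshold values are assumptions made for this sketch and do not come from the text above. Because check_swap inspects the machine it runs on, the service is attached to the Nagios server itself.

```cfg
# Sketch only: host_name, the template, and the limits are assumed values.
# $ARG1$ = 20% (warning) and $ARG2$ = 10% (critical) are free-space limits.
define service{
  use                  generic-service
  host_name            localhost
  service_description  Swap Usage
  check_command        check_swap!20%!10%
}
```

With these values, a warning would be expected once less than 20 percent of swap space is free, and a critical state once less than 10 percent is free.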

Monitoring IDE/SCSI SMART

Nagios offers a standard plugin that uses SMART (Self-Monitoring, Analysis, and Reporting Technology) to monitor and report disk failures. The plugin operates on top of the SMART mechanism and verifies the status of local hard drives, provided that the underlying IDE or SCSI hardware supports it. The syntax is as follows:

check_ide_smart [-d <device>] [-i] [-q] [-1] [-0] [-n]

The table below provides a description of the accepted options:

A sample definition of a command to monitor a particular device and report failed tests is as follows:

  define command
  {
    command_name  check_ide_smart
    command_line  $USER1$/check_ide_smart -d $ARG1$ -1 -q -n
  }
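
A minimal sketch of a service that uses this command is shown below. The device path /dev/hda, the host name localhost, and the generic-service template are assumptions chosen for illustration. The sketch also assumes that the user running Nagios is permitted to open the device, which is often not the case by default.

```cfg
# Sketch only: the device path, host_name, and template are assumed values.
# $ARG1$ is the device that is passed to the -d option.
define service{
  use                  generic-service
  host_name            localhost
  service_description  SMART Status hda
  check_command        check_ide_smart!/dev/hda
}
```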

Checking Disk Space

One of the most common checks is checking one or more mounted partitions for available space. Nagios offers a plugin for doing this. This plugin offers very powerful functionality, and can be set up to monitor one, several, or all partitions mounted in a system. The syntax for the plugin is as follows:

check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
           [-C] [-E] [-e] [-g group] [-k] [-l] [-M] [-m] [-R path]
           [-r path] [-t timeout] [-u unit] [-v] [-X type]

The most commonly-used options for this plugin are described in the following table:

Values for the -w and -c options can be supplied in the form <value>%, in which case at least <value> percent must be free; otherwise, the corresponding warning or critical state is reported. They can also be specified in the form <value><unit> (for example, 800k, 50M, and 4G), in which case the test fails if the available space is less than the specified amount. Checks for inode availability (options -W and -K) can only be specified in the form <value>.

It is possible to check a single partition or specify multiple -p, -r or -R options, and check if all matching mount points have sufficient disk space. It is sometimes better to define separate checks for each partition so that if the limits are exceeded on several of these, each one is tracked separately. The sample check commands for a single partition and for all partitions are shown in the following examples:

  define command
  {
    command_name  check_partition
    command_line  $USER1$/check_disk -w $ARG2$ -c $ARG3$ -p $ARG1$
  }
  define command
  {
    command_name  check_local_partitions
    command_line  $USER1$/check_disk -A -l -w $ARG1$ -c $ARG2$
  }

Both of these commands expect warning and critical levels, but the first example also requires a partition path or device as the first argument. It is possible to build more complex checks either by repeating the -p parameter or by using -r to include several mount points.
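
The following sketch shows one way a check with repeated -p parameters might look, together with services that call it and the check_partition command defined earlier. The command name check_system_partitions, the paths, the host name localhost, the generic-service template, and all limits are hypothetical values chosen for illustration.

```cfg
# Sketch only: command and host names, paths, and limits are assumed values.
# The thresholds are listed before the -p options as a cautious ordering.
define command{
  command_name  check_system_partitions
  command_line  $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p / -p /var -p /tmp
}

# Arguments for check_partition: path, warning limit, critical limit.
define service{
  use                  generic-service
  host_name            localhost
  service_description  Disk Space /home
  check_command        check_partition!/home!15%!5%
}

# Arguments for check_system_partitions: warning limit, critical limit.
define service{
  use                  generic-service
  host_name            localhost
  service_description  Disk Space system partitions
  check_command        check_system_partitions!10%!5%
}
```

Keeping /home in its own service means that a problem there is tracked separately from the combined check of the system partitions.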

Testing Free Space for Remote Shares

Nagios offers a plugin that allows the monitoring of remote file systems exported over the SMB/CIFS protocol, the standard protocol for file sharing used by Microsoft Windows®. This allows you to check whether a specified user is able to log on to a particular file server and to monitor the amount of free disk space on the file server. The syntax of this command is as follows:

check_disk_smb -H <host> -s <share> -u <user> -p <password> 
               -w <warn> -c <crit> [-W <workgroup>] [-P <port>]

Options specific to this plugin are described in the following table:

Values for the -w and -c options can be specified in the form <value>%, in which case the corresponding state is reported when at least <value> percent of the share's space is in use. They can also be specified in the form <value><unit> (for example, 800k, 50M, and 4G), in which case the test fails if the available space is less than the specified amount.

This command uses the smbclient command to communicate over the SMB protocol. It is, therefore, necessary to have the Samba client package installed on the machine where the test will be run.

Sample command definitions to check connectivity to a share without checking for disk space, and also to verify disk space over SMB, are as follows:

  define command
  {
    command_name  check_smb_connect
    command_line  $USER1$/check_disk_smb -H $HOSTADDRESS$ -w 100% -c 100% -u $ARG1$ -p $ARG2$ -s $ARG3$
  }
  define command
  {
    command_name  check_smb_space
    command_line  $USER1$/check_disk_smb -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -s $ARG3$ -w $ARG4$ -c $ARG5$
  }

Both commands require a username, password, and share name as arguments. The latter example also requires warning and critical limits. The first example will only issue a critical state if the share has no space left. It is also worth noting that Samba 3.x servers report the quota as disk space if quotas are enabled for the specified user. Therefore, this might not always be an accurate way to measure disk space.
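
As a rough sketch, the two commands might be used from service definitions like the ones below. The host name fileserver1, the account nagios, the password secret, the share name public, and the limits are all hypothetical. The sketch assumes that a host definition for fileserver1 exists, and that percentage limits are treated as used space, in line with the 100% values in the first command, so the warning value is lower than the critical one.

```cfg
# Sketch only: host, account, password, share, and limits are assumed values.
# Arguments for check_smb_connect: user, password, share.
define service{
  use                  generic-service
  host_name            fileserver1
  service_description  SMB Login
  check_command        check_smb_connect!nagios!secret!public
}

# Arguments for check_smb_space: user, password, share, warning, critical.
define service{
  use                  generic-service
  host_name            fileserver1
  service_description  SMB Space public
  check_command        check_smb_space!nagios!secret!public!85%!95%
}
```

Because the password appears in plain text in the check command, it is worth restricting read access to the configuration file that holds these definitions.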