Monday, August 31, 2015

Monitor Remote Linux System with Nagios using SSH


NRPE is the most popular method for monitoring remote Linux systems. In some cases, however, we don't want to install NRPE on the remote system, we can't install it, or we are restricted by a firewall, since NRPE requires TCP port 5666 to be open. In that case we can use SSH instead, through the check_by_ssh plugin.

The picture above shows the Nagios server checking remote Linux servers over SSH. In this setup the monitoring host is the SSH client, and the remote servers are the SSH servers.

In this tutorial, I will set up a Nagios server to monitor one remote host (remote1). Both machines run RHEL 6.x.

Download the Nagios installer from nagios.org (I use nagios-4.0.8, i.e. Nagios Core) and nagios-plugins-2.0.3, then extract both tarballs on the server.

Step 1:  Create Nagios User and Group

useradd nagios
passwd nagios
groupadd nagcmd
usermod -G nagcmd nagios
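To confirm that Step 1 worked, you can check that nagios actually ended up in nagcmd. Below is a minimal POSIX sh sketch. The helper name user_in_group is my own and is not something Nagios provides:

```shell
#!/bin/sh
# user_in_group USER GROUP
# Exits 0 if USER belongs to GROUP (primary or supplementary), non-zero
# otherwise. Hypothetical helper; it only uses id(1), tr(1) and grep(1).
user_in_group() {
    # id -nG prints every group name of USER on one line, space-separated
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Report whether Step 1 put the nagios user into nagcmd
if user_in_group nagios nagcmd; then
    echo "nagios is a member of nagcmd"
else
    echo "nagios is not a member of nagcmd (or the user does not exist)"
fi
```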

Step 2: Install Nagios and its dependencies on the Nagios host

yum -y install openssl-devel.x86_64
yum -y install perl-Date-Manip.noarch
yum -y install perl-TimeDate.noarch perl-Date-Calc.noarch perl-DateTime.x86_64
yum -y install mysql.x86_64
yum install -y httpd php gcc glibc glibc-common gd gd-devel make net-snmp

Depending on your RHEL installation, Nagios may need additional libraries or packages.

In the extracted nagios-4.0.8 directory, run:

./configure --with-nagios-group=nagios --with-command-group=nagcmd

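Read the configure output carefully to see which dependencies are still missing before compiling. Once everything is in place, proceed with:

make all && make install && make install-init && make install-config && make install-commandmode && make install-webconf

This one-liner compiles Nagios and installs the program, the init script, the sample configuration files, the permissions for the external command directory, and the Apache configuration for the web interface. Each step runs only if the previous one succeeded.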

Step 3: Install plugins

In the extracted nagios-plugins-2.0.3 directory, run:

./configure --with-nagios-group=nagios --with-command-group=nagcmd --with-gd-lib=/usr/lib --with-gd-inc=/usr/include --with-openssl=/usr/bin/openssl  --enable-perl-modules

make && make install

The plugins are installed in /usr/local/nagios/libexec/.
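It is worth confirming that the plugins this tutorial relies on were actually built. check_by_ssh in particular is only compiled when configure finds an ssh client (openssh-clients on RHEL), so it can be silently missing. Here is a small sketch. The helper name missing_plugins is hypothetical:

```shell
#!/bin/sh
# missing_plugins DIR NAME...
# Prints every NAME that is not an executable file in DIR, one per line;
# prints nothing if all of them are present. Hypothetical helper.
missing_plugins() {
    dir=$1
    shift
    for name in "$@"; do
        [ -x "$dir/$name" ] || echo "$name"
    done
}

# Usage after `make install` on the Nagios server:
# missing_plugins /usr/local/nagios/libexec check_by_ssh check_tcp check_ftp check_disk
```

If check_by_ssh shows up in the output, install the ssh client, then re-run configure and make for the plugins.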

Set the httpd and nagios services to start at runlevels 3 and 5:

chkconfig --level 35 nagios on
chkconfig --level 35 httpd on
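To double-check the runlevel settings, run `chkconfig --list nagios`. It prints one line such as `nagios 0:off 1:off 2:off 3:on 4:off 5:on 6:off`. The sketch below checks such a line for runlevels 3 and 5. The function name is hypothetical:

```shell
#!/bin/sh
# on_at_runlevels_3_and_5 LINE
# LINE is one line of `chkconfig --list SERVICE` output, for example
#   nagios  0:off 1:off 2:off 3:on 4:off 5:on 6:off
# Exits 0 only if both runlevel 3 and runlevel 5 are "on".
# Hypothetical helper name, for illustration only.
on_at_runlevels_3_and_5() {
    # Put each whitespace-separated field on its own line
    fields=$(printf '%s\n' "$1" | tr -s ' \t' '\n\n')
    printf '%s\n' "$fields" | grep -qx '3:on' &&
        printf '%s\n' "$fields" | grep -qx '5:on'
}

# Usage on the RHEL 6 Nagios host:
# for svc in nagios httpd; do
#     if on_at_runlevels_3_and_5 "$(chkconfig --list "$svc")"; then
#         echo "$svc: enabled at runlevels 3 and 5"
#     else
#         echo "$svc: NOT enabled at runlevels 3 and 5"
#     fi
# done
```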


Step 4: Create a Default User for Web Access

htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Enter a password for nagiosadmin when prompted. You will use this password to log in to the main page. Then start the nagios and httpd services:

service nagios start
service httpd start

Access the main page at  http://Your-server-IP-address/nagios

At this point, you will see only one host: localhost (the Nagios server).

Step 5: Configure password-less ssh login to the remote servers.

The central Nagios server must be able to connect to the remote host over SSH without a password. To set this up, create a public/private key pair with no passphrase as the user that runs the Nagios service (typically "nagios"). Then copy the public key to the remote server and log in to the remote system once as user "nagios".

On the remote servers, add the nagios user:

useradd nagios
passwd nagios
groupadd nagcmd
usermod -G nagcmd nagios

On the Nagios server, if you are working as root, switch to the nagios user:

su - nagios

Create a key pair and distribute the public key to the remote servers:

ssh-keygen -t dsa

Accept the default values and do not set a passphrase.
The public key will be created at /home/nagios/.ssh/id_dsa.pub.

ssh-copy-id -i /home/nagios/.ssh/id_dsa.pub nagios@remote1

Once this is done, ssh to the remote server as the nagios user. This time you won't be asked for a password.
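Since check_by_ssh runs non-interactively, the login must work without any prompt at all. One way to test this relies on OpenSSH's standard BatchMode option. The sketch is below, and the function name is hypothetical:

```shell
#!/bin/sh
# can_ssh_without_password TARGET
# TARGET is user@host, e.g. nagios@remote1. BatchMode=yes makes ssh fail
# instead of prompting for a password or passphrase, so exit status 0
# means the key-based login works non-interactively.
# It also fails if the host key is not yet in ~/.ssh/known_hosts, which
# would trip up check_by_ssh as well.
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

# Usage, as the nagios user on the Nagios server:
# can_ssh_without_password nagios@remote1 && echo "key login OK"
```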

Step 6: Copy nagios plugins to the remote server.

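On the Nagios server, as the nagios user, copy the plugins to the remote server. I will copy them all to /home/nagios/libexec. Create that directory first, because scp will not create it:

ssh nagios@remote1 "mkdir -p /home/nagios/libexec"
scp /usr/local/nagios/libexec/*  nagios@remote1:/home/nagios/libexec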

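On the Nagios server the plugins are in /usr/local/nagios/libexec/, while on remote1 they are in /home/nagios/libexec. Note that check_mem.pl, check_ssl_cert and check_mysql_slavestatus.sh, which are used in Step 7, are not part of nagios-plugins. Download them separately (for example from Nagios Exchange) and place them in /home/nagios/libexec on the remote server.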
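Before wiring anything into Nagios, you can run a copied plugin over plain ssh to make sure it works on remote1. Nagios plugins signal their result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below translates that code into a state name. state_name is a hypothetical helper:

```shell
#!/bin/sh
# state_name EXIT_CODE
# Nagios plugins report their result through the exit status:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
# Anything else (e.g. 255, which ssh returns on connection errors) is
# outside the plugin convention and is flagged as unexpected.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "UNEXPECTED($1)" ;;
    esac
}

# Usage, as the nagios user on the Nagios server: run a copied plugin
# straight over ssh and translate its exit status.
# ssh nagios@remote1 /home/nagios/libexec/check_disk -w 20% -c 10% -p /
# state_name $?
```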

Step 7: Detailed configuration

On the Nagios server, define the remote hosts to check in the localhost.cfg configuration file (/usr/local/nagios/etc/objects/localhost.cfg in a default install). You can also define a host group for a better view in the Nagios web interface.

I will define entries for the Nagios server itself, called nms, and for the remote server, named remote1:

# Define a host for the nagios server in credit dept

define host{
        use              linux-server
        host_name        nms
        alias            nms
        address          192.168.122.16
        icon_image       redhat.gif
        statusmap_image  redhat.gd2
        check_command    check_tcp_port ; command1
        passive_checks_enabled 1
}


## Define a host for linux servers in credit dept in a remote site

## 1) remote1 ##

define host{
        use                     linux-server
        host_name               remote1
        alias                   remote1
        address                 10.22.122.122
        icon_image              redhat.gif
        statusmap_image         redhat.gd2
        check_command           check_tcp_port ; command1
        passive_checks_enabled  1
}


## Now let's define the services to check on the remote1 server

# remote1

define service{
        use                     local-service
        host_name               remote1
        service_description     MySQL
        check_command           check_ssh_mysql_port ; command2
        }


define service{
        use                     local-service
        host_name               remote1
        service_description     VSFTPD-HA
        check_command           check_ssh_ftp ; command3
        }


define service{
        use                     local-service
        host_name               remote1
        service_description     VSFTPD-CERT-EXPIRY
        check_command           check_ssh_ssl_cert ; command4
        }


define service{
        use                     local-service
        host_name               remote1
        service_description     Disk Utilization
        check_command           check_ssh_free_space ; command5
        }


define service{
        use                     local-service
        host_name               remote1
        service_description     MEMORY Utilization
        check_command           check_ssh_free_mem ; command6
        }

define service{
        use                     local-service
        host_name               remote1
        service_description     Uptime
        check_command           check_ssh_uptime ; command7
        }


define service{
        use                     local-service
        host_name               remote1
        service_description     MySQL Replication
        check_command           check_ssh_mysqlrepl ; command8
        }



## End host and service definitions


The icon_image and statusmap_image directives set the redhat.gif and redhat.gd2 icons for that host in the Nagios web interface. The check_command uses the command check_tcp_port, defined in commands.cfg below. It checks port 22 (ssh), and if the connection succeeds the host status is UP.
We will come back to the check commands (marked command1 to command8) after this.

Why not just use ping for the host check? As I said earlier, a remote server may sit behind a firewall and not be pingable for whatever reason.
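The host group mentioned at the start of this step can be added to localhost.cfg as well. The sketch below appends one that groups nms and remote1. The group name credit-linux and its alias are hypothetical, so choose your own:

```shell
#!/bin/sh
# append_hostgroup CFG_FILE
# Appends a hostgroup definition grouping nms and remote1 to CFG_FILE.
# "credit-linux" and "Credit Dept Linux Servers" are illustrative names.
append_hostgroup() {
    cat >> "$1" <<'EOF'

define hostgroup{
        hostgroup_name  credit-linux
        alias           Credit Dept Linux Servers
        members         nms,remote1
        }
EOF
}

# Usage on the Nagios server (as a user allowed to write the file):
# append_hostgroup /usr/local/nagios/etc/objects/localhost.cfg
```

Verify the configuration afterwards, as for any other change.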

Now let's look at the commands.cfg entries and how they relate to localhost.cfg:

## Check using SSH command ##


define command {
command_name check_ssh_port
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_ssh -t 60 -H $HOSTADDRESS$" -E
}

#command1
define command {
command_name check_tcp_port
command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 22
}


#command2
define command {
command_name check_ssh_mysql_port
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_tcp -H $HOSTADDRESS$ -p 3306 -t 120" -E -t 120
}

#command3
define command {
command_name check_ssh_ftp
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_ftp -H $HOSTADDRESS$ -p 21 -t 60" -E
}

#command4
define command {
command_name check_ssh_ssl_cert
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_ssl_cert -H $HOSTADDRESS$ -p 21 -P ftp -s -w 120 -c 60" -E -t 120
}

#command5
define command {
command_name check_ssh_free_space
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_disk -u GB -w 20% -c 10% -p / -p /opt -p /var" -E -t 120
}

#command6
define command {
command_name check_ssh_free_mem
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_mem.pl -f -w 10 -c 5" -E -t 120
}

#command7
define command {
command_name check_ssh_uptime
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_uptime" -E -t 120
}

#command8
define command {
command_name check_ssh_mysqlrepl
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_mysql_slavestatus.sh -H localhost -P 3306 -u xxxxx -p xxxxxx -w 60 -c 120" -E -t 120
}

## End command definitions
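Before running a command, Nagios substitutes the macros in command_line. $USER1$ comes from resource.cfg (/usr/local/nagios/libexec in a default install), and $HOSTADDRESS$ comes from the host's address directive. The $HOSTADDRESS$ inside the -C string is therefore expanded on the Nagios server before ssh runs, so the remote plugin receives the literal IP. The rough sed-based sketch below mimics this for those two macros only. Nagios itself supports many more, and expand_macros is a hypothetical name:

```shell
#!/bin/sh
# expand_macros COMMAND_LINE USER1 HOSTADDRESS
# Rough illustration of Nagios macro substitution, limited to $USER1$ and
# $HOSTADDRESS$. The values must not contain '|', '&' or '\', which
# would be misinterpreted by sed.
expand_macros() {
    printf '%s\n' "$1" |
        sed -e 's|\$USER1\$|'"$2"'|g' -e 's|\$HOSTADDRESS\$|'"$3"'|g'
}

# What Nagios would run for command1 on remote1:
expand_macros '$USER1$/check_tcp -H $HOSTADDRESS$ -p 22' \
    /usr/local/nagios/libexec 10.22.122.122
```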

You can also run a command manually before adding it to the Nagios configuration. For example, to check the uptime of the remote server, run the following on the Nagios server as the nagios user:

/usr/local/nagios/libexec/check_by_ssh -H remote1 -C "/home/nagios/libexec/check_uptime -u days -w 5 -c 1" -E
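If you want to pre-run several of the remote checks in one go, a small wrapper around check_by_ssh helps. This sketch assumes the plugin paths from this tutorial. run_remote_check, CHECK_BY_SSH and REMOTE_LIBEXEC are names I made up for illustration:

```shell
#!/bin/sh
# Pre-run remote checks the same way Nagios will, before editing the config.
# CHECK_BY_SSH defaults to the plugin path used in this tutorial and can be
# overridden from the environment.
CHECK_BY_SSH=${CHECK_BY_SSH:-/usr/local/nagios/libexec/check_by_ssh}
REMOTE_LIBEXEC=/home/nagios/libexec

# run_remote_check HOST COMMAND
# Runs COMMAND on HOST through check_by_ssh (-E ignores stderr, -t 120 is
# the timeout in seconds) and prints the exit code and the plugin output.
run_remote_check() {
    output=$("$CHECK_BY_SSH" -H "$1" -C "$2" -E -t 120)
    rc=$?
    printf 'exit %d: %s\n' "$rc" "$output"
}

# Example, as the nagios user on the Nagios server:
# run_remote_check remote1 "$REMOTE_LIBEXEC/check_disk -u GB -w 20% -c 10% -p /"
# run_remote_check remote1 "$REMOTE_LIBEXEC/check_uptime"
```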

Once the hosts, services and commands are defined, check the configuration for errors first:

/usr/local/nagios/bin/nagios -v  /usr/local/nagios/etc/nagios.cfg

Finally, apply the changes by restarting Nagios:

/etc/init.d/nagios restart
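The verification and the restart can be chained so that Nagios is only restarted when nagios -v reports no errors. The sketch below uses the paths of this tutorial as overridable defaults. verify_and_restart and the variable names are hypothetical:

```shell
#!/bin/sh
# Verify the configuration first and restart Nagios only if it is valid,
# so a running Nagios is not stopped because of a broken configuration.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios}

verify_and_restart() {
    # nagios -v exits non-zero when it finds configuration errors
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        "$NAGIOS_INIT" restart
    else
        echo "Errors found; run $NAGIOS_BIN -v $NAGIOS_CFG to see them" >&2
        return 1
    fi
}

# Usage, as root on the Nagios server:
# verify_and_restart
```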

Allow a few minutes for the next round of checks to run. You will then see the two hosts with the service status defined in the configuration.

[Figure: Host check status - checking port 22 (ssh)]

[Figure: Service status on a host]