NRPE is the most popular way to monitor remote Linux systems. In some cases, however, we do not want to install NRPE on the remote system, we cannot install it, or a firewall gets in the way, since NRPE requires TCP port 5666 to be open. In those cases we can use SSH instead, through the check_by_ssh plugin.
The picture above shows the Nagios server checking remote Linux servers over the SSH protocol. In this case the monitoring host is the SSH client, whereas the remote servers are the SSH servers.
In this tutorial, I will set up a Nagios server to monitor one remote host (remote1), both running RHEL 6.x.
Download the Nagios installer from nagios.org (I use nagios-4.0.8, Nagios Core) and nagios-plugins-2.0.3, then extract the tarballs on the server.
Step 1: Create Nagios User and Group
useradd nagios
passwd nagios
groupadd nagcmd
usermod -a -G nagcmd nagios
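If this step may be repeated on a host where the account already partly exists, the commands can be wrapped so they only act when needed. The following is a minimal sketch rather than part of the original procedure: run, DRY_RUN and ensure_nagios_account are names made up for illustration, and by default it only prints what it would do.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical function: create the group and user only when missing,
# then add nagcmd to the supplementary groups of nagios.
ensure_nagios_account() {
    # getent exits non-zero when the entry does not exist
    getent group nagcmd >/dev/null 2>&1 || run groupadd nagcmd
    getent passwd nagios >/dev/null 2>&1 || run useradd nagios
    # -a appends the group instead of replacing the existing group list
    run usermod -a -G nagcmd nagios
}
```

Calling ensure_nagios_account previews the commands; after "export DRY_RUN=0" the same call, run as root, applies them.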
Step 2: Install Nagios and its dependencies on the Nagios host
yum -y install openssl-devel.x86_64
yum -y install perl-Date-Manip.noarch
yum -y install perl-TimeDate.noarch perl-Date-Calc.noarch perl-DateTime.x86_64
yum -y install mysql.x86_64
yum install -y httpd php gcc glibc glibc-common gd gd-devel make net-snmp
Depending on your RHEL installation, Nagios may need additional libraries or packages to be installed.
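To see which packages are still missing before running configure, a small loop over the package names can help. This is a sketch under the assumption that "rpm -q NAME" exits non-zero when NAME is not installed; PKG_QUERY and missing_packages are illustrative names, not part of Nagios.

```shell
#!/bin/sh
# Assumption: the command used to ask whether a package is installed.
PKG_QUERY="${PKG_QUERY:-rpm -q}"

# Print every package name from the argument list that is not installed.
missing_packages() {
    for pkg in "$@"; do
        # $PKG_QUERY is intentionally unquoted so "rpm -q" splits into two words
        $PKG_QUERY "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example: list what still has to be installed with yum.
# missing_packages httpd php gcc glibc glibc-common gd gd-devel make net-snmp openssl-devel
```

Anything the function prints can then be passed to yum -y install.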
./configure --with-nagios-group=nagios --with-command-group=nagcmd
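Read the configure output carefully and install any dependencies it reports as missing before compiling. Once everything is good, proceed with
make all && make install && make install-init && make install-config && make install-commandmode && make install-webconf
The above is a one-liner that builds Nagios and installs the program, the init script, the sample configuration files, the command-mode permissions and the Apache configuration for the web interface. Chaining with && stops at the first step that fails.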
Step 3: Install plugins
./configure --with-nagios-group=nagios --with-command-group=nagcmd --with-gd-lib=/usr/lib --with-gd-inc=/usr/include --with-openssl=/usr/bin/openssl --enable-perl-modules
make && make install
The plugins are installed in /usr/local/nagios/libexec/.
Set the httpd and nagios services to start at runlevels 3 and 5. The services themselves are started in Step 4, after the web user has been created.
chkconfig --level 35 nagios on
chkconfig --level 35 httpd on
Step 4: Create a Default User for Web Access
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Type in a password for nagiosadmin; this is the password you will use to log in at the main page.
service nagios start
service httpd start
Access the main page at http://Your-server-IP-address/nagios
At this point, you will see only one host, which is localhost (the Nagios server).
Step 5: Configure password-less ssh login to the remote servers.
The central Nagios server must be able to connect to the remote host via SSH without being asked for a password. This requires creating a passphrase-less public/private keypair as the user running the Nagios service (typically "nagios"), sending the public key to the remote server, and then (as user "nagios") logging into the remote system.
On the remote servers, add the nagios user.
useradd nagios
passwd nagios
groupadd nagcmd
usermod -a -G nagcmd nagios
On the Nagios server, if you are working as root, switch to the nagios user:
su - nagios
Create a key pair and distribute the public key to the remote servers.
ssh-keygen -t rsa
Accept the default values and do not specify a passphrase.
The public key will be created in /home/nagios/.ssh/id_rsa.pub
ssh-copy-id -i /home/nagios/.ssh/id_rsa.pub nagios@remote1
Once you are done with the above, ssh to the remote server; this time you will not be asked for a password.
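Because Nagios runs its checks unattended, it is worth confirming that the login really works without any prompt. The sketch below uses the OpenSSH BatchMode option, which makes ssh fail instead of asking for a password; can_login_without_password is a hypothetical helper name, and it assumes the host key of the remote server is already in known_hosts (ssh-copy-id takes care of that on first contact).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if key-based login works with no prompt.
# BatchMode=yes disables password and passphrase prompts;
# ConnectTimeout bounds the wait when the host is unreachable.
can_login_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Example (run as the nagios user on the Nagios server):
# can_login_without_password nagios@remote1 && echo "key login works"
```

A non-zero exit status means check_by_ssh would fail in the same way when Nagios calls it.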
Step 6: Copy nagios plugins to the remote server.
On the Nagios server, as the nagios user, copy the plugins to the remote server. I will copy them all to /home/nagios/libexec, so create that directory first:
ssh nagios@remote1 mkdir -p /home/nagios/libexec
scp /usr/local/nagios/libexec/* nagios@remote1:/home/nagios/libexec
On the Nagios server the plugins are in /usr/local/nagios/libexec/, while on remote1 they are in /home/nagios/libexec.
Step 7: Detailed configuration
On the Nagios server, define the remote hosts to check in the localhost.cfg configuration file. You can also define a host group for a better view in the Nagios web page.
I will define entries for the Nagios server itself, called nms, and for the remote server, named remote1.
# Define a host for the nagios server in credit dept
define host{
    use                     linux-server
    host_name               nms
    alias                   nms
    address                 192.168.122.16
    icon_image              redhat.gif
    statusmap_image         redhat.gd2
    check_command           check_tcp_port    ; command1
    passive_checks_enabled  1
}
## Define a host for linux servers in credit dept in a remote site
## 1) remote1 ##
define host{
    use                     linux-server
    host_name               remote1
    alias                   remote1
    address                 10.22.122.122
    icon_image              redhat.gif
    statusmap_image         redhat.gd2
    check_command           check_tcp_port    ; command1
    passive_checks_enabled  1
}
## Now let's define the services to check on the remote1 server
# remote1
define service{
    use                  local-service
    host_name            remote1
    service_description  MySQL
    check_command        check_ssh_mysql_port    ; command2
}
define service{
    use                  local-service
    host_name            remote1
    service_description  VSFTPD-HA
    check_command        check_ssh_ftp           ; command3
}
define service{
    use                  local-service
    host_name            remote1
    service_description  VSFTPD-CERT-EXPIRY
    check_command        check_ssh_ssl_cert      ; command4
}
define service{
    use                  local-service
    host_name            remote1
    service_description  Disk Utilization
    check_command        check_ssh_free_space    ; command5
}
define service{
    use                  local-service
    host_name            remote1
    service_description  MEMORY Utilization
    check_command        check_ssh_free_mem      ; command6
}
define service{
    use                  local-service
    host_name            remote1
    service_description  Uptime
    check_command        check_ssh_uptime        ; command7
}
define service{
    use                  local-service
    host_name            remote1
    service_description  MySQL Replication
    check_command        check_ssh_mysqlrepl     ; command8
}
## End host and service definitions
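The service definitions differ only in host name, description and check command, so they can also be generated instead of typed by hand. The following is a minimal sketch; print_service is a hypothetical helper and not a Nagios tool.

```shell
#!/bin/sh
# Hypothetical generator: print one service definition.
# $1 = host_name, $2 = service_description, $3 = check_command
print_service() {
    printf 'define service{\n'
    printf '    use                  local-service\n'
    printf '    host_name            %s\n' "$1"
    printf '    service_description  %s\n' "$2"
    printf '    check_command        %s\n' "$3"
    printf '}\n'
}

# Example: print the Uptime service for remote1 to standard output.
# print_service remote1 Uptime check_ssh_uptime
```

The output can be reviewed and then appended to the configuration file that holds the other definitions.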
The icon_image and statusmap_image directives set the icons (redhat.gif and redhat.gd2) shown for that host in the Nagios web interface. The host check_command uses a command called check_tcp_port, which checks port 22 (ssh); if the connection succeeds, the host status is UP.
We will come to the check commands (labelled command1 to command8) after this.
Why not just use ping for the host check? As I said earlier, you might have a remote server that is behind a firewall and not pingable for whatever reason.
Now let's see the commands.cfg entries and their relation to localhost.cfg.
## Check using SSH command ##
define command {
    command_name  check_ssh_port
    command_line  $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_ssh -t 60 -H $HOSTADDRESS$" -E
}
# command1
define command {
    command_name  check_tcp_port
    command_line  $USER1$/check_tcp -H $HOSTADDRESS$ -p 22
}
# command2
define command {
    command_name  check_ssh_mysql_port
    command_line  $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_tcp -p 3306 -t 120 $HOSTADDRESS$" -E -t 120
}
# command3
define command {
    command_name  check_ssh_ftp
    command_line  $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_ftp -p 21 -t 60 $HOSTADDRESS$" -E
}
# command4
define command {
    command_name  check_ssh_ssl_cert
    command_line  $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_ssl_cert -H $HOSTADDRESS$ -p 21 -P ftp -s -w 120 -c 60" -E -t 120
}
# command5
define command {
    command_name  check_ssh_free_space
    command_line  $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_disk -u GB -w 20% -c 10% -p / -p /opt -p /var" -E -t 120
}
# command6
define command {
    command_name  check_ssh_free_mem
    command_line  $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_mem.pl -f -w 10 -c 5" -E -t 120
}
# command7
define command {
    command_name  check_ssh_uptime
    command_line  $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_uptime" -E -t 120
}
# command8
define command {
    command_name  check_ssh_mysqlrepl
    command_line  $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_mysql_slavestatus.sh -H localhost -P 3306 -u xxxxx -p xxxxxx -w 60 -c 120" -E -t 120
}
## End command definitions
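When Nagios runs a check it replaces $USER1$ with the plugin directory (set in resource.cfg, /usr/local/nagios/libexec in this installation) and $HOSTADDRESS$ with the address of the host being checked. To preview the resulting command line, the substitution can be imitated with sed. This is only an illustration: expand_macros is a hypothetical helper, it handles just these two macros, and it assumes the values contain no "|" or "&" characters.

```shell
#!/bin/sh
# Hypothetical helper: substitute the two macros used in the definitions.
# $1 = command_line, $2 = value for $USER1$, $3 = value for $HOSTADDRESS$
expand_macros() {
    printf '%s\n' "$1" |
        sed -e 's|\$USER1\$|'"$2"'|g' -e 's|\$HOSTADDRESS\$|'"$3"'|g'
}

# Preview command1 for remote1; single quotes keep the shell from touching the $ signs.
expand_macros '$USER1$/check_tcp -H $HOSTADDRESS$ -p 22' /usr/local/nagios/libexec 10.22.122.122
# prints: /usr/local/nagios/libexec/check_tcp -H 10.22.122.122 -p 22
```

The printed line is what can be pasted into a shell, as the nagios user, to try the check by hand.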
You can also run a command manually before adding it to the Nagios configuration. For example, to check the uptime of the remote server, run the following on the Nagios server as the nagios user:
/usr/local/nagios/libexec/check_by_ssh -H remote1 -C "/home/nagios/libexec/check_uptime -u days -w 5 -c 1" -E
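When testing by hand, the exit status matters as much as the text output, because Nagios derives the state from it: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The sketch below prints that mapping next to the plugin output; state_name and run_check are hypothetical helper names.

```shell
#!/bin/sh
# Map a plugin exit code to the state name used by Nagios.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "out of range" ;;   # outside the plugin convention
    esac
}

# Hypothetical wrapper: run any check command, show its output, then its state.
run_check() {
    "$@"
    rc=$?
    echo "state: $(state_name "$rc") (exit code $rc)"
    return "$rc"
}

# Example, as the nagios user on the Nagios server:
# run_check /usr/local/nagios/libexec/check_by_ssh -H remote1 -C "/home/nagios/libexec/check_uptime" -E
```

An exit code outside 0 to 3 usually means the command itself could not be run, for example because of a wrong plugin path on the remote side.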
Once the hosts, services and commands are defined, first check the configuration for errors:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Finally, apply the changes by restarting Nagios:
/etc/init.d/nagios restart
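The verification and the restart can be chained so that a restart is attempted only when the verification passes. This is a minimal sketch that relies on nagios -v exiting non-zero when it finds errors; safe_restart and the three overridable variables are illustrative names.

```shell
#!/bin/sh
# Paths from this tutorial; overridable so the sketch can be tried elsewhere.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-/etc/init.d/nagios restart}"

# Hypothetical helper: restart only when the pre-flight check passes.
safe_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        # intentionally unquoted so the command and its argument split
        $RESTART_CMD
    else
        echo "configuration check failed, not restarting" >&2
        return 1
    fi
}

# Example (as root): safe_restart
```

If the check fails, running the nagios -v command directly shows the detailed error messages.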
Allow a few minutes for the next round of checks to run; you will then see two hosts with the service statuses defined in the configuration.
Host check status - checking port 22 (ssh)
Service status on a host