Monday, August 31, 2015

Monitor Remote Linux System with Nagios using SSH


NRPE is the most popular method for monitoring remote Linux systems. In some cases, however, we don't want to install NRPE on the remote system, we can't install it, or a firewall restricts us, since NRPE requires TCP port 5666 to be open. In those cases we can use SSH instead, through the check_by_ssh plugin.









In this setup the Nagios server checks remote Linux servers over SSH: the monitoring host is the SSH client and the remote servers are the SSH servers.

In this tutorial, I will set up a Nagios server to monitor one remote host (remote1), both running RHEL 6.x.

Download the Nagios installer from nagios.org (I use Nagios Core 4.0.8) and nagios-plugins-2.0.3, then extract the tarballs on the server.

Step 1:  Create Nagios User and Group

useradd nagios
passwd nagios
groupadd nagcmd
usermod -G nagcmd nagios

Step 2: Install Nagios and its dependencies on the Nagios host

yum -y install openssl-devel.x86_64
yum -y install perl-Date-Manip.noarch
yum -y install perl-TimeDate.noarch perl-Date-Calc.noarch perl-DateTime.x86_64
yum -y install mysql.x86_64
yum install -y httpd php gcc glibc glibc-common gd gd-devel make net-snmp

Depending on your RHEL installation, Nagios may need additional libraries or packages. In the extracted Nagios source directory, run:

./configure --with-nagios-group=nagios --with-command-group=nagcmd

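Read the configure output carefully and install any dependencies it reports as missing before compiling. Once everything is in place, proceed with

make all && make install && make install-init && make install-config && make install-commandmode && make install-webconf

The one-liner above builds Nagios and installs the program, init script, sample configuration files, command-file permissions and the Apache configuration for the web interface. Using && stops the chain at the first failing step.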

Step 3: Install plugins

./configure --with-nagios-group=nagios --with-command-group=nagcmd --with-gd-lib=/usr/lib --with-gd-inc=/usr/include --with-openssl=/usr/bin/openssl  --enable-perl-modules

make && make install

The plugins are installed in /usr/local/nagios/libexec/.

Enable the httpd and nagios services at runlevels 3 and 5; they are started in Step 4.

chkconfig --level 35 nagios on
chkconfig --level 35 httpd on


Step 4: Create a Default User for Web Access

htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Type in the nagiosadmin password; this is the password you will use to log in to the web interface.

service nagios start
service httpd start

Access the main page at  http://Your-server-IP-address/nagios

At this point you will see only one host, localhost (the Nagios server itself).

Step 5: Configure password-less ssh login to the remote servers.

The central Nagios server must be able to connect to the remote host via SSH without a password. This requires creating a key pair with no passphrase as the user running the Nagios service (typically "nagios"), copying the public key to the remote server, and then logging in to the remote system as user "nagios" to confirm it works.

On the remote servers, add the nagios user.

useradd nagios
passwd nagios
groupadd nagcmd
usermod -G nagcmd nagios

On the Nagios server, if you are working as root, switch to the nagios user:

su - nagios

Create a key pair and distribute the public key to the remote servers.

ssh-keygen -t dsa

Accept the default values and leave the passphrase empty.
The public key will be created in /home/nagios/.ssh/id_dsa.pub

ssh-copy-id -i /home/nagios/.ssh/id_dsa.pub nagios@remote1

Once this is done, ssh to the remote server; this time you won't be asked for a password.
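
Before moving on, it can be worth confirming that the key really is accepted without any prompt, because Nagios runs its checks non-interactively. The sketch below is one way to do that. The function name check_passwordless_ssh and the 5-second timeout are illustrative choices, not part of the original setup; nagios and remote1 are the user and host from this tutorial.

```shell
# Sketch: verify that key-based SSH login works without a password prompt.
# check_passwordless_ssh is a hypothetical helper name.
check_passwordless_ssh() {
    local user="$1" host="$2"
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a zero exit status means the key was accepted.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true; then
        echo "OK: key-based login to ${user}@${host} works"
        return 0
    else
        echo "FAIL: ${user}@${host} still requires a password or is unreachable"
        return 1
    fi
}

# Example (run as the nagios user on the Nagios server):
#   check_passwordless_ssh nagios remote1
```

If this fails while an interactive ssh works, the usual suspects are a passphrase on the key or the key having been created for a different user.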

Step 6: Copy nagios plugins to the remote server.

On the Nagios server, as the nagios user, copy the plugins to the remote server. I will copy them all to /home/nagios/libexec, creating that directory first because scp will not create it:

ssh nagios@remote1 mkdir -p /home/nagios/libexec
scp /usr/local/nagios/libexec/*  nagios@remote1:/home/nagios/libexec

On the Nagios server the plugins are in /usr/local/nagios/libexec/, while on remote1 they are in /home/nagios/libexec.

Step 7: Detailed configuration

On the Nagios server, define the remote hosts to check in the localhost.cfg configuration file. You can also define a host group for a better view in the Nagios web interface.

I will define entries for the Nagios server itself, called nms, and for the remote server, named remote1.

# Define a host for the nagios server in credit dept

define host{
        use              linux-server
        host_name        nms
        alias            nms
        address          192.168.122.16
        icon_image       redhat.gif
        statusmap_image  redhat.gd2
        check_command    check_tcp_port ; command1
        passive_checks_enabled 1
}


## Define a host for linux servers in credit dept in a remote site

## 1) remote1 ##

define host{
        use                     linux-server
        host_name               remote1
        alias                   remote1
        address                 10.22.122.122
        icon_image              redhat.gif
        statusmap_image         redhat.gd2
        check_command           check_tcp_port ; command1
        passive_checks_enabled  1
}


## Now let's define the services to check on the remote1 server

# remote1

define service{
        use                     local-service
        host_name               remote1
        service_description     MySQL
        check_command           check_ssh_mysql_port ; command2
        }


define service{
        use                     local-service
        host_name               remote1
        service_description     VSFTPD-HA
        check_command           check_ssh_ftp ; command3
        }


define service{
        use                     local-service
        host_name               remote1
        service_description     VSFTPD-CERT-EXPIRY
        check_command           check_ssh_ssl_cert ; command4
        }


define service{
        use                     local-service
        host_name               remote1
        service_description     Disk Utilization
        check_command           check_ssh_free_space ; command5
        }


define service{
        use                     local-service
        host_name               remote1
        service_description     MEMORY Utilization
        check_command           check_ssh_free_mem ; command6
        }

define service{
        use                     local-service
        host_name               remote1
        service_description     Uptime
        check_command           check_ssh_uptime ; command7
        }


define service{
        use                     local-service
        host_name               remote1
        service_description     MySQL Replication
        check_command           check_ssh_mysqlrepl ; command8
        }



## End host and service definitions


The icon_image and statusmap_image directives assign the redhat.gif and redhat.gd2 icons to that host in the Nagios web interface. The host check_command uses a command called check_tcp_port, which checks port 22 (ssh); if the connection succeeds, the host status is UP.
The check commands (labelled command1 to command8 in the comments) are defined further below.

Why not simply use ping for the host check? As I said earlier, you might have a remote server that sits behind a firewall and is not pingable for whatever reason.

Now let's look at the commands.cfg entries and how they relate to localhost.cfg.

## Check using SSH command ##


define command {
command_name check_ssh_port
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_ssh -t 60 -H $HOSTADDRESS$" -E
}

#command1
define command {
command_name check_tcp_port
command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p 22
}


#command2
define command {
command_name check_ssh_mysql_port
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_tcp -p 3306 -t 120 $HOSTADDRESS$" -E -t 120
}

#command3
define command {
command_name check_ssh_ftp
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_ftp -p 21 -t 60 $HOSTADDRESS$" -E
}

#command4
define command {
command_name check_ssh_ssl_cert
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_ssl_cert -H $HOSTADDRESS$ -p 21 -P ftp -s -w 120 -c 60" -E -t 120
}

#command5
define command {
command_name check_ssh_free_space
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_disk -u GB -w 20% -c 10% -p / -p /opt -p /var" -E -t 120
}

#command6
define command {
command_name check_ssh_free_mem
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_mem.pl -f -w 10 -c 5" -E -t 120
}

#command7
define command {
command_name check_ssh_uptime
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_uptime" -E -t 120
}

#command8
define command {
command_name check_ssh_mysqlrepl
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/home/nagios/libexec/check_mysql_slavestatus.sh -H localhost -P 3306 -u xxxxx -p xxxxxx -w 60 -c 120" -E -t 120
}

## End command definitions

You can also run a command manually before adding it to the Nagios configuration. For example, to check the uptime of the remote server, run the following on the Nagios server as the nagios user:

/usr/local/nagios/libexec/check_by_ssh -H remote1 -C "/home/nagios/libexec/check_uptime -u days -w 5 -c 1" -E
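
Nagios derives the service state from the plugin's exit status: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. When testing by hand it helps to see that status next to the plugin output. A minimal sketch, where the helper names nagios_state_name and run_check are hypothetical:

```shell
# nagios_state_name CODE: translate a plugin exit status into a state name.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        # Plugins are only expected to return 0 to 3.
        *) echo "INVALID($1)" ;;
    esac
}

# run_check COMMAND [ARGS...]: run a check and print its state and output.
run_check() {
    local output status=0
    output=$("$@") || status=$?
    echo "$(nagios_state_name "$status"): ${output}"
    return "$status"
}

# Example:
#   run_check /usr/local/nagios/libexec/check_by_ssh -H remote1 \
#       -C "/home/nagios/libexec/check_uptime" -E
```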

Once the hosts, services and commands are defined, check the configuration for errors:

/usr/local/nagios/bin/nagios -v  /usr/local/nagios/etc/nagios.cfg

Finally, apply the changes by restarting Nagios:

/etc/init.d/nagios restart
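
The last two steps can be combined so that Nagios is restarted only when the configuration check passes. This is a small sketch, not part of the original setup: the function name restart_if_valid is made up for illustration, the default paths are the ones used in this tutorial, and it assumes that nagios -v exits with a non-zero status when it finds errors.

```shell
# restart_if_valid [NAGIOS_BIN] [CONFIG]: restart only after a clean check.
restart_if_valid() {
    local nagios_bin="${1:-/usr/local/nagios/bin/nagios}"
    local cfg="${2:-/usr/local/nagios/etc/nagios.cfg}"
    # Assumption: the verification run returns non-zero on config errors.
    if "$nagios_bin" -v "$cfg"; then
        service nagios restart
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}

# Example (as root on the Nagios server):
#   restart_if_valid
```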

Allow a few minutes for the next round of checks to run; you will then see two hosts with the service statuses defined in the configuration.


Host check status - checking port 22 (ssh)
















Service status on a host



Saturday, April 11, 2015

FTP - FTPes vs Firewall and IPS - Who is blocking ??


A couple of months ago I ran into a really strange situation: the FTP client was suddenly unable to log in to the FTP server (vsftpd), let alone put and get data. Until then it had always worked.

Since the setup was on the customer's premises, I needed to find out what went wrong. First they told me that they had replaced the old firewall with a brand new one. Aha, could that be the culprit?

Details of my environment:

1) There was an FTP server (vsftpd) running on RHEL 6.5.
    - SELinux enabled
    - SSL enabled, so the communication and file transfers are encrypted; the connection type is therefore FTPES (explicit FTP over TLS)
    - passive mode enabled; the control port is 21/tcp and the data port range is 30000-30100/tcp.



2) Two FTP clients, namely IBM DataPower (an appliance) and the FileZilla FTP client on Windows

The connection illustration shows three agencies, which I call Agency-A, Agency-B and Agency-X, acting as the FTP clients. They are all at different sites, on different networks and behind different firewalls.

Let's focus on the Agency-X to Agency-A connection, since the issue was between them.



Below are the steps I used to find the root cause:

1) Testing between the client and server behind the firewall in Agency-A worked fine.
2) From Agency-X to Agency-A, I ran three sets of tests:

    A) default settings with SSL enabled (FTPES) - failed
    B) SSL disabled (configured in vsftpd) - success
    C) telnet to port 21 and to random ports in 30000-30100 - success, which means the firewall was not blocking the TCP ports.
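
Test C can also be scripted instead of typed into telnet by hand. The sketch below assumes that bash and the coreutils timeout command are available on the client; the function name probe_ports and the host name ftp.example.com are placeholders.

```shell
# probe_ports HOST PORT...: report whether each TCP port accepts a connection.
probe_ports() {
    local host="$1" port
    shift
    for port in "$@"; do
        # bash's /dev/tcp pseudo-device opens a TCP connection; timeout
        # bounds the wait when a firewall silently drops the packets.
        if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
            echo "${port} open"
        else
            echo "${port} closed or filtered"
        fi
    done
}

# Example: the control port plus a few ports from the passive range.
#   probe_ports ftp.example.com 21 30000 30050 30100
```

Like telnet, this only proves that a TCP connection can be established; it says nothing about what happens to the traffic sent over it afterwards.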

Could it be a setting in vsftpd? I didn't think so, since it had worked previously, but let's get more information from the log.

In vsftpd.conf I enabled the debug_ssl option. The complete configuration is as follows:

anonymous_enable=NO
local_enable=YES
write_enable=YES
local_umask=022
dirmessage_enable=YES
xferlog_enable=NO
connect_from_port_20=YES
xferlog_file=/var/log/xferlog
xferlog_std_format=YES
ftpd_banner=Welcome to server FTP service.
chroot_local_user=YES
listen=YES

pam_service_name=vsftpd
userlist_enable=YES
tcp_wrappers=YES

ssl_enable=YES
allow_anon_ssl=NO
force_local_data_ssl=YES
force_local_logins_ssl=YES
ssl_tlsv1=YES
ssl_sslv2=NO
ssl_sslv3=NO
ssl_ciphers=HIGH
require_ssl_reuse=NO
ssl_request_cert=no

rsa_cert_file=/etc/vsftpd/server.crt
rsa_private_key_file=/etc/vsftpd/server.key

pasv_enable=YES
pasv_min_port=30000
pasv_max_port=30100

dual_log_enable=YES
log_ftp_protocol=YES
vsftpd_log_file=/var/log/vsftpd.log
xferlog_enable=YES
xferlog_std_format=NO
xferlog_file=/var/log/xferlog
debug_ssl=yes

Now, let's look at the vsftpd.log entries:

Thu Jan 22 05:54:52 2015 [pid 20383] CONNECT: Client "10.x.x.x"
Thu Jan 22 05:54:52 2015 [pid 20383] FTP response: Client "10.x.x.x", "220 Welcome to server FTP service."
Thu Jan 22 05:54:52 2015 [pid 20383] FTP command: Client "10.x.x.x", "AUTH TLS"
Thu Jan 22 05:54:52 2015 [pid 20383] FTP response: Client "10.x.x.x", "234 Proceed with negotiation."
Thu Jan 22 05:54:52 2015 [pid 20383] DEBUG: Client "10.x.x.x", "SSL_accept failed: error:00000000:lib(0):func(0):reason(0)"

DEBUG: Client "10.x.x.x", "SSL_accept failed: error:00000000:lib(0):func(0):reason(0)"

The line above indicates that something went wrong with SSL. Put loosely, the client is saying: "Hey FTP server, since you require an SSL connection I need your certificate to continue the handshake, but I am not getting it."

Well then, who is interrupting or denying the handshake?
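
One way to narrow this down is to repeat just the TLS handshake from different points in the network, for example once from behind the server's firewall and once from the client site. The sketch below uses openssl s_client with its -starttls ftp option, which issues AUTH TLS before negotiating. The function name tls_handshake_ok and the host name ftp.example.com are placeholders, and treating a printed certificate as success is a simplification.

```shell
# tls_handshake_ok HOST [PORT]: try an explicit-TLS (AUTH TLS) handshake.
tls_handshake_ok() {
    local host="$1" port="${2:-21}" out
    # stdin from /dev/null ends the session right after the handshake.
    out=$(openssl s_client -connect "${host}:${port}" -starttls ftp </dev/null 2>&1) || true
    # s_client prints the server certificate in PEM form once it receives it.
    case "$out" in
        *"BEGIN CERTIFICATE"*) echo "handshake completed"; return 0 ;;
        *) echo "handshake failed"; return 1 ;;
    esac
}

# Example:
#   tls_handshake_ok ftp.example.com 21
```

If the handshake completes inside the server's network but fails from the remote site, something on the path between the two is interfering with the encrypted session.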

It took me a few days of googling for a possible root cause. The hunt for clues brought me to this search result:

"http://www.experts-exchange.com/Software/Internet_Email/File_Sharing/Q_22690366.html"

As I have a subscription with Red Hat, I opened a case, and it took some time for the Red Hat support engineer to trace the problem. The trace showed that the client terminated the connection, but why?

strace output:

~~~
16966 write(0, "234 Proceed with negotiation.\r\n", 31) = 31 <0.000034>
16966 read(0, 0x7f6fb88c5d30, 11)       = -1 ECONNRESET (Connection reset by peer) <0.003219>        <<---
16966 brk(0x7f6fb88fa000)               = 0x7f6fb88fa000 <0.000031>
16966 fcntl(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <0.000072>
16966 write(4, "Fri Jan 23 17:01:34 2015 [pid 16"..., 127) = 127 <0.000067>
16966 fcntl(4, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0 <0.000018>
16966 fcntl(0, F_GETFL)                 = 0x2 (flags O_RDWR) <0.000023>
16966 fcntl(0, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000017>
16966 write(0, "500 OOPS: ", 10)        = -1 EPIPE (Broken pipe) <0.000022>
16966 --- SIGPIPE (Broken pipe) @ 0 (0) ---
16966 rt_sigreturn(0xd)                 = -1 EPIPE (Broken pipe) <0.000016>
16966 write(0, "error:00000000:lib(0):func(0):re"..., 39) = -1 EPIPE (Broken pipe) <0.000017>
16966 --- SIGPIPE (Broken pipe) @ 0 (0) ---
~~~


As soon as the vsftpd server proceeds with the SSL negotiation, the connection is terminated from the client side: the read() call returns ECONNRESET (connection reset by peer).
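
A trace like this can be searched quickly for the interesting lines. The exact strace options used for the trace above are not stated; the capture command in the comment is an assumption based on the output format (PID prefixes and per-call timings), and the function name find_connection_errors is hypothetical.

```shell
# Assumed capture command (not confirmed by the support case):
#   strace -f -T -o /tmp/vsftpd.trace -p <PID of the vsftpd listener>
# -f follows child processes, -T prints the time spent in each call.

# find_connection_errors FILE: list syscalls that failed because the peer
# reset the connection (ECONNRESET) or had already closed it (EPIPE).
find_connection_errors() {
    grep -E 'ECONNRESET|EPIPE' "$1"
}

# Example:
#   find_connection_errors /tmp/vsftpd.trace
```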

Now the last thing to check was the firewall itself. To make a long story short, I met the firewall guy and asked him to check for any indicators in the firewall log, but there were none: nothing about the firewall blocking the designated ports.

Again, this was not a port connectivity issue. I insisted on checking whether any rule denied encrypted content passing through the firewall, since the protocol in use was FTPES.

Voilà... he finally noticed that the appliance's default configuration had its Intrusion Prevention System (IPS) ENABLED, and it DROPPED the encrypted file transfer protocol.

It took the firewall guy just a couple of minutes to allow the protocol. Root cause found and problem resolved!