Saturday, May 24, 2014

Nagios: No output returned from plugin

I set up a small Nagios environment to monitor a few VMs for disk space and other things. I have experience with Nagios and love it. This time I installed it with YUM (Nagios v3.5.1) to make things easier on myself, and ran into the error "No output returned from plugin".


The Nagios host runs CentOS 6.2, and the VMs I am monitoring run different versions of 6.x. Here are my configurations:

On the Nagios server, server_name.cfg includes the service to check:
 define service{  
   use                     generic-service  
   host_name               LB3  
   service_description     Primary Partition Free Space  
   check_command           check_nrpe!check_disk  
   }       

On the VM client, in nrpe.cfg:
 command[check_disk]=/usr/lib64/nagios/plugins/check_disk --units GB -M -w 20% -c 10% -p /dev/mapper/vg_LB3-lv_root  

Some of you already know what is going on, but I was confused because I was able to run check_nrpe manually from the server against the client just fine, and I got replies back from the checks (for example, check_total_procs), so it's not a plugin or firewall issue.

(I do have check_total_procs defined in nrpe.cfg on the client)
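Since the manual test is what ruled out the plugin and the firewall, here is a rough sketch of what such a test can look like from the Nagios server. The run_manual_check wrapper is a hypothetical helper, not something NRPE ships, and 192.0.2.10 is a placeholder for the client address. The plugin path and the -H and -c options are the same ones used in the command definitions in this post.

```shell
#!/bin/sh
# Hypothetical wrapper around a manual check_nrpe call (not part of NRPE).
# Usage: run_manual_check CLIENT_ADDRESS NRPE_COMMAND [PATH_TO_CHECK_NRPE]
run_manual_check() {
    client=$1
    nrpe_command=$2
    # Default plugin path matches the one used in this post's configs.
    plugin=${3:-/usr/lib64/nagios/plugins/check_nrpe}

    if [ ! -x "$plugin" ]; then
        echo "check_nrpe not found at $plugin"
        return 3   # 3 is the Nagios plugin exit code for UNKNOWN
    fi

    # Ask the NRPE daemon on the client to run the named command;
    # the function returns whatever exit status the plugin returns.
    "$plugin" -H "$client" -c "$nrpe_command"
}

# Example (192.0.2.10 is a placeholder address):
# run_manual_check 192.0.2.10 check_total_procs
```

If a call like this prints normal plugin output, then NRPE, the plugin, and the firewall are all fine, and the problem is more likely in how Nagios itself builds the command.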

So although "No output returned from plugin" is pretty generic, in my case the Nagios server was having an issue passing check_nrpe to the client. It wasn't until I combed through ALL of my config files that I ran across the check_nrpe command definition in commands.cfg:
 # NRPE command definition  
 define command{  
      command_name      check_nrpe  
      command_line      /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t $ARG2$  
 }  
 define command{  
      command_name      check_nrpe_1arg  
      command_line      /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$  
 }  

Ah-ha! The check_nrpe command definition expects two arguments, $ARG1$ for -c and $ARG2$ for -t, yet when I define the service I am only passing ARG1 (check_disk in my example above), so -t is left without a value. So I had two options: add an ARG2 to the service definitions, or change them to use check_nrpe_1arg. I decided to pass the -t (timeout), so I updated all of the service definitions like so:
 define service{  
   use                    generic-service  
   host_name              LB3  
   service_description    Primary Partition Free Space  
   check_command          check_nrpe!check_disk!60  
   }       
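To make the difference between the old and new check_command values concrete, here is a small sketch that mimics how the command_line gets filled in. The expand_check_nrpe function is a hypothetical helper, not part of Nagios, and it assumes that a macro with no value (like $ARG2$ in my original service definition) is replaced by an empty string. The host address 192.0.2.10 is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: mimics how the check_nrpe command_line from
# commands.cfg is built from a "command!arg1!arg2" check_command string.
# Usage: expand_check_nrpe CHECK_COMMAND HOST_ADDRESS
expand_check_nrpe() {
    check_command=$1
    host_address=$2

    # Split on "!" into command name, ARG1 and ARG2.
    old_ifs=$IFS
    IFS='!'
    set -- $check_command
    IFS=$old_ifs

    # Assumption: a missing argument expands to an empty string.
    arg1=${2:-}
    arg2=${3:-}

    printf '%s\n' "/usr/lib64/nagios/plugins/check_nrpe -H $host_address -c $arg1 -t $arg2"
}

# Old service definition: nothing follows -t.
# expand_check_nrpe 'check_nrpe!check_disk' 192.0.2.10
#   -> /usr/lib64/nagios/plugins/check_nrpe -H 192.0.2.10 -c check_disk -t

# New service definition: -t gets its 60 second timeout.
# expand_check_nrpe 'check_nrpe!check_disk!60' 192.0.2.10
#   -> /usr/lib64/nagios/plugins/check_nrpe -H 192.0.2.10 -c check_disk -t 60
```

The first expansion ends in a bare -t, which is the malformed command line behind the error in my case. Switching the service to check_nrpe_1arg would avoid it as well, since that definition has no -t at all.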

After a service nagios restart and a forced re-check, the checks are now working.