We have an SBS 2011 box that we want to monitor for a client. Nagios was already set up to notify their managers of low disk space on their servers; now they can also be notified of any odd AD or Exchange issues on the SBS server. We are going to use a cocktail of technologies to talk to the SBS box: a handful of PowerShell plugins plus default Nagios commands that report back on the health of the system.
- check_nt will allow us to check important processes and services.
- check_nrpe will allow us to run PowerShell monitoring scripts on the client machine.
- check_smtp will make sure Exchange is listening for email.
First we will tackle setting up everything on the client machine for check_nt. Download the latest NSClient++ (the 64-bit build works for me) and run it. Accept the licence and select the Typical install. On the NSClient++ Configuration window I like to check the box to allow all users to write the config file. On the next window enter your Nagios server IP; I select everything except NSCA (not needed for this server).
Click Next (one to three times, depending on the installer), then Install.
The first thing you will want to do is open ports 5666 (NRPE) and 12489 (check_nt) in the Windows firewall so Nagios can talk to the client. Next, open C:\Program Files\NSClient++\nsclient.ini and double-check a couple of things:
- CheckExternalScripts = 1
- allowed hosts = 192.168.1.18 #Nagios server IP
- NRPEServer = 1
- NSClientServer = 1
Any changes made to nsclient.ini require the NSClient++ service to be restarted in services.msc. Restart it for good measure.
With NSClient++ installed and the firewall open, we should now be able to see whether check_nt is working, so let's log into the Nagios server and test it. Browse to your plugin folder (in my case /usr/local/nagios/libexec) and run the following, looking for the proper output:
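A test along these lines should do; treat it as a sketch rather than the exact commands from my setup. The SBS address 192.168.1.50 is a placeholder, and the nt_cmd helper is a hypothetical convenience for building the command line. The flags themselves (-H, -p, -v, -l) are standard check_nt options.

```shell
#!/bin/sh
# Placeholder values: adjust both for your environment.
PLUGIN_DIR=/usr/local/nagios/libexec
SBS_HOST=192.168.1.50    # assumed IP of the SBS box, not from a real setup

# Hypothetical helper: print the check_nt command line for a variable
# (-v) plus any extra arguments, using the NSClient++ port 12489.
nt_cmd() {
    echo "$PLUGIN_DIR/check_nt -H $SBS_HOST -p 12489 -v $*"
}

# Try a few of the simpler checks first.
for check in "CLIENTVERSION" "UPTIME" "CPULOAD -l 5,80,90"; do
    cmd=$(nt_cmd "$check")
    echo "Running: $cmd"
    # Only execute when the plugin is actually installed on this machine.
    if [ -x "$PLUGIN_DIR/check_nt" ]; then
        $cmd || echo "check exited with status $?"
    fi
done
```

If CLIENTVERSION comes back with a version string, the firewall rule and the allowed hosts entry are doing their job.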
Notice that is a capital H for host. You can always run ./check_nt -h to help troubleshoot. Now let's open our Nagios configs and add the checks.
Full disclosure: the way I organize my servers is to have a separate .cfg for each server sitting in the Nagios config directory, in my case /usr/local/nagios/etc/servers. Inside that folder there are configs called servername.cfg, and inside each config I define the host and the service definitions to run against that server. There is more than one way to skin a grape, and I prefer this way because some checks I want to run against some servers and some checks I don't, so breaking it out by server makes sense to me. If I had 1000 servers to monitor I would probably not do it this way, but I digress. We will open our servername.cfg and add the check_nt checks:
define service{
use generic-service
host_name sbsserver.local
service_description Uptime
check_command check_nt!UPTIME
}
define service{
use generic-service
host_name sbsserver.local
service_description NSClient++ Version
check_command check_nt!CLIENTVERSION
}
define service{
use generic-service
host_name sbsserver.local
service_description CPU Load
check_command check_nt!CPULOAD!-l 5,80,90
}
define service{
use generic-service
host_name sbsserver.local
service_description Memory Usage
check_command check_nt!MEMUSE!-w 80 -c 90
}
define service{
use generic-service
host_name sbsserver.local
service_description C:\ Drive Space
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}
define service{
use generic-service
host_name sbsserver.local
service_description D:\ Drive Space
check_command check_nt!USEDDISKSPACE!-l d -w 80 -c 90
}
define service{
use generic-service
host_name sbsserver.local
service_description Drive Space H:\ Exchange Logs
check_command check_nt!USEDDISKSPACE!-l h -w 80 -c 90
}
define service{
use generic-service
host_name sbsserver.local
service_description Drive Space I:\ Mailbox DBs
check_command check_nt!USEDDISKSPACE!-l i -w 80 -c 90
}
define service{
use generic-service
host_name sbsserver.local
service_description Microsoft Exchange Active Directory Topology
check_command check_nt!PROCSTATE!-d SHOWALL -l MSExchangeADTopologyService.exe
}
define service{
use generic-service
host_name sbsserver.local
service_description Microsoft Exchange Protected Service Host
check_command check_nt!PROCSTATE!-d SHOWALL -l Microsoft.Exchange.ProtectedServiceHost.exe
}
define service{
use generic-service
host_name sbsserver.local
service_description Microsoft Exchange Service Host
check_command check_nt!PROCSTATE!-d SHOWALL -l Microsoft.Exchange.ServiceHost.exe
}
define service{
use generic-service
host_name sbsserver.local
service_description Microsoft Exchange System Attendant
check_command check_nt!PROCSTATE!-d SHOWALL -l mad.exe
}
define service{
use generic-service
host_name sbsserver.local
service_description Active Directory Domain Services
check_command check_nt!PROCSTATE!-d SHOWALL -l lsass.exe
}
define service{
use generic-service
host_name sbsserver.local
service_description DNS Server Service
check_command check_nt!PROCSTATE!-d SHOWALL -l dns.exe
}
define service{
use generic-service
host_name sbsserver.local
service_description DFS Namespace Service
check_command check_nt!PROCSTATE!-d SHOWALL -l dfssvc.exe
}
define service{
use generic-service
host_name sbsserver.local
service_description DFS Replication Service
check_command check_nt!PROCSTATE!-d SHOWALL -l DFSRs.exe
}
define service{
use generic-service
host_name sbsserver.local
service_description Intersite Messaging Service
check_command check_nt!PROCSTATE!-d SHOWALL -l ismserv.exe
}
define service{
use generic-service
host_name sbsserver.local
service_description Microsoft Exchange Forms Based Authentication Service
check_command check_nt!SERVICESTATE!-d SHOWALL -l MSExchangeFBA
}
define service{
use generic-service
host_name sbsserver.local
service_description Microsoft Exchange Information Store
check_command check_nt!SERVICESTATE!-d SHOWALL -l MSExchangeIS
}
Yikes, that is a lot of content. We are checking system uptime, NSClient++ version, CPU load, memory usage, and drive space. We are also checking AD and Exchange services and processes. Some of this you will need and some of it you won't, so customize for your environment. I found these checks in several different sources; see below for those. Next we need to add the check_nt command definition to commands.cfg. This may already be done, and that's OK.
Open /usr/local/nagios/etc/objects/commands.cfg and add the following:
# 'check_nt' command definition
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}
At this point our check_nt checks should be working. Do a service nagios restart on the Nagios server and make sure the configs are good and the service starts. Back in your Nagios admin panel you will see a sea of new checks under your host, which the Nagios server will eventually work through. If you have any issues with this, check out the links below, which go into more depth on Nagios, check_nt, and how it all works together.
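Before restarting, it can help to let Nagios validate the configuration first, so a typo in servername.cfg does not take monitoring down. The sketch below assumes a source install with the binary at /usr/local/nagios/bin/nagios and the main config at /usr/local/nagios/etc/nagios.cfg; safe_restart is a hypothetical helper name, not something Nagios ships.

```shell
#!/bin/sh
# Assumed paths for a source install under /usr/local/nagios.
NAGIOS_BIN=/usr/local/nagios/bin/nagios
NAGIOS_CFG=/usr/local/nagios/etc/nagios.cfg

# Hypothetical helper: restart Nagios only if the pre-flight check passes.
safe_restart() {
    if [ ! -x "$NAGIOS_BIN" ]; then
        echo "nagios binary not found at $NAGIOS_BIN"
        return 1
    fi
    # -v parses every object config and reports errors without starting Nagios.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        service nagios restart
    else
        echo "config errors found, not restarting"
        return 1
    fi
}

# Run as root on the Nagios server after editing any .cfg file.
safe_restart || echo "Nagios was not restarted"
```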
check_nrpe
With check_nrpe we can run some custom PowerShell scripts against the SBS box to help monitor Active Directory and Exchange. Back on our SBS server, let's download the scripts and put them in their new home, the scripts folder under C:\Program Files\NSClient++. I am using the following scripts, which I got from the very helpful telnetport25.com.
- Exchange2010BackupMonitoring.ps1
- Exchange2010ContentIndexMonitor.ps1
You will also want to set the PowerShell script execution policy to Bypass so that NSClient++ can run the scripts. Once the scripts are in place, open PowerShell, browse to the scripts folder, and execute each script as a test.
If you have issues here, make sure your user has access to the Exchange Management Shell and the Exchange cmdlets. Once that is done it's time to edit nsclient.ini. Browse to C:\Program Files\NSClient++, open nsclient.ini in your favorite editor, and add the following to the end of the file:
[/settings/external scripts/scripts]
check_exbackup=cmd /c echo scripts\Exchange2010BackupMonitoring.ps1 | PowerShell.exe -Command -
check_exindex=cmd /c echo scripts\Exchange2010ContentIndexMonitor.ps1 | PowerShell.exe -Command -
With these entries in place, when the Nagios server calls check_exbackup or check_exindex over NRPE, NSClient++ knows which script to run. Save, close, and restart the NSClient++ service.
Back on the Nagios server, let's test our new checks. Once again browse to where your plugins are (/usr/local/nagios/libexec in my case) and run the following:
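Something like the following should work as a test. It is a sketch under a few assumptions: 192.168.1.50 is a placeholder for the SBS address, nrpe_cmd is a hypothetical helper, and check_nrpe is already installed in the plugin folder (it is a separate download from the core plugins). The command names match the entries we added to nsclient.ini.

```shell
#!/bin/sh
# Placeholder values: adjust for your environment.
PLUGIN_DIR=/usr/local/nagios/libexec
SBS_HOST=192.168.1.50    # assumed IP of the SBS box
TIMEOUT=60               # seconds; PowerShell can be slow to load

# Hypothetical helper: print the check_nrpe command line for one
# NSClient++ external script alias (-c) with a timeout (-t).
nrpe_cmd() {
    echo "$PLUGIN_DIR/check_nrpe -H $SBS_HOST -c $1 -t $TIMEOUT"
}

for script in check_exbackup check_exindex; do
    cmd=$(nrpe_cmd "$script")
    echo "Running: $cmd"
    # Only execute when the plugin is actually installed on this machine.
    if [ -x "$PLUGIN_DIR/check_nrpe" ]; then
        $cmd || echo "check exited with status $?"
    fi
done
```

If a check times out here, it will time out in Nagios as well, so this is a good place to find out how long the scripts really take.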
Next we want to edit servername.cfg and add the checks for these new scripts. Add the following:
define service{
use generic-service
host_name sbsserver.local
service_description Exchange DB Content Indexing
check_command check_nrpe!check_exindex!60
}
define service{
use generic-service
host_name sbsserver.local
service_description Microsoft Exchange Backups
check_command check_nrpe!check_exbackup!60
}
You will notice we are calling check_nrpe. It might already be in commands.cfg, so let's check by hand. Browse to /usr/local/nagios/etc/objects, open commands.cfg, and look for (or add) the following:
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t $ARG2$
}
Notice we set a timeout (-t) of 60 seconds in the service definitions. With the Exchange scripts we may need to raise it to 120, because PowerShell has to load the Exchange cmdlets and that can take extra time. Save and do a service nagios restart to check the configs.
Our Nagios panel is looking nice (see title image!). Now that you are comfortable with check_nt and check_nrpe you can go crazy with plugins; there are a lot of options. Check out the Exchange and Windows Server plugin sections of the Nagios Exchange for more goodness. There are also some Active Directory scripts in the Windows Server section if you want more monitoring than the check_nt services listed above.
check_smtp
The vanilla nagios-plugins package has a nice check_smtp plugin that we are going to use to say "helo" to our Exchange box. Let's go back to our Nagios server and into the Nagios plugins folder (/usr/local/nagios/libexec) and test it out:
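A quick test might look like this. As before it is only a sketch: 192.168.1.50 is a placeholder for the Exchange server address and smtp_cmd is a hypothetical helper. The -H and -t flags mirror the command definition used further down, and the plugin talks to the default SMTP port unless told otherwise.

```shell
#!/bin/sh
# Placeholder values: adjust for your environment.
PLUGIN_DIR=/usr/local/nagios/libexec
SBS_HOST=192.168.1.50    # assumed IP of the Exchange/SBS box

# Hypothetical helper: print the check_smtp command line with a
# timeout in seconds (defaults to 60 when no argument is given).
smtp_cmd() {
    echo "$PLUGIN_DIR/check_smtp -H $SBS_HOST -t ${1:-60}"
}

cmd=$(smtp_cmd 60)
echo "Running: $cmd"
# Only execute when the plugin is actually installed on this machine.
if [ -x "$PLUGIN_DIR/check_smtp" ]; then
    $cmd || echo "check exited with status $?"
fi
```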
OK, looks good to me. Once again let's add it to our servername.cfg and make sure it's in commands.cfg:
define service{
use generic-service
host_name sbsserver.local
service_description Check SMTP
check_command check_smtp!60
}
And command definition:
# 'check_smtp' command definition
define command{
command_name check_smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$ -t $ARG1$
}
Save everything and do a service nagios restart. With this check we are running the plugin locally on the Nagios server and simply asking Exchange if it's up. There is no need for check_nt, check_nrpe, or NSClient++; it's all happening on the Nagios server. Once Nagios gets around to it we should have a healthy reply.
Thanks and resources:
Thanks to those who helped me get this going:
- nsclient.org - Getting Started Tutorial
- telnetport25.com - Exchange Monitoring Part 1
- telnetport25.com - Exchange Monitoring Part 2
- telnetport25.com - Exchange Monitoring Part 3
- telnetport25.com - Exchange Monitoring Part 4
- exchange.nagios.org - Windows Checks
- exchange.nagios.org - Exchange Server Checks