NetSaint Configuration

Service Configuration

Service Definition

Format: service[<host>]=<description>;<check_period>;<max_attempts>;<check_interval>;<retry_interval>;<contactgroups>;<notification_interval>;<notification_period>;<notify_recovery>;<notify_critical>;<notify_warning>;<event_handler>;<check_command>
Example 1: service[rosie]=FTP;24x7;3;5;1;nt-admins;120;24x7;1;1;1;;check_ftp
Example 2: service[dev]=HTTP;24x7;3;5;1;nt-admins;240;24x7;1;1;1;;check_http2!192.168.0.2!/!88
Example 3: service[real]=Zombie Processes;24x7;3;5;1;linux-admins;240;24x7;1;1;1;;check_procs!5!10!Z
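To show how the fields line up with the Format line, the Python sketch below splits a service definition into its named arguments. It is an illustration, not NetSaint's own parser: the ServiceDef class and parse_service_line function are hypothetical names, and splitting at most twelve times (so that any ';' inside <check_command> survives) is a defensive assumption rather than documented NetSaint behavior.

```python
from dataclasses import dataclass
from typing import List

# Field names in the order given by the Format line (after the "=").
FIELDS = [
    "description", "check_period", "max_attempts", "check_interval",
    "retry_interval", "contactgroups", "notification_interval",
    "notification_period", "notify_recovery", "notify_critical",
    "notify_warning", "event_handler", "check_command",
]


@dataclass
class ServiceDef:
    """Hypothetical in-memory form of one service definition."""
    host: str
    description: str
    check_period: str
    max_attempts: int
    check_interval: int
    retry_interval: int
    contactgroups: List[str]
    notification_interval: int
    notification_period: str
    notify_recovery: bool
    notify_critical: bool
    notify_warning: bool
    event_handler: str  # empty string when no event handler is defined
    check_command: str


def parse_service_line(line: str) -> ServiceDef:
    """Parse one service[<host>]=... line (illustrative sketch)."""
    key, sep, value = line.strip().partition("=")
    if not sep or not (key.startswith("service[") and key.endswith("]")):
        raise ValueError(f"not a service definition: {line!r}")
    host = key[len("service["):-1]
    # Split at most 12 times so the last field (check_command) is kept whole.
    parts = value.split(";", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    f = dict(zip(FIELDS, parts))
    return ServiceDef(
        host=host,
        description=f["description"],
        check_period=f["check_period"],
        max_attempts=int(f["max_attempts"]),
        check_interval=int(f["check_interval"]),
        retry_interval=int(f["retry_interval"]),
        # Comma-delimited list of contact group short names.
        contactgroups=[g.strip() for g in f["contactgroups"].split(",") if g.strip()],
        notification_interval=int(f["notification_interval"]),
        notification_period=f["notification_period"],
        notify_recovery=f["notify_recovery"] == "1",
        notify_critical=f["notify_critical"] == "1",
        notify_warning=f["notify_warning"] == "1",
        event_handler=f["event_handler"],
        check_command=f["check_command"],
    )
```

Applied to Example 2, this yields host "dev", an empty event_handler, and the check_command string "check_http2!192.168.0.2!/!88" unchanged.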

A service definition is used to identify a "service" that runs on a host. The term "service" is used very loosely. It can mean an actual service that runs on the host (POP, SMTP, HTTP, etc.) or some other type of metric associated with the host (response to a ping, number of logged in users, free disk space, etc.). The different arguments to a service definition are outlined below.

<host> This is the short name of the host that the service "runs" on or is associated with.
<check_period> This is the short name of the time period that identifies when this service can be checked. Service checks are scheduled so that they are only checked (or rechecked) at times that are valid within the specified check time period. See the "Time Periods" documentation in the theory of operation section for more information on how time periods work and the potential problems of using them improperly.
<max_attempts> This is the number of times that NetSaint will check the service, counting the initial check, if it returns any state other than an OK state. Setting this value to 1 will cause NetSaint to generate an alert (if the service check detected a problem) without retrying the service check.
<check_interval> This is the number of "time units" to wait before scheduling the next "regular" check of the service. "Regular" checks are those that occur when the service is in an OK state, or when it is in a non-OK state but has already been rechecked max_attempts times. Unless you have changed the interval_length value in the main configuration file from its default of 60, this number is in minutes.
<retry_interval> This is the number of "time units" to wait before scheduling a re-check of the service. Services are rescheduled at the retry interval when they have changed to a non-OK state. Once the service has been retried max_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you have changed the interval_length value in the main configuration file from its default of 60, this number is in minutes.
<contactgroups> This is a comma-delimited list of the short names of contact groups that should be notified about problems or recoveries for this service. If a problem or recovery occurs for this service, NetSaint will attempt to notify all the contacts in each contact group (depending on the notification options that are set below).
<notification_interval> This is the number of "time units" to wait before re-notifying a contact that this service is still in a non-OK state. Unless you have changed the interval_length value in the main configuration file from its default of 60, this number is in minutes.
<notification_period> This is the short name of the time period that identifies when notifications about problems or recoveries for this service may be sent out. If a service problem or recovery occurs outside the valid times of this time period, no notifications will be sent. See the "Time Periods" documentation in the theory of operation section for more information on how time periods work and the potential problems of using them improperly.
<notify_recovery> This value determines whether or not alert notifications will be generated if the service recovers from a non-OK state. Set this value to 1 if the service should generate alerts for recoveries, 0 if it shouldn't. Note: If a contact is configured to not receive recovery notifications, they will not be notified of any recoveries for this service, regardless of this setting.
<notify_critical> This value determines whether or not alert notifications will be generated if the service is in a CRITICAL state. Set this value to 1 if the service should generate alerts for critical states, 0 if it shouldn't. Note: If a contact is configured to not receive critical notifications, they will not be notified of any critical states for this service, regardless of this setting.
<notify_warning> This value determines whether or not alert notifications will be generated if the service is in a WARNING or UNKNOWN state. Set this value to 1 if the service should generate alerts for warning/unknown states, 0 if it shouldn't. Note: If a contact is configured to not receive warning/unknown notifications, they will not be notified of any warning/unknown states for this service, regardless of this setting.
<event_handler> This is the short name of the command that should be run whenever a change in the status of the service is detected (i.e. whenever it goes down or recovers). Read the documentation on event handlers for a more detailed explanation of how to write scripts for handling events. If you do not wish to define an event handler for the service, leave this option blank (as shown in the examples above).
<check_command>

This is the command that NetSaint will run in order to check the status of the service. There are three command formats that can be used:

1. "Vanilla" Command: The command is just the name of a command that was previously defined. Example 1 above shows this type of command.
2. Command w/ Arguments: This is the same as the "vanilla" command style, except that the command name is followed by arguments, each separated from the command name (and from the other arguments) by a ! character. Example 2 above shows this type of command. The command should be defined to make use of the $ARGx$ macros. In Example 2 above, $ARG1$ would resolve to 192.168.0.2, $ARG2$ would resolve to /, and $ARG3$ would resolve to 88 for that particular service. Note: NetSaint will handle a maximum of sixteen command line arguments ($ARG1$ through $ARG16$).
3. "Raw" Command Line: You may optionally specify an actual command line to be executed. To do so, you must enclose the entire command line in double quotes. The outer double quotes will be stripped off before the command is actually executed. No macros are processed inside raw command lines. Note: I haven't tested this format much, but it should work. Remember that the command must return a proper status level; see the documentation on writing plugins for the numeric code of each status level.
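The three <check_command> formats can be told apart mechanically. The Python sketch below shows one way to do it and how the $ARGx$ macros would be filled in from the !-separated arguments. It is not NetSaint's implementation: the commands dictionary stands in for the previously defined commands, the function name is hypothetical, other macros are left untouched, and substituting an empty string when a macro has no matching argument is an assumption, since that case is not described here.

```python
import re
from typing import Dict

MAX_ARGS = 16  # NetSaint handles $ARG1$ through $ARG16$
# Matches $ARG1$ .. $ARG16$ only; anything else is left as-is.
ARG_MACRO = re.compile(r"\$ARG(1[0-6]|[1-9])\$")


def expand_check_command(check_command: str, commands: Dict[str, str]) -> str:
    """Turn a <check_command> value into the command line to run (sketch)."""
    # Format 3: raw command line. Strip the outer quotes; no macro processing.
    if len(check_command) >= 2 and check_command[0] == '"' and check_command[-1] == '"':
        return check_command[1:-1]
    # Formats 1 and 2: a defined command name, optionally followed by
    # arguments separated by '!' characters.
    name, *args = check_command.split("!")
    if len(args) > MAX_ARGS:
        raise ValueError(f"at most {MAX_ARGS} arguments are supported")
    template = commands[name]  # KeyError if the command was never defined

    def substitute(match) -> str:
        index = int(match.group(1))
        # Assumption: a macro with no matching argument expands to "".
        return args[index - 1] if index <= len(args) else ""

    return ARG_MACRO.sub(substitute, template)
```

For instance, with commands = {"check_http2": "http_probe --host $ARG1$ --url $ARG2$ --port $ARG3$"} (a made-up definition), expand_check_command("check_http2!192.168.0.2!/!88", commands) returns "http_probe --host 192.168.0.2 --url / --port 88".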
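The interplay of <max_attempts>, <check_interval>, <retry_interval> and <notification_interval> described above comes down to simple arithmetic on "time units". The Python sketch below assumes the default interval_length of 60 seconds and treats the first failed check as attempt 1, which is consistent with a <max_attempts> of 1 meaning no retries. It ignores time periods entirely, and all function names are hypothetical.

```python
from typing import List

DEFAULT_INTERVAL_LENGTH = 60  # seconds per "time unit" unless interval_length is changed


def units_to_seconds(units: int, interval_length: int = DEFAULT_INTERVAL_LENGTH) -> int:
    """Convert a value expressed in "time units" into seconds."""
    return units * interval_length


def next_check_delay(is_ok: bool, attempt: int, max_attempts: int,
                     check_interval: int, retry_interval: int,
                     interval_length: int = DEFAULT_INTERVAL_LENGTH) -> int:
    """Seconds to wait before the next check of a service.

    attempt is the number of consecutive non-OK results so far, counting
    the check that just ran (assumption: the first failure is attempt 1).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if not is_ok and attempt < max_attempts:
        # Still confirming a problem: recheck at the retry interval.
        return units_to_seconds(retry_interval, interval_length)
    # OK, or non-OK after max_attempts checks: back to the regular rate.
    return units_to_seconds(check_interval, interval_length)


def renotification_offsets(elapsed: int, notification_interval: int,
                           interval_length: int = DEFAULT_INTERVAL_LENGTH) -> List[int]:
    """Offsets (seconds after the first problem notification) of the
    re-notifications sent while a service stays non-OK for `elapsed`
    seconds after that first notification. Ignores notification_period;
    assumes a positive notification_interval.
    """
    if notification_interval <= 0:
        raise ValueError("this sketch assumes a positive notification_interval")
    step = units_to_seconds(notification_interval, interval_length)
    return list(range(step, elapsed + 1, step))
```

With the values from Example 1 (max_attempts 3, check_interval 5, retry_interval 1), a failing service is rechecked 60 seconds after attempts 1 and 2; after the third non-OK result it goes back to the regular 300-second interval. A notification_interval of 120 means a re-notification every 7200 seconds for as long as the problem lasts.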
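As the notes on <notify_recovery>, <notify_critical> and <notify_warning> explain, a notification goes out only if both the service and the contact allow that kind of alert. The Python sketch below models both sides as three on/off switches. The NotifyOptions class is a hypothetical representation (contact definitions are not covered in this section), and notification periods and intervals are ignored.

```python
from dataclasses import dataclass


@dataclass
class NotifyOptions:
    """Three on/off switches, assumed to have the same shape for the
    service flags and for a contact's own notification settings."""
    recovery: bool
    critical: bool
    warning: bool  # also governs UNKNOWN states


def should_notify(new_state: str, old_state: str,
                  service: NotifyOptions, contact: NotifyOptions) -> bool:
    """True if the contact should be notified about a service that is now
    in new_state and was previously in old_state."""
    if new_state == "OK":
        if old_state == "OK":
            return False  # nothing recovered, so nothing to report
        option = "recovery"
    elif new_state == "CRITICAL":
        option = "critical"
    elif new_state in ("WARNING", "UNKNOWN"):
        option = "warning"
    else:
        raise ValueError(f"unknown state: {new_state!r}")
    # Both the service and the contact must allow this kind of notification.
    return getattr(service, option) and getattr(contact, option)
```

For example, with all three service flags set to 1 and a contact that does not accept warning/unknown notifications, a WARNING produces no notification for that contact, while a CRITICAL state or a recovery does.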