NetSaint Configuration

Miscellaneous Settings

Program mode
Timing interval length
Aggressive host checking option
Method to use to determine time between checks
Global host event handler
Global service event handler
Inter-check sleep time
Service check interleave factor
Maximum concurrent checks
Service reaper frequency


Program Mode

Format: program_mode=<a/s>
Example: program_mode=a

Timing Interval Length

Format: interval_length=<seconds>
Example: interval_length=60

This is the number of seconds per "unit interval" used for timing in the scheduling queue, re-notifications, etc. "Unit intervals" are used in the host configuration file to determine how often to run a service check, how often to re-notify a contact, etc.

Important: The default value for this is 60, which means that an interval of 1 in the host configuration file corresponds to 60 seconds (1 minute). I have not really tested other values for this variable, so proceed at your own risk if you decide to change it!
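To make the arithmetic concrete, here is a minimal Python sketch that converts unit intervals into seconds. It assumes the main configuration file holds one name=value pair per line, as in the Format lines of this section; the helper names read_interval_length and units_to_seconds are hypothetical and not part of NetSaint.

```python
def read_interval_length(config_text, default=60):
    """Return interval_length from main-config text, or the default of 60."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("interval_length="):
            return int(line.split("=", 1)[1])
    return default  # option absent: fall back to the documented default


def units_to_seconds(units, interval_length):
    """Convert a value expressed in unit intervals into seconds."""
    return units * interval_length


# A service check interval of 5 units, with interval_length=60:
length = read_interval_length("program_mode=a\ninterval_length=60\n")
print(units_to_seconds(5, length))  # 300 seconds, i.e. every 5 minutes
```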

Aggressive Host Checking Option

Format: use_agressive_host_checking=<0/1>
Example: use_agressive_host_checking=0

Beginning with release 0.0.4, NetSaint tries to be a little smarter about how and when it checks the status of hosts. In general, disabling this option will allow NetSaint to make some smarter decisions and check hosts a bit faster. Enabling this option will increase the amount of time required to check hosts, but may improve reliability a bit. If you want to know more about exactly what this option does, search the source code in the netsaint.c file for the string "use_agressive_host_checking" and read some of the comments I've added. Unless you have problems with NetSaint not recognizing that a host recovered, I would suggest not enabling this option.

Inter-Check Delay Method

Format: inter_check_delay_method=<n/d/s>
Example: inter_check_delay_method=s

This option allows you to control how service checks are initially "spread out" in the event queue. Using a "smart" delay calculation (the default) will cause NetSaint to calculate an average check interval and spread initial checks of all services out over that interval, thereby helping to eliminate CPU load spikes. Using no delay will cause all service checks to be scheduled for execution at the same time, which generally produces large CPU spikes when the services are all executed in parallel. For this reason, using no delay is not recommended unless you are testing the service check parallelization functionality. Values are as follows:

n = Don't use any delay (schedule all service checks to run immediately, i.e. at the same time)
d = Use a "dumb" delay of 1 second between service checks
s = Use a "smart" delay calculation to spread service checks out evenly (default)
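The "smart" method can be illustrated with a short Python sketch. It assumes the smart inter-check delay is the average service check interval (in seconds) divided by the number of services, so that initial checks are spread evenly across one average interval. This illustrates the idea described in this section and is not NetSaint's actual scheduling code; initial_schedule is a hypothetical name, and only the no-delay and smart methods are modeled.

```python
def initial_schedule(check_intervals, method, interval_length=60, start=0.0):
    """Return the initial check time (in seconds) for each service.

    check_intervals: each service's normal check interval, in unit intervals.
    method: "n" (no delay) or "s" (smart delay); "d" is not modeled here.
    """
    count = len(check_intervals)
    if count == 0:
        return []
    if method == "n":
        delay = 0.0  # every service is scheduled for the same instant
    elif method == "s":
        # Assumed smart delay: average interval in seconds / number of services
        average_seconds = sum(check_intervals) * interval_length / count
        delay = average_seconds / count
    else:
        raise ValueError("unsupported method: %r" % method)
    return [start + i * delay for i in range(count)]


# Four services checked every 5 units (300 s) end up 75 s apart:
print(initial_schedule([5, 5, 5, 5], "s"))  # [0.0, 75.0, 150.0, 225.0]
print(initial_schedule([5, 5, 5, 5], "n"))  # [0.0, 0.0, 0.0, 0.0]
```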

Global Host Event Handler Option

Format: global_host_event_handler=<command>
Example: global_host_event_handler=log-host-event-to-db

This option allows you to specify a host event handler command that is to be run for every host state change. The global event handler is executed immediately prior to the event handler that you have optionally specified in each host definition. The command argument is the short name of a command definition that you define in your host configuration file. More information on event handlers can be found in the event handler documentation.

Global Service Event Handler Option

Format: global_service_event_handler=<command>
Example: global_service_event_handler=log-service-event-to-db

This option allows you to specify a service event handler command that is to be run for every service state change. The global event handler is executed immediately prior to the event handler that you have optionally specified in each service definition. The command argument is the short name of a command definition that you define in your host configuration file. More information on event handlers can be found in the event handler documentation.
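Both global event handler options (host and service) follow the same ordering rule. The Python sketch below assumes that handlers run only when the state actually changes, and returns the commands in the order they would be executed: the global handler first, then the optional per-host or per-service handler. The function handlers_for_state_change and the command name restart-httpd are hypothetical.

```python
def handlers_for_state_change(old_state, new_state,
                              global_handler=None, own_handler=None):
    """Return the event handler commands to execute, in order."""
    if old_state == new_state:
        return []  # no state change, so no event handlers run
    commands = []
    if global_handler:
        commands.append(global_handler)  # global handler always runs first
    if own_handler:
        commands.append(own_handler)  # optional handler from the definition
    return commands


print(handlers_for_state_change("OK", "CRITICAL",
                                global_handler="log-service-event-to-db",
                                own_handler="restart-httpd"))
# ['log-service-event-to-db', 'restart-httpd']
```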

Inter-Check Sleep Time

Format: sleep_time=<seconds>
Example: sleep_time=1

This is the number of seconds that NetSaint will sleep before checking to see if the next service check in the scheduling queue should be executed. Note that NetSaint will only sleep after it "catches up" with queued service checks that have fallen behind.
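The following Python sketch models this behavior on a simulated clock: overdue checks are run back to back without sleeping, and sleeps of sleep_time seconds happen only once nothing in the queue is due. It assumes running a check takes no simulated time; run_scheduler is a hypothetical name, not NetSaint's event loop.

```python
def run_scheduler(check_times, sleep_time=1, now=0):
    """Simulate the scheduling loop and return a log of (action, time) pairs."""
    queue = sorted(check_times)  # scheduled execution times, earliest first
    log = []
    while queue:
        if queue[0] <= now:
            queue.pop(0)
            log.append(("run", now))  # due or overdue: run without sleeping
        else:
            log.append(("sleep", now))
            now += sleep_time  # caught up: sleep, then look at the queue again
    return log


# At t=2, checks scheduled for t=0 and t=1 are overdue; one more is due at t=4:
print(run_scheduler([0, 1, 4], sleep_time=1, now=2))
# [('run', 2), ('run', 2), ('sleep', 2), ('sleep', 3), ('run', 4)]
```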

Service Interleave Factor

Format: service_interleave_factor=<s|n>
Example: service_interleave_factor=s

This variable determines how service checks are interleaved. Interleaving allows for a more even distribution of service checks, reduced load on remote hosts, and faster overall detection of host problems. With the introduction of service check parallelization, remote hosts could be bombarded with checks if interleaving were not implemented. This could cause service checks to fail or return incorrect results if a remote host became overloaded processing other service check requests.

Setting this value to 1 is equivalent to not interleaving the service checks (this is how versions of NetSaint prior to 0.0.5 worked). Set this value to s (smart) for automatic calculation of the interleave factor unless you have a specific reason to change it. The best way to understand how interleaving works is to watch the status CGI (detailed view) when NetSaint is just starting. You should see that the service check results are spread out as they begin to appear.
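A minimal Python sketch of interleaving, under two assumptions not stated in this section: the smart factor is the number of services divided by the number of hosts, rounded up, and the service list is grouped by host before interleaving. With a factor of 1 the original order is returned unchanged, matching the non-interleaved behavior described above. The function names are hypothetical.

```python
import math


def smart_interleave_factor(total_services, total_hosts):
    """Assumed smart factor: services per host, rounded up."""
    return math.ceil(total_services / total_hosts)


def interleave(services, factor):
    """Reorder a host-grouped service list by taking every factor-th entry."""
    order = []
    for offset in range(factor):
        order.extend(services[offset::factor])
    return order


# Two hosts with two services each, listed host by host:
services = ["hostA/HTTP", "hostA/PING", "hostB/HTTP", "hostB/PING"]
factor = smart_interleave_factor(len(services), 2)  # 2
print(interleave(services, factor))
# ['hostA/HTTP', 'hostB/HTTP', 'hostA/PING', 'hostB/PING']
```

Consecutive checks now alternate between hosts instead of hitting one host several times in a row.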

Maximum Concurrent Service Checks

Format: max_concurrent_checks=<max_checks>
Example: max_concurrent_checks=20

This option allows you to specify the maximum number of service checks that can be run in parallel at any given time. Specifying a value of 1 for this variable essentially prevents any service checks from being parallelized. You'll have to modify this value based on the system resources you have available on the machine that runs NetSaint, as it directly affects the maximum load that will be imposed on the system (processor utilization, memory, etc.).
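The effect of max_concurrent_checks can be sketched in Python with a worker pool whose size is the configured limit. This is an assumption for illustration only: NetSaint runs service checks as separate processes, whereas this sketch uses a thread pool from concurrent.futures as a stand-in and records the highest number of checks observed running at once. run_checks and fake_check are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_checks(checks, max_concurrent_checks):
    """Run check callables with at most max_concurrent_checks in flight.

    Returns (results, peak), where peak is the highest concurrency observed.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(check):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return check()
        finally:
            with lock:
                running -= 1

    # The pool size caps how many checks can execute at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrent_checks) as pool:
        results = list(pool.map(tracked, checks))
    return results, peak


def fake_check():
    time.sleep(0.01)  # stand-in for a plugin doing real work
    return 0  # 0 = OK status code, in this illustration


results, peak = run_checks([fake_check] * 10, max_concurrent_checks=3)
print(len(results), peak <= 3)  # 10 True
```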

Service Reaper Frequency

Format: service_reaper_frequency=<frequency_in_seconds>
Example: service_reaper_frequency=10

This option allows you to control how often, in seconds, service "reaper" events occur. "Reaper" events process the results from parallelized service checks that have finished executing. These events constitute the core of the monitoring logic in NetSaint.
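One way to see how this setting affects latency is to compute when each finished check's result would be processed. The Python sketch below assumes reaper events fire at every multiple of service_reaper_frequency and that each event processes every result finished by that time; reap_times is a hypothetical name. Under these assumptions a result can wait up to service_reaper_frequency seconds before it is processed, so a lower value means fresher results at the cost of more frequent reaper events.

```python
import math


def reap_times(finish_times, service_reaper_frequency):
    """Return the reaper event time at which each finished check is processed."""
    return [
        math.ceil(finished / service_reaper_frequency) * service_reaper_frequency
        for finished in finish_times
    ]


# Checks finishing at t=3, 10 and 11 s, with service_reaper_frequency=10:
print(reap_times([3, 10, 11], 10))  # [10, 10, 20]
```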