Alert Rules

Alert rules consist of Alert Criteria and Actions:

  • Alert Criteria - let you choose between the status of an agent, the status of a check, or any arbitrary metric stored in Outlyer.

  • Actions - define what happens when an alert meets one of its criteria. This can be sending an email, sending a message to a Slack channel, or performing a GET on a webhook.

More actions will be supported in the future.

For both host status and service check based alerting, Outlyer uses return codes that follow the POSIX convention of a non-negative exit status. In addition, Outlyer follows the common Nagios status codes, so you can drop Nagios scripts into Outlyer knowing they will work as intended. These codes are generally 0, 1, 2 or 3. Please see the tables below for further explanation.
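As a rough illustration of the numbering described above, the following Python sketch maps an exit code to a status name. The names `Status` and `status_from_exit_code` are hypothetical and not part of Outlyer; the only behaviour taken from this page is that 0, 1 and 2 mean OK, Warning and Critical, and that 3 or above is treated as Unknown.

```python
from enum import IntEnum


class Status(IntEnum):
    """Nagios-style status codes as listed in the tables on this page."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def status_from_exit_code(code: int) -> Status:
    """Map a plugin exit code to a status; 3 and above count as Unknown."""
    if code < 0:
        raise ValueError("exit codes must be non-negative")
    return Status(code) if code < Status.UNKNOWN else Status.UNKNOWN
```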

Triggers

If a rule contains multiple criteria, then ANY one of those criteria being met triggers the rule, and ALL of the rule's actions are run.

There are two levels of trigger, Critical and Warning.

  • Critical - actions run only when a criterion reaches the critical level.

  • Warning - actions run when a criterion reaches the warning level, and again when the critical level is breached.
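The trigger behaviour above can be sketched in Python as follows. This is only a model under stated assumptions, not Outlyer's implementation: the function names are hypothetical, each criterion is reduced to a level of OK, Warning or Critical, and "again when critical is breached" is interpreted as firing whenever the worst level rises.

```python
from typing import Callable, Iterable, List

OK, WARNING, CRITICAL = 0, 1, 2


def worst_level(criteria_levels: Iterable[int]) -> int:
    """ANY criterion can trigger, so only the worst level matters."""
    return max(criteria_levels, default=OK)


def should_fire(previous: int, current: int, trigger_level: int) -> bool:
    """Fire when the level rises and is at or above the trigger level."""
    return current >= trigger_level and current > previous


def evaluate_rule(
    previous: int,
    criteria_levels: Iterable[int],
    trigger_level: int,
    actions: List[Callable[[], str]],
) -> List[str]:
    """Run ALL actions if the rule fires; otherwise run none."""
    current = worst_level(criteria_levels)
    if not should_fire(previous, current, trigger_level):
        return []
    return [action() for action in actions]
```

With a Warning trigger, a move from OK to Warning fires and a later move from Warning to Critical fires again. With a Critical trigger, only the move to Critical fires.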

Host Based Alerts

Host based alerts allow you to raise an alert based on the status of an Outlyer agent, container or device.

Host check return values:

Numeric Value | Service Status | Status Description
0 | OK | The host check command was able to check the host and it appeared to be responding
1 | Warning | The host check command was able to check the host, but it appeared to be above some “warning” threshold or did not appear to be working properly
2 | Critical | The host check command detected that either the host was not running or it was above some “critical” threshold
3+ | Unknown | Invalid command line arguments were supplied to the plugin or low-level failures internal to the plugin (such as unable to fork, or open a tcp socket) that prevent it from performing the specified operation.

The state of a host is calculated by Outlyer based on its connection to our service.

The state of a container is calculated by the Outlyer agent during discovery against the Docker or Kubernetes API.

The state of a device is based upon the return code of the host check_command in the discovery configuration.

Actions are raised based on the number of consecutive warning or critical states returned. Host statuses are checked every 30 seconds by default.
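A minimal Python sketch of the "states in a row" idea follows. The function name and the treatment of Unknown (excluded from the streak) are assumptions made for illustration, not documented Outlyer behaviour.

```python
from typing import Sequence

WARNING, CRITICAL = 1, 2


def breached_in_a_row(history: Sequence[int], level: int, required: int) -> bool:
    """Return True if the last `required` statuses are all at `level` or worse.

    `history` holds status codes ordered oldest to newest. Unknown (3+) is
    excluded here, which is an assumption of this sketch.
    """
    if required < 1:
        raise ValueError("required must be at least 1")
    recent = history[-required:]
    return len(recent) == required and all(level <= s <= CRITICAL for s in recent)
```

At the default 30 second interval, requiring 3 consecutive critical states would mean the condition has to persist across three successive checks before an action is raised.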

Service Check Based Alerts

Service based alerts allow you to raise an alert based on the output status of a check.

Service check return values:

Numeric Value | Service Status | Status Description
0 | OK | The plugin was able to check the service and it appeared to be functioning properly
1 | Warning | The plugin was able to check the service, but it appeared to be above some “warning” threshold or did not appear to be working properly
2 | Critical | The plugin detected that either the service was not running or it was above some “critical” threshold
3+ | Unknown | Invalid command line arguments were supplied to the plugin or low-level failures internal to the plugin (such as unable to fork, or open a tcp socket) that prevent it from performing the specified operation.

These return values come from the scheduled check.

The plugin that tests the service should exit with one of the values in the table above. Based on these values, you can alert on a certain number of consecutive critical or warning states. Service checks run every 30 seconds by default.
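As an illustration of a plugin that exits with one of these values, here is a small Python sketch of a disk usage check. The thresholds, messages and function names are hypothetical choices for this example; only the exit code meanings come from the table above.

```python
import shutil
from typing import Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(percent_used: float, warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    """Turn a disk usage percentage into an exit code and a status message."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.1f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.1f}% used"
    return OK, f"DISK OK - {percent_used:.1f}% used"


def main(path: str = "/") -> int:
    """Check one filesystem, print the status line, and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Low-level failures map to Unknown, as in the table above.
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(100.0 * usage.used / usage.total)
    print(message)
    return code

# A plugin script would finish with: sys.exit(main())
```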

Metric Based Alerts

Metric based alerts allow you to raise an alert based on any metric value returned to, and stored in, Outlyer.

Metrics can be returned in any of the service check formats: Nagios, Prometheus and Native. They are independent of the plugin's return code.

A service check that returns a non-zero status can still return metrics. For example, when a simple HTTP check finds that a website is down, the plugin returns 2 (critical) but can still return metrics for the check, such as the time taken for the check to fail.
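The HTTP example can be sketched in Python as below, using the Nagios convention of appending performance data after a `|` in the plugin output. The function name `check_http`, the `time` metric label and the injectable `opener` parameter are assumptions for illustration, and how Outlyer parses this output is not covered here.

```python
import time
import urllib.request
from typing import Callable, Tuple

OK, CRITICAL = 0, 2


def check_http(
    url: str,
    timeout: float = 5.0,
    opener: Callable = urllib.request.urlopen,
) -> Tuple[int, str]:
    """Return (exit_code, output line); the metric is emitted even on failure."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            code, summary = OK, f"HTTP OK - status {response.status}"
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        reason = str(exc).replace("|", "/")  # keep "|" reserved for perf data
        code, summary = CRITICAL, f"HTTP CRITICAL - {reason}"
    elapsed = time.monotonic() - start
    # Nagios-style performance data: label=value with a unit of measure.
    return code, f"{summary} | time={elapsed:.3f}s"
```

When the site is unreachable, the returned code is 2 and the output line still carries the `time` metric, which could then be used in a metric based alert.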

Actions can be raised based on threshold levels of the metric. Outlyer raises an action when the metric value is either above or below a given value and has remained in that state for a set amount of time (specified in minutes).
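One way to picture the "in this state for an amount of time" condition is the Python sketch below. It assumes samples arrive as (timestamp in seconds, value) pairs ordered oldest to newest; the function name and the exact evaluation strategy are hypothetical and may differ from how Outlyer evaluates metrics.

```python
from typing import Sequence, Tuple


def sustained_breach(
    samples: Sequence[Tuple[float, float]],
    threshold: float,
    minutes: float,
    above: bool = True,
) -> bool:
    """Return True if the latest unbroken run of breaching samples spans `minutes`."""
    if not samples:
        return False

    def breaching(value: float) -> bool:
        return value > threshold if above else value < threshold

    latest_ts = samples[-1][0]
    run_start = None
    # Walk backwards from the newest sample until the breach is interrupted.
    for ts, value in reversed(samples):
        if not breaching(value):
            break
        run_start = ts
    return run_start is not None and latest_ts - run_start >= minutes * 60
```

For example, with samples every 30 seconds, a value that stays above the threshold from 0 to 300 seconds satisfies a 5 minute condition, while a single dip back under the threshold restarts the clock.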