Introduction to Monitoring with Zabbix #4 – Triggers

Last time, I discussed hosts and host groups, which are used to organize items. Now let’s take a look at how triggers can be used to distinguish between normal states and problem states based on item values.

Trigger functions

A trigger is a setting provided by Zabbix to distinguish between normal and problem values. By assigning a trigger to an item, it is possible to determine whether this item’s value corresponds to a normal state or a problem state.

Triggers can use functions, comparison operators, and/or logical operators to apply conditional expressions to items and decide whether they correspond to normal states or problem states. Multiple triggers can be set for a single item, and conversely, multiple items can be used in the settings of a single trigger. With these settings, it is possible to make complex decisions about problems and their recovery.
The following settings can be applied to triggers. Events will be described later.

| Item | Description |
|------|-------------|
| Name | Trigger name. Can use a macro such as {HOST.NAME}. |
| Severity | Select from *Not classified*/*Information*/*Warning*/*Mild problem*/*Serious problem*/*Critical problem*. |
| Problem condition expression | An expression that evaluates to true in problem states, and false in normal states. For details, see *Problem condition expressions* below. |
| OK event generation | Select from *Expression* (using the problem condition expression), *Recovery expression*, or *None*. When *Expression* is selected, an OK event occurs when the problem condition expression evaluates to false. When *Recovery expression* is selected, an OK event occurs when the problem condition expression evaluates to false and the recovery expression evaluates to true. When *None* is selected, an OK event only occurs if a problem is closed manually. |
| Problem event generation mode | Select from *Single* (only generated at first occurrence) or *Multiple* (generated every time the problem condition expression is satisfied). |
| Recovery expression | Describes the conditions under which a problem is resolved. This can be entered when *Recovery expression* has been selected for *OK event generation*. |
| OK event closes | Select from *All problems* or *All problems if tag values match*. When *All problems* is selected, an OK event closes all problems generated by this trigger. When *All problems if tag values match* is selected, it closes only the problems whose tag values match. |
| Tag | You can specify multiple tag/value pairs to be used when an OK event closes. By setting the tag to a host or service, you can specify the conditions under which the host is recovered. |
| Allow manual close | Allows a problem that has occurred to be closed manually. |
| URL | The URL to be used when a problem has occurred with an item. |
| Description | A description of the trigger. Since it can be included in the notification details that are sent when an action occurs, you can use a macro such as {HOST.NAME} to dynamically change the problem notification details. |
| Enabled | Specifies whether or not the trigger itself is enabled. |

#Event

A trigger generates a problem event when a problem has occurred, and an OK event when the problem has returned to normal. These events start the actions that are executed when problems occur and when they are recovered. Generating an OK event is sometimes referred to as “closing” the problem.
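
As a rough illustration of how the *OK event generation* and *Problem event generation mode* settings interact, the following Python sketch models a trigger one evaluation at a time. It is not Zabbix code: the `Trigger` class and its field names are hypothetical, and it assumes the usual Zabbix convention that a problem condition expression evaluating to true means a problem.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trigger:
    """Hypothetical model of a trigger's event generation settings."""
    ok_generation: str = "expression"  # "expression", "recovery_expression" or "none"
    multiple: bool = False             # False = Single, True = Multiple
    in_problem: bool = False

    def evaluate(self, problem: bool, recovery: bool = False) -> Optional[str]:
        """Return "PROBLEM", "OK" or None for one evaluation of the expressions."""
        if problem:
            first_occurrence = not self.in_problem
            self.in_problem = True
            # Single: only the first occurrence generates a problem event.
            # Multiple: every evaluation that is true generates one.
            return "PROBLEM" if (first_occurrence or self.multiple) else None

        if not self.in_problem:
            return None  # already normal, nothing to close

        if self.ok_generation == "expression":
            recovered = True       # problem expression being false is enough
        elif self.ok_generation == "recovery_expression":
            recovered = recovery   # recovery expression must also be true
        else:
            recovered = False      # "none": only a manual close ends the problem

        if recovered:
            self.in_problem = False
            return "OK"
        return None

    def close_manually(self) -> str:
        """Model a manual close, which also produces an OK event."""
        self.in_problem = False
        return "OK"


trigger = Trigger()
print([trigger.evaluate(p) for p in (False, True, True, False)])
```

With the default settings, the sequence false, true, true, false yields one problem event followed by one OK event. With `multiple=True`, the second true evaluation would produce another problem event.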

#Problem condition expressions

Problem condition expressions and recovery expressions are written using the following sort of syntax. The parts enclosed by curly braces correspond to a condition.

```
{<server>:<key>.<function>(<parameter>)}<operator><constant>
```

| Item | Description |
|------|-------------|
| server | Server name |
| key | Item key |
| function | A function that refers to collected values, the current time, and other factors |
| parameter | A value passed to a function as an argument. A bare number represents the number of seconds before the present time. A number preceded by a hash symbol (#) represents the number of most recent observations |
| operator | The operation applied to the condition in the first half of the expression and the constant |
| constant | The operand compared with the condition in the first half. Can either be a simple numerical value, or another condition |
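
To make the parts of this syntax concrete, here is a small Python sketch that splits a simple condition into the six elements described above. The regular expression is my own simplification, not the parser Zabbix uses, and it only handles a single condition compared with a constant. The host name `www01` in the example is hypothetical.

```python
import re

# Assumed shape: {<server>:<key>.<function>(<parameter>)}<operator><constant>
CONDITION = re.compile(
    r"\{(?P<server>[^:{}]+):(?P<key>.+)"
    r"\.(?P<function>\w+)\((?P<parameter>[^)]*)\)\}"
    r"\s*(?P<operator><=|>=|<>|<|>|=)\s*(?P<constant>.+)"
)


def parse_condition(expression: str) -> dict:
    """Split one simple condition into its named parts."""
    match = CONDITION.fullmatch(expression.strip())
    if match is None:
        raise ValueError(f"not a simple condition: {expression!r}")
    return match.groupdict()


parts = parse_condition("{www01:system.cpu.load[percpu,avg1].last(0)}>5")
for name, value in parts.items():
    print(f"{name:10} {value}")
```

Because item keys may themselves contain dots and brackets, the sketch treats the last `.name(...)` before the closing brace as the function call and everything before it as the key.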

Some functions operate on numbers, some operate on times, and some operate on strings. Functions that operate on numbers include last(), avg(), count(), min(), max() and sum(). Time-related functions include now(), date() and time(), and string-related functions include diff(), regexp() and strlen(). For more details about functions, see the official documentation.
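
The difference between a parameter given in seconds and one given as `#N` is easiest to see with numbers. The following Python sketch imitates a few of the numeric functions over a small history of values. The helper names and the sample data are made up for illustration, and the exact window boundary is an assumption rather than documented Zabbix behaviour.

```python
from statistics import mean


def select_values(history, parameter, now):
    """Pick the values a function sees.

    history   -- list of (unix_timestamp, value) pairs, oldest first
    parameter -- "300" for the last 300 seconds, "#3" for the last 3 values
    now       -- current unix timestamp
    """
    if parameter.startswith("#"):
        count = int(parameter[1:])
        if count <= 0:
            raise ValueError("observation count must be positive")
        return [value for _, value in history[-count:]]
    seconds = int(parameter)
    # Assumption: a value is inside the window if it is newer than now - seconds.
    return [value for timestamp, value in history if timestamp > now - seconds]


# Only simple aggregates are modelled here.
AGGREGATES = {"avg": mean, "min": min, "max": max, "sum": sum, "count": len}


def evaluate_function(name, history, parameter, now):
    """Apply one aggregate function to the selected values."""
    return AGGREGATES[name](select_values(history, parameter, now))


history = [(700, 1.0), (800, 2.0), (900, 3.0), (1000, 6.0)]
print(evaluate_function("avg", history, "#2", now=1000))   # last 2 values: 4.5
print(evaluate_function("sum", history, "150", now=1000))  # last 150 seconds: 9.0
```

Note that the sketch does not handle an empty window; in that case `mean`, `min` and `max` raise an exception.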

Operators include numerical operators (+, -, \*, /), numerical comparison operators (\<, \<=, =, \>=, \>, \<\>), and logical operators (and, or, not).
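
As a sketch of how these operators behave, the comparison operators can be mapped onto Python's `operator` module, with `=` meaning equality and `<>` meaning inequality. The function name and the sample values below are hypothetical; they stand in for values that trigger functions would return.

```python
import operator

# Comparison operators as spelled in trigger expressions.
COMPARISONS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
    "<>": operator.ne,
}


def compare(left, op, right):
    """Evaluate a single comparison such as 6.2 > 5."""
    return COMPARISONS[op](left, right)


cpu_load = 6.2        # hypothetical value returned by a function on one item
free_memory_pct = 8   # hypothetical value returned by a function on another item

# Two conditions joined with a logical operator, as in "A>5 and B<10".
is_problem = compare(cpu_load, ">", 5) and compare(free_memory_pct, "<", 10)
print(is_problem)
```

Here both conditions hold, so the combined expression is true and the trigger would be in a problem state.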

Expressions can be entered directly, or alternatively they can be built with the *Add* button to the right of the conditional expression field in the Web UI trigger creation/setting dialog. Here, you can select valid values of server/key/function/operator from the drop-down menus, and create conditional expressions simply by entering numbers for the parameters (comparison only) and constants. To combine multiple conditional expressions, manually enter a logical operator and press the *Add* button again to add the next one.

Problem condition expressions can be complicated, so please check the official documentation for details.

Severity

In the trigger settings, you can specify a level of severity. The severity can be set to one of the following levels (in ascending order of seriousness):

Not classified
Information
Warning
Mild problem (*Average* in the English UI)
Serious problem (*High* in the English UI)
Critical problem (*Disaster* in the English UI)

Dependencies

You can specify dependencies in trigger settings. For example, consider the following monitoring setup:

1. Up/down monitoring of an L3 switch
2. Up/down monitoring of an L3 switch’s SNMP port
3. Up/down monitoring of the Zabbix agent in a target host
4. Monitoring a host’s CPU/memory/HDD

In this setup, when a problem occurs in 1, a problem will also be detected in 2. Similarly, if problems occur in 2 or 3, a problem will also occur in 4. A dependency relationship of the form 1 ← 2 ← 3 ← 4 can therefore be set. If you do that, then an event will only be reported for the uppermost trigger in which a problem occurred.
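
The effect of such a dependency chain can be sketched in a few lines of Python. This is only a model of the behaviour described above, not how Zabbix implements it: the function name is hypothetical, each trigger is assumed to depend on at most one other trigger, and the dependencies are assumed to contain no cycles.

```python
def reported_triggers(in_problem, depends_on):
    """Return the triggers whose problems are actually reported.

    in_problem -- set of trigger numbers currently in a problem state
    depends_on -- mapping of trigger -> the trigger it depends on
    """
    reported = []
    for trigger in sorted(in_problem):
        parent = depends_on.get(trigger)
        suppressed = False
        # Walk up the chain; any ancestor in a problem state hides this trigger.
        while parent is not None:
            if parent in in_problem:
                suppressed = True
                break
            parent = depends_on.get(parent)
        if not suppressed:
            reported.append(trigger)
    return reported


# The chain 1 <- 2 <- 3 <- 4 from the example above.
chain = {2: 1, 3: 2, 4: 3}

print(reported_triggers({1, 2, 4}, chain))  # switch is down: [1]
print(reported_triggers({3, 4}, chain))     # agent is down: [3]
print(reported_triggers({4}, chain))        # only the host resources: [4]
```

In each case only the uppermost trigger in a problem state is reported, which keeps a single switch failure from producing a notification for every host behind it.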

Other Events

Zabbix supports other types of event besides Problem/OK events.

| Event | When generated |
|-------|----------------|
| Service Up | Every time Zabbix detects an active service. |
| Service Down | Every time Zabbix cannot detect a service. |
| Host Up | If at least one of the services is up for the IP. |
| Host Down | If no services are responding. |
| Service Discovered | If a service comes back after downtime or is discovered for the first time. |
| Service Lost | If a service is lost after being up. |
| Host Discovered | If a host comes back after downtime or is discovered for the first time. |
| Host Lost | If a host is lost after being up. |
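
The conditions listed above can be read as simple rules over the previous and current result of a check. The following Python sketch is one way to express them. The function names are hypothetical and the logic is only my reading of the descriptions, not taken from Zabbix itself.

```python
from typing import Dict, List, Optional


def service_events(was_up: Optional[bool], is_up: bool) -> List[str]:
    """Events for one service check.

    was_up -- previous result, or None if the service was never seen before
    is_up  -- result of the current check
    """
    events = ["Service Up" if is_up else "Service Down"]
    if is_up and was_up is not True:
        # Back after downtime, or seen for the first time.
        events.append("Service Discovered")
    elif not is_up and was_up is True:
        events.append("Service Lost")
    return events


def host_status(services: Dict[str, bool]) -> str:
    """A host is up if at least one of its services responds."""
    return "Host Up" if any(services.values()) else "Host Down"


print(service_events(None, True))
print(service_events(True, False))
print(host_status({"ssh": False, "http": True}))
```

The Host Discovered and Host Lost events follow the same pattern as their service counterparts, applied to the host status instead of a single service.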

The discovery functions introduced in the previous article use these events to perform actions.

Stopping an Event manually

Normally, when a trigger’s state changes from Problem to OK, the problem event is resolved automatically. However, if it is difficult to make this decision automatically, the problem can be resolved manually. In this case, *Allow manual close* must be enabled in the Zabbix trigger settings.
A problem can be closed manually from the *Update problem* dialog in the Zabbix monitoring window.