Let’s have a look at how triggers and events work in Zabbix

Introduction to Monitoring with Zabbix #4

Last time, I discussed hosts and host groups, which are used to organize items. Now let’s take a look at how triggers can be used to distinguish between normal states and problem states based on Item values.

Trigger functions

A trigger is a Zabbix setting that distinguishes between normal and problem values. By assigning a trigger to an item, you can determine whether the item’s value corresponds to a normal state or a problem state.

Triggers can use functions, comparison operators, and/or logical operators to apply conditional expressions to items and decide whether they correspond to normal states or problem states. More than one trigger can be set for a single item, and conversely, multiple items can be used in the expression of a single trigger. With these settings, it is possible to make complex decisions about problems and their recovery.
The following settings can be applied to triggers. Events will be described later.

Setting: Description

  • Name: Trigger name
    Can use a macro such as HOST.NAME
  • Severity: Select from *Not classified*/*Information*/*Warning*/*Average* (mild problem)/*High* (serious problem)/*Disaster* (critical problem)
  • Problem condition expression: An expression that evaluates to true in problem states, and false in normal states
    For details, see *Problem condition expressions* below
  • OK event generation: Select from *Expression* (using a problem condition expression), *Recovery expression*, or *None*
    When *Expression* is selected, an OK event occurs once the problem condition expression evaluates to false. When *Recovery expression* is selected, an OK event occurs once the problem condition expression evaluates to false and the recovery expression evaluates to true. When *None* is selected, an OK event only occurs if a problem is closed manually
  • Problem event generation mode: Select from *Single* (only generated at first occurrence) or *Multiple* (generated every time the conditional expression is satisfied)
  • Recovery expression: Describes the conditions under which a problem is resolved
    This can be entered when *Recovery expression* has been selected for *OK event generation*
  • OK event closes: Select from *All problems* or *All problems if tag values match*
    All open problems created by this trigger are closed when *All problems* is selected; only the problems whose tag values match are closed when *All problems if tag values match* is selected
  • Tag: You can specify multiple tag/value pairs to be used when an OK event closes
    By setting the tag to a host or service, you can specify the conditions under which the host is recovered
  • Allow manual close: When a problem has occurred, this setting allows it to be closed manually
  • URL: The URL to be used when a problem has occurred with an item
  • Description: A description of the trigger
    Since it can be included in the notification details that are sent when an action occurs, you can use a macro such as HOST.NAME to dynamically change the problem notification details
  • Enabled: Specifies whether or not the trigger itself is enabled
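
How the problem condition expression, the recovery expression, and *OK event generation* interact can be sketched in Python. This is a simplified model, not Zabbix’s actual implementation; the function name `next_state` and the mode strings are hypothetical and chosen only for illustration.

```python
def next_state(current, problem_expr, recovery_expr=None, mode="expression"):
    """Return the next trigger state, "PROBLEM" or "OK".

    current       -- current state of the trigger
    problem_expr  -- result (bool) of the problem condition expression
    recovery_expr -- result (bool) of the recovery expression, if any
    mode          -- "expression", "recovery expression", or "none"
    """
    if problem_expr:
        # The problem condition holds, so the trigger is in a problem state.
        return "PROBLEM"
    if current == "OK":
        return "OK"
    # From here on the trigger is in PROBLEM and the problem condition
    # no longer holds.
    if mode == "expression":
        return "OK"
    if mode == "recovery expression":
        # Recover only when the recovery expression is also true.
        return "OK" if recovery_expr else "PROBLEM"
    # mode == "none": only a manual close resolves the problem.
    return "PROBLEM"


# Hypothetical example: problem when load > 5, recovery only when load < 3.
state = "OK"
states = []
for load in [2, 6, 4, 2]:
    state = next_state(state, load > 5, load < 3, mode="recovery expression")
    states.append(state)
print(states)  # ['OK', 'PROBLEM', 'PROBLEM', 'OK']
```

With a separate recovery expression, the load of 4 does not close the problem even though it is below the problem threshold, which avoids flapping between the two states.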

Events

A trigger generates a problem event when a problem has occurred. An OK event is generated when the problem has returned to normal. These events set off the actions that are executed when problems occur and when they are resolved. Generating an OK event is sometimes referred to as “closing” the problem.

Problem condition expressions

Problem condition expressions and recovery expressions are written using the following sort of syntax. The parts enclosed by curly braces correspond to a condition.

```
{<server>:<key>.<function>(<parameter>)}<operator><constant>
```

Element: Description

  • server: Server name
  • key: Item key
  • function: A function that refers to collected values, the current time, and other factors
  • parameter: A value passed to a function as an argument
    A bare number represents the number of seconds before the present time. A number preceded by a hash symbol (’#’) represents the number of observations from the end
  • operator: The operation applied to the condition in the first half of the expression and the constant that follows it
  • constant: The operand of the condition in the first half
    Can either be a simple numerical value, or another condition
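
To make the parts of this syntax concrete, here is a minimal Python sketch that splits a simple expression into the elements listed above. The regular expression and the function name `parse_expression` are illustrative assumptions, not part of Zabbix; the host name `web01` is hypothetical, and only the single-condition form with a comparison operator is handled.

```python
import re

# Matches {server:key.function(parameter)}operator constant.
# The key is matched greedily, so the function is taken to be whatever
# follows the last dot before the opening parenthesis.
EXPR = re.compile(
    r"^\{(?P<server>[^:{}]+):(?P<key>.+)\."
    r"(?P<function>\w+)\((?P<parameter>[^)]*)\)\}\s*"
    r"(?P<operator><>|<=|>=|<|>|=)\s*(?P<constant>\S+)$"
)


def parse_expression(text):
    """Split a simple trigger expression into its named parts."""
    match = EXPR.match(text)
    if match is None:
        raise ValueError(f"not a simple trigger expression: {text!r}")
    return match.groupdict()


parts = parse_expression("{web01:system.cpu.load[percpu,avg1].avg(300)}>5")
for name, value in parts.items():
    print(f"{name}: {value}")
```

Here `avg(300)` would be read as the average over the last 300 seconds, compared against the constant 5.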

Some functions operate on numbers, some operate on times, and some operate on strings. Functions that operate on numbers include last(), avg(), count(), min(), max() and sum(). Time-related functions include now(), date() and time(), and string-related functions include diff(), regexp() and strlen(). For more details about functions, see the official documentation.
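
The difference between a parameter given in seconds and one given with a hash symbol can be illustrated with a small Python model. This is only a sketch of the idea described above; `select_values` and `avg` are hypothetical helpers written for this example, and the history data is made up.

```python
def select_values(history, parameter, now):
    """Pick the values a function such as avg() or max() would work on.

    history   -- list of (timestamp, value) pairs, oldest first
    parameter -- "300" means the last 300 seconds, "#3" the last 3 values
    now       -- current time, in the same unit as the timestamps
    """
    if parameter.startswith("#"):
        count = int(parameter[1:])
        if count <= 0:
            raise ValueError("the number of values must be positive")
        return [value for _, value in history[-count:]]
    seconds = int(parameter)
    return [value for ts, value in history if ts > now - seconds]


def avg(history, parameter, now):
    """Average of the selected values; fails if nothing was collected."""
    values = select_values(history, parameter, now)
    if not values:
        raise ValueError("no values in the requested range")
    return sum(values) / len(values)


history = [(0, 1.0), (60, 2.0), (120, 6.0), (180, 7.0)]
print(avg(history, "#2", now=180))   # last two values: (6.0 + 7.0) / 2 = 6.5
print(avg(history, "100", now=180))  # values newer than t=80: also 6.5
```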

Operators include numerical operators (+, -, *, /), numerical comparison operators (<, <=, =, >=, >, <>), and logical operators (and, or, not).

Expressions can be typed in directly, or they can be built with the *Add* button to the right of the conditional expression in the Web UI trigger creation/setting dialog. Here, you can select (valid values of) server/key/function/operator from the drop-down menu, and create conditional expressions simply by inserting numbers for the parameters (comparison only) and constants. To set multiple conditional expressions, manually enter a logical operator and press the *Add* button again to add the next condition.

Problem condition expressions can be complicated, so please check the official documentation for details.

Severity

In the trigger settings, you can specify a level of severity. The severity can be set to one of the following levels (in ascending order of seriousness):

  1. Not classified
  2. Information
  3. Warning
  4. Average (mild problem)
  5. High (serious problem)
  6. Disaster (critical problem)

Dependencies

You can specify dependencies in trigger settings. For example, consider the following monitoring setup:

  1. Up/down monitoring of an L3 switch
  2. Up/down monitoring of an L3 switch’s SNMP port
  3. Up/down monitoring of the Zabbix agent in a target host
  4. Monitoring a host’s CPU/memory/HDD

In this case, when a problem has occurred in 1, a problem will also be detected in 2. Similarly, if problems occur in 2 or 3, a problem will also occur in 4. A dependency relationship of the form 1 ← 2 ← 3 ← 4 can therefore be set. If you do that, an event will only be reported for the uppermost trigger in which a problem occurred, rather than for every trigger below it.
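
The effect of such a dependency chain can be sketched in Python. This is a simplified model of the behavior described above, not Zabbix’s implementation; the trigger names are hypothetical labels for the four checks in the list, and the function `reportable` is made up for this example.

```python
# Each trigger maps to the trigger it depends on (None for the top of the chain).
DEPENDS_ON = {
    "switch_alive": None,             # 1
    "switch_snmp": "switch_alive",    # 2 depends on 1
    "agent_alive": "switch_snmp",     # 3 depends on 2
    "host_resources": "agent_alive",  # 4 depends on 3
}


def reportable(problems):
    """Return the triggers in `problems` that are not masked by a
    failing trigger further up the dependency chain."""
    failing = set(problems)
    result = []
    for trigger in problems:
        parent = DEPENDS_ON[trigger]
        masked = False
        while parent is not None:
            if parent in failing:
                masked = True
                break
            parent = DEPENDS_ON[parent]
        if not masked:
            result.append(trigger)
    return result


# The switch is down, so its SNMP port and the host checks fail as well.
print(reportable(["switch_alive", "switch_snmp", "host_resources"]))
# ['switch_alive']
```

Only the switch itself is reported, which is the uppermost trigger in the chain that has a problem.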

Other Events

Zabbix supports other types of event besides Problem/OK events.

Event: When generated

  • Service Up: Every time Zabbix detects an active service.
  • Service Down: Every time Zabbix cannot detect a service.
  • Host Up: If at least one of the services is up for the IP.
  • Host Down: If no services are responding.
  • Service Discovered: If a service comes back after downtime or is discovered for the first time.
  • Service Lost: If a service is lost after being up.
  • Host Discovered: If a host comes back after downtime or is discovered for the first time.
  • Host Lost: If a host is lost after being up.
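
The rules in this list can be modeled with a few lines of Python. This is only an illustration of the conditions stated above; the function names and the way the previous state is passed in are assumptions made for this sketch.

```python
def service_events(was_up, is_up):
    """Events for one service check.

    was_up -- result of the previous check, or None if never seen before
    is_up  -- result of the current check
    """
    events = ["Service Up" if is_up else "Service Down"]
    if is_up and not was_up:
        # Back after downtime, or discovered for the first time.
        events.append("Service Discovered")
    if was_up and not is_up:
        events.append("Service Lost")
    return events


def host_status(services):
    """services -- dict of service name -> True if the service responds."""
    return "Host Up" if any(services.values()) else "Host Down"


print(service_events(None, True))   # ['Service Up', 'Service Discovered']
print(service_events(True, False))  # ['Service Down', 'Service Lost']
print(host_status({"ssh": False, "http": True}))  # Host Up
```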

The discovery functions introduced in the last column use events to perform actions.

Closing an event manually

Normally, when a trigger changes state from Problem to OK, the problem event is automatically resolved. However, if it is difficult to make this decision automatically, the problem can be resolved manually. In this case, *Allow manual close* must be enabled in the Zabbix trigger settings.
An event can be manually closed from the *Update problem* dialog in the Zabbix monitoring window.

Conclusion

In this column, we looked at Zabbix triggers. These allow you to determine whether an item has a normal value or a problem value.

In the next column, I’ll explain how actions can be used to perform certain tasks in response to events generated by triggers.

Satoru Miyazaki
