8.2. Monitoring

Monitoring in Nubus for UCS collects metrics from managed systems and compares them with predefined alert conditions. This helps administrators detect system operation issues early and respond promptly.

Use this page to set up and manage monitoring for Nubus for UCS. It explains the required components, how to configure and assign alerts, which preconfigured checks are available, and how to create custom alerts.

For information about installing and using the UCS Dashboard, see UCS Dashboard.

8.2.1. Installation

To install the UCS Dashboard components, see Installation. In addition to the UCS Dashboard components, install the Prometheus Alertmanager app on the system that sends alert notifications. For information about how to install apps through the App Center, see How to install applications.

On each monitored Nubus for UCS system, install the UCS Dashboard Client app. Nubus for UCS installs the univention-monitoring-client package by default for alert functionality. Verify that the univention-monitoring-client package is present before you continue.

Prometheus Alertmanager

The Prometheus Alertmanager app sends notifications for firing alerts, for example by email. Configure the app settings as shown in Fig. 8.4.

The figure shows the Alertmanager app settings for SMTP and email recipients.

Fig. 8.4 Alertmanager app settings for SMTP and email recipients

Before you configure Alertmanager, make sure that a reachable SMTP server is available and that you have recipient email addresses. Alertmanager supports the SMTP authentication methods PLAIN, LOGIN, and CRAM-MD5. It also supports TLS communication. If you leave all authentication-related fields empty, Alertmanager doesn’t use SMTP authentication. After you save the settings, send a test alert to verify that email notifications work.
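One way to send a test alert is to post a synthetic alert to the Alertmanager HTTP API. The following is a minimal sketch, not a documented Nubus for UCS procedure. It assumes that the Alertmanager API v2 is reachable at the hypothetical address http://localhost:9093 without authentication; the actual address and access protection on your system may differ. The alert name TestAlert and the function names are illustrative.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumption: adjust to the address where your Alertmanager actually listens.
ALERTMANAGER_URL = "http://localhost:9093"


def build_test_alert(name="TestAlert", severity="warning", minutes=5, now=None):
    """Build a list with one synthetic alert in the Alertmanager API v2 format."""
    now = now or datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": name, "severity": severity},
        "annotations": {
            "summary": "Test alert",
            "description": "Synthetic alert to verify email notifications.",
        },
        "startsAt": now.isoformat(),
        # The alert ends on its own after the given number of minutes.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_test_alert(base_url=ALERTMANAGER_URL):
    """POST the synthetic alert and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=json.dumps(build_test_alert()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

If Alertmanager accepts the request, it sends the notification according to its routing and grouping configuration, so the email may arrive with a short delay.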

univention-monitoring-client

The package univention-monitoring-client provides standard alert plugins that check system health.

You can install the following packages to add alerts beyond the standard checks from the univention-monitoring-client package. Some services install the matching monitoring package automatically. For example, when you set up the Active Directory Connection, Nubus for UCS also installs the corresponding monitoring plugin.

  • univention-monitoring-raid: Monitors the software RAID status.

  • univention-monitoring-smart: Tests the S.M.A.R.T. status of hard drives.

  • univention-monitoring-opsi: Tests the OPSI software distribution.

  • univention-monitoring-cups: Tests the CUPS printing system.

  • univention-monitoring-squid: Tests the Squid proxy server.

  • univention-monitoring-samba: Tests the Samba services.

  • univention-monitoring-s4-connector: Tests the S4 Connector.

  • univention-monitoring-ad-connector: Tests the Active Directory Connection.

8.2.2. Configuration

Use the Management UI and Alertmanager to configure and handle alerts.

An alert monitors a service or a status, for example the free disk space. You can assign any number of computers to an alert object. You manage alerts in the Monitoring module of the Management UI. Use the object type Alert, as shown in Fig. 8.5. For more information, see the Alerts section.

Note

Prometheus doesn’t provide an LDAP interface for monitoring configuration. A listener module generates the configuration files when administrators add, edit, or remove alerts.

The figure shows the Monitoring module in the Management UI with the configuration of an alert.

Fig. 8.5 Configuring an alert

8.2.2.1. General tab - Monitoring management

This section describes the fields on the General tab in the Monitoring management module of the Management UI.

Name

An unambiguous name for the alert.

Alert group

Defines the group that includes the alert. Multiple alerts can belong to the same group.

Query expression

Defines the Prometheus query expression that triggers the alert. The alert triggers when the query returns a non-empty vector.

For details about the syntax, see the Prometheus documentation.

For clause

Defines how long the query expression must return a non-empty result before the alert triggers.

Summary template

Specifies the alert title. It appears on the alert dashboard and in alert email notifications.

Description template

Specifies the alert description. It appears on the alert dashboard and in alert email notifications.

Labels

Prometheus attaches labels to alerts. These labels help you query alerts. For example, you can use the label severity with the value critical or warning.

Template Values

Defines variables that query expressions, descriptions, and summaries can reference. For example, the placeholder %max% is replaced with the value of the template value max.
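The fields on this tab correspond closely to the fields of a Prometheus alerting rule (alert, expr, for, labels, annotations). The following is a minimal sketch of how %name% placeholders could be substituted and how the fields could be combined. It is an illustration only: the actual substitution that the listener module performs isn't described here, and the alert name, the metric name, and the helper functions are hypothetical.

```python
import re


def render_template(text, values):
    """Replace every %name% placeholder that has an entry in values."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown placeholders such as %instance% untouched.
        return str(values.get(name, match.group(0)))
    return re.sub(r"%(\w+)%", substitute, text)


def build_rule(name, expr, for_clause, summary, description, labels, values):
    """Assemble the fields into the shape of a Prometheus alerting rule."""
    return {
        "alert": name,
        "expr": render_template(expr, values),
        "for": for_clause,
        "labels": dict(labels),
        "annotations": {
            "summary": render_template(summary, values),
            "description": render_template(description, values),
        },
    }


rule = build_rule(
    name="EXAMPLE_DISK_USAGE",                  # hypothetical alert name
    expr="example_disk_used_percent > %max%",   # hypothetical metric
    for_clause="5m",
    summary="Disk usage above %max%%",
    description="Disk usage has been above %max%% for 5 minutes.",
    labels={"severity": "warning"},
    values={"max": 90},
)
```

With the template value max set to 90, the expression becomes example_disk_used_percent > 90, and the alert only fires after the expression has returned a result for five minutes.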

8.2.2.2. Hosts tab - Monitoring management

This section describes the fields on the Hosts tab in the Monitoring management module of the Management UI.

Assigned hosts

Prometheus evaluates the query for the computers listed here. When the listener module generates the configuration, it replaces %instance% in the query expression with a regular expression that matches the assigned hosts.
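The following sketch illustrates the idea behind the %instance% replacement. It assumes that the instance label carries the fully qualified hostname and that the regular expression is a simple alternation of the assigned hosts; the expression that the listener module actually generates may look different. The metric name and the function names are hypothetical.

```python
def instance_regex(hosts):
    """Build one regular expression that matches any of the given hosts."""
    # Hostnames consist of letters, digits, hyphens, and dots. Only the dot
    # is special in a regular expression, so it is the only character escaped.
    return "|".join(host.replace(".", r"\.") for host in sorted(hosts))


def expand_instance(expr, hosts):
    """Replace %instance% in a query expression with the host regex."""
    # Inside a double-quoted PromQL string, a backslash must itself be escaped.
    pattern = instance_regex(hosts).replace("\\", "\\\\")
    return expr.replace("%instance%", pattern)


query = expand_instance(
    'example_metric{instance=~"%instance%"} > 0',  # hypothetical metric
    ["dc2.example.org", "dc1.example.org"],
)
```

Prometheus anchors regular expression matchers at both ends, so the alternation matches only the listed hosts and not hosts whose names merely contain one of them.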

8.2.2.3. Assign monitoring alerts to computers

Prometheus can monitor all computers that you manage in the Management UI.

To assign alerts to a computer, do the following:

  1. In the Management UI, open the Computers module and select the computer.

  2. On the Advanced settings tab, under Alerts, add the monitoring alerts that you want to assign, as shown in Fig. 8.6.

  3. Save your changes.

  4. Verify that the assigned alerts appear in the Assigned monitoring alerts list.

Assigned monitoring alerts

Lists the monitoring alerts assigned to the current computer. You can add or remove alerts here.

The figure shows the assignment of monitoring alerts to a computer.

Fig. 8.6 Assigning alerts to a host

8.2.2.4. Alert silences

Use the Alertmanager web interface to silence firing alerts for a specific period. For more information about silences, see the Prometheus Alertmanager documentation.

8.2.3. Preconfigured monitoring checks

Nubus for UCS sets up basic monitoring tests automatically. All alerts use the severity label with the value critical or warning.

UNIVENTION_DISK_ROOT and UNIVENTION_DISK_ROOT_WARNING

Monitors how full the / partition is. The alert fires if the remaining free space falls below 25% or 10% by default.

UNIVENTION_DNS

Tests the local DNS server and checks whether the public DNS server resolves the hostname www.univention.de. If the UCS domain has no DNS forwarder, the request fails. In that case, set the UCR variable monitoring/dns/lookup-domain to a name that the local DNS server can resolve, for example the FQDN of the Primary Directory Node, so that the check can still test name resolution.

UNIVENTION_LDAP_AUTH

Monitors the LDAP server on Nubus for UCS Directory Nodes.

UNIVENTION_LOAD and UNIVENTION_LOAD_WARNING

Monitors the system load.

UNIVENTION_NTP and UNIVENTION_NTP_WARNING

Requests the time from the NTP service on the monitored Nubus for UCS system. If the time differs by more than 60 or 120 seconds, the alert fires.

UNIVENTION_SMTP

Tests whether the SMTP server is reachable. The alert fires if the server isn’t reachable.

UNIVENTION_SSL and UNIVENTION_SSL_WARNING

Tests how long the Nubus for UCS TLS certificates remain valid. Use this plugin only on Primary Directory Node and Backup Directory Node systems.

UNIVENTION_SWAP and UNIVENTION_SWAP_WARNING

Monitors the utilization of the swap partition. The alert fires if the remaining free space falls below the threshold of 40% or 20% by default.

UNIVENTION_REPLICATION and UNIVENTION_REPLICATION_WARNING

Monitors LDAP replication. The alert detects a failed.ldif file, stalled replication, and large differences between transaction IDs.

UNIVENTION_NSCD and UNIVENTION_NSCD2

Tests the availability of the name server cache daemon (NSCD). If no NSCD process runs, the check triggers a critical alert. If more than one process runs, the check triggers a warning alert.

For information about NSCD, see Name service cache daemon.

UNIVENTION_WINBIND

Tests the availability of the Winbind service. If no process runs, the check triggers a critical alert.

UNIVENTION_SMBD

Tests the availability of the Samba service. If no process runs, the check triggers an alert.

UNIVENTION_NMBD

Tests the availability of the NMBD service, which handles the NetBIOS service in Samba. If no process runs, the check triggers an alert.

UNIVENTION_JOINSTATUS and UNIVENTION_JOINSTATUS_WARNING

Tests the join status of a system. If the system hasn’t joined yet, the check triggers a critical alert. If join scripts haven’t run, the check triggers a warning alert.

UNIVENTION_KPASSWDD

Tests the availability of the Kerberos password service. Use this check only on Primary Directory Node and Backup Directory Node systems. If the service doesn’t run exactly one process, the check triggers an alert.

UNIVENTION_PACKAGE_STATUS

Monitors the status of installed Debian packages. If any package has the status half-installed, the check triggers an alert.

UNIVENTION_SLAPD_MDB_MAXSIZE and UNIVENTION_SLAPD_MDB_MAXSIZE_WARNING

Monitors how many free memory pages remain in the mdb backend of SLAPD across multiple directories.

UNIVENTION_LISTENER_MDB_MAXSIZE and UNIVENTION_LISTENER_MDB_MAXSIZE_WARNING

Monitors how many free memory pages remain in the mdb backend of SLAPD across the directories that the Listener uses.
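Many of the checks above come as a pair of a critical alert and a warning alert. The following sketch shows the two decision patterns that the descriptions suggest: a two-level threshold, as for UNIVENTION_DISK_ROOT, and a process count, as for UNIVENTION_NSCD. It isn't the code of the shipped plugins. The default values follow the description of the / partition check; that the 25% threshold belongs to the warning alert and the 10% threshold to the critical alert is an assumption, and the function names are hypothetical.

```python
import shutil


def free_percent(path="/"):
    """Return the free space of the filesystem at path as a percentage."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100


def classify_free_space(percent_free, warning_below=25.0, critical_below=10.0):
    """Map a free-space percentage to a severity, or None if healthy."""
    # Assumption: the stricter threshold maps to the critical alert.
    if percent_free < critical_below:
        return "critical"
    if percent_free < warning_below:
        return "warning"
    return None


def classify_process_count(count):
    """NSCD-style check: no process is critical, more than one is a warning."""
    if count == 0:
        return "critical"
    if count > 1:
        return "warning"
    return None
```

For example, classify_free_space(free_percent("/")) evaluates the root partition of the local system against the assumed default thresholds.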

You can use additional monitoring alerts after you install the required packages. For installation details, see Installation.

UNIVENTION_OPSI

Monitors the OPSI daemon. If no OPSI process runs or the OPSI proxy isn’t accessible, the alert fires.

UNIVENTION_SMART_SDA

Tests the S.M.A.R.T. status of the hard drive /dev/sda. Corresponding alerts are available for the hard drives /dev/sdb, /dev/sdc, and /dev/sdd.

UNIVENTION_ADCONNECTOR and UNIVENTION_ADCONNECTOR_WARNING

Monitors the status of the Active Directory Connection:

  • If no connector process runs, the alert fires.

  • If more than one process runs per connector instance, the check triggers a warning alert.

  • If rejects occur, the check triggers a warning alert.

  • If the AD server isn’t reachable, the alert fires.

You can use this plugin in multi-connector instances.

UNIVENTION_CUPS

Monitors the CUPS daemon. If no cupsd process runs or the web interface isn’t accessible, the check triggers a critical alert.

UNIVENTION_SQUID

Monitors the Squid proxy. If no Squid process runs or the Squid proxy isn’t accessible, the alert fires.

UNIVENTION_RAID and UNIVENTION_RAID_WARNING

Monitors RAID device status. The check triggers a warning alert for the following RAID statuses:

  • Rebuilding

  • Reconstruct

  • Replaced Drive

  • Expanding

  • Warning

  • Verify

The check triggers a critical alert for the following RAID statuses:

  • Degraded

  • Dead

  • Failed

  • Error

  • Missing

UNIVENTION_S4CONNECTOR and UNIVENTION_S4CONNECTOR_WARNING

Monitors the Samba server. If the server is reachable and rejects occur, the check triggers a warning alert. If the server isn’t reachable, the check triggers a critical alert.

UNIVENTION_SAMBA_REPLICATION

Monitors Samba replication. The alert fires when replication fails.
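The RAID status lists for UNIVENTION_RAID and UNIVENTION_RAID_WARNING can be read as a lookup from status to severity. The following sketch encodes the two lists and reports the most severe result across several devices. How the plugin obtains the status strings isn't described here, so the exact, case-sensitive comparison and the function names are assumptions.

```python
# Status values taken from the lists in the UNIVENTION_RAID description.
WARNING_STATUSES = {
    "Rebuilding", "Reconstruct", "Replaced Drive",
    "Expanding", "Warning", "Verify",
}
CRITICAL_STATUSES = {"Degraded", "Dead", "Failed", "Error", "Missing"}

# Higher rank means more severe; None stands for a healthy device.
SEVERITY_RANK = {None: 0, "warning": 1, "critical": 2}


def raid_severity(status):
    """Map one RAID status string to a severity, or None if healthy."""
    if status in CRITICAL_STATUSES:
        return "critical"
    if status in WARNING_STATUSES:
        return "warning"
    return None


def worst_severity(statuses):
    """Return the most severe result across all given device statuses."""
    return max(
        (raid_severity(status) for status in statuses),
        key=SEVERITY_RANK.get,
        default=None,
    )
```

A system with one rebuilding device and one failed device is reported as critical, because the critical status outranks the warning status.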

8.2.4. Extend monitoring with new alerts

Use custom alert checks to collect additional metrics and define alerts for them in Prometheus.

Before you begin, make sure that:

  • You have administrative access to the target Nubus for UCS system.

  • The target system includes the required monitoring components.

  • Every target system includes the external programs or libraries that your custom check uses.

Custom alert checks export metrics from the local system to Prometheus. A PromQL query uses these metrics to define the alert. For more information about writing custom checks, see the Prometheus documentation.

To create a custom alert, do the following:

  1. Copy the custom alert check script to /usr/share/univention-monitoring-client/scripts/ on the target Nubus for UCS system.

  2. Make the script executable with the command in Listing 8.1. Replace PLUGIN with the filename of your custom alert check script.

    Listing 8.1 Make PLUGIN script executable
    $ chmod a+x /usr/share/univention-monitoring-client/scripts/PLUGIN

  3. Make sure that the script writes valid Prometheus metrics to a .prom file in /var/lib/prometheus/node-exporter/.

  4. In the Management UI, configure the custom alert. For details, see Configuration.

  5. Enter the Prometheus expression for the script metric in the Query expression field.

  6. Assign the custom alert to the required systems. For details, see Assign monitoring alerts to computers.

  7. Verify that Prometheus reads the exported metrics. For example, confirm that the metric appears in Prometheus and that the alert is available in the alert configuration.
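The following is a minimal sketch of a custom alert check in Python that follows the steps above. It assumes that the monitoring client runs the scripts in its scripts directory periodically and that the .prom files are read from the directory named in the steps, which the path suggests belongs to the Prometheus node exporter. The metric name example_spool_entries, the directory /var/spool/example, the file name example_spool.prom, and all function names are hypothetical.

```python
#!/usr/bin/env python3
import os
import tempfile

# Directory named in the steps above.
TEXTFILE_DIR = "/var/lib/prometheus/node-exporter"


def count_entries(path):
    """Example measurement: the number of entries in a directory."""
    return len(os.listdir(path))


def format_metric(name, value, help_text):
    """Render one gauge in the Prometheus text exposition format."""
    return "# HELP %s %s\n# TYPE %s gauge\n%s %s\n" % (
        name, help_text, name, name, value)


def write_prom_file(content, filename, directory=TEXTFILE_DIR):
    """Write atomically so that a half-written file is never read."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file with mode 0600
    os.replace(tmp_path, os.path.join(directory, filename))


def main():
    value = count_entries("/var/spool/example")  # hypothetical directory
    text = format_metric(
        "example_spool_entries", value,
        "Number of entries in the example spool directory.")
    write_prom_file(text, "example_spool.prom")

# A real check script would end with a call to main().
```

A matching value for the Query expression field could be example_spool_entries{instance=~"%instance%"} > %max%, with max defined as a template value. Writing to a temporary file first and renaming it afterwards avoids exposing partially written metrics.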

Note

All alert checks that Nubus for UCS provides use Python. You can write custom checks in Perl, Python, or as shell scripts. If a custom check uses external libraries or programs, install them on every system that runs the check.

See also

Metric and label naming

for information about Prometheus naming conventions.

Exposition formats

for information about the text-based format of a .prom file.