This tutorial on troubleshooting poller issues is the first installment of a series devoted to the resolution of common issues reported by clients and managed by our teams.
Poller issues generally take two forms:
#1: The poller is reported as “not running” on the poller configuration page:
#2: The last update time is older than 15 minutes and highlighted in yellow:
Solve this issue in Centreon 21.04
For the purpose of this tutorial, we will use a simple distributed Centreon platform with the following assets:
- a central server with an embedded DB instance (IP 192.168.56.125)
- a poller (IP 192.168.56.126)
Note that in the following examples, commands will be shown as executed with the root user. Never forget that with great power comes great responsibility!
Quick tour: Overview of a typical Centreon architecture
For efficient and targeted troubleshooting, you need to know what the different components of a Centreon platform are, and how they interact with each other. As a quick reminder, here’s a platform example comprising a central server and a poller.
- The poller’s monitoring engine (centreon-engine) executes a set of monitoring scripts (probes) to query the monitored devices.
- Results are returned by centreon-engine’s cbmod module to the centreon-broker instance on the central server (BBDO protocol, TCP 5669).
- The central server’s centreon-broker module handles and processes the results, populating DB tables or generating graphical RRD files.
- The web interface (Apache server and PHP backend) displays the monitoring information to the user.
- The centreon-gorgone module exports the configurations made in the interface to the pollers and launches external commands; centreon-gorgone is a client/server module, active both on the central server and on the pollers (ZMQ protocol, TCP 5556).
Now that we know who does what in a Centreon architecture, let’s get to the troubleshooting.
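Each of these components runs as a systemd service, so a quick way to see which ones are present on a given machine is to list the corresponding services. This is just a sketch; the exact list depends on your installation (the central server also runs its own local engine and broker):

[root@centreon-central ~]# systemctl list-units --type=service | egrep 'centengine|gorgoned|cbd|httpd'
[root@centreon-poller ~]# systemctl list-units --type=service | egrep 'centengine|gorgoned'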
Check 1: network connections
The poller must be able to connect to TCP port 5669 of the central server (BBDO flows). Conversely, the central server must be able to connect to TCP port 5556 of the poller (ZMQ centreon-gorgone flows).
The netstat command is helpful to check this, but ss can also be used.
When executing this command on the poller, we should get the following results:
[root@centreon-poller ~]# netstat -plant | egrep '5556|5669'
tcp    0    0 0.0.0.0:5556            0.0.0.0:*               LISTEN       3761/perl
tcp    0    0 192.168.56.126:5556     192.168.56.125:57466    ESTABLISHED  3761/perl
tcp    0    0 192.168.56.126:33598    192.168.56.125:5669     ESTABLISHED  3554/centengine
From the Central, the results will be the same, but in the opposite direction:
[root@centreon-central ~]# netstat -plant | egrep '5669|5556'
tcp    0    0 0.0.0.0:5669            0.0.0.0:*               LISTEN       1436/cbd
tcp    0    0 192.168.56.125:57466    192.168.56.126:5556     ESTABLISHED  1781/gorgone-proxy
tcp    0    0 192.168.56.125:5669     192.168.56.126:33574    ESTABLISHED  1436/cbd
tcp    0    0 127.0.0.1:5669          127.0.0.1:58200         ESTABLISHED  1436/cbd
tcp    0    0 127.0.0.1:58200         127.0.0.1:5669          ESTABLISHED  1430/centengine
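If netstat is not available (it is deprecated on most recent distributions), ss gives the same information, for example:

[root@centreon-central ~]# ss -plant | egrep '5669|5556'

The columns differ slightly, but the connection state, peer address and owning process are shown in the same way.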
Should the status be different from “ESTABLISHED”, checking data flows and firewall logs will usually help locate the problem.
Another common cause of network connection failure is the Linux firewall running on either the central server or the poller.
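To quickly test raw TCP connectivity in each direction, you can also use nc (assuming the nmap-ncat package is installed; any other TCP client will do):

[root@centreon-poller ~]# nc -zv 192.168.56.125 5669
[root@centreon-central ~]# nc -zv 192.168.56.126 5556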
This can be easily checked by launching the following command:
[root@centreon-central ~]# systemctl status firewalld
firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)
If the firewalld process is still active, stop and disable it:
[root@centreon-central ~]# systemctl stop firewalld
[root@centreon-central ~]# systemctl disable firewalld
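If your security policy requires firewalld to stay active, an alternative (just a sketch, to adapt to your own zones and rules) is to open only the ports Centreon needs:

[root@centreon-central ~]# firewall-cmd --permanent --add-port=5669/tcp
[root@centreon-central ~]# firewall-cmd --reload
[root@centreon-poller ~]# firewall-cmd --permanent --add-port=5556/tcp
[root@centreon-poller ~]# firewall-cmd --reload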
If a configuration error is identified at this stage (for example, an erroneous IP address entered when creating a poller), the entry will need to be checked and corrected at various levels of the Centreon configuration. We will cover that in a future article. 😉
Check 2: The centreon-engine process
This step aims to check that the monitoring engine is up and running on the poller. First, let’s see if the centengine process (the daemon of the centreon-engine module) is behaving as it should:
[root@centreon-poller ~]# systemctl status centengine
centengine.service - Centreon Engine
   Loaded: loaded (/usr/lib/systemd/system/centengine.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-02-10 09:56:35 CET; 1h 6min ago
 Main PID: 4463 (centengine)
   CGroup: /system.slice/centengine.service
           └─4463 /usr/sbin/centengine /etc/centreon-engine/centengine.cfg
The current status should be “active (running)”. It is also important to ensure the service is “enabled”, so that the daemon starts automatically with the system after a power outage or an unexpected reboot (did someone just pull the plug?).
If the status is “disabled”, activate the service at startup with the following command:
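A one-line way to check the boot-time behaviour:

[root@centreon-poller ~]# systemctl is-enabled centengine
enabled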
[root@centreon-poller ~]# systemctl enable centengine
Created symlink from /etc/systemd/system/centreon.service.wants/centengine.service to /usr/lib/systemd/system/centengine.service.
Sometimes the centengine daemon is reported as dead, as in the following example:
[root@poller ~]# systemctl status centengine
centengine.service - Centreon Engine
   Loaded: loaded (/usr/lib/systemd/system/centengine.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2021-02-10 09:56:35 CET; 22s ago
  Process: 22846 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
  Process: 4547 ExecStart=/usr/sbin/centengine /etc/centreon-engine/centengine.cfg (code=exited, status=0/SUCCESS)
 Main PID: 4547 (code=exited, status=0/SUCCESS)
Just try to force a restart with systemctl restart centengine and check the logs in /var/log/centreon-engine/centengine.log for clues about how it ended up this way. The most common reasons would be (a few commands to rule each one out follow the list):
- SELinux configuration
- Rights on folders and files
- Missing libraries or dependencies…
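A few quick commands that help rule out each of these causes (paths are those of our example platform; adapt them to yours):

[root@centreon-poller ~]# getenforce
[root@centreon-poller ~]# ls -l /etc/centreon-engine/
[root@centreon-poller ~]# ldd /usr/sbin/centengine | grep "not found"

getenforce returning “Enforcing” means SELinux may be interfering with centengine; the configuration files are normally owned by the centreon-engine user; and ldd should not report any missing library.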
Check 3: The centreon-gorgone process
As you briefly saw above, the centreon-gorgone module (carried by the gorgoned process) was introduced in Centreon 21.04, replacing the former centcore module.
The centreon-gorgone module provides a client/server connection between the central server and the pollers, whereas centcore (pretty much) only knew how to copy files over SSH.
A poller reported as faulty can sometimes be traced back to the centreon-gorgone module, as it is responsible for collecting the “running / not running” status of pollers. It may happen, for example, that monitoring is running normally and returning current, real-time data, yet the poller is displayed as “not running.” Here is how to run the check in that case.
- First, identify and note the ID of the problematic poller. Go to Configuration > Pollers and hover the mouse over the poller name:
The related poller’s ID will show in the URL link at the bottom of the page.
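If you prefer the command line, the same information can be read from the configuration database; this assumes the default database name (centreon) and that the pollers are stored in the nagios_server table (add your MariaDB credentials if needed):

[root@centreon-central ~]# mysql centreon -e "SELECT id, name FROM nagios_server;"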
- On both sides (central and poller), check that the gorgoned daemon is up and running:
[root@centreon-central ~]# systemctl status gorgoned
gorgoned.service - Centreon Gorgone
   Loaded: loaded (/etc/systemd/system/gorgoned.service; enabled; vendor preset: disabled)
   Active: active (running) since mer. 2021-02-10 09:56:03 CET; 2h 10min ago
 Main PID: 3413 (perl)
   CGroup: /system.slice/gorgoned.service
           ├─3413 /usr/bin/perl /usr/bin/gorgoned --config=/etc/centreon-gorgone/config.yaml --logfile=/var/log/centreon-gorgone/gorgoned.log --severity=info
           ├─3421 gorgone-nodes
           ├─3428 gorgone-dbcleaner
           ├─3435 gorgone-autodiscovery
           ├─3442 gorgone-cron
           ├─3443 gorgone-engine
           ├─3444 gorgone-statistics
           ├─3445 gorgone-action
           ├─3446 gorgone-httpserver
           ├─3447 gorgone-legacycmd
           ├─3478 gorgone-proxy
           ├─3479 gorgone-proxy
           ├─3486 gorgone-proxy
           ├─3505 gorgone-proxy
           └─3506 gorgone-proxy

févr. 10 09:56:03 cent7-2010-ems systemd[1]: Started Centreon Gorgone.
[root@centreon-poller centreon-engine]# systemctl status gorgoned
gorgoned.service - Centreon Gorgone
   Loaded: loaded (/etc/systemd/system/gorgoned.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-02-10 09:53:16 CET; 2h 13min ago
 Main PID: 4384 (perl)
   CGroup: /system.slice/gorgoned.service
           ├─4384 /usr/bin/perl /usr/bin/gorgoned --config=/etc/centreon-gorgone/config.yaml --logfile=/var/log/centreon-gorgone/gorgoned.log --severity=info
           ├─4398 gorgone-dbcleaner
           ├─4399 gorgone-engine
           └─4400 gorgone-action
- On the central server, check that log messages related to this poller ID are present and do not contain errors; a quick way to do that is shown below.
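The gorgone log location is the one shown in the systemctl output above. The exact message wording varies between versions, so a simple search for errors and for gorgone-proxy activity is usually a good start:

[root@centreon-central ~]# grep -i error /var/log/centreon-gorgone/gorgoned.log
[root@centreon-central ~]# tail -n 100 /var/log/centreon-gorgone/gorgoned.log | grep -i proxy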
- On the poller, check that the gorgoned configuration file /etc/centreon-gorgone/config.d/40-gorgoned.yaml exists and is similar to this one:
[root@centreon-poller centreon-engine]# cat /etc/centreon-gorgone/config.d/40-gorgoned.yaml
name: gorgoned-Poller
description: Configuration for poller Poller
gorgone:
  gorgonecore:
    id: 2
    external_com_type: tcp
    external_com_path: "*:5556"
    authorized_clients:
      - key: 3H2jXp7D7PC7OTM1ifosCO0l7iqJkf60lHWGWYnR5qY
    privkey: "/var/lib/centreon-gorgone/.keys/rsakey.priv.pem"
    pubkey: "/var/lib/centreon-gorgone/.keys/rsakey.pub.pem"
  modules:
    - name: action
      package: gorgone::modules::core::action::hooks
      enable: true
    - name: engine
      package: gorgone::modules::centreon::engine::hooks
      enable: true
      command_file: "/var/lib/centreon-engine/rw/centengine.cmd"
If not, check this link to create it again.
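After recreating or fixing the file, restart gorgoned on the poller and keep an eye on its log while the central server reconnects:

[root@centreon-poller ~]# systemctl restart gorgoned
[root@centreon-poller ~]# tail -f /var/log/centreon-gorgone/gorgoned.log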
Check 4: Managing the centreon-broker process
Centreon-broker flows between the central server and the poller are essential for a fully operational and efficient Centreon platform.
Beyond network issues (a lost BBDO connection on TCP/5669), real-time data processing can potentially be interrupted, which will display the poller as “Not updated.” Here are the steps to check the Centreon broker.
- First, check the version consistency between the cbmod module (on the poller) and the centreon-broker module on the central. They should always be the same:
[root@centreon-central ~]# rpm -qa | grep centreon-broker
centreon-broker-cbd-21.04.3-5.el7.centos.x86_64
centreon-broker-storage-21.04.3-5.el7.centos.x86_64
centreon-broker-21.04.3-5.el7.centos.x86_64
centreon-broker-cbmod-21.04.3-5.el7.centos.x86_64
centreon-broker-core-21.04.3-5.el7.centos.x86_64

[root@centreon-poller ~]# rpm -qa | grep cbmod
centreon-broker-cbmod-21.04.3-5.el7.centos.x86_64
- On the central server, check that the cbd daemon is up and running:
[root@centreon-central ~]# systemctl status cbd
cbd.service - Centreon Broker watchdog
   Loaded: loaded (/usr/lib/systemd/system/cbd.service; enabled; vendor preset: disabled)
   Active: active (running) since mer. 2021-02-10 09:41:01 CET; 5h 13min ago
 Main PID: 1529 (cbwd)
   CGroup: /system.slice/cbd.service
           ├─1529 /usr/sbin/cbwd /etc/centreon-broker/watchdog.json
           ├─1537 /usr/sbin/cbd /etc/centreon-broker/central-broker.json
           └─1538 /usr/sbin/cbd /etc/centreon-broker/central-rrd.json
Like the other vital daemons, it should appear as “active (running)” and show its three processes: cbwd (the watchdog), the cbd broker instance and the cbd rrd instance.
- Check the logs: they’re located in the /var/log/centreon-broker directory. On the Central, look for the central-broker-master.log file; on the poller, look for the module-poller.log file:
[root@centreon-central ~]# ls -ltr /var/log/centreon-broker/
total 8
-rw-rw-r-- 1 centreon-broker centreon-broker 1039  5 févr. 10:43 central-broker-master.log-20210205
-rw-rw-r-- 1 centreon-broker centreon-broker  360  5 févr. 10:43 central-module-master.log-20210205
-rw-rw-r-- 1 centreon-broker centreon-broker  240  5 févr. 10:43 central-rrd-master.log-20210205
-rw-rw-r-- 1 centreon-broker centreon-broker 3610  5 févr. 10:43 watchdog.log-20210205
-rw-rw-r-- 1 centreon-broker centreon-broker 1402 10 févr. 09:41 watchdog.log
-rw-rw-r-- 1 centreon-broker centreon-broker  669 10 févr. 09:41 central-broker-master.log
-rw-rw-r-- 1 centreon-broker centreon-broker  200 10 févr. 09:41 central-rrd-master.log
-rw-rw-r-- 1 centreon-broker centreon-broker  400 10 févr. 10:00 central-module-master.log

[root@centreon-poller ~]# ls -ltr /var/log/centreon-broker/
total 4
-rw-r--r--. 1 centreon-engine centreon-engine 1330 Feb 10 11:12 module-poller.log
These files should not contain recent error messages.
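A quick way to spot them (adapt the file names to your own broker configuration):

[root@centreon-central ~]# grep -i error /var/log/centreon-broker/central-broker-master.log
[root@centreon-poller ~]# grep -i error /var/log/centreon-broker/module-poller.log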
In older versions of Centreon we could find this kind of entry:
Error : conflict_manager : error in the main loop
If so, simply restart the centreon-broker daemon on the Central server:
[root@centreon-central ~]# systemctl restart cbd
- Last but not least, on the poller, check that the cbmod module is loaded successfully when centreon-engine starts; this information can be found in the centengine.log file:
[root@centreon-poller ~]# grep -i cbmod.so /var/log/centreon-engine/centengine.log
[1612947395] [4463] Event broker module '/usr/lib64/nagios/cbmod.so' initialized successfully
What’s Next?
In this tutorial, we did a tour of the primary checks that should be done whenever a poller seems to be in a funny state. Of course, other causes exist and sometimes the solutions described here won’t do the trick. Hopefully, this will still give you clues on how to start investigating.
Still blocked and running out of coffee? Keep calm and come visit us on our Community Slack! There will always be someone keen to give you a hand (if not running out of coffee too). Suggestions are also always welcome!
In episode 2, we will focus on operational issues: being unable to set a downtime, acknowledge an alarm, and so on. Stay tuned!