Configuring monitoring services using Zabbix

Introduction

This page describes how Alexis Huxley installed and configured a Zabbix server. It is mostly taken from the official documentation.

Note:

  • The installation uses PostgreSQL as the database backend, because installation using MariaDB and MySQL failed for reasons explained here.

Installing the Zabbix server

  1. Add an entry to /etc/apt/sources.list and the associated repository key by either:
    1. Run:
      wget http://repo.zabbix.com/zabbix/3.2/ubuntu/pool/main/z/zabbix-release/zabbix-release_3.2-1+trusty_all.deb
      dpkg -i zabbix-release_3.2-1+trusty_all.deb
      apt-get update

    or:

    1. Run:
      echo "deb http://repo.zabbix.com/zabbix/3.2/ubuntu trusty main" >> /etc/apt/sources.list
      apt-key adv --quiet --keyserver pgp.mit.edu --recv-keys 79EA5ED4
      apt-key adv --quiet --keyserver pgp.mit.edu --recv-keys A14FE591
      

    or:

    1. Modify pcms to perform the above steps.
  2. Install the LAPP stack as follows:
    1. Run:
      apt-get install apache2
      apt-get install postgresql postgresql-client
      apt-get install php7.0-pgsql php7.0-curl php7.0-json php7.0-cgi \
          php7.0 libapache2-mod-php7.0
      
  3. Install Zabbix as follows:
    apt-get install zabbix-server-pgsql zabbix-get zabbix-frontend-php php7.0-bcmath \
        php7.0-mbstring php7.0-xml
    
  4. Automatically authenticate the zabbix database user if it connects from the local host by adding the following to /etc/postgresql/9.5/main/pg_hba.conf:
    local all zabbix trust    #  must appear before 'local all all peer'!

    and running:

    systemctl restart postgresql
  5. Create a database account and a database for Zabbix to use and initialise the database:
    su - postgres
    psql
    CREATE DATABASE zabbix WITH ENCODING='UTF-8';
    CREATE USER zabbix WITH PASSWORD 'zabbix';
    GRANT ALL ON DATABASE zabbix TO zabbix;
    \q
    zcat /usr/share/doc/zabbix-server-pgsql/create.sql.gz | psql -U zabbix zabbix
    exit
  6. Edit /etc/zabbix/zabbix_server.conf and set:
    DBHost=
    DBName=zabbix
    DBUser=zabbix
    DBPassword=zabbix

    (Some of these are already set to these values.)

  7. Start Zabbix by running:
    service zabbix-server start
    update-rc.d zabbix-server enable

    and verify it started by examining the process list and the contents of /var/log/zabbix/zabbix_server.log.

  8. Edit /etc/zabbix/apache.conf and set date.timezone in the stanza for php7.0.
  9. Fix a bug whereby the Zabbix configuration is not actually loaded by Apache by running:
    ln -s ../conf.d/zabbix /etc/apache2/conf-available/zabbix.conf
    a2enconf zabbix
    service apache2 reload
  10. Start the Zabbix web front-end by running:
    service apache2 restart
  11. Go to http://<your-zabbix-server>/zabbix, navigate through the screens, verifying there are no errors, enter the zabbix database password and finally log in (username: Admin; password: zabbix).
  12. If you need to proxy access to Zabbix then add a stanza like the following to the front-end webserver:
        ProxyPass /zabbix/ http://trenne.pasta.net/zabbix/
        ProxyPassReverse /zabbix/ http://trenne.pasta.net/zabbix/
  13. Configure the logging of client IPs as described at Apache procedures.
  14. Enable http authentication as follows:
    1. Go to Administration, then Authentication, select HTTP.
    2. Set up an htdigest by adding a stanza like the following (this must be on the Zabbix server, not any front-end webserver):
      <Location "/zabbix/">
          Require valid-user
          AuthType Digest
          AuthName "Zabbix Service"
          AuthUserFile /etc/apache2/zabbix.htdigest
      </Location>
    3. Enable Apache's digest authentication module by running:
      a2enmod auth_digest
    4. Add entries to the htdigest file for Admin as follows:
      touch /etc/apache2/zabbix.htdigest
      htdigest /etc/apache2/zabbix.htdigest "Zabbix Service" Admin
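
The ordering requirement on the pg_hba.conf entry in step 4 (the zabbix rule must appear before 'local all all peer') can be checked mechanically. The following is only a minimal sketch and not part of the original procedure: check_hba_order is a hypothetical helper name, and it looks only at the two rules named in that step, ignoring everything else in the file.

```shell
# Hypothetical helper (not part of Zabbix or PostgreSQL).
# Succeeds only if a 'local all zabbix trust' rule exists and comes
# before the first 'local all all peer' rule in the given file.
check_hba_order() {
    awk '
        $1 == "local" && $2 == "all" && $3 == "zabbix" && $4 == "trust" && !z { z = NR }
        $1 == "local" && $2 == "all" && $3 == "all"    && $4 == "peer"  && !p { p = NR }
        END { exit !(z && (!p || z < p)) }
    ' "$1"
}
```

It could be run as check_hba_order /etc/postgresql/9.5/main/pg_hba.conf before restarting postgresql; a non-zero exit status means the rule is missing or in the wrong place.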

Monitoring the Zabbix server

  1. Run:
    apt-get install zabbix-agent
    service zabbix-agent start
  2. In the web interface go to Configuration and then Hosts, locate the server in the host list and click its Disabled link to enable it.
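
Before enabling the host in the web interface, it can help to confirm from the command line that the agent answers. This is a hedged sketch: it assumes zabbix_get (installed above as part of the zabbix-get package) is on the PATH and that the agent accepts connections from the address used. agent_alive is a hypothetical helper name, and the ZABBIX_GET variable exists only so that the command can be substituted.

```shell
# Hypothetical helper: succeeds if the Zabbix agent on host $1 answers
# the built-in agent.ping item, which returns 1 when the agent is up.
# ZABBIX_GET may name an alternative command; it defaults to zabbix_get.
agent_alive() {
    [ "$("${ZABBIX_GET:-zabbix_get}" -s "$1" -k agent.ping 2>/dev/null)" = "1" ]
}
```

For example, agent_alive 127.0.0.1 && echo "agent is answering" on the Zabbix server itself.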

Tuning the server configuration

  1. In order to allow OS identification from the Zabbix web interface, add the following entry to /etc/sudoers using visudo:
    zabbix ALL = NOPASSWD: /usr/bin/nmap
  2. We assume that any Linux host must run sshd, ntpd and the Zabbix agent and must be pingable. Therefore these will be incorporated into the Linux OS template as follows:
    1. Ensure no hosts are being monitored by Zabbix yet. (If hosts have been added and they already make reference to one of these templates then it will be impossible to add these templates to the Linux OS Template, because that would create duplicate references.)
    2. Navigate to Configuration, then Templates, then Template OS Linux, then click Linked Templates.
    3. Add Template App SSH Service, Template App NTP Service. (Template App Zabbix agent should already be present by default.)
    4. Click update and confirm that succeeded.

    and make Zabbix hide problems with these services when there are ping problems (so that only the ping problem shows up) as follows:

    1. Navigate to Configuration, then Templates then Template App SSH Service, then Triggers, then SSH service is down on {HOST.NAME}, then Dependencies.
    2. Add all three ping-related triggers as dependencies. (It is not enough to choose only the two that depend on the third!)
    3. Click update and confirm that succeeded.
    4. Repeat for Template App NTP Service and Template App Zabbix Agent (using the ‘Zabbix agent on {HOST.NAME} is unreachable for 5 minutes’ trigger for the latter).
  3. The discovery agent (I know that’s not what it’s really called, but I’m not familiar enough with Zabbix yet to correctly name it) sets pre-defined thresholds for several items in its triggers. These thresholds may need to be adjusted to be either host- or host-thing-specific (e.g. host-filesystem-specific). In overview, the idea is:
    1. provide a global default threshold in a context-aware usermacro (i.e. a global variable)
    2. provide a host- or host-thing-specific threshold (i.e. a local variable)
    3. update the trigger expression to use the specified threshold (i.e. the local variable if it is specified otherwise the global variable)

    Although we do it in a slightly different order to simplify the procedure:

    1. Go to ‘Monitoring’, then ‘Problems’ and find the problem in the list; you may need to adjust the filters, etc. to make it easier to find.
    2. Click on the value in the ‘Problem’ column, and in the popup menu select ‘Configuration’; this should lead to a host-specific trigger configuration screen. (You can see that it is a host-specific trigger because the expression mentions a specific hostname although the expression should not be editable.)
    3. At the top of the screen it should say either ‘Parent triggers’ and provide a link to the parent trigger, in which case:
      1. Set the host-specific value for the macro as follows:
        1. Below where it says ‘Triggers’, near the upper left hand corner, should be ‘All hosts / <specific-hostname>’. Open the specific hostname (e.g. ‘rotini.pasta.net’) in a new browser tab and click on ‘Macros’.
        2. Create a macro with a suitable name (e.g. {$PROCS_COUNT_THRESHOLD}) and set its value to the host-specific value you want.
        3. Click update, confirm that succeeded, and close that tab.
      2. Click the link to the parent triggers (e.g. Template OS Linux). This leads to a ‘Triggers’ screen.
      3. Set a global default value for your macro as follows:
        1. Below where it says ‘Triggers’, near the upper left hand corner, should be ‘All templates / Template …’. Open the template name (e.g. ‘Linux OS Template’) in a new browser tab and click on ‘Macros’.
        2. Create a macro with a suitable name (e.g. {$PROCS_COUNT_THRESHOLD}) and set its value to the value that was defined in the expression on the ‘Triggers’ screen.
        3. Click update, confirm that succeeded, and close that tab.
      4. Update the trigger to use the values provided:
        1. Update the expression to use the macro (e.g. replace 300 with {$PROCS_COUNT_THRESHOLD}).
        2. Update the name of the trigger as well (e.g. ‘More than {$PROCS_COUNT_THRESHOLD} processes on {HOST.NAME}’).
        3. Click update and confirm that succeeded.

      or it should say ‘Discovered by’ and provide a link to the LLD, in which case:

      1. Click the link to the LLD. This leads to a ‘Trigger prototypes’ screen.
      2. Set the host-specific value for the macro as follows:
        1. Open the blue portion of the trigger prototype name in a new tab; this should lead to a host-specific trigger prototypes screen. (You can see that it is a host-specific trigger prototype because the expression mentions a specific hostname although the expression should not be editable.)
        2. Click the name of the specific host near the top of the screen and then click ‘Macros’.
        3. Add a macro with a suitable name and context (e.g. {$FS_FREE_THRESHOLD:/vol/small}, {$FS_FREE_THRESHOLD:/staging/mail}) and set its value to the host-thing-specific value you want.
        4. Click update, confirm that succeeded, and close that tab.
      3. Click the grey portion of the trigger prototype name; this leads to a template-specific trigger prototypes screen.
      4. Locate the offending trigger prototype and click its name; this should lead to a non-host-specific trigger prototype screen. (You can see that it is a non-host-specific trigger prototype because the expression does not mention any hostnames and the expression should be editable.)
      5. Set a global default value for your macro as follows:
        1. Below where it says ‘Trigger prototypes’, near the upper left hand corner, should be ‘All templates / Template …’. Open the template name (e.g. ‘Linux OS Template’) in a new browser tab and click on ‘Macros’.
        2. Create a macro with a suitable name (e.g. {$FS_FREE_THRESHOLD}, {$SWAP_FREE_THRESHOLD}) and set its value to the value that was defined in the expression on the ‘Trigger prototypes’ screen.
        3. Click update, confirm that succeeded, and close that tab.
      6. Update the trigger to use the values provided:
        1. Update the expression to use the macro (e.g. replace 20 with {$FS_FREE_THRESHOLD:"{#FSNAME}"}).
        2. Update the name of the trigger prototype as well (e.g. ‘Free disk space is less than {$FS_FREE_THRESHOLD:"{#FSNAME}"}% on volume {#FSNAME}’).
        3. Click update and confirm that succeeded.

    Many thanks to ‘whosgonna’ and ‘q1x’ on irc.freenode.net for this procedure!

  4. Configure email alerts as follows:
    1. Go to Administration, then Media Types, then Email, and set:
      • SMTP server: <your-mail-server>
      • SMTP helo: <zabbix-server-fqhn>
      • SMTP email: zabbix@<your-domain>     (it is essential that this contains an ‘@’ sign)
    2. Go to Configuration, Actions and then ‘Report problems to Zabbix administrators’, tick Enabled and click Update.
  5. To avoid alerts of the form ‘Zabbix discoverer processes more than 75% busy’, edit /etc/zabbix/zabbix_server.conf and set:
    StartDiscoverers=10

    and then run:

    systemctl restart zabbix-server
  6. I had alerts for ‘Disk I/O is overloaded’ for all systems whenever they ran apt-get update. Make Zabbix more tolerant as follows:
    1. Navigate to Configuration, then Templates, then Template OS Linux, then click Triggers and then click Disk I/O is overloaded on {HOST.NAME}.
    2. In the expression, change the period over which I/O times are averaged for this trigger from 5m to 10m.
    3. Click update and confirm the trigger was updated successfully.
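
The threshold lookup described in step 3 above (use the context-specific value if one is defined, otherwise fall back to the default without context) can be illustrated outside Zabbix. The sketch below is purely illustrative and is not how Zabbix stores macros: the flat "macro value" file format and the resolve_macro name are assumptions made for the example, and contexts containing spaces are not handled.

```shell
# Hypothetical illustration of user macro fallback.
# FILE holds one "macro value" pair per line, for example:
#   {$FS_FREE_THRESHOLD} 20
#   {$FS_FREE_THRESHOLD:/vol/small} 5
# Usage: resolve_macro FILE NAME [CONTEXT]
resolve_macro() {
    file=$1 name=$2 context=${3-}
    awk -v ctx="{\$${name}:${context}}" -v plain="{\$${name}}" '
        $1 == ctx   { c = $2; have_c = 1 }   # value defined for this context
        $1 == plain { p = $2; have_p = 1 }   # default without context
        END {
            if (have_c) print c
            else if (have_p) print p
            else exit 1                      # macro not defined at all
        }
    ' "$file"
}
```

With the two example lines above in a file, a lookup for context /vol/small prints 5, while a lookup for any other filesystem prints the default of 20.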

Adding hosts

  1. On the client run:
    apt-get install zabbix-agent
  2. Edit /etc/zabbix/zabbix_agentd.conf and set:
    Server=192.168.1.15
    ...
    ServerActive=192.168.1.15
    ...
    Hostname=<fqhn>
  3. Don’t forget that the Zabbix server should also be a Zabbix client!
  4. By default I add to the system the following Templates:
    • Linux system: Template OS Linux
    • Windows system: Template OS Windows, Template ICMP Ping (due to some template error, it is not possible to add Template ICMP Ping to the Template OS Windows template).
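
When several clients have to be added, the edits to zabbix_agentd.conf in step 2 can be scripted. The following is a minimal sketch rather than a tested deployment tool: set_agent_option is a hypothetical helper, it only touches uncommented KEY= lines (appending one if none exists), and it assumes the value contains no '|' or '&' characters.

```shell
# Hypothetical helper: set KEY=VALUE in an agent configuration file.
# Usage: set_agent_option FILE KEY VALUE
set_agent_option() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        # Replace the existing uncommented setting, keeping the file in place.
        tmp=$(mktemp)
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # No active setting yet: append one.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

For example, set_agent_option /etc/zabbix/zabbix_agentd.conf Server 192.168.1.15, repeated for ServerActive and Hostname, followed by a restart of zabbix-agent.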

Adding services

Monitoring KVM

  1. On the KVM servers run:
    apt-get install git
    git clone https://github.com/bushvin/zabbix-kvm-res /usr/local/opt/zabbix-kvm-res
    ln -s ../opt/zabbix-kvm-res/bin/zabbix-kvm-res.py /usr/local/bin
    ln -s ../../../usr/local/opt/zabbix-kvm-res/zabbix_agentd.conf/UserParameters /etc/zabbix/zabbix_agentd.conf.d/zabbix-kvm-res.conf
  2. Copy /usr/local/opt/zabbix-kvm-res/zbx_templates/zabbix_kvm.xml to the system where you’re running a web browser with access to Zabbix, go to Configuration, then Templates, click Import and import the XML file.
  3. Go to Configuration, then Hosts and then click each host that is a KVM server and add the Template App Libvirt template.

Monitoring web services on non-standard ports

This is done by replacing the current port by a macro, defining a default value for the macro and a host-specific version of the macro.

  1. Add a global user macro as follows:
    1. Navigate to Administration, then General, then from the pulldown menu on the right select Macros.
    2. Add a new macro (e.g. {$HTTP_PORT}) and its default value (e.g. 80).
    3. Click update and confirm the macro was added successfully.
  2. Add a host-specific macro for the host with the webserver on the non-standard port as follows:
    1. Navigate to Configuration, then Hosts, then click the hostname and then go to Macros.
    2. Add a new macro with the same name as the global user macro but with its value set to the non-standard port used on this host.
    3. Click update and confirm the macro was added successfully.
  3. Change the item definition as follows:
    1. Navigate to Configuration, then Templates, then locate the row for Template App HTTP Service and click Items in that row and then click HTTP Service is running.
    2. Change the key from net.tcp.service[http] to net.tcp.service[http,,{$HTTP_PORT}].
    3. Click update and confirm the item was updated successfully.

Many thanks to ‘_ikke_’ on irc.freenode.net for help with this procedure!

Monitoring Postfix

  1. Run:
    apt-get install logtail

Monitoring your internet connection

  1. Add a host called ‘internet’, with IP 127.0.0.1, using SNMP as the agent. (The IP is not relevant for this host, but an agent is required, and the SNMP agent seems not to care if it cannot contact the specified IP.)
  2. Add an item named ‘ping 8.8.8.8’, of type ‘Simple check’, with key ‘icmpping[8.8.8.8]’.

Monitoring Owncloud

This method allows for monitoring the Owncloud login screen at locations at three different degrees of removal. It should be possible to make Zabbix actually log in to Owncloud but that is not covered here.

  1. Create a new template:
    • Name: Template App Owncloud Service
    • Default hostgroup: Linux Servers
    • Linked Templates: Template App HTTP Service

    Note that nested macros are not permitted so there is no point in setting the default value for {$OWNCLOUD_URL} to http://{$HOST.NAME1}/owncloud/ in the template’s macros.

  2. To this template add a web scenario called Owncloud web scenario.
  3. To this web scenario add one step, which is to visit {$OWNCLOUD_URL}, look for the text ‘Username or email’ and check that the HTTP status code is 200.
  4. To the template add a trigger:
    • Name: Owncloud web scenario failed on {HOST.NAME}          (yes, it is correct there is no ‘$’)
    • Severity: Average
    • Expression: {Template App Owncloud Service:web.test.fail[Owncloud web scenario].last()}<>0
  5. To the configuration for the host running the Owncloud application, add the Template App Owncloud Service and define the macro {$OWNCLOUD_URL} to be the Owncloud direct URL.
  6. Repeat the previous step but this time for the front-end proxying webserver. In addition, make a trigger dependency from the host-specific instantiation of the template trigger in the front-end proxying webserver to the same in the back-end proxying webserver (i.e. from host to host, not from template to template).
  7. Repeat the previous step but this time for the pseudo-host ‘internet’, setting {$OWNCLOUD_URL} to be the front-end URL as seen by an open proxy on the internet (e.g. http://anonymouse.org/anonwww.html).
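
The web scenario step above can be reproduced by hand when debugging a failing check. This is a rough equivalent only, assuming curl is installed; it is not what Zabbix runs internally. check_login_page is a hypothetical helper name and the CURL variable exists only so that the command can be substituted.

```shell
# Hypothetical helper: succeeds if URL returns HTTP 200 and the page
# contains the text 'Username or email', mirroring the scenario step.
# CURL may name an alternative command; it defaults to curl.
check_login_page() {
    url=$1
    body=$(mktemp)
    # -o saves the page body, -w prints only the numeric status code.
    status=$("${CURL:-curl}" -s -o "$body" -w '%{http_code}' "$url") || status=000
    if [ "$status" = "200" ] && grep -q 'Username or email' "$body"; then
        rc=0
    else
        rc=1
    fi
    rm -f "$body"
    return "$rc"
}
```

Running it against each of the three values of {$OWNCLOUD_URL} in turn shows at which degree of removal the login screen stops being reachable.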

Monitoring Windows XP

  1. Install the Zabbix agent for Windows (available here).
  2. Navigate to Control Panel, then Windows Firewall, then Exceptions.
  3. Allow incoming connections to TCP 10050.
  4. Enable the Windows File and Printer sharing service (otherwise Zabbix will complain).

Adding users

See also