Mon blog-notes à moi que j'ai

Personal blog of a sysadmin with hacker leanings

Zabbix Conference 2015: Monitoring a billion kilometers of monthly ride sharing

I presented the monitoring platform we built with Zabbix at BlaBlaCar during Zabbix Conference 2015, which was held in Riga (Latvia) on 11 & 12 September 2015.

I explained our choices. Here’s a summary of my talk. Video is available here.

Jean Baptiste Favre at Zabbix Conference 2015

Standardize & industrialize everything

Using a configuration management system, and defining and following standards, may sound pretty obvious, but it’s really a key point in building an efficient monitoring platform.
To achieve that while still being able to scale, we implemented the Zabbix Sender protocol in the 2 main languages we use for monitoring: Python and Java.

python-protobix

Started as a personal project, python-protobix quickly became the cornerstone of our monitoring.
All our Python probes use python-protobix, which lets us abstract away all the Zabbix-centric work, like item formatting and send operations, with great performance thanks to trapper mode.

Thanks to this module, implementing a new probe is now a fast operation since we only have to focus on metric collection.

Version 0.0.9 of python-protobix is already in production at BlaBlaCar and should be publicly available at the end of September.

All “protobix enabled” probes are also freely available.

jmx-zabbix

On the Java side, Arnaud Lemaire implemented the Zabbix Sender protocol as jmx-zabbix.
jmx-zabbix uses JMX to get metrics from a Java process. This allows loose coupling between applications and monitoring.

jmx-zabbix can run either embedded in a Java process as a library, which is the mode we use for our internal applications, or as a separate service. We use the latter to monitor external software like Cassandra or Elasticsearch.

It needs a configuration file, since you’ll have to map JMX keys & values to Zabbix items. You can find all the details on the jmx-zabbix GitHub page.

Industrialization

Internet companies nowadays do, hopefully, use configuration management systems. At BlaBlaCar, we’re kind of fans of Chef.

It makes sense to configure your Zabbix server from your configuration management system: it already has all the information required to add any host to the right group, apply templates and so on.
Since Zabbix offers an API, it should be quite easy to use it to perform the required operations.

This setup is still an experiment at BlaBlaCar, but we hope to be able to release something before the end of the year.
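To give an idea of what driving the Zabbix API from configuration management data could look like, here is a minimal sketch in Python. It is not the BlaBlaCar setup: it only builds the JSON-RPC body for the API's host.create method, and the host name, IP address, group IDs, template IDs and token are all made-up placeholders. It passes the token in the "auth" field of the request body, as the API accepted at the time; newer Zabbix versions may expect it in an HTTP header instead.

```python
import json


def build_host_create_request(hostname, ip, group_ids, template_ids,
                              auth_token, request_id=1):
    """Build the JSON-RPC 2.0 body for the Zabbix API method host.create.

    All arguments are supplied by the caller; in practice they would come
    from the configuration management system (node name, roles, ...).
    """
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": hostname,
            # One Zabbix agent interface (type 1) reached by IP on port 10050.
            "interfaces": [{
                "type": 1,
                "main": 1,
                "useip": 1,
                "ip": ip,
                "dns": "",
                "port": "10050",
            }],
            "groups": [{"groupid": gid} for gid in group_ids],
            "templates": [{"templateid": tid} for tid in template_ids],
        },
        "auth": auth_token,  # assumption: token passed in the request body
        "id": request_id,
    }


# Hypothetical node data; IDs and token are placeholders, not real values.
request_body = build_host_create_request(
    hostname="db01.example.com",
    ip="192.0.2.10",
    group_ids=["2"],
    template_ids=["10001", "10002"],
    auth_token="PLACEHOLDER_TOKEN",
)
print(json.dumps(request_body, indent=2))
```

The resulting body would then be POSTed as JSON to the server's api_jsonrpc.php endpoint. Keeping the payload construction separate from the HTTP call makes it easy to test from a configuration management run without touching a live server.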

Use Low Level Discovery & Zabbix trappers

Monitoring at scale requires performance. Zabbix has 2 interesting features for that purpose:

  1. Low Level Discovery
  2. Zabbix Sender protocol, which allows items to be sent in a sort of bulk

Low Level Discovery

Zabbix Low Level Discovery allows you to:

  1. Dynamically create items for an undetermined number of resources, like network interfaces or disk mount points. Zabbix-agent already uses LLD for these 2 examples, but you can set it up for any resource you want, like RabbitMQ queues.
  2. Create items depending on software configuration. We use LLD to discover whether Galera or a specific storage engine is enabled on our MariaDB servers. That allows us to have only one template for MariaDB instead of one per configuration type.
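As an illustration of the first case, a discovery rule expects a JSON document listing one object per discovered resource, each keyed by an LLD macro of the form {#NAME}. The sketch below builds such a document in Python. The macro name {#QUEUENAME} and the queue names are invented for the example, and the wrapping "data" key is the format Zabbix used at the time of this talk.

```python
import json
import re

# LLD macro names look like {#NAME}: uppercase letters, digits, "_" and ".".
LLD_MACRO_RE = re.compile(r"^\{#[A-Z0-9_.]+\}$")


def build_discovery_payload(macro, values):
    """Return the JSON string a discovery rule would receive.

    macro  -- LLD macro name, e.g. "{#QUEUENAME}" (hypothetical here)
    values -- one entry per discovered resource
    """
    if not LLD_MACRO_RE.match(macro):
        raise ValueError("not a valid LLD macro name: %r" % macro)
    return json.dumps({"data": [{macro: value} for value in values]})


# Hypothetical RabbitMQ queue names; a real probe would list them itself.
payload = build_discovery_payload("{#QUEUENAME}", ["emails", "payments"])
print(payload)
```

Item prototypes in the template can then reference the macro in their keys, so one item is created per discovered queue.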

Zabbix trappers

Zabbix trapper items are one of the core parts of BlaBlaCar monitoring. They allow probes to send all their information in a very efficient manner.
A second advantage is that the payload is only some JSON formatting, thus quite easy to understand, even for a human.

As an example, our biggest hosts at BlaBlaCar can have more than 1k items, which are received by the server in less than 300ms.
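To show what "only some JSON formatting" means on the wire, here is a minimal Python sketch of a Zabbix Sender packet: a "ZBXD" header with a flag byte, the payload length as a little-endian 64-bit integer, then the JSON body carrying every item in one request. This is not python-protobix code; the host names, item keys and server address are hypothetical.

```python
import json
import socket
import struct

ZBX_HEADER = b"ZBXD\x01"  # protocol signature followed by the flag byte


def build_sender_packet(items):
    """Pack a list of {"host", "key", "value"} dicts into one sender packet."""
    body = json.dumps({"request": "sender data", "data": items}).encode("utf-8")
    # Header, then body length as little-endian unsigned 64-bit, then body.
    return ZBX_HEADER + struct.pack("<Q", len(body)) + body


def parse_sender_packet(packet):
    """Inverse of build_sender_packet, handy for checking or debugging."""
    if packet[:5] != ZBX_HEADER:
        raise ValueError("missing ZBXD header")
    (length,) = struct.unpack("<Q", packet[5:13])
    return json.loads(packet[13:13 + length].decode("utf-8"))


def send_packet(packet, server, port=10051, timeout=5.0):
    """Send a packet to a Zabbix server or proxy and return the raw reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(packet)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Hypothetical items: a single packet can carry all metrics of a host.
items = [
    {"host": "db01.example.com", "key": "mysql.threads_running", "value": "12"},
    {"host": "db01.example.com", "key": "mysql.questions", "value": "48211"},
]
packet = build_sender_packet(items)
print(len(packet), parse_sender_packet(packet)["request"])
# send_packet(packet, "zabbix.example.com")  # needs a reachable server
```

Because every item travels in the same request, a probe opens a single connection per run regardless of how many metrics it collected.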

Visualization is critical

Collecting and storing metrics is great; getting them analysed is better.
You can still define & use Zabbix Triggers only, but you should give your dev & ops teammates a way to visualize the metrics.

Unfortunately, in my experience, that’s not that easy with Zabbix. First, there’s no easy way to define graphs or dashboards (called screens in Zabbix) with flexible enough criteria.
Second, people are now used to dashboards like Kibana and Grafana rather than the Zabbix web UI.
To ensure user adoption, we had to find a solution.

We found one: Grafana. A Zabbix data source, grafana-zabbix, is available, which allows you to build custom dashboards using data provided by the Zabbix API.

The latest Grafana version also supports LDAP authentication, which is great for enterprise infrastructure integration.

Slides

Slides from my talk are available:

  • on Speakerdeck
  • on Slideshare

but also:

Video