Mon blog-notes à moi que j'ai

Personal blog of a sysadmin with hacker leanings

Twitter & RSS watch roundup #2016-27

This week's harvest of links, for July 4 to 8, 2016. Most of them were posted on my Twitter account. Here they are, gathered for those who missed them.

Happy reading

Security & Privacy

Privacy Shield : un « bouclier » troué à refuser!
Today, July 8, 2016, the member states of the European Union, meeting in what is known as the « Article 31 Committee », are to vote on the adoption of the adequacy decision that will govern transfers of personal data between the United States and the European Union: the Privacy Shield. This decision, adopted in great haste, does not address the concerns voiced in recent weeks, one after another, by the group of European data protection authorities (the CNILs), the European Parliament and various European governments, as well as by civil rights organizations.
RFC 7905: ChaCha20-Poly1305 Cipher Suites for Transport Layer Security (TLS)
This short RFC adds the ChaCha20 and Poly1305 cryptographic algorithms to the list of those usable in the TLS protocol.
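To see whether a local TLS stack actually offers these suites, a minimal sketch along the following lines may help. It relies only on Python's standard `ssl` module; the helper name `chacha20_cipher_names` is made up for illustration, and the result depends entirely on the OpenSSL build Python is linked against. The list may also include the TLS 1.3 ChaCha20 suite, which is defined separately from RFC 7905.

```python
import ssl


def chacha20_cipher_names(context=None):
    """Return the names of enabled cipher suites that use ChaCha20-Poly1305.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # A default client context reflects what the local OpenSSL build enables.
    ctx = context if context is not None else ssl.create_default_context()
    # get_ciphers() returns one dict per enabled suite; "name" is OpenSSL's name.
    return [c["name"] for c in ctx.get_ciphers() if "CHACHA20" in c["name"]]


if __name__ == "__main__":
    for name in chacha20_cipher_names():
        print(name)
```

An empty list would suggest that the local OpenSSL build predates ChaCha20-Poly1305 support or has those suites disabled.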
hacking DNS – ça peut vous arriver aussi
Last summer, my mother bought herself a tablet. But after a few months of use, her enthusiasm faded: browsing had become unpleasant. Facebook still worked, but visiting websites had become impossible because pages opened automatically in bursts, showing ads for dubious sites: iPhones for one euro, pornography and, of course, fake Google or Microsoft alerts warning the user that their computer is no longer secure and that they have only a few minutes left to install some strange piece of software to fix it. A self-fulfilling virus warning!
Sécurité des systèmes d’information des OIV : l’avis de NBS System
The first protective measures for OIVs were published in December 2013. OIVs (Opérateurs d’Importance Vitale, operators of vital importance) are companies, private or public, identified as carrying out activities that are indispensable or dangerous for the population. There are about 200 of them in France, although no list of them can be found (for obvious security reasons).
Is Your ISP Hijacking Your DNS Traffic?
The answer to the above question used to be difficult. You had to be an expert to find out the answer. In fact, most people don’t even notice, nor care.
Playing with the dnstraceroute tool (see on GitHub), I noticed that it is a common practice for service providers to hijack and redirect DNS traffic to their local DNS servers. So if you thought you were using Google’s Public DNS Server or Verisign’s, you may want to think twice.
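The general idea behind this kind of check can be sketched roughly as follows: send a DNS query to a public resolver with a deliberately small IP TTL, and treat a reply that still comes back as a hint that something closer than the real server answered. This is only an illustration under assumptions, not how dnstraceroute is implemented; the function names `build_dns_query` and `answers_within_hops` are hypothetical, and the sketch does not verify the reply's ID or contents.

```python
import socket
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query (type A, class IN) for hostname."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def answers_within_hops(server, hostname, ttl, timeout=2.0):
    """Send the query with a limited IP TTL; True if any UDP reply arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Packets are dropped by the network once they have crossed `ttl` hops.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(hostname), (server, 53))
        try:
            sock.recvfrom(512)
            return True
        except OSError:
            # Timeout or network error: no reply within this hop budget.
            return False
```

For example, if `answers_within_hops("8.8.8.8", "example.com", ttl=2)` returned True on a network where the resolver is known to be many hops away, that would be worth a closer look with a proper tool.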

System Engineering

Live Debugging with Docker
During the DockerCon 2016 keynote, I demonstrated a development workflow with Docker for Mac, going from a fresh laptop to a running app in no time. The especially cool part was when I live-debugged a Node.js app running inside a container from my IDE, despite having no Node.js runtime installed on my laptop. Here I’m going to show you how to do it yourself.
Why we use the Linux kernel’s TCP stack
A recent blog post posed the question Why do we use the Linux kernel’s TCP stack?. It triggered a very interesting discussion on Hacker News.
I’ve also thought about this question while working at CloudFlare. My experience mostly comes from working with thousands of production machines here and I can try to answer the question from that perspective.
Introducing OpenCellular: An open source wireless access platform
As of the end of 2015, more than 4 billion people were still not connected to the internet, and 10 percent of the world’s population were living outside the range of cellular connectivity. Despite the widespread global adoption of mobile phones over the last 20 years, the cellular infrastructure required to support basic connectivity and more advanced capabilities like broadband is still unavailable or unaffordable in many parts of the world. At Facebook, we want to help solve this problem, and we are pursuing multiple approaches aimed at improving connectivity infrastructure and lowering the cost of deploying and operating that infrastructure.
Using Honcho to Create a Multi-Process Docker Container
A common misconception is that Docker is only for creating single-process or single-service containers. While it’s true that the Dockerfile and docker run command options are designed for running a single process, that doesn’t mean that Docker itself doesn’t allow for a multi-process Docker container.
In fact, Docker’s documentation has a very useful tutorial on how to run multi-process containers using Supervisor to manage the processes within the container.
Pocket Watch: Verifying Exabytes of Data
There is nothing more important to Dropbox than the safety of our user data. When we set out to build Magic Pocket, our in-house multi-exabyte storage system, durability was the requirement that underscored all aspects of the design and implementation. In this post we’ll discuss the mechanisms we use to ensure that Magic Pocket constantly maintains its extremely high level of durability.
Globo.com Delivers Live Video Streaming to 500,000+ Viewers with NGINX
In 2012, Globo.com, the number one media and entertainment portal in Brazil, had over a million visitors a day and was experiencing unprecedented growth in its online viewership. As the online branch of the country’s top television brand Grupo Globo, Globo.com is responsible for online distribution of the media company’s news, sports, and entertainment to a massive, and yet still fast-growing, audience.
Ansible, à la rescousse en cas de crash serveur
About ten days ago, the system partition of one of our clients’ servers went read-only following a consistency problem on the disk. Running services that did not depend on files on that partition kept working. The others were down or malfunctioning as soon as they needed to write a file to the system partition.
http2 explained
http2 explained describes the protocol HTTP/2 at a technical and protocol level. Background, the protocol, the implementations and the future. Written by Daniel Stenberg.
This is a « living document » in the sense that I keep posting updates, and I care about and value feedback, questions and comments I get about it. This document improves over time thanks to a joint effort. Full credits to all helpers at the end of the document.
Running Jenkins jobs in Docker containers
One of my main tasks at work is to configure Jenkins to act as a hub for all the deployment and automated testing jobs we run. We use CloudBees Jenkins Enterprise, mostly for its Role-Based Access Control plugin, which allows us to create one Jenkins folder per project/application and establish fine grained access control to that folder for groups of users. We also make heavy use of the Jenkins Enterprise Pipeline features (which I think are also available these days in the open source version).
Updates to Performance and Scalability in Kubernetes 1.3 – 2,000 node 60,000 pod clusters
We are proud to announce that with the release of version 1.3, Kubernetes now supports 2000-node clusters with even better end-to-end pod startup time. The latency of our API calls is within our one-second Service Level Objective (SLO) and most of them are even an order of magnitude better than that. It is possible to run larger deployments than a 2,000 node cluster, but performance may be degraded and it may not meet our strict SLO.


Why Adaptive Fault Detection is Powerful and Unique
Adaptive Fault Detection is an algorithm-based technology and one of the important components that makes VividCortex effective and singular. Unlike other monitoring methodologies, such as anomaly detection or threshold alerting, fault detection is designed to detect events that are, by definition, detrimental to a system. It looks for issues that actually prevent work from completing, not just anomalies or outliers. With this quick blog post, we want to help readers understand the definition and value of fault detection.
Finding the Needle in a Haystack: Anomaly Detection with the ELK Stack
It’s the middle of the night. Your mobile starts vibrating. On the other end, it’s a frantic customer (or your boss) complaining that the website is down. You enter crisis mode and start the troubleshooting process, which involves ingesting a large amount of caffeine and — usually — ends with the problem being solved.
Container Monitoring: Top Docker Metrics to Watch
Monitoring of Docker environments is challenging. Why? Because each container typically runs a single process, has its own environment, utilizes virtual networks, or has various methods of managing storage. Traditional monitoring solutions take metrics from each server and applications they run.

Software Engineering

Inside the Algolia Engine Part 4 – Textual Relevance
The way we search has changed a lot in the past decade. The original function of the Enter key was to begin a search; today it’s used to select a result that’s already been displayed. The type of information that people search for has changed too.
Today’s search engines are used for much more than just web sites and documents—they’re also used to find specific items like people, places and products.
Why Clicking Send Doesn’t Mean Your Emails Will Be Delivered
When you think about the reputation of your business, customer service and product quality are key attributes that come to mind. Email communication practices can also contribute to the reputation of a business. For the purpose of this post, I’d like to deep dive into email reputation.
Poor email reputation directly impacts deliverability, slowing down your campaigns and sending them to spam folders. A good email reputation helps get more emails delivered to your recipients’ inboxes without delay. In addition, many of the techniques for maintaining and improving reputation will make emails more engaging, increasing open rates, click-through rates and other measures of success all driving towards growth.
As the technical account manager for our managed service product, I’m constantly working with large businesses and big brands on ways they can get more emails delivered to their intended recipients. Oftentimes, email is the core driver of their growth, so if emails are not delivered, growth is directly impacted.
A Universal Slack Event Router
It’s no secret that more and more teams nowadays live on Slack. Discussions, internal and external events, application notifications are examples of the many things that end up in Slack to help teams be more efficient.
Our company is no exception: Slack is a central part of our workflow. As a consequence, it’s natural for us to think that our products should be deeply integrated with this workflow. After all, a monitoring application like Sysdig Cloud is the window into your own applications and infrastructure and, as such, it can be far more effective if it is aware of what’s happening in the rest of your organization.
Machine Learning Driven Programming: A New Programming for a New World
Like the weather, everybody complains about programming, but nobody does anything about it. That’s changing and like an unexpected storm the change comes from an unexpected direction: Machine Learning / Deep Learning.
Beyond Continuous Deployment
If I were to whittle the principle behind all modern software development approaches into one word, that word would be: feedback. By « modern approaches », I’m referring to DevOps, Continuous Integration (CI), Continuous Delivery (CD), Continuous Deployment, Microservices, and so on. For definitions of these terms, see the Stelligent Glossary.
It’s not just feedback: it’s fast and effective feedback. It’s incorporating that feedback into subsequent behavior. It’s amplifying this feedback back into the development process to affect future work as soon as possible.
In this post, I describe how we can move beyond continuous deployment by focusing on the principle of feedback.
A new angle to Pipeline-as-Code
Last year we introduced the « Pipeline-as-Code » system, which allows you to define your complete Jenkins Pipeline project—building, testing, staging or deploying, whatever you need—from a Jenkinsfile script right in the same source code repository as your project itself, for compatible SCM systems. With this « marker file » standardized, Jenkins is able to detect buildable branches and automatically create subprojects for them, and even detect buildable repositories within an organization, keeping the configuration inside Jenkins itself to a minimum.
Parsing Binary Data Formats
This story started with me wanting to write a steganography tool.
Steganography is the process of hiding information inside a file, typically an image.
To do that, I had to write an image decoder first because you can’t just start changing random bytes. All that will do is make the image invalid or corrupt. I selected PNG as the image format I wanted to work with, and I found the official specification so I could get started writing code. The next thing I needed was a way to parse the binary data from an image into something I could use.
Product Integration Testing at the Speed of Netflix
The Netflix member experience is delivered using a micro-service architecture and is personalized to each of our 80+ million members. These services are owned by multiple teams, each having their own lifecycle of build and release. This means it is imperative to have a vigilant and knowledgeable Integration Test team that ensures end-to-end quality standards are maintained even as microservices are deployed every day in a decentralized fashion.

Top 10 Best Practices for Jenkins Pipeline Plugin
The Jenkins Pipeline plugin is a game changer for Jenkins users. Based on a Domain Specific Language (DSL) in Groovy, the Pipeline plugin makes pipelines scriptable and it is an incredibly powerful way to develop complex, multi-step DevOps pipelines. This document captures some definite Do’s and Don’ts of writing Jenkins Pipelines - with code examples and explanations.
Writing integration tests for RabbitMQ-based components
I’m writing this post after days of misery, reverse-engineering, github browsing and finding (sort of) clever ways to get yet one step closer to the goal mentioned in the title.
Before we dive into the topic, let me clarify a few things. RabbitMQ, an AMQP (Advanced Message Queuing Protocol) compatible message-oriented middleware, (in my understanding) has no in-memory message broker implementation. ActiveMQ, another AMQP compatible implementation, on the other hand does provide such a component – easy to configure and use. The problem is that ActiveMQ implements version 1.0 of the protocol, while RabbitMQ is on version 0.9.1, and the two versions, of course, are not compatible. That is the main reason one might need QPID, a third MOM implementation that comes with an in-memory message broker and is able to « speak » multiple versions of the protocol.

Database Engineering


How we reindexed 36 billion documents in 5 days within the same Elasticsearch cluster
At Synthesio, we use ElasticSearch at various places to run complex queries that fetch up to 50 million rich documents out of tens of billions in the blink of an eye. Elasticsearch makes it fast and easily scalable where running the same queries over multiple MySQL clusters would take minutes and crash a few servers on the way. Every day, we push Elasticsearch boundaries further, and going deeper and deeper in its internals leads to even more love.
ElasticSearch cluster rolling restart at the speed of light with rack awareness
ElasticSearch is an awesome piece of software, but some management operations can be quite a pain in the administrator’s ass. Performing a rolling restart of your cluster without downtime is one of them. On a cluster of 30-something servers running up to 900GB shards, it would take up to 3 days. Fortunately, we’re now able to do it in less than 30 minutes on a 70-node cluster with more than 100TB of data.

MySQL & MariaDB

Temporary tables and MySQL STATUS information
When analysing MySQL configuration and status information at customers it is always interesting to see how the applications behave. This can partially be seen by the output of the SHOW GLOBAL STATUS command. See also Reading MySQL fingerprints.
Pipelining versus Parallel Query Execution with MySQL 5.7 X Plugin
In this blog post, we’ll look at pipelining versus parallel query execution when using X Plugin for MySQL 5.7.
In my previous blog post, I showed how to use X Plugin for MySQL 5.7 for parallel query execution.
MySQL 5.7, utf8mb4 and the load data infile
In this post, I’ll discuss how MySQL 5.7 handles UTF8MB4 and the load data infile.
Many of my clients have told me that they do not like using the LOAD DATA INFILE statement and prefer to manually parse and load the data. The main reason they do it is issues with the character sets, specifically UTF8MB4 and the load data infile. This was surprising to me as nowadays everyone uses UTF8. MySQL 5.7 (as well as 5.6) has full support for UTF8MB4, which should fix any remaining issues.


Do You Need to Put Your Query on a Budget?
Before we scare you away with the word « budget, » rest assured that after reading this blog, you won’t have to give up your favorite activities or sell your car. What you will be able to do is understand how Vertica resource pool parameters affect query budget. An insufficient query budget could be slowing your queries down. But fear not. Like a balance book (yes, they still exist), Vertica provides you with a way to review and alter resource budgets for your queries.

Data Engineering & Analytics

Tinder and the Dating App Retention Paradox
Tinder is more than the most popular dating app on the market—it’s one of the most powerfully sticky and addictive apps period. Billions of swipes and tens of millions of matches are recorded every single day. Average usage across both male and female users is somewhere around 90 minutes a day.
The 7 Fundamental Steps to Complete a Data Project
It’s hard to know where to start once you’ve decided that yes, you want to become more data-driven. Just looking at all the technologies you have to understand and all the languages you’re supposed to master is enough to make you dizzy.
Well, building your first data project is actually not that hard. And yes, Dataiku DSS helps, but what will really help you is understanding the data science process. Becoming data driven is about this: knowing the basic steps and following them to go from raw data to building a machine learning model.

Network Engineering

Troubleshooting RIPE Atlas Probes: USB Sticks
Some version 3 RIPE Atlas probes have recently had an issue with their USB sticks. We’re investigating what may be causing this issue and have a possible solution, outlined below. (At the same time, we’re also looking into a new hardware solution for the future.) If you’ve had trouble with your probe, please follow these simple steps. RIPE Atlas users everywhere will thank you for getting your probe back online - and we will, too!
The scalable, open source big data analytics platform for networks and services
PNDA is a simple, scalable, open big data platform supporting operational and business intelligence analysis for networks and services. This guide provides an overview of PNDA, and will tell you how to set up and use PNDA in your own environment.

Management & Organization

Transparency Is Clearly the Secret to Good Management
Can you keep a secret? Well, don’t—at least not if you’re an engineering manager.
The best managers—and naturally I include New Relic’s engineering managers in that category—know transparency is key. My philosophy: 95% of what I know isn’t a secret, and there’s no reason for it to be. Why? Because managing without secrets helps you empower your team and yourself.
Don’t fire the underperformers (yet)
Sooner or later, every company ends up hiring underperformers. Often unnoticed in large corporations, they can be fatal to small businesses, where every person counts for a lot.
The myth of the « always at 200% » team
I experienced that state of mind, and it didn’t turn out well. Employees of the company would stay awake late to be sure they would not miss an email from the boss. They wanted to be the first ones to answer to show how reactive and motivated they were. The ugly truth was, none of us was working efficiently building a great company. We were slacking on the Web late at night, checking our email from time to time, just in case something would happen. After one year, we all stopped pretending, and an incredible percentage of the team divorced.
« Plus vous êtes optimiste, plus vous êtes innovant, » pour le PDG de Baidu, le Google chinois
Robin Li, the very discreet CEO of Baidu, the Chinese Google, was in Paris for Viva Tech. There he laid out a strategy very different from that of his American rival. It is an approach to the market that French advertisers will have to adopt if they want to conquer the Middle Kingdom. « We understand people better than banks do, » the CEO notably declares.