Mon blog-notes à moi que j'ai

Personal blog of a sysadmin with a hacker bent

Twitter & RSS watch roundup #2015-21

The harvest of links for the week of May 25 to 29, 2015. Most of them were published on my Twitter account. Here they are, gathered for those who missed them.

Happy reading

Security

Exploiting memory corruption bugs in PHP Part 3: Popping Remote Shells
This took longer than expected, but it’s a journey worth taking! This is less descriptive than other blog posts, because I’d like to try the video format out once. AKA, I’m lazy :)
Disappointingly for some, this will be a guide to create a POC. See the video at the end for what my automated & remote exploit looks like, as well as tips & tricks to get things working in a real environment.
I kept the app stupid to make life easy. It literally attempts to serialize whatever data is sent to the page, after base64_decoding it. More complicated exploits will require a little more finesse :).

Databases

What Makes A Database Mature?
Many database vendors would like me to take a look at their products and consider adopting them for all sorts of purposes. Often they’re pitching something quite new and unproven as a replacement for mature, boring technology I’m using happily.
I would consider a new and unproven technology, and I often have. As I’ve written previously, though, a real evaluation takes a lot of effort, and that makes most evaluations non-starters.
Perhaps the most important thing I’m considering is whether the product is mature. There are different levels of maturity, naturally, but I want to understand whether it’s mature enough for me to take a look at it. And in that spirit, it’s worth understanding what makes a database mature.

MySQL

Workload Analysis with MySQL’s Performance Schema
Earlier this spring, we upgraded our database cluster to MySQL 5.6. Along with many other improvements, 5.6 added some exciting new features to the performance schema.
MySQL’s performance schema is a set of tables that MySQL maintains to track internal performance metrics. These tables give us a window into what’s going on in the database—for example, what queries are running, IO wait statistics, and historical performance data.
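To give an idea of what such a window looks like, here is a minimal sketch that ranks queries by total time spent. It assumes the rows come from the performance_schema.events_statements_summary_by_digest table, whose timers are expressed in picoseconds. The function name slowest_digests and the sample rows are hypothetical, and the linked article may well use different queries.

```python
from typing import Dict, List

# Query one could run against MySQL 5.6+ to fetch the raw rows.
DIGEST_QUERY = (
    "SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT "
    "FROM performance_schema.events_statements_summary_by_digest"
)

# performance_schema timer columns are expressed in picoseconds.
PICOSECONDS_PER_MS = 1_000_000_000


def slowest_digests(rows: List[Dict], limit: int = 3) -> List[Dict]:
    """Rank statement digests by total time spent, converted to milliseconds."""
    ranked = sorted(rows, key=lambda r: r["SUM_TIMER_WAIT"], reverse=True)
    report = []
    for row in ranked[:limit]:
        total_ms = row["SUM_TIMER_WAIT"] / PICOSECONDS_PER_MS
        report.append(
            {
                "query": row["DIGEST_TEXT"],
                "calls": row["COUNT_STAR"],
                "total_ms": total_ms,
                # Guard against a zero call count to avoid dividing by zero.
                "avg_ms": total_ms / max(row["COUNT_STAR"], 1),
            }
        )
    return report


# Hypothetical rows, shaped like the result of DIGEST_QUERY.
sample_rows = [
    {"DIGEST_TEXT": "SELECT * FROM users WHERE id = ?",
     "COUNT_STAR": 1000, "SUM_TIMER_WAIT": 2_000_000_000_000},
    {"DIGEST_TEXT": "UPDATE orders SET status = ?",
     "COUNT_STAR": 10, "SUM_TIMER_WAIT": 5_000_000_000_000},
]

for entry in slowest_digests(sample_rows):
    print(entry)
```

Sorting on total time rather than average time surfaces the queries that cost the server the most overall, which is usually the first thing to look at in a workload analysis.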

Architecture

Integrating etckeeper with Logentries & Chef
When working within a team to maintain systems’ infrastructure, properly documenting and communicating changes made to configuration files within /etc is fundamental to preventing knowledge gaps throughout your team.
Rearchitecting GitHub Pages
GitHub Pages, our static site hosting service, has always had a very simple architecture. From launch up until around the beginning of 2015, the entire service ran on a single pair of machines (in active/standby configuration) with all user data stored across 8 DRBD backed partitions. Every 30 minutes, a cron job would run generating an nginx map file mapping hostnames to on-disk paths.
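The map file generation described above could look roughly like the sketch below. It is not GitHub's actual job: the function name render_nginx_map, the $pages_root variable and the hostnames and paths are all hypothetical, and only the general shape of an nginx map block is taken as given.

```python
from typing import Dict


def render_nginx_map(sites: Dict[str, str], variable: str = "$pages_root") -> str:
    """Render an nginx `map` block keyed on $http_host.

    `sites` maps a hostname to its on-disk path. Entries are sorted so
    that repeated runs over the same data produce an identical file.
    """
    lines = [f"map $http_host {variable} {{", '    default "";']
    for hostname, path in sorted(sites.items()):
        lines.append(f"    {hostname} {path};")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical hostnames and paths, for illustration only.
sites = {
    "docs.example.org": "/data/pages/docs",
    "blog.example.com": "/data/pages/blog",
}
print(render_nginx_map(sites), end="")
```

A periodic job would write this output to a file included from the nginx configuration and then reload nginx, which means a new site only becomes reachable after the next run. Hostnames or paths containing special characters would need quoting, which this sketch does not handle.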
How We Ensure VividCortex Never Loses Data
Adrian Cockcroft really nailed it when he said that a monitoring system has to be more reliable than what it’s monitoring. I don’t mind admitting that in our first year or so, we had some troubles with losing telemetry data. Customers were never sure whether their systems were offline, the agents were down, or we were not collecting the data. Even a few seconds of missing data is glaringly obvious when you have 1-second resolution data. There’s nowhere to hide.
Infrastructure as a Code
Think for a moment about how we treat applications developed by us and their code. We use version control systems to know exactly who changed a particular line of code, what they changed, when and why. We perform reviews and care about quality. We use pull requests to facilitate development. We have a bunch of good practices and design patterns. We have a lot of test automation implemented and execute thousands of tests to prove each change is correct. We have staging environments and final smoke tests. We can go on with such a list for quite a long time. What about treating infrastructure in the same manner? Operating system, servers and Continuous Integration configuration can be as big and complex as many applications are. Why not benefit from the above practices in the context of infrastructure? This is a short summary of what the "Infrastructure as a code" and "Programmable Infrastructure" slogans are about.
Consul for Cluster Health Monitoring
If you’re not familiar with Consul, it’s what I call a cluster management tool. It’s composed of a handful of features such as "Service Discovery", "Key Value Store", "DNS Server", "Health Checking", and it’s "Data Center Aware". It ultimately allows you to manage an infrastructure composed of many applications, dynamically configure them, route traffic to the healthy ones, and reroute traffic away from those that are not healthy.
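As a rough illustration of routing traffic only to healthy instances, the sketch below filters entries shaped like the JSON returned by Consul's /v1/health/service/<name> HTTP endpoint. The function name healthy_endpoints and the sample data are hypothetical, and the sketch deliberately skips the HTTP call itself so that it stays self-contained.

```python
from typing import Dict, List


def healthy_endpoints(entries: List[Dict]) -> List[str]:
    """Return "address:port" for every instance whose checks all pass."""
    endpoints = []
    for entry in entries:
        checks = entry.get("Checks", [])
        if all(check.get("Status") == "passing" for check in checks):
            service = entry["Service"]
            # An empty service address means the node address applies.
            address = service.get("Address") or entry["Node"]["Address"]
            endpoints.append(f"{address}:{service['Port']}")
    return endpoints


# Hypothetical response for a service named "web".
sample = [
    {
        "Node": {"Node": "node-1", "Address": "10.0.0.1"},
        "Service": {"Service": "web", "Address": "", "Port": 8080},
        "Checks": [{"Status": "passing"}, {"Status": "passing"}],
    },
    {
        "Node": {"Node": "node-2", "Address": "10.0.0.2"},
        "Service": {"Service": "web", "Address": "10.0.0.2", "Port": 8080},
        "Checks": [{"Status": "passing"}, {"Status": "critical"}],
    },
]

print(healthy_endpoints(sample))
```

A load balancer configuration can then be regenerated from this list whenever it changes, so that instances with a failing check stop receiving traffic.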

Software engineering

HTTP Calls and SQL Queries
A couple of months ago, we introduced the Blackfire Enterprise Edition and support for teams. The response has been great so far, and since the launch, we have packed in a lot of other features. One of them is the ability to understand which HTTP web services or APIs have been called during an HTTP request, and also the exact SQL queries that were executed on your database. It helps a lot when optimizing performance as those are almost always the main bottlenecks.
Mechanical Sympathy: Understanding the Hardware Makes You a Better Developer
I have spent the last few years of my career working in the field of high-performance, low-latency systems. Two observations struck me when I returned to working on the metal:
Many of the assumptions that programmers make outside this esoteric domain are simply wrong.
Lessons learned from high-performance, low-latency systems are applicable to many other problem domains.
Some of these assumptions are pretty basic. For example, which is faster: storing to disk or writing to a cluster pair on your network? If, like me, you survived programming in the late 20th century, you know that the network is slower than disk, but that’s actually wrong! It turns out that it’s much more efficient to use clustering as a backup mechanism than to save everything to disk.
Other assumptions are more general and more damaging. A common meme in our industry is "avoid pre-optimization." I used to preach this myself, and now I’m very sorry that I did.
These are just a few supposedly obvious principles that truly high-performance systems call into question. For many developers, these rules of thumb appear reasonably inviolable in everyday practice. But as performance demands grow increasingly strict, it becomes proportionally important for developers to understand exactly how systems work, at both the abstract, procedural level and the level of the metal itself.
The structure of Apache Lucene.
The inestimably noble Apache Software Foundation produces many of the blockbuster products (Ant, CouchDB, Hadoop, JMeter, Maven, OpenOffice, Subversion, etc.) that help build our digital universe. One perhaps less well-known gem is Lucene, which "… provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities." Despite its shying from headlines, Lucene forms a quiet but integral component of many Apache (and third-party) projects.
Web Development with Docker, Docker-Machine, Docker-Compose, Tmux, Tmuxinator, and Watchdog
I’m a developer on the Hub team at Docker, Inc. My realm of responsibility spans three different projects: Docker Hub, Registry Hub and www.docker.com. Each of these is a Django application with its own PostgreSQL, Redis, and RabbitMQ instances. I want to be able to "start projects" from one command and not only have everything running, but also have logs, Python shells, file system monitoring, a shell at the root of each project, and git fetch --all without having to type it all myself over and over and over again. This post will describe the development environment I built to accomplish that.

System administration

Usual Debian Server Setup
I manage a few servers for myself, friends and family as well as for the Libravatar project. Here is how I customize recent releases of Debian on those servers.

Log management

10 Things to Consider When Parsing with Logstash
After spending the last couple of weeks using the ELK (Elasticsearch, Logstash, Kibana) stack to process application logs, I have collated the following points that need to be considered while developing Logstash scripts.

Monitoring

Browser Monitoring for GitHub.com
Most large-scale web applications incorporate at least some browser monitoring, collecting metrics about the user experience with JavaScript in the browser, but, as a community, we don’t talk much about what’s working here and what’s not. At GitHub, we’ve taken a slightly different approach than many other companies our size, so we’d like to share an overview of how our browser monitoring setup works.

BI - Data Science

What is the difference between Business Intelligence and Data Science?
In our area of work and as a Business Engineer at Dataiku, people (customers, partners, network, school friends) often ask me: what is the difference between BI and Data Science?
In a previous job, I worked in a Business Intelligence environment. Today, I accompany customers in building their own Data Science Applications for BI purposes. So what’s the difference between these two data centric disciplines?
Applied Data Science: Optimizing checkout times
My neighborhood mini-market has a very clever system for optimizing wait times at the checkout. There are always between 2 and 5 people working in the store, working continuously either at the checkout, or stocking shelves.
When the checkout line gets too long, a call goes out immediately over the loudspeaker, and someone stocking shelves will come and open a new checkout line in less than a minute.

Management & organization

Docker and the Three Ways of DevOps
Have you read Gene Kim’s The Phoenix Project? Some of the principles behind the Phoenix Project and an upcoming book I am co-authoring with Gene (The DevOps Cookbook) have been referred to as the "Three Ways of DevOps". These are particular patterns of applying DevOps principles in a way that yields high performance outcomes.