Mon blog-notes à moi que j'ai

Personal blog of a sysadmin with a hacker streak

Twitter & RSS watch roundup #2015-25

The harvest of links for the week of June 15 to 19, 2015. Most of them were published on my Twitter account. Here they are, gathered for those who missed them.

Happy reading

Architecture

Automated Deploys with Docker Hub and Nirmata
As companies strive to achieve software development agility, they are looking to automate each phase of their software development pipeline. The ultimate goal is to fully automate the deployment of code to production, triggered when a developer checks in a fix or a feature. Companies such as Netflix have built extensive tooling to enable their developers to achieve this level of sophisticated automation. In this post, I will describe how deployments of containerized applications can be completely automated with GitHub, Docker Hub and Nirmata.

System Engineering

Using HAProxy for Loadbalancing 2 Web Servers in Docker
In this blog I will share the steps I took to use HAProxy (running in a Docker container) to load-balance my web requests between two Apache web servers.
I will also show how to add a health check so HAProxy will be able to detect when a web server is down.
The two Apache web servers in this blog are assumed to be available at the hostnames ‘apache1’ and ‘apache2’.
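To make the idea concrete, here is a minimal Python sketch of what round-robin balancing with a health check amounts to. It is not HAProxy configuration and not the linked article's setup. The `healthy_round_robin` function and the `status` table are hypothetical names invented for illustration, and only the hostnames ‘apache1’ and ‘apache2’ come from the excerpt.

```python
from itertools import cycle
from typing import Callable, Iterable, Iterator


def healthy_round_robin(
    backends: Iterable[str], is_up: Callable[[str], bool]
) -> Iterator[str]:
    """Yield backends in rotation, skipping any that fail the health check."""
    pool = list(backends)
    rotation = cycle(pool)
    while True:
        # Try each backend at most once per pick so a fully-down pool fails fast.
        for _ in range(len(pool)):
            candidate = next(rotation)
            if is_up(candidate):
                yield candidate
                break
        else:
            raise RuntimeError("no healthy backend available")


# Hypothetical health state: pretend apache2 is currently down.
status = {"apache1": True, "apache2": False}
picker = healthy_round_robin(["apache1", "apache2"], lambda host: status[host])
picks = [next(picker) for _ in range(3)]
```

While ‘apache2’ is marked down, every pick goes to ‘apache1’. Once its status flips back to up, it rejoins the rotation on the next pick.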
Thread Pools in NGINX Boost Performance 9x!
It’s well known that NGINX uses an asynchronous, event-driven approach to handling connections. This means that instead of creating another dedicated process or thread for each request (like servers with a traditional architecture), it handles multiple connections and requests in one worker process. To achieve this, NGINX works with sockets in a non-blocking mode and uses efficient methods such as epoll and kqueue.
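The single-process, non-blocking model described above can be sketched in a few lines of Python with the standard `selectors` module, which picks epoll or kqueue where the platform offers them. This is only an illustration of the event-driven pattern, not NGINX's implementation. The `echo_upper` function is a hypothetical name, and in-process socket pairs stand in for real client connections.

```python
import selectors
import socket


def echo_upper(conns, expected_messages):
    """Serve several non-blocking sockets from one loop, with no thread per connection."""
    sel = selectors.DefaultSelector()  # epoll, kqueue, etc., depending on the platform
    for conn in conns:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)
    handled = 0
    while handled < expected_messages:
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing became readable in time; stop rather than spin forever
        for key, _ in events:
            data = key.fileobj.recv(1024)
            if data:
                key.fileobj.send(data.upper())
                handled += 1
    sel.close()


# Two in-process socket pairs stand in for two client connections.
client_a, server_a = socket.socketpair()
client_b, server_b = socket.socketpair()
client_a.send(b"hello")
client_b.send(b"world")
echo_upper([server_a, server_b], expected_messages=2)
replies = (client_a.recv(1024), client_b.recv(1024))
```

One loop answers both connections. The process only does work when the selector reports that a socket is ready to read.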
When Solid State Drives are not that solid
It looked just like another page in the middle of the night. One of the servers of our search API stopped processing the indexing jobs for an unknown reason. Since we build systems in Algolia for high availability and resiliency, nothing bad was happening. The new API calls were correctly redirected to the rest of the healthy machines in the cluster and the only impact on the service was one woken-up engineer. It was time to find out what was going on.
Transparent Huge Pages and Alternative Memory Allocators: A Cautionary Tale
Recently, our site reliability engineering team started getting alerted about memory pressure on some of our Redis instances which have very small working sets. As we started digging into the issue, it became clear that there were problems with freeing memory after initial allocation because there were a relatively small number of keys but a comparatively large amount of memory allocated by redis-server processes. Despite initially looking like a leak, the problem was actually an issue between an alternative memory allocator and transparent huge pages.
Brubeck, a statsd-compatible metrics aggregator
One of the key points of GitHub’s engineering culture —and I believe, of any good engineering culture— is our obsession with aggressively measuring everything.
Coda Hale’s seminal talk “Metrics, Metrics Everywhere” has been a cornerstone of our monitoring philosophy. Since the very early days, our engineering team has been extremely performance-focused; our commitment is to ship products that perform as well as they possibly can (“it’s not fully shipped until it’s fast”), and the only way to accomplish this is to reliably and methodically measure, monitor and graph every single part of our production systems. We firmly believe that metrics are the most important tool we have to keep GitHub fast.
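For readers who have never emitted a statsd-style metric, here is a rough Python sketch of the plain-text line format (`name:value|type`) sent over UDP, as statsd-compatible aggregators generally expect. The metric names, the helper functions and the port 8125 (the conventional statsd port) are assumptions for illustration and are not taken from the Brubeck article.

```python
import socket


def statsd_line(name: str, value: float, kind: str = "c") -> bytes:
    """Format one metric in the plain-text statsd style: name:value|type."""
    return f"{name}:{value}|{kind}".encode("ascii")


def send_metric(line: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP; the sender does not wait for any reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line, (host, port))


# Hypothetical metric names: a counter increment and a timing in milliseconds.
counter = statsd_line("web.requests", 1)
timer = statsd_line("web.response_time", 320, "ms")
```

Because the transport is UDP, instrumenting a code path costs almost nothing. That low cost is part of what makes measuring everything practical.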
Inside NGINX: How We Designed for Performance & Scale
NGINX leads the pack in web performance, and it’s all due to the way the software is designed. Whereas many web servers and application servers use a simple threaded or process-based architecture, NGINX stands out with a sophisticated event-driven architecture that enables it to scale to hundreds of thousands of concurrent connections on modern hardware.

Monitoring

Elasticsearch: Powering Real-Time Mobile and Web Analytics for LeadBoxer and Opentracker
Cralan Deutsch and Wart Fransen are co-founders of Netherlands-based LeadBoxer and Opentracker. LeadBoxer is an actionable tool for B2B sales agents which delivers qualified leads via real-time signals, based on proprietary customisable lead scoring technology. Opentracker specialises in web tracking, data analytics and statistics innovation, while its hallmarks are simple, intuitive, and easy-to-read reporting interfaces, combined with an enterprise-class API.
Creating Meaningful Documents from Application Logs that Span Multiple Lines – Part 1
When dealing with application logs, one of the most common scenarios is to read and process logs that span multiple lines. The most common examples of log entries that span multiple lines are those generated by Java and .NET. When processing such logs using tools like Logstash, we are faced with a dilemma. The choice is between treating each line as a separate entry and treating all entries as part of a single block of text. If we treat each line separately, we have the challenge of correlating all the entries into a coherent group that can be processed. If we treat the data as a single block of text, any business logic that depends on this text has to perform text processing and extract the required information.
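The grouping problem described above can be sketched in plain Python: treat any line that does not look like the start of an entry as a continuation of the previous one. This is not Logstash configuration. The timestamp pattern, the `group_entries` name and the sample log lines are all assumptions made up for the illustration.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: every new entry starts with an ISO-style date such as 2015-06-15.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def group_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Attach continuation lines (e.g. stack frames) to the entry that precedes them."""
    current: List[str] = []
    for line in lines:
        if ENTRY_START.match(line) and current:
            yield current
            current = []
        current.append(line)
    if current:
        yield current


# Hypothetical sample: one Java-style exception followed by a normal entry.
sample = [
    "2015-06-15 10:00:01 ERROR Request failed",
    "java.lang.NullPointerException",
    "    at com.example.Service.run(Service.java:42)",
    "2015-06-15 10:00:02 INFO Recovered",
]
events = list(group_entries(sample))
```

The stack trace stays attached to its error entry, so downstream logic receives one coherent event instead of three unrelated lines.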
Creating Meaningful Documents from Application Logs that Span Multiple Lines – Part 2
As you might be aware, Logstash is popularly used along with two other tools, namely Elasticsearch and Kibana, with the three being commonly referred to as ‘ELK stack of tools’. In most cases, log information parsed by Logstash is stored as documents inside Elasticsearch, which is a high-performance search engine.
Burrow: Kafka Consumer Monitoring Reinvented
One of the responsibilities of the Data Infrastructure SRE team is to monitor the Apache Kafka infrastructure, the core pipeline for much of LinkedIn’s data, in the most effective way to ensure 100% availability. We have recently developed a new method for monitoring Kafka consumers that we are pleased to release as an open source project - Burrow. Named after Franz Kafka’s unfinished short story, Burrow digs through the maze of message offsets from both the brokers and consumers to present a concise, but complete, view of the state of each subscriber.
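The basic quantity behind consumer monitoring is lag: how far a consumer's committed offset trails the newest offset on the broker, per partition. The Python sketch below computes only that difference. It is not Burrow's evaluation logic, and the function name and offset values are hypothetical.

```python
from typing import Dict


def consumer_lag(broker_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: newest broker offset minus the consumer's committed offset."""
    return {
        # A partition with no committed offset is treated as starting from 0.
        partition: max(head - committed.get(partition, 0), 0)
        for partition, head in broker_offsets.items()
    }


# Hypothetical offsets for a two-partition topic.
broker_offsets = {0: 1500, 1: 980}
committed = {0: 1450, 1: 980}
lag = consumer_lag(broker_offsets, committed)
```

Here partition 0 is 50 messages behind and partition 1 is fully caught up.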
Exception Monitoring and Response
Like most software applications, GitHub can generate a few exceptions. Incoming exceptions range from system-level issues including Git timeouts and missing references, to application-level issues including simple code mistakes and JavaScript errors.
We take stability and performance seriously, so we need a way to quickly identify issues as they surface, determine the best teams or individuals to ping, and ship any relevant changes as soon as possible. Haystack helps us do that.

Software Engineering

Simple Data Analysis Using Apache Spark
Apache Spark is a framework for distributed computing. A typical Spark program runs in parallel on many nodes in a cluster. For a detailed and excellent introduction to Spark, please have a look at the Apache Spark website (https://spark.apache.org/documentation.html).
The purpose of this tutorial is to walk through a simple Spark example, by setting up the development environment and doing some simple analysis on a sample data file composed of userId, age, gender, profession, and zip code (you can download the source and the data file from GitHub: https://github.com/rjilani/SimpleSparkAnalysis).
Getting Code to Production With Less Friction and High Quality
A key point of frustration for many developers is a slow and inflexible release cycle. The slower the release cycle, the harder it is for a developer to move features from inception to production. This creates an ever-increasing backlog that a developer needs to manage and sustain, when they could be building new features. LinkedIn’s solution to this challenge is our shift to an increasingly agile release cycle. This allows our developers to build the various parts of a feature in an iterative manner and can help make the development cycle easy to manage.
Modern Code Review Practices Are More Than Finding Software Bugs
It’s crazy to think that before social media we may never have been able to connect with people the way we do now. Case in point: two weeks ago, one of my co-workers was skimming through his Twitter feed when he noticed a tweet about a case study on modern-day code review. After he read through the abstract, it was quickly sent my way to see if this would be a good piece of content to share on our social network, which is dedicated to all things relating to peer code review.
4 Common Mistakes Developers Make When Estimating
So, your boss asked you to estimate a new feature. You talked a bit about that feature with your teammates. Then came silence. And a brain strain. But after a while you finally came up with your estimate of “ahem…1..to…2..days”.
(Wrong buzzer sound.) Oops, wrong (again)! It turned out it took you almost a whole week to implement.
What went wrong? And what goes wrong in general when developers estimate tasks? Let’s find out!
How to Load Test & Tune Performance on Your API (Part II)
Here is the second part of our how-to on running a load test on your API. In the first part, we walked through the process of setting up your load testing environment and deciding what are the right metrics to measure and the different approaches to measuring them. We also provided some guidance on what tools to use and finally obtained real data points about how our API was performing.
Building Microservices: Using an API Gateway
The first article in this series about microservices introduced the Microservice Architecture pattern. It discussed the benefits and drawbacks of using microservices and how, despite the complexity of microservices, they are usually the ideal choice for complex applications.
When you choose to build your application as a set of microservices, you need to decide how your application’s clients will interact with the microservices. With a monolithic application there is just one set of (typically replicated, load-balanced) endpoints. In a microservices architecture, however, each microservice exposes a set of what are typically fine-grained endpoints. In this article, we examine how this impacts client-to-application communication and propose an approach that uses an API Gateway.
Building RoadRunner, a near real-time feedback loop
We use Hadoop/MapReduce batch jobs extensively to process content and activity streams. The home feed is a prime example where we employ such batch jobs to compute signals and features to create a personalized feed of interesting Pins for every Pinner. While this batch approach is effective and scalable, it’s not responsive to recent activity. RoadRunner, a stream compute infrastructure, was born to address the need for a near real-time feedback cycle.
Python: Refactoring to Iterator
Over the last week I’ve been building a set of scripts to scrape the events from the Bayern Munich/Barcelona game and I’ve ended up with a few hundred lines of nested for statements, if statements and mutated lists. I thought it was about time I did a bit of refactoring.
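As a rough idea of what this kind of refactoring looks like, the sketch below replaces nested loops that mutate a list with a generator that yields one flattened record at a time. It is not the linked article's code. The data layout, the `iter_events` name and the sample values are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def iter_events(matches: Iterable[Dict]) -> Iterator[Dict]:
    """Flatten nested match data lazily instead of appending to a shared list."""
    for match in matches:
        for event in match["events"]:
            if event["type"] != "noise":
                yield {"match": match["name"], **event}


# Hypothetical scraped data for a single match.
matches = [
    {
        "name": "match-1",
        "events": [
            {"minute": 7, "type": "goal"},
            {"minute": 9, "type": "noise"},
            {"minute": 24, "type": "card"},
        ],
    }
]
goal_minutes = [e["minute"] for e in iter_events(matches) if e["type"] == "goal"]
```

Callers can now filter, count or slice the stream of events without knowing how the scraped data is nested.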

Mobile

A/B Testing, a Great Way To Test The Android apps For Better Returns
A/B testing is a term used in Business Intelligence and marketing which implies a random experiment that is carried out with two variants, A and B. A is said to be the control and B is the treatment. Control is usually the current state or version of the system and B is the modification that is proposed. The base of the test is a hypothesis which is examined iteratively until the results are positive. When a change based on some hypothesis is proposed to an existing system, it might not generate value or returns in terms of profit. This does not necessarily mean that the hypothesis is wrong. For instance, if a certain section of an app is not able to convert the visitors into customers, there can be several reasons for it.
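One common way to decide whether treatment B really differs from control A is a two-proportion z-test on the conversion counts. The Python sketch below shows that calculation under the usual large-sample assumptions. The linked article does not necessarily use this test, and the visitor and conversion numbers are invented.

```python
from math import erf, sqrt


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for control A versus treatment B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed with erf, then doubled for a two-sided test.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical experiment: 1000 visitors per variant.
z, p = two_proportion_z(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

With these made-up numbers z comes out at roughly 2.1, which is below the conventional 5% significance threshold on the p-value. A smaller difference or fewer visitors would leave the result inconclusive without proving the hypothesis wrong.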

QA

Open-sourcing Facebook Infer: Identify bugs before you ship
Today, we’re open-sourcing Facebook Infer, a static program analyzer that Facebook uses to identify bugs before mobile code is shipped. Static analyzers are automated tools that spot bugs in source code by scanning programs without running them. They complement traditional dynamic testing: Where testing allows individual runs through a piece of software to be checked for correctness, static analysis allows multiple and sometimes even all flows to be checked at once. Facebook Infer uses mathematical logic to do symbolic reasoning about program execution, approximating some of the reasoning a human might do when looking at a program. We use Facebook Infer internally to analyze the main Facebook apps for Android and iOS (used by more than a billion people), Facebook Messenger, and Instagram, among others. At present, the analyzer reports problems caused by null pointer access and resource and memory leaks, which cause a large percentage of app crashes.
NTS: Real-time Streaming for Test Automation
Netflix members can enjoy instant access to TV shows & Movies on over 1400 different device/OS permutations. Assessing long-duration playback quality and delivering a great member experience on such a diverse set of playback devices presented a huge challenge to the team.

Security

Improving LXC template security
I’ve been getting involved with the Fedora Security Team lately and we’re working as a group to crush security bugs that affect Fedora, CentOS (via EPEL) and Red Hat Enterprise Linux (via EPEL). During some of this work, I stumbled upon a group of Red Hat Bugzilla tickets talking about LXC template security.
10 Tips to Improve Your Website Security
In recent years there has been a proliferation of great tools and services in the web development space. Content management systems (CMS) like WordPress, Joomla!, Drupal and so many others allow business owners to quickly and efficiently build their online presences. Their highly extensible architectures and rich plugin, module, and extension ecosystems have made it easier than ever to get a website up and running without years of learning required.
Du bon usage des logs de sécurité (Making Good Use of Security Logs)
Knowing the security state of an information system very often requires setting up a tool to manage its events. From the rules for collecting security events to the creation of a weekly security-level indicator, here is a method for building a dashboard that can be applied to a new or an existing infrastructure.

Databases

MySQL & MariaDB

Update on the InnoDB double-write buffer and EXT4 transactions
In a post written a few months ago, I found that using EXT4 transactions with the “data=journal” mount option improves the write performance significantly, by 55%, without putting data at risk. Many people commented on the post mentioning they were not able to reproduce the results and thus, I decided to further investigate in order to find out why my results were different.
Speed up GROUP BY queries with subselects in MySQL
We usually try to avoid subselects because sometimes they force the use of a temporary table and limit the use of indexes. But when is it good to use a subselect?
This example was tested over table a (1310723 rows), b, c and d (5 rows each) and with MySQL versions 5.5 and 5.6.

Cassandra

Cassandra 2.2, 3.0, and beyond
As you know, we’ve split our post-2.1 release into two pieces, with 2.2 to be released in July (rc1 out Monday) and 3.0 in September.
2.2 will include Windows support, commitlog compression, JSON support, role-based authorization, bootstrap-aware leveled compaction, and user-defined functions.
3.0 will include a major storage engine rewrite and materialized views.

Management & organization

The Art of DevOps Part IV – The Operational Battlegrounds
In this four-part blog series, I am exposing DevOps best practices using a metaphor inspired by the famous 6th century Chinese manuscript “The Art of War”. It is worth remembering that Sun Tzu, just like me, considered war a necessary evil, which must be avoided whenever possible. What are we fighting for here? Ultimately, we’re fighting for the absolute best services and features that we can deliver to our customers as quickly as we possibly can, and to eliminate the “War Room” scenario we are all so familiar with.
Tips for Writing for a Tech Audience
I’ve been writing articles and blog posts about web development and technology for a long time. The original version of this blog started in 2004, but by that time I’d already written a couple articles for the ultra-prestigious ColdFusion Developer’s Journal (it’s ok to feel jealous).
The most Controversial Concept in Agile Delivery – Estimating in Story Points
This blog post is another one of those that I should have written a while ago as the topic of story point based estimation keeps coming up again and again. To really understand why story point based estimation is important for Agile delivery, I think I need to explain the idea behind it.
How fast you can change?
In my talks, I often ask a trick question – what is the most important part in a bicycle and a formula one racing car?
I get all kinds of answers – wheels, engine, chassis, tyres, steering, even the driver… No doubt, they are all right answers.
However, my favorite right answer is the brakes! Why? Because they make us go faster!
Let me explain.
Good Agile Metrics or Working Software
As agile coaches, we use and value metrics as an objective way to evaluate the strengths and weaknesses of teams that we are coaching. When we first engage with a new team, we conduct an agile assessment of the team’s capabilities that results in a baseline metric that pinpoints exactly where we should focus our transformation plan.