Mon blog-notes à moi que j'ai

Personal blog of a sysadmin with a hacker streak

Twitter & RSS watch roundup #2016-21

The harvest of links for the week of May 23 to 27, 2016. Most of them were published on my Twitter account. Here they are, gathered for those who missed them.

Happy reading

Security & Privacy

Hybrid RSA and ECDSA certificates with NginX
NginX version 1.11.0 just became available and that means we can now serve both RSA and ECDSA certificates for maximum performance without having to drop support for older clients.
Cloudera’s Process for Handling Security Vulnerabilities
In addition to expecting enterprise-class standards for stability and reliability, Cloudera’s customers also have expectations for industry-standard processes around the discovery, fix, and reporting of security issues. In this post, I will describe how Cloudera addresses such issues in our software.
Mid-2016 Tor bug retrospective, with lessons for future coding
Programs have bugs because developers make mistakes. Generally, when we discover a serious bug, we try to fix it as soon as we can and move on. But many groups have found it helpful to pause periodically and look for trends in the bugs they have discovered or fixed over the course of their projects. By finding trends, we can try to identify ways to develop our software better.
LinkedIn and Let’s Encrypt
Last night I was playing around with the LinkedIn REST API and quite by accident, I discovered something. If you have installed a Let’s Encrypt certificate on your site, LinkedIn will not read images included in your OpenGraph tags.
Sécuriser Secure Shell (SSH)
You may have heard that the NSA is able to decrypt SSH, at least under certain conditions. If you have not, I suggest you take a look at the recent information disclosed by Edward Snowden. All of it. Don't panic, this post will still be here when you are done. My goal with this article is to make NSA analysts… sad.

System Engineering

Python in production engineering
Python aficionados are often surprised to learn that Python has long been the language most commonly used by production engineers at Facebook and is the third most popular language at Facebook, behind Hack (our in-house dialect of PHP) and C++. Our engineers build and maintain thousands of Python libraries and binaries deployed across our entire infrastructure.
Open Sourcing Twitter Heron
Last year we announced the introduction of our new distributed stream computation system, Heron. Today we are excited to announce that we are open sourcing Heron under the permissive Apache v2.0 license. Heron is a proven, production-ready, real-time stream processing engine, which has been powering all of Twitter’s real-time analytics for over two years. Prior to Heron, we used Apache Storm, which we open sourced in 2011. Heron features a wide array of architectural improvements and is backward compatible with the Storm ecosystem for seamless adoption.
Separation of Concerns
This is the third post in our blog series about the design, implementation and usage of caching in datacenters.
The most important design decision we adopted in building the server is to separate performance-sensitive processing from the rest, and separate different types of performance-sensitive processing from each other. This is key to ensure deterministic runtime behavior facing a wide range of environments and workloads. As Dean et al have pointed out in the Tail at Scale, performance variability is amplified by scale, and the key to reduce (meaningful) variability is differentiation.
Autoscaling PaaSTA Services
One step in creating a service is to decide how many compute resources it needs. From the inception of PaaSTA, changing a service’s resource allocation has required manually editing and pushing new configs, and service authors had to pore over graphs and alerts to determine the proper resource allocation for a service whenever load requirements changed. This changed earlier this month when autoscaling was introduced into PaaSTA.
Hypernetes: Bringing Security and Multi-tenancy to Kubernetes
While many developers and security professionals are comfortable with Linux containers as an effective boundary, many users need a stronger degree of isolation, particularly for those running in a multi-tenant environment. Sadly, today, those users are forced to run their containers inside virtual machines, even one VM per container.
Catalog zones are coming to BIND 9.11
You know the drill: you provision a server with a master zone, and then you have to hop over to all secondary servers and add the slave zone to their configuration. You probably do that with some form of automation, or you use something slightly convoluted like what we’ve discussed previously in automatic provisioning of slave DNS servers. If your master and slave servers are BIND, you’re in luck: catalog zones will automate this for you within the BIND code itself: there’ll no longer be a need for « hacking » this to accomplish automatic provisioning of slave BIND servers.
I/O bursts with QEMU 2.6
QEMU 2.6 was released a few days ago. One new feature that I have been working on is the new way to configure I/O limits in disk drives to allow bursts and increase the responsiveness of the virtual machine. In this post I’ll try to explain how it works.
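To give an idea of what "limits that allow bursts" means, here is a minimal token-bucket sketch in Python. It is a generic illustration and not QEMU's code: the class name `BurstBucket` and the parameters `rate` and `capacity` are made up for this example. The bucket refills at a steady rate, and whatever accumulated while the disk was idle can be spent in one burst.

```python
class BurstBucket:
    """Generic token bucket (illustrative, not QEMU's implementation).

    `rate` tokens are added per second, up to `capacity`. A request
    costs one token, so an idle period lets up to `capacity` requests
    go through at once before the sustained `rate` applies again.
    """

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full: an idle disk may burst right away
        self.last = 0.0         # timestamp of the previous refill, in seconds

    def allow(self, now, cost=1):
        # Refill according to the time elapsed since the last call.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the limit: the caller should delay this request


def count_allowed(bucket, now, requests):
    """Number of `requests` issued at time `now` that the bucket lets through."""
    return sum(bucket.allow(now) for _ in range(requests))
```

With `BurstBucket(rate=100, capacity=500)`, a burst of up to 500 requests passes immediately, after which throughput falls back to 100 requests per second until the bucket has had time to refill.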
Xenomai 3 sur Raspberry Pi 2
For several years, installing Xenomai on a Raspberry Pi 1 has been fairly easy, with rather satisfying results. Unfortunately, installation on a Raspberry Pi 2 did not work. The problem was solved a few months ago by a patch from Mathieu Rondonneau that makes it possible to use the very latest version of Xenomai (3.0.2).

Monitoring

Hadoop & Spark monitoring with Datadog
Using Datadog you can now immediately start monitoring the four most widely-used technologies in the Hadoop ecosystem: HDFS, MapReduce, YARN, and Spark.
Monitoring made easy with Percona App for Grafana
Are you using Grafana 3.x with Prometheus’ time-series database? Now there is a « Percona App » available on Grafana.net! The app provides a set of dashboards for MySQL performance and system monitoring with Prometheus’ datasource, and makes it easy for users to install them. The dashboards rely on the alias label in the Prometheus config and depend on a small patch applied to Grafana.
Monitor third-party service statuses from StatusPage.io
The widespread use of external services has made tracking their availability a key part of maintaining infrastructure performance. StatusPage.io hosts status pages for many commonly used external services including Citrix, Twilio, Bitbucket, Travis CI, CircleCI, Codeship, Segment, Keen IO, KISSmetrics, and HipChat to communicate their availability status to customers and give updates during downtimes. StatusPage also provides the capability to follow the pages of any third-party service you may use within your environment.
Open Sourcing Kafka Monitor
Apache Kafka has become a standard messaging system for large-scale, streaming data. In companies like LinkedIn it is used as the backbone for various data pipelines and powers a variety of mission-critical services. It has become a core component of a company’s infrastructure that should be extremely robust, fault-tolerant and performant.
Collecting MongoDB Metrics and Statistics
If you’ve already read our guide to key MongoDB metrics in Part 1 of this series, you’ve seen that MongoDB provides a vast array of metrics on performance and resource utilization. This post covers the different options for collecting MongoDB metrics in order to monitor them.
Monitoring MongoDB Performance Metrics (MMAP)
If you are using the WiredTiger storage engine, which was introduced with MongoDB 3.0 and is now the default storage engine, visit the companion article « Monitoring MongoDB performance metrics (WiredTiger) ».
Monitoring MongoDB Performance Metrics (WiredTiger)
If you are using the MMAPv1 storage engine, visit the companion article « Monitoring MongoDB performance metrics (MMAP) ».
8 Ways to Reduce Alert Fatigue
Silo’d responsibilities have wreaked havoc on team communications, making it difficult for different departments to have the full context of a situation during fire fights. This has not only reduced the quality of communication across entire development teams, it’s also created a serious issue that plagues many on the operations side — alert fatigue. Alert fatigue is not just an issue of unhappy team members — it impacts the software delivery chain’s ability to grow.

Software Engineering

Why Microservices Should Be Event Driven: Autonomy vs Authority
I’ve been working on a series of articles showing how to build microservices using an event-driven approach (which IMHO is the only real way to build microservices :) or… any complex distributed architecture). I’ll explore DDD, CQRS, Event-sourcing, even streaming, complex-event processing and more. I’m using a reference monolith application based on Java EE that uses all the typical Java EE technology and dives deep into what makes it tick, what drawbacks it has, and how to evolve it to a microservices architecture. I’ll show implementation details all the way from containers (Docker, Kubernetes) to the JVM layer (Spring Boot and WildFly Swarm) to the application architecture (events, commands, streaming, raw events, aggregates, aggregate roots, transactions, CQRS, etc). Hopefully it will be ready for my Red Hat Summit talk in San Francisco in June! Follow me on twitter @christianposta for updates on this project.
Kinds of Static analysis tools
I like clean code. I like code whose quality is visible at first touch. And I like predictable codebases which look as if written by one developer. And I like static analysis because it significantly helps to achieve all the mentioned things. In my last article I wrote about mistakes in using static analysis in software projects. Now I would like to go a step further and give advice on how to choose the right tool(s). I would say that even with a subset of these tools you will have enormous benefits such as fewer bugs, better maintainability, cleaner and simpler code. Let’s start with some theory and motivation.
Babby’s First Hack Day: Fast Queue
Technically this was not my first hack day, but this was the first hackday where I attempted to work on a project on my own. I am a QA Engineer, not a mobile developer, so my experience in making iOS apps is pretty light. I have been working with Swift for automation purposes, and for practicing coding on my own, and this was the first hack day since I started coding in earnest where I could put what I learned to the test.
Application data caching using SSDs
With the global expansion of Netflix earlier this year came the global expansion of data. After the Active-Active project and now with the N+1 architecture, the latest personalization data needs to be everywhere at all times to serve any member from any region. Caching plays a critical role in the persistence story for member personalization as detailed in this earlier blog post.
An Introduction to the Docker Trusted Registry
Many of us start our Docker journey pulling images from the Docker Hub with the time-honored docker pull command.
Lots of these images are « official » and have passed through Docker’s series of best practice and security checks. But the Docker Hub is also full of unofficial images whose reliability and security are uncertain.
Kafkaesque Days at LinkedIn – Part 1
Apache Kafka is the backbone for various data pipelines and asynchronous messaging at LinkedIn and beyond. In fact, we were excited to learn at the Kafka Summit last month that a number of other big adopters like Netflix and Microsoft have hit similar levels of scale in terms of message volume and deployment sizes, firmly establishing Kafka in the Four Comma Club.
Going deeper with Project Infinite
Last month at Dropbox Open London, we unveiled a new technology preview: Project Infinite. Project Infinite is designed to enable you to access all of the content in your Dropbox—no matter how small the hard disk on your machine or how much stuff you have in your Dropbox. Today, we’d like to tell you more—from a technical perspective—about what this evolution means for the Dropbox desktop client.
Using the Pipeline Plugin to Accelerate Continuous Delivery – Part 1
Jenkins is a powerful, open source automation tool with an impressive plugin architecture that helps development teams automate their software lifecycle. Jenkins is used to power many industry-leading companies’ software development pipelines. Jenkins Pipeline is a powerful, first-class feature for managing complex, multi-step pipelines. Jenkins Pipeline, a set of open source plugins and integrations, brings the power of Jenkins and the plugin ecosystem into a scriptable Domain Specific Language (DSL). Best of all, like Jenkins core, Pipeline is extensible by third-party developers, supporting custom extensions to the Pipeline DSL and various options for plugin integration.
What Tools Do You Need for Continuous Delivery?
Software organizations strive to deliver good quality software to their customers based on the need and market requirements. However, the business needs are not static and they change continuously based on the changing market requirements. Organizations also know that software is difficult to ship, and all the development activities can be completed for the software (requirement, design, code, build, test), but when it comes to deployment, there are frequent issues and consequently, organizations take time to deploy the software in the production environment.
Feature Flagging for Back-End & Microservices
It helps, in general, to be able to roll out to a small set of canary users and see if things go wrong. You can gather feedback from performance metrics or error logs. You can give a smaller customer access to the new feature or even a group of customers who know they are testing out something bleeding edge. Then you can make changes and react to the feedback and then roll it out to all users or roll it back. This helps mitigate risk and lets everyone sleep better at night.
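As an illustration of the canary idea, here is a minimal sketch of a percentage rollout in Python. It is not taken from the linked article: the function name `in_rollout` and the flag and user names are hypothetical. Each user is hashed into one of 100 stable buckets, so the same user always gets the same answer, and raising the percentage only ever adds users.

```python
import hashlib


def in_rollout(flag_name, user_id, percent):
    """Return True if `user_id` is among the first `percent` buckets for this flag.

    The bucket is derived from a hash of the flag name and the user id,
    so it is stable across calls and independent from one flag to another.
    """
    key = f"{flag_name}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100  # value in 0..99
    return bucket < percent


def handle_request(user_id):
    # Hypothetical call site: 5% of users get the new code path.
    if in_rollout("new-checkout", user_id, 5):
        return "new code path"
    return "old code path"
```

Rolling back is then a matter of setting the percentage to 0, and going to everyone means setting it to 100, without redeploying the service.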
The Joel Test: 12 Steps to Better Code
Have you ever heard of SEMA? It’s a fairly esoteric system for measuring how good a software team is. No, wait! Don’t follow that link! It will take you about six years just to understand that stuff. So I’ve come up with my own, highly irresponsible, sloppy test to rate the quality of a software team. The great part about it is that it takes about 3 minutes. With all the time you save, you can go to medical school.

Web performance

Tracking Multi-CDN Performance Issues to DNS
A multi-CDN service that combines multiple CDN providers into a single network is a common and effective way to speed up your web applications for users anywhere in the world. This strategy can also boost failover support in case one of the CDNs you’re using goes down.

Database Engineering

Inside the Algolia engine part 3—query processing
Search engines and query processing are not recent in Computer Science: this field known as Information Retrieval has a pretty vast set of state of the art practices. Today most search engines on the market come with a large set of features that developers can use to create their query processing pipeline, but this task is far more difficult than it seems and most people never manage to achieve a good result. In this post, we will cover the classical approaches and how we are handling it in the Algolia engine.

Elasticsearch

Ingest Node: A Client’s Perspective
With the first alpha release of Elasticsearch 5.0 comes a ton of new and awesome features, and if you’ve been paying attention then you know that one of the more prominent of these features is the new shiny ingest node. Simply put, ingest aims to provide a lightweight solution for pre-processing and enriching documents within Elasticsearch itself before they are indexed.

MySQL & MariaDB

Asynchronous Query Execution with MySQL 5.7 X Plugin
MySQL 5.7 supports X Plugin / X Protocol, which allows (if the library supports it) asynchronous query execution. In 2014, I published a blog post on how to improve slow query performance with parallel query execution. There, I created a prototype in the bash shell. Here, I’ve tried a similar idea with NodeJS + the mysqlx library (which uses MySQL X Plugin).
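The general idea of splitting one slow query into ranges and running them concurrently can be sketched as follows. This is a Python `asyncio` illustration, not the NodeJS + mysqlx code from the article: the database is replaced by an in-memory list, and `count_rows` and `parallel_count` are hypothetical names standing in for real range queries.

```python
import asyncio


async def count_rows(lo, hi, data):
    """Stand-in for one range query, e.g. a COUNT(*) with `id >= lo AND id < hi`."""
    await asyncio.sleep(0)  # yield control, as a real network round trip would
    return sum(1 for row_id in data if lo <= row_id < hi)


async def parallel_count(data, lo, hi, chunks):
    """Split [lo, hi) into `chunks` ranges, run them concurrently, sum the results.

    Assumes hi > lo and chunks >= 1.
    """
    step = -(-(hi - lo) // chunks)  # ceiling division, so the ranges cover [lo, hi)
    ranges = [(start, min(start + step, hi)) for start in range(lo, hi, step)]
    partials = await asyncio.gather(*(count_rows(a, b, data) for a, b in ranges))
    return sum(partials)


if __name__ == "__main__":
    ids = list(range(0, 1000, 3))  # fake primary keys: 0, 3, 6, ...
    print(asyncio.run(parallel_count(ids, 0, 1000, 4)))
```

With a real asynchronous driver, each `count_rows` call would send its own query and the server could work on all ranges at the same time. The sketch only shows the fan-out and merge steps.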
Looking inside the MySQL 5.7 document store
In this blog, we’ll look at the MySQL 5.7 document store feature, and how it is implemented.
pt-online-schema-change (if misused) can’t save the day
Altering large tables can still be a problematic DBA task, even now after we’ve improved Online DDL features in MySQL 5.6 and 5.7. Some ALTER types are still not online, or sometimes just too expensive to execute on a busy production master.

Data Engineering & Analytics

New in Cloudera Labs: Envelope for Apache Spark Streaming
Spark Streaming is the go-to engine for stream processing in the Cloudera stack. It allows developers to build stream data pipelines that harness the rich Spark API for parallel processing, expressive transformations, fault tolerance, and exactly-once processing. But it requires a programmer to write code, and a lot of it is very repetitive!
Nova: The Architecture for Understanding User Behavior
Amplitude has grown significantly both as a product and in data volume since our last blog post on the architecture, and we’ve had to rethink quite a few things since then (a good problem to have!). About six months ago, we realized that old Wave architecture was not going to be effective long-term, and started planning for the next iteration. As we continued to push the boundary of behavioral analytics, we gained more understanding of what we needed from a data storage and query perspective in order to continue advancing the product.
Apache HBase is Everywhere
HBase adds the ability to do low-latency random read/write across your big data. While it is a key piece of the Apache Hadoop ecosystem, HBase itself has an ecosystem of projects and products that use it as a storage engine for systems such as time series database (OpenTSDB), or SQL-style databases (Apache Phoenix, Apache Trafodion [incubating], Splice Machine). Partner companies such as Cask Data have HBase at the core of their offerings. Look for it as an alternative, more performant metadata store in future releases of Apache Hive. And Hadoop YARN is about to merge a scalable application timeline service based on HBase.
Why Apache Beam?
In this post, we would like to shed some light upon Apache Beam, the new Apache Incubator project that Google initiated with us and other partners. We would like to highlight our involvement in Beam and how we see the relationship between Beam and Flink developing in the future. See also Google’s perspective on how Beam and Flink relate.

Network Engineering

Networking @Scale, May 2016 — Recap
Last year, we held our first Networking @Scale, our invitation-only, one-day technical conference for engineers working on large-scale networking solutions. We received a tremendous amount of interest and positive feedback. So, this year, we decided to go even bigger with a two-day event. We hosted this second Networking @Scale on May 10 and 11 and had speakers from Akamai, AT&T, Comcast, Facebook, Google, the Jet Propulsion Laboratory (JPL), Microsoft, and Netflix. This year’s event reinforced the incredible range of network challenges we all face as a community.

Management & Organization

Working with a distributed team: How?
Agile recommends « Individuals and interactions over processes and tools ». Scrum has pretty formal ways of organizing development cycles. And both promote the principle that « At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly. »
Le créosote, ce manager performant qui détruit votre entreprise
I continue to study the factors behind the decline of companies. After discussing the silence imposed on employees and the talents who are not really talents, let's look today at the creosote manager, the one who kills everything around him in order to thrive. The creosote is found in just about every company I meet that has so much trouble innovating. Could there be a cause-and-effect link?
Qu’a appris Google de sa quête à bâtir l’équipe parfaite?
Journalist and essayist Charles Duhigg (@cduhigg), author notably of The Power of Habit and the recent Smarter, Faster, Better, delivers a fascinating report in the New York Times Magazine on management at Google (a subject we have already covered several times: here and here, for example).