Mon blog-notes à moi que j'ai

Personal blog of a sysadmin with hacker leanings

Twitter & RSS watch roundup #2016-34

The harvest of links for the week of 22 to 26 August 2016. Most of them were published on my Twitter account. Here they are, gathered for those who missed them.

Happy reading

Security & Privacy

The SWEET32 Issue, CVE-2016-2183
Today, Karthik Bhargavan and Gaetan Leurent from Inria have unveiled a new attack on Triple-DES, SWEET32, Birthday attacks on 64-bit block ciphers in TLS and OpenVPN. It has been assigned CVE-2016-2183.
This post gives a bit of background and describes what OpenSSL is doing. For more details, see their website.
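To give a feel for why a 64-bit block size matters, here is a minimal sketch of the generic birthday bound, not the researchers' actual attack procedure. It assumes the textbook approximation that collisions between cipher blocks become likely after roughly 2^(n/2) blocks for an n-bit block cipher; the function names are made up for illustration. For a 64-bit cipher such as Triple-DES that is 2^32 blocks of 8 bytes, or 32 GiB under a single key.

```python
import math


def birthday_blocks(block_bits: int) -> int:
    """Rough number of blocks after which a collision becomes likely: 2^(n/2)."""
    return 2 ** (block_bits // 2)


def blocks_to_bytes(blocks: int, block_bits: int) -> int:
    """Convert a number of cipher blocks into a data volume in bytes."""
    return blocks * (block_bits // 8)


def collision_probability(num_blocks: int, block_bits: int) -> float:
    """Birthday approximation: p is about 1 - exp(-q^2 / 2^(n+1))."""
    exponent = (num_blocks ** 2) / (2 ** (block_bits + 1))
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # Compare a 64-bit block cipher (3DES) with a 128-bit one (AES).
    for name, bits in (("3DES", 64), ("AES", 128)):
        blocks = birthday_blocks(bits)
        gib = blocks_to_bytes(blocks, bits) / 2 ** 30
        prob = collision_probability(blocks, bits)
        print(f"{name}: {blocks} blocks, {gib:.0f} GiB, p={prob:.2f}")
```

The same bound for a 128-bit block cipher lands far beyond any realistic traffic volume, which is why the issue is specific to 64-bit ciphers used on long-lived connections.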
Défendre le chiffrement
The Minister of the Interior, Bernard Cazeneuve, recently announced that he wanted to limit access to encryption as part of the fight against terrorism. This morning at 6:20 (ouch!), I was a guest on France Inter to explain why this was utter nonsense. For the same reason, I co-signed an op-ed in Le Monde, « En s’attaquant au chiffrement contre le terrorisme, on se trompe de cible », with my former colleagues from the CNNum.
Just how much traffic can you generate using CSP?
The ability to send reports about violations of your CSP is a fantastic feature and allows you to monitor all kinds of issues on your site in real time. There are a few things that you need to consider about CSP reporting though and I’m going to cover them in this article.
ERNW Hardening Repository
Today we started publishing several of our hardening documents to a dedicated GitHub repository — and we’re quite excited about it! It took a while to develop a suitable markdown template to support all the requirements you have when you write a hardening guide, but we’re online now!

System Engineering

Containerized CI Solutions in AWS – Part 1: Jenkins in ECS
In this first post of a series exploring containerized CI solutions, I’m going to be addressing the CI tool with the largest market share in the space: Jenkins. Whether you’re already running Jenkins in a more traditional virtualized or bare metal environment, or if you’re using another CI tool entirely, I hope to show you how and why you might want to run your CI environment using Jenkins in Docker, particularly on Amazon EC2 Container Service (ECS). If I’ve done my job right and all goes well, you should have run a successful Jenkins build on ECS well within a half hour from now!
A Beginner’s Guide to the Dockerfile
The humble but powerful Dockerfile is the building block of Docker images and containers. In essence, it’s a list of commands the Docker engine runs to assemble the image, and thus instances of images as containers.
Let’s look at an example before learning to construct our own.
Hashing Infrastructures
Engineers in fast moving, medium to large scale infrastructures in the cloud are often faced with the challenge of bringing up systems in a repeatable, fast and scalable way. There are currently tools which aid engineers in accomplishing this task e.g. Convection, Terraform, Saltstack, Chef, Ansible, Docker. Once the system is brought up there is a maintenance challenge of continually deploying and destroying the resources. What if we can hash the inputs for describing an infrastructure, where the final hash acts as an entry point into the defined infrastructure? What if the input model was agnostic of underlying tools?
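The idea of hashing the inputs that describe an infrastructure can be sketched in a few lines. This is only an illustration under stated assumptions, not the article's implementation: the choice of SHA-256, the canonical JSON encoding, and the field names (`tool`, `region`, `instances`) are all hypothetical.

```python
import hashlib
import json


def infrastructure_hash(inputs: dict) -> str:
    """Return a stable SHA-256 digest of an infrastructure description.

    Keys are sorted and whitespace is removed so that two descriptions
    with the same content always produce the same digest.
    """
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical input model, independent of the tool that builds it.
web_stack = {"tool": "terraform", "region": "eu-west-1", "instances": 3}
same_stack_reordered = {"instances": 3, "region": "eu-west-1", "tool": "terraform"}
bigger_stack = {**web_stack, "instances": 4}

if __name__ == "__main__":
    print(infrastructure_hash(web_stack))
    print(infrastructure_hash(bigger_stack))
```

Identical inputs map to the same digest regardless of key order, so the digest can serve as a lookup key for an already-built environment, while any change to the inputs yields a new one.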
The Always On Architecture - Moving Beyond Legacy Disaster Recovery
Failover, switching to a redundant or standby system when a component fails, has a long and checkered history as a way of dealing with failure. The reason is that your failover mechanism becomes a single point of failure that often fails just when it’s needed most. Having worked on a few telecom systems that used a failover strategy, I know exactly how stressful failover events can be and how stupid you feel when your failover fails. If you have a double or triple fault in your system, failover is exactly the time when it will happen.
The infrastructure behind Twitter: efficiency and optimization
In the past, we’ve published details about Finagle, Manhattan, and the summary of how we re-architected the site to be able to handle events like Castle in the Sky, the Super Bowl, 2014 World Cup, the global New Year’s Eve celebration, among others. In this infrastructure series, we’re focusing on the core infrastructure and components that run Twitter. We’re also going to focus each blog on efforts surrounding scalability, reliability, and efficiency in a way that highlights the history of our infrastructure, challenges we’ve faced, lessons learned, upgrades made, and where we’re heading.
Reducing Your Docker Image Size
When architecting Docker applications, keeping your images as lightweight as possible has a lot of practical benefits. It makes things faster, more portable, and less prone to breaks. Lightweight images also make it easier to use services like Jet, Codeship’s Docker CI/CD platform; they’re less likely to present complex problems that are hard to troubleshoot, and it takes less time to share them between builds.
GlusterFS, système de fichier réseau synchronisé & redondant
At Octopuce, we manage more and more complex infrastructures comprising a large number of virtual machines, in order to guarantee that a website will respond even if one of its VMs is down, whether for maintenance or following a failure (it happens!)
Remplacer un disque RAID défectueux
Here is the procedure I followed to replace a faulty RAID disk on a Debian machine.
Basics of Backups
I’ve recently had some discussions about backups with people who aren’t computer experts, so I decided to blog about this for the benefit of everyone. Note that this post will deliberately avoid issues that require great knowledge of computers. I have written other posts that will benefit experts.

Monitoring

Because you can’t always blame network operations…or the network!
What’s your most memorable « blame the network » anecdote? If you’re in network operations, you will likely have many from which to choose. After all, doesn’t the network always get blamed first? To be fair, other teams often feel the same. Citrix and VMware admins have alternately been touted as « the new network guy, » and as this Dilbert cartoon suggests, the stigma can impact almost anyone in IT.
If you’re on the app team, or represent the business/users, think of recent complaints that might have been worded to blame the network, even without a clear understanding of the actual cause. Surely the intent wasn’t malicious; blaming the network has just become a common way of verbalizing performance frustrations.
Developing Prometheus alerts for etcd
Prometheus is an open source monitoring and alerting system. Its powerful query language used to retrieve time series data can also be employed when defining alerts. Alerts actively notify users of irregular system conditions, sending messages to a variety of integrations such as Slack or PagerDuty.

Software Engineering

Processing Payments At Scale
Groupon recently announced gross billings of $1,492,882,000 for Q2 2016 — that’s about $17M our systems charged every single day this quarter. There is a lot of complexity associated with processing such volume, which we’re going to explore in this blog post.
Engineering Trade-Offs and The Netflix API Re-Architecture
Netflix’s engineering culture is predicated on Freedom & Responsibility, the idea that everyone (and every team) at Netflix is entrusted with a core responsibility. Within that framework they are free to operate with freedom to satisfy their mission. Accordingly, teams are generally responsible for all aspects of their systems, ranging from design, architecture, development, deployments, and operations. At the same time, it is inefficient to have all teams build everything that they need from scratch, given that there are often commonalities in the infrastructure needs of teams. We (like everyone else) value code reuse and consolidation where appropriate.
Dawn of the Dead Ends: Fixing a Memory Leak in Apache Kafka
At Heroku, we’re always working towards increased operational stability with the services we offer. As we recently launched the beta of Apache Kafka on Heroku, we’ve been running a number of clusters on behalf of our beta customers.
Over the course of the beta, we have thoroughly exercised Kafka through a wide range of cases, which is an important part of bringing a fast-moving open-source project to market as a managed service. This breadth of exposure led us to the discovery of a memory leak in Kafka, having a bit of an adventure debugging it, and then contributing a patch to the Apache Kafka community to fix it.
Undebt: How We Refactored 3 Million Lines of Code
Peter Seibel wrote that to maximize engineering effectiveness, « Let a 1,000 flowers bloom. Then rip 999 of them out by the roots. » Flowers, in how the metaphor applies to us, are code patterns—the myriad different functions, classes, styles, and idioms that developers use when writing code. At first, new flowers are welcome—maybe the new pattern seems easier to use, more scalable, more efficient, or more suited to some particular task than the old.
CircleCI Hacks - Automate the Decision to Skip Builds Using a Git Hook
Developers use CircleCI to build all sorts of projects. Many of those projects follow the typical one project per repo approach to code organization. Some include documentation, logs, vendor packages, deployment scripts, and more. Then there are the monorepos that companies like Google use. The added complexity that these repos bring leads to a different workflow for many of our customers.
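As a rough idea of what such a hook could do, here is a minimal sketch, not the article's script. It assumes a hook that receives the commit message and the list of changed paths, and that appends the `[ci skip]` marker, which CircleCI recognizes in commit messages, when nothing build-relevant changed. The directory and suffix lists and the function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable

SKIP_MARKER = "[ci skip]"
# Hypothetical choices: paths that never need a build in this repository.
NON_BUILD_DIRS = {"docs", "logs"}
NON_BUILD_SUFFIXES = {".md", ".txt"}


def affects_build(path: str) -> bool:
    """Return True when a changed path should trigger a build."""
    parsed = PurePosixPath(path)
    if parsed.parts and parsed.parts[0] in NON_BUILD_DIRS:
        return False
    return parsed.suffix not in NON_BUILD_SUFFIXES


def annotate_commit_message(message: str, changed_paths: Iterable[str]) -> str:
    """Append the skip marker when every changed path is build-irrelevant."""
    paths = list(changed_paths)
    if not paths or any(affects_build(p) for p in paths):
        return message
    if SKIP_MARKER in message:
        return message
    return f"{message.rstrip()} {SKIP_MARKER}"


if __name__ == "__main__":
    print(annotate_commit_message("Fix typo", ["docs/index.rst", "README.md"]))
    print(annotate_commit_message("Fix bug", ["src/app.py", "README.md"]))
```

In a real hook, the changed paths would typically come from `git diff --cached --name-only`, and the result would be written back to the commit message file that Git passes to the hook.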
Collected notes from Python packaging
Here are some collected notes on some particular problems from packaging Python stuff for Debian, and more like this are coming up in the future. Some of the issues discussed here might be rather simple and even benign for the experienced packager, but maybe this is helpful for people coming across the same issues for the first time, wondering what’s going wrong. But some of the things discussed aren’t easy. Here are the notes for this posting, in no particular order.

Databases Engineering

Elasticsearch

Monitoring the Search Queries
Ever wonder how your users are using your Elasticsearch cluster? Have you felt the need to investigate the queries sent to the Elasticsearch cluster by your users?
Using Packetbeat you can keep an eye on what comes and goes and avoid those nasty surprises your users may throw at your cluster.
An Arduino-Based Home Weather Station on the Elastic Stack
I’m far from a meteorologist. I’m a hacker with a garage/office that I spend way too much time in. I have a bias toward things that feel like data. A friend told me that I was maybe being a bit of a garage troll; I am tucked away from the sun and warmth. I decided that I needed to figure out if she was right.

MySQL & MariaDB

Beware of large MySQL max_sort_length parameter
Today we had a very interesting phenomenon at a customer. He complained that MySQL always gets some errors of the following type

Vertica

Introducing Vertica Troubleshooting Checklists
HPE Vertica is excited to announce the launch of our newest customer resource: Vertica Troubleshooting Checklists. Our checklists are modular, web-based resources that lead you through the process of diagnosing and fixing issues you may be experiencing.

Data Engineering

Text summarization with TensorFlow
Every day, people rely on a wide variety of sources to stay informed – from news stories to social media posts to search results. Being able to develop Machine Learning models that can automatically deliver accurate summaries of longer text can be useful for digesting such large amounts of information in a compressed form, and is a long-term goal of the Google Brain team.
New in Cloudera Enterprise 5.8: Flafka Improvements for Real-Time Data Ingest
Learn about the new Apache Flume and Apache Kafka integration (aka, « Flafka ») available in CDH 5.8 and its support for the new enterprise features in Kafka 0.9.
Over a year ago, we wrote about the integration of Flume and Kafka (Flafka) for data ingest into Apache Hadoop. Since then, Flafka has proven to be quite popular among CDH users, and we believe that popularity is based on the fact that in Kafka deployments, Flume is a logical choice for ingestion « glue » because it provides a simple deployment model for quickly integrating events into HDFS from Kafka.
Stream Processing Hard Problems Part II: Data Access
Before we dive into why data access is a hard problem in stream processing, here is some background information. At LinkedIn, we develop and use Apache Samza as our stream processing framework, Apache Kafka as our durable pub-sub messaging pipe, and Databus (and its next generation replacement) for capturing change events from our databases. Our streams infrastructure team gets feedback from application developers across the company (and from the open source community) on scalability, reliability, usability, and other problems that they encounter in their production applications. The learnings and techniques described in this post are in essence a summary of problems that we have faced so far and our efforts to address them. Although this post doesn’t require deep knowledge of Samza, a basic pre-read of Samza might help.
New in Cloudera Enterprise 5.8: SQL Editor and Other Productivity Improvements
Cloudera Enterprise 5.8 includes the latest release of Hue (3.10), the web UI that makes Apache Hadoop easier to use.
As part of Cloudera’s continuing investments in user experience and productivity, Cloudera Enterprise 5.8 includes a new release of Hue that makes several common tasks much easier. In the remainder of this post, we’ll provide a summary of the main improvements. (Hue 3.10 is also available for a quick try in one click on demo.gethue.com.)

Network Engineering

Robotron: Top-down network management at scale
Managing a healthy and sustainable network is difficult. However, little is understood about network management practices outside the network engineering community. We developed a state-of-the-art system named Robotron to manage tens of thousands of network devices connecting hundreds of thousands of servers globally at Facebook. This week, we are presenting an overview of the system in the paper Robotron: Top-down Network Management at Facebook Scale at SIGCOMM 2016 in Florianópolis, Brazil.
IPv4 vs IPv6 Performance Comparison
IPv6 usage has been growing very slowly through the last 10 to 15 years. Since mid-2015 it started to pick up and increase adoption at a rapid pace. Google, for example, has been tracking their IPv6 usage since 2009 and it is beautiful to finally see some growth.
The future of the edge
CDNs are stuck. We’re doing the best that we can with the current model CDNs use: we’re able to pass through writes and pull content from origin, which lets us cache static assets and content that changes frequently. What we can’t do is (effectively) cache responses that change on every request, that are different for every user, and that modify state at the origin. That is, we can’t do anything with writes. Where does that leave us?
In this post, I’ll explore « the future of the edge, » or the next logical step in how we streamline online experiences. In order to keep up with the direction things are headed, we need to combine logic and data at the edge. Logic without data, without state, is insufficient.
How to : routage avec multiple gateways
When your firewall has several possible ways out, there is a way to steer traffic toward one route or the other based on iptables / Netfilter rules. A machine can then simultaneously use a VPN for some connections and an ADSL or fibre connection for others, for example. This process is a little more complex on Linux than on BSD: it is the former case that we will cover. Throughout the article, command lines are preceded by a hash sign to spare you « dangerous » copy-pasting.
Follow-Up on CVE-2016-1409 – IPv6 NDP DoS Vulnerability
This is a guest post from Jed Kafetz.
After seeing Christopher’s post I decided to create a proof using GNS3 and Virtualbox. The aim is to perform the exact attack using Antonios Atlasis’ Chiron tools and run a Wireshark packet capture to prove the hop limit drops below 255.

Management and Organization

Putting the ‘Ops’ Back in DevOps
What Agile means to your typical operations staff member is, « More junk coming faster that I will get blamed for when it breaks. » There always is tension between development and operations when something goes south. Developers are sure the code worked on their machine; therefore, if it does not work in some other environment, operations must have changed something that made it break. Operations sees the same code perform differently on the same machine with the same config, which means if something broke, the most recent change must have caused it … i.e. the code did it. The finger-pointing squabbles are epic (no pun intended). So how do we get Ops folks interested in DevOps without promising them only a quantum order of magnitude more problems—and delivered faster?
De la gestion de projet à la gestion de workflow
Something bothers me about agile methodologies as they are defined « by the book », or as they are taught. They describe a way of working based on development sprints that follow one another, each lasting around two weeks. Even when describing the part upstream of development (writing user stories, managing the backlog) and the part downstream of it (testing, validation, release to production), everything remains centered on development.
DevOps By The Numbers – 5 Metrics To Watch
Tim Buntel recently sat down with Alan Shimel of DevOps.com and explored DevOps by the Numbers. This discussion looked at how to approach the measurements and metrics of a Continuous Delivery transformation. Tim spoke on tough questions like « are we getting better at delivering high-quality software faster and at scale? » and « has all this effort been worth it?! » After listening to the entire discussion we compiled the top 5 DevOps metrics to watch: