Mon blog-notes à moi que j'ai

A sysadmin's personal blog, with a hacker bent

Twitter & RSS watch compilation #2016-31

The harvest of links for the week of August 1–5, 2016. Most of them were published on my Twitter account. Here they are, gathered together for those who may have missed them.

Happy reading!

Security & Privacy

Should CDNs tighten up their security?
I was doing some work on securityheaders.io the other day and I noticed something about the CDN that I use for some of my assets. They didn’t use HSTS to enforce the use of HTTPS in compliant user agents, which I thought was a little odd.
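The header in question is easy to check yourself. A minimal Python sketch (the helper below is illustrative, not from the article) that fetches a host's headers and parses any `Strict-Transport-Security` directive:

```python
# Sketch: check whether an HTTPS host sends an HSTS header.
import http.client

def parse_hsts(value):
    """Parse a Strict-Transport-Security header value into a dict."""
    directives = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, val = part.partition("=")
        # Valueless directives like includeSubDomains become True
        directives[name.lower()] = val or True
    return directives

def fetch_hsts(host):
    """Return the parsed HSTS policy of a host, or None if absent."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("HEAD", "/")
        value = conn.getresponse().getheader("Strict-Transport-Security")
        return parse_hsts(value) if value else None
    finally:
        conn.close()
```

A policy like `max-age=31536000; includeSubDomains` parses to `{"max-age": "31536000", "includesubdomains": True}`.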
Surveillance : le hamster qui mangeait des spaghetti
Surveillance, black boxes, probes, IOL, encryption, metadata: content and articles abound on these topics, but reading some of the comments makes it clear that how the Web in particular, and the Internet in general, works is rather poorly understood, even superficially. What happens, very concretely, when I click on a link? What data (or, as the case may be, metadata) becomes visible, and to whom? These questions, which will seem naive to those with good digital literacy, are nonetheless entirely legitimate, and the answer is not as simple as one might think.
Breaking through censorship barriers, even when Tor is blocked
While Tor Browser provides many security and privacy properties and features, not everyone around the world has the luxury of using it. By default, Tor Browser makes all of its users look alike by spoofing the User-Agent string (among other methods) to avoid fingerprinting attacks. However, it doesn’t hide the fact that you’re connecting to Tor, an open network where anyone can get the list of relays. This network transparency has many benefits, but also a downside: many repressive governments and authorities block their users from having free and open access to the internet. They can simply get the list of Tor relays and block them. This bars millions of people from access to free information, often including those who need it most. We at Tor care about freedom of access to information and strongly oppose censorship. This is why we’ve developed methods to connect to the network and bypass censorship. These methods are called Pluggable Transports (PTs).
Debian and Tor Services available as Onion Services
We, the Debian project and the Tor project, are enabling Tor onion services for several of our sites. These sites can now be reached without leaving the Tor network, providing a new option for securely connecting to resources provided by Debian and Tor.
git-crypt - transparent file encryption in git
git-crypt enables transparent encryption and decryption of files in a git repository. Files which you choose to protect are encrypted when committed, and decrypted when checked out. git-crypt lets you freely share a repository containing a mix of public and private content. git-crypt gracefully degrades, so developers without the secret key can still clone and commit to a repository with encrypted files. This lets you store your secret material (such as keys or passwords) in the same repository as your code, without requiring you to lock down your entire repository.
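Which files git-crypt encrypts is driven by filter patterns in `.gitattributes`; a minimal example (the paths are illustrative):

```
# .gitattributes — files matching these patterns are encrypted by git-crypt
secrets/**  filter=git-crypt diff=git-crypt
*.key       filter=git-crypt diff=git-crypt
```

Everything else in the repository stays in plain text, which is what lets developers without the key keep working.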

System Engineering

Challenges of a Remotely Managed, On-Premises, Bare-Metal Kubernetes Cluster
The recently announced Platform9 Managed Kubernetes (PMK) is an on-premises enterprise Kubernetes solution with an unusual twist: while clusters run on a user’s internal hardware, their provisioning, monitoring, troubleshooting and overall life cycle is managed remotely from the Platform9 SaaS application. While users love the intuitive experience and ease of use of this deployment model, this approach poses interesting technical challenges. In this article, we will first describe the motivation and deployment architecture of PMK, and then present an overview of the technical challenges we faced and how our engineering team addressed them.
Introducing the p0f BPF compiler
Then we published a set of utilities we are using to generate the BPF rules for our production iptables: the bpftools.
Today we are very happy to open source another component of the bpftools: our p0f BPF compiler!
Kong, le gorille de l’API Management vu de près
Companies that offer a service through an API, and that see their numbers of consumers and partners grow, face many challenges.
Deploying the Netflix API
As described in previous posts (“Embracing the Differences” and “Optimizing the Netflix API”), the Netflix API serves as an integration hub that connects our device UIs to a distributed network of data services. In supporting this ecosystem, the API needs to integrate an ever-evolving set of features from these services and expose them to devices. The faster these features can be delivered through the API, the faster they can get in front of customers and improve the user experience.
Welcome to the era of Container 2.0
It’s time to advance the discussion about what’s possible with containers. We need to move past all the Container 1.0 talk about how containers can revolutionize application development, and begin to focus on what comes next. Specifically, we need to focus on the game-changing benefits that containers will bring to how large enterprises manage their applications, as well as the datacenters and the clouds that power them.
Distributed Resource Scheduling with Apache Mesos
Netflix uses Apache Mesos to run a mix of batch, stream processing, and service style workloads. For over two years, we have seen an increased usage for a variety of use cases including real time anomaly detection, training and model building batch jobs, machine learning orchestration, and Node.js based microservices. The recent release of Apache Mesos 1.0 represents maturity of the technology that has evolved significantly since we first started to experiment with it.

Monitoring

Introducing Winston - Event driven Diagnostic and Remediation Platform
Netflix is a collection of microservices that all come together to enable the product you have come to love. Operating these microservices is also distributed across the owning teams and their engineers. We do not run a central operations team managing these individual services for availability. What we do instead is invest in tools that help Netflix engineers operate their services for high availability and resiliency. Today, we are going to talk about one such tool recently built for Netflix engineers: Winston.
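The core pattern behind an event-driven remediation tool is a registry mapping event types to remediation routines. A toy Python sketch of that idea (this is an illustration of the pattern, not Winston's actual interface):

```python
# Toy event-driven remediation registry (not Winston's real API).
HANDLERS = {}

def remediation(event_type):
    """Decorator: register a remediation routine for an event type."""
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register

@remediation("instance.unhealthy")
def restart_instance(event):
    # A real handler would call out to the platform's control plane.
    return f"restarting {event['instance_id']}"

def handle(event):
    """Dispatch an incoming event to its registered handler, if any."""
    fn = HANDLERS.get(event["type"])
    return fn(event) if fn else None
```

Unmatched events fall through to `None`, where a real system would page a human instead.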
5 Monitoring Myths That Deserve to Be Busted
There’s no doubt that effective application performance management can be a daunting goal. As businesses have demanded increasingly complex tasks from their technology, the solutions required to keep their systems in top shape need to be ever more insightful and precise. While things like efficiency and uptime are the bottom-line indicators of an app or database’s performance, there’s a variety of potential methods that can be used to achieve those ideals.
Hands on: Monitoring Kubernetes with Prometheus
Monitoring is one of the pillars of successful infrastructure. It has been called the base of the hierarchy of reliability. Monitoring is a must have for responding to incidents, detecting and debugging systemic problems, planning for the future, and generally understanding your infrastructure.
Pull doesn’t scale - or does it?
Let’s talk about a particularly persistent myth. Whenever there is a discussion about monitoring systems and Prometheus’s pull-based metrics collection approach comes up, someone inevitably chimes in about how a pull-based approach just “fundamentally doesn’t scale”. The given reasons are often vague or only apply to systems that are fundamentally different from Prometheus. In fact, having worked with pull-based monitoring at the largest scales, we find that this claim runs counter to our own operational experience.
We already have an FAQ entry about why Prometheus chooses pull over push, but it does not focus specifically on scaling aspects. Let’s have a closer look at the usual misconceptions around this claim and analyze whether and how they would apply to Prometheus.
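For context, a Prometheus target is just an HTTP endpoint serving metrics in a plain-text exposition format that the server pulls on a schedule. A rough Python sketch of rendering that format (metric names and values here are illustrative):

```python
# Sketch of the Prometheus text exposition format that a pull-based
# scraper reads from each target's /metrics endpoint.
def render_metrics(metrics):
    """Render {name: (help_text, value)} as Prometheus text format."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Because targets only ever serve their current state, scaling the collection side is a scheduling problem for the server, not a fan-in problem for the targets.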

Software Engineering

System Profiling for Lazy Developers
Measuring latency within my code is something that I do very very often. Occasionally I resort to tools like profilers to help me out but, honestly, most of the time I just put timers in my code and print the results to the console or a log file.
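The "timers in my code" approach can be wrapped once and reused everywhere; a small Python sketch of such a helper:

```python
# A reusable "lazy developer" timer: wrap any block and print its latency.
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{label}: {elapsed_ms:.2f} ms")
```

Usage is just `with timed("db query"): run_query()`, and the result lands on the console or in whatever log captures stdout.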
3 Ways to Improve Your Code Testing
I am a big fan of Test Driven Development (TDD). I drank the kool-aid a while back and have not had a regret since. When I sling code, I am always writing a test, or writing against a test. That’s how I’m built.
I am of the firm belief that one of the best ways to ensure short-term and long-term code quality is to make sure that your tests are exercising as much of the code base as possible, as often as possible. Thus, I am always looking for ways to improve my test coverage and test frequency. Again, the more code I can exercise when testing, the better it is.
Using feature flags to incorporate a release strategy into your development process
The “old way” of software releases is characterized by an explicit waterfall hand-off between teams – from Product (functional requirements) to Engineering (build and deploy). This old way did not explicitly promote release planning in the development process. The full release burden was shifted from one team to another without plans for a feedback loop or integrated release controls. Hence, it was difficult for teams to continuously deliver software, gather feedback metrics, and take full control over software rollouts and rollbacks.
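A feature flag at its simplest is a deterministic gate; a toy Python sketch of a percentage rollout (the hashing scheme is an illustration, not any particular vendor's):

```python
# Toy percentage-rollout feature flag: hash (flag, user) into a stable
# 0-99 bucket so a given user always sees the same decision.
import hashlib

def flag_enabled(flag, user_id, rollout_percent):
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < rollout_percent
```

Raising `rollout_percent` from 0 to 100 gradually exposes the feature, and setting it back to 0 is an instant rollback with no deploy.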
Testing Multi-Threaded and Asynchronous Code
If you’ve been writing code long enough, or maybe even if you haven’t, chances are you’ve hit on a scenario where you want to test some multi-threaded code. The conventional wisdom is that threads and tests should not mix. Usually this works out fine because the thing that you really want to test just happens to run inside of a multi-threaded system and can be tested individually without the use of threads. But what if you can’t separate things out, or moreover, what if threading is the point of the code you’re testing?
I’m here to tell you that while threads in tests might not be the norm, they are ok. The software police will not arrest you for firing up a thread in a unit test, though how to actually go about testing multi-threaded code is another matter. Some excellent asynchronous technologies like Akka and Vert.x provide test kits to ease the burden. But outside of these, testing multi-threaded code usually requires a different approach than a typical synchronous unit test.
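The usual trick for keeping such tests deterministic is to replace fixed sleeps with blocking hand-offs; a small Python sketch using a queue with a timeout:

```python
# Testing a worker thread without sleep(): hand the result back through
# a queue and block with a timeout, so the test is fast when the worker
# is fast and fails loudly when it hangs.
import threading
import queue

def test_async_result():
    results = queue.Queue()

    def worker():
        results.put(2 + 2)  # stand-in for real asynchronous work

    t = threading.Thread(target=worker)
    t.start()
    # get(timeout=...) blocks until the worker delivers, or raises Empty.
    assert results.get(timeout=1) == 4
    t.join(timeout=1)
```

The same shape works with `threading.Event` when there is no value to pass, only a "it happened" signal.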
Let’s make CLI Tools less Shitty
I have recently come to the conclusion that the future of configuration management is not Ruby, Python, or a putrefied YAML DSL, but Bash and PowerShell. My thinking largely stems from the recent release of Habitat from Chef Inc. and several years of experience writing automation tools for Sysadmins with varying levels of sophistication. I have written tons of Chef cookbooks, Puppet manifests, and Ansible Playbooks over the last 7 years. My greatest frustration with all of these tools has been that it is very difficult for users not already immersed in those tools to reuse and modify their respective cookbook/manifest/playbook. I have written previously about how it might work if we gave CLI tools RESTful APIs, and this post builds on those ideas.
Google’s QUIC protocol: moving the web from TCP to UDP
The QUIC protocol (Quick UDP Internet Connections) is an entirely new protocol for the web developed on top of UDP instead of TCP.
Some are even (jokingly) calling it TCP/2.
I only learned about QUIC a few weeks ago while doing the curl & libcurl episode of the SysCast podcast.
The really interesting bit about the QUIC protocol is the move to UDP.
Being Pushy
I spent a few days last week in Stockholm attending the HTTP Workshop, and took part in many fascinating discussions. One of them revolved around HTTP push, its advantages, disadvantages and the results we see from early experiments on that front.
The general attitude towards push was skeptical, due to the not-so-great results presented from early deployments, so I’d like to share my slightly-more-optimistic opinion.

Mobile

Open Sourcing Test Butler
Automated testing is a key component to LinkedIn’s 3x3 strategy for releasing mobile applications. As we developed the new LinkedIn Android app, we found that our tests had a major problem: our testing environment was unreliable, so our tests failed intermittently. We needed a solution that would let us rely on our tests to inform us when there was a problem with the app, not the testing environment. For this reason, we created and open sourced Test Butler, a reliable Android testing tool. LinkedIn runs over one million tests each day using Test Butler and we believe that it can provide a benefit to anyone running Android tests.
Building a Native Video Player Library for Android
At LinkedIn, we recognize that video has become a popular medium for people to communicate and share information. We recently launched a feature where members can hear directly from Influencers on timely and thought-provoking topics through the rich experience of video. In this post, we will discuss some of the technical challenges involved in developing a shared video player library for Android. This library powers the video viewing experience on LinkedIn.

Database Engineering

Redis

HyperLogLogs in Redis
A HyperLogLog is a probabilistic data structure used to count unique values — or as it’s referred to in mathematics: calculating the cardinality of a set.
These values can be anything: for example, IP addresses for the visitors of a website, search terms, or email addresses.
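To make the idea concrete, here is a toy Python estimator in the spirit of HyperLogLog. Redis's real implementation adds bias corrections and a sparse register encoding, so treat this purely as an illustration of the principle: hash each item, use a few bits to pick a register, and keep the maximum "leading-zeros rank" seen per register.

```python
# Toy HyperLogLog-style cardinality estimator (illustrative only).
import hashlib

def _rho(w, bits):
    """1-based position of the leftmost 1-bit in a bits-wide integer."""
    return bits - w.bit_length() + 1 if w else bits + 1

def estimate_cardinality(items, p=10):
    """Estimate distinct count with m = 2**p registers."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        x = int(hashlib.sha256(str(item).encode()).hexdigest()[:16], 16)
        idx = x & (m - 1)   # low p bits choose a register
        w = x >> p          # remaining bits feed the rank
        registers[idx] = max(registers[idx], _rho(w, 64 - p))
    alpha = 0.7213 / (1 + 1.079 / m)
    return alpha * m * m / sum(2.0 ** -r for r in registers)
```

Duplicates never change the registers, which is exactly why the structure counts *unique* values in O(m) memory regardless of how many items pass through.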

Elasticsearch

Brewing in Beats: Configure the Ingest Node Pipeline
Cassandrabeat uses Cassandra’s nodetool cfstats utility to monitor Cassandra database nodes and lag. Please give it a try and let us know what you think.
Anatomy of an Elasticsearch Cluster: Part I
This post is part of a series covering the underlying architecture and prototyping examples with a popular distributed search engine, Elasticsearch. In this post, we’ll be discussing the underlying storage model and how CRUD (create, read, update and delete) operations work in Elasticsearch.
Elasticsearch is a very popular distributed search engine used at many companies like GitHub, SalesforceIQ, Netflix, etc. for full text search and analytical applications.

MySQL & MariaDB

gh-ost: GitHub’s online schema migration tool for MySQL
Today we are announcing the open source release of gh-ost: GitHub’s triggerless online schema migration tool for MySQL.
gh-ost has been developed at GitHub in recent months to answer a problem we faced with ongoing, continuous production changes requiring modifications to MySQL tables. gh-ost changes the existing online table migration paradigm by providing a low impact, controllable, auditable, operations friendly solution.
MySQL table migration is a well known problem, and has been addressed by online schema change tools since 2009. Growing, fast-paced products often require changes to database structure. Adding, changing, or removing columns and indexes are blocking operations with the default MySQL behavior. We conduct such schema changes multiple times per day and wish to minimize user-facing impact.
Before illustrating gh-ost, let’s address the existing solutions and the reasoning for embarking on a new tool.

Cassandra

How to Setup a Highly Available Multi-AZ Cassandra Cluster on AWS EC2
Originally developed at Facebook and open-sourced in 2008, Apache Cassandra is a free and open-source distributed database designed to handle large amounts of data across a large number of servers. At Stream, we use Cassandra as the primary data store for our feeds.

Vertica

How to Load New Data and Modify Existing Data Simultaneously
Many Vertica customers tell us “we have an OLTP workload”, which is not Vertica’s architectural sweet spot. However, when we dig into what they are actually doing, it often turns out that they are simply bulk loading mostly new data with some small number of updates to existing rows. In Vertica 6, we have added support for the MERGE statement to allow users to do just that.
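A MERGE of this shape handles both cases in one statement: rows already present are updated, everything else is bulk-inserted. The table and column names below are hypothetical:

```sql
-- Upsert a staging batch into the target table in a single pass.
MERGE INTO events t
USING staged_events s ON t.event_id = s.event_id
WHEN MATCHED THEN
    UPDATE SET payload = s.payload, updated_at = s.updated_at
WHEN NOT MATCHED THEN
    INSERT (event_id, payload, updated_at)
    VALUES (s.event_id, s.payload, s.updated_at);
```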

Data Engineering & Analytics

uMirrorMaker: Uber Engineering’s Robust Kafka Replicator
At Uber, we use Apache Kafka as a message bus for connecting different parts of the ecosystem. We collect system and application logs as well as event data from the rider and driver apps. Then we make this data available to a variety of downstream consumers via Kafka.
Vizceral Open Source
Previously we wrote about our traffic intuition tool, Flux. We have some announcements and updates to share about this project. First, we have renamed the project to Vizceral. More importantly, Vizceral is now open source!
Categorizing Posts on Tumblr
Millions of posts are published on Tumblr every day. Understanding the topical structure of this massive collection of data is a fundamental step to connect users with the content they love, as well as to answer important philosophical questions, such as “cats vs. dogs: who rules on social networks?”
As a first step in this direction, we recently developed a post-categorization workflow that aims at associating posts with broad-interest categories, where the list of categories is defined by Tumblr’s on-boarding topics.
Streaming MySQL tables in real-time to Kafka
This is the second post in a series covering Yelp’s real-time streaming data infrastructure. Our series will explore in-depth how we stream MySQL updates in real-time with an exactly-once guarantee, how we automatically track & migrate schemas, how we process and transform streams, and finally how we connect all of this into datastores like Redshift and Salesforce.

Network Engineering

FakeNet-NG: Next Generation Dynamic Network Analysis Tool
As a reverse engineer on the FLARE (FireEye Labs Advanced Reverse Engineering) team, I regularly perform basic dynamic analysis of malware samples. The goal is to quickly observe runtime characteristics by running binaries in a safe environment. One important task during dynamic analysis is to emulate the network environment and trick the malware into thinking it is connected to the Internet. When done right, the malware reveals its network signatures such as command and control (C2) domain names, User-Agent strings, URLs queried, and so on.
One tool of choice is FakeNet. In this blog, I will discuss a major overhaul to FakeNet and how it helps you perform basic malware dynamic analysis. Some of the new features include full support for Windows Vista and later operating systems, process logging, advanced process and host traffic filtering engine, support for third party tools (e.g. debuggers, HTTP proxies, etc.) and many others.
Further Analysis of RIPE Atlas Version 3 Probes
In recent weeks we’ve been contacted by a number of RIPE Atlas hosts who’ve had problems with their RIPE Atlas probes and suspected that something was wrong with the USB sticks in the version 3 (v3) probes. We started investigating the issue and published some initial findings on RIPE Labs last month.
Initially there seemed to be a potential issue with some of the v3 probes’ USB sticks. We use the sticks to store both the operating system and the measurement data on the probes. (Note that we are looking into future hardware solutions that don’t rely on USB sticks for local storage.)
BGP Routing Tutorial Series, Part 3
So far in this series we’ve looked at a number of basic concepts about BGP, covering both who would want to use it and why. In particular we’ve learned that speaking or advertising BGP to your service providers and/or peers lets you do two things