Mon blog-notes à moi que j'ai

Personal blog of a sysadmin with a hacker streak

Twitter & RSS link roundup #2016-43

The harvest of links for the week of October 24 to 28, 2016. Most of them were posted on my Twitter account. Here they are, gathered for those who may have missed them.

Happy reading

Security & Privacy

Introducing DNS66, a host blocker for Android
I’m proud (yes, really) to announce DNS66, my host/ad blocker for Android 5.0 and newer. It’s been around since last Thursday on F-Droid, but it never really got a formal announcement.
Tweaking Referrers For Privacy in Firefox
The Referer header has been a part of the web for a long time. Websites rely on it for a few different purposes (e.g. analytics, ads, CSRF protection) but it can be quite problematic from a privacy perspective.
DNS Outage Was Doomsday for the Internet
What was supposed to be a quiet Friday suddenly turned into a real “Black Friday” for us (as well as most of the Internet) when Dyn suffered a major DDoS attack. From an internet disruption’s perspective, the widespread damage the outage caused made it the worst I have ever experienced.

System Engineering

Continuous MySQL backup validation: Restoring backups
Facebook’s MySQL databases are spread across our global data centers, and we need to be able to recover from an outage in any of these locations, at any given point in time. In such a disaster event, not only do we have to recover the service as quickly and reliably as possible, but we also need to ensure we don’t lose data in the process. To that end, we’ve built a system that continuously tests our ability to restore our databases from backups.
Building a Secure, Fast Network Fabric for Microservices Applications
This post is adapted from a presentation delivered at nginx.conf 2016 by Chris Stetson. You can view a recording of the presentation on YouTube.
A/B testing at the edge
When you make a change to a system — whether it’s design, code, or infrastructure — it’s important to know that you’re causing the desired outcome. You want to base your business decisions on data, and one of the best ways to get that data is with A/B testing.
Instant Messaging at LinkedIn: Scaling to Hundreds of Thousands of Persistent Connections on One Machine
We recently introduced Instant Messaging on LinkedIn, complete with typing indicators and read receipts. To make this happen, we needed a way to push data from the server to mobile and web clients over persistent connections instead of the traditional request-response paradigm that most modern applications are built on. In this post, we’ll describe the mechanisms we use to instantly send messages, typing indicators, and read receipts to clients as soon as they arrive. We’ll describe how we used the Play Framework and the Akka Actor Model to manage Server-sent events-based persistent connections. We’ll also provide insights into how we did load testing on our server to manage hundreds of thousands of concurrent persistent connections in production. Finally, we’ll share optimization techniques that we picked up along the way.
How We Architected and Run Kubernetes on OpenStack at Scale at Yahoo! JAPAN
This post outlines how Yahoo! JAPAN, with help from Google and Solinea, built an automation tool chain for “one-click” code deployment to Kubernetes running on OpenStack.
We’ll also cover the basic security, networking, storage, and performance needs to ensure production readiness.
Automatically update TLSA records on new Letsencrypt Certs
I’ve been using DNSSEC for quite some time now and it is working quite well. When LetsEncrypt went public beta I jumped on the train and migrated many services to LE-based TLS. However there was still one small problem with LE certs:

Monitoring

Introducing anomaly detection in Datadog
Some of the most valuable metrics to monitor are also the most variable. Application throughput, web requests, user logins… all of these important, top-level metrics tend to have pronounced peaks and valleys, depending on the time of day or the day of the week. Those fluctuations make it very hard to set sensible thresholds for alerting or investigation.
Using the Zabbix::Tiny to change an item interval based on a trigger.
A relatively common Zabbix feature request is to change the interval of a Zabbix item (how often the item is updated) based on the value of the item. This post will illustrate how to use a trigger to execute a Perl script to meet this goal.

Software Engineering

Introducing Go 2.0
Just so we’re clear, this post is a thought experiment, not any form of commitment to deliver Go 2.0 in any time frame. While I personally believe there will be a Go 2.0 in the future, I’m in no position to influence its creation; hence, this post is mere speculation.
Advanced Branching and Merging Strategies (Part 1 of 2)
In this two-part blog series, I will describe advanced branching and merging strategies for complex operational environments. These strategies are based on my personal experience at current and past clients with multiple projects and ongoing maintenance parallel to each other.

Android

Components for Android: A declarative framework for efficient UIs
Scrollable user interfaces are the dominant paradigm on mobile. If you’ve ever built an Android app, you’ve probably used RecyclerView to implement a scrollable list of items.
Building a list interface on Android is fairly simple: Just create a layout for the items, hook it up to a RecyclerView adapter, and you’re done. Most apps are a bit more complicated than that, though.

Database Engineering

MySQL & MariaDB

Getting to Know MariaDB ColumnStore
With the recent announcement of MariaDB ColumnStore, we get many questions on the architecture and functionality of MariaDB ColumnStore. This blog post describes the architecture of MariaDB ColumnStore.
Does InnoDB page size matter?
From MariaDB 10.1 there is a feature where the InnoDB page size can be configured to be larger than the default 16K for normal, uncompressed tables. However, there have been few performance results that show whether the page size really affects transaction performance or response time. In this blog, we study the effects of page size on three different storage devices using the same benchmark(s). These devices are:
MySQL 8.0 Labs: [Recursive] Common Table Expressions in MySQL (CTEs), Part Three – hierarchies
Here is the third in a series of posts about CTEs, a new feature of MySQL 8.0, available in this Labs release. In the first post we had explored the new SQL syntax, and in the second we had applied it to generating series.
TokuDB and PerconaFT Data Files: Database File Management Part 2 of 2
This is the second installment of the blog series on TokuDB and PerconaFT data files. You can find my previous post here. In this post we will discuss some common file maintenance operations and how to safely execute these operations.

Vertica

Post-upgrade Tasks for Saving Catalog Space
When you upgrade to 7.2 and later, not only can you take advantage of the new features, you can perform tasks to save substantial space in your Vertica catalog.

Elasticsearch

Abusing an innocent Elasticsearch cluster for a mass reindex without disturbing your clients
I’ve found a new way to abuse a poor, defenceless Elasticsearch cluster. If you haven’t already, you might enjoy what we did last summer reindexing 36 billion documents in 5 days within the same cluster.

Data Engineering & Analytics

Building an efficient neural language model over a billion words
Neural networks designed for sequence predictions have recently gained renewed interest by achieving state-of-the-art performance across areas such as speech recognition, machine translation or language modeling. However, these models are quite computationally demanding, which in turn can limit their application.
Recurrent Neural Nets – The Third and Least Appreciated Leg of the AI Stool
We’ve paid a lot of attention lately to Convolutional Neural Nets (CNNs) as the cornerstone of 2nd gen NNs and spent some time on Spiking Neural Nets (SNNs) as the most likely path forward to 3rd gen, but we’d really be remiss if we didn’t stop to recognize Recurrent Neural Nets (RNNs). Why? Because RNNs are solid performers in the 2nd gen NN world and perform many tasks much better than CNNs. These include speech-to-text, language translation, and even automated captioning for images. By count, there are probably more applications for RNNs than for CNNs.

Network Engineering

The Trouble with NAT - Part 1
This is a guest post by Mark Smith, based on a presentation he gave at the AusNOG 2016 conference on this topic. This is the first post of three, and in this article Mark will discuss the concept of Network Critical Success Factors (NCSFs) before getting into ‘the trouble’. https://labs.ripe.net/Members/mirjam/the-trouble-with-nat-part-1
The Trouble with NAT - Part 2
This is the second post in a series on NATs contributed by Mark Smith, based on a presentation given at AusNOG 2016. In the first post, Mark discussed Network Critical Success Factors (NCSFs). In this post, he is going into details about the trouble with Network Address Translation (NAT). https://labs.ripe.net/Members/mirjam/the-trouble-with-nat-part-2
The Trouble with NAT - Part 3
This is the final post in a series on Network Address Translation (NAT), provided by Mark Smith. In this post, Mark discusses the fundamental constraints of NAT and addresses some FAQs about IPv6 without NAT.
How Cloudflare’s Architecture Allows Us to Scale to Stop the Largest Attacks
The last few weeks have seen several high-profile outages in legacy DNS and DDoS-mitigation services due to large scale attacks. Cloudflare’s customers have, understandably, asked how we are positioned to handle similar attacks.