Mon blog-notes à moi que j'ai

Personal blog of a sysadmin with hacker leanings

Twitter & RSS watch roundup #2016-42

The harvest of links for the week of 17 to 21 October 2016. Most of them were posted on my Twitter account. Here they are, gathered for anyone who missed them.

Happy reading

Security & Privacy

Introducing Internationalized Domain Name (IDN) Support
Let’s Encrypt is pleased to introduce support for issuing certificates that contain Internationalized Domain Names (IDNs). This means that our users around the world can now get free Let’s Encrypt certificates for domains containing characters outside of the ASCII set, which is built primarily for the English language.
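As a small illustration of what an IDN looks like on the wire: a certificate or DNS query carries the ASCII (Punycode) form of the name, not the Unicode one. The sketch below uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the helper names and the `café.example` domain are made up for the example and say nothing about how Let's Encrypt handles IDNs internally.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (Punycode) form."""
    # The built-in "idna" codec follows IDNA 2003; IDNA 2008 would need
    # a third-party package, which this sketch deliberately avoids.
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII (Punycode) domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    name = "caf\u00e9.example"  # hypothetical domain: "café.example"
    ascii_name = to_ascii(name)
    print(ascii_name)           # xn--caf-dma.example
    print(to_unicode(ascii_name))
```

Plain ASCII names pass through unchanged, so the same helper can be applied to every name in a certificate request.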
Security bug lifetime
In several of my recent presentations, I’ve discussed the lifetime of security flaws in the Linux kernel. Jon Corbet did an analysis in 2010, and found that security bugs appeared to have roughly a 5 year lifetime. As in, the flaw gets introduced in a Linux release, and then goes unnoticed by upstream developers until another release 5 years later, on average. I updated this research for 2011 through 2016, and used the Ubuntu Security Team’s CVE Tracker to assist in the process. The Ubuntu kernel team already does the hard work of trying to identify when flaws were introduced in the kernel, so I didn’t have to re-do this for the 557 kernel CVEs since 2011.
Security Assessment of VeraCrypt: fixes and evolutions from TrueCrypt
VeraCrypt is a disk encryption software developed by IDRIX. It is derived from the now defunct TrueCrypt project. This audit has been carried out at the request of the Open Source Technology Improvement Fund. Its goal was to evaluate the security of the features brought by VeraCrypt since the publication of the audit results on TrueCrypt 7.1a conducted by the Open Crypto Audit Project. The full text of the report is made available on this blogpost.
Bloquer les publicités et traqueurs au niveau du DNS avec Unbound
Besides ad-blocking browser plugins such as uBlock Origin, you can also act at the DNS level. I think the two methods are complementary. Blocking at the DNS level works for every computer that uses the resolver (for example, all the computers on your local network). In other words, the blocking also works on iMachines, iPhones and Android phones, which are rather reluctant to let you block the ads that earn money for their vendors.
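To give an idea of what DNS-level blocking involves, here is a minimal sketch that turns a hosts-style blocklist into Unbound `local-zone` directives. It assumes the `static` zone type, which makes Unbound answer locally instead of resolving the name. The function name and the sample domains are hypothetical, and the linked article may well build its configuration differently.

```python
def unbound_block_rules(blocklist_lines):
    """Turn hosts-style blocklist lines into Unbound local-zone directives."""
    domains = set()
    for line in blocklist_lines:
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Accept both "0.0.0.0 ads.example.com" and a bare "ads.example.com".
        domain = line.split()[-1].lower().rstrip(".")
        # Skip entries such as "localhost" that are not real domain names.
        if "." not in domain:
            continue
        domains.add(domain)
    # "static" zones with no local-data are answered locally, so the
    # blocked names never resolve to the ad or tracker servers.
    return ['local-zone: "%s." static' % d for d in sorted(domains)]


if __name__ == "__main__":
    sample = [
        "# hypothetical blocklist",
        "127.0.0.1 localhost",
        "0.0.0.0 ads.example.com",
        "0.0.0.0 tracker.example.net  # inline comment",
        "ads.example.com",
    ]
    for rule in unbound_block_rules(sample):
        print(rule)
```

The generated lines would go inside the `server:` clause of the Unbound configuration, typically in a separate included file so the list can be regenerated without touching the rest.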

System Engineering

Varnish Explained
A few months ago, I gave a presentation at LaraconEU in Amsterdam titled « Varnish for PHP developers ». The generic title of that presentation is actually Varnish Explained and this is a write-up of that presentation, the video and the slides.
octocatalog-diff: GitHub’s Puppet development and testing tool
GitHub uses Puppet to configure the infrastructure that powers GitHub.com, comprised of hundreds of roles deployed on thousands of nodes. Each change to Puppet code must be validated to ensure not only that it serves the intended purpose for the role at hand, but also to avoid causing unexpected side effects on other roles. GitHub employs automated Continuous Integration testing and manual deployment testing for Puppet code changes, but it can be time-consuming to complete the manual deployment testing across hundreds of roles.
Three Things to Consider When Thinking About Containers
Containers like Docker and Rocket are getting more popular every day. In my conversations with customers, they consistently ask what containers are and how they can use them in their environment. If you’re as curious as most people, read on...
Netflix Chaos Monkey Upgraded
Years ago, we decided to improve the resiliency of our microservice architecture. At our scale it is guaranteed that servers on our cloud platform will sometimes suddenly fail or disappear without warning. If we don’t have proper redundancy and automation, these disappearing servers could cause service problems.
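The core idea can be sketched in a few lines: pick instances at random and terminate them, so that missing redundancy shows up during working hours rather than during an outage. This is only a toy model under my own assumptions (group names, instance ids and the `probability` knob are hypothetical). It is not Netflix's implementation and it terminates nothing, it only selects victims.

```python
import random


def pick_victims(groups, probability, rng):
    """Pick at most one instance per group to terminate.

    groups      -- dict mapping a group name to a list of instance ids
    probability -- chance, per group, that one instance gets picked
    rng         -- a random.Random instance, injected so runs are repeatable
    """
    victims = {}
    for name, instances in groups.items():
        if instances and rng.random() < probability:
            victims[name] = rng.choice(instances)
    return victims


if __name__ == "__main__":
    fleet = {
        "api": ["i-001", "i-002", "i-003"],
        "search": ["i-101", "i-102"],
    }
    print(pick_victims(fleet, probability=0.5, rng=random.Random()))
```

Capping the selection at one instance per group keeps the experiment survivable for any group that has at least two healthy instances.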
How-to: Use the New HDFS Intra-DataNode Disk Balancer in Apache Hadoop
HDFS now includes (shipping in CDH 5.8.2 and later) a comprehensive storage capacity-management approach for moving data across nodes.
In HDFS, the DataNode spreads the data blocks into local filesystem directories, which can be specified using dfs.datanode.data.dir in hdfs-site.xml. In a typical installation, each directory, called a volume in HDFS terminology, is on a different device (for example, on separate HDD and SSD).
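To make the balancing problem concrete, the sketch below computes how many bytes each volume of a single DataNode would have to shed or receive for all volumes to reach the same utilization. It is a back-of-the-envelope illustration with made-up volume paths and sizes, not the planning algorithm HDFS actually uses.

```python
def plan_moves(volumes):
    """Compute the per-volume byte surplus relative to uniform utilization.

    volumes -- dict mapping a volume path to (used_bytes, capacity_bytes)
    Returns a dict mapping each path to a number of bytes: positive means
    the volume should shed data, negative means it should receive data.
    """
    total_used = sum(used for used, _ in volumes.values())
    total_capacity = sum(capacity for _, capacity in volumes.values())
    # Utilization every volume would have if data were spread evenly.
    ideal = total_used / total_capacity
    return {
        path: used - round(ideal * capacity)
        for path, (used, capacity) in volumes.items()
    }


if __name__ == "__main__":
    # Hypothetical node: one nearly full disk and one freshly added disk.
    node = {"/data/1": (80, 100), "/data/2": (20, 100)}
    print(plan_moves(node))
```

The typical trigger for this situation is adding or replacing a disk: new blocks land on the empty volume over time, but existing data stays where it is unless something moves it.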
Stratégie de placement de conteneurs Docker (partie 2)
The second part of our study of Docker node orchestrators: after looking at how containers are placed on nodes, we now turn to the anti-affinity options offered by our dear candidates Fleet, Nomad, Swarm and Kubernetes.
Let us also take the opportunity to update the versions of our contenders. The major change is on the Docker Inc. side, since Swarm has been embedded directly in the container engine since version 1.12.
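Anti-affinity simply means "never put two of these on the same node". A minimal sketch of such a scheduler, with hypothetical container, service and node names, could look like the following. None of the four orchestrators is claimed to work this way internally; each exposes the constraint through its own syntax.

```python
def place(containers, nodes):
    """Assign containers to nodes with per-service anti-affinity.

    containers -- list of (container_name, service_name) tuples
    nodes      -- list of node names
    Two containers of the same service never share a node. Among the
    eligible nodes, the one hosting the fewest containers is chosen.
    """
    services_on_node = {node: set() for node in nodes}
    placement = {}
    for name, service in containers:
        candidates = [n for n in nodes if service not in services_on_node[n]]
        if not candidates:
            raise RuntimeError("no node satisfies anti-affinity for %s" % name)
        # Each service appears at most once per node, so the size of the
        # set equals the number of containers already placed there.
        target = min(candidates, key=lambda n: len(services_on_node[n]))
        services_on_node[target].add(service)
        placement[name] = target
    return placement


if __name__ == "__main__":
    wanted = [("web-1", "web"), ("web-2", "web"), ("db-1", "db")]
    print(place(wanted, ["node-a", "node-b"]))
```

Raising an error when no node qualifies models a hard constraint. A soft constraint would fall back to the least loaded node instead.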
Streaming Messages from Kafka into Redshift in near Real-Time
This is the sixth post in a series covering Yelp’s real-time streaming data infrastructure. Our series explores in-depth how we stream MySQL updates in real-time with an exactly-once guarantee, how we automatically track & migrate schemas, how we process and transform streams, and finally how we connect all of this into datastores like Redshift and Salesforce.
Best Practice for HTTP2 Front-end deployments – Part two
In the first article, we did a quick overview of some of the performance improvements in HTTP/2 and the implications this will have on the way we deploy assets to production. This post will delve a little deeper and attempt to establish some initial guidelines and workflow that could be used when deploying front-end components to HTTP/2 sites.

Monitoring

The Myth of the Root Cause: How Complex Web Systems Fail
Distributed web-based systems are inherently complex. They’re composed of many moving parts — web servers, databases, load balancers, CDNs, and many more — working together to form an intricate whole. This complexity inevitably leads to failure. Understanding how this failure happens (and how we can prevent it) is at the core of our job as operations engineers.
Looking back at the Zabbix Conference 2016, day 1
It’s been a few weeks since the Zabbix Conference 2016. If you are considering attending next year, you might want to know – how was it? In one word, great. But that doesn’t tell much, so let’s briefly explore how it went.
Looking back at the Zabbix Conference 2016, day 2
The second day of the Zabbix conference started with workshops. This was a completely new thing, thus there was limited experience with organising these.

Software Engineering

Future Tidal Wave of Mobile Video
In this article I will examine the growing trends of Internet Mobile video and how consumer behaviour is rapidly adapting to a world of ‘always on content’ and discuss the impact on the underlying infrastructure. This is very important because we often assume that the Internet has infinite capacity and we can get frustrated by buffering, which wouldn’t happen with our satellite and terrestrial TV services.
Client-side ranking to more efficiently show people stories in feed
Our mission with News Feed is to connect people with the stories that matter most to them. For people on slower internet connections, we are focusing on ways to more efficiently rank and render relevant stories without having to wait on the network to return results.
A comparison of state-of-the-art graph processing systems
Large-scale graph processing is one of many important parts of the Data Infrastructure backend services at Facebook. The need to analyze graphs arises naturally in a wide variety of use cases, including page and group recommendations, infrastructure optimization through intelligent data placement, graph compression, and others. The Facebook social graph alone has 1.71 billion vertices with hundreds of billions of edges. Augmenting this graph with more entities, such as the pages that people like, may result in graphs with over 1 trillion edges.
Improving the Responsiveness of the Document Detector
In our previous blog posts (Part 1, Part 2), we presented an overview of various parts of Dropbox’s document scanner, which helps users digitize their physical documents by automatically detecting them from photos and enhancing them. In this post, we will delve into the problem of maintaining a real-time frame rate in the document scanner even in the presence of camera movement, and share some lessons learned.

Android

Android accessibility debugging with Stetho
Android has a powerful built-in accessibility system that allows people to use applications through an alternative interaction mode called « focus navigation. » Rather than directly touching the screen to activate an element, focus navigation allows people who use accessibility services such as screen readers, physical switch devices, refreshable Braille displays, or voice control to focus on and activate different elements of an interface.
Approaching Outside-in TDD on Android III
In the previous post, we wrote the acceptance test as a first step and started creating the classes on the entry points of our system. In this post, we will finish implementing the system, and will summarize what we have learnt during the process.

Database Engineering

Datanet: a New CRDT Database that Let’s You Do Bad Bad Things to Distributed Data
If you’re using your CAP decoder ring you know what’s next…what databases do we have that target making concurrency a first class feature? That promise to thrive and continue to function when network partitions occur?
Not many, but we have a brand new concurrency oriented database: Datanet - a P2P replication system that utilizes CRDT algorithms to allow multiple concurrent actors to modify data and then automatically & sensibly resolve modification conflicts.
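For readers who have never met a CRDT, the simplest one is the grow-only counter: each replica only increments its own slot, and merging takes the per-replica maximum, so replicas converge whatever the order of merges. The sketch below is this textbook structure, not Datanet's API or data types, and the replica names are made up.

```python
class GCounter:
    """Grow-only counter, a basic state-based CRDT."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, amount=1):
        """Record local increments in this replica's own slot only."""
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        """The counter value is the sum over all replicas."""
        return sum(self.counts.values())

    def merge(self, other):
        """Merge another replica's state by taking per-replica maxima.

        The operation is commutative, associative and idempotent, which
        is what lets replicas converge without coordination.
        """
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)


if __name__ == "__main__":
    paris, tokyo = GCounter("paris"), GCounter("tokyo")
    paris.increment(2)   # concurrent updates during a partition
    tokyo.increment(3)
    paris.merge(tokyo)
    tokyo.merge(paris)
    print(paris.value(), tokyo.value())
```

Both replicas accept writes while partitioned and still agree once they exchange state, which is the property the excerpt above is asking for.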

MySQL & MariaDB

MySQL 8.0: Descending Indexes Can Speed Up Your Queries
The future MySQL 8.0 will (probably) have a great new feature: support for index sort order on disk (i.e., indexes can be physically sorted in descending order). In the MySQL 8.0 Labs release (new optimizer preview), when you create an index you can specify the order « asc » or « desc », and it will be supported (for B-Tree indexes). That can be especially helpful for queries like « SELECT … ORDER BY event_date DESC, name ASC LIMIT 10 » (ORDER BY clause with ASC and DESC sort).
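To see the shape of such an index and query, here is a small runnable sketch. It uses SQLite through Python's standard library as a stand-in, because SQLite already accepts per-column `ASC`/`DESC` in `CREATE INDEX`. The table, column values and index name are hypothetical, and nothing here measures or demonstrates MySQL 8.0 behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2016-10-17", "b"),
        ("2016-10-21", "z"),
        ("2016-10-21", "a"),
        ("2016-10-19", "m"),
    ],
)

# The index stores event_date in descending order and name in ascending
# order, matching the mixed ORDER BY of the query below.
conn.execute(
    "CREATE INDEX idx_events_date_name ON events (event_date DESC, name ASC)"
)

rows = conn.execute(
    "SELECT event_date, name FROM events "
    "ORDER BY event_date DESC, name ASC LIMIT 10"
).fetchall()

for row in rows:
    print(row)
```

The point of a mixed-order index is that the engine can read rows straight out of the index in the requested order instead of sorting the result afterwards.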
Query Classification and Pluggable Parser
From the very beginning, a central feature of MariaDB MaxScale has been its ability to understand the SQL statements that flow through it. For that purpose, MaxScale is equipped with an SQL parser for parsing statements. However, the parser is not explicitly made available inside MaxScale, but the information it provides is accessed via the query classifier component.
Before looking at what functionality the query classifier provides, let’s make a short historical detour to see how the parser’s placement within MaxScale has changed over time.
Evolving MySQL Compression - Part 1
Pinterest Infrastructure engineers are the caretakers of more than 75 billion Pins–dynamic objects in an ever-growing database of people’s interests, ideas and intentions. A Pin is stored as a 1.2 KB JSON blob in sharded MySQL databases. A few years back, as we were growing quickly, we were running out of space on our sharded MySQL databases and had to make a change. One option was to scale up hardware (and our spend). The other option–which we chose–was using MySQL InnoDB page compression. This cost a bit of latency but saved disk space. However, we thought we could do better. As a result, we created a new form of MySQL compression which is now available to users of Percona MySQL Server 5.6.
MySQL 8.0 Labs: [Recursive] Common Table Expressions in MySQL (CTEs), Part Two – how to generate series
Here is the second in a series of posts about CTEs, a new feature of MySQL 8.0, available in this Labs release.
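Generating a series with a recursive CTE looks roughly like this. The sketch runs against SQLite from Python's standard library, which has supported `WITH RECURSIVE` for a while. The CTE name `seq` and the helper function are my own, and the syntax in the MySQL Labs release may differ in details.

```python
import sqlite3


def generate_series(start, stop):
    """Return the integers from start to stop using a recursive CTE."""
    query = """
        WITH RECURSIVE seq(n) AS (
            SELECT ?                              -- anchor row: the first value
            UNION ALL
            SELECT n + 1 FROM seq WHERE n < ?     -- recursive step
        )
        SELECT n FROM seq
    """
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(query, (start, stop))]
    finally:
        conn.close()


if __name__ == "__main__":
    print(generate_series(1, 5))
```

The anchor row is always produced, and the recursive member keeps adding rows until its `WHERE` condition stops matching, which is what bounds the series.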

Data Engineering & Analytics

How to Create Value From Raw Web Logs With Machine Learning
Almost every action we do on the Internet or on mobile applications is recorded in files known as web logs. These logs can be very voluminous, providing a classic example of Big Data.
Data science and Machine Learning algorithms can provide a way of extricating value from web logs. At the OVH Summit on the 11th of October, I presented a workshop on getting value out of web logs through Machine Learning with Dataiku DSS. In this article, I will run through the aspects of that presentation.
Building an Algorithm to Break Strong Encryption
Here I discuss breaking encryption keys that rely on the product of two very large prime numbers. In other words, the interest here is to factor a number (representing a key in some encryption system) that is the product of two very large primes. Once the number is factored, the key is compromised. Factoring such large numbers is believed to be computationally non-feasible, thus the interest in discovering new algorithms to disprove this conjecture, and specifically to factor large numbers (product of two large primes - the most difficult numbers to factor) much faster than with the current algorithms. As an important side note, I will discuss the randomness (or lack of) of the byproduct time series involved, and show why they are unsuitable to generate random deviates, despite satisfying several tests of randomness. This feature (lack of randomness) can further be exploited to develop more potent factoring algorithms.
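To make the problem statement tangible, here is the classical Fermat factorization method, which looks for a and b such that n = a² − b² = (a − b)(a + b). It is emphatically not the algorithm proposed in the linked article, only a well-known baseline. It is fast when the two primes are close together and hopeless for properly generated keys.

```python
from math import isqrt


def fermat_factor(n):
    """Split n into two factors using Fermat's difference-of-squares method.

    Works for any integer n > 1. For a prime n the result is (1, n).
    """
    if n % 2 == 0:
        return 2, n // 2
    # Start at the smallest a with a*a >= n.
    a = isqrt(n)
    if a * a < n:
        a += 1
    while True:
        b_squared = a * a - n
        b = isqrt(b_squared)
        if b * b == b_squared:
            # n = a^2 - b^2 = (a - b)(a + b)
            return a - b, a + b
        a += 1


if __name__ == "__main__":
    # Toy "key": the product of two small primes that are close together.
    print(fermat_factor(101 * 103))
```

The number of loop iterations grows with the gap between the two factors, which is one reason key generators avoid primes that are too close to each other.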
Tricks in Face Recognition
Last year I started developing a Face Recognition model. I started with static pictures and using Wolfram Mathematica. This year I found out we can do the same job using OpenCV in Python, or creating specific filters in R and applying Weierstrass and Gaussian transformation.

Network Engineering

IPv6 and the DNS
The exhortations about the Internet’s prolonged transition to version 6 of the Internet Protocol continue, although after some two decades the intensity of the rhetoric has faded and, possibly surprisingly, it has been replaced by action in some notable parts of the Internet. But how do we know there is action? How can we tell whether, and where, IPv6 is being deployed in today’s Internet?
Orange Blacklisting: A Case for Measuring Censorship
On 17 October 2016, a website hosted by the French Ministry of the Interior went offline when a large number of customers of the Internet service provider, Orange, were redirected to the site. The problem occurred after Google, Wikipedia and cloud provider OVH were mistakenly placed on a terrorism block list.
Google détourné par Orange vers la place Beauveau
Today, many Orange customers had the unpleasant surprise of being unable to reach Google. Most of them got no precise error message, just a long wait and a vague error along the lines of « timeout ». Some had the nasty surprise of seeing a threatening page accusing them of trying to connect to a terrorist website. The cause of the problem was a configuration error in Orange’s DNS resolvers, linked to the administrative Web censorship function.