Mon blog-notes à moi que j'ai

Personal blog of a sysadmin with a hacker streak

Twitter & RSS watch roundup #2016-39

The harvest of links for the week of 26 to 30 September 2016. Most of them were published on my Twitter account. Here they are, gathered for those who missed them.

Happy reading

Security & Privacy

Mitigating Logjam: Enforcing Stronger Diffie-Hellman Key Exchange
In response to recent developments attacking Diffie-Hellman key exchange (https://weakdh.org/) and to protect the privacy of Firefox users, we have increased the minimum key size for TLS handshakes using Diffie-Hellman key exchange to 1023 bits. A small number of servers are not configured to use strong enough keys. If a user attempts to connect to such a server, they will encounter the error "ssl_error_weak_server_ephemeral_dh_key".
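As a rough illustration of the 1023-bit threshold quoted above, the check boils down to comparing the bit length of the server's Diffie-Hellman prime with a minimum. This is only a sketch, not Firefox's implementation: the function name, the constant name, and the stand-in moduli are hypothetical.

```python
# Hypothetical helper: not Firefox code, only an illustration of the threshold.
MIN_DH_BITS = 1023  # minimum size quoted in the Mozilla post


def dh_prime_is_acceptable(p: int, min_bits: int = MIN_DH_BITS) -> bool:
    """Return True when the DH prime p is at least min_bits bits long."""
    return p.bit_length() >= min_bits


# Stand-in values with the right bit lengths (not real primes).
weak_modulus = (1 << 511) | 1     # 512 bits
strong_modulus = (1 << 2047) | 1  # 2048 bits

print(dh_prime_is_acceptable(weak_modulus))    # False
print(dh_prime_is_acceptable(strong_modulus))  # True
```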
Sécurité des serveurs web avec TLS, petite toilette d’automne 2016 (Web server security with TLS: a little autumn 2016 clean-up)
Summary for people in a great hurry: it is not as hard as it looks.
Summary for people in a hurry: even without being a cryptography guru, you can secure your site to roughly the current state of the art (current for now; this is never a finished task) by relying on recommendation sites built by specialists.
After spending a few hours polishing my configuration, I think it is worth sharing what I learned, to spread a little artificial happiness around me by way of cryptographic security.
Reshaping web defenses with strict Content Security Policy
Cross-site scripting — the ability to inject undesired scripts into a trusted web application — has been one of the top web security vulnerabilities for over a decade. Just in the past 2 years Google has awarded researchers over $1.2 million for reporting XSS bugs in our applications via the Vulnerability Reward Program. Modern web technologies such as strict contextual auto-escaping help developers avoid mistakes which lead to XSS, and automated scanners can catch classes of vulnerabilities during the testing process. However, in complex applications bugs inevitably slip by, allowing attacks ranging from harmless pranks to malicious targeted exploits.
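To make the idea behind the article's title more concrete, here is a minimal sketch of a nonce-based ("strict") Content Security Policy header. It assumes the commonly recommended directive set (`script-src` with a nonce and `'strict-dynamic'`, `object-src 'none'`, `base-uri 'none'`). The function names and the `/static/app.js` path are hypothetical, and a real deployment would be wired into the web framework in use.

```python
import secrets


def build_strict_csp(nonce: str) -> str:
    """Build a nonce-based Content-Security-Policy header value."""
    directives = [
        f"script-src 'nonce-{nonce}' 'strict-dynamic'",
        "object-src 'none'",
        "base-uri 'none'",
    ]
    return "; ".join(directives)


def new_nonce() -> str:
    """Generate a fresh, unpredictable nonce; use a new one per response."""
    return secrets.token_urlsafe(16)


nonce = new_nonce()
header_value = build_strict_csp(nonce)
# Only script tags carrying the same nonce are allowed to run.
script_tag = f'<script nonce="{nonce}" src="/static/app.js"></script>'
print("Content-Security-Policy:", header_value)
print(script_tag)
```

An injected script without the matching nonce is refused by the browser, which is why the nonce must be unpredictable and regenerated for every response.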

System Engineering

Introducing Google Container-VM Image
This spring, we announced Container-VM Image as a beta product under Google Cloud Platform (GCP). If you’re a developer interested in deploying your application or a service provider on Google Compute Engine, we recommend taking a few moments to understand how it can help you.
Stretch transition freeze in a month
It is the first of October and that means the transition freeze is roughly one month away (Nov 5th 2016). In other words, this is the "final boarding call for transitions".
Deploy CoreOS with Ansible
CoreOS is a lightweight Linux operating system designed for clustered deployments providing automation, security, and scalability for your most critical applications. I’ve been playing with CoreOS to replace Debian hosts which run Docker containers on the Nousmotards project. CoreOS helps simplify bare-metal deployment and avoids having to manage OS upgrades.
NGINX and NGINX Plus Deliver Responsive Images Without the Headaches
Responsive web design has become the norm for modern websites and web applications, providing a consistent experience across a wide variety of devices while also optimizing the display for each device. However, modern devices vary not only in terms of screen size but also pixel density. The HTML5 img tag provides a number of features that enable the browser to select the most appropriate asset if the server provides multiple variants. By deploying different sizes of the same image, the web browser can choose the size best suited to its current environment.
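As a small illustration of the "multiple variants" idea, the sketch below builds an `img` tag whose `srcset` attribute lists one URL per width, so the browser can pick the best fit. The `?w=` query parameter and the helper name are assumptions made for the example, not something taken from the NGINX article.

```python
from html import escape
from typing import Iterable


def build_img_tag(base_url: str, widths: Iterable[int], alt: str) -> str:
    """Return an <img> tag whose srcset lists one variant per width."""
    ordered = sorted(set(widths))
    # Hypothetical URL scheme: the server resizes according to ?w=<pixels>.
    srcset = ", ".join(f"{base_url}?w={w} {w}w" for w in ordered)
    fallback = f"{base_url}?w={ordered[0]}"  # smallest variant as default src
    return (
        f'<img src="{escape(fallback)}" '
        f'srcset="{escape(srcset)}" sizes="100vw" alt="{escape(alt)}">'
    )


print(build_img_tag("/img/hero.jpg", [1600, 400, 800], "Hero image"))
```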
Eliminating Delays From systemd-journald, Part 2
In this second post of our series on making systemd’s journald more efficient, we take a close look at journald’s mmap() usage. Check out Part 1 of the journald performance post if you missed it, or need a refresher.
How Uber Manages a Million Writes Per Second Using Mesos and Cassandra Across Multiple Datacenters
If you are Uber and you need to store the location data that is sent out every 30 seconds by both driver and rider apps, what do you do? That’s a lot of real-time data that needs to be used in real-time.
Uber’s solution is comprehensive. They built their own system that runs Cassandra on top of Mesos. It’s all explained in a good talk by Abhishek Verma, Software Engineer at Uber: Cassandra on Mesos Across Multiple Datacenters at Uber.
API First Transformation at Etsy – Operations
This is the second post in a series of three about Etsy’s API, the abstract interface to our logic and data. The previous post is about concurrency in Etsy’s API infrastructure. This post covers the operational side of the API infrastructure.

Monitoring

So You’ve Been Paged: A Guide to Incident Response (For Those Who Hate Being Paged)
One of the inevitable joys of working in DevOps is "the page": that dreaded notification from your alerting system that something has gone terribly wrong... and you’re the lucky person who gets to fix it.
Here at Scalyr, we’ve got a few decades of collective DevOps experience and we’ve all been on the receiving end of a page. Even though we do our best to avoid being woken up, it happens.
Application Tracing with Nginx
Variables are an important and sometimes overlooked aspect of NGINX configuration. With approximately 150 variables available, there are variables to enhance every part of your configuration. In this blog post we discuss how to use NGINX variables for application tracing and application performance management (APM), with a focus on uncovering performance bottlenecks in your application. This post applies to both the open source NGINX software and NGINX Plus. For brevity we’ll refer to NGINX Plus throughout except when there is a difference between the two products.
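To give a feel for what tracing with variables can look like on the consuming side, here is a minimal sketch that reads access-log lines and lists the slow requests. It assumes a hypothetical `key=value` log format built from the NGINX variables `$request_id`, `$request_time` and `$status`. The format and the helper names are illustrative and not taken from the article.

```python
from typing import Dict, Iterable, List


def parse_line(line: str) -> Dict[str, str]:
    """Split a 'key=value key=value' log line into a dict."""
    return dict(field.split("=", 1) for field in line.split())


def slow_requests(lines: Iterable[str], threshold_s: float) -> List[str]:
    """Return request IDs whose request_time exceeds threshold_s seconds."""
    slow = []
    for line in lines:
        fields = parse_line(line)
        if float(fields["request_time"]) > threshold_s:
            slow.append(fields["request_id"])
    return slow


# Made-up sample lines in the assumed format.
log = [
    "request_id=a1 request_time=0.012 status=200",
    "request_id=b2 request_time=1.530 status=200",
    "request_id=c3 request_time=0.740 status=502",
]
print(slow_requests(log, 0.5))  # ['b2', 'c3']
```

Because the same request ID can be passed on to the application, a slow entry found this way can then be matched against the application's own logs.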
How to monitor Elasticsearch performance
This post is part 1 of a 4-part series about monitoring Elasticsearch performance. In this post, we’ll cover how Elasticsearch works, and explore the key metrics that you should monitor. Part 2 explains how to collect Elasticsearch performance metrics, part 3 describes how to monitor Elasticsearch with Datadog, and part 4 discusses how to solve five common Elasticsearch problems.
How to collect Elasticsearch metrics
This post is part 2 of a 4-part series about monitoring Elasticsearch performance. Part 1 provides an overview of Elasticsearch and its key performance metrics, Part 3 describes how to monitor Elasticsearch with Datadog, and Part 4 discusses how to solve five common Elasticsearch problems.
How to solve 5 Elasticsearch performance and scaling problems
This post is the final part of a 4-part series on monitoring Elasticsearch performance. Part 1 provides an overview of Elasticsearch and its key performance metrics, Part 2 explains how to collect these metrics, and Part 3 describes how to monitor Elasticsearch with Datadog.

Software Engineering

Brotli Compression
Following a few recent performance-related posts, this one continues the trend and looks at Brotli compression. It was announced by Google in September 2015 and claims to offer 20%-26% better compression than existing compression algorithms. That’s a pretty noteworthy improvement, which would bring a nice performance boost with it.
Image Compression with Neural Networks
Data compression is used nearly everywhere on the internet - the videos you watch online, the images you share, the music you listen to, even the blog you’re reading right now. Compression techniques make sharing the content you want quick and efficient. Without data compression, the time and bandwidth costs for getting the information you need, when you need it, would be exorbitant!
Pyflame: Uber Engineering’s Ptracing Profiler for Python
At Uber, we make an effort to write efficient backend services to keep our compute costs low. This becomes increasingly important as our business grows; seemingly small inefficiencies are greatly magnified at Uber’s scale. We’ve found flame graphs to be an effective tool for understanding the CPU and memory characteristics of our services, and we’ve used them to great effect with our Go and JavaScript services. In order to get high quality flame graphs for Python services, we wrote a high-performance profiler called Pyflame, implemented in C++. In this article, we explore design considerations and some unique implementation characteristics that make Pyflame a better alternative for profiling Python code.
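Flame graphs are built from many sampled call stacks. As a rough sketch of the data behind them, the code below collapses samples into the "folded" text format (`frame;frame;frame count`) that flame graph tooling commonly consumes. It is not Pyflame's code, and the sample stacks and function name are made up for the example.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def fold_stacks(samples: Iterable[Sequence[str]]) -> List[str]:
    """Collapse stack samples into 'frame;frame;frame count' lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Each tuple is one sampled call stack, outermost frame first.
samples = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "render"),
    ("main", "idle"),
]
for line in fold_stacks(samples):
    print(line)
```

The more often a stack shows up in the samples, the wider its box in the resulting graph.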

Web performance

Caching Ghost with NginX
The performance of my site has always been a consideration: the faster your site, the better the browsing experience. I’ve made various changes in the past to try and boost performance, but this is perhaps the best one so far: setting up caching with NginX.

Database Engineering

MySQL & MariaDB

Order from Chaos: Member Coordination in Group Replication
We are very excited about the next release of MySQL Group Replication 0.9.0 in MySQL 5.7.15 and the great work that has been done to improve its stability. Release after release, MySQL Group Replication becomes more stable and more user-friendly and has reached a maturity level that made us declare 0.9.0 a release candidate. One of the major steps towards a better MySQL Group Replication was the adoption of a homegrown Paxos-based protocol for the communication system. Although this landmark was already advertised when the 0.6.0 and 0.8.0 versions were released, we must provide more technical details on this matter.
MySQL 8.0: Making User Management DDLs Atomic
With MySQL 8.0, we are bringing in an important change in the way user management DDLs are executed.

Elasticsearch

Elasticsearch as a column store
If you have no idea what questions you will want to ask your data when you start ingesting it, columnar storage is probably a good option for you: it helps in two areas that are often close to the heart of users who deal with large amounts of data
A New Way To Ingest - Part 1
With the upcoming release of version 5.0 of the Elastic Stack, it is time we took a closer look at how to use one of the new features, Ingest Nodes.

Vertica

Introducing the HPE Connector for Apache Spark
In Vertica version 8.0.0, we added integration for Apache Spark through our HPE Vertica Connector for Apache Spark. This is a fast parallel connector that allows you to transfer data between Apache Spark and Vertica.

Data Engineering & Analytics

Why Not So Hadoop?
Does Big Data mean Hadoop? Not really. However, when one thinks of the term Big Data, the first thing that comes to mind is Hadoop, along with heaps of unstructured data. It is an exceptional lure: data scientists get the opportunity to work with large amounts of data to train their models, and businesses gain knowledge previously never imagined. But has it lived up to the hype? In this article, we will look at a brief history of Hadoop and see how it stands today.
Introducing sparklyr, an R Interface for Apache Spark
Over the past couple of years we’ve heard time and time again that people want a native dplyr interface to Spark, so we built one! sparklyr also provides interfaces to Spark’s distributed machine learning algorithms and much more.
Practical data science: Building Minimum Viable Models
When we talk about innovative services or products, many startups follow a smoother model of development. This lets them minimize risk and keep improving while they raise capital to finance themselves. Once they have found market fit, the issue becomes growth, in order to reach a balance point.
Personalized Group Recommendations Are Here
There are two primary paradigms for the discovery of digital content. First is the search paradigm, in which the user is actively looking for specific content using search terms and filters (e.g., Google web search, Flickr image search, Yelp restaurant search, etc.). Second is a passive approach, in which the user browses content presented to them (e.g., NYTimes news, Flickr Explore, and Twitter trending topics). Personalization benefits both approaches by providing relevant content that is tailored to users’ tastes (e.g., Google News, Netflix homepage, LinkedIn job search, etc.). We believe personalization can improve the user experience at Flickr by guiding both new as well as more experienced members as they explore photography. Today, we’re excited to bring you personalized group recommendations.

Management & Organization

Don’t fire the underperformers (yet)
Sooner or later, every company ends up hiring underperformers. Often unnoticed in large corporations, they can be fatal to small businesses, where every person counts for a lot.
The main problem with underperformers is that they sometimes take months to detect. No one can join an existing company and go full steam on day one. You need to learn the company’s culture, the tools, how to work with your colleagues, and the job you’ve been hired for. In a tech department, it takes up to 3 months before you realise you’ve hired an underperformer; in a sales team, sometimes more, depending on how long your sales cycle lasts.
DevOps: A recipe for successful continuous deployment
Traditional operations are well-defined. But DevOps is not a definitive experience. DevOps is organic communication among people. Emerging from the need for improved quality in development and accelerated innovation, DevOps has increased in popularity among software deployment teams. However, making the transition to this cultural change is incumbent upon the organization.