Mon blog-notes à moi que j'ai

Personal blog of a sysadmin with a hacker streak

Twitter & RSS watch roundup #2016-23

Here is the harvest of links for the week of June 6 to 10, 2016. Most of them were posted on my Twitter account. They are gathered here for anyone who missed them.

Happy reading

Security & Privacy

Surfez Zen : les recommandations de l’ANSSI en image
You have surely already been exposed to risks on the Internet without even realizing it. This new infographic, which deserves to reach everyone, walks through the main threats facing users and, above all, gives the advice and good habits needed to guard against them. With this sound advice, extra caution, and constant vigilance, you have everything you need to “surf” with complete peace of mind.
U2F with Yubikeys
During our recent hackday we wanted to explore new ways to login to Tumblr and play with some cool toys. The following is not an announcement of any kind, other than that U2F is awesome and everyone should buy a Yubikey (they aren’t paying us to say this, we swear).
How to Setup User Security on Jenkins with Project Matrix Authorization
Jenkins is an open source automation server, which will help you to build, deploy and automate your enterprise application.
After installation, Jenkins launches a setup wizard that walks you through the initial security setup.
But if you want to create user accounts and restrict their privileges, you need to set up appropriate Jenkins security authorization.

System Engineering

Cleaning up obsolete config files on Debian and Ubuntu
As part of regular operating system hygiene, I run a cron job which updates package metadata and looks for obsolete packages and configuration files.
While there is already some easily available information on how to purge unneeded or obsolete packages and how to clean up config files properly in maintainer scripts, the guidance on how to delete obsolete config files is not easy to find and somewhat incomplete.
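As a minimal sketch of the kind of check such a cron job could run (not the linked article's own script), the Python below filters the output of `dpkg-query -W -f='${Conffiles}\n'`, where dpkg appends the word `obsolete` to conffile entries it no longer ships. The function names are hypothetical, and paths containing spaces are assumed not to occur.

```python
import subprocess
from typing import List


def obsolete_conffiles(dpkg_output: str) -> List[str]:
    """Return the paths of conffiles that dpkg has flagged as obsolete."""
    paths = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Expected shape of a flagged entry: <path> <md5sum> obsolete
        if len(fields) >= 3 and fields[-1] == "obsolete":
            paths.append(fields[0])
    return paths


def query_dpkg() -> str:
    """Ask dpkg for the Conffiles field of every installed package."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Conffiles}\\n"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a Debian or Ubuntu host, `obsolete_conffiles(query_dpkg())` would list the candidates. Whether each file can safely be removed still needs a human decision.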
Running Percona XtraDB Cluster in a multi-host Docker network
In this post, I’ll discuss how to run Percona XtraDB Cluster in a multi-host Docker network.
With our release of Percona XtraDB Cluster 5.7 beta, we’ve also decided to provide Docker images for both Percona XtraDB Cluster 5.6 and Percona XtraDB Cluster 5.7.
The Illustrated Children’s Guide to Kubernetes
Kubernetes is an open source project with a growing community. We love seeing the ways that our community innovates inside and on top of Kubernetes. Deis is an excellent example of a company that understands the strategic impact of strong container orchestration. They contribute directly to the project; in associated subprojects; and, delightfully, with a creative endeavor to help our user community understand more about what Kubernetes is. Want to contribute to Kubernetes? One way is to get involved here and help us with code. But please don’t consider that the only way to contribute. This little adventure that Deis takes us on is an example of how open source isn’t only code.
Hitchhiker’s guide to testing infrastructure as/and code — don’t panic!
Testing has a number of uses: from development, where it can be used to gain insight into the impact of changes, and explore special cases, to documenting edge cases and their behavior, to building shared confidence in the code base. Testing traditionally can’t provide feedback from users, though, since it always runs offline.
Supporting HTTP/2 for Google Chrome Users
Users of the Google Chrome web browser are seeing some sites that they previously accessed over HTTP/2 falling back to HTTP/1. This is because of a policy change in the most recent update to Chrome, released in late May, which removes support for NPN, one method for upgrading a connection to HTTP/2.
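With NPN gone from Chrome, a server has to negotiate HTTP/2 through ALPN instead. The sketch below is one way to check from the client side, using Python's standard `ssl` module, which protocol a server selects over ALPN. The function names are hypothetical, and the check needs network access to the host being tested.

```python
import socket
import ssl
from typing import Optional


def make_alpn_context() -> ssl.SSLContext:
    """Build a client TLS context that offers HTTP/2, then HTTP/1.1, via ALPN."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def negotiated_protocol(host: str, port: int = 443) -> Optional[str]:
    """Connect to the host and return the protocol the server chose via ALPN.

    Returns None when the server did not select any ALPN protocol.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        with make_alpn_context().wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

If `negotiated_protocol` returns `"h2"` for a site, the server offers HTTP/2 over ALPN. Any other result suggests that clients without NPN support would fall back to HTTP/1.1.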
Kubernetes v1.3 Preview - Auth, Scale, and Improved Install
With the release of Kubernetes version 1.3 just around the corner, we’d like to share a preview of the CoreOS contributions helping guide the community toward this important milestone. Kubernetes is seeing a lot of early adoption in the enterprise and that continues to drive rapid feature development.
Best practices for Tableau Server on Google Compute Engine
Most Tableau users storing and working with data on Google Cloud Platform have probably heard of Tableau Desktop, which helps you connect to data in Google BigQuery, Cloud SQL and other databases to quickly create visualizations and dashboards for better insight.

Monitoring

Performance Monitoring with Real vs. Headless Browsers
The purpose of performance monitoring is to detect issues and minimize their impact on your end users. There are several different ways to do this, one of which is synthetic testing. Synthetic testing involves simulating real users by loading critical pages and transaction flows such as logging in and, if you’re an ecommerce company, checking out. This simulation should accurately represent the typical behavior of your users. Synthetic transaction testing for websites can come in two forms: browser emulation and real browsers.
Monitoring A/B Experiments In Real Time
As a data driven company, we rely heavily on A/B experiments to make decisions on new products and features. How efficiently we run these experiments strongly affects how fast we can iterate. By providing experimenters with real-time metrics, we increase our chance to successfully run experiments and move faster.

Software Engineering

Voices: a Text Analytics Platform for Understanding Member Feedback
In the era of big data, corporations and businesses are increasingly collecting immense amounts of unstructured data in the form of free text, from customer service conversations to market research surveys. While it is clear that such member feedback, or “Voice of the Member” (VOM), contains valuable information, it is often less clear how to best analyze such data at scale.
Comparison of Automated Test Design Methods
The crucial part of testing is test design, which includes the creation of test cases. A test case contains preconditions, inputs and expected outputs. Test design is not an easy task, and since there are numerous methods available it is difficult to select the optimal method for any given testing process. The best methods currently available make it possible to automatically generate test cases from a special (mainly visual) representation of the specification. Usually, test design methods require a model such as UML diagrams, pseudo-code, or some special test-related model of the specification. In this article we’ll compare three methods: a) State Transition Testing (STT), b) Process Cycle Testing (PCT), and c) Constraint Driven Testing (CDT).
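To make the idea of generating test cases from a model concrete, here is a minimal sketch in the spirit of State Transition Testing. It is not the linked article's tooling: the login-session model and both function names are hypothetical. It derives one test case (precondition, input, expected output) per transition.

```python
# Hypothetical model of a login session: (state, event) -> next state.
TRANSITIONS = {
    ("logged_out", "login_ok"): "logged_in",
    ("logged_out", "login_fail"): "logged_out",
    ("logged_in", "logout"): "logged_out",
    ("logged_in", "timeout"): "logged_out",
}


def transition_test_cases(transitions):
    """Derive one test case per transition so every transition is covered once."""
    cases = []
    for (state, event), expected in sorted(transitions.items()):
        cases.append({"precondition": state, "input": event, "expected": expected})
    return cases


def run_model(transitions, start, events):
    """Replay a sequence of events against the model and return the final state."""
    state = start
    for event in events:
        state = transitions[(state, event)]
    return state
```

Each generated case can then be replayed against the real system, with the `expected` value serving as the oracle.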
What is JSON? How Do I Use It? Does It Beat BSON?
TL;DR: If you’ve ever wondered about JSON—what it is, how to use it or what it has to do with BSON—then you’ve come to the right place. In this article I’ll explore what these acronyms stand for and what these formats do in the world of programming and databases.
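For a quick feel of the JSON side, the sketch below round-trips a small record through Python's standard `json` module. The record is made up for illustration. BSON, the binary format used by MongoDB, needs a third-party package and is not shown.

```python
import json

# Hypothetical record, used only to illustrate the round trip.
record = {
    "title": "Weekly roundup",
    "week": 23,
    "tags": ["twitter", "rss"],
    "published": True,
}

# Serialize to a JSON text string; sort_keys gives a stable key order.
encoded = json.dumps(record, sort_keys=True)

# Parse the text back into Python dicts, lists, strings, numbers and booleans.
decoded = json.loads(encoded)
```

The decoded value compares equal to the original record, because every type used here has a direct JSON counterpart.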
Microservices war stories
The popularity of implementing microservices in today’s application landscape continues to rise. There have been countless success stories focused on migrating from a monolithic architecture (a single large application stored in one code repository) to microservices, in which parts of application logic are broken into smaller functional services. As more teams move toward microservices architectures, an increasing number of stories have arisen about the pain of poor choices. Microservices are not the answer to all application problems. Attempts to move away from one giant application to smaller focused services often result in a tightly coupled nest of applications. Some of these problems can be avoided by learning from the mistakes of existing architectures.
Software Automation On a Budget
When a business is getting off the ground or a startup is launching, it’s understandable that money will be tight and cash flow all but nonexistent. There’s a tangible sense of urgency in getting the business’ core product ready, along with its marketing strategy and other core business functions, rather than focusing on the ideal automation solution.
How to Easily Sync Web & Mobile Experiences
Imagine you just signed up for Amazon’s VIP checkout experience. In this hypothetical experience, you can click a VIP button that allows you to shop for brand new items not yet available to normal Amazon customers. “This is awesome!” you say to yourself, as you view the VIP item list in your laptop’s browser.
Diagnosing Common Bad Micro Service Call Patterns
In our previous article Harald and I covered how to diagnose common database performance hotspots in your Java Code. In the current article we focus on patterns that cause performance and scalability issues in distributed “Micro” Service Oriented Architecture (SOA), such as transporting excessive quantities of data over a low latency connection, or making too many service calls due to bad service interface design, or thread/connection pool exhaustion.
Powering Continuous Delivery With Feature Flags
We are in the era of Continuous Delivery, where we are expected to quickly deliver software that is stable and performant. We see development teams embracing a suite of Continuous Integration/Delivery tools to automate their testing and QA, all while deploying at an accelerated cadence.
Less Is More: Optimizing Email Volume Part 1
In July 2015, we announced that we are reducing email volume so that members receive only the most relevant email communication from us. We have been making a concentrated effort in this direction, the results of which are hopefully already noticeable. In this post, we explain how we achieved those results. In part two of this series, we will discuss our new improved technique for email volume optimization which is being rolled out this year.
Toward A Practical Perceptual Video Quality Metric
At Netflix we care about video quality, and we care about measuring video quality accurately at scale. Our method, Video Multimethod Assessment Fusion (VMAF), seeks to reflect the viewer’s perception of our streaming quality. We are open-sourcing this tool and invite the research community to collaborate with us on this important project.
Two Approaches to Test Automation Architectures
I’ve yet to see two development environments that are alike. But even if there is no cookie cutter approach to software delivery, there are standard approaches, and methodologies that are consistent throughout modern software development and that frame nearly all environments.
Because there is a big move in software testing from purely manual testing (a non-technical process) to a fully automated, deeply technical one, how QA processes are set up and how they fit into the overall delivery chain is very important. Let’s take a look at the two most common architectures for test automation, and why they may or may not be the best approach.

Database Engineering

Vertica

What’s New in 7.2.3: Automatic Live Aggregate Projection Usage in DISTINCT Queries
HPE Vertica 7.2.3 enhances the automatic use of live aggregate projections. In 7.2.3, the automatic usage of live aggregate projections extends to queries with SELECT DISTINCT and DISTINCT-qualified aggregate functions.

MySQL & MariaDB

Choosing MySQL High Availability Solutions
High availability environments provide substantial benefit for databases that must remain available. A high availability database environment co-locates a database across multiple machines, any one of which can assume the functions of the database. In this way, a database doesn’t have a “single point of failure.”

Data Engineering & Analytics

Redshift v. BigQuery: Similarities, Differences and the Serverless Future?
In broad strokes, both BigQuery and Redshift are cloud data warehousing services. Honestly, the similarities are greater than the differences, and if you are looking to graduate from MySQL/PostgreSQL/SQL Server for analytics or moving away from expensive perpetual license MPP databases, you can’t really go wrong with either.
Training Dataiku DSS - Partitioning
On May 11th, Jed Dougherty (Data scientist at Dataiku) presented our fifteenth Free Training, “Training Dataiku DSS - Partitioning”. In this video, Jed explains how to use APIs to predict NYC 311 calls with Python and Dataiku DSS.
Hadoop VS Spark: Which is the best Data Analytics engine?
In the book Hadoop: The Definitive Guide, Tom White quotes Grace Hopper: “In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.” For a long time, Hadoop has been the data analytics system preferred by businesses everywhere. The recent arrival of the Spark engine has, however, given businesses an option other than Hadoop for data analytics.
Open Sourcing Photon ML
Machine learning is a key component of LinkedIn’s relevance-driven products. We use machine learning to train the ranking algorithms for our feed, advertising, recommender systems (such as People You May Know), email optimization, search engines, and more. For an in-depth example, check out these posts (part one and two) on how LinkedIn applies machine learning for ranking the feed.
5 Cognitive Biases Ruining Your Growth
As data-driven as we try to be, all organizations are essentially and necessarily human-driven. And humans, naturally, are riddled with irrationality and biases.
No one is exempt. And while you can’t totally avoid biases, just being aware of them and vigilant can help you mitigate the downsides. Besides, if you’re not aware that you’re biased, that’s simply your Bias Blind Spot.
Four Analytics Problems…That Our Customers Solved
When companies decide it’s time to be “data-driven,” there are a number of common pits into which they tend to fall. One we see quite often is that they build their own internal data application platform.

Network Engineering

Optimizing TLS over TCP to reduce latency
The layered nature of the Internet (HTTP on top of some reliable transport (e.g. TCP), TCP on top of some datagram layer (e.g. IP), IP on top of some link (e.g. Ethernet)) has been very important in its development. Different link layers have come and gone over time (any readers still using 802.5?) and this flexibility also means that a connection from your web browser might traverse your home network over WiFi, then down a DSL line, across fiber and finally be delivered over Ethernet to the web server. Each layer is blissfully unaware of the implementation of the layer below it.
RFC 7873: Domain Name System (DNS) Cookies
The vast majority of DNS queries today travel over UDP. This protocol provides no mechanism for verifying, even minimally, the source IP address of a query. Unlike with TCP, it is trivial to lie about the source IP address without being detected. This enables abuse such as reflection attacks. This new RFC proposes a simple, lightweight mechanism for validating the client’s source IP address: cookies.
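The cookie idea can be sketched as follows: the client derives an 8-byte cookie from the two addresses and a secret of its own, and the server answers with a cookie bound to the client's address and client cookie. HMAC-SHA256 truncated to 8 bytes is an illustrative choice here, not something the RFC mandates, and all function names are hypothetical.

```python
import hashlib
import hmac

CLIENT_COOKIE_LEN = 8  # RFC 7873 fixes the client cookie at 8 bytes
SERVER_COOKIE_LEN = 8  # RFC 7873 allows 8 to 32 bytes; 8 is assumed here


def client_cookie(secret: bytes, client_ip: str, server_ip: str) -> bytes:
    """Derive the client cookie from both addresses and a client-side secret."""
    msg = f"{client_ip}|{server_ip}".encode()
    return hmac.new(secret, msg, hashlib.sha256).digest()[:CLIENT_COOKIE_LEN]


def server_cookie(secret: bytes, client_ip: str, c_cookie: bytes) -> bytes:
    """Derive the server cookie from the client address, client cookie and a server secret."""
    msg = client_ip.encode() + b"|" + c_cookie
    return hmac.new(secret, msg, hashlib.sha256).digest()[:SERVER_COOKIE_LEN]


def server_accepts(secret: bytes, client_ip: str, c_cookie: bytes, presented: bytes) -> bool:
    """Check that a returned server cookie matches the address the query claims to come from."""
    expected = server_cookie(secret, client_ip, c_cookie)
    return hmac.compare_digest(expected, presented)
```

An attacker who spoofs the victim's address never sees the server cookie issued to that address, so a spoofed follow-up query fails this check.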
Why having more POPs isn’t always better
One of the most interesting parts about working at Fastly is addressing questions about how our offering differs from legacy providers. Answering these questions is pretty easy (tl;dr we can cache event-driven content, we enable real-time interactions with your services, and we seamlessly integrate into your technology stack and existing workflows), but there is one question that comes up more often than others
Supporting the transition to IPv6-only networking services for iOS
Early last month Apple announced that all apps submitted to the App Store from June 1 onward would need to support IPv6-only networking as they transition to IPv6-only network services in iOS 9. Apple reports that “Most apps will not require any changes”, as these existing apps support IPv6 through Apple’s NSURLSession and CFNetwork APIs.

Management & Organization

Six things Stackdriver brings to the DevOps table
As someone for whom DevOps and sysadmin tasks are only part of my job, having all the tools I commonly need in one place is a huge advantage. Stackdriver gives me exactly that. Monitoring, logging, debugging and error reporting are all integrated and provide the essential tools I need to keep my websites up and healthy. I also like that Stackdriver doesn’t require me to have deep system administration knowledge to set up basic monitoring. With minimal effort, I’m confident that I’ll be notified if my application has an issue.
The bad practice in FOSS projects management
During the OpenStack summit a few weeks ago, I had the chance to talk to some people about my experience on running open source projects. It turns out that after hanging out in communities and contributing to many projects for years, I may be able to provide some hindsight and an external eye to many of those who are new to it.
Optimizing an Agile DevOps Organization: Part 1
How best can firms adopting DevOps seamlessly integrate operations teams and activities into Agile models? This article discusses a framework to help organize operations in Agile environments.
Putting the Ops in DevOps: Part 1
This is the eighth blog in the Field Notes from a DevOps Cultural Anthropologist series. It is the first in a two-part blog series on the Ops side of the DevOps house. The first blog focuses on managing an Ops focused DevOps career and the second is an interview with leaders from the Ops-focused G2 Technology Group, a substantial and innovative Boston based DevOps managed services company.
How DevOps is Killing QA
In most organizations, this process generally has occurred within one or more methodologies such as waterfall or agile. In waterfall, the testing process occurs relatively late. The software is produced, bugs are found by the QA team and then the software is fixed. These last two practices repeat in a cycle until management is comfortable that it can be released to the public.