Mon blog-notes à moi que j'ai

Personal blog of a sysadmin with a hacker streak

Twitter & RSS watch roundup #2016-40

The harvest of links for the week of October 3 to 7, 2016. Most of them were published on my Twitter account. Here they are, gathered for those who missed them.

Happy reading

Security & Privacy

WiFi Wardriving++
Wardriving is the act of finding WiFi networks, usually from a car or other vehicle, and mapping out their location. I did some similar work almost 3 years ago, but this time around there have been a few upgrades. Let’s build some serious wardriving kit!

System Engineering

DNSSEC and ECDSA
Is the elliptic curve cryptographic algorithm (ECDSA) a viable crypto algorithm for use in DNSSEC today? And what has changed over the last two years?
Dynamic Provisioning and Storage Classes in Kubernetes
Storage is a critical part of running containers, and Kubernetes offers some powerful primitives for managing it. Dynamic volume provisioning, a feature unique to Kubernetes, allows storage volumes to be created on-demand. Without dynamic provisioning, cluster administrators have to manually make calls to their cloud or storage provider to create new storage volumes, and then create PersistentVolume objects to represent them in Kubernetes. The dynamic provisioning feature eliminates the need for cluster administrators to pre-provision storage. Instead, it automatically provisions storage when it is requested by users. This feature was introduced as alpha in Kubernetes 1.2, and has been improved and promoted to beta in the latest release, 1.4. This release makes dynamic provisioning far more flexible and useful.
The elements of scaling
At our second annual customer summit, we heard from industry leaders on topics like performance, the future of the web, tackling problems with VCL, and leadership. Camille Fournier, former CTO of Rent the Runway, spoke on the key elements of growing teams successfully.
Introducing InfraKit, an open source toolkit for creating and managing declarative, self-healing infrastructure
Docker’s mission is to build tools of mass innovation, starting with a programmable layer for the Internet that enables developers and IT operations teams to build and run distributed applications. As part of this mission, we have always endeavored to contribute software plumbing toolkits back to the community, following the UNIX philosophy of building small loosely coupled tools that are created to simply do one thing well. As Docker adoption has grown from 0 to 6 billion pulls, we have worked to address the needs of a growing and diverse set of distributed systems users. This work has led to the creation of many infrastructure plumbing components that have been contributed back to the community.
Harden Debian with PIE and bindnow!
Shipping Position Independent Executables and using a read-only Global Offset Table were already possible for packages, but package maintainers had to opt in for each package (see Hardening wiki) using the « pie » and « bindnow » Dpkg hardening flags.

Monitoring

Breaking Down Monitoring
Monitoring is pivotal in the sustained proactivity in your ITOps architecture. In recent years, we have seen an explosion in both the number of and types of tools classified as « monitoring » tools. While this ever-increasing tools landscape has vastly increased ITOps visibility, the occasional side effect of integrating this vast array of tools is to create even more noise. The « visibility and noise » paradox has turned the monitoring landscape into a catch-22 for many IT departments, while others have streamlined their proactivity to issue resolution. Let’s look at the monitoring landscape and build an integrated environment that succeeds.
Docker Stats Monitoring: Taking Dockbeat for a Ride
There is no silver bullet. This is how I always answer those asking about the best logging solution for Docker. A bit pessimistic of me, I know. But what I’m implying with that statement is that there is no perfect method for gaining visibility into containers. Dockerized environments are distributed, dynamic, and multi-layered in nature, so they are extremely difficult to log.

Software Engineering

Did I Test This Feature? - Advances in Test Gap Analysis
In a recent blog post, I wrote about Test Gap analysis—our analysis that identifies changed code that was never tested before a release. Often, these areas—the so called test gaps—are way more error prone than the rest of the system, which is why test managers try to test them most thoroughly.
5 Basic REST API Design Guidelines
As soon as we start working on an API, design issues arise. A robust and strong design is a key factor for API success. A poorly designed API will indeed lead to misuse or – even worse – no use at all by its intended clients: application developers.
PHP 7 deployment at Dailymotion
In March 2015, we started to think that code refactoring and architecture improvements would not be the only way to optimize the response time on dailymotion.com. This is the core problem of websites with high load: « how to scale without investing too much in people/servers ».

Android

Approaching Outside-in TDD on Android (I)
Outside-in Test-Driven Development (TDD) can be a challenge to implement. In this 3-part post series, we would like to share our experiences applying it to Android development and offer some practical tips for doing so yourself.

Web performance

Pregenerating Static Web Pages for Better Performance
In my recent Tuning NGINX article, I talked about how it’s important to tune based on the specific needs of an application and its environment. In today’s article, we’re going to put that in practice. Last time, we tuned our environment by adjusting parameters within NGINX. Now, we’re going to explore a sometimes-overlooked aspect of tuning: making adjustments to how our application works.

Database Engineering

Solr vs. Elasticsearch: Who’s The Leading Open Source Search Engine?
More than ever, this is the time of cloud and data growth. Today’s applications generate data in petabytes and zettabytes while everyone still demands faster and faster performance. However, as the data piles up, searching through all of that information effectively quickly becomes a substantial back end challenge.
Juggling Databases Between Datacenters
Recently we went through an exercise where we moved all of our database masters between data centers. We planned on doing this online with minimal user impact. Obviously when performing this sort of action there are a variety of considerations such as cache consistency and other pieces of shared state in stores like HBase, but the focus of this post will be primarily on MySQL.

MySQL & MariaDB

Zooming-in on Group Replication performance
A previous blog post exposed the main factors affecting Group Replication performance, which was followed by another that showed the scalability of both single-master and multi-master throughput. In this post we return with more « inside information » that may be useful for optimizing the performance of Group Replication deployments.
How to move InnoDB-Logfiles on a Galera Cluster
Somebody recently asked what they had to do if they wanted to move their InnoDB-Logfiles back to the datadir. As an added challenge, the servers were part of a Galera Cluster.
MySQL 8.0: Scaling and Performance of INFORMATION_SCHEMA
MySQL 8.0 comes with a new design of the INFORMATION_SCHEMA subsystem. The blog post MySQL 8.0: Improvements to Information_schema provides an overview of the improvements we made. This post mainly demonstrates the performance of INFORMATION_SCHEMA in MySQL 8.0, giving an idea of the kind of performance gain that one can expect.
MySQL 8.0 General Tablespaces: File per Database (and no FRM files)
In this blog post, we’ll look at MySQL 8.0 general tablespaces.

Elasticsearch

Anatomy of a Watch
Elasticsearch has many use cases, and often those use cases involve knowing when an event that is streaming into the system meets certain conditions and having additional data from the event at our disposal. For these scenarios, we developed Watcher. This component is a very powerful and flexible alerting tool. However, because it is so configurable, reviewing the documentation can sometimes be daunting and seem very complex.
You get a report! You get a report!
We recently released the first version of Reporting for Kibana which gives users the ability to generate a PDF report from saved Kibana dashboards. By leveraging Watcher’s email action, you can send PDF reports regularly, or only when certain events have occurred.

Data Engineering & Analytics

Building The LinkedIn Knowledge Graph
At LinkedIn, we use machine learning technology widely to optimize our products: for instance, ranking search results, advertisements, and updates in the news feed, or recommending people, jobs, articles, and learning opportunities to members. An important component of this technology stack is a knowledge graph that provides input signals to machine learning models and data insight pipelines to power LinkedIn products. This post gives an overview of how we build this knowledge graph.
Choosing the correct ML Solution for you…
Enterprise applications are trending toward adopting Machine Learning as a strategic implementation, and performing deep machine learning analytics across multiple problem statements is becoming common. There is a variety of machine learning solutions / packages / platforms on the market. One of the main challenges that teams initially try to resolve is choosing the correct platform / package for their solution.
Beyond Deep Learning – 3rd Generation Neural Nets
By far the fastest expanding frontier of data science is AI and specifically the rapid advances in Deep Learning. Advances in Deep Learning have been dependent on artificial neural nets and especially Convolutional Neural Nets (CNNs). In fact our use of the word « deep » in Deep Learning refers to the fact that CNNs have large numbers of hidden layers. Microsoft recently won the annual ImageNet competition with a CNN comprised of 152 layers. Compare that with the 2, 3, or 4 hidden layers that are still typical when we use ordinary back-prop NNs for traditional predictive analytic problems.
How-to: Do Scalable Graph Analytics with Apache Spark
Graphs—also known as « networks »—are ubiquitous across web applications. As a refresher, a graph consists of nodes and edges. A node can be any object, such as a person or an airport, and an edge is a relation between two nodes, such as a friendship or an airline connection between two cities. Social networks and content networks (which comprise interlinked documents, such as web pages or citation networks) are other very common examples of a graph. Finally, the Internet of Things is basically a huge « graph of graphs ».
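To make the node-and-edge vocabulary concrete, here is a minimal sketch in plain Python (not Spark, and not taken from the linked article). The function name `build_adjacency` and the airline connections are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        # An undirected edge links both endpoints to each other.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# Hypothetical airline connections: each city is a node, each route an edge.
connections = [
    ("Paris", "London"),
    ("Paris", "Berlin"),
    ("London", "New York"),
]

graph = build_adjacency(connections)

# The degree of a node is the number of edges touching it.
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
```

An adjacency map like this is fine for small graphs held in memory; a distributed engine becomes relevant once the edge list no longer fits on one machine.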

Network Engineering

NetFlash: Tracking Dropbox network traffic in real-time with Elasticsearch
Large-scale networks are complex, dynamic systems with many parts, managed by many different teams. Each team has tools they use to monitor their part of the system, but they measure very different things. Before we built our own infrastructure, Magic Pocket, we didn’t have a global view of our production network, and we didn’t have a way to look at the interactions between different parts in real time. Most of the logs from our production network have semi-structured or unstructured data formats, which makes it very difficult to track a large amount of log data in real-time. Relational database models do not support these logs very well, and while NoSQL solutions such as HBase or Hive can store large amounts of logs easily, they aren’t readily stored in a form that can be indexed in real-time.

Management & Organization

The deadly difference between hiding the symptoms and solving the problem
There’s a common confusion between solving a problem and hiding its symptoms. The tech world is full of examples, both because it’s an easy trap to fall into and because of the move-fast culture.