Bloomberg has been using Kubernetes, an open source system for deploying and managing containerized applications, in its infrastructure for the past three years. This container orchestration system has proven to be one of the industry’s most promising enterprise technologies, but it was critical for the company’s infrastructure teams to understand how Kubernetes breaks in order to make this tool as reliable as possible.
Chaos engineering helps discover and fix problems in Kubernetes clusters before real outages occur by deliberately injecting failures, which enables more thorough testing of live systems and networks. It can also introduce many of the kinds of errors that occur in real-life scenarios, whereas unit tests only validate a small piece of code against a set of assumptions.
“In chaos engineering, you introduce uncertainty on the entire system,” said Mikolaj Pawlikowski, a London-based software engineer with Bloomberg’s Data Technologies Infrastructure team. “While it doesn’t replace the need for unit testing, it does allow you to detect many things that are extremely difficult to introduce in isolation – for a relatively small price.”
Troubleshooting becomes even more challenging when connectivity is compromised in a Kubernetes cluster of anywhere from a dozen to a few hundred nodes, each of which may be a virtual or a physical machine.
Bloomberg introduced PowerfulSeal last year as an open source tool for testing Kubernetes clusters. To address the need for troubleshooting and visualizing connectivity issues during experiments with Kubernetes clusters, Pawlikowski has now built a companion tool called Goldpinger, which was recently published as open source and is being presented at KubeCon + CloudNativeCon North America 2018 in Seattle, WA later today.
What makes these tools unique is how they work together. PowerfulSeal introduces failure into various parts of Kubernetes clusters. It can be configured to apply pressure to the system by creating scenarios in which daemons on various hosts are killed at random and then brought back up. Goldpinger verifies the stability of the cluster’s networking layer by detecting connectivity issues between the nodes. This provides a mechanism to verify that an application still works despite potential issues with network connectivity. Goldpinger also renders a graph of the Kubernetes cluster to illustrate that the application remains stable and working.
“Goldpinger essentially does the dirty work of continuously checking the connectivity between various nodes,” said Pawlikowski. “It translates each Kubernetes cluster into ready-to-use metrics that can be used to visualize and create alerts, as well as a graphic that you can browse and see which nodes are okay, slow, or broken in order to diagnose any problems.”
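As a rough illustration of the idea of sorting nodes into okay, slow, or broken, the following Python sketch classifies a set of simulated node-to-node probe results. It is not Goldpinger's implementation: the threshold `SLOW_THRESHOLD_MS`, the functions `classify` and `summarize`, the node names, and the latency figures are all hypothetical and chosen only for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical threshold: round trips slower than this count as "slow".
SLOW_THRESHOLD_MS = 100.0


def classify(latency_ms: Optional[float]) -> str:
    """Map one probe result to a status; None means the probe got no reply."""
    if latency_ms is None:
        return "broken"
    if latency_ms > SLOW_THRESHOLD_MS:
        return "slow"
    return "okay"


def summarize(probes: Dict[Tuple[str, str], Optional[float]]) -> Dict[str, str]:
    """Give each target node the worst status that any peer observed for it."""
    severity = {"okay": 0, "slow": 1, "broken": 2}
    status: Dict[str, str] = {}
    for (_source, target), latency in probes.items():
        result = classify(latency)
        if severity[result] >= severity[status.get(target, "okay")]:
            status[target] = result
    return status


# Simulated all-to-all probe results: (source, target) -> latency in ms.
probes = {
    ("node-a", "node-b"): 3.2,
    ("node-a", "node-c"): None,   # no reply
    ("node-b", "node-a"): 2.9,
    ("node-b", "node-c"): None,   # no reply
    ("node-c", "node-a"): 250.0,  # reply, but slow
    ("node-c", "node-b"): 4.1,
}
node_status = summarize(probes)
print(node_status)
```

In this toy data set, node-c never answers and ends up marked as broken, node-a is reported slow by one peer, and node-b is okay. A real tool would gather these results from live probes and export them as metrics rather than read them from a dictionary.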
Distributed systems exhibit all kinds of problems, such as race conditions, interference between different components at the server or system level, and even incompatible versions. Chaos engineering therefore tests the failure resistance of the system as a whole rather than that of isolated components. Chaos is introduced with the expectation that the system keeps working while a failure is present. The main goal is to introduce the failure and have the system self-adjust in order to cope with it. Goldpinger then helps detect and verify the impact on the network and the extent of the outage, if there is one at all.
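The inject-then-verify loop described here can be sketched as a small simulation. This is a toy model under stated assumptions, not PowerfulSeal's API or behavior: the `SimulatedService` class, its methods, and `run_experiment` are hypothetical names, and the "service" is just a replica counter rather than a real Kubernetes workload.

```python
import random


class SimulatedService:
    """Toy stand-in for a replicated workload; not a real Kubernetes object."""

    def __init__(self, desired_replicas: int) -> None:
        self.desired = desired_replicas
        self.running = desired_replicas

    def kill_one(self) -> None:
        # Inject the failure: one replica disappears.
        self.running = max(0, self.running - 1)

    def reconcile(self) -> None:
        # Self-adjust: restore the desired number of replicas.
        self.running = self.desired

    def is_available(self) -> bool:
        # The service counts as working while at least one replica runs.
        return self.running > 0


def run_experiment(service: SimulatedService, rounds: int, rng: random.Random) -> int:
    """Return how many rounds the service stayed available while failing."""
    survived = 0
    for _ in range(rounds):
        # Kill a random number of replicas, always leaving at least one.
        for _ in range(rng.randint(1, service.desired - 1)):
            service.kill_one()
        # Verify while the failure is still present.
        if service.is_available():
            survived += 1
        # Let the system recover before the next round.
        service.reconcile()
    return survived


service = SimulatedService(desired_replicas=3)
survived = run_experiment(service, rounds=20, rng=random.Random(0))
print(f"available in {survived} of 20 rounds")
```

The point of the sketch is the ordering: the availability check happens while the failure is still in effect, and recovery is a separate step. In a real experiment, the check would be an actual request to the application or a connectivity probe rather than a counter comparison.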
PowerfulSeal was initially created for an internal Bloomberg platform that uses Kubernetes to let developers deploy code more easily. After being used with other Bloomberg Kubernetes projects, it was shared with the broader community at KubeCon 2017 and has sparked interest from various institutions, such as universities, hedge funds and Fortune 100 retailers. They’re interested in refining the tool and using chaos engineering as a way to add value to Kubernetes clusters in order to create more stable distributed systems.
“We’re excited to see CNCF end users like Bloomberg continue to innovate by open sourcing internal cloud native tools like Goldpinger for the benefit of the wider community,” said Chris Aniszczyk, COO of the Cloud Native Computing Foundation.
Goldpinger and PowerfulSeal are a testament to the tremendous value companies get from being able to adopt open source technology rather than having to create tools in-house to solve every problem. Bloomberg is an active participant in the open source community: when its engineers create software they feel could be useful to other developers, they frequently contribute it to help others across the community.