
Chaos engineering – practical resources to use in your next project

Chaos Engineering is well on its way to the top of the Gartner Hype Cycle. Organizations as well as individuals are picking it up as we speak. New initiatives, services, and tools keep emerging. Furthermore, cloud providers offer their own Chaos Engineering services to encourage IT professionals to extend their use of these cloud platforms. So what is good to know to actually get started with this trend? In this article, we’ll explore various resources and topics that can help you.

Get inspiration

One of the first things you can do to make a great start is to get inspiration from the companies that have already gone ahead. There is a very active page on GitHub that collects stories from large companies across the globe. It features stories from various market segments such as financial institutions, retail, health care, and government. Topics range widely but include the following:

  • Lessons learned from applying Chaos Engineering at specific companies.
  • DevOps best practices for Chaos Engineering.
  • Blog posts in which various companies share their experiences with Chaos Engineering.
  • Chaos Engineering with respect to systems’ resilience.
  • Conducting regression tests, performance tests, and security tests in the light of Chaos Engineering.

The great thing about these stories is that they let you collect valuable information before you actually start your experiments or pilot project. Proven ways of working help you get funding and commitment in your organization. The stories also help on a practical level, since they contain many references to tools and (technical) architectures.

Chaos Maturity Model

Almost every major trend within the IT software industry has its own Maturity Model, and Chaos Engineering is no exception. Before you carry out your experiments, it’s wise to assess the maturity level of your organization. Yuri Nino wrote an excellent article on Medium that explains the different levels of Chaos Engineering using a coffee metaphor.

The maturity model is based on two metrics: adoption and sophistication. You can plot your own Chaos Engineering program against the levels of the model and see where you stand.

The levels of adoption range from “in the shadows” to “investment” and “adoption”, all the way to “cultural expectation”, whereas the levels of sophistication start with “elementary” and “simple”, followed by “sophisticated” and finally “advanced”.


Five authors wrote a book, published by O’Reilly, that is fully dedicated to Chaos Engineering. They used a lot of input from the Netflix teams to think about how systems behave when they are hit by an unexpected failure. In the book, you will find a lot of information on the following topics, which are still relevant today:

  • Learn to use the Maturity Model and set realistic goals.
  • Craft the processes that are needed to carry out Chaos Engineering.
  • Collect data on the healthy state of your system and build hypotheses around that data.
  • Run your experiments with your production systems in mind, but make sure you don’t risk blowing up your critical systems in case things go wrong.
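The hypothesis step in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the metric names, the numbers, and the 20% margin are hypothetical and do not come from the book. The snippet records a healthy baseline, derives a tolerated range for each metric, and reports which metrics leave that range during an experiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Allowed range for one metric while the system counts as healthy."""
    lower: float
    upper: float

    def holds(self, value: float) -> bool:
        return self.lower <= value <= self.upper


def build_hypothesis(baseline: dict[str, float], margin: float) -> dict[str, Tolerance]:
    """Derive a range of +/- margin (a fraction) around each positive baseline value."""
    return {
        name: Tolerance(value * (1 - margin), value * (1 + margin))
        for name, value in baseline.items()
    }


def violations(hypothesis: dict[str, Tolerance], observed: dict[str, float]) -> list[str]:
    """Return the metrics that left their range; a missing metric counts as a violation."""
    return [
        name
        for name, tolerance in hypothesis.items()
        if name not in observed or not tolerance.holds(observed[name])
    ]


# Hypothetical numbers collected while the system was healthy.
baseline = {"p95_latency_ms": 200.0, "requests_per_second": 500.0}
hypothesis = build_hypothesis(baseline, margin=0.2)

# Hypothetical numbers observed while a fault was being injected.
during_experiment = {"p95_latency_ms": 310.0, "requests_per_second": 480.0}
print(violations(hypothesis, during_experiment))
```

A non-empty result is a signal to stop the experiment and investigate before widening its scope.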

Beyond that, it is also worth exploring the Application Resiliency Maturity Model, which clearly shows the characteristics and typical activities of niche players, challengers, visionaries, and leaders. This makes it very practical for assessing the current state of your organization.

Chaos Engineering and Kubernetes

Since many organizations nowadays use Kubernetes for production-grade workloads, this platform can’t be left out when it comes to Chaos Engineering. Chaos Mesh is one of the incubating projects of the CNCF. It provides a simple way to get started with Chaos Engineering on Kubernetes clusters. Here is a quick summary of how Chaos Mesh can help you simulate disruptive events.

The Chaos Mesh platform is inspired by the characteristics of distributed systems. It therefore builds on a lot of experience with networking, disks, operating systems, and so on. All features were designed with this in mind, so the simulated faults closely resemble what can happen in real-life situations.


You can easily deploy Chaos Mesh to your existing Kubernetes cluster. A powerful dashboard helps you visualize your tests and simulated faults. Chaos Mesh does not change your deployment logic, so it won’t conflict with the way you restore your workloads after non-recoverable failures.

Control the blast radius of your experiments by protecting sensitive namespaces which should not be included in the tests. In addition to that, there is the option to create roles and users that have a limited scope. This makes sure you won’t break workloads that are not part of the tests (yet).
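As a rough sketch of limiting the blast radius, the Python snippet below builds a Chaos Mesh `PodChaos` manifest that targets a single pod in one namespace. The manifest fields follow the Chaos Mesh `v1alpha1` resource format. The protected-namespace check, the namespace names, and the `app` label are hypothetical and only illustrate the idea on the client side; they are not Chaos Mesh’s own protection mechanism.

```python
import json

# Hypothetical list of namespaces that experiments must never touch.
PROTECTED_NAMESPACES = {"kube-system", "payments"}


def pod_kill_experiment(name: str, namespace: str, app_label: str) -> dict:
    """Build a PodChaos manifest that kills one pod matching the app label."""
    if namespace in PROTECTED_NAMESPACES:
        raise ValueError(f"namespace {namespace!r} is protected")
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": "pod-kill",
            "mode": "one",  # affect a single matching pod, not all of them
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {"app": app_label},
            },
        },
    }


manifest = pod_kill_experiment("kill-one-web-pod", "staging", "web")
print(json.dumps(manifest, indent=2))
```

Since `kubectl apply -f` accepts JSON as well as YAML, the printed manifest could be saved to a file and applied to a test cluster.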

Chaos toolkit

Developers can also make use of the tools and examples presented on the Chaos Toolkit website. This great initiative aims to let developers run their own Chaos Engineering experiments by writing their own experiment definitions.

In its simplest form, you need to describe the following elements:

  • The metadata of the experiment, such as the name, description, and tags.
  • The “steady state hypothesis”, which defines the expected healthy state of the system and is verified with probes.
  • The method, which is the core of the experiment: the actions that simulate a failure, optionally with extra probes to assess the targeted environment.
  • Optional: rollbacks in case the system can’t recover from the injected failure.
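The elements above could look roughly like this. The sketch builds an experiment definition as a Python dictionary and serializes it to JSON, one of the formats Chaos Toolkit reads. The service URL, the pod name, and the namespace are assumptions made up for illustration.

```python
import json

experiment = {
    # Metadata of the experiment.
    "version": "1.0.0",
    "title": "The web service survives the loss of one pod",
    "description": "Delete one pod and check that the service still answers.",
    "tags": ["kubernetes", "web"],
    # Steady state: the health endpoint must answer with HTTP 200.
    "steady-state-hypothesis": {
        "title": "The service responds with HTTP 200",
        "probes": [
            {
                "type": "probe",
                "name": "service-must-respond",
                "tolerance": 200,
                "provider": {
                    "type": "http",
                    "url": "http://localhost:8080/health",  # hypothetical endpoint
                    "timeout": 3,
                },
            }
        ],
    },
    # Method: the failure to inject, here through a kubectl process call.
    "method": [
        {
            "type": "action",
            "name": "delete-one-pod",
            "provider": {
                "type": "process",
                "path": "kubectl",
                "arguments": "delete pod web-0 --namespace staging",  # hypothetical pod
            },
        }
    ],
    # Optional rollbacks; left empty because the pod is expected to be recreated.
    "rollbacks": [],
}

serialized = json.dumps(experiment, indent=2)
print(serialized)
```

Saved as `experiment.json`, a definition like this would be executed with `chaos run experiment.json`. Chaos Toolkit checks the steady state hypothesis before and after running the method.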


You can find some samples of these experiments on GitHub. To conduct the tests, they require the Chaos Toolkit CLI, Minikube, and Helm to be installed. Users can also generate a report (PDF) based on the outcomes of the experiments. If you want to run your experiments periodically, you can use Kubernetes jobs, since Chaos Toolkit itself has no scheduling capability.
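One possible way to do this scheduling is a Kubernetes CronJob. The sketch below generates such a manifest from Python. The container image name and the experiment path are hypothetical placeholders: it assumes an image with the `chaos` CLI installed and an experiment file that has been made available inside the container (for example through a ConfigMap mount, which is omitted here).

```python
import json


def chaos_cronjob(name: str, schedule: str, image: str, experiment_path: str) -> dict:
    """Build a CronJob manifest that runs `chaos run` on a cron schedule."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,  # standard five-field cron syntax
            "concurrencyPolicy": "Forbid",  # never let two runs overlap
            "jobTemplate": {
                "spec": {
                    "backoffLimit": 0,  # do not retry a failed experiment
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {
                                    "name": "chaostoolkit",
                                    "image": image,
                                    "command": ["chaos", "run", experiment_path],
                                }
                            ],
                        }
                    },
                }
            },
        },
    }


cronjob = chaos_cronjob(
    name="weekly-pod-kill",
    schedule="0 10 * * 1",  # every Monday at 10:00
    image="registry.example.com/chaostoolkit:latest",  # hypothetical image
    experiment_path="/experiments/experiment.json",  # hypothetical mount path
)
print(json.dumps(cronjob, indent=2))
```

Setting `backoffLimit` to 0 and `concurrencyPolicy` to `Forbid` is a deliberately cautious choice: a failed or slow experiment is not repeated automatically.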


Extensions add extra features to the tool. They come in categories such as application, reliability, load testing, and network, to name a few.

It is also worth comparing Chaos Toolkit with other tools such as Litmus and Chaos Mesh.

AWS Fault Injection Simulator (AWS FIS)

About two years ago, AWS introduced a service called AWS Fault Injection Simulator (AWS FIS) for DevOps teams to carry out Chaos Engineering experiments. It can inject faults into a number of services, including EC2 instances, containers, databases, and storage-related services. The primary objective of these faults is to help create resilient systems that survive common failures before they lead to more catastrophic events.

Besides injecting faults sequentially or simultaneously, it offers so-called guardrails in the form of stop conditions tied to Amazon CloudWatch alarms. When such an alarm fires, the experiment stops, and the alarm acts as a trigger for the DevOps team in charge to analyze their resources.

You start an experiment by crafting an “experiment template” and feeding it into the simulator. This calls the FIS engine, which carries out the experiment itself. The simulator can be controlled through the AWS Management Console or the AWS Command Line Interface.


A code snippet for the above-mentioned template can be seen below:

{
    "description": "The description of the experiment",
    "targets": {},
    "actions": {},
    "stopConditions": [],
    "roleArn": "arn:aws:iam::000000000000:role/ExecuteFISActions",
    "tags": {}
}

More templates can be found on the webpage of the FIS experiment templates.
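To make the template skeleton more concrete, here is a hedged sketch of a filled-in version, built in Python and serialized to JSON. It stops one tagged EC2 instance for five minutes and uses a CloudWatch alarm as a stop condition. The alarm ARN, the region, the resource tag, and the target and action names are hypothetical. The role ARN is the placeholder from the snippet above.

```python
import json

ROLE_ARN = "arn:aws:iam::000000000000:role/ExecuteFISActions"
# Hypothetical alarm that should halt the experiment when it fires.
ALARM_ARN = "arn:aws:cloudwatch:eu-west-1:000000000000:alarm:HighErrorRate"

template = {
    "description": "Stop one tagged EC2 instance for five minutes",
    "targets": {
        "oneInstance": {
            "resourceType": "aws:ec2:instance",
            "resourceTags": {"chaos": "allowed"},  # only opted-in instances
            "selectionMode": "COUNT(1)",  # pick a single matching instance
        }
    },
    "actions": {
        "stopInstance": {
            "actionId": "aws:ec2:stop-instances",
            "parameters": {"startInstancesAfterDuration": "PT5M"},
            "targets": {"Instances": "oneInstance"},
        }
    },
    "stopConditions": [{"source": "aws:cloudwatch:alarm", "value": ALARM_ARN}],
    "roleArn": ROLE_ARN,
    "tags": {"Name": "stop-one-instance"},
}

# Sanity check: the filled-in template has the same top-level keys as the skeleton.
SKELETON_KEYS = {"description", "targets", "actions", "stopConditions", "roleArn", "tags"}
missing = SKELETON_KEYS - template.keys()

print(json.dumps(template, indent=2))
```

Saved as `template.json`, such a file could be registered with `aws fis create-experiment-template --cli-input-json file://template.json`, assuming the role and the alarm exist in your account.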

Wrap up

Chaos Engineering is quickly becoming mainstream in a lot of organizations. Yet developers have to learn a new discipline, and managers as well as other IT experts need to shift their mindset. Many initiatives nowadays help those groups understand the main principles and offer practical resources to get started. It’s interesting to see that there are many (open-source) tools on the internet as well as “Chaos Engineering as a service” offerings by cloud providers. This brings Chaos Engineering closer to DevOps teams, so they can adopt it in their daily tasks even more quickly.

