DevOps tools make the world go ‘round. Too many tools make DevOps engineers go mad. The first one is true. The second one, too. Fact is: tooling that helps you automate software development and infrastructure provisioning workflows and pipelines is incredibly important, both for the engineers building the automations and for the developers using the automated workflows in their day-to-day work. Great tools remove complexity from both the DevOps engineer’s and the software developer’s work, are simple to use, don’t take a lot of work to manage, and are easy to integrate into the ecosystem of existing tooling. But reality is different, and DevOps tool sprawl is a very real problem. Let’s dive in.
This is post 1 of 4 in the Amazic World series sponsored by GitLab:
1. Toolchain tax: just the tip of the iceberg?
2. Why the Pipeline is key for Cloud Engineers
3. Changing the Security Paradigm – Why you must shift left
4. GitOps is more than Git (or Ops)
Tools are about outcomes
In an ideal world, tools are about the outcomes they help achieve. The toil they remove, the performance increase of a pipeline, the reduction of errors and re-work. The increased consistency of repetitive tasks like regularly releasing to production. The reduced time to fix a production outage, and the reduced number of production incidents. The technical debt they prevent, making it easier to change components in the future.
Whatever the specific perspective: tools generally help increase quality and engineering flow. This is often where the tooling journey starts: a tool is added to the toolchain to solve one (or two) specific challenges developers or cloud engineers have.
But reality is messy, and unfortunately the tools themselves add complexity and operational burden, lowering the potential gains, reducing quality, and taking developers and DevOps engineers out of their flow with time spent on toil: operational work that tends to be manual, repetitive and of no lasting value (or, in other words: work no one likes to do but no one has automated yet).
The causes of this lost value and added complexity are generally found in two areas: integration with other tools in the chain, and day-to-day operations.
Sometimes, tools aren’t integrated well, and one-off workarounds are needed to create a workable situation. This puts the resilience and stability of the toolchain in jeopardy, and upgrades are skipped because teams are afraid of making changes. As teams leave the toolchain alone to protect these snowflake integrations, performance, security and resilience degrade, and technical debt gradually builds up.
The question is: are there enough gains to warrant using the tool? Is the net effect positive? Is the complexity and toil tax we’re paying to manage this tool acceptable? Aren’t we building up excessive technical debt by choosing to skip upgrades of this tool in the short term?
The tip of the iceberg
To get a better understanding of the additional work, complexity and technical debt of adding another tool to your toolchain, we need to look at the balance between the functional and the non-functional aspects of the tool.
A tool’s functional aspects describe what it does. It’s what people usually look for in a tool: does it have the functionality to solve my problem? Does it do what I need it to do?
The non-functional aspects describe how it does that, and are a measure of the quality of the tool. Non-functional aspects determine things like performance, security and compliance features, ease of upgrading, ease of user management, the potential for building up technical debt, and more.
It’s clear that when you’re looking at implementing a new tool, you need to look beyond just the functionality and consider the rest of the iceberg as well. And as with taxes and debt, you shouldn’t burden future-you with them: it’s better to get ahead of the problem and make sound choices now, rather than fix the mess later.
Toolchain Tax
And while choosing the right tool for your functional requirements is fairly straightforward, getting the non-functional requirements right is critical to a successful initial adoption and long-term success.
We’ll highlight a couple of important areas to help you get started in compiling the right requirements and avoiding the most common pitfalls.
Lifecycle Management
The most important factor in not adding extra burden and toil to your DevOps engineers’ day-to-day is the operational side of the tool, and lifecycle management in particular. In other words: how hard is it to plan and execute upgrades? Are they seamless, or do they require downtime? How many people does it take to keep the application running? Are new versions published to a version-controlled artifact repository, and does the tool have internal pipelines in place for continuous deployment of patches?
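To make that concrete, here is a minimal sketch of the kind of check you want automated rather than done by hand: comparing a tool’s running version against the newest release in an artifact repository. The endpoints, JSON fields and version scheme are assumptions for illustration, not any particular vendor’s API.

```python
import json
import urllib.request

# Hypothetical endpoints: substitute your own artifact repository URL
# and the tool's reported version. Nothing here is a specific vendor API.
REGISTRY_URL = "https://artifacts.example.com/api/tools/scanner/latest"
RUNNING_VERSION = "2.3.1"  # e.g. read from the tool's own /version endpoint

def latest_release(url: str) -> str:
    """Fetch the newest published version from the artifact repository."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["version"]

def upgrade_pending(running: str, latest: str) -> bool:
    """Naive semantic-version comparison: True when an upgrade is due."""
    return tuple(map(int, running.split("."))) < tuple(map(int, latest.split(".")))

if __name__ == "__main__":
    latest = latest_release(REGISTRY_URL)
    if upgrade_pending(RUNNING_VERSION, latest):
        print(f"Upgrade pending: {RUNNING_VERSION} -> {latest}")
    else:
        print("Tool is up to date.")
```

A tool with good lifecycle management runs this kind of logic internally; if you find yourself writing and babysitting scripts like this, that’s toil the tool has pushed onto you.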
Lifecycle management isn’t just about patches and upgrades. User management is another big part of the day-to-day toil of an application, and the better the architecture, UI and automation, the less time needs to be spent on it.
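As a hedged example of what automatable user management can look like, this sketch reconciles a tool’s user list against a central directory export. Both endpoints and the JSON shape are hypothetical.

```python
import json
import urllib.request

# Both endpoints are hypothetical: a directory/IdP export of group
# members, and the tool's own user API. Adjust to whatever you run.
DIRECTORY_EXPORT = "https://idp.example.com/api/groups/devops/members"
TOOL_USERS_API = "https://tool.example.com/api/v1/users"

def fetch_json(url: str) -> list:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def sync_users() -> None:
    """Reconcile the tool's user list against the directory export."""
    desired = {member["email"] for member in fetch_json(DIRECTORY_EXPORT)}
    current = {user["email"] for user in fetch_json(TOOL_USERS_API)}
    for email in desired - current:  # provision missing accounts
        req = urllib.request.Request(
            TOOL_USERS_API,
            data=json.dumps({"email": email}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req)
    # In this sketch, stale accounts are only reported, not deleted.
    for email in current - desired:
        print(f"stale account, review for removal: {email}")

if __name__ == "__main__":
    sync_users()
```

The point isn’t this particular script, but whether the tool exposes enough of an API that you never have to click through user forms by hand.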
Integration
Another important aspect is integration between the tool and the other tools in the chain. In a DevOps toolchain, many tools work in unison in any given pipeline, so it is crucial that each tool can pick up data and metadata from the tool before it and hand data to the tool after it. This applies to the operational aspect (like user management: can it be automated by being fed data automatically, or is it limited to manual input?) as well as the functional aspect (what the tool functionally does in the toolchain).
This means looking at programmatically accessible interfaces like APIs and the data formats they support, as well as deeper integration options like plugins or modules that offer an off-the-shelf integration between tools.
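As an illustration of that hand-off, here is a sketch built on purely hypothetical endpoints: one tool exposes its build metadata over an API, and a small piece of code passes the relevant fields to the next tool in the chain.

```python
import json
import urllib.request

# Illustrative endpoints: the CI tool exposes build metadata, and the
# deployment tool accepts it as input. Neither URL is a real API.
BUILD_METADATA_URL = "https://ci.example.com/api/builds/latest/metadata"
DEPLOY_API_URL = "https://cd.example.com/api/deployments"

def fetch_build_metadata() -> dict:
    """Pick up data from the tool earlier in the chain."""
    with urllib.request.urlopen(BUILD_METADATA_URL) as resp:
        return json.load(resp)

def trigger_deployment(meta: dict) -> None:
    """Hand the relevant fields to the tool later in the chain."""
    payload = {"artifact": meta["artifact_url"], "commit": meta["commit_sha"]}
    req = urllib.request.Request(
        DEPLOY_API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)

if __name__ == "__main__":
    trigger_deployment(fetch_build_metadata())
```

When two tools speak each other’s formats natively, or via an off-the-shelf plugin, even this little glue script disappears, and with it a future source of breakage.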
Future-ready
Can the functionality be extended or changed to keep up with changes and innovations? Modularity and extensibility are important for preventing technical debt; otherwise, the moment you put the tool into your pipeline, it’s already lagging behind.
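Extensibility usually shows up as some form of plugin or hook mechanism. A minimal sketch of the idea (the hook and plugin names are made up for illustration): new behaviour registers against a named extension point instead of patching the tool’s core.

```python
# The hook name and plugin below are illustrative; the pattern is what
# matters: behaviour is added by registering against a named extension
# point, not by modifying the tool's core.
from typing import Callable, Dict, List

_HOOKS: Dict[str, List[Callable]] = {}

def register(hook: str) -> Callable:
    """Decorator that attaches a function to a named extension point."""
    def wrap(fn: Callable) -> Callable:
        _HOOKS.setdefault(hook, []).append(fn)
        return fn
    return wrap

def run_hook(hook: str, *args, **kwargs) -> list:
    """Invoke every plugin registered for this extension point."""
    return [fn(*args, **kwargs) for fn in _HOOKS.get(hook, [])]

@register("post_build")
def notify_chat(build_id: str) -> None:
    print(f"build {build_id} finished")  # stand-in for a real notification

run_hook("post_build", "1234")  # prints: build 1234 finished
```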
Risk Management
Compliance and security are especially important in enterprise environments: at a minimum, your tools must support audit logging and role-based access control, and depending on the tool and the information it handles, the requirements go much further. A great example is having a single overview of risk and compliance across the fragmented ecosystem of cloud vendors, cloud services, service instances, accounts and users.
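To show how small the core of those two minimum requirements is (and why their absence in a tool is hard to excuse), here is a toy sketch that couples every role-based authorization decision to an append-only audit record; the roles and fields are illustrative.

```python
import json
import time

# Toy role model and audit trail; the names and fields are illustrative.
ROLES = {"admin": {"read", "write", "delete"}, "viewer": {"read"}}

def authorize(user: str, role: str, action: str, resource: str) -> bool:
    """Enforce RBAC and record every decision in an append-only log."""
    allowed = action in ROLES.get(role, set())
    record = {
        "ts": time.time(), "user": user, "role": role,
        "action": action, "resource": resource, "allowed": allowed,
    }
    with open("audit.log", "a") as log:  # append-only audit trail
        log.write(json.dumps(record) + "\n")
    return allowed

print(authorize("dana", "viewer", "delete", "prod-db"))  # False, and logged
```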
Preventing DevOps tool sprawl
What has been implicit until now is the amount of work that goes into selecting the right tool (with both functional and non-functional requirements in mind) versus the perceived value of the tool. Is the non-trivial amount of work in selecting a tool worth it, considering the problem it’s solving?
The ecosystem of tools is massive, and likely your toolchain is complex and fragmented across planning, development environments, integration and security testing, performance and load testing, artifact creation and distribution, configuration management and operational monitoring.
And given that you need to spend a generous amount of time selecting and validating each tool, it makes sense to limit the number of tools you have and to look for vendors that solve not a single problem but a number of adjacent challenges in a single tool. This not only limits the amount of time spent on selection and validation, but also limits DevOps tool sprawl.
And it’s not that having many tools is inherently bad, but more tools means more integration surface between them: with n tools, the number of potential pairwise integrations grows on the order of n(n-1)/2, and bridging those gaps often requires custom glue code. It’s this glue code that is the culprit in many outages and broken pipelines, and a major source of technical debt. In this sense, limiting the number of integrations and handovers makes all the difference, operationally.
Wrapping up
We’ve learned that we shouldn’t put what we want before how we do it. Consider the how of your toolchain and the amount of complexity you can handle, and think about ways to collapse that complexity, for instance by integrating security scanning of new builds with your version control, so that merged code gets checked automatically, from a single DevOps tool.
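As a closing illustration of collapsing complexity, here is a hedged sketch of that merge-triggers-scan pattern: a tiny webhook receiver that kicks off a security scan whenever version control reports a merge. The event fields and the scanner command are assumptions; in an integrated platform this wiring is built in, and the glue code disappears entirely.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class MergeHook(BaseHTTPRequestHandler):
    """Tiny webhook receiver: scan every merge reported by version control."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length))
        if event.get("action") == "merged":       # assumed event field
            commit = event.get("commit_sha", "")  # assumed event field
            # 'security-scan' stands in for whatever scanner you standardize on.
            subprocess.run(["security-scan", "--commit", commit], check=False)
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), MergeHook).serve_forever()
```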
I’ll leave you with a valuable resource that helps you in this journey. The Starting and Scaling DevOps in the Enterprise eBook explains how to use things like value stream mapping to figure out what steps your DevOps pipeline consists of, how to scale up DevOps processes in larger organizations, and how to keep the number of tools balanced, preventing unwanted DevOps tool sprawl.
Download that eBook here:
https://amazic.com/starting-and-scaling-devops-in-the-enterprise/