Testing your IaC templates: methods and tools to remember
Forget the days of manually provisioning and maintaining your virtual machines, network infrastructure, firewalls, and security groups. Your infrastructure is created from IaC templates. Since application deployments depend on it, it’s important to have a reliable target environment, so those IaC templates need to be tested to ensure your infrastructure behaves as expected. Problematic application deployments should be rolled back as soon as possible, but by then the damage is already done. It’s better not to deploy applications at all while your infrastructure is in an unknown or unstable state.
Setting the scene
To simplify the problem area of this article, we’ll focus on a sample application that requires some infrastructure components that can run in any cloud. We’re deploying a containerized application that runs in a Kubernetes cluster. The application itself is a simple guestbook that shows entries to website visitors. It also offers an option to sign up and to post a new entry. Besides this, every feature can be accessed using an API. All actions are logged using Logstash and can be analysed using Elasticsearch.
You can translate this example to your own situation and/or (cloud) environment.
Test levels and scope
Before you write a single line of code you need to create a testing strategy. In our example, we already selected the application so we can skip portfolio management and the selection procedure. A lot of testers use the Agile Test Pyramid to define their tests. In the Test Pyramid, you’ll find the following levels of tests. From least effort and easiest to automate to maximum effort and (nearly) impossible to automate:
- Unit tests (very small tests to validate individual pieces of source code).
- Component tests (testing of specific modules and/or programs).
- Integration tests (tests of various modules as a group to verify if they work well together).
- System tests (testing the complete integrated system to validate the corresponding requirements).
- Manual tests (any test that does not fall into one of the levels above and is carried out by hand).
Since IaC templates deploy pieces of infrastructure with no business logic (at least not logic that brings competitive advantage to companies) we’ll skip the unit tests. Manual testing is also out of scope since we’re focused on tools and automation to conduct our tests. Therefore our attention is on component tests, integration tests, and system tests.
Supportive languages
It’s good to know which languages are supported by a (wide) range of test tools, so you can actually conduct various types of tests. As of 2022, the following IaC languages and tools are common:
- Ansible
- AWS CloudFormation
- Azure Resource Manager
- CFEngine
- Chef
- Env0
- Google Cloud Deployment Manager
- Packer
- Pulumi
- Puppet (enterprise)
- SaltStack
- Terraform
- Vagrant
Dedicated IaC platforms such as oak9 or Spacelift are excluded from the list. Since containers are the number one deployment mechanism nowadays, in our example we’ll focus on Kubernetes (manifests) and Helm charts. The same principles apply to the other IaC languages and tools.
Test Tools
Since there are a bunch of tools available to help us test our IaC scripts, we have to narrow down our selection a bit more. Not all tools support component tests or integration tests. However, they act as a good starting point for your validations.
Helm (Lint)
Helm itself comes with Helm lint to validate the internal structure of your Helm charts. It is a powerful way to spot syntax errors in your YAML files as well as typos in your source code. Just invoke helm lint <template file/dir> to scan your Helm chart.
A practical example is the validation of the “name property” which is required. It also helps to work consistently by validating other rules like comparing the name of the directory and the Helm chart name itself.
==> Linting .
[ERROR] Chart.yaml: name is required
[ERROR] Chart.yaml: directory name (elasticsearch) and chart name () must be the same
[ERROR] templates/: validation: chart.metadata.name is required

Error: 1 chart(s) linted, 1 chart(s) failed
This only checks the validity of the Helm chart itself. However, it does not validate the Kubernetes manifests (templates) that are part of the Helm chart itself. Thus it would not discover errors like the following in ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: IngresS
metadata:
  name: {{ $appName }}
  labelS:
Notice the capital S at the end of both IngresS and labelS.
Helm offers a feature to generate Kubernetes manifest files which are normally processed “under the hood”. It is possible to spot these errors with the following command that parses individual YAML files. However, it’s not optimal since it does not provide many details about the error. Our example would be:
Command: helm template . -f templates/
And the subsequent error:
Error: failed to parse templates/ingress.yaml: error converting YAML to JSON: yaml: did not find expected node content
The error is very generic and there is no hint on which line the error is present. To catch more errors you might consider other tools such as Kubeval.
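To illustrate what a more helpful message could look like, here is a minimal sketch in Python that scans a rendered manifest for `kind:` lines and reports the line number together with a suggestion. It is not how Helm or Kubeval work internally. The `KNOWN_KINDS` list, the `check_kinds` function, and the `guestbook` name are assumptions made for this example only.

```python
import difflib
import re

# Hypothetical allow-list; a real validator would use the cluster's API schema.
KNOWN_KINDS = ["Deployment", "DaemonSet", "Ingress", "Job", "Pod", "Service"]


def check_kinds(manifest_text):
    """Return one message per `kind:` line whose value is not a known kind."""
    by_lower = {kind.lower(): kind for kind in KNOWN_KINDS}
    problems = []
    for number, line in enumerate(manifest_text.splitlines(), start=1):
        match = re.match(r"^kind:\s*(\S+)\s*$", line)
        if not match or match.group(1) in KNOWN_KINDS:
            continue
        kind = match.group(1)
        # A case-insensitive lookup catches casing typos such as "IngresS".
        suggestion = by_lower.get(kind.lower())
        if suggestion is None:
            close = difflib.get_close_matches(kind, KNOWN_KINDS, n=1)
            suggestion = close[0] if close else None
        hint = f", did you mean {suggestion!r}?" if suggestion else ""
        problems.append(f"line {number}: unknown kind {kind!r}{hint}")
    return problems


# Rendered manifest with the typo from the example above.
manifest = """apiVersion: networking.k8s.io/v1
kind: IngresS
metadata:
  name: guestbook
"""
for problem in check_kinds(manifest):
    print(problem)
```

A check like this only covers one field, which is exactly why a schema-based tool is the better choice for real charts.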
Kubeval
Kubeval is an external tool and can be used in addition to Helm lint. Kubeval validates your YAML manifests against JSON Schemas generated from the Kubernetes OpenAPI specification. Policy-style rules, by contrast, are typically written in Rego, the tool-independent policy language of Open Policy Agent (OPA), and checked with a tool such as Conftest. Together this brings very powerful options to include a validation schema.
Example rules are the following:
- Don’t run your containers as root.
- Always provide memory and CPU limits to prevent system exhaustion.
- Be sure to block ADD_CAPABILITIES such as mounting sensitive host volumes as read-write.
- Never run a container in privileged mode.
The CPU limitation rule is like this:
"cpu": {
  "$id": "#/properties/cpu",
  "type": "string",
  "title": "CPU limits",
  "description": "CPU limits to avoid resource exhaustion.",
  "$ref": "#/definitions/cpu",
  "examples": [
    "768m"
  ]
}
Besides validation on a functional level, the CPU limit is also validated against the right pattern. This way it’s impossible to enter wrong values, such as specifying the value in megabytes instead of millicores (the m in 768m). It is yet another way to avoid discovering wrong templates too late in the process. A practical example project that also highlights the usage of some default values can be found at Helm Chart Testing on Github.
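The pattern check described above can be sketched in a few lines of Python. The regular expression here is an assumption for illustration (whole millicores such as 768m, or plain cores such as 2 or 0.5) and is not taken from the linked project. The names `CPU_PATTERN` and `is_valid_cpu` are hypothetical.

```python
import re

# Assumed pattern: whole millicores ("768m") or plain cores ("2", "0.5").
CPU_PATTERN = re.compile(r"\d+m|\d+(\.\d+)?")


def is_valid_cpu(value):
    """Return True if value looks like a CPU quantity such as '768m' or '0.5'."""
    return isinstance(value, str) and CPU_PATTERN.fullmatch(value) is not None


# "768Mi" is a memory unit and "fast" is not a quantity, so both are rejected.
for candidate in ["768m", "0.5", "768Mi", "fast"]:
    print(candidate, is_valid_cpu(candidate))
```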
Conftest is an alternative to Kubeval that offers roughly the same types of static tests, with policies written in the Rego language.
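The logic behind the example rules listed earlier in this section can be sketched as plain Python checks over a single container spec. In practice you would express such rules as policies for your validation tool. This sketch only shows what the rules inspect. The field names (`securityContext`, `runAsNonRoot`, `privileged`, `capabilities.add`, `resources.limits`) follow the Kubernetes container spec, while the `check_container` function and the sample data are assumptions for this example.

```python
def check_container(container):
    """Return a list of rule violations for one container spec (a dict)."""
    violations = []
    security = container.get("securityContext", {})
    limits = container.get("resources", {}).get("limits", {})

    # Rule: don't run containers as root.
    if not security.get("runAsNonRoot", False):
        violations.append("container may run as root")
    # Rule: always provide memory and CPU limits.
    for resource in ("cpu", "memory"):
        if resource not in limits:
            violations.append(f"missing {resource} limit")
    # Rule: block added capabilities.
    if security.get("capabilities", {}).get("add"):
        violations.append("extra capabilities added")
    # Rule: never run in privileged mode.
    if security.get("privileged", False):
        violations.append("privileged mode enabled")
    return violations


good = {
    "name": "guestbook",
    "securityContext": {"runAsNonRoot": True},
    "resources": {"limits": {"cpu": "768m", "memory": "256Mi"}},
}
bad = {"name": "guestbook", "securityContext": {"privileged": True}}

print(check_container(good))
print(check_container(bad))
```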
Kubetest
So far so good. Besides relatively simple tests on static source code files, you also need to write component tests. The Kubetest documentation on readthedocs.io provides some very good examples that highlight how to do that. Kubetest requires you to actually deploy your Kubernetes manifests to your Kubernetes cluster. Developers will love this since Kubetest uses unit-test style conventions such as assertions.
Some example use cases for what you can test:
- Test to see if your daemonset is actually deployed. Therefore you need to validate if the actual running resource is not None.
- If you require 5 replicas of your deployment (this means: 5 pods), you can validate it.
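As a rough sketch of the second use case, the replica check boils down to comparing the requested and the ready replica counts. The dictionary below is a hand-written stand-in for what the cluster would return, using the `spec.replicas` and `status.readyReplicas` fields of a Kubernetes Deployment. The `replicas_ready` helper is hypothetical and not part of Kubetest.

```python
def replicas_ready(deployment, expected):
    """Return True if the deployment requests and reports `expected` ready replicas."""
    desired = deployment.get("spec", {}).get("replicas")
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    return desired == expected and ready == expected


# Hand-written stand-in for the object a cluster would return.
deployment = {
    "spec": {"replicas": 5},
    "status": {"readyReplicas": 5},
}

assert deployment is not None, "deployment should exist"
assert replicas_ready(deployment, 5), "expected 5 ready replicas"
```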
Another powerful feature is the option to actually check the contents of the Pod itself. Suppose you want to test a simple web application that should write a message “hello world”. You can fire an HTTP request to query the output of the running Pod and capture the message. An assertion then compares the expected and the actual value. If they match, your test is successful.
Example:
# A php-apache Pod should run exactly one container
containers = pod.get_containers()
assert len(containers) == 1, 'php-apache pod should have one container'
# Query the running Pod over HTTP and check the page content
response = pod.http_proxy_get('/')
assert '<h1>Welcome to this PHP based website</h1>' in response.data
In essence, this is an implementation of a smoke test. It is not just a technical test that validates whether your Pod is running, it also checks the contents of the received message. After this simple test, you can carry on with the other tests that might take longer due to their complexity or dependencies on other systems.
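The smoke test idea can be tried without a cluster. The following sketch uses only the Python standard library: a small stub HTTP server stands in for the Pod, and the test checks both the status code and the page content. The `StubHandler` class and the `smoke_test` and `run_against_stub` functions are hypothetical names for this example and are not part of Kubetest.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

EXPECTED = "<h1>Welcome to this PHP based website</h1>"


class StubHandler(BaseHTTPRequestHandler):
    """Stand-in for the application that would run inside the Pod."""

    def do_GET(self):
        body = f"<html><body>{EXPECTED}</body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the test output quiet


def smoke_test(url):
    """Return True if url answers with status 200 and the expected banner."""
    # An empty ProxyHandler avoids sending local requests through a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        body = response.read().decode()
        return response.status == 200 and EXPECTED in body


def run_against_stub():
    """Start the stub on a free local port, run the smoke test, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return smoke_test(f"http://127.0.0.1:{server.server_port}/")
    finally:
        server.shutdown()
        server.server_close()


print(run_against_stub())
```

Against a real cluster, the URL would point at the Pod or Service instead of the local stub.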
Kubetest is based on Python so you need to have a Python run-time environment in your CI/CD pipeline as well as a valid Kubeconfig file that connects to the Kubernetes cluster of your choice.
Terratest
Perhaps the most extensive testing tool for Helm charts and Kubernetes manifests is Terratest. It is a very powerful testing tool written in the Go language. Developers who are familiar with Go should welcome this and will quickly be up to speed writing their tests.
Both Kubeval and Conftest perform actual “template tests” that validate the syntax of your templates. Consider integration tests as semantic tests. This is the area Terratest covers.
Integration tests tend to be rather complex since they span multiple components that need to work together.
They also involve provisioning and destroying the components involved in the right order. Therefore they take more effort than component tests. They also require a fully operational Kubernetes cluster and take more time to run.
A typical functional workflow for a Helm chart which depends on some other resources looks like this:
- Create a random namespace in your Kubernetes cluster.
- Create a Persistent Volume and a Persistent Volume Claim to store the logs needed for your Elasticsearch application.
- Roll out your Helm chart that provisions the needed Daemonsets, Deployments, Replicasets, Pods, Services, Jobs, etc.
- Validate if your Persistent Volume Claims are actually bound.
- Check if your logs are actually stored in your Persistent Volume.
- Validate the return value of a simple test that reads the logs.
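Keep in mind that you always need to start with a “clean cluster”. Spinning up a fresh cluster for every integration test eats away precious time, so this is often not desired. Deploying into a fresh, randomly named namespace, as in the first step of the workflow, keeps test runs isolated without that cost.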
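The ordering aspect of the workflow above (set up in order, tear down in reverse) can be sketched with Python's `contextlib.ExitStack`. Every step here is a stub that only records what it would do. The `run_workflow` function and the step descriptions are assumptions for this example and do not talk to a cluster.

```python
import contextlib
import uuid


def run_workflow(log):
    """Run setup steps in order and register each teardown to run in reverse."""
    # A random suffix keeps parallel or repeated runs from colliding.
    namespace = f"test-{uuid.uuid4().hex[:8]}"
    with contextlib.ExitStack() as cleanup:
        log.append(f"create namespace {namespace}")
        cleanup.callback(log.append, f"delete namespace {namespace}")

        log.append("create persistent volume and claim")
        cleanup.callback(log.append, "delete persistent volume and claim")

        log.append("install helm chart")
        cleanup.callback(log.append, "uninstall helm chart")

        log.append("run validations")
    # Leaving the with-block runs the callbacks last-in, first-out.
    return namespace


steps = []
run_workflow(steps)
print("\n".join(steps))
```

Because the callbacks are registered right after each setup step, a failure halfway through still cleans up everything that was created up to that point.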
Some code snippets to show what things look like in Terratest:
// Global function that wraps all other tests
func TestDeploymentOfWebApplication(t *testing.T) {
    // the path where the Helm chart is located
    path := "./helm-charts/"
    ...
}
And a function call to actually validate the response of the data storage action that happened inside the container.
// The endpoint requires a "tunnel" to the intended Pod to be able to get inside
endpoint := fmt.Sprintf("https://%s", tunnel.Endpoint())
// Note: recent Terratest releases expect a *tls.Config argument after the URL
http_helper.HttpGetWithRetryWithCustomValidation(
    t,
    endpoint,
    retries,
    sleep,
    // Pass only when the status is 200 and the body confirms the write
    func(statusCode int, body string) bool {
        isOk := statusCode == 200
        logsWrittenToDisk := strings.Contains(body, "Logs written to Persistent Storage")
        return isOk && logsWrittenToDisk
    },
)
} // closes the enclosing test function
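The retry-with-custom-validation idea behind this call is language independent. Below is a minimal Python sketch of the same pattern, not a port of the Terratest helper: it keeps calling a fetch function until the validator accepts the response or the retries run out. The `get_with_retry` function and the fake responses are assumptions for this example.

```python
import time


def get_with_retry(fetch, retries, sleep_seconds, validate):
    """Call fetch() until validate(status, body) is True or retries run out.

    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, retries + 1):
        status, body = fetch()
        if validate(status, body):
            return attempt
        if attempt < retries:
            time.sleep(sleep_seconds)
    raise AssertionError(f"validation failed after {retries} attempts")


def validate(status, body):
    """Accept only a 200 response whose body confirms the write."""
    return status == 200 and "Logs written to Persistent Storage" in body


# Fake endpoint: unavailable first, then starting, then healthy.
responses = iter([
    (503, ""),
    (200, "starting"),
    (200, "Logs written to Persistent Storage"),
])
print(get_with_retry(lambda: next(responses), retries=5, sleep_seconds=0, validate=validate))
```

Retrying matters here because a freshly deployed Pod often needs a moment before it answers correctly, and a single failed request would otherwise fail the whole test.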
Besides these simple snippets, it’s also interesting to view other cases, such as:
- Validating RBAC roles and permissions (see Kubernetes RBAC example test).
- Testing an actual Kubernetes service (see Kubernetes basic example service check test).
More options
Testing Kubernetes-based resources is nice, but there is a lot more to explore. Sticking with Terratest, you can also test your Terraform IaC files and validate Packer scripts and plain Docker configuration files. The website of Gruntwork offers a huge list of examples and details on the various options they offer.
Conclusion
Testing your IaC templates is vital to deploy your cloud-based applications in a reliable way. Test tools such as Kubetest, Terratest, and Kubeval help to define test cases for template (static) tests, component tests, and even integration tests. It’s possible to test your templates for AWS, Azure, and Kubernetes, as well as for other popular IaC tools like Packer, Vagrant, and Ansible. In this article, I showed some examples based on a dummy application that uses Elasticsearch to analyse its logs. Tests are written in general-purpose languages such as Python and Go, so developers should find them easy to pick up. Follow the links in the article for more background information.
If you have questions related to this topic, feel free to book a meeting with one of our solutions experts or send an email to sales@amazic.com.