Fluentd – logs everything from everywhere

Every component in your application landscape should log (part of) its activity for troubleshooting and security auditing purposes. This helps developers check debug logs while they are developing an application, and helps the security team check for suspicious behavior. As the number of application components and the underlying infrastructure grows, (centralized) log management becomes essential to gather the information needed to make sure everything runs smoothly. Fluentd is the Cloud Native Computing Foundation’s open-source log aggregator, solving your log management issues and giving you visibility into the insights your logs hold. Fluentd is well maintained and has a broad and active community. Last month, version 1.1.11 was released.

DevOps and logging

A proper logging solution for your environment is important to operate your cloud-based business. Here are a couple of reasons why log management matters and what you can do with your logs:

  • Process and aggregate the logs from a number of different components. This can be infrastructure resources, cloud-native services, containers, supporting services and more. A centralized solution is important as the number of unique services increases due to the adoption of container-based microservices.

  • Collect CI/CD metrics to gain insight into the application development pipeline, capturing failing pipeline jobs and the output of the tests that run at each deployment.
  • Collect production metrics to capture technical errors and warnings. Logs also provide feedback about potential performance bottlenecks.
  • Logs provide insight into how your customers use your applications functionally. You learn a lot about which features are popular and which areas need improvement.
  • Audit trails should be logged and stored in a safe place to act as evidence when tracing back specific actions in the production environment.
  • Logs act as input to monitor the health of your system.

DevOps is all about streamlining the efforts of Dev and Ops. Short feedback loops are important for continuous improvement, and logs help to achieve this goal.

What is Fluentd?

Fluentd is an open-source data collector that allows you to standardize data collection. Treasure Data created Fluentd, maintains and sponsors the project, and offers a SaaS-based version for streamlined day-to-day operations.

Fluentd helps teams create a single, standardized practice for streaming and collecting logs across different applications, teams and infrastructure, so no-one has to re-invent the wheel.

Fluentd can be used to collect and unify different log streams and forward these to different (external) systems. It can filter and transform the log stream while ingesting and forwarding logs, for example, splitting logs from a single source and forwarding each to a different system.

Benefits

Fluentd structures logs as JSON, a standard format, instead of a custom one. JSON is easily processed by machines, and a lot of developers are familiar with it since many tools and programming languages use it to integrate with each other. That’s a good start.
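
To make this concrete, here is a rough, illustrative sketch of what a single web-server access-log line could look like once Fluentd has turned it into a tagged, timestamped JSON record (the tag, field names and values below are made up for illustration):

# raw access-log line as it appears on disk:
#   192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET /index.html HTTP/1.1" 200 2326
# the same event inside Fluentd: a tag, a timestamp and a JSON record
tag:    apache.access
time:   2013-02-28 12:00:00 +0900
record: {"host":"192.168.0.1","method":"GET","path":"/index.html","code":200,"size":2326}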

Integrations

From an architecture perspective, Fluentd uses modular building blocks. A lot of them integrate perfectly well with a large number of external tools and systems. Just to name a few:

  • Gather access logs from Apache
  • Act as an application logger for back-ends and front-ends
  • Analyze log files using Hadoop or a database system like MySQL or MongoDB
  • Archive logs and send them to AWS S3
  • Collect system logs
  • Generate alerts for Nagios

Check out the plugins page for a full list of options and other integration points. Since plugins handle these functions, the core of the Fluentd package remains small and relatively easy to use. This lowers the barrier for DevOps teams to integrate it with their applications and resource components so they can concentrate on their core duties, instead of creating these integrations themselves.
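
As a small sketch of how such an integration is wired together, the configuration below tails an Apache access log and prints every parsed event to stdout; both the in_tail and stdout plugins ship with Fluentd itself. The file paths and the tag are example values, not the definitive way to set this up:

# read and parse the Apache access log (path and pos_file are example values)
<source>
  @type tail
  path /var/log/httpd/access_log
  pos_file /var/log/td-agent/httpd-access.pos
  tag apache.access
  <parse>
    @type apache2
  </parse>
</source>

# print every event carrying the apache.access tag to stdout
<match apache.access>
  @type stdout
</match>

Swapping the stdout output for, say, an S3 or MongoDB output plugin only requires changing the match section; the source side stays the same.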

Centralized

Since Fluentd captures logs from across applications and infrastructure components and forwards them to a centralized location, you only have a single place to look for logs. This makes analyzing logs significantly easier, as everything is in one place already, and makes it easier to correlate insights across different log sources.

Especially in the IoT world, with a large number of sensors and connected devices and decisions that must be made within a very short timeframe, this is essential. For these use cases, you might consider Fluent Bit, the lightweight version which requires even fewer resources.

Reliable

A critical requirement is reliability: logs should not vanish while in transit. Fluentd supports memory- and file-based buffering to prevent data loss. It also supports high-availability setups to ensure log collection keeps running when a single system fails.
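
As a hedged sketch of what that looks like in practice, the output section below forwards events to an aggregator node and buffers them on disk, so they survive a restart or a temporarily unreachable destination. The hostname, buffer path and intervals are example values, not recommendations:

# forward events to an aggregator, buffering them on disk in between
<match app.**>
  @type forward
  <server>
    host fluentd-aggregator.example.com   # example hostname
    port 24224
  </server>
  <buffer>
    @type file
    path /var/log/fluentd/buffer          # example on-disk buffer location
    flush_interval 10s
    retry_max_times 5
  </buffer>
</match>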

Main functions

In a nutshell, Fluentd works with three main functions:

  • Filtering: criteria to select and limit the number of log messages (e.g. filter on a specific keyword in every log line)
  • Buffering: buffer the output data before it is sent to another system (e.g. MySQL insert queries)
  • Routing: decide what to do with the log stream (e.g. route events based on a specific origin IP address)

For a complete list of features and options, check out the config-file webpage. It provides examples of how to load and configure this file.
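
As a minimal sketch of filtering and routing combined, the configuration below keeps only events whose message field contains the word ERROR and then routes backend and frontend events, by tag, to different destinations. The tags, field name and destinations are assumptions made for illustration:

# keep only events whose "message" field contains ERROR
<filter app.**>
  @type grep
  <regexp>
    key message
    pattern /ERROR/
  </regexp>
</filter>

# route by tag: backend events to a remote forwarder, frontend events to stdout
<match app.backend.**>
  @type forward
  <server>
    host backend-logs.example.com   # example destination
    port 24224
  </server>
</match>

<match app.frontend.**>
  @type stdout
</match>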

Practical example: EKS logs to CloudWatch

Let’s see Fluentd in action and make things more practical. In this example, we’re going to use Fluentd to send the logs of the Kubernetes cluster components of an EKS cluster to AWS CloudWatch. We’ll also capture the logs of the applications we have deployed onto this EKS cluster. CloudWatch is the cloud-native AWS service for storing logs, and it also acts as the central log destination for other AWS cloud-native services.

After following this example, you should have new log groups in CloudWatch which serve as the entry point for your host (worker node system) logs, data-plane (kubelet, kube-proxy and container runtime) logs, and container (application) logs.

AWS log groups

In this example, the cluster is named “sample-eks-cluster”, and that name comes back in the log group paths.

In CloudWatch, you will see the following log paths:

  • /aws/containerinsights/sample-eks-cluster/host

    (logs from /var/log/dmesg, /var/log/secure, and /var/log/messages)

  • /aws/containerinsights/sample-eks-cluster/dataplane

    (the logs in /var/log/journal for kubelet.service, kubeproxy.service, and docker.service)

  • /aws/containerinsights/sample-eks-cluster/application

    (all log files in /var/log/containers)

Notice that containerinsights appears in the log path. This refers to Container Insights, a relatively new AWS cloud-native service that helps to analyze logs and metrics for any containerized workload in AWS. You can change the log path in the Fluentd config file.
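
For orientation, an output section that writes to such a CloudWatch log group typically looks roughly like the sketch below, based on the fluent-plugin-cloudwatch-logs output plugin. The exact parameters and log group naming live in the fluentd.yaml manifest you download later in this example, so treat this purely as an illustration:

# write application events to a CloudWatch Logs group (illustrative values only)
<match application.**>
  @type cloudwatch_logs
  region eu-west-1                       # example region
  log_group_name /aws/containerinsights/sample-eks-cluster/application
  log_stream_name_key stream_name
  auto_create_stream true
</match>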

Namespace

First, create a dedicated namespace for Fluentd. Put the following code in a new file called “fluentd-namespace.yaml” and replace the name-of-the-namespace placeholder.

apiVersion: v1
kind: Namespace
metadata:
  name: <name-of-the-namespace>

And execute:

kubectl apply -f fluentd-namespace.yaml

Install Fluentd as a Daemonset

Fluentd is installed as a so-called DaemonSet, which guarantees it runs on every worker node of your cluster. This is needed to collect logs from the node itself as well as from the Pods that are scheduled on these nodes.

Create a ConfigMap to store the name of the Kubernetes cluster and the AWS region in which it is deployed.

Fill in the placeholders and execute it:

kubectl create configmap cluster-info \
--from-literal=cluster.name=sample-eks-cluster \
--from-literal=logs.region=<aws-region> -n <name-of-the-namespace>
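
Before deploying anything, you can double-check the values by inspecting the ConfigMap that was just created (the namespace placeholder is the one you chose earlier):

kubectl get configmap cluster-info -n <name-of-the-namespace> -o yaml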

Download the Fluentd DaemonSet manifest so you can deploy it to your Kubernetes cluster:

wget https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluentd/fluentd.yaml -O fluentd.yaml

Open fluentd.yaml and change the namespace to reflect the one you want to use.

Execute:

kubectl apply -f fluentd.yaml

This deploys the following resources in your cluster:

  • A Service Account to be used by Fluentd
  • A ClusterRole and ClusterRoleBinding which grant the permissions needed to read the logs from all of the components
  • The ConfigMap which holds the main configuration file of Fluentd. This is the same file as mentioned before – any change you need should be made here.
  • The DaemonSet which is the actual running application. Two important variables here are the name of the cluster and the AWS region; these are taken from the ConfigMap you created earlier.

For the default behavior, no changes are needed. If all went well, you should see the log groups appear in AWS CloudWatch, as in the screenshot shown earlier.
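
If the log groups do not show up, a quick sanity check is to verify that the DaemonSet and its pods are actually running in the namespace you chose (the resource names come from the downloaded manifest, so adjust them if yours differ):

kubectl get daemonset -n <name-of-the-namespace>
kubectl get pods -n <name-of-the-namespace> -o wide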

Extend with Cluster metrics

From here, you can also extend this solution with cluster metrics. Cluster metrics measure the health of your Kubernetes cluster, and the results also end up in the Container Insights service of AWS. You need to install the CloudWatch Agent onto your cluster to collect these metrics. That’s a great topic for a next article, so stay tuned.

Closing words

To summarize, Fluentd is a very useful application to collect, process, and forward logs from different sources to a centralized logging service. It has a huge number of plugins, so it integrates very well with other tools. Besides that, it’s easy to install, as the practical example above shows. Fluentd answers the need for a proper logging solution in the cloud-native landscape.
