Applications that run in production need to be monitored for potential problems and other issues. Without logging events from infrastructure components or the application itself, your DevOps teams have no insight into how the application and the underlying infrastructure behave. Monitoring trends helps to spot problems early on and also provides valuable audit evidence for various purposes. Since there are so many log collectors on the market, it’s hard to choose the “right one” for your specific case. To guide you on this journey, here is the ultimate checklist for your log collector.
Recap of log management
Prior to the checklist itself, it’s vital to understand what log management actually is so you can put all items in perspective. Log management is all about collecting, filtering, storing, analyzing, and monitoring log data from IT systems, infrastructure resources, and applications. It provides valuable information to spot system issues, detect security threats, verify compliance requirements, and troubleshoot generic problems. Log management collects data from multiple sources and makes it available for use in a centralized location.
To cloud or …
One of the first aspects to decide is whether to choose a single system that supports everything: from modern applications to legacy ones. In essence, this pushes you to select a cloud-based solution or not. Cloud providers offer rich tools and solutions which make heavy use of their (excellent) cloud services. Although that sounds great, keep in mind that this also creates a potential vendor lock-in.
Besides the landing zone of the log management tool itself, you need to decide which integrations need to be supported. Should you support cloud-native services which only run in the cloud, or should you also support workloads that are hosted in your internal data center? Perhaps there is an option to select a single tool that can run partially within a cloud environment and partially on-premises.
Another key question to answer is: does the cloud service meet the compliance requirements of your organization? If not, are there (counter)measures to meet them? Having an open-source-based solution gives you the freedom to adjust (parts of) the tool to your requirements.
Every tool should be reliable and fast, and log management tools are no different. Performance is one of the key factors organizations keep in mind when selecting tools for log management, and it includes multiple characteristics:
- Throughput of data. Your applications and infrastructure can generate a lot of data which all needs to be processed by the log aggregator, so throughput is vital. It’s important that you have the option to scale your infrastructure resources both horizontally and vertically to meet growing data throughput requirements now and in the future.
- Latency should be as low as possible so the data which is being collected and stored can be analyzed in real time. In general, the overhead of the tool itself should be low and it should use the available network connections as efficiently as possible. Carefully consider the aspects that can negatively impact latency: for example, the transformation of the collected data or the filtering of messages which should be ignored to save costly storage.
- Scaling up and down should be supported to keep up with growing demand as well as to reduce costs when demand is low (for example outside office hours and during holiday seasons).
In general, it’s wise to invest time to seek a log management tool that fits your main purpose. If you require a solution that needs to work on an embedded device and should not consume too much CPU, memory, and other resources, it’s best to choose a lightweight solution that does not include too many features which take a long time to load and process. Furthermore, don’t accept plugins that make processing the data slower than needed. It might be possible to switch them off or uninstall them completely.
Horizontally scaling your number of instances is quicker when you have a small system with fewer components that need to interact with each other. When scaling down, valuable (cloud) resources are also released quicker thus saving you costs.
From a functional point of view, you should select which features are needed for your organization. It’s smart to use a classification scheme to map all of them against the desired priority such as “must haves” and “nice to haves”.
Collecting information from different sources is one of the key features which can’t be missed. These sources include databases, streaming data services, cloud services, “traditional log files,” as well as message queues. The desired solution should support every data source that you require. Besides input sources, the target systems for the output of processed log data are also important. This can be another database, a centralized storage system, or a cloud service that is dedicated to the results of your log management solution.
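To make this concrete, a minimal FluentD configuration pairs an input source with an output target. The sketch below tails a “traditional log file” and forwards it to Elasticsearch as one possible target; the file paths, tag, and hostname are example values, and the Elasticsearch output assumes the fluent-plugin-elasticsearch plugin is installed.

```
# Tail a "traditional" application log file (path and tag are examples)
<source>
  @type tail
  path /var/log/app/app.log
  pos_file /var/lib/fluentd/app.log.pos
  tag app.access
  <parse>
    @type json
  </parse>
</source>

# Forward all matching events to Elasticsearch as one possible target
<match app.**>
  @type elasticsearch
  host elasticsearch.internal.example
  port 9200
  logstash_format true
</match>
```

Swapping the `<match>` block is all it takes to redirect the same input to a different target system, which is why broad input and output support matters so much.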
Plugins and filters
Select which plugins you need. “Thick” log management tools might have only a few plugins and provide many capabilities out of the box. This makes the tool more robust, but perhaps also increases latency and demands a greater footprint in terms of resource usage. Plugins fill the gap between a lightweight tool and its desired capabilities. FluentD, one of the most popular log management systems, provides a huge list of all the available plugins. The plugins are sorted into multiple categories such as output systems, monitoring, and cloud-specific plugins.
Filters are important to select what you need from the massive amount of log information that enters your log management system. For example, you can filter on the log level (info, debug, warning, etc.) and add extra elements and other metadata to your logging data before it is processed further. Another feature is the ability to add tags to specific log messages to group them for later analysis.
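In FluentD terms, both ideas map onto built-in filter plugins: `grep` to drop unwanted levels and `record_transformer` to enrich events. The sketch below assumes events carry a `level` field and use an `app.**` tag; the field values and metadata keys are examples.

```
# Drop debug-level noise before it reaches (costly) storage
<filter app.**>
  @type grep
  <exclude>
    key level
    pattern /^debug$/
  </exclude>
</filter>

# Enrich the remaining events with extra metadata for later analysis
<filter app.**>
  @type record_transformer
  <record>
    environment production
    team payments
  </record>
</filter>
```

Filtering this early in the pipeline keeps both storage costs and downstream processing load under control.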
These functional requirements are just the most important ones. There are certainly many more, such as data integration features and features that let you send notifications in specific situations. Other aspects include ways to route data, data buffering capabilities, and community support.
Very useful (security) filters
Security always deserves a special place, and log management systems are no exception. Let’s take a look at some very useful features which focus on security-related aspects.
Log files often contain sensitive data like email addresses, usernames, IP addresses, phone numbers, and access tokens. This is privacy-related data that should be masked. A plugin or built-in feature such as the “anonymizer” plugin for FluentD does exactly this. Another option is to anonymize data, such as credit card numbers, in a format-preserving way. The data format then stays in place, so you can still recognize the value as a credit card number even after it has been processed.
A third option is to sanitize sensitive input by using regular expressions. Think of sanitizing hostnames, IP addresses, DNS records, user agents, etc.
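A regular-expression-based sanitization step can be sketched with FluentD’s built-in `record_transformer` filter in Ruby mode. The field name (`message`), the tag, and the email pattern below are illustrative assumptions; adapt them to your own log format.

```
# Mask email addresses in the "message" field with a regular expression
# (field name, tag, and pattern are examples; tune them to your log format)
<filter app.**>
  @type record_transformer
  enable_ruby true
  <record>
    message ${record["message"].gsub(/[\w.+-]+@[\w-]+\.[\w.]+/, "[MASKED]")}
  </record>
</filter>
```

The same `gsub` approach works for hostnames, IP addresses, or any other pattern you can express as a regular expression.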
All of these actions prevent sensitive data from being processed, forwarded, and stored in online systems in which it virtually “never gets deleted.”
Every log line should have a proper timestamp, and this is especially true when tracing evidence for security-related events. There are many plugins to help you properly format every log line with the correct timestamp, including the full date as well as the current time in the correct timezone. One example is the “elasticsearch timestamp check” plugin for FluentD.
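Correct timestamps usually start at parse time. The sketch below shows how a FluentD tail source can extract and normalize the event time from each log line; the file path, tag, and line layout are example assumptions, and `time_format` must match what your application actually writes.

```
# Parse the timestamp out of each line so events carry the correct
# date, time, and timezone (expression and time_format are examples)
<source>
  @type tail
  path /var/log/app/app.log
  pos_file /var/lib/fluentd/app.log.pos
  tag app.events
  <parse>
    @type regexp
    expression /^(?<time>[^ ]+) (?<level>[^ ]+) (?<message>.*)$/
    time_key time
    time_format %Y-%m-%dT%H:%M:%S%z
  </parse>
</source>
```

Getting this right at ingestion saves you from reconstructing timelines later during a security investigation.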
Kentaro Yoshida wrote a plugin for FluentD to capture and rewrite tags, similar to what mod_rewrite does for Apache, just before the output stream is sent to another system. This plugin also gives you the option to change Apache logs by domain, status code, user agent, request URI, etc. It supports regular expressions, which you can utilize to specify exactly when it should be triggered.
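The plugin in question is fluent-plugin-rewrite-tag-filter. A hedged sketch of how it might re-tag Apache access logs by status code looks like this; the incoming tag and the resulting tag names are illustrative choices.

```
# Re-tag Apache access logs by status code so errors can be routed
# separately (requires fluent-plugin-rewrite-tag-filter; tags are examples)
<match apache.access>
  @type rewrite_tag_filter
  <rule>
    key status
    pattern /^5\d\d$/
    tag apache.error
  </rule>
  <rule>
    key status
    pattern /^\d+$/
    tag apache.ok
  </rule>
</match>
```

Re-tagged events then flow back through the pipeline, so a later `<match apache.error>` block can route server errors to, say, an alerting target while ordinary traffic goes to bulk storage.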
As you’ve seen in this article, there’s a lot to explore when it comes to a log management system. This article provided a starting point for every IT professional who seeks a solution for their needs, with an overview of common factors and requirements to take into account when selecting a tool for the job. Browsing back through the checklist gives you a view on whether or not to select a cloud solution and on the integration points of log collectors. In addition, we’ve talked about resource usage, performance, and latency. Last but not least, we’ve covered key considerations about input/output-related features as well as various security-related filters to help you prevent sensitive data from ending up in target systems. I hope this has helped you get off the ground with your choices.