Many organizations run their applications in the cloud and have already seen the benefits of increased agility and flexibility. To prevent problems, it’s important to watch those applications closely to make sure they keep running smoothly. Your DevOps teams need to react to incidents and problems themselves, because there is no longer an operations team that does it for them. Monitoring tools can help, but application logging is just as important. In this article we will explore important notes about application logging in the cloud. These tips and tricks help your company from an organizational point of view, as well as the DevOps teams that actually develop and run the application.
In today’s world, logging is essential to the Agile way of working, in which fast feedback and responses from the underlying system are used to drive the next product increment. On a more operational level, logging is essential to troubleshoot issues caused by application components or other infrastructure-related aspects. Logging also helps to gain operational insight and intelligence by feeding dashboards that focus on useful business KPIs. Whatever your logging solution is, make sure it is top of mind for upper management.
Your cloud provider helps
First of all, be sure to explore the built-in logging solution of your cloud provider. AWS offers Amazon CloudWatch as its default logging service. Nearly every resource you create streams to this service. Azure offers Azure Monitor Logs, which collects and organizes log and performance data from your resources. Google Cloud offers Logs Explorer for any deployed resources. It enables a DevOps team to search, sort, and analyze logs with flexible queries you define yourself. And last but not least: Alibaba Cloud enables you to use its so-called “Log Service”. This is a cloud-native solution to process logs, metrics and traces in real time. It also helps to collect, transform and ship data to other systems.
Three basic rules
Since logging is a critical non-functional requirement of your application, treat it very seriously.
- It’s a good practice to always log your resources and applications whenever you can. There is never a valid reason to switch logging off.
- Your log lifecycle management and availability matter. It’s important that your log files are always available and in the right format. Prefer structured and standardized formats over unstructured ones. Pay attention to when, where and how you store logs, and on top of that, how you archive and back them up.
- Securing your logs is also vital since they can contain evidence in case something goes really wrong. Auditors demand traceability of your processes and logs play a vital role here. Sometimes logs contain (unwanted) sensitive data so you need to protect them very well.
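To make the "structured and standardized format" rule more concrete, here is a minimal sketch in Python that writes every log entry as one JSON object per line. The field names (timestamp, level, logger, message) are assumptions chosen for illustration, not a standard you must follow.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # ISO 8601 timestamp in UTC, so the timezone is always explicit
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_json_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to standard error."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

A line-per-entry JSON format like this can be parsed by most log collectors without custom rules, which is the main reason to prefer it over free-form text.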
With these things in mind, the following notes help DevOps teams that build applications to construct their logs in the best possible way, which in turn maximizes their agility and speed.
Logging for developers
Developers have a lot of freedom to choose how they want to log certain activity of their applications. In a modern environment, which is focused on microservices, it’s important to know what to log and what not to log.
Log the following
- Log the name of the function or method so you understand the context of the business logic that emits the other logs. This is vital to understand which part of the system you are looking at. It should be exactly the name of the function, not a prefix.
- Log every message, both incoming and outgoing to completely understand the flow of the transaction. This includes API endpoints, URLs as well as the headers and the actual body of the requests and responses. Every message being logged should have a unique ID to trace it across different components that make up the total application. This ID also helps to close the feedback loop in which you collect actions generated by those logs.
- Non-functional aspects of an application should also be logged: think of system events like startups of components, database connection (attempts), loaded configuration files, etc. Be sure to log data operations such as who or what attempts to log in or log out, which role it entails, the action it tries to perform, etc. Furthermore, log all CRUD-related events. Performance and security-related events should not be forgotten either. Think of network latency, memory consumption and CPU usage, to name a few. To capture security issues early on, log failed login attempts and their metadata: origin IP address, username, timestamp, etc. Also log unusual system behavior such as traffic spikes, a huge increase in the number of logged-in users, resource consumption, etc.
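As an illustration of the first and last points, the sketch below uses a Python decorator to log the exact name of a function when it is entered and left, together with how long the call took. The logger name "app.events" and the function create_report are hypothetical and only there to make the example runnable.

```python
import functools
import logging
import time

logger = logging.getLogger("app.events")  # hypothetical logger name


def log_call(func):
    """Log the exact function name and its duration around every call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.perf_counter()
        logger.info("enter %s", func.__qualname__)
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on success and on failure, so the duration is always logged
            elapsed_ms = (time.perf_counter() - started) * 1000
            logger.info("exit %s duration_ms=%.1f", func.__qualname__, elapsed_ms)

    return wrapper


@log_call
def create_report(title: str) -> dict:
    # Hypothetical business function used only for illustration.
    return {"title": title}
```

Note that the decorator deliberately logs only the function name and timing, not the arguments, because arguments can easily contain data that should stay out of the logs.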
Don’t log the following
The following list provides a brief overview of what NOT to log, especially for applications running in the public cloud.
- Never log any financial data such as transactions, bank accounts and credit card details.
- Don’t log information which can be traced to an individual (remember DPIA?) such as first and last names, postal addresses, email addresses, birthdays, usernames, phone numbers, etc.
- Never log passwords, secrets, authentication tokens and other sensitive information like API keys. Use the methods available in your CI/CD pipelines and runtime systems to hide those from your log files at all times. Don’t make it too easy for a hacker to capture these and compromise your system.
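As a safety net for the last point, a logging filter can mask secrets that slip into a message by accident. The sketch below assumes secrets appear as key=value pairs; the list of key names is hypothetical and would need to match what your own systems use. It does not replace the rule above: the best protection is still not to log these values in the first place.

```python
import logging
import re

# Hypothetical key names; extend the list to match your own systems.
_SENSITIVE = re.compile(r"(?i)\b(password|token|api_key|secret)=([^\s,;]+)")


class RedactingFilter(logging.Filter):
    """Mask the values of sensitive key=value pairs before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        rendered = record.getMessage()  # message with its arguments filled in
        record.msg = _SENSITIVE.sub(r"\1=***", rendered)
        record.args = ()  # arguments are already merged into the message
        return True  # keep the record, only its content is changed
```

Attach the filter to a handler with handler.addFilter(RedactingFilter()) so every record that passes through that handler is checked.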
Log levels matter
Every DevOps team should define the proper log level for its application based on the environment in which it is deployed. From the bottom up, log levels range from TRACE (low-level information like stack traces) through WARN (red flags but no direct showstoppers) to FATAL (the application stops working and a disaster is likely to happen).
Production environments should log at the INFO level and above, and nothing more detailed. Developers who are debugging their application in the development environment are helped by the TRACE level, which reveals every log message from this level upward.
Basically everyone is free to choose which log level applies to which (type of) event. It’s best to make agreements within the DevOps team to follow the same approach. Even better: create standards and guidelines per technology stack or even at the organizational level to be consistent everywhere. It really helps to find out the severity of incidents in case multiple different teams investigate an issue.
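One way to put such an agreement into practice is to derive the log level from the environment in a single place. The following Python sketch assumes an environment variable called APP_ENV and three environment names; both are hypothetical. Python’s standard logging module has no TRACE level, so DEBUG is used as the most detailed level here.

```python
import logging
import os

# Hypothetical mapping; the environment names are assumptions.
LEVEL_BY_ENVIRONMENT = {
    "development": logging.DEBUG,
    "test": logging.DEBUG,
    "production": logging.INFO,
}


def level_for(environment: str) -> int:
    """Return the agreed log level, falling back to INFO for unknown names."""
    return LEVEL_BY_ENVIRONMENT.get(environment.lower(), logging.INFO)


def configure_logging() -> int:
    """Set the root logger level from the (assumed) APP_ENV variable."""
    level = level_for(os.environ.get("APP_ENV", "production"))
    logging.getLogger().setLevel(level)
    return level
```

Falling back to INFO for unknown environments is a design choice: a typo in the variable then results in production-like logging rather than in overly detailed logs.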
Log messages should be unique
It’s important to construct unique log messages across the entire system. If you do not do so, you don’t know which component you are looking at. Unique messages really help to trace issues when your log files grow and collect messages from different locations in the system. Pay special attention to parent-child relationships in source code, such as inheritance or implementation classes in your Java applications. Find a way to denote the name of the parent in case you have multiple classes that share the same name.
In addition to this, make sure the source of the log (called origin) is very clear. Most logging frameworks can provide the most detailed levels, such as the calling class, function or method. A bit less detailed but very precise: the filename including its location and the line number of the log message.
All of this helps to find the unique log message and solve issues quickly.
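In Python’s standard logging module, for example, the origin can be added to every message through the format string. The sketch below shows one possible layout; the order of the fields is an assumption, while the attribute names (name, filename, lineno, funcName) are the ones the module provides.

```python
import logging

# Logger name, file, line number and function name make the origin explicit.
ORIGIN_FORMAT = (
    "%(asctime)s %(levelname)s %(name)s "
    "%(filename)s:%(lineno)d %(funcName)s - %(message)s"
)


def origin_formatter() -> logging.Formatter:
    """Return a formatter that prefixes each message with its origin."""
    return logging.Formatter(ORIGIN_FORMAT)
```

When loggers are created with logging.getLogger(__name__), the name field contains the full module path, which also helps to tell apart classes or modules that share the same short name.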
Add a traceable ID
Since a lot of teams use distributed systems, such as application components running in containers, isolated database services and authentication/authorization services, it’s important to keep an eye on where your message originates from and where it’s been forwarded to. Assign every event a unique ID from the start of the request (transaction) and pass it on to every subsequent component it travels through. This way, you can trace it from the very beginning until it reaches the end of the system.
Without it, you can’t correlate different events that belong together and collectively provide the right context for the actual root cause when you are troubleshooting a recurring problem. It also makes it much easier to filter huge log files to zoom in on a specific event.
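Within a single Python component, this idea can be sketched with a context variable that holds the ID of the current request and a logging filter that copies it onto every record. The names request_id_var, RequestIdFilter and start_request are hypothetical. How the ID travels between components (for example in an HTTP header) depends on your stack and is not shown here.

```python
import contextvars
import logging
import uuid

# Holds the ID of the request being handled ("-" outside of a request).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def start_request(incoming_id=None) -> str:
    """Reuse an ID forwarded by the caller, otherwise create a new one."""
    request_id = incoming_id or uuid.uuid4().hex
    request_id_var.set(request_id)
    return request_id
```

With the filter attached to a handler, a format string such as "%(request_id)s %(message)s" puts the ID on every line, so filtering a large log file down to one transaction becomes a simple text search.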
Add the right context
Context and clarity are everything if you or your colleague want to interpret a log message.
Log messages that contain failures of the application should include background information that tells the log analyst what happened. For example, if a request failed due to an incorrect header, the application should also log the actual request body and the header itself. This helps to trace back what happened.
Another example would be to log a proper description next to the system-level event that provides more information about what happened. For example: Successfully created a new report in the database (ReportID: XYZ). By also logging the ReportID, the person who checks the log can immediately jump to the right record in case they want to investigate it.
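In Python, the report example could look like the sketch below. The logger name "reports" and the function log_report_created are hypothetical. Besides putting the ID into the message text, the sketch also passes it through the extra argument, which stores it as a separate attribute on the log record so that a structured formatter could output it as its own field.

```python
import logging

logger = logging.getLogger("reports")  # hypothetical logger name


def log_report_created(report_id: str) -> None:
    """Log a successful insert together with the ID of the new record."""
    logger.info(
        "Successfully created a new report in the database (ReportID: %s)",
        report_id,
        extra={"report_id": report_id},  # also available as record.report_id
    )
```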
Organize your log files
Your physical log files form the heart of every logging solution. It’s vital to store logs in a structured format that can be parsed by other systems. Logs should be split into multiple files (log rotation), where a new file is generated automatically at a fixed interval. For example: use a single file per day, including weekends. Be sure to store them in the correct folders (e.g. per year, per month) to find them easily. It’s also important to name them consistently. Sorting log files by filename works best with a format like YYYY-MM-DD-*, and not with the day-first or month-first date formats that are common in some locales.
Speaking of that, write log files in English only, so everyone understands them and every system can interpret them. I don’t know of any systems which can read Japanese-based log files. Date/time formats should also include the right timezone to prevent any problems in case a server is hosted elsewhere.
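A minimal sketch of daily rotation with explicit UTC timestamps, using the TimedRotatingFileHandler from Python’s standard library, could look as follows. The retention of 30 files and the line layout are assumptions. With when="midnight", the handler appends a YYYY-MM-DD suffix to rotated files by default, so they sort correctly by name.

```python
import logging
import time
from logging.handlers import TimedRotatingFileHandler


def build_daily_file_handler(path: str) -> TimedRotatingFileHandler:
    """Rotate the log file at midnight UTC and keep 30 rotated files."""
    handler = TimedRotatingFileHandler(
        path,
        when="midnight",   # start a new file once per day
        backupCount=30,    # assumed retention; older files are deleted
        utc=True,          # rotate on UTC midnight, not server-local time
        encoding="utf-8",
    )
    formatter = logging.Formatter(
        "%(asctime)sZ %(levelname)s %(name)s - %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    formatter.converter = time.gmtime  # render timestamps in UTC
    handler.setFormatter(formatter)
    return handler
```

Writing timestamps in UTC with a trailing Z is one way to make the timezone explicit. Storing files per year and month, as suggested above, would be handled by the archiving step and is not part of this sketch.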
Typical challenges in the enterprise
Large organizations deal with a lot of systems, applications and countless ways to implement business features. Logging is no different. Typical challenges you see around this topic include the following:
- All applications use different logging frameworks, even applications that share the same programming language(s).
- DevOps teams choose their own (centralized) logging solution or log driver which captures their logs and forwards those to the end-solution. Try to use a centralized solution that works for all.
- Some applications log too many (sensitive) details, while others log too little or nothing at all.
- Every team uses a different logging format and structure. It’s also very difficult to align on these levels.
- Logs produced by team X can’t be understood by team Y. It helps to follow the above-mentioned tips to make sure everyone that deals with logs understands what’s going on.
- It’s rather easy to track applications that do not have any logging method enabled in the cloud, but it’s nearly impossible to enforce certain standards.
Logging is a key aspect of every application development team that operates in the cloud. Since logging helps to trace problems and other issues while the application is being developed or is already running, it’s important to pay special attention to the way you utilize this discipline. I hope this article helped to shed some light on what is important about logging and that you can apply the practical tips in your organization.