In the labyrinth of IT systems, logging is a fundamental beacon guiding operational stability, troubleshooting, and security. In this quest, however, organizations often find themselves inundated with a deluge of logs. Each action, every transaction, and the minutiae of system behavior generate a trail of invaluable data—verbose, intricate, and at times, overwhelming.
The sheer size and verbosity of these logs strain storage infrastructure, inflate cloud bills, and challenge the efficacy of data management strategies. In the current financial climate, pressure is growing to reduce observability costs without sacrificing the quality of observability itself.
In this article, I will dissect the challenges posed by logging volumes and explore strategies for effective log management. From distinguishing and prioritizing essential logs to strategic storage allocation and defining optimal retention policies, I will outline pragmatic steps to reconcile the necessity of data abundance with the imperative of cost efficiency.
Prioritize Valuable Logs
Start by identifying essential logs. As a common example, an application typically emits logs at multiple levels: debug, info, warning, and error. During real-time troubleshooting, the team primarily relies on error and warning logs to pinpoint issues.
In contrast, verbose debug logs, although detailed, are rarely used for immediate problem-solving in production. This is a classic example of a low-value log, and one that is typically emitted at a high rate.
In these cases, it’s best to avoid emitting them at all, for example by setting the threshold level of the logger or log appender to Info or higher. Most logging libraries and SDKs support this setting, as do components such as a syslog server.
Where this is not an option, such as with third-party or legacy code, you can filter out and drop these logs in intermediary log collectors or in your backend log analytics platform. These filtering capabilities can be found in many logging frameworks.
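As a minimal sketch of such a threshold, here is how it looks with Python's standard `logging` module. The logger name `checkout-service` and the log messages are made up for illustration, and the in-memory stream only stands in for wherever your service actually writes its logs.

```python
import io
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("checkout-service")
logger.propagate = False  # keep the example isolated from the root logger

# In-memory stream so the result is easy to inspect; a real service
# would typically write to stdout, a file, or a log shipper.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

# Threshold: records below INFO are discarded at the source.
logger.setLevel(logging.INFO)

logger.debug("cart contents: %s", {"sku": 42})  # dropped
logger.info("order placed")                      # kept
logger.error("payment gateway timeout")          # kept

emitted = stream.getvalue().splitlines()
print(emitted)
```

Only the info and error lines reach the handler, so the debug record never consumes shipping, indexing, or storage resources downstream.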
Filtering out by log level is one example, but mature frameworks will support conditions on any field in your logs, as well as composite expressions combining multiple such conditions and fields. With these expressions you can capture various logs that you identify as low value in your system, like low-importance housekeeping utilities.
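To make the idea of composite expressions concrete, here is a small sketch in plain Python rather than in any particular collector's filter syntax. The field names (`level`, `service`) and the `cache-janitor` housekeeping service are assumptions for the example; real collectors express the same logic in their own configuration language.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, str]               # one parsed log line
Predicate = Callable[[Record], bool]  # a condition evaluated on a record


def field_equals(field: str, value: str) -> Predicate:
    """Condition on a single field of the record."""
    return lambda record: record.get(field) == value


def any_of(*conditions: Predicate) -> Predicate:
    """True when at least one condition matches (logical OR)."""
    return lambda record: any(c(record) for c in conditions)


def all_of(*conditions: Predicate) -> Predicate:
    """True when every condition matches (logical AND)."""
    return lambda record: all(c(record) for c in conditions)


# Low value here means: any DEBUG log, or INFO logs from a
# hypothetical housekeeping service called "cache-janitor".
is_low_value = any_of(
    field_equals("level", "DEBUG"),
    all_of(field_equals("service", "cache-janitor"),
           field_equals("level", "INFO")),
)


def drop_low_value(records: Iterable[Record]) -> Iterator[Record]:
    """Yield only the records that do not match the low-value expression."""
    return (r for r in records if not is_low_value(r))


sample = [
    {"level": "DEBUG", "service": "checkout", "message": "cart state dump"},
    {"level": "INFO", "service": "cache-janitor", "message": "evicted 120 keys"},
    {"level": "INFO", "service": "checkout", "message": "order placed"},
    {"level": "ERROR", "service": "cache-janitor", "message": "eviction failed"},
]
kept = list(drop_low_value(sample))
print([r["message"] for r in kept])
```

Note that the error from the housekeeping service survives the filter: the composite condition drops only its routine info-level chatter.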
Remember, the closer to the source you filter your logs, the fewer processing and storage resources your system must spend on these low-value logs.
Troubleshooting vs. Compliance Logs
When analyzing the value of your logs, also ask yourself why the log is valuable and why you wish to keep it. You may find that some logs aren’t used for everyday troubleshooting, yet are required for compliance and audit purposes, and cannot be filtered out.
Reserve your primary observability stores for logs crucial to everyday troubleshooting. Rarely accessed logs that are required for compliance or occasional audits can be archived in cost-effective storage such as AWS S3 or similar inexpensive object storage. This tiered data architecture is sometimes referred to as hot-cold storage, where the primary, fully indexed store is the “hot” tier and the cheaper object storage is the “cold” tier.
In case of an audit or other need, these logs can be restored into primary storage with full indexing for searching. Some platforms also support partial restore, bringing back only the data you need based on expressions similar to the filters described above. Some solutions even offer search capabilities directly on the cold storage, such as analyzing data stored in S3 with Amazon Athena.
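The routing and partial-restore ideas can be sketched in a few lines. This is an illustration only: the two Python lists stand in for a real indexed store and a real object store, and the `category` field with its `audit` and `compliance` values is a hypothetical tagging convention, not something a specific platform prescribes.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

hot_store: List[Record] = []   # stand-in for the indexed, searchable store
cold_store: List[Record] = []  # stand-in for inexpensive object storage

# Hypothetical convention: logs tagged with these categories are kept
# for compliance only and are not needed for everyday troubleshooting.
COLD_CATEGORIES = {"audit", "compliance"}


def route(record: Record) -> str:
    """Send a record to the hot or cold tier and report which one."""
    tier = "cold" if record.get("category") in COLD_CATEGORIES else "hot"
    (cold_store if tier == "cold" else hot_store).append(record)
    return tier


def partial_restore(matches: Callable[[Record], bool]) -> int:
    """Copy only the matching archived records back into the hot tier."""
    restored = [r for r in cold_store if matches(r)]
    hot_store.extend(restored)
    return len(restored)


sample = [
    {"category": "app", "message": "payment timeout"},
    {"category": "audit", "user": "alice", "message": "role changed"},
    {"category": "audit", "user": "bob", "message": "login"},
]
tiers = [route(r) for r in sample]

# During an audit, bring back only the records about one user.
restored = partial_restore(lambda r: r.get("user") == "alice")
print(tiers, restored, len(hot_store))
```

The archived copies stay in the cold tier after a restore, so the hot tier can be trimmed again once the audit is over.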
Roll Up Logs Into Metrics
In other cases, you may find that logs are indeed valuable for your real-time monitoring and troubleshooting. But how are you using them? In many cases, you may find that the logs are used solely to derive metrics, such as HTTP status code frequencies.
In such cases, these logs can be rolled up into actual metrics, namely numerical data points that form a time series. Time series data is far more compact than raw, verbose textual logs, so the roll-up can dramatically reduce the storage overhead of this data set.
Moreover, metrics stored as time series are far more flexible and convenient to query and visualize, and they let you perform elaborate trend analysis with the full power of dedicated query languages and tools such as PromQL, Prometheus and Grafana.
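As a rough sketch of the roll-up described in this section, the snippet below counts HTTP status codes per minute from parsed access-log records. The record shape (`timestamp`, `status`) and the sample values are assumptions for the example; in practice the roll-up usually runs in the log pipeline or the observability platform rather than in application code.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def rollup_status_counts(records: Iterable[Dict]) -> Counter:
    """Count records per (minute, HTTP status) pair."""
    counts: Counter = Counter()
    for record in records:
        ts = datetime.fromisoformat(record["timestamp"])
        # Truncate to the minute so each bucket is one time-series point.
        minute = ts.replace(second=0, microsecond=0).isoformat()
        counts[(minute, record["status"])] += 1
    return counts


# Hypothetical parsed access-log records.
access_logs = [
    {"timestamp": "2024-01-15T10:00:05", "status": 200, "path": "/cart"},
    {"timestamp": "2024-01-15T10:00:40", "status": 200, "path": "/checkout"},
    {"timestamp": "2024-01-15T10:00:52", "status": 500, "path": "/checkout"},
    {"timestamp": "2024-01-15T10:01:10", "status": 200, "path": "/cart"},
]

counts = rollup_status_counts(access_logs)
for (minute, status), value in sorted(counts.items()):
    print(minute, status, value)
```

Four verbose log lines collapse into three small data points, and the ratio improves as traffic grows, since each bucket stays one number no matter how many requests fall into it.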
Optimal Retention Periods
Let’s say you find that certain logs are valuable as raw logs for troubleshooting. In this case, ask yourself how long you genuinely need to retain them for. This is known as the retention period of the logs. Often, operational logs queried for problem-solving are only a few days old.
What happens beyond that period? If the logs offer no further value, just purge them. If you are uncertain about potential future needs, you can instead periodically archive these older logs through a process like “log rotation.” Log management platforms may offer these mechanisms out of the box, configurable as part of the retention policy.
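The decision above boils down to a small rule. The sketch below assumes a seven-day retention period and a simple archive-or-purge switch, both of which are illustrative values rather than recommendations; a log management platform would normally apply such a policy for you.

```python
from datetime import datetime, timedelta, timezone


def retention_action(
    log_time: datetime,
    now: datetime,
    retention: timedelta = timedelta(days=7),  # assumed retention period
    archive_expired: bool = False,
) -> str:
    """Decide what to do with a log based on its age."""
    if now - log_time <= retention:
        return "keep"
    # Past the retention period: archive if future need is uncertain,
    # otherwise purge.
    return "archive" if archive_expired else "purge"


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
recent = now - timedelta(days=2)
old = now - timedelta(days=30)

print(retention_action(recent, now))                    # keep
print(retention_action(old, now))                       # purge
print(retention_action(old, now, archive_expired=True)) # archive
```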
Determine the Best Logging Path For You
Effectively managing logging data involves a combination of selective storage, conversion of log data into actionable metrics, and defining appropriate retention periods. By trimming unnecessary logs, strategically allocating storage, and purging or archiving old logs, businesses can optimize their logging practices, reducing costs and streamlining operational efficiency, while keeping effective observability.
Remember, the key is not just collecting logs but smartly managing them to derive maximum value without incurring unnecessary expenses. Logz.io offers this flexibility as part of our Open 360™ observability platform. Sign up today for a free trial to see how it works.