Cost optimization has been one of the hottest topics in observability (and beyond!) lately. Everyone is striving to be efficient, spend money wisely, and get the most out of every dollar invested. At Logz.io, we recently embarked on a very interesting and fruitful data volume optimization journey, reducing our own internal log volume by a whopping 50%. In this article, I’ll tell you exactly how we achieved this result.
Background
We always strive to use the observability tools we’ve developed ourselves, i.e. we ‘eat our own dog food’ 🐶. Logging, of course, is no exception. All internal system logs end up in one of the Logz.io accounts that we use every day to monitor the health of dozens of microservices and perform all kinds of troubleshooting in complex distributed system environments.
Logging Cost Components
From a cost perspective, logging costs are a linear function of log volume and hot retention time: O(m×n), where m is the daily log volume (GB) and n is the number of days we need these logs. The retention time is often determined by business requirements rather than technical ones, so we started by focusing on the log volume.
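To make that concrete, here’s a toy back-of-the-envelope sketch; the volumes, retention window, and per-GB rate below are made up for illustration and are not Logz.io pricing:

```python
# Toy model: hot logging cost grows with (daily volume) x (retention days).
daily_volume_gb = 3000        # m: hypothetical daily log volume (~3 TB/day)
retention_days = 14           # n: hypothetical hot retention window
price_per_gb_day = 0.03       # illustrative rate, not an actual Logz.io price

hot_footprint_gb = daily_volume_gb * retention_days       # GB kept searchable at any given time
daily_cost = hot_footprint_gb * price_per_gb_day           # cost of keeping that footprint hot each day

print(f"Hot footprint: {hot_footprint_gb:,} GB, ~${daily_cost:,.0f}/day")
# Halving the daily volume halves both numbers, which is exactly the lever we pulled.
```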
Step 0: Find Out Where We Are
Before we could tackle log volume optimization, we needed to know the current situation. Answering the following questions was a good starting point:
- What is the current daily log volume?
- What is the retention period?
- Are there seasonal variations (weekly, monthly, yearly, etc.) or other interesting usage patterns we should be aware of?
For us, the answer to the first question was a hefty 2.7 TB to 3.7 TB daily in rainy November 2022:
That seemed a bit too much, so we decided to get to the bottom of it.
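If you want to establish a similar baseline yourself, here’s a rough sketch of how daily volume and weekly seasonality could be derived from exported per-log size data; the export file and field names are hypothetical (inside Logz.io, built-in visualizations over the Log Size field answer the same questions):

```python
# Rough baseline sketch: sum per-log sizes into a daily series and eyeball seasonality.
# Assumes a hypothetical export with one row per log: a timestamp and a size in bytes.
import pandas as pd

logs = pd.read_csv("log_metadata_export.csv", parse_dates=["timestamp"])  # hypothetical export file

daily_gb = (
    logs.set_index("timestamp")["log_size_bytes"]   # hypothetical field name
        .resample("D")
        .sum()
        / 1024**3
)

print(daily_gb.describe())                                  # min/mean/max daily GB
print(daily_gb.groupby(daily_gb.index.dayofweek).mean())    # weekday averages reveal weekly patterns
```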
Step 1: Identify and Deal with Useless Logs (aka Garbage)
Not all logs are of equal value: some are used very rarely, some become completely obsolete and irrelevant over time, while others are used on a daily basis. The Data Optimization Hub was a very handy tool for sifting through the piles of logs and understanding which types of logs were taking up the most space while providing little or no value:
We treated the different categories of logs as follows:
- Infrequently used logs: We set drop filters so that these logs are not indexed but only archived in AWS S3/Azure Blob Storage. If needed, the logs can be quickly restored using the Power Search feature.
- Obsolete, irrelevant logs: We worked with individual development teams to lower the log level of certain noisy logs or completely remove anything that became irrelevant.
Using a table like the one above, we could easily identify the biggest log consumers (both in terms of total log size and number of logs) per log type.
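To make the drop-filter idea from the list above concrete, here is a conceptual sketch of the behavior; it is illustrative only and not the actual Logz.io drop filter implementation, which you configure in your account:

```python
# Conceptual sketch of what a drop filter does: logs matching a rule are archived
# (e.g. to S3) but never indexed, so they stop counting towards hot volume.
# Illustrative only - not the actual Logz.io drop filter implementation.

DROP_RULES = [
    {"field": "type", "value": "healthcheck"},   # hypothetical low-value log type
    {"field": "log_level", "value": "TRACE"},
]

def should_index(log: dict) -> bool:
    """Return False for logs that match any drop rule (archive-only)."""
    return not any(log.get(r["field"]) == r["value"] for r in DROP_RULES)

# Example: this log would be archived but not indexed.
print(should_index({"type": "healthcheck", "message": "ok"}))   # -> False
```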
Step 2: Reduce the Size of Logs We Actually Need
At this point, we were left with the logs we were actively using. Playing around with different visualizations and searching through the Log Size field (you can easily enable it in the account settings) brought us to the realization that some logs were much heavier than others. We had logs that were 5 KB each, and others that were over 1 MB. That’s quite a difference! Very heavy logs are usually a sign of an issue and should be investigated.
While examining the heavier log types, we found that we generally did not need all of the information they contained: some of it was repetitive, and some simply was not that useful. We worked with the responsible teams to change the way the heaviest logs are generated, which also had an extremely positive effect.
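As a hypothetical illustration of the kind of change involved (the field names and limits below are invented), trimming repetitive fields and truncating oversized payloads before a log is shipped can shrink the heaviest records dramatically:

```python
# Hypothetical sketch of slimming down heavy logs before they are shipped:
# drop fields that repeat on every record and truncate oversized payloads.
import json

MAX_PAYLOAD_CHARS = 2_000
REDUNDANT_FIELDS = {"full_request_headers", "cached_stack_trace"}   # invented field names

def slim(log: dict) -> dict:
    slimmed = {k: v for k, v in log.items() if k not in REDUNDANT_FIELDS}
    payload = slimmed.get("payload")
    if isinstance(payload, str) and len(payload) > MAX_PAYLOAD_CHARS:
        slimmed["payload"] = payload[:MAX_PAYLOAD_CHARS] + "...[truncated]"
    return slimmed

heavy = {"message": "request failed", "payload": "x" * 1_000_000, "cached_stack_trace": "..."}
print(f"{len(json.dumps(heavy)) / 1024:.0f} KB -> {len(json.dumps(slim(heavy))) / 1024:.0f} KB")
```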
Step 3: Set Up Log Volume Monitoring
Once all of the above steps were completed, we recommended setting up ongoing log volume monitoring processes to keep the volume under control. For example:
- Set up alerts on account utilization logs (you can enable these in the account settings if you have not done so already)
- Schedule a recurring review of the Data Optimization Hub. Reach out to Logz.io support and your account manager if you would like to work on this with your team.
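As a rough illustration of what such a volume guardrail can look like (the thresholds and numbers below are hypothetical; in practice the built-in account utilization alerts cover this), the logic is simply a comparison against a rolling baseline:

```python
# Hypothetical volume guardrail: compare today's ingested volume against a rolling
# baseline and flag anything that looks like a runaway spike.
from statistics import mean

def volume_spike(daily_gb_history: list[float], today_gb: float, spike_factor: float = 1.3) -> bool:
    """Return True if today's volume exceeds the recent average by more than spike_factor."""
    baseline = mean(daily_gb_history[-14:])   # rolling two-week baseline
    return today_gb > baseline * spike_factor

# Example: a 2.9 TB day against a ~1.5 TB baseline would trigger the alert.
history_gb = [1500, 1480, 1520, 1490, 1510, 1530, 1470, 1500, 1495, 1505, 1515, 1485, 1490, 1500]
print(volume_spike(history_gb, today_gb=2900))   # -> True
```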
Results
By following these simple steps, we were able to reduce our log volume by 50%, to an average of 1.5 TB daily:
We significantly reduced the total log volume and, as a result, substantially cut our costs. But the journey never ends: we have to stay attentive and monitor the situation regularly to catch usage spikes early.
Learn more about our Data Optimization Hub and how Logz.io can help transform your observability strategy by signing up for a free trial today.