As more DevOps teams rely on an observability platform to continuously collect metrics, traces and log data, they are discovering that storing all that high-cardinality data comes at a high price.
Most organizations, as a result, have been limiting the amount of data they collect to minimize storage costs. The paradox, of course, is that the reason they invested in an observability platform in the first place was to analyze data they often can’t afford to store for more than a few months.
Providers of platforms for storing metrics, traces and log data such as Mezmo have addressed this issue by enabling DevOps teams to set a daily or monthly hard limit on the volume of logs stored. Alternatively, teams can set soft daily or monthly quotas that apply throttling logic to ensure mission-critical log data continues to flow.
DevOps teams can also use soft limits to reallocate storage resources from one team to another based on how much log data each is expected to generate over a specific period of time.
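The difference between a hard limit and a soft quota can be sketched in a few lines of Python. This is only an illustration of the general idea, not Mezmo's implementation or API: the `LogQuota` class, its field names and the `critical` flag are all hypothetical, and a real platform would likely throttle more gradually than this simple accept-or-drop decision.

```python
from dataclasses import dataclass


@dataclass
class LogQuota:
    """Hypothetical daily log quota for one team (not a vendor API)."""

    soft_limit_bytes: int   # above this, only critical logs are stored
    hard_limit_bytes: int   # above this, nothing more is stored
    used_bytes: int = 0

    def admit(self, size_bytes: int, critical: bool) -> bool:
        """Return True if a log line of the given size should be stored."""
        projected = self.used_bytes + size_bytes
        if projected > self.hard_limit_bytes:
            # Hard limit: reject everything, including critical logs.
            return False
        if projected > self.soft_limit_bytes and not critical:
            # Soft quota exceeded: throttle non-critical logs only.
            return False
        self.used_bytes = projected
        return True
```

Under these assumptions, a team that has passed its soft quota keeps storing logs flagged as critical until the hard limit is reached, which is the behavior a soft quota is meant to provide.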
However, providers of observability platforms such as Grafana Labs are now going a step further. Grafana Labs has added an Adaptive Metrics capability to its Grafana Cloud platform that enables DevOps teams to fine-tune what data is collected and stored.
Grafana Cloud employs open source Grafana Mimir software to store data collected in a format originally defined for the open source Prometheus monitoring tool. The Adaptive Metrics aggregation engine transforms metrics at the point of ingestion into versions that have much lower levels of cardinality. Unused or partially used labels are stripped from incoming metrics, reducing the total count of time-series data collected. Adaptive Metrics also recommends aggregations based on an organization’s historical usage patterns, and DevOps teams can choose which aggregation rules to apply.
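To see why stripping a label lowers cardinality, consider a minimal Python sketch. It is not the Adaptive Metrics engine or any Grafana API: the `aggregate_without` function, the sample data and the choice to sum values are assumptions made purely for illustration.

```python
from collections import defaultdict


def aggregate_without(samples, drop_labels):
    """Sum sample values after removing the given labels (illustrative only)."""
    totals = defaultdict(float)
    for labels, value in samples:
        # Series that differ only in a dropped label collapse into one key.
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        totals[kept] += value
    return dict(totals)


# Hypothetical samples: three series distinguished by a per-pod label.
samples = [
    ({"service": "checkout", "pod": "a1"}, 3.0),
    ({"service": "checkout", "pod": "b2"}, 4.0),
    ({"service": "search", "pod": "c3"}, 5.0),
]

# Dropping "pod" leaves one series per service instead of one per pod.
reduced = aggregate_without(samples, {"pod"})
```

In this example three time series become two. In a real environment with thousands of short-lived pods, removing a label nobody queries could shrink the series count far more.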
Dashboards, alerts and historical queries are guaranteed to continue to work as they did before aggregation, with no rewrites needed. Based on results from early users, Grafana Labs reports that Grafana Cloud Adaptive Metrics could eliminate an estimated 20-50% of the time-series data collected with no perceived impact on observability.
Grafana Labs is also helping DevOps teams further reduce storage costs by making it easier to identify metrics that are being collected but not used by anyone on the team.
When needed, DevOps teams can also turn collection of all the data the platform can gather back on to aid a specific investigation into the root cause of a disruption.
As more DevOps teams deploy cloud-native applications, they are also noticing that the volume of metrics, traces and log data being created starts to increase exponentially. The more modern the application environment, the more likely it is that storage costs will become an issue.
It’s still early days as far as adoption of observability is concerned, but in uncertain economic times, there’s generally a lot more sensitivity to the total cost of IT. Many organizations are especially focused on reducing the cost of storing data in the cloud to rein in monthly spending on cloud services. Regardless of the motivation, collecting and storing data that isn’t needed is never the best use of storage resources that, from a cost perspective, are just as finite in the cloud as they are anywhere else.
Less clear is to what degree organizations that embrace observability will need to add a data engineer to their DevOps team, but one way or another someone on the DevOps team will need to pay attention to storage management fundamentals.
In the meantime, the number of cloud-native applications deployed in production environments is only going to increase in the months and years ahead. A recent Splunk report found 58% of respondents expect cloud-native apps will account for a larger percentage of their internally developed applications a year from now. In fact, it’s nearly impossible to manage cloud-native applications without being able to observe interactions between the microservices used to construct them.
Of course, minimizing storage costs is always going to be a higher priority during challenging economic times. Some organizations are even limiting the amount of storage resources made available to individual DevOps teams. After all, the days when everyone assumed cloud storage resources were more or less freely available are now well behind us.