Observability is a hot topic in the IT world these days. It is often discussed through the lens of the “three pillars of observability”: logs, metrics and traces. Indeed, these telemetry signal types help us understand what happened, where it happened and why it happened in our system.
Observability ≠ logs + metrics + traces
However, logs, metrics and traces are, by themselves, not observability. In fact, many organizations collect logs, metrics and traces, and still end up with poor observability.
We need to change our mindset.
Firstly, there’s no reason to limit ourselves to just logs, metrics and traces. We humans tend to favor the Rule of Three, but other signal types may be required to gain better observability, such as events and continuous profiling.
But more importantly, we need to remember that these signals are, after all, just the raw data. What we’re really after is the insight and the root cause analysis that let us understand our system. I favor the following definition of observability, which makes this clear:
Observability is the capability to allow a human to ask and answer questions about the system.
With this definition, it becomes evident that observability is essentially a data analysis problem: the more questions we can ask and answer about the system, the more observable it is.
So what is Observability all about then?
- It’s about answering ad-hoc questions
- It’s about collecting data from different sources, formats and types
- It’s about enriching and correlating data
- It’s about unified querying, visualization and alerting
- It’s about fusing telemetry data to answer questions
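To make the "correlate, then ask" idea a bit more concrete, here is a minimal Python sketch. It assumes logs and trace spans share a `trace_id` field; the record shapes, field names and the helper functions `correlate` and `slow_services_with_errors` are all hypothetical, made up for illustration rather than taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical telemetry records; the field names are illustrative assumptions.
logs = [
    {"trace_id": "t1", "level": "ERROR", "message": "payment declined"},
    {"trace_id": "t2", "level": "INFO", "message": "order placed"},
    {"trace_id": "t3", "level": "ERROR", "message": "timeout calling inventory"},
]
spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 1200},
    {"trace_id": "t1", "service": "payments", "duration_ms": 950},
    {"trace_id": "t2", "service": "checkout", "duration_ms": 80},
    {"trace_id": "t3", "service": "inventory", "duration_ms": 3000},
]


def correlate(logs, spans):
    """Group log records and trace spans by their shared trace_id."""
    merged = defaultdict(lambda: {"logs": [], "spans": []})
    for record in logs:
        merged[record["trace_id"]]["logs"].append(record)
    for span in spans:
        merged[span["trace_id"]]["spans"].append(span)
    return dict(merged)


def slow_services_with_errors(merged, threshold_ms):
    """Ad-hoc question: which services were slow in requests that logged an error?"""
    services = set()
    for bundle in merged.values():
        if any(r["level"] == "ERROR" for r in bundle["logs"]):
            services.update(
                s["service"]
                for s in bundle["spans"]
                if s["duration_ms"] >= threshold_ms
            )
    return sorted(services)


merged = correlate(logs, spans)
answer = slow_services_with_errors(merged, threshold_ms=900)
print(answer)
```

Neither the logs nor the spans can answer this question alone; it only becomes answerable once the two signals are fused on a common key. A real system would of course do this in a query engine rather than in application code.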
Want to read more? Check out my article on InsideBIGDATA, where I elaborate on each one of these elements.
And after you’ve read it, do share your thoughts and comments here. I’d love for it to be a starting point for a community discussion on where we want to take observability next.