Many developers don’t know what instrumentation really is, and those who do don’t really understand the black magic that takes an application and makes it emit telemetry, especially when automatic instrumentation is involved.
On top of that, each programming language has its own tricks. I wanted to unpack this loaded topic on my podcast, OpenObservability Talks. For this episode I invited Eden Federman, CTO of Keyval, a company focused on making observability simpler. Eden is the creator of two open source projects, Odigos and the Go automatic instrumentation that is now part of OpenTelemetry, so he knows instrumentation inside and out.
What is Instrumentation?
Eden explained that instrumentation is “the process of making applications report back to us.” It includes the observability signals of traces, metrics and logs.
“You need to report all those different signals somehow,” Eden said. “So, instrumentation is the process of changing your application to report all those signals back. There are mainly two kinds of instrumentation. You can do it either manually or automatically, and there are pros and cons to both.”
Engineers sometimes treat these as alien concepts, but I usually tell them that what they’ve always done with logging is in fact instrumentation, just without calling it that. When you add a “printf” or a similar line of code to report the state of your application to a file or to standard output, that’s manual instrumentation. If you use logging libraries and utilities, that’s an example of using an SDK for instrumentation.
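To make that concrete, here is a minimal Go sketch of both flavors: a bare printf-style statement and the same report routed through a logging library. The service name and order ID are made up purely for illustration.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	orderID := "order-1234" // hypothetical value, for illustration only

	// Plain "printf" manual instrumentation: report application state to stdout.
	fmt.Printf("processing order %s\n", orderID)

	// The same report through a logging library (here Go's standard log package),
	// which is effectively using an SDK for instrumentation: it handles
	// timestamps, prefixes and output destinations for us.
	logger := log.New(os.Stdout, "checkout: ", log.LstdFlags)
	logger.Printf("processing order %s", orderID)
}
```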
If you depend on a specific library or SDK, with its proprietary API and proprietary data model, switching later can be a pain. One of distributed tracing’s early success stories was OpenTracing, which standardized the API: no matter which tracer, SDK or client library you used to instrument your application, the API was the same standardized OpenTracing API. OpenTracing has since been merged into OpenTelemetry.
The Instrumentation Challenge of Distributed Tracing
Logging is a simple case of instrumentation, but distributed tracing is a more challenging one.
Distributed tracing involves intrinsic data called context that needs to be propagated along the request invocation flow, which typically spans many interacting microservices. The context contains the Trace ID as well as other global identifiers and metadata that represent the unique request. As in the logging case, in distributed tracing the instrumentation outputs a “span,” a form of structured log that reports a single operation within the request execution flow. Unlike logging, however, the instrumentation also needs to propagate the trace context across service boundaries and incorporate it into the individual spans, so that the full trace can later be reconstructed at the backend from those spans based on causality.
“Context propagation is the main reason why creating distributed traces is much harder than creating logs or even metrics,” Eden said. In manual instrumentation “you have to manually pass the context object between different libraries that you’re instrumenting, or the different parts of your code, or even across the network if you want to have distributed traces across multiple applications.”
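To illustrate that chore, here is a minimal OpenTelemetry Go sketch, assuming a tracer provider has been configured elsewhere at startup (not shown); the service and endpoint names are invented. The context object is passed explicitly through the code and injected into outgoing HTTP headers so the downstream service can join the same trace.

```go
package main

import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

func main() {
	// Without a configured SDK the tracing calls are no-ops,
	// which is enough for this sketch.
	_ = checkout(context.Background())
}

func checkout(ctx context.Context) error {
	// Start a span; the returned ctx now carries the trace context.
	tracer := otel.Tracer("checkout-service")
	ctx, span := tracer.Start(ctx, "checkout")
	defer span.End()

	// Pass ctx down to every function that should be part of the same trace ...
	return callInventory(ctx)
}

func callInventory(ctx context.Context) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		"http://inventory:8080/reserve", nil) // hypothetical downstream service
	if err != nil {
		return err
	}

	// ... and inject it into the outgoing request headers (W3C trace context),
	// so the next service can continue the same distributed trace.
	otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}
```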
Regardless of the signal, the process of manual instrumentation is pretty much the same. Whether you want to produce logs, metrics or traces, you always have to bring a new dependency into your code and integrate its SDK or API in the relevant places.
“For example, if you want to add a new metric of how much time some function takes, you’d want to record the ‘start’ timestamp and the ‘end’ timestamp at the end of the function,” he added. “So the process of manual instrumentation is a little bit different between the signals, but it also looks mostly the same.”
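A rough sketch of that pattern with the OpenTelemetry Go metrics API follows; the metric and service names are invented, and a meter provider is assumed to be configured at startup (otherwise the calls are no-ops).

```go
package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	// Hypothetical instrumentation scope and metric name.
	meter := otel.Meter("checkout-service")
	duration, err := meter.Float64Histogram(
		"process_order.duration",
		metric.WithUnit("ms"),
	)
	if err != nil {
		panic(err)
	}

	// Record the "start" timestamp, do the work ...
	start := time.Now()
	processOrder()

	// ... and at the end, report how long the function took.
	duration.Record(context.Background(), float64(time.Since(start).Milliseconds()))
}

func processOrder() { time.Sleep(10 * time.Millisecond) }
```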
The issue with manual instrumentation comes down to scale. If you have hundreds of microservices, it takes a long time to instrument all of them by hand. It also involves a lot of repetitive work, especially with polyglot applications and multiple frameworks per programming language.
Automatic Instrumentation Challenges
Different programming languages also offer different options, and pose different challenges, for automatic instrumentation. There are dynamic languages such as JavaScript, Python and Ruby, and compiled languages such as Java, Go and C++; even among the latter, some compile directly to native executables while others compile to bytecode that is then just-in-time compiled by a runtime environment. As a result, each language has different instrumentation options and different mechanisms to achieve them, which often yields varying levels of instrumentation in terms of the data collected and the ease of use.
I asked Eden: what kind of data is it even possible to collect through automatic instrumentation?
“Mainly everything that is located in the open source libraries,” he said. “So, libraries that communicate with a database usually collect the queries, and if you’re instrumenting an HTTP library you’ll probably record the method, the specific path and maybe some of the headers. Luckily for us, OpenTelemetry has a very good specification for that. They have really good guidelines for what spans should look like and which fields they should contain. That applies to automatic instrumentation, but if you’re instrumenting manually, you probably want to follow those guidelines as well.”
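As a rough illustration, here is how a manually created span might carry the kind of attributes those conventions describe for an HTTP operation, using OpenTelemetry Go. The exact attribute key names vary between semantic-convention versions, and the span name and values here are invented.

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func main() {
	tracer := otel.Tracer("inventory-service") // hypothetical service
	_, span := tracer.Start(context.Background(), "GET /reserve")
	defer span.End()

	// The same kind of fields an HTTP auto-instrumentation library would record;
	// a database instrumentation would similarly attach the query text.
	span.SetAttributes(
		attribute.String("http.request.method", "GET"),
		attribute.String("url.path", "/reserve"),
		attribute.String("user_agent.original", "curl/8.0"),
	)
}
```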
Want to learn more? Check out the OpenObservability Talks episode: Where Are My App’s Traces?? Instrumentation in Practice