As applications in the cloud become more distributed and complex, the Mean Time To Resolution (MTTR) for production issues is getting longer.
Modern systems are built with hundreds of distinct, ephemeral, and interconnected cloud components, which can make it exceptionally hard for engineers to understand the current state of their applications, what problems are impacting customers, and why those problems are occurring. As the business tolerance for slow loading times, errors, and downtime decreases, engineers are thrust into incredibly stressful situations when investigating production incidents.
Observability is meant to help engineers quickly answer questions about the current state of their systems so they can troubleshoot issues faster, such as “what is causing this new latency in my service?” or “why did checkouts suddenly decrease?”
However, observability can itself be complex, as well as extremely expensive. Onboarding application observability requires instrumenting your services to emit telemetry data, configuring a back-end to store and process this data, building application monitoring dashboards, and other steps. Once it’s all up and running, it can incur huge costs from vendors like Datadog or New Relic.
Logz.io App 360 provides a simple (but not simplistic) and cost-effective alternative to these vendors as part of the Logz.io Open 360™ platform.
In a matter of minutes, our customers can achieve full application observability so they can answer difficult questions about the current state of their environments. And it’s all at a cost far lower than other vendors due to unique data optimization capabilities.
Collecting application telemetry data for observability
The first step to application observability with Logz.io is to instrument your applications and begin collecting logs, metrics, and traces from your services.
To do this, simply go to Logz.io’s ‘Send your Data’ page and hit the ‘Telemetry Collector’ option. The Telemetry Collector is Logz.io’s OpenTelemetry-based agent, which collects logs, metrics, and traces in a single deployment. It automates the normally complex process of implementing OpenTelemetry.
Now, let’s select a platform. For App 360, we’ll need to be running Kubernetes – in this case, we’ll choose the ‘EKS’ option.
This automatically generates a script for us to run on our clusters. Running it via Helm is easiest, so don’t change the ‘Where are you running this script from?’ option.
Let’s copy the snippet and deploy the agent in our terminal, which will install in a few minutes.
At this point, our new agent will begin collecting AWS infrastructure data and streaming it to Logz.io, where we can view it in Kubernetes 360, an out-of-the-box view of infrastructure performance across our clusters. However, this post is about application observability, so we have one more step to collect our application data.
To do this, we can go to the bottom of the Telemetry Collector page and open the Easy Connect dropdown.
After running this script in our cluster, Easy Connect will automatically discover all of our services and provide the option to instrument them in a single click!
After instrumenting our services, our newly-installed agent will collect this data and send it to Logz.io. From there, Logz.io’s SaaS platform will automatically process and store this data for analysis, which brings us to the next phase of App 360 onboarding.
Analyzing the data to achieve application observability
As my colleague Dotan Horovits mentions in his blog “Observability is a Data Analytics Problem,” observability is not as easy as collecting logs, metrics, and traces. To better understand why our system is behaving the way that it is, we need the right analytics to make sense of the data.
Finding the relevant data to help us achieve application observability can be difficult, which is why Logz.io’s App 360 aims to automatically surface the critical data needed to monitor and troubleshoot our applications.
By selecting App 360 on the right menu, I can immediately see all of my services and high-level performance metrics for each one – providing a bird’s eye view of my system health.
To visualize the relationships between these services, I can hit the ‘Map’ option in the top right corner, which shows me how the services communicate with one another. This kind of context is critical when trying to understand dependencies and interactions between microservices during incident investigations.
On the left, we have the option to highlight specific services by latency, error rate, or request rate. Let’s view our services by errors, which shows that our frontend service has the highest error rate.
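To make the idea of a service map concrete, here is a minimal Python sketch of how caller-to-callee relationships could be derived from trace spans. It is not how App 360 builds its map; the span fields (`span_id`, `parent_id`, `service`) and the service names are hypothetical, simplified stand-ins for real trace data.

```python
from collections import defaultdict

# Hypothetical, simplified span records; real trace data carries many more fields.
spans = [
    {"span_id": "a", "parent_id": None, "service": "frontend"},
    {"span_id": "b", "parent_id": "a", "service": "checkout"},
    {"span_id": "c", "parent_id": "b", "service": "payment"},
    {"span_id": "d", "parent_id": "a", "service": "catalog"},
    {"span_id": "e", "parent_id": "b", "service": "checkout"},  # stays inside checkout
]


def build_service_map(spans):
    """Return {caller_service: {callee_service: call_count}} from parent/child spans."""
    service_of = {s["span_id"]: s["service"] for s in spans}
    edges = defaultdict(lambda: defaultdict(int))
    for s in spans:
        parent = s["parent_id"]
        if parent is None or parent not in service_of:
            continue  # root span, or parent missing from this batch
        caller, callee = service_of[parent], s["service"]
        if caller != callee:  # ignore calls that never leave the service
            edges[caller][callee] += 1
    return {caller: dict(callees) for caller, callees in edges.items()}


service_map = build_service_map(spans)
for caller, callees in service_map.items():
    for callee, count in callees.items():
        print(f"{caller} -> {callee} ({count} call(s))")
```

Each edge in the result corresponds to an arrow on a map like the one described above: a span in one service whose parent span belongs to a different service.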
By clicking on a service, we can pull up additional performance metrics and dive deeper into the telemetry data for this service if we want to investigate the cause of these errors. From here, we can see:
- The request rate, latency, and error rate for each operation executed by the service
- The infrastructure metrics for the service, like CPU and memory
- A fully-functional log search interface for this specific service
Without any configuration for data visualization, we can see the four golden signals for service reliability (latency, traffic, errors, and saturation) in a single view.
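For readers who want to see what these signals boil down to, the following Python sketch computes them from raw request records. It is only an illustration under stated assumptions: the record fields, the 60-second window, and the CPU sample are hypothetical, and App 360 computes its own metrics from the collected telemetry.

```python
import statistics

# Hypothetical request records for one service over a 60-second window.
WINDOW_SECONDS = 60
requests = [
    {"duration_ms": 120, "status": 200},
    {"duration_ms": 95, "status": 200},
    {"duration_ms": 480, "status": 500},
    {"duration_ms": 110, "status": 200},
    {"duration_ms": 130, "status": 503},
    {"duration_ms": 105, "status": 200},
]
# Hypothetical container CPU sample: cores in use vs. the configured limit.
cpu_used_cores, cpu_limit_cores = 0.6, 1.0


def golden_signals(requests, window_seconds, cpu_used, cpu_limit):
    """Summarize traffic, errors, latency, and saturation for one service."""
    if not requests:
        raise ValueError("no requests in the window")
    durations = [r["duration_ms"] for r in requests]
    errors = sum(1 for r in requests if r["status"] >= 500)  # count 5xx as errors
    return {
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "latency_median_ms": statistics.median(durations),
        "latency_max_ms": max(durations),
        "saturation": cpu_used / cpu_limit,
    }


signals = golden_signals(requests, WINDOW_SECONDS, cpu_used_cores, cpu_limit_cores)
for name, value in signals.items():
    print(f"{name}: {value:.3f}")
```

Treating only 5xx responses as errors is a design choice in this sketch; a real service may also count timeouts or specific 4xx responses.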
All of this data is fully correlated, so we’re seeing the logs, metrics, and traces from the same service, generated within the same timeframe defined at the top of the page.
We can also enable the Deployments feature, which tracks and overlays new deployments against our telemetry data visualizations – making it fast and easy to correlate changes in production with the health of our services.
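As a rough illustration of what correlating a deployment with service health means, the Python sketch below compares the average error rate before and after a deployment marker. The per-minute error rates and the deployment minute are made-up values, and the comparison is deliberately naive; it is not the logic behind the Deployments feature.

```python
# Hypothetical per-minute error rates (fraction of failed requests per minute).
error_rate_by_minute = [0.01, 0.02, 0.01, 0.01, 0.08, 0.09, 0.10, 0.09]
# Hypothetical marker: a deployment landed at the start of minute 4.
deploy_minute = 4


def compare_around_deploy(series, deploy_index):
    """Return (mean before, mean after) for a series split at the deployment."""
    before = series[:deploy_index]
    after = series[deploy_index:]
    if not before or not after:
        raise ValueError("need data on both sides of the deployment")
    return sum(before) / len(before), sum(after) / len(after)


mean_before, mean_after = compare_around_deploy(error_rate_by_minute, deploy_minute)
print(f"error rate before deploy: {mean_before:.2%}")
print(f"error rate after deploy:  {mean_after:.2%}")
if mean_after > 2 * mean_before:  # arbitrary threshold chosen for this sketch
    print("error rate more than doubled after the deployment")
```

A jump like this does not prove the deployment caused the errors, but it tells us where to look first.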
To continue our investigation and drill into code-level details, we can select one of the operations. Let’s choose the ‘HTTP GET’ operation. This brings us to the specific trace that maps the flow of the application request – making it exceptionally easy to pinpoint the source of latency within complex microservices architectures.
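To show how a trace helps pinpoint the source of latency, here is a small Python sketch that computes each span's "self time", meaning its duration minus the time spent in its direct children. The span names and timings are hypothetical, and the calculation assumes the children of a span run one after another rather than in parallel.

```python
from collections import defaultdict

# Hypothetical spans from a single trace; times are milliseconds from trace start.
trace = [
    {"span_id": "1", "parent_id": None, "name": "frontend HTTP GET", "start": 0, "end": 500},
    {"span_id": "2", "parent_id": "1", "name": "checkout PlaceOrder", "start": 20, "end": 460},
    {"span_id": "3", "parent_id": "2", "name": "payment Charge", "start": 40, "end": 120},
    {"span_id": "4", "parent_id": "2", "name": "db INSERT orders", "start": 130, "end": 440},
]


def self_times(spans):
    """Map span name -> duration minus time spent in direct children.

    Assumes sibling spans do not overlap (sequential calls).
    """
    child_time = defaultdict(int)
    for s in spans:
        if s["parent_id"] is not None:
            child_time[s["parent_id"]] += s["end"] - s["start"]
    return {
        s["name"]: (s["end"] - s["start"]) - child_time[s["span_id"]]
        for s in spans
    }


times = self_times(trace)
slowest = max(times, key=times.get)
for name, ms in times.items():
    print(f"{name}: {ms} ms self time")
print(f"largest contributor: {slowest}")
```

In this made-up trace the root request takes 500 ms, but most of that time is spent in the database call, which is the kind of finding a trace view makes visible at a glance.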
This brief example showed us how App 360 simplifies application observability while accelerating troubleshooting. In summary, App 360 aims to:
- Reduce MTTR when debugging microservices: In this example, we started with a very high-level view of our application performance and quickly drilled into the details to investigate the issue.
- Achieve application observability in minutes: Rather than building observability dashboards from scratch, App 360 automatically highlights the critical application data out-of-the-box – so we could begin investigating immediately.
If you’d like to try it yourself, start a free trial or request a demo!