
Observability Is A Data Analytics Problem

Apr 7, 2022 by iHash

In this special guest feature, Dotan Horovits, Technology Evangelist at Logz.io, delves into the three pillars of Observability (logs, metrics and traces), the ways in which Observability is defined and how the tech industry should be using it moving forward. Dotan lives at the intersection of technology, product and innovation. With over 20 years in the hi-tech industry as a software developer, a solutions architect and a product manager, he brings a wealth of knowledge in cloud computing, big data solutions, DevOps practices and more. Dotan is an avid advocate of open source software, open standards and communities. He also is an advocate of the Cloud Native Computing Foundation (CNCF), organizes the local CNCF chapter in Tel-Aviv and runs the OpenObservability Talks podcast, among others.

Observability is a hot topic in the IT world these days. It is often discussed through the lens of the “three pillars of observability”: logs, metrics and traces. These pillars help us understand what happened, where it happened and why it happened in our system:

Metrics help detect issues and tell what happened: Is the service down? Was the endpoint slow to respond? Metrics are aggregated numerical data that lend themselves to spotting abnormal behavior.

Next, logs help diagnose issues and tell why they happened. Logs are well suited to that job, as the developer who writes the application code outputs the relevant context for that code into logs.

Finally, traces help isolate issues and tell where they happened. As a request comes into the system, it flows through a chain of interacting microservices, which we can trace using distributed tracing, to pinpoint the issues. 

Indeed, these telemetry signals are very important for gaining observability. However, they are not, by themselves, observability. In fact, many organizations collect all these signals and still end up with poor observability. Why is that?

Perhaps the problem starts with the way we define Observability.

So what is Observability about?

The formal definition of Observability, taken from Control Theory, is:

“a measure of how well internal states of a system can be inferred from knowledge of its external outputs.”

This definition may have driven people to put great emphasis on the external outputs, the signals that our systems emit, the raw data. The other important element in that definition was too often overlooked: the inference process.

A more useful definition of observability for software systems is:

“The capability to allow a human to ask and answer questions about the system”.

I like this definition for two important reasons: 

First, it makes it clear that Observability is a property of the system. It might sound trivial, but it’s important, as it drives a different mindset from the traditional monitoring solutions that we used to bolt on after the fact. Being a system property means you should incorporate observability as part of the system design, as a first-class citizen, from day one. This has practical implications that I’ll touch upon later.

It’s about answering ad-hoc questions

The second reason I like the above definition of Observability better is that it makes it clear that observability is essentially a data analysis problem: the more questions we can ask and answer about the system, the more observable it is. This, too, calls for a different mindset: rather than the sysadmin’s relatively reactive monitoring and maintenance mindset, observability calls for a data analyst’s mindset, one that proactively queries the data to get the right insights about the system.

And this can’t be limited to a set of predefined questions either. Common monitoring practices, for example, use predefined aggregations, which carry hidden assumptions about the questions we’d want to ask. In today’s high-cardinality and dynamically changing systems, however, it is unreasonable to assume we can anticipate all the questions, all the permutations of dimensions, and the required aggregations. We don’t encounter the same problems over and over again (we’ve set alerts or auto-remediation for these “known unknowns” anyway). Observability must enable us to ask ad-hoc questions, ones that arise when handling incidents we haven’t anticipated or seen before, the “unknown unknowns”.
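
To make the difference concrete, here is a minimal Python sketch contrasting a predefined aggregation with an ad-hoc question over raw events. The event fields (`endpoint`, `region`, `build`, `status`) and the helper `error_rate_by` are hypothetical names chosen for illustration, not part of any particular tool.

```python
from collections import Counter

# Hypothetical raw request events; the field names are assumptions for illustration.
events = [
    {"endpoint": "/checkout", "region": "eu", "build": "1.4.2", "status": 500},
    {"endpoint": "/checkout", "region": "eu", "build": "1.4.1", "status": 200},
    {"endpoint": "/checkout", "region": "us", "build": "1.4.2", "status": 500},
    {"endpoint": "/search", "region": "eu", "build": "1.4.2", "status": 200},
    {"endpoint": "/search", "region": "us", "build": "1.4.1", "status": 200},
]

# Predefined aggregation: errors per endpoint, a question decided before the incident.
errors_per_endpoint = Counter(e["endpoint"] for e in events if e["status"] >= 500)


def error_rate_by(events, *dimensions):
    """Ad-hoc question: error rate grouped by any combination of dimensions."""
    totals, errors = Counter(), Counter()
    for event in events:
        key = tuple(event[d] for d in dimensions)
        totals[key] += 1
        if event["status"] >= 500:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Questions nobody planned for can still be answered from the raw events.
by_build = error_rate_by(events, "build")
by_endpoint_and_build = error_rate_by(events, "endpoint", "build")
```

The predefined counter can only ever answer "which endpoint fails?", while keeping the raw dimensions available lets the same data answer "is it a specific build?" once that question comes up.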

So how do we deliver on this promise?

It’s about collecting data from different sources, in different formats and types

To gain observability, we need to collect the different signal types. Logs, metrics and traces are the classical “three pillars”, but we need to keep the approach flexible enough to incorporate additional signal types as the need arises, such as events and continuous profiling.

Also, observability involves ingesting signals from many different data sources, across different tiers, frameworks and programming languages in today’s cloud-native and polyglot organizations. You may need to monitor a Node.js front-end app, a Java back-end, your SQL and NoSQL databases, a Kafka cluster and a few cloud services (perhaps even in a multi-cloud setup), and that is not even an exaggerated example. To top it off, these signals may also come in different formats, as we’ll see below.

Consistently collecting heterogeneous data across so many different signal types, sources and formats requires careful planning and automation in order to support data analytics flows. 
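
As a rough illustration of what collecting heterogeneous data implies, the Python sketch below wraps two differently formatted inputs (a JSON log line and a Prometheus-style text metric line) in one common record shape. The record layout, the function names and the source names are assumptions made for this example; real collectors handle far more formats and edge cases.

```python
import json
import re

# Matches a simple text metric line such as: name{label="value"} 42
METRIC_LINE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')


def from_json_log(line, source):
    """Wrap a JSON log line in the common (hypothetical) record shape."""
    return {"signal": "log", "source": source, "attributes": json.loads(line)}


def from_metric_line(line, source):
    """Parse a simple labeled metric line into the same record shape."""
    match = METRIC_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized metric line: {line!r}")
    labels = dict(re.findall(r'(\w+)="([^"]*)"', match["labels"]))
    attributes = {"name": match["name"], "value": float(match["value"]), **labels}
    return {"signal": "metric", "source": source, "attributes": attributes}


records = [
    from_json_log('{"level": "error", "msg": "payment failed"}', source="java-backend"),
    from_metric_line('http_requests_total{service="cart",code="500"} 42', source="nodejs-frontend"),
]
```

Once every signal arrives in one shape, downstream analytics can treat the source and format as just another attribute to filter on.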

The industry is heading in this direction: Fluentd, the unified log collection tool, is expanding into collecting metrics; Telegraf is expanding from metrics to logs and events; and Elastic is unifying Filebeat, Metricbeat, Packetbeat and the other Beats collectors of the ELK Stack into a single Elastic Agent. These and other tools are also constantly expanding their integrations with different data sources. OpenTelemetry, under the Cloud Native Computing Foundation (CNCF), aims to provide a standard unified framework for collecting data and to help the industry converge.

It’s about enriching and correlating data

Remember we said that observability is a property of the system? It begins with the way we emit our telemetry. For example, forget about unstructured plain-text logs. No human is going to read through your mountains of log lines to extract insights, and full-text indexing and search is prohibitively expensive. We’re running data analytics here. Data needs to be structured and in a machine-readable format such as JSON or Protobuf.
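
A minimal sketch of structured logging with Python's standard `logging` module might look as follows. The `JsonFormatter` class, the `fields` key and the sample values are assumptions made for this illustration; dedicated logging libraries offer richer versions of the same idea.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object instead of free text."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "fields" is a hypothetical convention for passing structured context.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("payment failed", extra={"fields": {"order_id": "A-1001", "amount": 42.5}})

# The emitted line is machine-readable without any text parsing.
parsed = json.loads(stream.getvalue())
```

Because each line is a JSON object, fields such as `order_id` can be filtered and aggregated directly rather than extracted with fragile text patterns.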

To support effective data analytics, it is also important to build a concise data model and adhere to it across the different telemetry sources and formats. If every data source names the service label differently (“service”, “service_name”, “ServiceID”, “container”), it becomes very difficult to correlate across the sources. Open source projects such as OpenMetrics and OpenTelemetry play a central role in standardizing data models and semantic conventions. It’s important to note that integrating with legacy systems may require transformations to align with these conventions, in an ETL fashion.
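
The ETL-style alignment could be sketched like this in Python. The alias list comes from the example above, and `service.name` is the attribute name used by the OpenTelemetry semantic conventions; the `normalize` function itself is a hypothetical helper for illustration.

```python
# Label names that different sources use for the same concept (from the text above).
SERVICE_ALIASES = ("service", "service_name", "ServiceID", "container")

# Canonical attribute name, following the OpenTelemetry semantic conventions.
CANONICAL_SERVICE_KEY = "service.name"


def normalize(record):
    """Return a copy of the record with service aliases mapped to one canonical key."""
    normalized = {}
    for key, value in record.items():
        if key in SERVICE_ALIASES:
            # Keep the first alias seen if a record carries more than one.
            normalized.setdefault(CANONICAL_SERVICE_KEY, value)
        else:
            normalized[key] = value
    return normalized


sources = [
    {"service": "cart", "latency_ms": 12},
    {"ServiceID": "cart", "level": "error"},
    {"container": "cart", "cpu": 0.7},
]
aligned = [normalize(record) for record in sources]
```

After this step, all three records can be joined or grouped on the same key, which is what makes cross-source correlation practical.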

Data enrichment is also an important step in data analytics. Adding metadata such as the user ID or the build version to your logs, for instance, can greatly help map the log to the root cause (e.g. per specific customer or specific build version). Effective data enrichment can turn your logs into more meaningful events. 
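
As a small illustration, the Python sketch below attaches a build version and a user ID to each log record and then groups errors by user. All field names and values (`build_version`, `environment`, `user_id`) are made up for the example.

```python
from collections import Counter

# Hypothetical deployment-wide metadata attached to every log record.
STATIC_CONTEXT = {"build_version": "1.4.2", "environment": "prod"}


def enrich(log, request_context):
    """Merge static and per-request metadata into a log record."""
    return {**log, **STATIC_CONTEXT, **request_context}


raw_logs = [
    ({"level": "error", "message": "payment failed"}, {"user_id": "u-17"}),
    ({"level": "error", "message": "payment failed"}, {"user_id": "u-17"}),
    ({"level": "info", "message": "cart updated"}, {"user_id": "u-42"}),
]
enriched = [enrich(log, context) for log, context in raw_logs]

# With the metadata in place, "which customer is affected?" becomes a simple group-by.
errors_by_user = Counter(e["user_id"] for e in enriched if e["level"] == "error")
```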

Data enrichment can also support correlation between signals. For example, systematically adding the request trace ID as metadata to all the logs will enable log-trace correlation later. Another example is adding exemplars to metrics: exemplars are metadata that can be attached to a metric to provide additional context and external references. A common use case for exemplars is to attach the trace ID, which makes it easy to jump from a metric to a sample trace. The Prometheus community is actively working to formalize exemplars in the context of Prometheus, and a similar effort is taking place in OpenTelemetry.
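
The idea can be sketched in a few lines of Python: a metric sample carries an exemplar with a trace ID, and the same trace ID on the logs lets us jump from the slow sample to the related log lines. The data structures here are simplified stand-ins invented for this example, not the Prometheus or OpenTelemetry data models.

```python
from collections import defaultdict

# Logs that were enriched with the request trace ID at emission time.
logs = [
    {"trace_id": "t-1", "service": "cart", "message": "added item"},
    {"trace_id": "t-2", "service": "payment", "message": "card declined"},
    {"trace_id": "t-2", "service": "checkout", "message": "order aborted"},
]

# A latency sample carrying an exemplar that points at one concrete trace.
slow_request_sample = {
    "metric": "http_request_duration_seconds",
    "value": 2.7,
    "exemplar": {"trace_id": "t-2"},
}

# Index the logs by trace ID once, so lookups are cheap.
logs_by_trace = defaultdict(list)
for entry in logs:
    logs_by_trace[entry["trace_id"]].append(entry)


def logs_for_sample(sample):
    """Follow the exemplar's trace ID from a metric sample to the matching logs."""
    return logs_by_trace.get(sample["exemplar"]["trace_id"], [])


related = logs_for_sample(slow_request_sample)
```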

It’s about unified querying, visualization and alerting

Having the data ingested and stored in a conventional form is a good start. Next, we need a way to easily ask and answer ad-hoc questions about our system – that is, as discussed above, the essence of observability. That requires the ability to query the data from the different sources to draw the relationships. 

Today’s landscape, however, is quite fragmented, which makes unified querying, visualization and other investigation aspects challenging. In today’s world you often encounter specialized query languages for the different signal types and sources. You may use Lucene to query your logs, and PromQL to query your metrics. This, however, makes it difficult to phrase queries across the different signals. 

The other common investigative approach, alongside querying, is visualization through dashboards. There, too, it’s common to see different specialized tools for visualizing the different data types. You may find yourself using Kibana to visualize your logs and Grafana to visualize your metrics. If you want to correlate across the different signals, however, the multi-tool approach requires you to manually copy your search context between the tools (things such as the time window under investigation and the filters in use), which can be very inefficient and error-prone.

Taking the data analyst’s approach, we strive for unified querying as well as a unified dashboard to show different signals and slice and dice the telemetry data. Another related aspect for investigation is alerting, which potentially incorporates conditions over multiple different signals. There are multiple attempts in this direction to offer a unified user experience, whether through a single platform or a tightly integrated suite of tools. 
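
One way to picture unified querying is a single search context (time window plus filters) applied to every signal at once, instead of being copied by hand between tools. The Python sketch below is a toy model of that idea; `SearchContext`, `SIGNALS` and `query` are hypothetical names, and the in-memory lists stand in for real log and metric stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchContext:
    """The investigation context shared by every signal: time window plus filters."""

    start: int  # inclusive, epoch seconds
    end: int  # exclusive, epoch seconds
    service: str

    def matches(self, record):
        return self.start <= record["ts"] < self.end and record["service"] == self.service


# In-memory stand-ins for separate metric and log backends.
SIGNALS = {
    "metrics": [
        {"ts": 100, "service": "checkout", "name": "error_rate", "value": 0.01},
        {"ts": 160, "service": "checkout", "name": "error_rate", "value": 0.35},
        {"ts": 160, "service": "search", "name": "error_rate", "value": 0.00},
    ],
    "logs": [
        {"ts": 158, "service": "checkout", "message": "payment gateway timeout"},
        {"ts": 300, "service": "checkout", "message": "recovered"},
    ],
}


def query(context):
    """Apply one search context to all signal types and return the matches side by side."""
    return {
        kind: [record for record in records if context.matches(record)]
        for kind, records in SIGNALS.items()
    }


result = query(SearchContext(start=150, end=200, service="checkout"))
```

The same context object could drive an alert condition that spans several signals, since it no longer matters which backend each record came from.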

Another important building block, which has been gaining a lot of momentum recently, is the ability to run anomaly detection and other AIOps algorithms to automatically detect patterns that indicate issues. Some of these patterns can only be detected when you correlate the different signals. AIOps use cases require an established data model, query language and API across the signals.
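
As a deliberately simple stand-in for such algorithms, the Python sketch below flags outliers in a latency series using a robust z-score based on the median absolute deviation, then pulls the logs from the same minute. The data, the threshold of 3.5 and the function name are assumptions for illustration; production AIOps systems use considerably more sophisticated models.

```python
from statistics import median

# Hypothetical per-minute latency metric and logs keyed by the same minute index.
latency_ms = [102, 98, 101, 99, 100, 103, 97, 480, 101, 100]
logs = [
    {"minute": 2, "message": "cache warmed"},
    {"minute": 7, "message": "db connection pool exhausted"},
]


def anomalous_minutes(series, threshold=3.5):
    """Return indices whose robust z-score (median/MAD based) exceeds the threshold."""
    center = median(series)
    mad = median([abs(x - center) for x in series])
    if mad == 0:
        return []  # no spread, so nothing can be called an outlier
    return [
        i for i, x in enumerate(series)
        if 0.6745 * abs(x - center) / mad > threshold
    ]


suspect_minutes = anomalous_minutes(latency_ms)

# Correlate across signals: which logs were written during the anomalous minutes?
suspect_logs = [entry for entry in logs if entry["minute"] in suspect_minutes]
```

The correlation step only works because the metric and the logs share a common time key, which is the kind of established data model the paragraph above calls for.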

Summary: It’s about fusing telemetry data to answer questions

The three pillars of observability – metrics, traces and logs – provide the essential signals for understanding what happened, where it happened and why it happened. But it’s important to remember that these signals are, after all, the raw data. 

The goal is to bring together telemetry signals of different types and from different sources into one conceptual data lake or a data mesh, and then ask and answer questions to understand your system. 

We as an industry should join forces in achieving the unified vision. 

While we’re still early in our journey, looking into the new year and beyond, I’m optimistic that we’re moving in the right direction. I’m also confident that our work on open standards will help converge the efforts across the industry towards a unified, natural way of asking questions about our systems. Who knows, perhaps we’ll wake up one day and be able to simply ask: “Siri, what’s up with my system?”

Filed Under: BigData

Copyright iHash.eu © 2022