The importance of runtime fields and schema on write or read for better analytics

Jan 14, 2023 by iHash

In an earlier blog post, “Log monitoring and unstructured log data, moving beyond tail -f,” we talked about collecting and working with unstructured log data. We learned that it’s very easy to add data to the Elastic Stack. So far, the only parsing we did was to extract the timestamp from this data so that older data is backfilled correctly.

We also talked about searching this unstructured data toward the end of the blog. While unstructured data can be incredibly useful when combined with full text search functionality, there are cases where we need a little more structure to use the data to answer our questions.

Schema on write or schema on read — why not both?

Schema on write remains the default way Elasticsearch handles incoming data: every field in a document is indexed as the document is ingested. This is what makes running searches in Elastic so fast, regardless of the volume of data returned or the number of queries executed. It’s also a big part of what our users love about Elastic.

Schema on write works really well if you know your data and how it’s structured before ingest. That way, the schema (logical view of data structure) can be fully defined in the index mapping. It also requires sticking to that defined schema when queries are run against the index. In the real world, however, monitoring and telemetry data can often change. New data sources may appear in your environment, for example. An added layer of flexibility to dynamically extract or query new fields after the data has been indexed adds tremendous value, even if it comes at a slight cost to performance.
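
As a rough illustration of defining the schema up front, here is a minimal Python sketch that builds the body of a create-index request with an explicit mapping. The field names (`@timestamp`, `message`, `response_code`) are hypothetical and chosen only for this example; the post does not say which fields the data contains.

```python
import json


def build_index_body():
    """Return a create-index request body with an explicit mapping.

    Every field listed here is indexed at ingest time (schema on write).
    The field names are hypothetical, chosen only for illustration.
    """
    return {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {"type": "text"},
                "response_code": {"type": "integer"},
            }
        }
    }


index_body = build_index_body()
# The JSON below is what would be sent when the index is created.
print(json.dumps(index_body, indent=2))
```

Queries then have to stick to these field names and types, which is the constraint described above.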

That’s where schema on read comes in. Data can be quickly ingested in raw form without any indexing, except for certain necessary fields such as timestamp or response codes. Other fields can be created on the fly when queries are run against the data. You don’t need to have intimate knowledge of your data ahead of time, nor do you have to predict all the possible ways that the data may eventually be queried. You can change the data structure at any time, even after the documents have been indexed — a huge benefit of schema on read.
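
To make "created on the fly" more concrete, the following Python sketch builds a search request body that defines a runtime field in `runtime_mappings` and aggregates on it in the same request. It rests on several assumptions that are not from this post: that the raw text lives in a field named `message`, that this field has doc values (for example, it is mapped as `keyword` or `wildcard`), and that the lines follow the Apache common log format. The runtime field name `http.clientip` and the aggregation name `top_clients` are likewise only illustrative.

```python
import json

# Painless script that Elasticsearch evaluates per document at query time.
# Assumption: `message` holds an Apache common-log line and has doc values.
CLIENT_IP_SCRIPT = (
    "String clientip = grok('%{COMMONAPACHELOG}')"
    '.extract(doc["message"].value)?.clientip; '
    "if (clientip != null) emit(clientip);"
)


def build_search_body(size=5):
    """Return a search body that defines and aggregates on a runtime field."""
    return {
        "size": 0,  # return only aggregation results, no hits
        "runtime_mappings": {
            # Hypothetical field name; it exists only for this query.
            "http.clientip": {
                "type": "ip",
                "script": {"source": CLIENT_IP_SCRIPT},
            }
        },
        "aggs": {
            "top_clients": {
                "terms": {"field": "http.clientip", "size": size}
            }
        },
    }


search_body = build_search_body()
print(json.dumps(search_body, indent=2))
```

Because the field is defined in the request rather than in the index mapping, nothing has to be reindexed to try a different extraction; the trade-off is that the script runs for each matching document at query time.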

Here’s what’s unique about how Elastic has implemented schema on read. We’ve built runtime fields on the same Elastic platform — the same architecture, the same tools, and the same interfaces you’re already using. There are no new datastores, languages, or components, and there’s no additional procedural overhead. Schema on read and schema on write work well together and seamlessly complement each other, so that you can decide which fields to calculate when a query requires them and which fields to index when your data is ingested into Elasticsearch.

By offering you the best of both worlds on a single stack, we make it easy for you to decide which combination of schema on write and schema on read works best for your specific use cases.

Using runtime fields on the Elastic Stack

Let us start with a quick example.

Using unstructured data we can easily answer questions like “How many errors did we have in the last 15 minutes?” or “When did we last have error X?” But if we want to ask questions like “What’s the sum of number X that appears in our logs?” or “What are our top 5 errors?”, then we need to extract the relevant information first in order to aggregate.
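
The "extract first, then aggregate" idea can be sketched outside of Elasticsearch in a few lines of plain Python. The log lines and their `code=` / `bytes=` layout below are invented for illustration and are not the data from the earlier post; the point is only that a sum or a top-N needs a parsed value, not just matching text.

```python
import re
from collections import Counter

# Hypothetical raw log lines, invented for this example.
LOG_LINES = [
    "2023-01-14T10:00:01Z ERROR code=E42 bytes=120 disk full",
    "2023-01-14T10:00:02Z INFO request served",
    "2023-01-14T10:00:03Z ERROR code=E42 bytes=80 disk full",
    "2023-01-14T10:00:04Z ERROR code=E7 bytes=50 timeout",
]

# Pull the error code and a numeric value out of each error line.
PATTERN = re.compile(r"ERROR code=(?P<code>\w+) bytes=(?P<bytes>\d+)")


def extract(lines):
    """Yield (error_code, byte_count) for every line that matches."""
    for line in lines:
        match = PATTERN.search(line)
        if match:
            yield match.group("code"), int(match.group("bytes"))


def top_errors(lines, n=5):
    """Return the n most frequent error codes with their counts."""
    return Counter(code for code, _ in extract(lines)).most_common(n)


def total_bytes(lines):
    """Return the sum of the extracted numeric values."""
    return sum(count for _, count in extract(lines))


print(top_errors(LOG_LINES))   # [('E42', 2), ('E7', 1)]
print(total_bytes(LOG_LINES))  # 250
```

A runtime field plays the role of `extract` here, with Elasticsearch performing the aggregation.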

If you’ve followed along with our last blog, our data in the cluster now looks like this:
