Too many fields! 3 ways to prevent mapping explosion in Elasticsearch

Jun 4, 2022 by iHash


Table of Contents

  • Too many fields! 3 ways to prevent mapping explosion in Elasticsearch
  • Putting Elasticsearch to work for your data
  • Strategy #1: Being strict
  • Strategy #2: Not too strict
  • Strategy #3: Runtime Fields
    • Note on using Kibana and Runtime Fields
  • Choosing the best strategy


A system is said to be “observable” when it has three things: logs, metrics, and traces. While metrics and traces have predictable structures, logs (especially application logs) are usually unstructured data that need to be collected and parsed to be really useful. Therefore, getting your logs under control is arguably the hardest part of achieving Observability. 

In this article, we’ll dive into three effective strategies developers can use to manage logs with Elasticsearch.

[Related article: Leveraging Elastic to improve data management and observability in the cloud]

Putting Elasticsearch to work for your data

Sometimes we don’t have control over what types of logs we receive in our cluster. Think of a log analytics provider with a fixed budget for storing its customers’ logs that needs to keep storage costs under control (Elastic deals with many similar cases in Consulting).

More often than not, customers index fields “just in case” they might be needed for search. If that sounds like you, the following techniques should prove valuable in helping you cut costs and focus your cluster’s performance on what really matters.

Let’s first outline the problem. Consider the following JSON document with three fields: message, transaction.user, transaction.amount:

{
 "message": "2023-06-01T01:02:03.000Z|TT|Bob|3.14|hello",
 "transaction": {
   "user": "bob",
   "amount": 3.14
 }
}

The mapping for an index that will hold documents like these could be something like the following:

PUT dynamic-mapping-test
{
 "mappings": {
   "properties": {
     "message": {
       "type": "text"
     },
     "transaction": {
       "properties": {
         "user": {
           "type": "keyword"
         },
         "amount": {
           "type": "long"
         }
       }
     }
   }
 }
}

However, Elasticsearch allows us to index new fields without having to necessarily specify a mapping beforehand, and that’s part of what makes Elasticsearch so easy to use: we can onboard new data easily. So it’s okay to index something that deviates from the original mapping, like:

POST dynamic-mapping-test/_doc
{
 "message": "hello",
 "transaction": {
   "user": "hey",
   "amount": 3.14,
   "field3": "hey there, new field with arbitrary data"
 }
}

A GET dynamic-mapping-test/_mapping will show us the resulting new mapping for the index. It now has transaction.field3 as both text and keyword — actually two new fields.

{
 "dynamic-mapping-test" : {
   "mappings" : {
     "properties" : {
       "transaction" : {
         "properties" : {
           "user" : {
             "type" : "keyword"
           },
           "amount" : {
             "type" : "long"
           },
           "field3" : {
             "type" : "text",
             "fields" : {
               "keyword" : {
                 "type" : "keyword",
                 "ignore_above" : 256
               }
             }
           }
         }
       },
       "message" : {
         "type" : "text"
       }
     }
   }
 }
}

Great, but that is now part of the problem: when we have no control over what is being sent to Elasticsearch, we can easily face what is known as a mapping explosion. Nothing prevents the sender from creating subfields and sub-subfields, each of which will get the same two types, text and keyword, like:

POST dynamic-mapping-test/_doc
{
 "message": "hello",
 "transaction": {
   "user": "hey",
   "amount": 3.14,
   "field3": "hey there, new field",
   "field4": {
     "sub_user": "a sub field",
     "sub_amount": "another sub field",
     "sub_field3": "yet another subfield",
     "sub_field4": "yet another subfield",
     "sub_field5": "yet another subfield",
     "sub_field6": "yet another subfield",
     "sub_field7": "yet another subfield",
     "sub_field8": "yet another subfield",
     "sub_field9": "yet another subfield"
   }
 }
}

We would be wasting RAM and disk space to store those fields, as data structures will be created to make them searchable and aggregatable. It might be the case that those fields are never used — they are there “just in case” they need to be used for search. 
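To get a feel for the cost, here is a rough back-of-the-envelope sketch in Python (not an official Elastic tool): it estimates how many mapped fields the default dynamic mapping would create for a document, assuming the text-plus-keyword behavior shown above for every new string.

```python
# Rough estimate of how many index fields default dynamic mapping creates.
# Assumption: every new string leaf becomes two fields (text + .keyword),
# while numbers and booleans become a single field each.

def count_dynamic_fields(doc: dict, prefix: str = "") -> int:
    """Estimate the number of mapped fields a document would introduce."""
    total = 0
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            total += count_dynamic_fields(value, prefix=f"{path}.")
        elif isinstance(value, str):
            total += 2  # text field plus its .keyword sub-field
        else:
            total += 1  # long, double, boolean, etc.
    return total

doc = {
    "message": "hello",
    "transaction": {
        "user": "hey",
        "amount": 3.14,
        "field3": "hey there, new field",
        "field4": {
            "sub_user": "a sub field",
            "sub_amount": "another sub field",
            **{f"sub_field{i}": "yet another subfield" for i in range(3, 10)},
        },
    },
}
print(count_dynamic_fields(doc))  # the single document above yields 25 fields
```

The single “large” document from above already introduces 25 fields, which is how a few careless senders can balloon a mapping into the thousands.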

One of the first steps we take in Consulting when asked to optimize an index is to inspect the usage of every field, to see which ones are actually searched and which are just wasting resources.
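If your cluster is recent enough, the field usage stats API (a technical preview since Elasticsearch 7.15) is a handy starting point for that inspection: it reports, per shard, how often each field has been used in queries and aggregations since the last node restart. For example:

```console
GET dynamic-mapping-test/_field_usage_stats
```

Fields that never show up in the stats are good candidates for one of the strategies below.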

Strategy #1: Being strict

If we want to have complete control over the structure of the logs we store in Elasticsearch and how we store them, we can set a clear mapping definition so anything that deviates from what we want is simply not stored. 

By using dynamic: strict at either top-level or in some sub-field, we reject documents that don’t match what is in our mappings definition, forcing the sender to comply with the pre-defined mapping:

PUT dynamic-mapping-test
{
 "mappings": {
   "dynamic": "strict",
   "properties": {
     "message": {
       "type": "text"
     },
     "transaction": {
       "properties": {
         "user": {
           "type": "keyword"
         },
         "amount": {
           "type": "long"
         }
       }
     }
   }
 }
}

Then when we try to index our document with an extra field…

POST dynamic-mapping-test/_doc
{
 "message": "hello",
 "transaction": {
   "user": "hey",
   "amount": 3.14,
   "field3": "hey there, new field"
 }
}

… the response we get is this:

{
 "error" : {
   "root_cause" : [
     {
       "type" : "strict_dynamic_mapping_exception",
       "reason" : "mapping set to strict, dynamic introduction of [field3] within [transaction] is not allowed"
     }
   ],
   "type" : "strict_dynamic_mapping_exception",
   "reason" : "mapping set to strict, dynamic introduction of [field3] within [transaction] is not allowed"
 },
 "status" : 400
}

If you are absolutely sure you just want to store what is in the mappings, this strategy forces the sender to comply with the pre-defined mapping.
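If you also control the sender, a cheap complement is to validate documents client-side before indexing them, mirroring what "dynamic": "strict" enforces on the server. A minimal Python sketch (the allowed-path set is hand-written here and would have to mirror your real mapping):

```python
# Client-side sketch: flag field paths that a strict mapping would reject.
# ALLOWED_PATHS is a hypothetical, hand-maintained mirror of the mapping above.

ALLOWED_PATHS = {"message", "transaction.user", "transaction.amount"}

def undeclared_paths(doc: dict, prefix: str = "") -> list:
    """Return leaf field paths in `doc` that the mapping does not declare."""
    bad = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            bad.extend(undeclared_paths(value, prefix=f"{path}."))
        elif path not in ALLOWED_PATHS:
            bad.append(path)
    return bad

doc = {
    "message": "hello",
    "transaction": {"user": "hey", "amount": 3.14,
                    "field3": "hey there, new field"},
}
print(undeclared_paths(doc))  # → ['transaction.field3']
```

Failing fast on the client avoids a round trip that would end in a 400 anyway, and gives the sender a clearer error message.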

Strategy #2: Not too strict

We can be a little more flexible and let documents through, even if they are not exactly what we expect, by using "dynamic": "false".

PUT dynamic-mapping-disabled
{
 "mappings": {
   "dynamic": "false",
   "properties": {
     "message": {
       "type": "text"
     },
     "transaction": {
       "properties": {
         "user": {
           "type": "keyword"
         },
         "amount": {
           "type": "long"
         }
       }
     }
   }
 }
}

When using this strategy, we accept all documents that come our way but index only the fields specified in the mapping, making the extra fields simply not searchable. In other words, we are not wasting RAM on the new fields, only disk space. The fields are still visible in the hits of a search, including in a top_hits aggregation; however, we can’t search or aggregate on them, as no data structures are created to hold their content.
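To see the effect, try querying one of the unmapped fields (assuming the extra-field document from earlier has been indexed into this index): the search runs without error but simply matches nothing, because no inverted index was built for it.

```console
GET dynamic-mapping-disabled/_search
{
  "query": {
    "match": {
      "transaction.field3": "hey there, new field"
    }
  }
}
```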

It doesn’t need to be all or nothing: you can have a strict root and a sub-field that accepts new fields without indexing them. Our Setting dynamic on inner objects documentation covers this well.

PUT dynamic-mapping-disabled
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "message": {
        "type": "text"
      },
      "transaction": {
        "dynamic": "false",
        "properties": {
          "user": {
            "type": "keyword"
          },
          "amount": {
            "type": "long"
          }
        }
      }
    }
  }
}

Strategy #3: Runtime Fields

Elasticsearch supports both schema on read and schema on write, each with its caveats. With dynamic:runtime, the new fields will be added to the mapping as Runtime Fields. We index the fields that are specified in the mapping and make the extra fields searchable/aggregatable only at query time. In other words, we don’t waste RAM up front on the new fields, but we pay the price of a slower query response, as the data structures will be built at run time.

PUT dynamic-mapping-runtime
{
 "mappings": {
   "dynamic": "runtime",
   "properties": {
     "message": {
       "type": "text"
     },
     "transaction": {
       "properties": {
         "user": {
           "type": "keyword"
         },
         "amount": {
           "type": "long"
         }
       }
     }
   }
 }
}

Let’s index our large document:

POST dynamic-mapping-runtime/_doc
{
 "message": "hello",
 "transaction": {
   "user": "hey",
   "amount": 3.14,
   "field3": "hey there, new field",
   "field4": {
     "sub_user": "a sub field",
     "sub_amount": "another sub field",
     "sub_field3": "yet another subfield",
     "sub_field4": "yet another subfield",
     "sub_field5": "yet another subfield",
     "sub_field6": "yet another subfield",
     "sub_field7": "yet another subfield",
     "sub_field8": "yet another subfield",
     "sub_field9": "yet another subfield"
   }
 }
}

A GET dynamic-mapping-runtime/_mapping will show that our mapping is changed upon indexing our large document:

{
 "dynamic-mapping-runtime" : {
   "mappings" : {
     "dynamic" : "runtime",
     "runtime" : {
       "transaction.field3" : {
         "type" : "keyword"
       },
       "transaction.field4.sub_amount" : {
         "type" : "keyword"
       },
       "transaction.field4.sub_field3" : {
         "type" : "keyword"
       },
       "transaction.field4.sub_field4" : {
         "type" : "keyword"
       },
       "transaction.field4.sub_field5" : {
         "type" : "keyword"
       },
       "transaction.field4.sub_field6" : {
         "type" : "keyword"
       },
       "transaction.field4.sub_field7" : {
         "type" : "keyword"
       },
       "transaction.field4.sub_field8" : {
         "type" : "keyword"
       },
       "transaction.field4.sub_field9" : {
         "type" : "keyword"
       }
     },
     "properties" : {
       "transaction" : {
         "properties" : {
           "user" : {
             "type" : "keyword"
           },
           "amount" : {
             "type" : "long"
           }
         }
       },
       "message" : {
         "type" : "text"
       }
     }
   }
 }
}

The new fields are now searchable like a normal keyword field. Note that the data type is inferred when the first document containing the field is indexed, but it can also be controlled using dynamic templates.

GET dynamic-mapping-runtime/_search
{
 "query": {
   "wildcard": {
     "transaction.field4.sub_field6": "yet*"
   }
 }
}
{
…
 "hits" : {
   "total" : {
     "value" : 1,
     "relation" : "eq"
   },
   "hits" : [
     {
       "_source" : {
         "message" : "hello",
         "transaction" : {
           "user" : "hey",
           "amount" : 3.14,
           "field3" : "hey there, new field",
           "field4" : {
             "sub_user" : "a sub field",
             "sub_amount" : "another sub field",
             "sub_field3" : "yet another subfield",
             "sub_field4" : "yet another subfield",
             "sub_field5" : "yet another subfield",
             "sub_field6" : "yet another subfield",
             "sub_field7" : "yet another subfield",
             "sub_field8" : "yet another subfield",
             "sub_field9" : "yet another subfield"
           }
         }
       }
     }
   ]
 }
}

Great! It’s easy to see how this strategy could be useful when you don’t know what type of documents you are going to ingest, so using Runtime Fields sounds like a conservative approach with a nice tradeoff between performance and mapping complexity.
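As noted earlier, dynamic templates can steer the type picked for new fields. For instance, this sketch (the index name and the *_id naming convention are illustrative) maps any new field whose name ends in _id to a keyword runtime field, even when its JSON value is numeric:

```console
PUT dynamic-mapping-runtime-ids
{
  "mappings": {
    "dynamic": "runtime",
    "dynamic_templates": [
      {
        "ids_as_keywords": {
          "match": "*_id",
          "runtime": {
            "type": "keyword"
          }
        }
      }
    ],
    "properties": {
      "message": {
        "type": "text"
      }
    }
  }
}
```

Treating IDs as keywords keeps term-level queries on them exact, instead of letting a numeric value be guessed as long.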

Note on using Kibana and Runtime Fields

Keep in mind that if we don’t specify a field when searching in Kibana’s search bar (for example, typing just “hello” instead of “message: hello”), the search will match all fields, including every runtime field we have declared. You probably don’t want that behavior, so the index should use the dynamic index setting index.query.default_field. Set it to all or some of the mapped fields, and leave the runtime fields to be queried explicitly (e.g., “transaction.field3: hey”).

Our updated mapping would finally be:

PUT dynamic-mapping-runtime
{
  "mappings": {
    "dynamic": "runtime",
    "properties": {
      "message": {
        "type": "text"
      },
      "transaction": {
        "properties": {
          "user": {
            "type": "keyword"
          },
          "amount": {
            "type": "long"
          }
        }
      }
    }
  },
  "settings": {
    "index": {
      "query": {
        "default_field": [
          "message",
          "transaction.user"
        ]
      }
    }
  }
}

Choosing the best strategy

Each strategy has its own advantages and disadvantages, so the best strategy will ultimately depend on your specific use case. Below is a summary to help you make the right choice for your needs:

  • Strategy #1 – strict
    Pros: Stored documents are guaranteed to be compliant with the mapping.
    Cons: Documents are rejected if they have fields that are not declared in the mapping.

  • Strategy #2 – dynamic: false
    Pros: Stored documents can have any number of fields, but only mapped fields use resources.
    Cons: Fields that are not mapped cannot be used for searches or aggregations.

  • Strategy #3 – Runtime Fields
    Pros: All the advantages of #2, and Runtime Fields can be used in Kibana like any other field.
    Cons: Relatively slower search responses when querying the Runtime Fields.

Observability is where the Elastic Stack really shines. Whether it involves securely storing years of financial transactions while tracking impacted systems or ingesting several terabytes of daily network metrics, our customers are doing Observability ten times faster at a fraction of the cost. 

Looking to get started with Elastic Observability? The best way is in the cloud. Start your free trial of Elastic Cloud today!


