Elasticsearch, built on top of Lucene, can automatically detect and map new fields through its dynamic mapping feature. When users index low-cardinality fields, such as height and age, they often represent the values as numbers. Elasticsearch then infers these fields as the “long” data type and indexes them with a BKD tree. Because a low-cardinality field has only a few distinct values, each value matches a large share of the documents, so as the data volume grows, building the result set for such fields can lead to high CPU usage and increased load.
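To make the inference step concrete, here is a minimal sketch of how a dynamic mapper might pick a type from a JSON value. It is a simplified model, not Elasticsearch's implementation: the function name `infer_dynamic_type` and the sample document are hypothetical, and the sketch ignores date detection, arrays, and null values.

```python
def infer_dynamic_type(value):
    """Return the field type a simplified dynamic mapper would pick for a JSON value.

    Hypothetical helper for illustration only; it models the default rules
    for scalar values and objects and nothing else.
    """
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"  # whole numbers become "long", whatever their cardinality
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"  # by default accompanied by a "keyword" sub-field
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value in this sketch: {value!r}")


# Hypothetical document: age and height are low-cardinality but numeric.
doc = {"name": "Ada", "age": 36, "height": 170}
inferred = {field: infer_dynamic_type(value) for field, value in doc.items()}
print(inferred)  # {'name': 'text', 'age': 'long', 'height': 'long'}
```

The point of the sketch is that the choice depends only on the JSON value's type, so a field like age ends up as “long” even though it has just a handful of distinct values.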
The problem is compounded when bulk data operations are already consuming much of the CPU. Converting “long” fields to “keyword” fields through a reindex can significantly reduce cluster load and search latency, but the reindex itself can be time-consuming. Elastic recommends using “keyword” for term/terms queries and “long” for range queries. However, users often don’t realize the performance impact of using “long” for low-cardinality fields, because they rely on dynamic mapping to select the type automatically. Making BKD tree indexing more efficient for low- and medium-cardinality fields would therefore make a significant difference, addressing both the CPU load and the search latency without requiring users to change their mappings.
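The recommendation above can be sketched as an explicit mapping plus the two query shapes. This is an illustrative sketch only: the helper names and the field names (`age`, `height`, `timestamp_ms`) are assumptions, and the code only builds the JSON request bodies without contacting a cluster.

```python
import json


def build_index_body(term_fields, range_fields):
    """Build an index-creation body with explicit types instead of dynamic mapping.

    term_fields: fields queried by exact value (term/terms), mapped as "keyword".
    range_fields: fields queried by range, mapped as "long".
    """
    properties = {}
    for name in term_fields:
        properties[name] = {"type": "keyword"}
    for name in range_fields:
        properties[name] = {"type": "long"}
    return {"mappings": {"properties": properties}}


def terms_query(field, values):
    """Exact-match lookup; the values are sent as strings to match the keyword terms."""
    return {"query": {"terms": {field: [str(v) for v in values]}}}


def range_query(field, gte, lte):
    """Inclusive range lookup on a numeric field."""
    return {"query": {"range": {field: {"gte": gte, "lte": lte}}}}


if __name__ == "__main__":
    # Hypothetical fields: age and height are low-cardinality, timestamp_ms is not.
    body = build_index_body(["age", "height"], ["timestamp_ms"])
    print(json.dumps(body, indent=2))
    print(json.dumps(terms_query("age", [30, 31])))
    print(json.dumps(range_query("timestamp_ms", 0, 1_000)))
```

Declaring the mapping up front keeps dynamic mapping from choosing “long” for fields that are only ever matched by exact value. For an existing index the field type cannot simply be switched in place, which is why the conversion goes through a reindex.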
Examining these underlying issues, particularly Elasticsearch’s dynamic mapping and the use of “long” fields for low-cardinality data, helps us better understand the root causes of lock contention and develop more effective solutions.