Vector search is a textbook example of the benefits of Single Instruction, Multiple Data (SIMD), because comparing two vectors to see how similar they are boils down to comparing each of their dimensions in one form or another. The same operation is repeated on every dimension, and then again for every candidate vector. Performing a single instruction on multiple data is literally what the SIMD acronym stands for, and having to do this for every candidate document is what makes the performance (and cost) impact so significant. For a deeper explanation, see Vector similarity computations — ludicrous speed.
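To make the "same operation on every dimension" pattern concrete, here is a minimal standalone sketch of a dot-product similarity in plain Java. This is illustrative only, not Elasticsearch or Lucene code; the inner multiply-add is exactly the kind of loop the JIT and SIMD instructions can vectorize.

```java
public class DotProduct {
    // Dot-product similarity: the identical multiply-add is applied to
    // every dimension, which is the pattern SIMD hardware accelerates.
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i]; // same instruction, repeated per dimension
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] query = {1f, 2f, 3f};
        float[] candidate = {4f, 5f, 6f};
        System.out.println(dotProduct(query, candidate)); // prints 32.0
    }
}
```

In a real search, this loop runs once per candidate vector, which is why vectorizing it pays off so heavily.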
Since Elasticsearch 8.9, we’ve been taking advantage of SIMD to optimize the performance of vector comparisons. Now we’ve built on this work to further improve scalar quantized vectors. The way scalar quantization is performed in Elasticsearch (and Lucene) means the resulting quantized vectors have certain properties that can be exploited by even more parallel SIMD instructions.
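As a rough sketch of why quantized vectors lend themselves to wider SIMD parallelism: each float dimension is mapped to a small integer, and similarity is then computed with integer arithmetic, so many more values fit into one SIMD register. The class name, the 7-bit range, and the min/max scaling below are illustrative assumptions, not the exact Lucene implementation.

```java
public class ScalarQuantizer {
    // Map each float dimension into a small integer range (here 0..127).
    // Hypothetical sketch; Lucene's actual quantization differs in detail.
    static byte[] quantize(float[] v, float min, float max) {
        byte[] out = new byte[v.length];
        float scale = 127f / (max - min);
        for (int i = 0; i < v.length; i++) {
            // Clamp, then scale and round into the quantized range.
            float clamped = Math.max(min, Math.min(max, v[i]));
            out[i] = (byte) Math.round((clamped - min) * scale);
        }
        return out;
    }

    // Dot product over the quantized bytes: many 8-bit multiply-adds can
    // be packed into a single SIMD register, which is where the extra
    // parallelism comes from.
    static int dotProduct(byte[] a, byte[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

The integer result still has to be corrected back toward the original float similarity (using the stored scale and offset), but the expensive per-dimension loop runs entirely on narrow integers.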
In 8.14, we released these new SIMD-based optimizations to improve the performance of vector comparisons, but we enabled them only for merging index segments and only on ARM. Elasticsearch constantly generates new segments as documents are added or modified in the index and then merges them into larger and larger segments. Making this process more efficient freed computing resources and optimized segment sizes, which indirectly improves query latency, but it did not directly affect vector comparisons during query execution.
We are now introducing the SIMD optimization at query time as well and expanding the set of supported platforms to include both x64 and ARM. With these optimizations, we see improved query latency across the board. For example, on ARM with the StackOverflow data set (2 million vectors of 768 dimensions), we see a 2x improvement in kNN search query latency. Our approach to adding support for these capabilities is a primary reason our vector database has continued to see an 8x performance improvement with a 32x efficiency increase over the last few releases. Even better, it is unique to Elastic’s implementation of Lucene and outperforms others out of the box.