Qdrant’s BM42 search algorithm delivers more accurate and efficient retrieval for retrieval-augmented generation applications, the company says.
Open-source vector database provider Qdrant has launched BM42, a vector-based hybrid search algorithm intended to provide more accurate and efficient retrieval for retrieval-augmented generation (RAG) applications. BM42 combines the best of traditional text-based search and vector-based search to lower the costs for RAG and AI applications, Qdrant said.
Qdrant’s BM42 was announced July 2. Traditional keyword search engines, using algorithms such as BM25, have been around for more than 50 years and are not optimized for the precise retrieval needed in modern applications, according to Qdrant. As a result they struggle with specific RAG demands, particularly with short segments requiring further context to inform successful search and retrieval. Moving away from a keyword-based search to a fully vectorized based offers a new industry standard, Qdrant said.
“BM42, for short texts which are more prominent in RAG scenarios, provides the efficiency of traditional text search approaches, plus the context of vectors, so is more flexible, precise, and efficient,” Andrey Vasnetsov, Qdrant CTO and co-founder, said. This helps to make vector search more universally applicable, he added.
Unlike traditional keyword-based search suited for long-form content, BM42 integrates sparse and dense vectors to pinpoint relevant information within a document. A sparse vector handles exact term matching, while dense vectors handle semantic relevance and deep meaning, according to the company.