Hybrid Retrieval with RRF Fusion

Hybrid Retrieval with RRF Fusion is a search architecture that combines vector search (semantic similarity) with keyword search (lexical matching) and merges the results using the Reciprocal Rank Fusion (RRF) algorithm^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md].

This approach is designed to overcome the limitations of relying on a single retrieval method. Vector search excels at capturing meaning and intent, whereas keyword search (BM25/tf-idf) excels at precise term matching and entity recognition; a hybrid system aims to return results that are both semantically relevant and lexically precise^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md].

Fusion Algorithm: RRF

The core of this architecture is the RRF (Reciprocal Rank Fusion) algorithm^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md]. RRF is a method for combining ranked lists from different information retrieval systems. Unlike score-based fusion, which requires normalizing scores across algorithms that use different scales, RRF operates solely on the rank position of each document^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md].

The standard RRF score of a document \(d\), summed over the set \(R\) of ranked lists in which it appears, is:

\[ \text{score}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}_r(d)} \]

In the specific implementation referenced here, a constant \(k=60\) is used^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md]. Because each list contributes \(1/(k + \text{rank})\), a document ranked near the top of several lists accumulates a higher score than one that appears in only one list or near the bottom of each. The constant \(k\) dampens the gap between adjacent ranks, so no single list dominates the fused ordering.
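The formula can be illustrated with a minimal Python sketch. It is not the referenced implementation: the function name `rrf_fuse` and the sample document IDs are hypothetical, ranks are assumed to be 1-based, and each input list is assumed to contain a document at most once.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list is ordered best-first; ranks are 1-based.
    Returns (doc_id, score) pairs, highest fused score first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score; break ties by doc ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two retrievers.
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]

fused = rrf_fuse([vector_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Here `b` (ranks 2 and 1) edges out `a` (ranks 1 and 3), and both score roughly twice as high as `c` and `d`, which each appear in only one list. No raw similarity or text-rank score is needed at any point.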

System Architecture

In a typical implementation, such as the GBrain system, the retrieval pipeline follows these specific steps^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md]:

  1. Query Expansion: The user's query is first analyzed and potentially expanded using an LLM (e.g., Claude Haiku) to generate multiple sub-queries or related terms to improve recall^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md].
  2. Parallel Search: The system executes two distinct searches in parallel:
    • Vector Search: Utilizing embedding models (e.g., text-embedding-3-large) and approximate nearest neighbor (ANN) indexes like HNSW (Hierarchical Navigable Small World) to find documents based on cosine similarity^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md].
    • Keyword Search: Utilizing inverted indexes over tsvector columns and ranking functions (like ts_rank) to find documents containing the query's terms or phrases^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md].
  3. Fusion: The ranked result lists from both searches are merged using the RRF algorithm.
  4. Post-Processing: The fused list undergoes deduplication (removing redundant chunks from the same source) and re-ranking before being presented to the user^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md].
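The four steps above can be sketched end to end in Python. This is an illustrative skeleton under stated assumptions, not GBrain's code: the corpus, the function names, and the chunk IDs are all hypothetical; query expansion and vector search are replaced by fixed stand-ins; deduplication is interpreted as keeping the best-ranked chunk per source; and the final re-ranking step is omitted.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory corpus: chunk_id -> (source document, text).
CHUNKS = {
    "doc1#0": ("doc1", "RRF merges ranked lists by rank position"),
    "doc1#1": ("doc1", "RRF uses a constant k of 60"),
    "doc2#0": ("doc2", "HNSW is an approximate nearest neighbor index"),
    "doc3#0": ("doc3", "tsvector supports full-text keyword search"),
}


def expand_query(query):
    """Stand-in for LLM query expansion: returns the query unchanged."""
    return [query]


def vector_search(query):
    """Stand-in for an ANN search: returns a fixed, made-up ranking."""
    return ["doc1#0", "doc2#0", "doc1#1"]


def keyword_search(query):
    """Stand-in for full-text search: rank chunks by number of shared terms."""
    terms = set(query.lower().split())
    hits = []
    for chunk_id, (_, text) in CHUNKS.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            hits.append((-overlap, chunk_id))
    return [chunk_id for _, chunk_id in sorted(hits)]


def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion over best-first lists, 1-based ranks."""
    scores = {}
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda c: (-scores[c], c))


def dedupe_by_source(chunk_ids):
    """Keep only the best-ranked chunk from each source document."""
    seen, kept = set(), []
    for chunk_id in chunk_ids:
        source = CHUNKS[chunk_id][0]
        if source not in seen:
            seen.add(source)
            kept.append(chunk_id)
    return kept


def hybrid_search(query):
    ranked_lists = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for sub_query in expand_query(query):
            # Step 2: run both searches concurrently for each sub-query.
            vec = pool.submit(vector_search, sub_query)
            kw = pool.submit(keyword_search, sub_query)
            ranked_lists += [vec.result(), kw.result()]
    # Steps 3 and 4: fuse, then deduplicate (re-ranking omitted here).
    return dedupe_by_source(rrf(ranked_lists))


results = hybrid_search("RRF constant k")
print(results)
```

In a real system the two stand-in search functions would call the vector index and the full-text index, and a re-ranker would reorder the deduplicated list before it is returned.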

Technical Components

Implementing hybrid retrieval often involves specific database technologies capable of handling both vector and traditional full-text search:

  • Database: Typically PostgreSQL, with the pgvector extension for vector data and PostgreSQL's native full-text search for keywords^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md].
  • Vector Index: HNSW indexes are commonly used for high-performance vector search^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md].
  • Embedding Models: High-dimensional models (e.g., OpenAI's 1536-dim models) are used to convert text into vectors for the semantic component^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md].
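As a rough illustration of how these components fit together, the Python sketch below holds the SQL a PostgreSQL + pgvector setup might use. The table name `chunks`, its columns, the `'english'` text-search configuration, and the helper `to_pgvector_literal` are assumptions made for this example, not details from the source. The statements are plain strings meant to be passed to a database driver, with named placeholders in the style psycopg accepts.

```python
# Hypothetical schema: one row per text chunk, with a 1536-dim embedding
# and a generated tsvector column for full-text search.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id        bigserial PRIMARY KEY,
    source    text NOT NULL,
    content   text NOT NULL,
    embedding vector(1536),
    tsv       tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

-- HNSW index for approximate nearest neighbor search by cosine distance.
CREATE INDEX chunks_embedding_hnsw ON chunks USING hnsw (embedding vector_cosine_ops);

-- GIN inverted index for full-text search.
CREATE INDEX chunks_tsv_gin ON chunks USING gin (tsv);
"""

# Vector search: <=> is pgvector's cosine distance operator (smaller is closer).
VECTOR_SEARCH_SQL = """
SELECT id FROM chunks
ORDER BY embedding <=> %(embedding)s::vector
LIMIT %(limit)s;
"""

# Keyword search: match against the tsvector column and order by ts_rank.
KEYWORD_SEARCH_SQL = """
SELECT id FROM chunks
WHERE tsv @@ plainto_tsquery('english', %(query)s)
ORDER BY ts_rank(tsv, plainto_tsquery('english', %(query)s)) DESC
LIMIT %(limit)s;
"""


def to_pgvector_literal(values):
    """Format a list of floats as pgvector's text input, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


print(to_pgvector_literal([0.5, 1.0]))
```

Each query returns IDs in rank order, which is exactly the input RRF needs: the cosine distances and `ts_rank` values themselves never have to be compared with one another.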

Application

This architecture is prevalent in advanced RAG (Retrieval-Augmented Generation) systems and AI agents^[001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md]. By retrieving information that is both contextually relevant (vector) and factually specific (keyword), the AI agent can ground its responses in better evidence, making them more accurate and less prone to hallucination.

  • [[RAG (Retrieval-Augmented Generation)]]
  • [[向量資料庫|Vector Database]]
  • [[知識庫|Knowledge Base]]
  • [[查詢擴展|Query Expansion]]

Sources

  • 001-TODO__GBrain_-_AI_Agent_个人知识库与混合检索引擎.md