Skip to content

Inverted index

An inverted index is the core data structure that facilitates efficient full-text search. It operates by mapping specific words or terms to their locations within a set of documents, allowing a system to quickly retrieve all documents containing a particular query term.^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]

In Elasticsearch, which is built upon the [[Lucene]] library, the inverted index is used to power the analysis and search of non-relational document data.^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md] This structure is distinct from traditional database indexes (like B+ Trees used in [[RDBMS]] systems) and is essential for performing high-performance text retrieval and relevance scoring.^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]

Structure and Mapping

The logic of the inverted index acts as an inversion of the natural document storage structure. While a standard document stores words within a file, the inverted index maps specific terms back to the documents (or specific positions within documents) where they appear.

In the context of data storage schemas, the relationship between document containers and search terms follows this mapping:

  • Table (in RDBMS) \(\rightarrow\) Index^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]
  • Row (in RDBMS) \(\rightarrow\) Document^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]

Role in Analysis and Mapping

The creation and utilization of an inverted index are heavily influenced by the Mapping definitions of the data fields^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md].

  • Text Fields: Fields mapped as text undergo tokenization (分詞)^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]. The text is broken down into individual terms (tokens) which are then stored in the inverted index, allowing for partial matches and full-text search capabilities^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md].
  • Keyword Fields: Fields mapped as keyword are indexed as exact values^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]. They support exact queries (精確查詢) without tokenization, meaning the entire string is treated as a single unit in the index^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md].

Comparison with Query Types

The inverted index enables different types of search behaviors, primarily distinguished by how the query interacts with the indexed terms:

  • term query: Treats the query string as a single, exact token (e.g., treating "iphone mobile" as one specific word)^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md].
  • match query: Analyzes the query string, performing tokenization (分詞) to match against the terms in the inverted index^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md].

Sources

^[600-developer-elasticsearch.md] ^[600-developer__Elasticsearch.md]