Inverted index¶
An inverted index is the core data structure that facilitates efficient full-text search. It operates by mapping specific words or terms to their locations within a set of documents, allowing a system to quickly retrieve all documents containing a particular query term.^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]
In Elasticsearch, which is built upon the [[Lucene]] library, the inverted index is used to power the analysis and search of non-relational document data.^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md] This structure is distinct from traditional database indexes (like B+ Trees used in [[RDBMS]] systems) and is essential for performing high-performance text retrieval and relevance scoring.^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]
Structure and Mapping¶
The logic of the inverted index acts as an inversion of the natural document storage structure. While a standard document stores words within a file, the inverted index maps specific terms back to the documents (or specific positions within documents) where they appear.
In the context of data storage schemas, the relationship between document containers and search terms follows this mapping:
- Table (in RDBMS) \(\rightarrow\) Index^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]
- Row (in RDBMS) \(\rightarrow\) Document^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]
Role in Analysis and Mapping¶
The creation and utilization of an inverted index are heavily influenced by the Mapping definitions of the data fields^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md].
- Text Fields: Fields mapped as
textundergo tokenization (分詞)^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]. The text is broken down into individual terms (tokens) which are then stored in the inverted index, allowing for partial matches and full-text search capabilities^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]. - Keyword Fields: Fields mapped as
keywordare indexed as exact values^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md]. They support exact queries (精確查詢) without tokenization, meaning the entire string is treated as a single unit in the index^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md].
Comparison with Query Types¶
The inverted index enables different types of search behaviors, primarily distinguished by how the query interacts with the indexed terms:
termquery: Treats the query string as a single, exact token (e.g., treating "iphone mobile" as one specific word)^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md].matchquery: Analyzes the query string, performing tokenization (分詞) to match against the terms in the inverted index^[600-developer-elasticsearch.md, 600-developer__Elasticsearch.md].
Sources¶
^[600-developer-elasticsearch.md] ^[600-developer__Elasticsearch.md]