Skip to content

MySQL B+ tree index structure

In MySQL, particularly within the InnoDB storage engine, the primary data structure used for indexing is the B+ Tree. This structure is fundamental to the database's performance, enabling efficient data retrieval, insertion, and deletion operations.^[600-developer-mysql.md]

Structure and Capacity

A B+ Tree in InnoDB is a balanced tree structure where all data is stored in the leaf nodes.^[600-developer-mysql.md] * Node Size: Index pages are typically 16 KB in size.^[600-developer-mysql.md] * Data Storage: The tree organizes data into "Directory + Data" layers^[600-developer-mysql.md]. * Capacity: A B+ Tree with a height of 3 can store data at the million level (千万级别).^[600-developer-mysql.md] * Linking: The leaf nodes of the B+ Tree are linked together in a bidirectional manner to facilitate sequential access.^[600-developer-mysql.md]

Comparison with B-Trees

While similar to B-Trees, B+ Trees offer specific advantages for database systems^[600-developer-mysql.md]: * Data Location: In B+ Trees, data is stored exclusively in the leaf nodes, whereas B-Trees may store data in both branch and leaf nodes.^[600-developer-mysql.md] * Query Performance: B+ Trees generally provide more stable query performance because all queries ultimately reach the leaf nodes.^[600-developer-mysql.md] * Range Scans: The structure of B+ Trees is optimized for range retrieval operations.^[600-developer-mysql.md]

Index Types and Mechanics

Clustered and Secondary Indexes

  • Clustered Index (Primary Key): The data organization is physically sorted based on the primary key.^[600-developer-mysql.md] When inserting records, the data must be sorted according to this key^[600-developer-mysql.md].
  • Secondary Indexes: These indexes contain the indexed column values and a pointer (usually the Primary Key) to the actual data row.

Index Retrieval Mechanisms

  • Covering Index: When the columns selected in a query (e.g., SELECT column_1, column_2) are all present within an index, MySQL can retrieve the result directly from the index structure without accessing the data pages^[600-developer-mysql.md].
  • Table Return (回表): If a query selects columns not included in the secondary index, the database must locate the full row in the clustered index using the primary key found in the secondary index.^[600-developer-mysql.md]

Composite Indexes and the Leftmost Prefix Principle

For composite indexes (indexes containing multiple columns), the Leftmost Prefix Principle is crucial^[600-developer-mysql.md]: * Usage: The index can only be utilized if the query references the leftmost column(s) of the index^[600-developer-mysql.md]. * Scope: This applies to both WHERE filtering and ORDER BY sorting operations^[600-developer-mysql.md].

Query Optimization and Failures

Index usage depends heavily on how queries are written^[600-developer-mysql.md]: * Functions on Columns: Performing operations on indexed columns (e.g., WHERE a + 1 = 5) typically causes index failure^[600-developer-mysql.md]. * Implicit Conversion: Type mismatches or implicit conversions can prevent the optimizer from using the index^[600-developer-mysql.md]. * Range Queries: In composite indexes, range queries on a column may prevent the use of subsequent columns in the index^[600-developer-mysql.md].

Sources

^[600-developer-mysql.md]