B-tree vs B+ tree¶
B-trees and B+ trees are both self-balancing tree data structures that maintain sorted data and allow for efficient insertion, deletion, and search operations. They are widely used in database indexing and file systems to minimize disk I/O operations^[600-developer-mysql.md].
Key Structural Differences¶
The fundamental difference between the two lies in how they store data:
- B-Tree: Data records are stored in both internal nodes and leaf nodes^[600-developer-mysql.md]. This structure can sometimes retrieve data without reaching the leaf level.
- B+ Tree: Data records are stored only in the leaf nodes^[600-developer-mysql.md]. The internal nodes serve solely as a directory or index to guide the search path^[600-developer-mysql.md]. This means all data access in a B+ tree requires traversing from the root to a leaf.
Additionally, B+ tree leaf nodes are typically linked together in a linked list (or doubly linked list), which is not a standard feature of B-trees^[600-developer-mysql.md].
Query Patterns and Performance¶
The structural differences lead to distinct performance characteristics for different types of queries:
Range Queries¶
B+ trees are generally superior for range scans (e.g., BETWEEN, >, < queries). Because the leaf nodes are linked together, once a starting point is found, the database can simply scan sequentially through the leaf nodes to retrieve the range of values^[600-developer-mysql.md]. In contrast, a B-tree would need to perform tree traversals (in-order traversal) to access range data, which involves more random access and is less efficient for sequential retrieval^[600-developer-mysql.md].
Point Queries¶
For point queries (searching for a specific single value), a B-tree might theoretically find the data faster if it is located in a root or internal node, as the search could terminate before reaching the leaf level^[600-developer-mysql.md]. However, B+ trees are often preferred for point queries as well because the internal nodes can hold more keys (since they do not store data), resulting in a wider fan-out and a shorter tree height (fewer disk I/Os to reach the leaves).
Why MySQL Uses B+ Trees¶
MySQL (specifically the InnoDB storage engine) uses B+ trees for its indexes^[600-developer-mysql.md]. This choice is driven primarily by the efficiency of B+ trees in handling range queries and scanning^[600-developer-mysql.md].
Furthermore, the B+ tree structure naturally supports the implementation of clustered indexes. In InnoDB, the leaf nodes of the B+ tree contain the actual data rows (the "data page"), organized by the primary key^[600-developer-mysql.md]. The internal nodes act purely as a navigational aid ("directory + data"), allowing a tree of height 3 to support millions of records^[600-developer-mysql.md].
Related Concepts¶
- [[MySQL]]
- [[Database Indexing]]
- Clustered Index
- Covering Index
- [[Data Structures]]
Sources¶
600-developer-mysql.md