Skip to content

B-tree vs B+ tree

B-trees and B+ trees are both self-balancing tree data structures that maintain sorted data and allow for efficient insertion, deletion, and search operations. They are widely used in database indexing and file systems to minimize disk I/O operations^[600-developer-mysql.md].

Key Structural Differences

The fundamental difference between the two lies in how they store data:

  • B-Tree: Data records are stored in both internal nodes and leaf nodes^[600-developer-mysql.md]. This structure can sometimes retrieve data without reaching the leaf level.
  • B+ Tree: Data records are stored only in the leaf nodes^[600-developer-mysql.md]. The internal nodes serve solely as a directory or index to guide the search path^[600-developer-mysql.md]. This means all data access in a B+ tree requires traversing from the root to a leaf.

Additionally, B+ tree leaf nodes are typically linked together in a linked list (or doubly linked list), which is not a standard feature of B-trees^[600-developer-mysql.md].

Query Patterns and Performance

The structural differences lead to distinct performance characteristics for different types of queries:

Range Queries

B+ trees are generally superior for range scans (e.g., BETWEEN, >, < queries). Because the leaf nodes are linked together, once a starting point is found, the database can simply scan sequentially through the leaf nodes to retrieve the range of values^[600-developer-mysql.md]. In contrast, a B-tree would need to perform tree traversals (in-order traversal) to access range data, which involves more random access and is less efficient for sequential retrieval^[600-developer-mysql.md].

Point Queries

For point queries (searching for a specific single value), a B-tree might theoretically find the data faster if it is located in a root or internal node, as the search could terminate before reaching the leaf level^[600-developer-mysql.md]. However, B+ trees are often preferred for point queries as well because the internal nodes can hold more keys (since they do not store data), resulting in a wider fan-out and a shorter tree height (fewer disk I/Os to reach the leaves).

Why MySQL Uses B+ Trees

MySQL (specifically the InnoDB storage engine) uses B+ trees for its indexes^[600-developer-mysql.md]. This choice is driven primarily by the efficiency of B+ trees in handling range queries and scanning^[600-developer-mysql.md].

Furthermore, the B+ tree structure naturally supports the implementation of clustered indexes. In InnoDB, the leaf nodes of the B+ tree contain the actual data rows (the "data page"), organized by the primary key^[600-developer-mysql.md]. The internal nodes act purely as a navigational aid ("directory + data"), allowing a tree of height 3 to support millions of records^[600-developer-mysql.md].

Sources

  • 600-developer-mysql.md