Skip to content

Clustered vs non-clustered index

In database management systems, particularly within MySQL, the distinction between clustered and non-clustered indexes fundamentally changes how data is stored, retrieved, and organized on the disk^[600-developer__mysql.md].

Clustered Index

A clustered index defines the physical order of data in a table. The data rows themselves are stored in the leaf nodes of the index structure^[600-developer__mysql.md].

In the InnoDB storage engine, the clustered index is typically built upon the primary key. Because the data is physically sorted by the key, inserting data requires sorting operations to maintain the sequence^[600-developer__mysql.md]. This structure allows a B+ Tree of height 3 to support data storage in the millions^[600-developer__mysql.md].

Characteristics

  • Data-Page Integration: The leaf pages contain the actual data pages ("Directory + Data")^[600-developer__mysql.md].
  • Lookup Efficiency: The structure supports efficient range scans because the data is stored sequentially on disk^[600-developer__mysql.md].
  • Secondary Index Impact: Other indexes (non-clustered) typically store the primary key value as a pointer to the actual row^[600-developer__mysql.md].

Non-clustered Index

A non-clustered index stores a copy of the indexed columns (along with a pointer to the actual data) separately from the main data storage^[600-developer__mysql.md].

Secondary Indexes & "Table Return"

In InnoDB, secondary indexes are technically non-clustered indexes. Unlike the clustered index where the leaf node holds the full data row, the leaf node of a secondary index stores the indexed columns plus the primary key^[600-developer__mysql.md].

When a query searches a secondary index but needs columns not present in the index, the database must use the primary key found in the leaf node to look up the full row in the clustered index^[600-developer__mysql.md]. This process is known as "Table Return" (回表)^[600-developer__mysql.md].

Key Differences

The primary operational difference lies in the relationship between the index and the data storage^[600-developer__mysql.md]: * Clustered: The data is stored within the index structure (1:1 relationship with the physical data order). * Non-clustered: The index points to the data; the data is stored separately.

Optimization Strategies

Covering Index

A Covering Index is a powerful optimization technique involving non-clustered (secondary) indexes^[600-developer__mysql.md].

If a query selects only columns that are included in the index structure, the database can satisfy the query directly from the index without accessing the main data table^[600-developer__mysql.md]. This eliminates the performance cost of "Table Return" and is known as an Index Scan^[600-developer__mysql.md].

Composite Indexes

When designing indexes, particularly composite ones involving multiple columns, adherence to the Leftmost Prefix Principle is required to ensure the database utilizes the index efficiently^[600-developer__mysql.md].

Sources

^[600-developer__mysql.md]