Deferred join optimization¶
Deferred join optimization is a database query technique used to improve performance when executing deep pagination—specifically when retrieving data sets with a large offset[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md]. It addresses the I/O overhead associated with retrieving full rows for data that is ultimately discarded by the LIMIT clause[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md].
Problem Statement¶
In MySQL, using SELECT * with a large offset causes the query optimizer to scan and discard the preceding rows[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md]. Because the query selects all columns (e.g., name, age alongside the primary key), the database must perform a "lookup" (return to the table data) for every single row scanned along the index, even for rows that will be skipped[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md]. This significantly increases I/O and latency as the offset grows[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md].
Solution¶
The optimization involves splitting the query into two distinct phases[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md]:
- Index Scan: Execute a lightweight query that retrieves only the primary IDs using the offset and limit[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md].
select id from table limit 3000000,10; - Data Lookup: Use the resulting IDs to fetch the full row data[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md].
By only fetching the IDs in the first step, the database avoids the costly operation of looking up full table rows for the millions of records being skipped[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md]. According to the source, this can reduce query time by roughly half[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md].
Implementation¶
The two phases are combined into a single SQL statement using an inner join[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md]. The subquery handles the efficient index scan to locate the target IDs, and the outer join retrieves the corresponding data[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md].
select table.*
from table
inner join (
select id from table limit 3000000,10
) as tmp
on tmp.id=table.id;
Using this method, query speeds for deep pagination can typically be maintained within 1 second[600-developer-database-mysql.md][600-developer__database__mysql__大数据分页优化.md].
Related Concepts¶
- [[Pagination]]
- [[Database Indexing]]
- [[Query Optimization]]
- [[SQL]]
Sources¶
- 600-developer-database-mysql.md
- 600-developer__database__mysql__大数据分页优化.md