Analysis of the Principle of Deferred Join Optimization in Database Query Optimization
I. Concept and Problem Context of Deferred Join
Deferred Join is an optimization technique for deep pagination queries. When a query needs to skip a large amount of data and retrieve a small result set (e.g., LIMIT 10000, 10), traditional pagination requires first reading 10010 records and then discarding the first 10000. Deferred Join rewrites the query to first use an index to locate the primary keys of the target data, and then join back to the table via the primary keys to obtain the complete data, reducing unnecessary back-to-table operations.
II. Performance Issues of Traditional Pagination Queries
- Typical Scenario:
SELECT * FROM orders ORDER BY create_time DESC LIMIT 10000, 10; - Execution Process:
- Read 10010 complete records via index or full table scan.
- Sort at the Server layer (if unable to utilize an index).
- Discard the first 10000 records and return the last 10.
- Performance Bottleneck: Significant time is consumed reading and discarding unneeded data, especially due to back-to-table operations (for InnoDB, accessing the primary key index is required).
III. Optimization Approach of Deferred Join
- Core Idea: Split the query into two steps:
- Step 1: Quickly locate the primary keys of the target data using a covering index (avoiding back-to-table).
- Step 2: Join back to the original table via the primary keys to obtain complete data.
- Query Rewrite Example:
SELECT * FROM orders INNER JOIN ( SELECT id FROM orders ORDER BY create_time DESC LIMIT 10000, 10 ) AS tmp USING(id); - Key Advantage: The
SELECT idin the subquery only needs to access the index (if(create_time, id)is a composite index), without back-to-table, significantly reducing I/O.
IV. Detailed Execution Flow of Deferred Join
- Subquery Phase:
- Use a covering index (e.g.,
(create_time, id)) for fast scanning. - Directly obtain the ids of records 10000 to 10010 (reading only index pages).
- Avoid back-to-table, reducing disk access.
- Use a covering index (e.g.,
- Join Phase:
- Join the original table using the 10 ids as conditions (
USING(id)). - Quickly locate the 10 records via the primary key index (primary key lookup is highly efficient).
- Join the original table using the 10 ids as conditions (
- Resource Comparison: The traditional method requires 10010 back-to-table operations, while deferred join requires only 10.
V. Applicable Conditions and Limitations
- Best Scenarios:
- Deep pagination (offset is much larger than the result set size).
- Existence of an index that can cover the query (e.g.,
(sort field, id)).
- Non-Applicable Cases:
- Shallow pagination (e.g.,
LIMIT 0, 10) may increase complexity. - Cannot utilize covering index optimization when no suitable index exists.
- Shallow pagination (e.g.,
- Index Design Suggestion: Create a composite index of
(sort field, primary key)for pagination queries.
VI. Practical Cases and Effect Comparison
- Test Data: 1 million order records, indexed on the
create_timefield. - Traditional Query:
LIMIT 500000, 10takes about 2 seconds (requires 500k back-to-table operations). - Deferred Join: After rewriting, takes about 0.01 seconds (only 10 back-to-table operations).
- Execution Plan Difference: The traditional method shows "Using filesort," while deferred join shows "Using index."
VII. Advanced Optimization Techniques
- Cursor-Based Pagination: Record the sort field value of the last record from the previous page, and use
WHERE create_time < ?condition to avoid offset. - Business Layer Optimization: Restrict users from jumping to excessively deep page numbers, or use pre-computed summary tables.
- Multi-Dimensional Sorting: If the sort field is not unique, add a secondary sort condition (e.g.,
id) to ensure pagination stability.
Through Deferred Join technology, the performance bottleneck of deep pagination can be effectively addressed. The core lies in reducing back-to-table operations and leveraging covering indexes. Practical application requires comprehensive evaluation based on index design and business scenarios.