Database Pagination Query Performance Optimization: Principles and Practices

Problem Description
Pagination is one of the most common database operations (e.g., "fetch page N with M records per page"). When the dataset is huge, a naive LIMIT offset, size statement degrades sharply in deep-pagination scenarios, i.e., when the offset is very large. This article systematically analyzes the root causes of that bottleneck and walks through the principles and practice of the core optimization techniques: cursor-based pagination, subqueries, and deferred joins.

1. Performance Bottleneck Analysis of Traditional Pagination
Traditional pagination uses the LIMIT offset, size syntax (e.g., SELECT * FROM orders ORDER BY id LIMIT 1000000, 20). Its execution involves the following steps:

  • Step 1: The database locates all records satisfying the WHERE condition via an index (e.g., the primary key index).
  • Step 2: Sorts the result set according to the ORDER BY field (requires a temporary sort if no index is available).
  • Step 3: Sequentially scans from the first record, skipping the first offset records (1 million in this example).
  • Step 4: Returns the next size records (20 in this example).

Key Issue: In step 3, the database must read and discard the first 1 million records. Even when an index satisfies the sort, it still traverses 1 million index entries, and for SELECT * it typically performs a back-to-table lookup for each one, so the wasted random I/O (or sequential scanning) dominates the query cost.
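The steps above can be sketched with SQLite (the `orders` table and its columns here are illustrative assumptions, not the article's actual schema):

```python
import sqlite3

# In-memory database with a toy orders table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 10001)])

def fetch_page(page, size):
    """Traditional pagination: the engine still reads and discards
    the first `offset` rows before returning the page."""
    offset = (page - 1) * size
    cur = conn.execute(
        "SELECT id, amount FROM orders ORDER BY id LIMIT ? OFFSET ?",
        (size, offset),
    )
    return cur.fetchall()

rows = fetch_page(page=50, size=20)
print(rows[0][0], rows[-1][0])  # first and last id on page 50: 981 1000
```

The deeper the page, the larger the discarded prefix, which is exactly the cost the following techniques remove.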

2. Cursor-based "Previous/Next Page" Optimization
Suitable for sequential pagination scenarios (e.g., pull-to-load on mobile devices). The core idea is to avoid using offset:

  • Principle: Record the position of the last record from the previous page (e.g., its ID), and change LIMIT offset, size to a conditional query.
  • Example:
    First page: SELECT * FROM orders ORDER BY id LIMIT 20
    Second page: SELECT * FROM orders WHERE id > last_max_id_from_previous_page ORDER BY id LIMIT 20
  • Advantage: Directly locates the starting position via the index, skipping the scan of invalid data.
  • Limitation: Does not support random page jumps (e.g., jumping directly to page 50), and requires a unique, monotonically ordered cursor column (e.g., an auto-increment id) so that no rows are skipped or repeated.
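A minimal sketch of the cursor method, again using SQLite and an assumed `orders` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 1001)])

def fetch_after(last_id, size):
    """Cursor pagination: seek directly past last_id via the
    primary-key index instead of scanning an offset prefix."""
    cur = conn.execute(
        "SELECT id, amount FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    )
    return cur.fetchall()

first_page = fetch_after(last_id=0, size=20)            # ids 1..20
second_page = fetch_after(first_page[-1][0], size=20)   # ids 21..40
```

The client carries the last id of each page forward as the cursor for the next request, so every page costs the same regardless of depth.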

3. Subquery-based Deferred Join Optimization
Suitable for scenarios requiring random page jumps where the sort field is indexed:

  • Principle: First, quickly locate the target record IDs via a subquery (avoiding back-to-table lookups), then retrieve the complete data via primary key joins.
  • Example:
    SELECT o.* FROM orders AS o
    INNER JOIN (
      SELECT id FROM orders
      ORDER BY create_time
      LIMIT 1000000, 20
    ) AS tmp USING (id)
    ORDER BY o.create_time

  • Execution Process:
    1. The subquery reads only the id column in create_time order; an index on create_time covers it (in InnoDB, secondary indexes implicitly include the primary key), so skipping the offset records never touches the table rows.
    2. Joins with the original table via the primary key id to precisely fetch the 20 complete records.
  • Advantage: The data volume processed by the subquery is much smaller than the original table (only indexed fields), reducing I/O overhead.
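The deferred join can be exercised end to end in SQLite (schema and the lexicographically sortable `create_time` values are assumptions for the sketch; the LIMIT clause uses SQLite's `LIMIT ? OFFSET ?` form rather than MySQL's `LIMIT offset, size`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, create_time TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_orders_create_time ON orders (create_time)")
# Zero-padded timestamps so string order matches insertion order
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, f"2024-01-01 {i:05d}", i * 1.5) for i in range(1, 5001)])

def fetch_page_deferred(page, size):
    """Deferred join: the subquery pages over the slim create_time index,
    then the outer join fetches only the 20 full rows by primary key."""
    offset = (page - 1) * size
    cur = conn.execute("""
        SELECT o.id, o.create_time, o.amount
        FROM orders AS o
        INNER JOIN (
            SELECT id FROM orders
            ORDER BY create_time
            LIMIT ? OFFSET ?
        ) AS tmp USING (id)
        ORDER BY o.create_time
    """, (size, offset))
    return cur.fetchall()

deep_page = fetch_page_deferred(page=100, size=20)  # ids 1981..2000
```

The outer ORDER BY re-establishes the sort after the join, since join output order is otherwise unspecified.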

4. Business Design-based Special Optimization Strategies

  • Prohibit Excessively Deep Pagination: Limit the accessible page range at the business level (e.g., only allowing access to the first 100 pages).
  • Data Preloading: Pre-generate and cache pagination results for hot data (e.g., storing the first N pages in Redis).
  • Secondary Index Strategy: Create composite indexes on pagination conditions (e.g., time range) and combine with WHERE conditions to narrow the scan scope.
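The first two strategies can be combined in a small sketch; a plain dict stands in for Redis here, and the table, page cap, and hot-page count are all illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 1001)])

PAGE_SIZE = 20
MAX_PAGE = 100    # business-level cap on pagination depth
HOT_PAGES = 5     # pages kept warm (would live in Redis in production)
cache = {}

def preload_hot_pages():
    """Fetch the first N pages in one query and slice them into the cache."""
    rows = conn.execute(
        "SELECT id, amount FROM orders ORDER BY id LIMIT ?",
        (PAGE_SIZE * HOT_PAGES,),
    ).fetchall()
    for p in range(HOT_PAGES):
        cache[p + 1] = rows[p * PAGE_SIZE:(p + 1) * PAGE_SIZE]

def fetch_page(page):
    if page > MAX_PAGE:
        raise ValueError(f"pages beyond {MAX_PAGE} are not accessible")
    if page in cache:   # hot page: served without touching the database
        return cache[page]
    return conn.execute(
        "SELECT id, amount FROM orders ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, (page - 1) * PAGE_SIZE),
    ).fetchall()

preload_hot_pages()
```

In production the cache would be refreshed on a schedule or invalidated on writes; the cap simply turns pathological deep requests into an explicit error.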

5. Practical Recommendations and Selection Criteria

  • Sequential Pagination: Prioritize the cursor method (WHERE condition pagination).
  • Random Page Jumps: Use traditional LIMIT when data volume is less than 100k; use deferred joins when greater than 100k.
  • Index Design: Create composite indexes for ORDER BY fields and query conditions, leveraging index coverage as much as possible.
  • Monitoring Tools: Analyze execution plans via EXPLAIN, paying attention to warnings like Using filesort or full table scans.
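As a quick illustration of the monitoring step, SQLite's EXPLAIN QUERY PLAN plays the role MySQL's EXPLAIN plays in the article; "USE TEMP B-TREE FOR ORDER BY" is SQLite's counterpart of MySQL's "Using filesort" warning:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, create_time TEXT)")

query = "SELECT * FROM orders ORDER BY create_time LIMIT 20"

# Without an index on create_time, the sort needs a temporary B-tree
before = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

conn.execute("CREATE INDEX idx_orders_create_time ON orders (create_time)")

# With the index, rows are read in index order and the sort disappears
after = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
```

Running the plan before and after adding the index makes the effect of index design directly visible, which is the habit the recommendation above is advocating.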