Analysis of the Optimization Principle of SQL Index Condition Pushdown

Analysis of the Optimization Principle of SQL Index Condition Pushdown

1. Problem Background: Limitations of Composite Index Queries

Assume there is a user table users with the following fields:

CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT,
    city VARCHAR(50),
    KEY idx_age_city (age, city)
);

Query Requirement: Find users older than 20 whose city starts with "Beijing":

SELECT * FROM users WHERE age > 20 AND city LIKE 'Beijing%';

Execution Process Without Using Index Condition Pushdown:

  1. Index Scan: Locate all records with age > 20 along the composite index idx_age_city (the index can quickly locate the age range).
  2. Row Retrieval (Back to Table): For each qualifying index record, immediately access the primary key index (clustered index) to read the full data row.
  3. City Filtering: Apply the city LIKE 'Beijing%' condition to filter the read data rows at the server layer.

Drawback: Even if some records are ultimately filtered out because they do not meet the city condition, unnecessary row retrieval operations are still triggered, causing unnecessary disk I/O.


2. Core Idea of Index Condition Pushdown

Index Condition Pushdown (ICP) pushes the filtering conditions, which would originally be processed at the server layer, down to the storage engine layer for execution:

  • The storage engine directly applies the city LIKE 'Beijing%' condition for filtering during the index scan phase.
  • Row retrieval is performed only for index records that satisfy both age > 20 and city LIKE 'Beijing%'.

3. Workflow of Index Condition Pushdown (Taking MySQL's InnoDB as an Example)

Execution Steps After Enabling ICP:

  1. Index Scan: The storage engine locates the first record with age > 20 from the idx_age_city index.
  2. Pushed-Down Condition Filtering: Check if the index record satisfies city LIKE 'Beijing%':
    • If it does, perform row retrieval, read the full data row, and return it to the server layer.
    • If it does not, skip this index record directly and continue scanning the next one.
  3. Server Layer Filtering: The server layer performs final validation on the returned data rows (e.g., validating other non-index conditions).

Comparison of Effects:

  • Without ICP: Number of row retrievals = Number of all records with age > 20.
  • With ICP: Number of row retrievals = Number of records satisfying both age > 20 and city LIKE 'Beijing%'.

4. Applicable Scenarios and Limitations of Index Condition Pushdown

Applicable Conditions:

  1. The query involves a prefix column of a composite index (e.g., age) for a range query, with subsequent columns (e.g., city) used as filtering conditions.
  2. The column for the pushed-down condition must be an index column (e.g., city is part of the composite index).
  3. Supported condition types: =, >, <, LIKE 'prefix%', etc. (LIKE '%suffix' is not supported).

Limitations:

  • Subqueries or function calls (e.g., city LIKE CONCAT('Bei', '%')) may not be pushable.
  • Full-text indexes or spatial indexes are not applicable to ICP.
  • When accessing the full data row is necessary to evaluate the condition (e.g., the query involves non-index columns), pushdown is not possible.

5. Practical Case Verification

Checking if ICP is Enabled:
Observe the Extra field through the execution plan (Explain):

EXPLAIN SELECT * FROM users WHERE age > 20 AND city LIKE 'Beijing%';
  • Without ICP enabled: Extra shows Using where (filtering only at the server layer).
  • With ICP enabled: Extra shows Using index condition (condition pushed down to the storage engine).

Performance Comparison:
Assume the table has 100,000 records, of which 50,000 satisfy age > 20, and only 1,000 ultimately meet the city condition:

  • Without ICP: 50,000 row retrievals.
  • With ICP: Only 1,000 row retrievals, reducing I/O overhead by approximately 98%.

6. Summary and Optimization Suggestions

  • Advantages: Significantly reduces the number of row retrievals, lowering CPU and disk I/O pressure, especially suitable for scenarios involving range queries + filtering with composite indexes.
  • Configuration: ICP is enabled by default in MySQL 5.6+. It can be controlled via optimizer_switch='index_condition_pushdown=on'.
  • Design Suggestions: Reasonably design the order of composite indexes, place range query columns last, or utilize the ICP feature by including filtering columns in the index.