SQL Index Failure Scenarios and Optimization Strategies
Problem Description
Indexes are a core tool for SQL database performance optimization, but incorrect usage can lead to index failure, triggering full table scans and severely reducing query efficiency. Interviewers often test your understanding of index principles and SQL writing standards through index failure scenarios. This article systematically explains the common causes of index failure and optimization strategies.
Root Causes of Index Failure
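An index "fails" in one of two ways: either the predicate is written so that it cannot be resolved by a lookup in the index's sort order, or the optimizer estimates that going through the index would cost more than a full table scan. Common reasons include: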
- Applying Operations or Functions to Indexed Columns
  - Failure Example: WHERE YEAR(create_time) = 2023
  - Principle: The index stores the raw column values in sorted order. Once the column is wrapped in a function or expression, the computed result no longer follows that order, so the index tree cannot be used to position the search.
  - Optimization: Rewrite as a range query on the bare column.
    -- Optimized version
    WHERE create_time >= '2023-01-01' AND create_time < '2024-01-01'
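The contrast between a function-wrapped column and a bare-column range can be observed in an execution plan. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The orders table, the index name, and the plan() helper are assumptions made for this illustration, and strftime('%Y', ...) plays the role of YEAR(). MySQL's EXPLAIN output is formatted differently, but the same split between a scan and an index lookup should be expected there.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' text of EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Hypothetical table and index, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, create_time TEXT, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_create_time ON orders (create_time)")

# Function applied to the indexed column: the index cannot be used for a seek.
wrapped_plan = plan(
    conn, "SELECT * FROM orders WHERE strftime('%Y', create_time) = '2023'"
)

# Equivalent range on the bare column: the index can position the search.
range_plan = plan(
    conn,
    "SELECT * FROM orders "
    "WHERE create_time >= '2023-01-01' AND create_time < '2024-01-01'",
)

print("function on column:", wrapped_plan)
print("range on column:   ", range_plan)
```

In SQLite's plan text, SCAN means every row is visited, while SEARCH ... USING INDEX means the index was used to jump to the matching range.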
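- Implicit Type Conversion
  - Failure Example: WHERE user_id = 123 (where user_id is a string type such as VARCHAR)
  - Principle: In MySQL, comparing a string column with a number converts every column value to a number before comparing, which is equivalent to applying a function to the indexed column. The reverse case, an integer column compared with '123', only converts the constant and can normally still use the index.
  - Optimization: Make the literal's type match the column's type.
    WHERE user_id = '123' -- String literal for a string column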
- Leading Wildcard in LIKE Queries
  - Failure Example: WHERE name LIKE '%张三'
  - Principle: B+ tree indexes are sorted by prefix, so a pattern that starts with a wildcard gives the index no starting point to seek to.
  - Optimization Solutions:
    - Use a trailing wildcard when a prefix match meets the requirement: WHERE name LIKE '张三%'
    - Use a full-text index (e.g., MySQL's FULLTEXT).
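- OR Condition Connecting to a Non-Indexed Column
  - Failure Example: WHERE indexed_column = 1 OR non_indexed_column = 2
  - Principle: Rows that match only the non-indexed condition can be found only by scanning the whole table, so the index on the other column saves nothing and the optimizer usually goes straight to a full table scan.
  - Optimization: Add an index on the second column as well. If the optimizer still does not combine the two indexes, split the query into a UNION so that each branch can use its own index.
    -- Split into UNION query (effective once both columns are indexed)
    SELECT * FROM t WHERE indexed_column = 1
    UNION
    SELECT * FROM t WHERE non_indexed_column = 2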
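The OR case can be reproduced on a small table. This is a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than on MySQL, and the table t, its columns a and b (standing in for the indexed and non-indexed columns), the index names, and the plan() helper are all hypothetical. It also checks that the UNION form returns the same rows as the OR form.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' text of EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.execute("CREATE INDEX idx_t_a ON t (a)")  # only column a is indexed at first
conn.executemany(
    "INSERT INTO t (a, b) VALUES (?, ?)", [(i % 5, i % 7) for i in range(100)]
)

or_sql = "SELECT * FROM t WHERE a = 1 OR b = 2"
union_sql = "SELECT * FROM t WHERE a = 1 UNION SELECT * FROM t WHERE b = 2"

# With b unindexed, the OR condition forces a scan of the whole table.
or_plan_one_index = plan(conn, or_sql)

# Once b is indexed too, each UNION branch can seek through its own index.
conn.execute("CREATE INDEX idx_t_b ON t (b)")
union_plan_two_indexes = plan(conn, union_sql)

# The rewrite must not change the result set.
same_rows = sorted(conn.execute(or_sql)) == sorted(conn.execute(union_sql))

print("OR, one index:    ", or_plan_one_index)
print("UNION, two indexes:", union_plan_two_indexes)
print("same rows:", same_rows)
```

UNION removes duplicate rows, which matches the OR semantics here because every row carries a unique id. UNION ALL would return rows matching both conditions twice.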
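- Violating the Leftmost Prefix Principle
  - Failure scenarios for composite index (a, b, c):
    - Missing leftmost column: WHERE b = 1 AND c = 2
    - Skipping a column: WHERE a = 1 AND c = 2 (only column a can be used for the index lookup)
  - Optimization: Include the leading columns in the filter, or create an index whose column order matches the query, such as (b, c) for the first case. The order in which conditions are written in the WHERE clause does not matter; what matters is which index columns they cover. A covering index can also reduce the damage, since scanning the index alone is cheaper than scanning the table.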
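The leftmost prefix rule can be seen in which index columns a plan reports as used. The sketch below assumes SQLite (via Python's sqlite3 module) as a stand-in for MySQL; the table t, the extra payload column (added so the index is not covering), the index name, and the plan() helper are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' text of EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, "
    "c INTEGER, payload TEXT)"
)
conn.execute("CREATE INDEX idx_t_abc ON t (a, b, c)")

# Missing leftmost column: the index cannot be used for a lookup.
bc_plan = plan(conn, "SELECT * FROM t WHERE b = 1 AND c = 2")

# Skipping b: only the leading column a narrows the index search.
ac_plan = plan(conn, "SELECT * FROM t WHERE a = 1 AND c = 2")

# All three columns present: the whole index key is used.
abc_plan = plan(conn, "SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3")

print("b, c   :", bc_plan)
print("a, c   :", ac_plan)
print("a, b, c:", abc_plan)
```

The parenthesized part of each SEARCH line lists the index columns that constrain the lookup, which is roughly what key_len conveys in MySQL's EXPLAIN.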
- Using NOT or != on Indexed Columns
  - Failure Example: WHERE status != 'deleted'
  - Principle: A negative condition usually matches most of the index entries, so reading them through the index may cost more than a full table scan.
  - Optimization:
    - Change to a positive query: WHERE status IN ('active', 'pending')
    - If the condition actually matches only a small fraction of rows, consider forcing index usage (e.g., FORCE INDEX in MySQL) and confirm the effect with EXPLAIN.
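The effect of turning a negative condition into a positive one can be checked the same way. This is a minimal sketch on SQLite through Python's sqlite3 module; the tasks table, its index, and the plan() helper are assumptions for illustration. SQLite never uses an index for !=, whereas MySQL decides by estimated cost, so the MySQL result depends on the data distribution.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' text of EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, title TEXT)")
conn.execute("CREATE INDEX idx_tasks_status ON tasks (status)")

# Negative condition: planned as a scan of the table.
negative_plan = plan(conn, "SELECT * FROM tasks WHERE status != 'deleted'")

# Positive list of wanted values: each value becomes an index lookup.
positive_plan = plan(
    conn, "SELECT * FROM tasks WHERE status IN ('active', 'pending')"
)

print("!= :", negative_plan)
print("IN :", positive_plan)
```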
Advanced Optimization Strategies
- Covering Index Optimization
  - Return the queried columns directly from the index, avoiding table lookups.
  - Example: Create index (category, price) to optimize the query:
    SELECT category, price FROM products WHERE category = 'electronics'
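Whether a query is served entirely from the index shows up directly in the plan. The sketch below assumes SQLite through Python's sqlite3 module; the products table, the extra name column, the index name, and the plan() helper are hypothetical. In MySQL the corresponding signal is "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' text of EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, "
    "price REAL, name TEXT)"
)
conn.execute("CREATE INDEX idx_products_category_price ON products (category, price)")

# Every selected column lives in the index, so no table lookup is needed.
covered_plan = plan(
    conn, "SELECT category, price FROM products WHERE category = 'electronics'"
)

# Adding name, which is not in the index, forces a lookup into the table.
uncovered_plan = plan(
    conn,
    "SELECT category, price, name FROM products WHERE category = 'electronics'",
)

print("covered:  ", covered_plan)
print("uncovered:", uncovered_plan)
```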
- Index Condition Pushdown (ICP)
  - Feature available in MySQL 5.6+ that pushes WHERE condition filtering on indexed columns down to the storage engine layer.
  - Example: Index (city, age), query WHERE city LIKE '北%' AND age > 20. The range on city stops age from being used to position the scan, but age is still stored in every index entry.
    - Traditional method: Retrieve every index entry where city matches '北%', look up each row in the table, then filter by age.
    - ICP: Filter age > 20 directly at the index layer, so only matching entries trigger a table lookup.
- Index Selectivity Evaluation
  - Selectivity = Number of distinct values / Total number of records.
  - Indexes on low-selectivity columns (e.g., gender) are less effective; consider composite indexes with other columns.
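The selectivity formula translates directly into a COUNT(DISTINCT ...) / COUNT(*) query. The sketch below runs it on made-up data in SQLite through Python's sqlite3 module; the users table, its columns, the generated rows, and the selectivity() helper are assumptions for illustration only.

```python
import sqlite3


def selectivity(conn, table, column):
    """Distinct values divided by total rows for one column.

    The identifiers are interpolated into the SQL text, so only pass
    trusted, hard-coded table and column names.
    """
    sql = f"SELECT COUNT(DISTINCT {column}) * 1.0 / COUNT(*) FROM {table}"
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, gender TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (gender, email) VALUES (?, ?)",
    [("F" if i % 2 == 0 else "M", f"user{i}@example.com") for i in range(1000)],
)

gender_sel = selectivity(conn, "users", "gender")  # 2 distinct / 1000 rows
email_sel = selectivity(conn, "users", "email")    # 1000 distinct / 1000 rows

print("gender selectivity:", gender_sel)
print("email selectivity: ", email_sel)
```

A value close to 1 means almost every row is distinct, so an equality lookup returns few rows; a value close to 0 means each lookup still returns a large share of the table.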
Practical Checklist
- Use EXPLAIN to analyze the execution plan; focus on the type column. In MySQL, const, eq_ref, ref, and range indicate an index lookup, while ALL is a full table scan and index is a full scan of the index.
- Avoid expression calculations on indexed columns in the WHERE clause.
- Composite indexes must follow the leftmost prefix principle; place equality-filtered columns before range-filtered ones, and generally put the more selective columns first.
- Prioritize covering indexes for large datasets.
- Regularly update index statistics using ANALYZE TABLE.
By understanding these principles and strategies, you can proactively avoid index failures and choose the optimal indexing scheme for specific scenarios.