Database Index Failure Scenarios and Troubleshooting Methods
Problem Description
Database indexes are a key technology for improving query performance. However, in specific scenarios, indexes may become ineffective, leading to a sharp decline in query performance. This discussion explores common scenarios of index failure, underlying principles, and systematic approaches to troubleshoot and prevent such failures.
I. Root Causes of Index Failure
The essence of index failure is that the optimizer considers a full table scan more efficient than an index scan. Main reasons include:
- Skewed Data Distribution: When the conditional value accounts for an excessively high proportion in the table, the optimizer may skip the index.
- Insufficient Index Selectivity: Low distinctiveness of the indexed column leads to index efficiency being lower than a full table scan.
- Computation/Function Transformation: Operations on the indexed column prevent the use of the index tree for positioning.
II. Detailed Analysis of Typical Failure Scenarios
Scenario 1: Implicit Type Conversion
- Example:
SELECT * FROM users WHERE phone = 13800138000(wherephoneis of string type) - Reason for Failure: The number is converted to a string, equivalent to applying the function
CAST(phone AS NUMBER)to the indexed column. - Underlying Principle: B+ tree indexes are sorted based on the original data type; type conversion disrupts direct comparison.
Scenario 2: Violation of the Leftmost Prefix Principle
- Composite index
(name, age, city):- Valid:
WHERE name='A' AND age=20(utilizes the first two columns of the index) - Invalid:
WHERE age=20 AND city='Beijing'(skipping thenamecolumn prevents positioning)
- Valid:
- Analogy: Trying to look up a word in a dictionary by directly flipping to "page 20" without knowing the first letter.
Scenario 3: Index Columns After a Range Query
- Example:
WHERE name='A' AND age>18 AND city='Beijing' - Result: Only the
(name, age)columns of the index can be used; thecitycolumn becomes ineffective. - Principle: After the range query on
age, the values ofcityare no longer in sorted order within the B+ tree.
Scenario 4: Function/Expression Operations on Indexed Columns
- Operations causing failure:
WHERE YEAR(create_time)=2023 -- Should be changed to a range query WHERE amount*2>100 -- Should be rewritten as amount>50 WHERE SUBSTRING(name,1,3)='ABC' -- Should use LIKE 'ABC%' instead
III. Systematic Troubleshooting Methods
Step 1: Execution Plan Analysis
Use EXPLAIN or EXPLAIN ANALYZE to examine:
typecolumn showingALLindicates a full table scan.keycolumn showingNULLindicates no index was used.- Be cautious if the
Extracolumn showsUsing filesortorUsing temporary.
Step 2: Index Usage Statistics
Analyze index hit rates through performance databases (e.g., INFORMATION_SCHEMA.INDEX_STATISTICS). Consider removing indexes that have not been used for a long time.
Step 3: Data Distribution Check
-- Check selectivity
SELECT
COUNT(DISTINCT status)/COUNT(*) as selectivity
FROM orders;
-- Index effect is significant when selectivity < 0.1
IV. Optimization Practice Plans
-
SQL Rewriting Optimization
- Avoid implicit conversion: Ensure conditional values match the column type.
- Place high-selectivity columns first: Position columns with high distinctiveness on the left side of composite indexes.
-
Index Strategy Adjustment
- Use covering indexes: Include all
SELECTfields within the index. - Utilize index pushdown technology: Filter
WHEREconditions at the storage engine layer.
- Use covering indexes: Include all
-
Special Scenario Handling
- For frequent range queries: Use
IN()to replace some range queries. - For fuzzy queries: Use indexes for prefix matching;
%value%requires full-text indexes.
- For frequent range queries: Use
By understanding the mechanisms of index failure and combining them with execution plan analysis, performance pitfalls can be effectively avoided, thereby enhancing query efficiency.