Database Index Failure Scenarios and Troubleshooting Methods

Database Index Failure Scenarios and Troubleshooting Methods

Problem Description
Database indexes are a key technology for improving query performance. However, in specific scenarios, indexes may become ineffective, leading to a sharp decline in query performance. This discussion explores common scenarios of index failure, underlying principles, and systematic approaches to troubleshoot and prevent such failures.

I. Root Causes of Index Failure
The essence of index failure is that the optimizer considers a full table scan more efficient than an index scan. Main reasons include:

Skewed Data Distribution: When the conditional value accounts for an excessively high proportion in the table, the optimizer may skip the index.
Insufficient Index Selectivity: Low distinctiveness of the indexed column leads to index efficiency being lower than a full table scan.
Computation/Function Transformation: Operations on the indexed column prevent the use of the index tree for positioning.

II. Detailed Analysis of Typical Failure Scenarios
Scenario 1: Implicit Type Conversion

Example: SELECT * FROM users WHERE phone = 13800138000 (where phone is of string type)
Reason for Failure: The number is converted to a string, equivalent to applying the function CAST(phone AS NUMBER) to the indexed column.
Underlying Principle: B+ tree indexes are sorted based on the original data type; type conversion disrupts direct comparison.

Scenario 2: Violation of the Leftmost Prefix Principle

Composite index (name, age, city):
- Valid: WHERE name='A' AND age=20 (utilizes the first two columns of the index)
- Invalid: WHERE age=20 AND city='Beijing' (skipping the name column prevents positioning)
Analogy: Trying to look up a word in a dictionary by directly flipping to "page 20" without knowing the first letter.

Scenario 3: Index Columns After a Range Query

Example: WHERE name='A' AND age>18 AND city='Beijing'
Result: Only the (name, age) columns of the index can be used; the city column becomes ineffective.
Principle: After the range query on age, the values of city are no longer in sorted order within the B+ tree.

Scenario 4: Function/Expression Operations on Indexed Columns

Operations causing failure:

WHERE YEAR(create_time)=2023        -- Should be changed to a range query
WHERE amount*2>100                  -- Should be rewritten as amount>50
WHERE SUBSTRING(name,1,3)='ABC'    -- Should use LIKE 'ABC%' instead

III. Systematic Troubleshooting Methods
Step 1: Execution Plan Analysis
Use EXPLAIN or EXPLAIN ANALYZE to examine:

type column showing ALL indicates a full table scan.
key column showing NULL indicates no index was used.
Be cautious if the Extra column shows Using filesort or Using temporary.

Step 2: Index Usage Statistics
Analyze index hit rates through performance databases (e.g., INFORMATION_SCHEMA.INDEX_STATISTICS). Consider removing indexes that have not been used for a long time.

Step 3: Data Distribution Check

-- Check selectivity
SELECT 
  COUNT(DISTINCT status)/COUNT(*) as selectivity 
FROM orders;
-- Index effect is significant when selectivity < 0.1

IV. Optimization Practice Plans

SQL Rewriting Optimization
- Avoid implicit conversion: Ensure conditional values match the column type.
- Place high-selectivity columns first: Position columns with high distinctiveness on the left side of composite indexes.
Index Strategy Adjustment
- Use covering indexes: Include all SELECT fields within the index.
- Utilize index pushdown technology: Filter WHERE conditions at the storage engine layer.
Special Scenario Handling
- For frequent range queries: Use IN() to replace some range queries.
- For fuzzy queries: Use indexes for prefix matching; %value% requires full-text indexes.

By understanding the mechanisms of index failure and combining them with execution plan analysis, performance pitfalls can be effectively avoided, thereby enhancing query efficiency.