Database Query Optimization: Index Optimization
Problem Description:
Assume you have a user order table (orders) containing millions of records. You now need to query a specific user's order data from the last 30 days, but the query is very slow. Please analyze the possible reasons and provide specific optimization ideas and solutions.
Detailed Knowledge Points:
1. Problem Analysis
First, we need to understand why the query is slow. Assume the table structure and query statement are as follows:
-- Table Structure
CREATE TABLE orders (
id INT PRIMARY KEY,
user_id INT,
order_date DATETIME,
amount DECIMAL(10,2),
status VARCHAR(20)
);
-- Query Statement
SELECT * FROM orders
WHERE user_id = 123
AND order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);
Possible reasons for slow query:
- No suitable index, leading to full table scan
- Index exists but is poorly designed
- Data volume is too large, causing index ineffectiveness
2. Basic Principles of Indexes
An index is like the table of contents of a book, allowing for quick data location. Common index types:
- B+Tree Index: Most common, suitable for range queries
- Hash Index: Suitable for equality queries, does not support range queries
3. Index Optimization Solutions
Step 1: Analyze Existing Indexes
Use EXPLAIN to analyze the query execution plan:
EXPLAIN SELECT * FROM orders
WHERE user_id = 123
AND order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);
Observe key metrics:
- type: ALL indicates a full table scan, requiring optimization
- possible_keys: Possible indexes that can be used
- key: The index actually used
Step 2: Design a Suitable Index
For this query, we need to create a composite index:
-- Option 1: Put user_id first
CREATE INDEX idx_user_date ON orders(user_id, order_date);
-- Option 2: If frequent range queries by time, adjust the order
CREATE INDEX idx_date_user ON orders(order_date, user_id);
Selection Principles:
- Place equality query fields (user_id) first
- Place range query fields (order_date) later
- Consider query frequency and selectivity
Step 3: Index Usage Details
Leftmost Prefix Principle of Composite Indexes:
- The
idx_user_dateindex can be used for:- WHERE user_id = 123
- WHERE user_id = 123 AND order_date >= '2023-01-01'
- But NOT for: WHERE order_date >= '2023-01-01' (cannot use index)
Step 4: Avoid Index Invalidation Scenarios
Common scenarios where indexes become ineffective:
-- 1. Performing operations on indexed columns (ineffective)
WHERE YEAR(order_date) = 2023
-- 2. Using inequality conditions (may be ineffective)
WHERE user_id != 123
-- 3. Using OR conditions (needs optimization)
WHERE user_id = 123 OR amount > 100
-- 4. Fuzzy query starting with a wildcard
WHERE user_id LIKE '%123%'
4. Advanced Optimization Techniques
Covering Index Optimization:
If only specific columns are needed, create a covering index:
-- Query only the required columns
SELECT user_id, order_date, amount
FROM orders
WHERE user_id = 123
AND order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);
-- Create an index covering all queried columns
CREATE INDEX idx_covering ON orders(user_id, order_date, amount);
Index Partitioning Strategy:
For extremely large-scale data, consider:
- Partitioning by time
- Hash partitioning by user ID
- Design partitioning strategies based on business characteristics
5. Practical Recommendations
- Use slow query logs to identify problematic SQL statements
- Regularly analyze index usage
- Avoid over-indexing (affects write performance)
- Monitor index fragmentation and perform regular maintenance
Through such systematic optimization, queries that originally took several seconds can be optimized to millisecond levels, significantly improving system performance.