Database Query Optimization: Index Optimization

Database Query Optimization: Index Optimization

Problem Description:
Assume you have a user order table (orders) containing millions of records. You now need to query a specific user's order data from the last 30 days, but the query is very slow. Please analyze the possible reasons and provide specific optimization ideas and solutions.

Detailed Knowledge Points:

1. Problem Analysis
First, we need to understand why the query is slow. Assume the table structure and query statement are as follows:

-- Table Structure
CREATE TABLE orders (
    id INT PRIMARY KEY,
    user_id INT,
    order_date DATETIME,
    amount DECIMAL(10,2),
    status VARCHAR(20)
);

-- Query Statement
SELECT * FROM orders 
WHERE user_id = 123 
  AND order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);

Possible reasons for slow query:

  • No suitable index, leading to full table scan
  • Index exists but is poorly designed
  • Data volume is too large, causing index ineffectiveness

2. Basic Principles of Indexes
An index is like the table of contents of a book, allowing for quick data location. Common index types:

  • B+Tree Index: Most common, suitable for range queries
  • Hash Index: Suitable for equality queries, does not support range queries

3. Index Optimization Solutions

Step 1: Analyze Existing Indexes
Use EXPLAIN to analyze the query execution plan:

EXPLAIN SELECT * FROM orders 
WHERE user_id = 123 
  AND order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);

Observe key metrics:

  • type: ALL indicates a full table scan, requiring optimization
  • possible_keys: Possible indexes that can be used
  • key: The index actually used

Step 2: Design a Suitable Index
For this query, we need to create a composite index:

-- Option 1: Put user_id first
CREATE INDEX idx_user_date ON orders(user_id, order_date);

-- Option 2: If frequent range queries by time, adjust the order
CREATE INDEX idx_date_user ON orders(order_date, user_id);

Selection Principles:

  • Place equality query fields (user_id) first
  • Place range query fields (order_date) later
  • Consider query frequency and selectivity

Step 3: Index Usage Details
Leftmost Prefix Principle of Composite Indexes:

  • The idx_user_date index can be used for:
    • WHERE user_id = 123
    • WHERE user_id = 123 AND order_date >= '2023-01-01'
  • But NOT for: WHERE order_date >= '2023-01-01' (cannot use index)

Step 4: Avoid Index Invalidation Scenarios
Common scenarios where indexes become ineffective:

-- 1. Performing operations on indexed columns (ineffective)
WHERE YEAR(order_date) = 2023

-- 2. Using inequality conditions (may be ineffective)
WHERE user_id != 123

-- 3. Using OR conditions (needs optimization)
WHERE user_id = 123 OR amount > 100

-- 4. Fuzzy query starting with a wildcard
WHERE user_id LIKE '%123%'

4. Advanced Optimization Techniques

Covering Index Optimization:
If only specific columns are needed, create a covering index:

-- Query only the required columns
SELECT user_id, order_date, amount 
FROM orders 
WHERE user_id = 123 
  AND order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);

-- Create an index covering all queried columns
CREATE INDEX idx_covering ON orders(user_id, order_date, amount);

Index Partitioning Strategy:
For extremely large-scale data, consider:

  • Partitioning by time
  • Hash partitioning by user ID
  • Design partitioning strategies based on business characteristics

5. Practical Recommendations

  1. Use slow query logs to identify problematic SQL statements
  2. Regularly analyze index usage
  3. Avoid over-indexing (affects write performance)
  4. Monitor index fragmentation and perform regular maintenance

Through such systematic optimization, queries that originally took several seconds can be optimized to millisecond levels, significantly improving system performance.