How Database Indexes Work and Optimization Strategies
Problem Description:
Explain how database indexes (such as B+ tree indexes) improve query efficiency. Illustrate index optimization strategies and their potential issues using specific scenarios.
Step-by-step Explanation:
1. The Basic Role of Indexes
- Problem Background: When a database table stores a large amount of data, queries without an index require a full table scan (checking each row), resulting in O(N) time complexity and low efficiency.
- Index Definition: An index is a data structure independent of the data table. It accelerates queries by maintaining a sorted copy of specific fields and mapping them to data locations.
- Analogy: An index works like a book's table of contents, which gives a chapter's page number so you need not flip through every page.
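To make the "sorted copy plus locations" idea concrete, here is a minimal Python sketch. The table contents, the field name user_id, and the helper names are all hypothetical, and a sorted list with binary search stands in for a real index structure.

```python
import bisect

# Hypothetical table: rows stored in insertion order (not sorted by user_id).
rows = [
    {"user_id": 42, "name": "Chen"},
    {"user_id": 7, "name": "Ana"},
    {"user_id": 99, "name": "Omar"},
    {"user_id": 15, "name": "Lena"},
]

def full_scan(rows, user_id):
    """O(N): inspect every row until a match is found."""
    for row in rows:
        if row["user_id"] == user_id:
            return row
    return None

def build_index(rows, field):
    """Sorted (key, row_position) pairs, kept separately from the table."""
    return sorted((row[field], pos) for pos, row in enumerate(rows))

def index_lookup(rows, index, key):
    """O(log N): binary-search the sorted keys, then jump to the row."""
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return rows[index[i][1]]
    return None

user_id_index = build_index(rows, "user_id")
print(index_lookup(rows, user_id_index, 15))
```

Both functions return the same row; the difference is how many entries they have to touch before finding it.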
2. The Core Structure: B+ Tree
Taking the most common B+ tree index as an example, its design goal is to reduce disk I/O operations (database data is usually stored on disk, which has slow read/write speeds).
B+ Tree Characteristics:
- Multi-way Balanced Tree: Each node can contain many key values (e.g., 100~1000), so the tree stays shallow. With a fan-out in the high hundreds, 3~4 levels are enough to index on the order of a billion records.
- Data Stored Only in Leaf Nodes: Internal nodes only store key values and child node pointers, ensuring consistent query path length.
- Leaf Nodes Connected by a Bidirectional Linked List: Supports efficient range queries (e.g., WHERE id BETWEEN 100 AND 200).
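The relationship between fan-out and tree height can be checked with a little arithmetic. The sketch below assumes, as a simplification, that every level (including the leaves) holds the same number of entries per node; real page sizes and key widths vary, and levels_needed is a hypothetical helper name.

```python
def levels_needed(num_records, fanout):
    """Smallest number of levels whose capacity (fanout ** levels) covers num_records."""
    levels, capacity = 1, fanout
    while capacity < num_records:
        capacity *= fanout
        levels += 1
    return levels

# One billion records at three different fan-outs.
for fanout in (100, 500, 1000):
    print(fanout, levels_needed(1_000_000_000, fanout))
# fanout 100 -> 5 levels, fanout 500 -> 4 levels, fanout 1000 -> 3 levels
```

Under this simplified model, the 3~4 level figure holds when the fan-out is in the high hundreds; with a fan-out near 100 one more level is needed.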
Query Process Example (Assuming query id=150):
- Start from the root node and pick the child node by comparing against its keys (e.g., if the root node keys are [50, 100, 200], then 150 falls into the 100~200 interval).
- Traverse downwards level by level until reaching a leaf node.
- Locate id=150 within the leaf node and obtain its corresponding data row address (or the data itself if stored directly).
- Read the disk data via the address and return the result.
Key Advantage: A point lookup over billions of records needs only about 3~4 disk I/Os (one per tree level), whereas a full table scan might require millions.
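The lookup steps above can be sketched in Python. This is a hand-built, read-only toy rather than a real storage engine: the class and function names are hypothetical, there is no insert or split logic, the leaves are linked in one direction only, and "reads" simply counts visited nodes as a stand-in for disk I/Os.

```python
from bisect import bisect_left, bisect_right

class Leaf:
    def __init__(self, keys, values):
        self.keys = keys          # sorted keys stored in this leaf
        self.values = values      # row data (or row addresses), parallel to keys
        self.next = None          # link to the right sibling for range scans

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no row data
        self.children = children  # len(children) == len(keys) + 1

def find_leaf(node, key):
    """Descend from the root; each visited node models one page read."""
    reads = 1
    while isinstance(node, Internal):
        # bisect_right sends a key equal to a separator into the right child.
        node = node.children[bisect_right(node.keys, key)]
        reads += 1
    return node, reads

def search(root, key):
    leaf, reads = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.values[i], reads
    return None, reads

def range_scan(root, low, high):
    """Locate the leaf for `low`, then follow sibling links up to `high`."""
    leaf, _ = find_leaf(root, low)
    out = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return out
            if k >= low:
                out.append(v)
        leaf = leaf.next
    return out

# Two-level tree using the root keys from the example: [50, 100, 200].
leaves = [
    Leaf([10, 30], ["row10", "row30"]),
    Leaf([50, 70], ["row50", "row70"]),
    Leaf([100, 150, 180], ["row100", "row150", "row180"]),
    Leaf([200, 250], ["row200", "row250"]),
]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([50, 100, 200], leaves)

print(search(root, 150))          # ('row150', 2)
print(range_scan(root, 60, 160))  # ['row70', 'row100', 'row150']
```

The range scan descends the tree only once and then walks the leaf chain, which is why linked leaves make BETWEEN-style queries cheap.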
3. Index Optimization Strategies
(1) Index Selection Principles
- High-Frequency Query Fields: Create indexes for fields used in WHERE, JOIN, and ORDER BY clauses.
- High Selectivity Fields: Fields with low duplication rates (e.g., ID number) are more suitable for indexing than those with high duplication (e.g., gender).
- Composite Index Leftmost Prefix Matching:
  - A composite index (A, B, C) is only effective for queries like:
    - WHERE A=1
    - WHERE A=1 AND B=2
    - WHERE A=1 AND B=2 AND C=3
  - But it cannot accelerate WHERE B=2 or WHERE C=3 (violates the leftmost prefix rule).
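One way to observe the leftmost prefix rule is to ask a database for its query plan. The sketch below uses Python's built-in sqlite3 module as a convenient stand-in; the table t, the index idx_abc, and the plan helper are hypothetical, and the exact plan wording differs between SQLite versions and other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, c INT, payload TEXT)")
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")
conn.executemany(
    "INSERT INTO t (a, b, c, payload) VALUES (?, ?, ?, ?)",
    [(i % 50, i % 7, i % 3, "x") for i in range(1000)],
)

def plan(where):
    """Return SQLite's plan text for SELECT * FROM t WHERE <where>."""
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE " + where).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description

for where in ("a = 1", "a = 1 AND b = 2", "a = 1 AND b = 2 AND c = 3", "b = 2", "c = 3"):
    print(f"{where:28} -> {plan(where)}")
```

The first three conditions should be reported as a SEARCH using idx_abc, while the last two fall back to a SCAN of the table. Some optimizers can still use a composite index in such cases through a skip scan, so treat the rule as the common case rather than an absolute.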
(2) Avoiding Common Scenarios of Index Invalidation
- Operations or Functions on Indexed Columns: E.g., WHERE YEAR(create_time)=2023 cannot use the create_time index; change it to a range query such as WHERE create_time >= '2023-01-01' AND create_time < '2024-01-01'.
- Wildcard at the Beginning of a LIKE Pattern: LIKE '%abc' cannot use an index, while LIKE 'abc%' can.
- Type Conversion: E.g., comparing a string field with a numeric literal. If id is a string column, WHERE id='123' can use the index, whereas WHERE id=123 may trigger implicit conversion and bypass it.
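The function-on-column case can be reproduced with Python's built-in sqlite3 module. This is only a sketch: the orders table and its rows are made up, and because SQLite has no YEAR() function, strftime('%Y', ...) stands in for it here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, create_time TEXT, note TEXT)")
conn.execute("CREATE INDEX idx_create_time ON orders (create_time)")
conn.executemany(
    "INSERT INTO orders (create_time, note) VALUES (?, ?)",
    [("2022-12-31 23:59:59", "old"), ("2023-01-01 00:00:00", "first"),
     ("2023-06-15 12:00:00", "mid"), ("2024-01-01 00:00:00", "next")],
)

# Wrapping the indexed column in a function hides it from the index.
wrapped = "SELECT note FROM orders WHERE strftime('%Y', create_time) = '2023'"
# The equivalent half-open range leaves the column bare.
ranged = ("SELECT note FROM orders "
          "WHERE create_time >= '2023-01-01' AND create_time < '2024-01-01'")

def plan(sql):
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

wrapped_notes = sorted(r[0] for r in conn.execute(wrapped))
ranged_notes = sorted(r[0] for r in conn.execute(ranged))

print(plan(wrapped))
print(plan(ranged))
print(wrapped_notes == ranged_notes)
```

Both queries return the same rows, but only the range form should be planned as a SEARCH on idx_create_time. The half-open upper bound (< '2024-01-01') avoids having to guess the last timestamp of the year.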
4. Potential Issues with Indexes
- Slower Write Operations:
  - Each INSERT/UPDATE/DELETE requires synchronously updating the index, increasing overhead.
  - Suggestion: Balance the number of indexes for tables with frequent read/write operations.
- Space Overhead: Indexes require additional storage space (typically 10%~30% of the data size).
- Redundant Indexes:
  - For example, creating an (A) index when (A, B) already exists is redundant (as the composite index can cover the single-field query).
  - Tool for checking: Use sys.schema_redundant_indexes (MySQL) to identify redundant indexes.
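The prefix notion of redundancy can be expressed in a few lines of Python. This is a simplified illustration with hypothetical index names, not a reimplementation of sys.schema_redundant_indexes; in particular it ignores uniqueness constraints, which can make an apparently redundant index necessary.

```python
def find_redundant(indexes):
    """Return (redundant, covering) name pairs where one index's columns
    are a leading prefix of another index's columns."""
    pairs = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            if name == other:
                continue
            is_prefix = tuple(other_cols[:len(cols)]) == tuple(cols)
            # For identical column lists, report only one direction.
            if is_prefix and (len(cols) < len(other_cols) or name > other):
                pairs.append((name, other))
    return sorted(pairs)

# Hypothetical index definitions: name -> ordered column list.
indexes = {
    "idx_a": ("A",),
    "idx_a_b": ("A", "B"),
    "idx_b": ("B",),
}

print(find_redundant(indexes))  # [('idx_a', 'idx_a_b')]
```

Note that idx_b is not flagged: B is not a leading column of (A, B), so the composite index cannot replace it.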
5. Practical Scenario Analysis
Scenario: A users table has country, city, and age fields. Frequent queries are needed for "adult users in a specific country and city".
- Optimization Plan:
  - Create a composite index (country, city, age).
  - Write the query as: SELECT * FROM users WHERE country='China' AND city='Beijing' AND age >= 18;
  - The index directly locates the rows meeting the conditions, avoiding a full table scan.
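The scenario can be tried end to end with Python's built-in sqlite3 module. The sample rows, the name column, and the index name idx_country_city_age are assumptions made for illustration, and the query selects only name to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT, city TEXT, age INT)"
)
# Equality columns first, the range column (age) last.
conn.execute("CREATE INDEX idx_country_city_age ON users (country, city, age)")
conn.executemany(
    "INSERT INTO users (name, country, city, age) VALUES (?, ?, ?, ?)",
    [("Li", "China", "Beijing", 17), ("Wang", "China", "Beijing", 18),
     ("Zhao", "China", "Shanghai", 30), ("Ito", "Japan", "Tokyo", 25),
     ("Sun", "China", "Beijing", 42)],
)

query = ("SELECT name FROM users "
         "WHERE country = 'China' AND city = 'Beijing' AND age >= 18")

plan_text = " | ".join(
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)
)
adults = sorted(row[0] for row in conn.execute(query))

print(plan_text)
print(adults)  # ['Sun', 'Wang']
```

Placing age last matters: once a range condition is applied to a column, columns after it in the composite index generally can no longer narrow the search, so the equality columns go first.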
Summary: Indexes reduce disk I/O through data structures like B+ trees, but they require rational design to avoid redundancy and inefficiency. In practice, index strategies need to be dynamically adjusted based on query patterns and data distribution.