How Database Indexes Work and Optimization Strategies
Problem Description
Indexes are a core technology in databases for improving query performance. Their essence is an ordered data structure used to quickly locate data. Interviews often examine the working mechanism of indexes (such as B+ tree structure), index types (clustered/non-clustered indexes, covering indexes, etc.), and how to design and optimize indexes for practical scenarios. A deep understanding of the costs of indexes (degraded write performance, space usage) and optimization principles is required.
Step-by-Step Explanation
1. The Core Role and Costs of Indexes
- Role: Similar to a book's table of contents, avoids full table scans, reducing query time complexity from O(n) to O(log n).
- Costs:
- Slower Write Operations: Each INSERT/UPDATE/DELETE requires updating the index to maintain data structure order.
- Space Usage: Indexes require additional storage space, e.g., B+ trees contain non-leaf nodes.
- Balancing Principle: Index design must weigh the trade-off between query acceleration and write performance/storage costs.
2. Index Data Structures (Using B+ Tree as an Example)
- B+ Tree Characteristics:
- A multi-way balanced search tree where all data is stored in leaf nodes, and non-leaf nodes only store key values (reducing tree depth).
- Leaf nodes are linked via pointers, supporting efficient range queries (e.g.,
WHERE id BETWEEN 10 AND 100).
- Search Process (Using equality query
id=25as an example):- Start from the root node, compare with key values in order to determine the next child node (e.g., if root keys are [10,30,50], then 25 falls into the 10~30 interval).
- Traverse downwards level by level until reaching a leaf node, then use binary search within the leaf node to locate the target data or its address.
- If the target value is not in the leaf node, the data does not exist.
3. Clustered Index vs. Non-Clustered Index
- Clustered Index (e.g., InnoDB's primary key index):
- Leaf nodes directly store row data; table data is physically stored in the order of the index.
- Only one clustered index per table (usually the primary key).
- Non-Clustered Index (Secondary index):
- Leaf nodes store primary key values (not the actual data). Queries require a table lookup: first query the secondary index to get the primary key, then use the primary key index to fetch the data.
- Example: Table
users(id PK, name INDEX, age), querySELECT * FROM users WHERE name='Alice':
① Find the primary key valueid=5corresponding toname='Alice'in thenameindex tree;
② Useid=5to perform a table lookup via the clustered index to get the row data.
4. Covering Index Optimization
- Definition: An index contains all fields required by the query, avoiding table lookups.
- Example: Query
SELECT name FROM users WHERE name='Alice'. If the leaf node of the(name)index already stores thenamevalue, the result can be returned directly. - Optimization Suggestion: Frequently queried fields can be included in the index (e.g., a composite index
(name, age)coveringSELECT name, age FROM users).
5. Index Invalidation Scenarios and Optimization Strategies
- Common Causes of Invalidation:
- Using functions or operations on indexed columns (e.g.,
WHERE UPPER(name)='ALICE'). - Implicit type conversion (e.g., querying a string column with a number
WHERE id='100'). - Leading wildcard in LIKE patterns (
LIKE '%abc'cannot leverage index ordering). - Composite index not adhering to the leftmost prefix principle (for index
(a,b,c), if query condition lacksa, the index is ineffective).
- Using functions or operations on indexed columns (e.g.,
- Optimization Strategies:
- Prefix Indexes: Create an index on the first N characters of a long string, balancing selectivity and space (e.g.,
ALTER TABLE users ADD INDEX (name(10))). - Index Condition Pushdown: A feature since MySQL 5.6. It filters by non-indexed conditions earlier during index traversal, reducing table lookups.
- Index Merge: Scanning multiple single-column indexes separately and merging results (e.g.,
WHERE a=1 OR b=2), but efficiency is often lower than using a composite index.
- Prefix Indexes: Create an index on the first N characters of a long string, balancing selectivity and space (e.g.,
6. Practical Principles for Index Design
- Prioritize High-Frequency Queries: Create indexes for fields in WHERE, JOIN, and ORDER BY clauses.
- Selectivity Principle: Choose columns with high selectivity (high ratio of distinct values). For example, a gender field has low selectivity, making indexes less effective.
- Prefer Shorter Fields: Smaller key lengths allow B+ tree nodes to store more keys, reducing tree height.
- Avoid Over-Indexing: Numerous indexes increase optimizer selection overhead and hurt write performance.
Through the above steps, one can systematically understand how indexes accelerate queries, common optimization techniques, and design trade-offs, enabling efficient application of indexes in real-world databases.