Principles and Optimization of Database Indexes
I. Concept and Purpose of Indexes
An index is a data structure used in databases to quickly locate data, similar to a book's table of contents. Its core function is to reduce the number of disk I/O operations by pre-organizing data (e.g., sorting) to speed up queries. For example, querying a record in a table without an index may require a full table scan (checking each row), while an index can directly locate the data's position using specific algorithms (such as B+ trees).
II. Underlying Data Structures of Indexes
-
B+ Tree (Most Commonly Used):
- Structural Features:
- All data is stored in leaf nodes; non-leaf nodes only store key values (indexes) and pointers.
- Leaf nodes are connected via pointers to form an ordered linked list, supporting range queries.
- Advantages:
- Low tree height (typically 3-4 layers can store billions of records), reducing disk access frequency.
- Leaf nodes are ordered, making them suitable for sorting, grouping, and other operations.
- Structural Features:
-
Hash Index:
- Maps key values to storage locations via a hash function, suitable for equality queries (e.g.,
=), but does not support range queries (e.g.,>).
- Maps key values to storage locations via a hash function, suitable for equality queries (e.g.,
III. Principles for Creating and Using Indexes
-
Scenarios Suitable for Creating Indexes:
- Fields frequently used as query conditions (e.g.,
WHERE user_id = 1001). - Fields requiring sorting or grouping (e.g.,
ORDER BY create_time). - Foreign key fields (to ensure efficiency in join queries).
- Fields frequently used as query conditions (e.g.,
-
Cases Where Indexes Are Not Suitable:
- Fields with high data duplication (e.g., gender), where indexes are less effective.
- Fields updated frequently, as maintaining indexes incurs high costs (e.g., rebalancing B+ trees).
IV. Index Optimization Strategies
-
Covering Index:
- The index includes all fields required for the query, eliminating the need for table lookups (e.g., an index
(user_id, name)covers the querySELECT name FROM users WHERE user_id=1).
- The index includes all fields required for the query, eliminating the need for table lookups (e.g., an index
-
Leftmost Prefix Principle:
- A composite index
(A, B, C)is only effective for queries such as:WHERE A=1WHERE A=1 AND B=2- However, it cannot optimize
WHERE B=2(since the leftmost field A is not used).
- A composite index
-
Common Causes of Index Invalidation:
- Using functions on indexed fields (e.g.,
WHERE UPPER(name)='ABC'). - Fuzzy queries starting with a wildcard (e.g.,
LIKE '%abc'). - Implicit data type conversion (e.g., querying a string field with a number).
- Using functions on indexed fields (e.g.,
V. Practical Examples
Assume the table orders has the following structure:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
user_id INT,
status VARCHAR(10),
amount DECIMAL(10,2),
create_time DATETIME
);
- Scenario 1: Frequently querying orders by
user_id:CREATE INDEX idx_user_id ON orders(user_id); - Scenario 2: Needing to sort by
statusandcreate_time:
The queryCREATE INDEX idx_status_time ON orders(status, create_time);SELECT * FROM orders WHERE status='paid' ORDER BY create_time DESCcan directly utilize the index.
VI. Summary
Indexes improve query efficiency by trading space for time but require a balance between read and write performance. When designing indexes, consider business query patterns to avoid over-indexing (which affects write speed) or missing critical indexes (leading to slow queries).