SQL Index Optimization Principles
Problem Description
An interviewer might ask: "When database queries are slow, how do you optimize through indexes? Please explain how indexes work and the leftmost prefix matching principle."
Solution Process Explanation
-
The Core Function of Indexes
Indexes are similar to a book's table of contents. By pre-sorting and storing key field values and their location pointers, they help the database quickly locate data and avoid full table scans. Taking the querySELECT * FROM users WHERE name='Alice'as an example:- Without an index: The database needs to scan the entire
userstable row by row, comparing thenamefield (time complexity O(n)). - With an index: Directly searches for
Alicein the B+ tree of the index and quickly accesses the corresponding row via the pointer (time complexity O(log n)).
- Without an index: The database needs to scan the entire
-
The Underlying Structure of Indexes (Using B+ Tree as an Example)
- B+ Tree Characteristics:
- Leaf nodes store all data (or primary key pointers), while non-leaf nodes only store index key values, enabling efficient range queries.
- Leaf nodes are connected via pointers, supporting sequential traversal.
- Search Process:
Starting from the root node, uses binary search to compare layer by layer, eventually locating the target data in a leaf node. For example, searching for a record withage=30:Root node: [20, 40, 60] → Select the subtree where 20 ≤ 30 < 40 → Leaf node: [25,30,35] → Locate the data row corresponding to 30
- B+ Tree Characteristics:
-
The Leftmost Prefix Matching Principle
Assuming a composite index is created on columns(name, age, city), queries must follow the left-to-right matching rule:- Valid Use Cases:
WHERE name='Alice'(uses the first column of the index)WHERE name='Alice' AND age=25(uses the first two columns)WHERE name='Alice' AND age>20 AND city='Beijing'(the first two columns are used for lookup, the third for filtering)
- Ineffective Use Cases:
WHERE age=25(does not use the first columnname, cannot leverage the index sorting structure)WHERE name='Alice' AND city='Beijing'(skips theagecolumn, onlynametakes effect)
- Principle: The sorting rule of the index is similar to a phone book sorted by "last name → first name → city." Skipping a left column causes subsequent columns to be unordered.
- Valid Use Cases:
-
Index Optimization Practical Steps
- Step 1: Analyze Slow Queries
Use theEXPLAINcommand to view the execution plan, focusing on thetypefield:ALL(full table scan) → Need to create an indexref/range(index is effective) → Can further optimize for index coverage
- Step 2: Choose Index Columns
Prioritize creating indexes for columns with high-frequency query conditions and high cardinality (many unique values). Avoid creating separate indexes for low-cardinality columns (e.g., gender). - Step 3: Avoid Index Invalidation Scenarios
- Applying functions to indexed columns (e.g.,
WHERE UPPER(name)='ALICE') - Type conversion (e.g., using
WHERE id=123on a string column whereidis of string type) - Fuzzy queries like
LIKE '%abc'(prefix fuzzy matching cannot use the index)
- Applying functions to indexed columns (e.g.,
- Step 1: Analyze Slow Queries
-
Costs and Trade-offs of Indexes
- Space Cost: Indexes require additional storage space.
- Maintenance Cost: Inserting, deleting, or updating data requires synchronously updating the index, which may reduce write performance.
- Covering Index Optimization: If all queried fields are included in the index (e.g.,
SELECT name FROM users WHERE age=25), data can be returned directly from the index, avoiding a back-to-table lookup.