Database Index Design Principles and Best Practices
Topic Description
Indexing is a core technology for optimizing database query performance. However, improper index design can lead to decreased write performance, wasted storage space, and other issues. This topic will systematically explain how to design efficient indexing schemes based on business scenarios, covering key knowledge points such as index type selection, column order principles, and covering index optimization.
Explanation of the Solution Process
1. Foundational Principles of Index Design
The core objective of index design is to achieve the greatest query performance improvement at the lowest maintenance cost. The following basic principles should be followed:
- Choose columns with high selectivity: the higher the proportion of distinct values in an indexed column, the more rows each lookup filters out. For example, gender (2 distinct values) is not suitable for a standalone index, while user ID (every value distinct) is highly suitable.
- Prioritize columns frequently used as query conditions: Columns often appearing in WHERE, JOIN, ORDER BY, and GROUP BY clauses should be prioritized for indexing.
- Avoid over-indexing: every index consumes storage and adds write overhead, because each INSERT, UPDATE, or DELETE must also maintain the index structure. The number of indexes should therefore reflect the table's read/write ratio.
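To make "selectivity" concrete, here is a minimal Python sketch that divides distinct values by total rows for two columns of a hypothetical 1,000-row users sample. The `selectivity` helper and the data are illustrative assumptions, not part of any library:

```python
def selectivity(values):
    """Ratio of distinct values to total rows (1.0 means every value is unique)."""
    values = list(values)
    if not values:
        return 0.0  # an empty column filters nothing
    return len(set(values)) / len(values)


# Hypothetical sample: 1,000 users with unique IDs and a two-valued gender column.
users = [{"user_id": i, "gender": "F" if i % 2 else "M"} for i in range(1000)]

user_id_sel = selectivity(u["user_id"] for u in users)
gender_sel = selectivity(u["gender"] for u in users)

print(f"user_id: {user_id_sel}")  # 1.0   -> strong standalone index candidate
print(f"gender:  {gender_sel}")   # 0.002 -> poor standalone index
```

An index on gender would still leave about half the table to read for either value, which is why the principle favors user ID.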
2. Column Order Strategy for Multi-Column Indexes
For composite indexes, column order directly impacts index utilization:
- Leftmost Prefix Principle: An index (A, B, C) can be used for query conditions on A, on A and B, or on A, B, and C. It generally cannot be used if A is skipped and the query filters only on B or C (MySQL 8.0.13 and later can sometimes work around this with a skip scan, but it should not be relied on).
- Place equality query columns before range query columns:
-- Recommended index: (status, create_time)
SELECT * FROM orders WHERE status = 'shipped' AND create_time > '2023-01-01';
If the index is (create_time, status) instead, the range condition create_time > '2023-01-01' ends the usable part of the index: status can no longer narrow the index range scan and can at best be checked against each entry the range returns.
- Prioritize high-cardinality columns: this must be weighed against the query condition types. If a high-cardinality column often participates in range queries, it may need to be placed later in the index.
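Both column-order rules can be observed with a query planner. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, so it reads SQLite's `EXPLAIN QUERY PLAN` output rather than MySQL's `EXPLAIN`; the tables `t`, `orders_eq`, `orders_rng` and their indexes are hypothetical, and only the principle (not the exact plan text) carries over to MySQL:

```python
import sqlite3


def plan(conn, sql):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")

# Leftmost prefix: composite index on (a, b, c). Column d is not indexed,
# so SELECT * cannot be answered from the index alone.
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d TEXT)")
conn.execute("CREATE INDEX idx_abc ON t (a, b, c)")
with_a = plan(conn, "SELECT * FROM t WHERE a = 1")
with_ab = plan(conn, "SELECT * FROM t WHERE a = 1 AND b = 2")
without_a = plan(conn, "SELECT * FROM t WHERE b = 2 AND c = 3")

# Equality-before-range: the same query against two orderings of the same columns.
conn.execute("CREATE TABLE orders_eq (status TEXT, create_time TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_status_time ON orders_eq (status, create_time)")
conn.execute("CREATE TABLE orders_rng (status TEXT, create_time TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_time_status ON orders_rng (create_time, status)")
query = "SELECT * FROM {} WHERE status = 'shipped' AND create_time > '2023-01-01'"
eq_first = plan(conn, query.format("orders_eq"))
range_first = plan(conn, query.format("orders_rng"))

for label, detail in [("a", with_a), ("a, b", with_ab), ("b, c", without_a),
                      ("eq first", eq_first), ("range first", range_first)]:
    print(f"{label:<12}{detail}")
```

The constraint list in parentheses shows which index columns drive the seek: with the equality column first it should list both status and create_time, while with the range column first only create_time appears. The query that skips a falls back to a full table scan. Exact wording varies by SQLite version (older releases print `SCAN TABLE t`).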
3. Covering Index Optimization
A covering index refers to an index that includes all columns required by the query, eliminating the need for a table lookup:
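-- Covering index in MySQL: append product_name to the key, since MySQL has no INCLUDE clause
-- (SQL Server and PostgreSQL 11+ could instead declare (category, price) INCLUDE (product_name))
ALTER TABLE products ADD INDEX (category, price, product_name);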
SELECT product_name FROM products WHERE category = 'electronics' AND price < 1000;
Advantages:
- Reduces random I/O (avoids table lookups)
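- Particularly important for InnoDB: the clustered (primary key) index stores full rows in its leaf nodes, while secondary index leaves store only the indexed columns plus the primary key. A query through a non-covering secondary index therefore needs a second lookup into the primary key index for every matching row. If the secondary index contains every column the query needs, that lookup is skipped, and EXPLAIN reports "Using index" in the Extra column.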
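The difference between a plain and a covering index shows up directly in the query plan. This is a minimal sketch using Python's `sqlite3` as a stand-in for MySQL (SQLite's `USING COVERING INDEX` plays the role of MySQL's "Using index"); the `products` schema and index names are hypothetical:

```python
import sqlite3


def plan(conn, sql):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, "
    "price REAL, product_name TEXT, description TEXT)"
)
query = ("SELECT product_name FROM products "
         "WHERE category = 'electronics' AND price < 1000")

# Non-covering: product_name must be fetched from the table row for each match.
conn.execute("CREATE INDEX idx_cat_price ON products (category, price)")
non_covering = plan(conn, query)

# Covering: like MySQL, SQLite has no INCLUDE clause, so product_name becomes a trailing key column.
conn.execute("DROP INDEX idx_cat_price")
conn.execute("CREATE INDEX idx_cat_price_name ON products (category, price, product_name)")
covering = plan(conn, query)

print(non_covering)
print(covering)
```

Both plans seek on (category=? AND price<?); only the second one marks the index as covering, meaning no table row is read.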
4. Common Pitfalls Leading to Index Inefficiency
Even with reasonable design, the following operations can still cause indexes to become inefficient:
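- Implicit type conversion: comparing a VARCHAR column with a number, as in WHERE id = 100, makes MySQL convert each stored string to a number, so the index on id cannot be used. Write WHERE id = '100' instead.
- Applying operations or functions to indexed columns: WHERE YEAR(create_time) = 2023 cannot use the create_time index. Rewrite it as a range query: WHERE create_time >= '2023-01-01' AND create_time < '2024-01-01'.
- OR conditions not fully covered: in WHERE a = 1 OR b = 2, if only a is indexed, the rows matching b = 2 can only be found by a full table scan. Either index b as well, so that MySQL can combine both indexes with an index merge, or split the query into two SELECT statements joined by UNION.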
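Two of these pitfalls can be reproduced with Python's `sqlite3` as a stand-in planner. In this sketch `strftime('%Y', ...)` is SQLite's analogue of MySQL's `YEAR()`, dates are assumed to be stored as ISO-8601 text (which sorts in date order), and the `orders` table and `idx_orders_time` index are hypothetical. Implicit type conversion is left out because SQLite's type-affinity rules differ from MySQL's:

```python
import sqlite3


def plan(conn, sql):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, create_time TEXT, note TEXT)")
conn.execute("CREATE INDEX idx_orders_time ON orders (create_time)")

# Function wrapped around the indexed column: the index cannot be searched.
wrapped = plan(conn, "SELECT * FROM orders WHERE strftime('%Y', create_time) = '2023'")

# Same filter rewritten as a half-open range on the bare column.
ranged = plan(conn, "SELECT * FROM orders "
                    "WHERE create_time >= '2023-01-01' AND create_time < '2024-01-01'")

# OR with one unindexed branch: rows matching note = 'x' can only be found by scanning.
or_partial = plan(conn, "SELECT * FROM orders "
                        "WHERE create_time = '2023-05-01' OR note = 'x'")

print(wrapped)
print(ranged)
print(or_partial)
```

The wrapped and OR queries should both report a full scan of orders, while the range rewrite searches idx_orders_time with both bounds as index constraints.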
5. Index Selectivity and Prefix Indexes
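- Selectivity Calculation: Selectivity = Number of distinct values / Total number of rows, i.e. COUNT(DISTINCT col) / COUNT(*). As a rule of thumb, columns with selectivity above 0.9 are good candidates for standalone indexes.
- Prefix Index: For long text columns (e.g., VARCHAR(500)), you can index only the first N characters:
ALTER TABLE logs ADD INDEX (url(20)); -- Indexes the first 20 characters
Balance index length against selectivity by comparing several prefix lengths with the full column:
SELECT COUNT(DISTINCT LEFT(url, 10)) / COUNT(*) AS sel_10,
       COUNT(DISTINCT LEFT(url, 20)) / COUNT(*) AS sel_20,
       COUNT(DISTINCT url) / COUNT(*) AS sel_full
FROM logs;
A prefix length whose selectivity is close to sel_full gives nearly the same filtering with a much smaller index.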
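The prefix-length comparison can be automated by searching for the shortest prefix that is as selective as the whole column. A minimal sketch with Python's `sqlite3`, assuming a hypothetical four-row `logs` sample (real tables need a representative sample); note that SQLite spells MySQL's `LEFT(url, n)` as `substr(url, 1, n)`, and the `* 1.0` avoids SQLite's integer division:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (url TEXT)")
# Hypothetical sample rows sharing long common prefixes.
urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.org/a",
    "https://example.org/b",
]
conn.executemany("INSERT INTO logs (url) VALUES (?)", [(u,) for u in urls])


def prefix_selectivity(conn, n):
    """COUNT(DISTINCT first n characters) / COUNT(*) for the url column."""
    (sel,) = conn.execute(
        "SELECT COUNT(DISTINCT substr(url, 1, ?)) * 1.0 / COUNT(*) FROM logs", (n,)
    ).fetchone()
    return sel


(full,) = conn.execute("SELECT COUNT(DISTINCT url) * 1.0 / COUNT(*) FROM logs").fetchone()

for n in (10, 19, 21):
    print(n, prefix_selectivity(conn, n))  # 10 0.25 / 19 0.5 / 21 1.0

# Shortest prefix that is as selective as the whole column.
shortest = next(n for n in range(1, 501) if prefix_selectivity(conn, n) >= full)
print("shortest prefix:", shortest)  # 21
```

In this toy data the URLs differ only in their last characters, so a short prefix filters poorly; on real data the curve usually flattens well before the full length, and that knee is the length worth indexing.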
6. Index Maintenance and Monitoring Strategies
- Regularly Analyze Index Usage:
-- View indexes with no recorded use since the server last started
SELECT * FROM sys.schema_unused_indexes;
-- Check for redundant indexes (e.g., (A) is usually redundant when (A, B) exists)
SELECT * FROM sys.schema_redundant_indexes;
- Avoid indexing frequently updated columns: for example, last_login_time, where the index maintenance cost may outweigh the query benefits.
- Utilize indexes for sorting: ORDER BY create_time DESC can read rows in order from the create_time index, avoiding an extra sort step (EXPLAIN then shows no "Using filesort").
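The sort-avoidance effect can be checked by comparing plans before and after adding the index. This sketch uses Python's `sqlite3`, where `USE TEMP B-TREE FOR ORDER BY` is SQLite's counterpart of MySQL's "Using filesort"; the `orders` table and `idx_orders_time` are hypothetical, and the query selects only create_time with a LIMIT so that the index alone can answer it:

```python
import sqlite3


def plan(conn, sql):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, create_time TEXT, note TEXT)")
query = "SELECT create_time FROM orders ORDER BY create_time DESC LIMIT 10"

before = plan(conn, query)  # no index: full scan followed by a sort step
conn.execute("CREATE INDEX idx_orders_time ON orders (create_time)")
after = plan(conn, query)   # index read backwards, so rows arrive already sorted

print(before)
print(after)
```

The descending order does not need a descending index: the planner simply walks the ascending index in reverse, which InnoDB can also do with a backward index scan.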
Conclusion
Optimal index design requires combining specific query patterns, data distribution, and hardware characteristics. By using EXPLAIN to analyze execution plans, monitoring slow query logs, and continuously iterating and adjusting index strategies, a balance between high performance and low cost can be achieved.