SQL Index Optimization Principles

SQL Index Optimization Principles

Problem Description
An interviewer might ask: "When database queries are slow, how do you optimize through indexes? Please explain how indexes work and the leftmost prefix matching principle."

Solution Process Explanation

  1. The Core Function of Indexes
    Indexes are similar to a book's table of contents. By pre-sorting and storing key field values and their location pointers, they help the database quickly locate data and avoid full table scans. Taking the query SELECT * FROM users WHERE name='Alice' as an example:

    • Without an index: The database needs to scan the entire users table row by row, comparing the name field (time complexity O(n)).
    • With an index: Directly searches for Alice in the B+ tree of the index and quickly accesses the corresponding row via the pointer (time complexity O(log n)).
  2. The Underlying Structure of Indexes (Using B+ Tree as an Example)

    • B+ Tree Characteristics:
      • Leaf nodes store all data (or primary key pointers), while non-leaf nodes only store index key values, enabling efficient range queries.
      • Leaf nodes are connected via pointers, supporting sequential traversal.
    • Search Process:
      Starting from the root node, uses binary search to compare layer by layer, eventually locating the target data in a leaf node. For example, searching for a record with age=30:
      Root node: [20, 40, 60]  
      → Select the subtree where 20 ≤ 30 < 40  
      → Leaf node: [25,30,35] → Locate the data row corresponding to 30
      
  3. The Leftmost Prefix Matching Principle
    Assuming a composite index is created on columns (name, age, city), queries must follow the left-to-right matching rule:

    • Valid Use Cases:
      • WHERE name='Alice' (uses the first column of the index)
      • WHERE name='Alice' AND age=25 (uses the first two columns)
      • WHERE name='Alice' AND age>20 AND city='Beijing' (the first two columns are used for lookup, the third for filtering)
    • Ineffective Use Cases:
      • WHERE age=25 (does not use the first column name, cannot leverage the index sorting structure)
      • WHERE name='Alice' AND city='Beijing' (skips the age column, only name takes effect)
    • Principle: The sorting rule of the index is similar to a phone book sorted by "last name → first name → city." Skipping a left column causes subsequent columns to be unordered.
  4. Index Optimization Practical Steps

    • Step 1: Analyze Slow Queries
      Use the EXPLAIN command to view the execution plan, focusing on the type field:
      • ALL (full table scan) → Need to create an index
      • ref/range (index is effective) → Can further optimize for index coverage
    • Step 2: Choose Index Columns
      Prioritize creating indexes for columns with high-frequency query conditions and high cardinality (many unique values). Avoid creating separate indexes for low-cardinality columns (e.g., gender).
    • Step 3: Avoid Index Invalidation Scenarios
      • Applying functions to indexed columns (e.g., WHERE UPPER(name)='ALICE')
      • Type conversion (e.g., using WHERE id=123 on a string column where id is of string type)
      • Fuzzy queries like LIKE '%abc' (prefix fuzzy matching cannot use the index)
  5. Costs and Trade-offs of Indexes

    • Space Cost: Indexes require additional storage space.
    • Maintenance Cost: Inserting, deleting, or updating data requires synchronously updating the index, which may reduce write performance.
    • Covering Index Optimization: If all queried fields are included in the index (e.g., SELECT name FROM users WHERE age=25), data can be returned directly from the index, avoiding a back-to-table lookup.