SQL Index Failure Scenarios and Optimization Strategies

Problem Description
Indexes are a core tool for SQL database performance optimization, but incorrect usage can lead to index failure, triggering full table scans and severely reducing query efficiency. Interviewers often test your understanding of index principles and SQL writing standards through index failure scenarios. This article systematically explains the common causes of index failure and optimization strategies.

Root Causes of Index Failure
An index "fails" when the database optimizer either cannot use it to locate rows or estimates that using it would cost more than a full table scan. Common reasons include:

  1. Applying Operations or Functions to Indexed Columns

    • Failure Example: WHERE YEAR(create_time) = 2023
    • Principle: Indexes store raw values; calculations on the column prevent using the index tree for positioning.
    • Optimization: Rewrite as a range query.
      -- Optimized version
      WHERE create_time >= '2023-01-01' AND create_time < '2024-01-01'
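The difference between the two predicates can be observed in a query plan. The sketch below is only an illustration: it uses SQLite through Python's sqlite3 module as a self-contained stand-in, a hypothetical orders table, and strftime in place of YEAR(). MySQL's EXPLAIN output looks different, but the underlying reason is the same.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' strings of SQLite's EXPLAIN QUERY PLAN for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, create_time TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_orders_create_time ON orders (create_time)")

# Function applied to the indexed column: the index cannot be used for a lookup.
wrapped = plan(conn, "SELECT * FROM orders WHERE strftime('%Y', create_time) = '2023'")

# The same filter expressed as a range on the raw column value.
ranged = plan(conn, "SELECT * FROM orders "
                    "WHERE create_time >= '2023-01-01' AND create_time < '2024-01-01'")

print("function on column:", wrapped)
print("range on column:   ", ranged)
```

The first plan should report a scan of the table, the second a search that names idx_orders_create_time.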
      
  2. Implicit Type Conversion

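    • Failure Example: WHERE user_id = 123 (where user_id is a string type such as VARCHAR)
    • Principle: In MySQL, comparing a string column with a number converts every column value to a number, which is equivalent to applying a function to the indexed column. The reverse case (an integer column compared with '123') converts only the constant, so the index remains usable.
    • Optimization: Ensure the literal's type matches the column's type.
      WHERE user_id = '123'  -- String literal matches the string column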
      
  3. Leading Wildcard in LIKE Queries

    • Failure Example: WHERE name LIKE '%张三'
    • Principle: B+ tree indexes are sorted by prefix and cannot locate values starting with a wildcard character.
    • Optimization Solutions:
      • Use trailing wildcard: WHERE name LIKE '张三%'
      • Use full-text index (e.g., MySQL's FULLTEXT).
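Why a sorted index helps with a prefix but not with a suffix can be shown with a toy model. The sketch below assumes a sorted Python list as a stand-in for the ordered leaf entries of a B+ tree; the names and helper functions are hypothetical and do not reflect how any database implements LIKE.

```python
from bisect import bisect_left

# Assumption: a sorted list stands in for the ordered leaf entries of a B+ tree index.
names = sorted(["Li Lei", "Wang Fang", "Zhang San", "Zhang Wei", "Zhao Min"])

def prefix_search(sorted_names, prefix):
    """Model of LIKE 'prefix%': binary-search to the first candidate,
    then read neighbouring entries until the prefix stops matching."""
    i = bisect_left(sorted_names, prefix)
    matches = []
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        matches.append(sorted_names[i])
        i += 1
    return matches

def suffix_search(sorted_names, suffix):
    """Model of LIKE '%suffix': the sort order says nothing about endings,
    so every entry has to be checked."""
    return [name for name in sorted_names if name.endswith(suffix)]

print(prefix_search(names, "Zhang"))  # ['Zhang San', 'Zhang Wei']
print(suffix_search(names, "San"))    # ['Zhang San']
```

The prefix search touches only the entries around the matching range, while the suffix search visits all of them, which is the list-level analogue of a full scan.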
  4. OR Condition Connecting to a Non-Indexed Column

    • Failure Example: WHERE indexed_column = 1 OR non_indexed_column = 2
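    • Principle: The index can only find rows matching the first condition; rows matching the second still require a full table scan, so the optimizer usually scans the whole table once.
    • Optimization: Index the second column as well. The optimizer may then combine both indexes itself (index merge in MySQL); if it does not, split the query into a UNION so that each branch uses its own index. Note that UNION removes duplicate rows.
      -- Split into UNION query (each branch needs an index on its column)
      SELECT * FROM t WHERE indexed_column = 1
      UNION
      SELECT * FROM t WHERE non_indexed_column = 2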
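The role of the missing index can be checked with a small experiment. The sketch below uses SQLite via Python's sqlite3 module as a stand-in and adds a hypothetical second index, idx_t_other, to show how the plan for the OR query changes. MySQL's optimizer makes its own decisions (index merge or a full scan), so this is an illustration rather than a prediction.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' strings of SQLite's EXPLAIN QUERY PLAN for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, indexed_column INTEGER, "
             "non_indexed_column INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_t_indexed ON t (indexed_column)")

or_query = "SELECT * FROM t WHERE indexed_column = 1 OR non_indexed_column = 2"

# Only one side of the OR is indexed: the whole table is scanned.
before = plan(conn, or_query)

# Hypothetical fix: give the second column its own index.
conn.execute("CREATE INDEX idx_t_other ON t (non_indexed_column)")
after = plan(conn, or_query)

print("one index:  ", before)
print("two indexes:", after)
```

With both columns indexed, each branch of the OR can be answered by its own index, which is also the precondition for the UNION rewrite to pay off.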
      
  5. Violating the Leftmost Prefix Principle

    • Failure scenarios for composite index (a, b, c):
      • Missing leftmost column: WHERE b = 1 AND c = 2
      • Skipping a column: WHERE a = 1 AND c = 2 (can only use index on column a)
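    • Optimization: The order of conditions in the WHERE clause does not matter; what matters is which columns are filtered. Include the leading column(s) in the filter, or create an index whose column order matches the query (for example, (b, c) for the first case).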
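The three cases for a composite index (a, b, c) can be compared side by side. The sketch below uses SQLite via Python's sqlite3 module as a stand-in, with a hypothetical table t and index idx_t_abc. The plan text is SQLite's, not MySQL's, but it shows which columns of the index each query can use.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' strings of SQLite's EXPLAIN QUERY PLAN for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER, d TEXT)")
conn.execute("CREATE INDEX idx_t_abc ON t (a, b, c)")

# All three columns filtered: the full index prefix is usable.
full_prefix = plan(conn, "SELECT * FROM t WHERE a = 1 AND b = 1 AND c = 2")

# Column b skipped: only the leading column a narrows the index search.
skip_b = plan(conn, "SELECT * FROM t WHERE a = 1 AND c = 2")

# Leading column a missing: the index cannot be used for a lookup.
no_a = plan(conn, "SELECT * FROM t WHERE b = 1 AND c = 2")

for label, p in [("a, b, c", full_prefix), ("a, c", skip_b), ("b, c", no_a)]:
    print(label, "->", p)
```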
  6. Using NOT or != on Indexed Columns

    • Failure Example: WHERE status != 'deleted'
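    • Principle: A negative condition usually matches most rows, so the optimizer estimates that reading those index entries plus the table lookups costs more than a full table scan.
    • Optimization:
      • Change to a positive query when the set of remaining values is known: WHERE status IN ('active', 'pending')
      • If the condition actually matches only a small fraction of rows and the optimizer still chooses a full scan, consider forcing index usage (e.g., FORCE INDEX in MySQL) after verifying with EXPLAIN.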

Advanced Optimization Strategies

  1. Covering Index Optimization

    • Return query columns directly from the index, avoiding table lookups.
    • Example: Create index (category, price) to optimize the query:
      SELECT category, price FROM products WHERE category = 'electronics'
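Whether a query is covered can be read from the plan. The sketch below uses SQLite via Python's sqlite3 module as a stand-in, with a hypothetical products table that has one extra column, description, outside the index. In MySQL the equivalent signal is "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' strings of SQLite's EXPLAIN QUERY PLAN for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, "
             "price REAL, description TEXT)")
conn.execute("CREATE INDEX idx_products_category_price ON products (category, price)")

# Every selected column lives in the index: no table lookup is needed.
covered = plan(conn, "SELECT category, price FROM products WHERE category = 'electronics'")

# description is not in the index, so each match needs a lookup in the table.
not_covered = plan(conn, "SELECT * FROM products WHERE category = 'electronics'")

print("covered:    ", covered)
print("not covered:", not_covered)
```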
      
  2. Index Condition Pushdown (ICP)

    • Feature available in MySQL 5.6+ that pushes WHERE condition filtering down to the storage engine layer.
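    • Example: Index (city, name, age), query WHERE city='北京' AND age>20. Because name is skipped, only city can be used to narrow the index range.
    • Without ICP: The storage engine fetches the full row for every index entry with city='北京', and the server filters by age after the table lookup.
    • With ICP: The storage engine checks age>20 against the index entry first and performs table lookups only for entries that pass.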
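The saving from ICP is the number of table lookups avoided. The sketch below is a toy model, not MySQL's implementation: it assumes each index entry carries city, age, and a row id, uses "Beijing" for the city value, and counts how many times the full row is fetched with and without the age check at the index layer.

```python
# Toy model: `table` maps row_id -> full row; index entries are (city, age, row_id).
table = {
    1: {"city": "Beijing", "age": 18, "name": "A"},
    2: {"city": "Beijing", "age": 25, "name": "B"},
    3: {"city": "Beijing", "age": 30, "name": "C"},
    4: {"city": "Beijing", "age": 19, "name": "D"},
    5: {"city": "Shanghai", "age": 40, "name": "E"},
}
index_entries = sorted((row["city"], row["age"], rid) for rid, row in table.items())

def without_icp(city, min_age):
    """Fetch the row for every entry matching city, then filter by age."""
    rows, lookups = [], 0
    for entry_city, _entry_age, rid in index_entries:
        if entry_city != city:
            continue
        row = table[rid]  # table lookup happens before the age check
        lookups += 1
        if row["age"] > min_age:
            rows.append(row)
    return rows, lookups

def with_icp(city, min_age):
    """Check age against the index entry first; fetch only rows that pass."""
    rows, lookups = [], 0
    for entry_city, entry_age, rid in index_entries:
        if entry_city != city or entry_age <= min_age:
            continue  # rejected using the index entry alone
        rows.append(table[rid])
        lookups += 1
    return rows, lookups

rows_a, lookups_a = without_icp("Beijing", 20)
rows_b, lookups_b = with_icp("Beijing", 20)
print(lookups_a, lookups_b)  # 4 2
```

Both functions return the same rows; only the number of lookups differs.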
  3. Index Selectivity Evaluation

    • Selectivity = Number of distinct values / Total number of records.
    • Indexes on low-selectivity columns (e.g., gender) are less effective; consider composite indexes with other columns.
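The selectivity formula translates directly into a query. The sketch below uses SQLite via Python's sqlite3 module with a hypothetical users table and generated data; the selectivity helper is an illustrative name, and it interpolates identifiers into the SQL string, so it should only be called with trusted table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, gender TEXT, email TEXT)")

# Hypothetical data: 1000 rows, two gender values, unique e-mail addresses.
rows = [("M" if i % 2 else "F", f"user{i}@example.com") for i in range(1000)]
conn.executemany("INSERT INTO users (gender, email) VALUES (?, ?)", rows)

def selectivity(conn, table, column):
    """Selectivity = number of distinct values / total number of records."""
    sql = f"SELECT COUNT(DISTINCT {column}) * 1.0 / COUNT(*) FROM {table}"
    return conn.execute(sql).fetchone()[0]

gender_sel = selectivity(conn, "users", "gender")
email_sel = selectivity(conn, "users", "email")
print(gender_sel, email_sel)  # 0.002 1.0
```

A value close to 1 means almost every row has its own key, so an index lookup narrows the result to very few rows; a value close to 0 means each key matches a large share of the table.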

Practical Checklist

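  1. Use EXPLAIN to analyze the execution plan; focus on the type column. In MySQL, const, eq_ref, ref, and range indicate an index lookup, while index means a full index scan and ALL a full table scan.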
  2. Avoid expression calculations on indexed columns in the WHERE clause.
  3. Composite indexes must follow the leftmost prefix principle; as a rule of thumb, place columns used in equality filters first, in descending order of selectivity, and range-filtered columns last.
  4. Prioritize covering indexes for large datasets.
  5. Regularly update index statistics using ANALYZE TABLE.

By understanding these principles and strategies, you can proactively avoid index failures and choose the optimal indexing scheme for specific scenarios.