Principles of Database Query Rewrite Optimization Analysis

Principles of Database Query Rewrite Optimization Analysis

Problem Description
Database query rewrite optimization is a core technology of the query optimizer during the logical optimization phase. It transforms user-written SQL statements into equivalent forms with higher execution efficiency based on relational algebra equivalence transformation rules, without modifying the query semantics. This article will delve into the principles, common rules, and practical application scenarios of query rewriting.

I. Basic Concepts of Query Rewriting

  1. Goal: Improve query performance at the logical level, laying the foundation for subsequent physical optimizations (such as index selection).
  2. Input: Query Tree generated after syntax parsing.
  3. Core Principle: Utilize mathematical properties of relational algebra such as commutativity, associativity, and distributivity for equivalent transformations.
  4. Key Feature: The rewritten query results must be completely consistent with the original query (maintaining semantic equivalence).

II. Detailed Explanation of Predicate Pushdown

  1. Scenario Example:
    SELECT * FROM orders JOIN customers 
    ON orders.customer_id = customers.id 
    WHERE customers.country = 'China'
    
  2. Rewriting Process:
    • Original logic: Perform the join operation on the two tables first, then filter records where the country is China.
    • Optimized logic: First filter records where the country is China from the customers table, then perform the join operation.
  3. Performance Improvement Principle:
    • Reduce the amount of data that the join operation needs to process.
    • Potentially utilize indexes on the country field of the customers table.
    • Decrease the size of intermediate result sets.

III. Subquery Optimization Strategies

  1. Subquery Unnesting:

    • Scenario: Convert correlated subqueries into join operations.
    • Rewriting Example:
      -- Original query (correlated subquery)
      SELECT name FROM employees e1 
      WHERE salary > (SELECT AVG(salary) FROM employees e2 
                      WHERE e2.dept_id = e1.dept_id)
      
      -- Rewritten as a join query
      SELECT e1.name FROM employees e1 
      JOIN (SELECT dept_id, AVG(salary) avg_sal FROM employees GROUP BY dept_id) dept_avg
      ON e1.dept_id = dept_avg.dept_id 
      WHERE e1.salary > dept_avg.avg_sal
      
    • Advantage: Avoid executing the subquery once per row of data, converting it into more efficient hash joins or merge joins.
  2. EXISTS Optimization:

    • When a subquery contains EXISTS, the optimizer may rewrite it as a Semi-Join.
    • Semi-Join stops searching immediately after finding the first matching item, making it more efficient than a regular join.

IV. Query Simplification Rules

  1. Constant Expression Calculation:

    -- Before rewriting
    SELECT * FROM products WHERE price > 100*0.9
    
    -- After rewriting
    SELECT * FROM products WHERE price > 90
    
  2. Invalid Condition Elimination:

    -- Before rewriting (condition is always true)
    SELECT * FROM table WHERE 1=1 AND name='John'
    
    -- After rewriting
    SELECT * FROM table WHERE name='John'
    
  3. Join Elimination:

    • When a query involves a primary-foreign key join and only fields from the primary table are needed, the join operation can be eliminated.
    • Example:
      -- Assume orders.customer_id is a foreign key to customers.id
      -- Before rewriting
      SELECT orders.* FROM orders JOIN customers 
      ON orders.customer_id = customers.id
      
      -- After rewriting (query the orders table directly)
      SELECT * FROM orders
      

V. Set Operation Optimization

  1. UNION Rewriting:

    • By default, UNION performs deduplication and sorting, while UNION ALL is more efficient.
    • The optimizer may transform UNION into UNION ALL when it is certain that the results contain no duplicates.
  2. DISTINCT Elimination:

    • When a query includes GROUP BY or the table has a unique constraint, unnecessary DISTINCT can be eliminated.

VI. View Rewriting (View Materialization)

  1. View Merging: Expand and merge the view definition into the main query for holistic optimization.
  2. Materialized View Selection: For complex views, directly querying precomputed materialized views may be possible.

Practical Application Verification
Use the EXPLAIN command to view the rewriting effects:

EXPLAIN SELECT * FROM orders WHERE customer_id IN 
(SELECT id FROM customers WHERE country='China');

Observe whether subquery operations appear in the execution plan to verify if they have been rewritten as join operations.

Summary
Query rewrite optimization is an intelligent transformation process automatically performed by the database. By understanding its principles, developers can:

  1. Write SQL statements that are easier to optimize.
  2. Predict the behavior of the optimizer to avoid writing queries that are difficult to optimize.
  3. Verify optimization effectiveness through execution plans and continuously improve query performance.