Database Query Execution Plan and Execution Order Analysis
Problem Description
A database query execution plan is a step-by-step blueprint generated by the optimizer for executing an SQL query. Understanding the order of execution (such as which table is accessed first in multi-table JOINs, and how indexes are used) is crucial for performance tuning. This topic will explain how to analyze the actual execution order of a query through the execution plan and interpret the execution logic of common operators (such as Nested Loop, Hash Join).
1. Generation and Representation of Execution Plans
- Generation Process: The optimizer generates multiple candidate plans based on the query statement, table statistics, indexes, etc., and selects the plan with the lowest cost.
- Representation Form: Usually displayed as a tree structure, where child nodes execute first, and their results are passed to parent nodes. For example:
This indicates a full table scan onHash Join ├── Seq Scan on table_A └── Seq Scan on table_Btable_Aandtable_Bfirst, followed by a Hash Join.
2. Basic Principles of Execution Order
- From Leaves to Root: In the execution plan tree, the lowest-level operators (such as index scans, full table scans) execute first, and intermediate results are gradually passed upward.
- Data Flow Direction: The output of child nodes serves as input for parent nodes. For example, in a Nested Loop Join, each row from the outer table (driving table) is matched with all rows from the inner table.
3. Execution Logic of Common Operators
-
Nested Loop Join:
- First, execute the outer table (e.g., index scan) to obtain the driving dataset.
- For each row in the outer table, traverse the inner table (possibly using an index) to match data.
- Characteristics: Suitable for scenarios where the driving table has a small amount of data or the inner table has an efficient index.
-
Hash Join:
- First, execute the inner table (right table) scan and build a hash table in memory.
- Scan the outer table (left table) and probe the hash table for each row to find matches.
- Characteristics: Suitable for scenarios with large table data and no indexes, but requires memory support.
-
Merge Join:
- Sort both tables by the join key (skip if already sorted).
- Use a two-pointer traversal of the sorted datasets to merge matching items.
- Characteristics: Suitable for scenarios where data is already sorted or the query requires sorting.
4. Practical Case Analysis
Assume the query:
SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id
WHERE customers.country = 'US';
A possible execution plan:
Hash Join
├── Seq Scan on customers
│ └── Filter: (country = 'US')
└── Seq Scan on orders
Execution Order:
- Perform a full table scan on the
customerstable and filter out data wherecountry='US'. - Perform a full table scan on the
orderstable. - Use the filtered results from
customersas the inner data for the hash table andordersas the outer data, then execute the Hash Join.
5. How to Verify Execution Order
- Use
EXPLAIN ANALYZE(e.g., in PostgreSQL) or actual execution plans (e.g., in SQL Server) to view actual execution time and row counts. - Observe the
actual rowsandloopsof operators: In a Nested Loop, theloopsvalue of the inner table usually equals the row count of the outer table.
6. Performance Tuning Insights
- If the driving table has too much data, consider adding indexes or conditional filters.
- Avoid full table scans on the inner table: Create indexes for join keys or filter conditions.
- Pay attention to data skew: In Hash Join, an oversized hash table may cause memory overflow.
By step-by-step analysis of the structure of the execution plan tree and the characteristics of operators, query bottlenecks can be accurately identified and targeted optimizations can be applied.