Database Query Execution Process and SQL Parsing

Database Query Execution Process and SQL Parsing

Problem Description:
When a database system receives a SQL query statement (for example, SELECT * FROM users WHERE age > 25;), what key steps does it go through from receiving the request to returning the result? Please elaborate on the parsing and execution process of the SQL statement.


Detailed Explanation of the Solution Process:

1. Query Reception and Preliminary Processing

  • After receiving the SQL string sent by the client, the database server first performs syntax validation (e.g., checks for matching parentheses, correct keyword spelling).
  • If a syntax error is found, an error message is immediately returned (e.g., MySQL's "You have an error in your SQL syntax").
  • Upon successful validation, the system converts the SQL string into an internal data structure (e.g., Abstract Syntax Tree, AST) for subsequent processing.

2. Lexical Analysis and Syntax Analysis (Parsing)

  • Lexical Analysis: Splits the SQL string into meaningful "lexical units" (Tokens).
    Example: SELECT * FROM users WHERE age > 25
    → Tokens: [SELECT, *, FROM, users, WHERE, age, >, 25]
  • Syntax Analysis: Organizes the Tokens into a tree structure (AST) according to the database's syntax rules (e.g., SQL standard).
    For example: The root node of the AST is SELECT, with child nodes containing FROM users and WHERE age>25.

3. Semantic Analysis

  • Checks the logical correctness of the AST:
    • Does the table users exist?
    • Does the column age belong to the table users?
    • Does the user have permission to access this data?
  • The system verifies table/column information by querying the data dictionary (e.g., MySQL's information_schema).
  • If a semantic error occurs (e.g., table does not exist), an error message is returned.

4. Query Optimizer Generates Execution Plan

  • Based on the AST and statistical information (e.g., table size, index distribution), the optimizer generates multiple candidate execution plans.
    Possible plans for the example query:
    • Full table scan of users, filtering rows for age > 25;
    • If an index on age exists, prioritize using the index to locate data, then perform a lookup back to the table.
  • The optimizer selects the optimal plan through a cost model (based on estimated I/O and CPU overhead).

5. Query Execution Engine Processing

  • The execution engine operates step-by-step according to the plan selected by the optimizer:
    • Data Access: Calls storage engine interfaces as per the plan (e.g., reading data pages from disk).
    • Apply Filter Conditions: Evaluates age > 25 row by row, adding rows that meet the condition to the result set.
    • Handle Joins/Sorting (if the query includes JOIN or ORDER BY).
  • The process may utilize buffers (e.g., Buffer Pool) to reduce disk I/O.

6. Result Return and Cleanup

  • Returns the final result set to the client (potentially in batches).
  • Releases resources occupied by the query (e.g., temporary tables, memory contexts).

Key Points Summary:

  • Parsing Phase (Lexical/Syntax/Semantic Analysis) ensures the SQL is legal and executable;
  • Optimization Phase selects an efficient path through a cost model;
  • Execution Phase relies on storage engines and buffer management to perform data operations.
    This process demonstrates how database systems balance correctness, efficiency, and resource management.