Materialized Views and Query Rewriting Techniques in Database Query Optimization

Materialized Views and Query Rewriting Techniques in Database Query Optimization

Problem Description

A Materialized View is a special type of view in a database that not only stores the query definition but also pre-computes and stores the query results. Query Rewriting is a technique where the optimizer automatically transforms a user query into an equivalent query that utilizes materialized views. This problem will detail the principles of materialized views, their applicable scenarios, and how query rewriting improves query performance.


1. Basic Concepts of Materialized Views

1. Limitations of Regular Views

  • Regular View: Only saves the query logic (e.g., CREATE VIEW v AS SELECT * FROM t WHERE c1>10), requiring re-execution of the underlying SQL upon each query.
  • Issue: Repeated computation leads to performance overhead when the base table has a large amount of data or the query is complex.

2. Definition of Materialized Views

  • Essence: A materialized view is a table that physically stores query results, for example:
    CREATE MATERIALIZED VIEW mv_sales AS 
    SELECT region, SUM(amount) AS total_sales 
    FROM sales GROUP BY region;
    
  • Difference from Indexes:
    • Indexes speed up queries on specific columns of a single table (e.g., B+ tree indexes).
    • Materialized views can cache results of complex queries involving multi-table joins, aggregations, etc.

3. Maintenance Methods for Materialized Views

  • Complete Refresh: Periodically rebuilds the entire materialized view (e.g., at midnight daily), suitable for scenarios with infrequent data updates.
  • Incremental Refresh: Synchronizes only the changes from the base tables via logs (e.g., Oracle's Materialized View Logs), suitable for high real-time requirements.

2. Working Mechanism of Query Rewriting

1. Core Idea of Rewriting

The optimizer checks if a user query can be "covered" by a materialized view. For example:

  • User Query:
    SELECT region, SUM(amount) FROM sales GROUP BY region;
    
  • After Rewriting: Directly queries mv_sales, avoiding a full table scan of sales.

2. Matching Conditions for Rewriting

  • Syntactic Matching: The query's projection columns, filter conditions, and join order must be compatible with the materialized view's definition.
  • Semantic Matching: Ensures result consistency. For example:
    • If a materialized view definition includes WHERE status='ACTIVE', queries without this condition may need to combine base table data for completeness.
    • Aggregate functions must meet rewritability (e.g., SUM is rewritable, COUNT(DISTINCT) may not be directly reusable).

3. Scenarios Triggering Rewriting

  • Complete Rewrite: The query's requirements are fully covered by a materialized view (as in the aggregation example above).
  • Partial Rewrite: Only part of a subquery or table join can be replaced by a materialized view.
  • Priority Determination: When multiple materialized views are available, the optimizer selects the optimal one based on a cost model.

3. Practical Application Case

Scenario: E-commerce Platform Sales Report Optimization

  1. Problem:

    • The base table sales contains hundreds of millions of records, requiring frequent daily queries for sales by category.
    • Direct GROUP BY queries take minutes.
  2. Creating a Materialized View:

    CREATE MATERIALIZED VIEW mv_category_sales 
    REFRESH FAST ON COMMIT  -- Incremental refresh on commit
    AS 
    SELECT category_id, COUNT(*) AS order_count, SUM(amount) AS revenue 
    FROM sales 
    GROUP BY category_id;
    
  3. Query Rewriting Effect:

    • When a user executes SELECT category_id, SUM(amount) FROM sales GROUP BY category_id, the optimizer automatically rewrites it to SELECT * FROM mv_category_sales.
    • Query time drops from minutes to milliseconds.

4. Notes and Limitations

  1. Storage Cost: Materialized views consume additional disk space.
  2. Data Consistency: Incremental refresh may introduce latency; the refresh strategy must be chosen based on business tolerance.
  3. Applicable Scenarios:
    • Suitable for read-intensive, write-light data warehouses (e.g., OLAP).
    • Not suitable for high-frequency update OLTP systems (due to high maintenance overhead).
  4. Database Support: Oracle, PostgreSQL, Snowflake, etc., support automatic query rewriting; MySQL requires manually specifying queries to materialized views.

Summary

Materialized Views, by pre-computing and storing results of complex queries, combined with Query Rewriting technology, shift runtime overhead to the preprocessing stage. They are an effective means of optimizing large-scale aggregation queries. Practical application requires balancing storage costs, refresh strategies, and the consistency requirements of business needs.