Materialized Views and Query Rewriting Techniques in Database Query Optimization
Problem Description
A Materialized View is a special type of view in a database that not only stores the query definition but also pre-computes and stores the query results. Query Rewriting is a technique where the optimizer automatically transforms a user query into an equivalent query that utilizes materialized views. This problem will detail the principles of materialized views, their applicable scenarios, and how query rewriting improves query performance.
1. Basic Concepts of Materialized Views
1. Limitations of Regular Views
- Regular View: Only saves the query logic (e.g.,
CREATE VIEW v AS SELECT * FROM t WHERE c1>10), requiring re-execution of the underlying SQL upon each query. - Issue: Repeated computation leads to performance overhead when the base table has a large amount of data or the query is complex.
2. Definition of Materialized Views
- Essence: A materialized view is a table that physically stores query results, for example:
CREATE MATERIALIZED VIEW mv_sales AS SELECT region, SUM(amount) AS total_sales FROM sales GROUP BY region; - Difference from Indexes:
- Indexes speed up queries on specific columns of a single table (e.g., B+ tree indexes).
- Materialized views can cache results of complex queries involving multi-table joins, aggregations, etc.
3. Maintenance Methods for Materialized Views
- Complete Refresh: Periodically rebuilds the entire materialized view (e.g., at midnight daily), suitable for scenarios with infrequent data updates.
- Incremental Refresh: Synchronizes only the changes from the base tables via logs (e.g., Oracle's Materialized View Logs), suitable for high real-time requirements.
2. Working Mechanism of Query Rewriting
1. Core Idea of Rewriting
The optimizer checks if a user query can be "covered" by a materialized view. For example:
- User Query:
SELECT region, SUM(amount) FROM sales GROUP BY region; - After Rewriting: Directly queries
mv_sales, avoiding a full table scan ofsales.
2. Matching Conditions for Rewriting
- Syntactic Matching: The query's projection columns, filter conditions, and join order must be compatible with the materialized view's definition.
- Semantic Matching: Ensures result consistency. For example:
- If a materialized view definition includes
WHERE status='ACTIVE', queries without this condition may need to combine base table data for completeness. - Aggregate functions must meet rewritability (e.g.,
SUMis rewritable,COUNT(DISTINCT)may not be directly reusable).
- If a materialized view definition includes
3. Scenarios Triggering Rewriting
- Complete Rewrite: The query's requirements are fully covered by a materialized view (as in the aggregation example above).
- Partial Rewrite: Only part of a subquery or table join can be replaced by a materialized view.
- Priority Determination: When multiple materialized views are available, the optimizer selects the optimal one based on a cost model.
3. Practical Application Case
Scenario: E-commerce Platform Sales Report Optimization
-
Problem:
- The base table
salescontains hundreds of millions of records, requiring frequent daily queries for sales by category. - Direct
GROUP BYqueries take minutes.
- The base table
-
Creating a Materialized View:
CREATE MATERIALIZED VIEW mv_category_sales REFRESH FAST ON COMMIT -- Incremental refresh on commit AS SELECT category_id, COUNT(*) AS order_count, SUM(amount) AS revenue FROM sales GROUP BY category_id; -
Query Rewriting Effect:
- When a user executes
SELECT category_id, SUM(amount) FROM sales GROUP BY category_id, the optimizer automatically rewrites it toSELECT * FROM mv_category_sales. - Query time drops from minutes to milliseconds.
- When a user executes
4. Notes and Limitations
- Storage Cost: Materialized views consume additional disk space.
- Data Consistency: Incremental refresh may introduce latency; the refresh strategy must be chosen based on business tolerance.
- Applicable Scenarios:
- Suitable for read-intensive, write-light data warehouses (e.g., OLAP).
- Not suitable for high-frequency update OLTP systems (due to high maintenance overhead).
- Database Support: Oracle, PostgreSQL, Snowflake, etc., support automatic query rewriting; MySQL requires manually specifying queries to materialized views.
Summary
Materialized Views, by pre-computing and storing results of complex queries, combined with Query Rewriting technology, shift runtime overhead to the preprocessing stage. They are an effective means of optimizing large-scale aggregation queries. Practical application requires balancing storage costs, refresh strategies, and the consistency requirements of business needs.