Skip to content

Automatic Materialized View Query Rewrite

Materialized views are only useful if queries actually hit them. Requiring users to SELECT FROM mv_name defeats the purpose — they’d need to know which MVs exist and rewrite their queries manually. This post covers the automatic query rewriter that transparently redirects matching queries to pre-computed results.


Without automatic rewrite, users must explicitly query the materialized view:

-- User writes this (hits MV):
SELECT * FROM ohlcv_5min WHERE symbol = 1
-- But their original query was this (full scan):
SELECT symbol, xbar(timestamp, 300000000000) AS bar,
sum(volume) AS vol
FROM trades
GROUP BY symbol, xbar(timestamp, 300000000000)

The rewriter makes the second query automatically use the MV — no query changes needed.


The rewriter runs between the query cache check and exec_select():

execute(sql)
→ parse()
→ query cache check
→ MVRewriter::try_rewrite() ← HERE
→ exec_select()

If the rewriter finds a matching MV, it returns pre-computed results directly. The query never reaches the executor’s partition scan path.


MVRewriter::try_rewrite() checks each registered MV definition against the incoming SELECT:

The rewriter only handles simple GROUP BY aggregations. Any of these cause immediate fallback to full scan:

ConditionReason
No GROUP BYNot an aggregation query
Has WHERE clauseMV doesn’t store filtered subsets
Has JOINMV is single-table only
Has HAVINGPost-aggregation filter not in MV
Has DISTINCTDedup logic not in MV
Has subquery/CTEToo complex for this phase

For each registered MV definition:

1. source_table matches? trades == trades ✓
2. GROUP BY columns match? {symbol, xbar} == {symbol, xbar} ✓
(order-insensitive set comparison)
3. xbar_bucket matches? 300000000000 == 300000000000 ✓
4. ALL agg columns match?
sum(volume) → MVAggType::SUM on "volume" ✓
count(*) → MVAggType::COUNT ✓

All four conditions must hold. Partial matches (query uses a subset of MV columns) are not supported in this phase — it’s all-or-nothing.

AVG, VWAP, and STDDEV cause immediate fallback. AVG could theoretically be decomposed into SUM/COUNT (both stored in the MV), but that optimization is future work.

AggFunc::SUM → MVAggType::SUM ✓
AggFunc::COUNT → MVAggType::COUNT ✓
AggFunc::MIN → MVAggType::MIN ✓
AggFunc::MAX → MVAggType::MAX ✓
AggFunc::FIRST → MVAggType::FIRST ✓
AggFunc::LAST → MVAggType::LAST ✓
AggFunc::AVG → fallback ✗
AggFunc::VWAP → fallback ✗

When a match is found, the rewriter calls MaterializedViewManager::query() to get pre-computed buckets. ORDER BY and LIMIT are still applied on the MV result — the rewriter doesn’t skip these clauses.

MV match found
→ MaterializedViewManager::query() → pre-computed buckets
→ apply ORDER BY (if present)
→ apply LIMIT (if present)
→ return QueryResultSet

Query PathCost
Full scan (no MV)O(n) — scan all rows in matching partitions
MV query (explicit)O(buckets) — direct MV lookup
MV rewrite (automatic)O(buckets) + O(MVs) matching overhead

The matching overhead is O(registered MVs) — typically 1-10 views. Each match check is a handful of string comparisons and set intersections. For a database with 3 MVs, the overhead is ~100ns — negligible compared to the O(n) scan it replaces.

For 1M rows with 3,334 time bars, the rewrite turns a ~10ms scan into a ~0.01ms bucket lookup — a 1000× speedup that’s completely transparent to the user.


-- Step 1: Create MV
CREATE MATERIALIZED VIEW ohlcv_5min AS
SELECT symbol, xbar(timestamp, 300000000000) AS bar,
first(price) AS open, max(price) AS high,
min(price) AS low, last(price) AS close,
sum(volume) AS vol
FROM trades
GROUP BY symbol, xbar(timestamp, 300000000000)
-- Step 2: Insert data (MV updated incrementally)
INSERT INTO trades VALUES ('AAPL', 15025, 100, 1711900800000000000)
-- Step 3: Query base table — automatically rewritten to MV
SELECT symbol, xbar(timestamp, 300000000000) AS bar,
sum(volume) AS vol
FROM trades
GROUP BY symbol, xbar(timestamp, 300000000000)
-- ↑ This hits the MV, not the raw partitions

TestVerifies
ExactMatchMV hit returns pre-aggregated data
NoMatchWhereWHERE present → full scan fallback
NoMatchAggMismatchavg() vs sum() MV → no rewrite
NoMVRegisteredNo MVs → normal execution
MultipleAggColumnsMulti-column MV match
NoMatchTableMismatchWrong source table → no rewrite

Transparent rewrite

Users query base tables normally. The rewriter silently redirects to MVs when possible.

Exact matching

All-or-nothing: source table, GROUP BY columns, xbar bucket, and all aggregations must match.

1000× speedup

O(n) partition scan → O(buckets) lookup. 10ms → 0.01ms for typical OHLCV queries.

Safe fallback

Any mismatch falls through to full scan. No incorrect results, no silent data loss.


Related: Storage Tiering and MVs → · Financial Functions → · XBAR Sorted-Scan →