Designing Async Materialized Views in VeloDB (Apache Doris) for Sub-Second Dashboards
Meta description (159 chars):
Learn how to design, schedule, and tune asynchronous materialized views in VeloDB (Apache Doris) for sub-second BI dashboards with safe rewrites and low costs.
Why this matters
Your stakeholders want “live” dashboards, not nightly reports. Async materialized views (MVs) let you precompute joins/aggregations, then route queries to those results automatically—so dashboards feel instant without burning compute on every request. In VeloDB (built on Apache Doris), async MVs support multi-table definitions, incremental refresh, and transparent query rewrite, which is exactly what user-facing analytics need. (VeloDB Docs)
Concept in plain terms
Two flavors of MVs exist:
- Synchronous MV (sync MV): maintained in real-time, single-table only; great when you need strong freshness. (VeloDB Docs)
- Asynchronous MV (async MV): single- or multi-table, refreshed on a schedule or on demand, with options for partition-incremental refresh. That makes it ideal for sub-second reads with minute-level freshness. (VeloDB Docs)
Transparent rewrite: Doris/VeloDB automatically matches eligible queries to your MV (SPJG algorithm), so your BI tools keep the original SQL while the engine serves results from the MV. Use EXPLAIN to confirm rewrites. (Apache Doris)
Quick mental model (and constraints)
- Build & refresh:
BUILD IMMEDIATE | DEFERRED, thenREFRESH COMPLETE | AUTOwithON MANUALorON SCHEDULE EVERY <n> <unit>.AUTOattempts partition-incremental refresh. (Apache Doris) - Partitioning: Choose MV partitions that align with the base table’s time partition (e.g., daily), otherwise you’ll forfeit incremental refresh and pruning. (Apache Doris)
- Grace period: For transparent rewrite during ingest churn, you can allow a bounded staleness window via
grace_period. (Apache Doris) - External sources: Incremental change detection currently covers internal tables and Hive; other sources may require
REFRESH COMPLETE. (Apache Doris)
Sync vs. Async MVs (choose deliberately)
| Feature | Sync MV | Async MV |
|---|---|---|
| Query rewrite | Automatic | Automatic |
| Direct query | Not supported | Supported |
| Tables allowed | Single-table only | Single- or multi-table |
| Freshness | Real-time | Scheduled or manual (eventual) |
| Incremental refresh | N/A | Partition-incremental via AUTO |
Source: VeloDB/Doris MV overviews and async MV docs. (VeloDB Docs)
A practical blueprint (orders → 1-minute rollup)
Goal: live minute-bucket KPIs (orders, revenue) refreshed every 5 minutes with partition-incremental updates.
1) Base events (already time-partitioned)
CREATE TABLE orders_events (
event_time DATETIME NOT NULL,
order_id BIGINT NOT NULL,
customer_id BIGINT NOT NULL,
status VARCHAR(16),
amount_usd DECIMAL(12,2)
)
DUPLICATE KEY(event_time, order_id)
PARTITION BY RANGE(event_time) (
FROM ("2025-01-01") TO ("2026-01-01") INTERVAL 1 DAY
)
DISTRIBUTED BY HASH(order_id) BUCKETS 16;
Ingest continuously (Kafka → Routine Load or Flink connector). (Apache Doris)
2) Async MV (minute rollup, scheduled refresh)
CREATE MATERIALIZED VIEW mv_orders_1min
BUILD IMMEDIATE
REFRESH AUTO ON SCHEDULE EVERY 5 MINUTE
PARTITION BY date_trunc('day', event_time)
DISTRIBUTED BY RANDOM BUCKETS 16
AS
SELECT
date_trunc('minute', event_time) AS ts_minute,
COUNT(*) AS orders,
SUM(amount_usd) AS revenue,
SUM(CASE WHEN status='FAILED' THEN 1 ELSE 0 END) AS failed
FROM orders_events
GROUP BY date_trunc('minute', event_time);
REFRESH AUTOenables partition-incremental refresh (changed base partitions only).PARTITION BY date_trunc('day', …)aligns MV partitions with the table’s time window, preserving pruning and incremental behavior. (Apache Doris)
3) Verify rewrites (and that your MV is “hit”)
EXPLAIN MEMO PLAN
SELECT date_trunc('minute', event_time) AS ts_minute,
COUNT(*), SUM(amount_usd)
FROM orders_events
WHERE event_time >= now() - INTERVAL 1 HOUR
GROUP BY date_trunc('minute', event_time);
Look for the MATERIALIZATIONS section showing the chosen MV. (Apache Doris)
4) Operate and observe
- Check MV status:
SELECT *
FROM mv_infos('database'='analytics')
WHERE Name = 'mv_orders_1min';
- Force refresh:
REFRESH MATERIALIZED VIEW mv_orders_1min AUTO;or... COMPLETE; - Pause/Resume:
PAUSE MATERIALIZED VIEW ...;/RESUME MATERIALIZED VIEW ...; - Zero-downtime change:
ALTER MATERIALIZED VIEW mv_old REPLACE WITH MATERIALIZED VIEW mv_new;(atomic swap). (Apache Doris)
Best practices (what actually works)
- Design to the query, not the schema. Start from your hot BI queries and mirror their shape (grouping keys, filters, joins). The closer the MV to the query, the bigger the speedup—at the cost of generality. Balance both. (Apache Doris)
- Partition for incremental refresh. Pick daily (or hourly) partitions that match your SLA; misalignment kills pruning and forces full refresh. (Apache Doris)
- Use
grace_periodto tolerate ingestion lag. It lets rewrites proceed within a small staleness window, maintaining snappy dashboards during heavy writes. (Apache Doris) - Limit MV sprawl. Fewer, well-used MVs beat dozens of near-duplicates. Validate with
EXPLAINand usage metrics before creating another. (Apache Doris) - Mind external sources. If your MV reads JDBC/external tables, Doris can’t always detect changes—prefer
COMPLETEor land the data internally/Hive for incremental detection. (Apache Doris) - Schedule by business need, not habit. Start at 5–15 minutes for product dashboards; tighten only if users truly need it. (Compute isn’t free.)
Common pitfalls (and how to avoid them)
- Wrong MV type: Trying to sync multi-table logic with sync MVs won’t fly—use async for joins/rollups. (VeloDB Docs)
- Over-wide GROUP BY: MV cardinality explodes; aggregate at the grain users actually slice by.
- Unaligned time buckets: If users query minute buckets but your MV is hourly, rewrites often miss. Align
date_trunc()with query buckets. (Apache Doris) - One giant nightly refresh: Prefer incremental (
AUTO) plus a periodic full (e.g., daily) to heal drift. (Apache Doris)
Optional: speed up data retrieval from Python
If you extract MV results to services or notebooks, use Arrow Flight SQL with the Python ADBC driver for high-throughput reads:
import adbc_driver_flightsql.dbapi as flightsql
conn = flightsql.connect(uri="grpc://<fe-host>:<port>", db="analytics")
cur = conn.cursor()
cur.execute("""
SELECT ts_minute, orders, revenue
FROM mv_orders_1min
WHERE ts_minute >= now() - interval 1 hour
""")
rows = cur.fetchall()
Docs: Arrow Flight SQL connection guides for Doris/VeloDB. (Apache Doris)
Conclusion & takeaways
Async MVs are the right tool when you need fast, predictable dashboards with minute-level freshness. Design your MV from the top query, align partitions and time buckets, enable incremental refresh, and set a grace period to ride out ingest spikes. Then use EXPLAIN and mv_infos to verify results, and evolve definitions with atomic replacement—no downtime. (Apache Doris)
Call to action: Start with one high-value dashboard, ship a single async MV, confirm rewrite, then iterate on schedule and grain based on real usage.
Internal link ideas (official-docs friendly)
- “Async MV overview and creation syntax” (VeloDB/Doris) (VeloDB Docs)
- “Transparent query rewriting details” (Doris) (Apache Doris)
- “Routine Load from Kafka” (Doris) (Apache Doris)
- “Arrow Flight SQL connection guide” (Doris/VeloDB) (Apache Doris)
Image prompt (for DALL·E / Midjourney)
“A clean, modern diagram of VeloDB (Apache Doris) async materialized views: base event table → scheduled incremental refresh → MV partitions → transparent rewrite to BI queries. Minimalist, high-contrast, isometric style.”
Tags
#VeloDB #ApacheDoris #MaterializedViews #RealTimeAnalytics #DataEngineering #BI #QueryOptimization #Streaming #Kafka #ArrowFlightSQL










Leave a Reply