Materialized Views in ClickHouse: Incremental vs Refreshable (and When to Use Neither)
Meta description (159 chars):
Choose the right ClickHouse materialized view. Compare incremental vs refreshable MVs, with code, trade-offs, backfill tips, and when to avoid MVs entirely.
Why this matters
You’ve got fast-growing event data, dashboards that must stay snappy, and SLAs on both freshness and cost. ClickHouse gives you two flavors of materialized view to precompute results: incremental and refreshable. Pick well and you’ll offload heavy work from reads, control write cost, and avoid storage blow-ups. Pick poorly and you’ll fight lag, MV staleness, and runaway insert overhead.
What each option actually is
Incremental Materialized Views (IMV)
- How they work: In ClickHouse, an incremental MV is effectively an insert trigger: your
SELECTruns only on the newly inserted block, and the result is written to a target table. Merge engines then combine partial states asynchronously. (ClickHouse) - When they shine: Aggregations over a single high-volume table where you want near-real-time results and tiny read latency. Scales to petabytes because each run processes only the arriving block. (ClickHouse)
Minimal example (daily rollups):
-- Source (events)
CREATE TABLE events (
ts DateTime64(3, 'UTC'),
user_id UInt64,
bytes UInt64
) ENGINE = MergeTree ORDER BY (ts, user_id);
-- Target (daily sums)
CREATE TABLE daily_bytes (
day Date,
bytes UInt64
) ENGINE = SummingMergeTree
ORDER BY day;
-- Incremental MV (runs on each insert block to events)
CREATE MATERIALIZED VIEW daily_bytes_mv
TO daily_bytes AS
SELECT
toDate(ts) AS day,
sum(bytes) AS bytes
FROM events
GROUP BY day;
Key properties: real-time updates, small targets, and work shifted to inserts. (ClickHouse)
Join caveat: with IMVs, only the left-most source table (the one receiving inserts) triggers updates; right-side tables are read but changes there don’t trigger the MV. Plan enrichment carefully. (ClickHouse)
Refreshable Materialized Views (RMV)
- How they work: You schedule the query to re-run over the full dataset; the result is stored in a target table. Supports complex queries (UNION, JOINs, table functions) and can REPLACE or APPEND on refresh. (ClickHouse)
- When they shine: Cached “top-N”, denormalized lookups, or heavy JOIN logic where near-real-time isn’t required. Incremental MVs are usually preferred for rollups, but RMVs fit when incremental is awkward or correctness needs full recompute. (ClickHouse)
Minimal example (hourly cache with REPLACE):
-- Target table
CREATE TABLE top_pages (
hour DateTime,
page LowCardinality(String),
views UInt64
) ENGINE = MergeTree ORDER BY (hour, page);
-- Refresh every 1 hour, atomically replace contents
CREATE MATERIALIZED VIEW top_pages_mv
REFRESH EVERY 1 HOUR
TO top_pages AS
SELECT
toStartOfHour(ts) AS hour,
page,
sum(1) AS views
FROM page_views
GROUP BY hour, page
ORDER BY views DESC
LIMIT 1000;
You can force or manage schedules via SYSTEM REFRESH|STOP|START VIEW and inspect status in system.view_refreshes. (ClickHouse)
Snapshot history with APPEND:
CREATE MATERIALIZED VIEW top_pages_history_mv
REFRESH EVERY 10 MINUTE APPEND
TO top_pages_history AS
SELECT now() AS snapshot_ts, page, views
FROM current_top_pages; -- any SELECT, including views/joins
APPEND keeps a timeline of refreshes; REPLACE is the default. (ClickHouse)
Side-by-side: which fits your use case?
| Criterion | Incremental MV | Refreshable MV |
|---|---|---|
| Trigger | On insert to source (block-level) | Scheduled refresh (EVERY/AFTER) |
| Freshness | Near real-time | Stale between refreshes |
| Query complexity | Best for single-table rollups; joins limited by left-table trigger | Full SQL (JOIN/UNION/table functions) allowed |
| Compute cost | Paid at write time | Paid at refresh time |
| Backfill | Use duplicate tables / INSERT … SELECT into targets; avoid POPULATE | Just re-run schedule; can SYSTEM REFRESH VIEW |
| Failure/consistency | Insert-path errors can leave partial blocks; plan idempotency | Atomic replace (REPLACE) or accumulating (APPEND) |
| Scale pattern | Excellent for huge streams | Great for caches, denorm DAGs, periodic rebuilds |
Sources: IMV/trigger semantics, POPULATE guidance, RMV syntax & modes. (ClickHouse)
Real example patterns
1) Enrich stream with a slowly changing dim (IMV + JOIN carefully)
CREATE TABLE dim_users (
user_id UInt64, tier LowCardinality(String)
) ENGINE = MergeTree ORDER BY user_id;
CREATE TABLE hits (...) ENGINE = MergeTree ORDER BY (ts, user_id);
CREATE TABLE daily_hits_by_tier (
day Date, tier LowCardinality(String), c UInt64
) ENGINE = SummingMergeTree ORDER BY (day, tier);
CREATE MATERIALIZED VIEW daily_hits_by_tier_mv
TO daily_hits_by_tier AS
SELECT
toDate(h.ts) AS day,
d.tier,
count() AS c
FROM hits AS h
LEFT JOIN dim_users AS d USING (user_id)
GROUP BY day, tier;
IMV triggers on hits inserts; changes in dim_users alone won’t backfill old rows—accept that or layer a periodic RMV to rebuild a denormalized table. (ClickHouse)
2) Periodic denormalization with dependencies (RMV DAG)
-- First view refreshes minute facts
CREATE MATERIALIZED VIEW minute_facts_mv
REFRESH EVERY 1 MINUTE
TO minute_facts AS
SELECT * FROM raw_events_transformed;
-- Downstream hourly rollup waits for minute_facts
CREATE MATERIALIZED VIEW hourly_rollup_mv
REFRESH EVERY 1 HOUR
DEPENDS ON minute_facts_mv
TO hourly_rollup AS
SELECT toStartOfHour(ts) AS hour, sum(metric) AS m
FROM minute_facts GROUP BY hour;
Use DEPENDS ON to coordinate multi-stage refresh pipelines. (ClickHouse)
Backfill without shooting yourself in the foot
- Avoid
POPULATEfor anything but small, paused datasets; it can miss rows during creation. Prefer explicitINSERT … SELECTinto the MV’s target (or duplicate target thenMOVE PARTITION). (ClickHouse) - Duplicate-tables strategy: create
_v2source/targets, run backfill, move partitions, drop duplicates—keeps live ingest unharmed. (ClickHouse) - Null engine trick for large recomputes: write only the columns needed to a
ENGINE = Nulltable to drive the MV incrementally without storing raw backfill. (ClickHouse)
Best practices & common pitfalls
Do this
- Choose a MergeTree target that matches your aggregation (e.g.,
SummingMergeTree,AggregatingMergeTreestoring states then…Merge()at read time). (ClickHouse) - For RMVs, set refresh so the query finishes well before the next tick; otherwise you’ll permanently lag. Tune/alter schedules with
ALTER TABLE … MODIFY REFRESH. (ClickHouse) - Monitor
system.view_refreshesand query logs to spot regressions. (ClickHouse)
Avoid this
- “10× storage anti-pattern.” If your
GROUP BYexplodes row counts, you’re building an expensive index, not an MV. Validate aggregation ratio before creating the MV. (ClickHouse) - Expecting IMVs to react to UPDATE/DELETE on the source; they don’t. Plan a periodic RMV or explicit maintenance if source data mutates. (ClickHouse)
- Relying on IMVs to auto-recompute complex multi-table joins—use RMVs or rethink the model. (ClickHouse)
When to use neither MV
Sometimes the right answer is no MV:
- Projections: For single-table performance (different sort orders, pre-agg on same table) that’s transparent to queries, consider projections instead of managing target tables. (ClickHouse)
- Query cache / condition cache: If the pain is repeated identical (or similar) queries, caches might remove the need for an MV altogether. (ClickHouse)
- Data-skipping indexes / better PK: If you’re scanning too much, fix table design and add skip indexes first. MVs won’t save a bad primary key. (ClickHouse)
Quick decision guide
- Need real-time rollups of a firehose? → Incremental MV.
- Complex join or denorm, freshness can lag minutes+? → Refreshable MV.
- Optimizing single-table read paths with no schema fan-out? → Projection, not MV. (ClickHouse)
SEO-ready code snippets you can paste today
Check last refresh & control RMVs
SELECT database, view, status, last_success_time, next_refresh_time
FROM system.view_refreshes
ORDER BY last_refresh_time DESC;
SYSTEM REFRESH VIEW top_pages_mv;
ALTER TABLE top_pages_mv MODIFY REFRESH EVERY 5 MINUTE;
Cascading IMVs with Aggregating states
-- Stage 1 (monthly states)
CREATE TABLE monthly_stats (
month Date,
domain String,
bytes_state AggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree
ORDER BY (domain, month);
CREATE MATERIALIZED VIEW monthly_stats_mv
TO monthly_stats AS
SELECT
toStartOfMonth(ts) AS month,
domain,
sumState(bytes) AS bytes_state
FROM raw_hourly
GROUP BY month, domain;
-- Stage 2 (yearly summary from states)
CREATE TABLE yearly_stats (
year UInt16, domain String, bytes UInt64
) ENGINE = SummingMergeTree ORDER BY (year, domain);
CREATE MATERIALIZED VIEW yearly_stats_mv
TO yearly_stats AS
SELECT
toYear(month) AS year,
domain,
sumMerge(bytes_state) AS bytes
FROM monthly_stats
GROUP BY year, domain;
Summary & call to action
- Default to Incremental MV for streaming rollups; it’s cheaper at read time and scales naturally.
- Use Refreshable MV for scheduled denormalization and complex queries—especially when freshness can lag and correctness demands a full recompute.
- If you’re optimizing one table’s read path, projections or indexes may beat any MV.
If you want, share your schema and query patterns—I’ll sketch the exact MV (or non-MV) design, target engines, and a safe backfill plan tailored to your workload.
Internal link ideas (official docs)
- Incremental materialized view (concepts & examples). (ClickHouse)
- Refreshable materialized view (scheduling, APPEND/REPLACE, ops). (ClickHouse)
- CREATE VIEW (syntax for both MV types, POPULATE warnings, SQL security). (ClickHouse)
- Backfilling Data (duplicate tables, Null engine, move partitions). (ClickHouse)
- Cascading Materialized Views (multi-stage pipelines). (ClickHouse)
- Materialized Views vs Projections (choose MV vs projection). (ClickHouse)
- Query cache / Condition cache (reduce repeats without MVs). (ClickHouse)
- Data-skipping indexes (optimize scans first). (ClickHouse)
Image prompt
“A clean, modern architecture diagram comparing incremental vs refreshable materialized views in ClickHouse—left shows insert-triggered block processing into a Summing/AggregatingMergeTree; right shows scheduled full-dataset recompute with REPLACE/APPEND. Minimalist, high contrast, 3D isometric style.”
Tags
#ClickHouse #MaterializedView #IncrementalMV #RefreshableMV #DataEngineering #RealTimeAnalytics #Projections #DatabaseDesign #Performance #Backfilling













Leave a Reply