Materialized Views in ClickHouse: Incremental vs Refreshable (and When to Use Neither)

Meta description (159 chars):
Choose the right ClickHouse materialized view. Compare incremental vs refreshable MVs, with code, trade-offs, backfill tips, and when to avoid MVs entirely.

Why this matters

You’ve got fast-growing event data, dashboards that must stay snappy, and SLAs on both freshness and cost. ClickHouse gives you two flavors of materialized view to precompute results: incremental and refreshable. Pick well and you’ll offload heavy work from reads, control write cost, and avoid storage blow-ups. Pick poorly and you’ll fight lag, MV staleness, and runaway insert overhead.

What each option actually is

Incremental Materialized Views (IMV)

How they work: In ClickHouse, an incremental MV is effectively an insert trigger: your SELECT runs only on the newly inserted block, and the result is written to a target table. Merge engines then combine partial states asynchronously. (ClickHouse)
When they shine: Aggregations over a single high-volume table where you want near-real-time results and tiny read latency. Scales to petabytes because each run processes only the arriving block. (ClickHouse)

Minimal example (daily rollups):

-- Source (events)
CREATE TABLE events (
  ts DateTime64(3, 'UTC'),
  user_id UInt64,
  bytes UInt64
) ENGINE = MergeTree ORDER BY (ts, user_id);

-- Target (daily sums)
CREATE TABLE daily_bytes (
  day Date,
  bytes UInt64
) ENGINE = SummingMergeTree
ORDER BY day;

-- Incremental MV (runs on each insert block to events)
CREATE MATERIALIZED VIEW daily_bytes_mv
TO daily_bytes AS
SELECT
  toDate(ts) AS day,
  sum(bytes) AS bytes
FROM events
GROUP BY day;

Key properties: real-time updates, small targets, and work shifted to inserts. (ClickHouse)

Join caveat: with IMVs, only the left-most source table (the one receiving inserts) triggers updates; right-side tables are read but changes there don’t trigger the MV. Plan enrichment carefully. (ClickHouse)

Refreshable Materialized Views (RMV)

How they work: You schedule the query to re-run over the full dataset; the result is stored in a target table. Supports complex queries (UNION, JOINs, table functions) and can REPLACE or APPEND on refresh. (ClickHouse)
When they shine: Cached “top-N”, denormalized lookups, or heavy JOIN logic where near-real-time isn’t required. Incremental MVs are usually preferred for rollups, but RMVs fit when incremental is awkward or correctness needs full recompute. (ClickHouse)

Minimal example (hourly cache with REPLACE):

-- Target table
CREATE TABLE top_pages (
  hour DateTime,
  page LowCardinality(String),
  views UInt64
) ENGINE = MergeTree ORDER BY (hour, page);

-- Refresh every 1 hour, atomically replace contents
CREATE MATERIALIZED VIEW top_pages_mv
REFRESH EVERY 1 HOUR
TO top_pages AS
SELECT
  toStartOfHour(ts) AS hour,
  page,
  sum(1) AS views
FROM page_views
GROUP BY hour, page
ORDER BY views DESC
LIMIT 1000;

You can force or manage schedules via SYSTEM REFRESH|STOP|START VIEW and inspect status in system.view_refreshes. (ClickHouse)

Snapshot history with APPEND:

CREATE MATERIALIZED VIEW top_pages_history_mv
REFRESH EVERY 10 MINUTE APPEND
TO top_pages_history AS
SELECT now() AS snapshot_ts, page, views
FROM current_top_pages;  -- any SELECT, including views/joins

APPEND keeps a timeline of refreshes; REPLACE is the default. (ClickHouse)

Side-by-side: which fits your use case?

Criterion	Incremental MV	Refreshable MV
Trigger	On insert to source (block-level)	Scheduled refresh (EVERY/AFTER)
Freshness	Near real-time	Stale between refreshes
Query complexity	Best for single-table rollups; joins limited by left-table trigger	Full SQL (JOIN/UNION/table functions) allowed
Compute cost	Paid at write time	Paid at refresh time
Backfill	Use duplicate tables / `INSERT … SELECT` into targets; avoid POPULATE	Just re-run schedule; can `SYSTEM REFRESH VIEW`
Failure/consistency	Insert-path errors can leave partial blocks; plan idempotency	Atomic replace (REPLACE) or accumulating (APPEND)
Scale pattern	Excellent for huge streams	Great for caches, denorm DAGs, periodic rebuilds

Sources: IMV/trigger semantics, POPULATE guidance, RMV syntax & modes. (ClickHouse)

Real example patterns

1) Enrich stream with a slowly changing dim (IMV + JOIN carefully)

CREATE TABLE dim_users (
  user_id UInt64, tier LowCardinality(String)
) ENGINE = MergeTree ORDER BY user_id;

CREATE TABLE hits (...) ENGINE = MergeTree ORDER BY (ts, user_id);

CREATE TABLE daily_hits_by_tier (
  day Date, tier LowCardinality(String), c UInt64
) ENGINE = SummingMergeTree ORDER BY (day, tier);

CREATE MATERIALIZED VIEW daily_hits_by_tier_mv
TO daily_hits_by_tier AS
SELECT
  toDate(h.ts) AS day,
  d.tier,
  count() AS c
FROM hits AS h
LEFT JOIN dim_users AS d USING (user_id)
GROUP BY day, tier;

IMV triggers on hits inserts; changes in dim_users alone won’t backfill old rows—accept that or layer a periodic RMV to rebuild a denormalized table. (ClickHouse)

2) Periodic denormalization with dependencies (RMV DAG)

-- First view refreshes minute facts
CREATE MATERIALIZED VIEW minute_facts_mv
REFRESH EVERY 1 MINUTE
TO minute_facts AS
SELECT * FROM raw_events_transformed;

-- Downstream hourly rollup waits for minute_facts
CREATE MATERIALIZED VIEW hourly_rollup_mv
REFRESH EVERY 1 HOUR
DEPENDS ON minute_facts_mv
TO hourly_rollup AS
SELECT toStartOfHour(ts) AS hour, sum(metric) AS m
FROM minute_facts GROUP BY hour;

Use DEPENDS ON to coordinate multi-stage refresh pipelines. (ClickHouse)

Backfill without shooting yourself in the foot

Avoid POPULATE for anything but small, paused datasets; it can miss rows during creation. Prefer explicit INSERT … SELECT into the MV’s target (or duplicate target then MOVE PARTITION). (ClickHouse)
Duplicate-tables strategy: create _v2 source/targets, run backfill, move partitions, drop duplicates—keeps live ingest unharmed. (ClickHouse)
Null engine trick for large recomputes: write only the columns needed to a ENGINE = Null table to drive the MV incrementally without storing raw backfill. (ClickHouse)

Best practices & common pitfalls

Do this

Choose a MergeTree target that matches your aggregation (e.g., SummingMergeTree, AggregatingMergeTree storing states then …Merge() at read time). (ClickHouse)
For RMVs, set refresh so the query finishes well before the next tick; otherwise you’ll permanently lag. Tune/alter schedules with ALTER TABLE … MODIFY REFRESH. (ClickHouse)
Monitor system.view_refreshes and query logs to spot regressions. (ClickHouse)

Avoid this

“10× storage anti-pattern.” If your GROUP BY explodes row counts, you’re building an expensive index, not an MV. Validate aggregation ratio before creating the MV. (ClickHouse)
Expecting IMVs to react to UPDATE/DELETE on the source; they don’t. Plan a periodic RMV or explicit maintenance if source data mutates. (ClickHouse)
Relying on IMVs to auto-recompute complex multi-table joins—use RMVs or rethink the model. (ClickHouse)

When to use neither MV

Sometimes the right answer is no MV:

Projections: For single-table performance (different sort orders, pre-agg on same table) that’s transparent to queries, consider projections instead of managing target tables. (ClickHouse)
Query cache / condition cache: If the pain is repeated identical (or similar) queries, caches might remove the need for an MV altogether. (ClickHouse)
Data-skipping indexes / better PK: If you’re scanning too much, fix table design and add skip indexes first. MVs won’t save a bad primary key. (ClickHouse)

Quick decision guide

Need real-time rollups of a firehose? → Incremental MV.
Complex join or denorm, freshness can lag minutes+? → Refreshable MV.
Optimizing single-table read paths with no schema fan-out? → Projection, not MV. (ClickHouse)

SEO-ready code snippets you can paste today

Check last refresh & control RMVs

SELECT database, view, status, last_success_time, next_refresh_time
FROM system.view_refreshes
ORDER BY last_refresh_time DESC;

SYSTEM REFRESH VIEW top_pages_mv;
ALTER TABLE top_pages_mv MODIFY REFRESH EVERY 5 MINUTE;

(ClickHouse)

Cascading IMVs with Aggregating states

-- Stage 1 (monthly states)
CREATE TABLE monthly_stats (
  month Date,
  domain String,
  bytes_state AggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree
ORDER BY (domain, month);

CREATE MATERIALIZED VIEW monthly_stats_mv
TO monthly_stats AS
SELECT
  toStartOfMonth(ts) AS month,
  domain,
  sumState(bytes) AS bytes_state
FROM raw_hourly
GROUP BY month, domain;

-- Stage 2 (yearly summary from states)
CREATE TABLE yearly_stats (
  year UInt16, domain String, bytes UInt64
) ENGINE = SummingMergeTree ORDER BY (year, domain);

CREATE MATERIALIZED VIEW yearly_stats_mv
TO yearly_stats AS
SELECT
  toYear(month) AS year,
  domain,
  sumMerge(bytes_state) AS bytes
FROM monthly_stats
GROUP BY year, domain;

(ClickHouse)

Summary & call to action

Default to Incremental MV for streaming rollups; it’s cheaper at read time and scales naturally.
Use Refreshable MV for scheduled denormalization and complex queries—especially when freshness can lag and correctness demands a full recompute.
If you’re optimizing one table’s read path, projections or indexes may beat any MV.

If you want, share your schema and query patterns—I’ll sketch the exact MV (or non-MV) design, target engines, and a safe backfill plan tailored to your workload.

Internal link ideas (official docs)

Incremental materialized view (concepts & examples). (ClickHouse)
Refreshable materialized view (scheduling, APPEND/REPLACE, ops). (ClickHouse)
CREATE VIEW (syntax for both MV types, POPULATE warnings, SQL security). (ClickHouse)
Backfilling Data (duplicate tables, Null engine, move partitions). (ClickHouse)
Cascading Materialized Views (multi-stage pipelines). (ClickHouse)
Materialized Views vs Projections (choose MV vs projection). (ClickHouse)
Query cache / Condition cache (reduce repeats without MVs). (ClickHouse)
Data-skipping indexes (optimize scans first). (ClickHouse)

Image prompt

“A clean, modern architecture diagram comparing incremental vs refreshable materialized views in ClickHouse—left shows insert-triggered block processing into a Summing/AggregatingMergeTree; right shows scheduled full-dataset recompute with REPLACE/APPEND. Minimalist, high contrast, 3D isometric style.”