Materialized Views in ClickHouse

Materialized Views in ClickHouse: Incremental vs Refreshable (and When to Use Neither)

Meta description (159 chars):
Choose the right ClickHouse materialized view. Compare incremental vs refreshable MVs, with code, trade-offs, backfill tips, and when to avoid MVs entirely.


Why this matters

You’ve got fast-growing event data, dashboards that must stay snappy, and SLAs on both freshness and cost. ClickHouse gives you two flavors of materialized view to precompute results: incremental and refreshable. Pick well and you’ll offload heavy work from reads, control write cost, and avoid storage blow-ups. Pick poorly and you’ll fight lag, MV staleness, and runaway insert overhead.


What each option actually is

Incremental Materialized Views (IMV)

  • How they work: In ClickHouse, an incremental MV is effectively an insert trigger: your SELECT runs only on the newly inserted block, and the result is written to a target table. Merge engines then combine partial states asynchronously. (ClickHouse)
  • When they shine: Aggregations over a single high-volume table where you want near-real-time results and tiny read latency. Scales to petabytes because each run processes only the arriving block. (ClickHouse)

Minimal example (daily rollups):

-- Source (events)
CREATE TABLE events (
  ts DateTime64(3, 'UTC'),
  user_id UInt64,
  bytes UInt64
) ENGINE = MergeTree ORDER BY (ts, user_id);

-- Target (daily sums)
CREATE TABLE daily_bytes (
  day Date,
  bytes UInt64
) ENGINE = SummingMergeTree
ORDER BY day;

-- Incremental MV (runs on each insert block to events)
CREATE MATERIALIZED VIEW daily_bytes_mv
TO daily_bytes AS
SELECT
  toDate(ts) AS day,
  sum(bytes) AS bytes
FROM events
GROUP BY day;

Key properties: real-time updates, small targets, and work shifted to inserts. (ClickHouse)

Join caveat: with IMVs, only the left-most source table (the one receiving inserts) triggers updates; right-side tables are read but changes there don’t trigger the MV. Plan enrichment carefully. (ClickHouse)


Refreshable Materialized Views (RMV)

  • How they work: You schedule the query to re-run over the full dataset; the result is stored in a target table. Supports complex queries (UNION, JOINs, table functions) and can REPLACE or APPEND on refresh. (ClickHouse)
  • When they shine: Cached “top-N”, denormalized lookups, or heavy JOIN logic where near-real-time isn’t required. Incremental MVs are usually preferred for rollups, but RMVs fit when incremental is awkward or correctness needs full recompute. (ClickHouse)

Minimal example (hourly cache with REPLACE):

-- Target table
CREATE TABLE top_pages (
  hour DateTime,
  page LowCardinality(String),
  views UInt64
) ENGINE = MergeTree ORDER BY (hour, page);

-- Refresh every 1 hour, atomically replace contents
CREATE MATERIALIZED VIEW top_pages_mv
REFRESH EVERY 1 HOUR
TO top_pages AS
SELECT
  toStartOfHour(ts) AS hour,
  page,
  sum(1) AS views
FROM page_views
GROUP BY hour, page
ORDER BY views DESC
LIMIT 1000;

You can force or manage schedules via SYSTEM REFRESH|STOP|START VIEW and inspect status in system.view_refreshes. (ClickHouse)

Snapshot history with APPEND:

CREATE MATERIALIZED VIEW top_pages_history_mv
REFRESH EVERY 10 MINUTE APPEND
TO top_pages_history AS
SELECT now() AS snapshot_ts, page, views
FROM current_top_pages;  -- any SELECT, including views/joins

APPEND keeps a timeline of refreshes; REPLACE is the default. (ClickHouse)


Side-by-side: which fits your use case?

CriterionIncremental MVRefreshable MV
TriggerOn insert to source (block-level)Scheduled refresh (EVERY/AFTER)
FreshnessNear real-timeStale between refreshes
Query complexityBest for single-table rollups; joins limited by left-table triggerFull SQL (JOIN/UNION/table functions) allowed
Compute costPaid at write timePaid at refresh time
BackfillUse duplicate tables / INSERT … SELECT into targets; avoid POPULATEJust re-run schedule; can SYSTEM REFRESH VIEW
Failure/consistencyInsert-path errors can leave partial blocks; plan idempotencyAtomic replace (REPLACE) or accumulating (APPEND)
Scale patternExcellent for huge streamsGreat for caches, denorm DAGs, periodic rebuilds

Sources: IMV/trigger semantics, POPULATE guidance, RMV syntax & modes. (ClickHouse)


Real example patterns

1) Enrich stream with a slowly changing dim (IMV + JOIN carefully)

CREATE TABLE dim_users (
  user_id UInt64, tier LowCardinality(String)
) ENGINE = MergeTree ORDER BY user_id;

CREATE TABLE hits (...) ENGINE = MergeTree ORDER BY (ts, user_id);

CREATE TABLE daily_hits_by_tier (
  day Date, tier LowCardinality(String), c UInt64
) ENGINE = SummingMergeTree ORDER BY (day, tier);

CREATE MATERIALIZED VIEW daily_hits_by_tier_mv
TO daily_hits_by_tier AS
SELECT
  toDate(h.ts) AS day,
  d.tier,
  count() AS c
FROM hits AS h
LEFT JOIN dim_users AS d USING (user_id)
GROUP BY day, tier;

IMV triggers on hits inserts; changes in dim_users alone won’t backfill old rows—accept that or layer a periodic RMV to rebuild a denormalized table. (ClickHouse)

2) Periodic denormalization with dependencies (RMV DAG)

-- First view refreshes minute facts
CREATE MATERIALIZED VIEW minute_facts_mv
REFRESH EVERY 1 MINUTE
TO minute_facts AS
SELECT * FROM raw_events_transformed;

-- Downstream hourly rollup waits for minute_facts
CREATE MATERIALIZED VIEW hourly_rollup_mv
REFRESH EVERY 1 HOUR
DEPENDS ON minute_facts_mv
TO hourly_rollup AS
SELECT toStartOfHour(ts) AS hour, sum(metric) AS m
FROM minute_facts GROUP BY hour;

Use DEPENDS ON to coordinate multi-stage refresh pipelines. (ClickHouse)


Backfill without shooting yourself in the foot

  • Avoid POPULATE for anything but small, paused datasets; it can miss rows during creation. Prefer explicit INSERT … SELECT into the MV’s target (or duplicate target then MOVE PARTITION). (ClickHouse)
  • Duplicate-tables strategy: create _v2 source/targets, run backfill, move partitions, drop duplicates—keeps live ingest unharmed. (ClickHouse)
  • Null engine trick for large recomputes: write only the columns needed to a ENGINE = Null table to drive the MV incrementally without storing raw backfill. (ClickHouse)

Best practices & common pitfalls

Do this

  • Choose a MergeTree target that matches your aggregation (e.g., SummingMergeTree, AggregatingMergeTree storing states then …Merge() at read time). (ClickHouse)
  • For RMVs, set refresh so the query finishes well before the next tick; otherwise you’ll permanently lag. Tune/alter schedules with ALTER TABLE … MODIFY REFRESH. (ClickHouse)
  • Monitor system.view_refreshes and query logs to spot regressions. (ClickHouse)

Avoid this

  • “10× storage anti-pattern.” If your GROUP BY explodes row counts, you’re building an expensive index, not an MV. Validate aggregation ratio before creating the MV. (ClickHouse)
  • Expecting IMVs to react to UPDATE/DELETE on the source; they don’t. Plan a periodic RMV or explicit maintenance if source data mutates. (ClickHouse)
  • Relying on IMVs to auto-recompute complex multi-table joins—use RMVs or rethink the model. (ClickHouse)

When to use neither MV

Sometimes the right answer is no MV:

  • Projections: For single-table performance (different sort orders, pre-agg on same table) that’s transparent to queries, consider projections instead of managing target tables. (ClickHouse)
  • Query cache / condition cache: If the pain is repeated identical (or similar) queries, caches might remove the need for an MV altogether. (ClickHouse)
  • Data-skipping indexes / better PK: If you’re scanning too much, fix table design and add skip indexes first. MVs won’t save a bad primary key. (ClickHouse)

Quick decision guide

  • Need real-time rollups of a firehose?Incremental MV.
  • Complex join or denorm, freshness can lag minutes+?Refreshable MV.
  • Optimizing single-table read paths with no schema fan-out?Projection, not MV. (ClickHouse)

SEO-ready code snippets you can paste today

Check last refresh & control RMVs

SELECT database, view, status, last_success_time, next_refresh_time
FROM system.view_refreshes
ORDER BY last_refresh_time DESC;

SYSTEM REFRESH VIEW top_pages_mv;
ALTER TABLE top_pages_mv MODIFY REFRESH EVERY 5 MINUTE;

(ClickHouse)

Cascading IMVs with Aggregating states

-- Stage 1 (monthly states)
CREATE TABLE monthly_stats (
  month Date,
  domain String,
  bytes_state AggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree
ORDER BY (domain, month);

CREATE MATERIALIZED VIEW monthly_stats_mv
TO monthly_stats AS
SELECT
  toStartOfMonth(ts) AS month,
  domain,
  sumState(bytes) AS bytes_state
FROM raw_hourly
GROUP BY month, domain;

-- Stage 2 (yearly summary from states)
CREATE TABLE yearly_stats (
  year UInt16, domain String, bytes UInt64
) ENGINE = SummingMergeTree ORDER BY (year, domain);

CREATE MATERIALIZED VIEW yearly_stats_mv
TO yearly_stats AS
SELECT
  toYear(month) AS year,
  domain,
  sumMerge(bytes_state) AS bytes
FROM monthly_stats
GROUP BY year, domain;

(ClickHouse)


Summary & call to action

  • Default to Incremental MV for streaming rollups; it’s cheaper at read time and scales naturally.
  • Use Refreshable MV for scheduled denormalization and complex queries—especially when freshness can lag and correctness demands a full recompute.
  • If you’re optimizing one table’s read path, projections or indexes may beat any MV.

If you want, share your schema and query patterns—I’ll sketch the exact MV (or non-MV) design, target engines, and a safe backfill plan tailored to your workload.


Internal link ideas (official docs)

  • Incremental materialized view (concepts & examples). (ClickHouse)
  • Refreshable materialized view (scheduling, APPEND/REPLACE, ops). (ClickHouse)
  • CREATE VIEW (syntax for both MV types, POPULATE warnings, SQL security). (ClickHouse)
  • Backfilling Data (duplicate tables, Null engine, move partitions). (ClickHouse)
  • Cascading Materialized Views (multi-stage pipelines). (ClickHouse)
  • Materialized Views vs Projections (choose MV vs projection). (ClickHouse)
  • Query cache / Condition cache (reduce repeats without MVs). (ClickHouse)
  • Data-skipping indexes (optimize scans first). (ClickHouse)

Image prompt

“A clean, modern architecture diagram comparing incremental vs refreshable materialized views in ClickHouse—left shows insert-triggered block processing into a Summing/AggregatingMergeTree; right shows scheduled full-dataset recompute with REPLACE/APPEND. Minimalist, high contrast, 3D isometric style.”

Tags

#ClickHouse #MaterializedView #IncrementalMV #RefreshableMV #DataEngineering #RealTimeAnalytics #Projections #DatabaseDesign #Performance #Backfilling

Leave a Reply

Your email address will not be published. Required fields are marked *