ClickHouse for Data Engineers: Practical Architecture, Modeling & Performance Tuning
Why this matters (a quick story)
Your product team just shipped a new feed. Traffic spiked, logs exploded, dashboards crawl. OLTP stores wheeze; your warehouse is fast-ish but costs a kidney for real-time. ClickHouse can slice billions in milliseconds—if you design tables, ingestion, and queries the ClickHouse way. This guide gives you the exact mental model and recipes you’ll use in production.
Core concepts you must internalize
1) Columnar + parts/merges (MergeTree family)
ClickHouse stores columns separately and writes immutable “parts.” A background process merges parts for better compression and read speed. Most serious tables use the MergeTree family (Replacing/Summing/Aggregating/…MergeTree). (ClickHouse)
2) Sorting key vs primary key (they’re not the same)
Disk is ordered by the sorting key (ORDER BY). If you don’t explicitly define a primary key, it defaults to the sorting key; if you define both, the primary key must be a prefix of the sorting key. This impacts pruning and I/O. (ClickHouse)
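A minimal sketch of the prefix rule (example.user_events and its columns are hypothetical):
CREATE TABLE example.user_events
(
    user_id    UInt64,
    event_time DateTime,
    url        String
)
ENGINE = MergeTree
ORDER BY (user_id, event_time, url)  -- sorting key: physical order on disk
PRIMARY KEY (user_id, event_time);   -- must be a prefix; only this goes into the sparse in-memory index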
3) Partitioning is an ops tool, not a speed hack
PARTITION BY organizes parts into logical chunks (often by month/day). Use it for retention, movement, and maintenance (drop/move/cold storage)—not to accelerate individual queries. (ClickHouse)
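Retention and tiering then become whole-partition operations rather than row-level deletes; a sketch assuming the monthly-partitioned events_raw table from the blueprint below and a storage policy with a 'cold' volume:
ALTER TABLE analytics.events_raw DROP PARTITION 202401;                   -- drop one month instantly
ALTER TABLE analytics.events_raw MOVE PARTITION 202402 TO VOLUME 'cold';  -- push an older month to cheaper storage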
4) Distributed queries, replicated storage
- Distributed engine tables don’t hold data; they fan out queries to shards and parallelize reads. (ClickHouse)
- Use ReplicatedMergeTree for HA (sketch below). In production, coordinate with ClickHouse Keeper (the ZooKeeper replacement) and run Keeper on dedicated small hosts. (ClickHouse)
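A minimal replicated-table sketch; prod_cluster and the {shard}/{replica} macros are assumptions that must exist in each server’s configuration:
CREATE TABLE analytics.events_raw_local ON CLUSTER prod_cluster
(
    event_time DateTime64(3, 'UTC'),
    user_id    UInt64,
    event_type LowCardinality(String)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/analytics/events_raw', '{replica}')
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time);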
5) Materialized views (MVs) shift cost to writes
Incremental MVs run queries on inserted blocks and persist results for fast reads (think: rollups/denorm). They’re triggers, not query caches. Use judiciously. (ClickHouse)
6) Secondary (data-skipping) indexes
Skip indexes (minmax/bloom/token/gram) help prune blocks when the primary/sorting key can’t. They’re powerful but easy to misuse—test on real data. (ClickHouse)
7) TTL & storage policies
Use TTL to drop, roll up, or move data across hot/warm/cold volumes (e.g., local → S3) over time. This is how you manage cost at scale. (ClickHouse)
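TTL can also roll data up in place as it ages; a sketch with a hypothetical latency table (the TTL GROUP BY keys must be a prefix of the sorting key):
CREATE TABLE analytics.api_latency
(
    day        Date,
    endpoint   LowCardinality(String),
    ts         DateTime,
    latency_ms UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (endpoint, day, ts)
TTL day + INTERVAL 30 DAY
    GROUP BY endpoint, day
    SET latency_ms = max(latency_ms);  -- after 30 days, rows collapse (per part) into endpoint/day aggregates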
A small, realistic blueprint (DDL you can use)
Scenario: Clickstream with real-time dashboards + daily product analytics.
-- 1) Raw events (hot) → monthly partitions, read patterns partition-friendly
CREATE TABLE analytics.events_raw
(
event_time DateTime64(3, 'UTC'),
user_id UInt64,
session_id UUID,
event_type LowCardinality(String),
url String,
status UInt16,
props JSON
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time)
TTL event_time + INTERVAL 30 DAY TO VOLUME 'warm',
event_time + INTERVAL 90 DAY DELETE;
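-- NOTE: TO VOLUME 'warm' assumes the table's storage policy defines a 'warm' volume (see TTL & storage policies above).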
-- Data-skipping index for selective filters on status or event_type
ALTER TABLE analytics.events_raw
ADD INDEX idx_status status TYPE bloom_filter GRANULARITY 4;
-- Note: new parts get the index automatically; backfill existing parts with
-- ALTER TABLE analytics.events_raw MATERIALIZE INDEX idx_status;
-- 2) Incremental rollup via MV: per-minute counts by type
CREATE TABLE analytics.events_minute
(
ts_min DateTime('UTC'),
event_type LowCardinality(String),
cnt UInt64
)
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(ts_min)
ORDER BY (event_type, ts_min);
CREATE MATERIALIZED VIEW analytics.mv_events_minute
TO analytics.events_minute AS
SELECT
toStartOfMinute(event_time) AS ts_min,
event_type,
count() AS cnt
FROM analytics.events_raw
GROUP BY ts_min, event_type;
-- 3) Optional: sharded fanout reader (no storage here)
CREATE TABLE analytics.events_raw_dist AS analytics.events_raw
ENGINE = Distributed('prod_cluster', 'analytics', 'events_raw', user_id);
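A quick read-path check; SummingMergeTree collapses duplicate keys only at merge time, so always aggregate with sum() when reading the rollup:
SELECT ts_min, event_type, sum(cnt) AS events
FROM analytics.events_minute
WHERE ts_min >= now() - INTERVAL 1 HOUR
GROUP BY ts_min, event_type
ORDER BY ts_min, event_type;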
Why it works:
- ORDER BY (user_id, event_time) prunes by user fast and keeps time locality for merges.
- Partitions by month make retention trivial. (Ops > speed.) (ClickHouse)
- Bloom filter helps when filtering status or narrow event_type sets not covered by the sorting key. Test its granularity. (ClickHouse)
- MV shifts heavy group-bys to write time; reads stay snappy. (ClickHouse)
- Distributed table lets you scale read throughput horizontally. (ClickHouse)
Ingestion patterns that won’t bite you later
- Batch size matters most. Aim for 10k–100k rows per insert (or bigger) to reduce part churn and boost throughput; see the sketch after this list. (ClickHouse)
- Prefer append flows (Kafka → MV → MergeTree). Reserve mutations for fixes.
- If you hit “Too many parts,” slow insert cadence, increase batch size, or tweak merge thresholds—don’t just raise limits. (See “Too many parts” docs.) (ClickHouse)
- On object storage (S3), tune insert block size and parallelism for better throughput. (ClickHouse)
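A sketch of both levers, assuming the blueprint’s events_raw table: per-insert async_insert settings let the server buffer small client inserts, and system.parts shows whether part counts stay under control.
-- Server-side batching for many small producers (per-insert settings)
INSERT INTO analytics.events_raw (event_time, user_id, session_id, event_type, url, status)
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now64(3), 42, generateUUIDv4(), 'page_view', '/home', 200);

-- Active parts per partition: sustained growth here usually means batches are too small
SELECT partition, count() AS active_parts
FROM system.parts
WHERE database = 'analytics' AND table = 'events_raw' AND active
GROUP BY partition
ORDER BY active_parts DESC;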
Query design: fast patterns
- Filter early, everywhere. Push predicates into subqueries used in JOINs yourself; don’t rely on automatic pushdown (see the sketches after this list). (ClickHouse)
- Prefer denormalized reads or pre-joins via MVs for hot paths; JOINs are supported and fast, but still costlier. (ClickHouse)
- Use PREWHERE for selective columns, SAMPLE for approximate analytics when OK. (ClickHouse)
- Be careful with FINAL; it forces expensive merges on read.
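Two sketches against the blueprint’s events_raw table ('signup' is an illustrative event type): PREWHERE on a selective column, and predicates pushed into the JOIN subquery by hand.
-- PREWHERE: scan the cheap, selective column first; fetch the wide columns only for matching rows
SELECT count()
FROM analytics.events_raw
PREWHERE status >= 500
WHERE event_time >= now() - INTERVAL 1 DAY;

-- JOIN: filter inside the subquery yourself instead of trusting automatic pushdown
SELECT e.url, count() AS errors
FROM analytics.events_raw AS e
INNER JOIN
(
    SELECT DISTINCT user_id
    FROM analytics.events_raw
    WHERE event_time >= now() - INTERVAL 1 DAY
      AND event_type = 'signup'
) AS new_users USING (user_id)
WHERE e.event_time >= now() - INTERVAL 1 DAY
  AND e.status >= 500
GROUP BY e.url;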
Ops and scale
- Replication: ReplicatedMergeTree with Keeper; keep Keeper on dedicated nodes for resilience. (ClickHouse)
- Distributed DDL: ON CLUSTER to keep schemas in sync. (ClickHouse)
- Kafka ingestion: Use the Kafka table engine + MVs, or ClickPipes in Cloud for simpler ops (see the sketch after this list). (ClickHouse)
- Monitoring: Watch system.parts, system.mutations, and system.data_skipping_indices for health (health queries after this list). (ClickHouse)
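A sketch of the Kafka-engine pattern feeding the blueprint’s events_raw table; the broker, topic, and consumer-group names are placeholders, and columns omitted in the MV (session_id, props) fall back to their defaults:
CREATE TABLE analytics.events_kafka
(
    event_time DateTime64(3, 'UTC'),
    user_id    UInt64,
    event_type LowCardinality(String),
    url        String,
    status     UInt16
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'events',
         kafka_group_name  = 'clickhouse-events',
         kafka_format      = 'JSONEachRow';

-- The MV is the consumer: it drains the Kafka table and writes into MergeTree
CREATE MATERIALIZED VIEW analytics.mv_kafka_to_raw TO analytics.events_raw AS
SELECT event_time, user_id, event_type, url, status
FROM analytics.events_kafka;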
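And a few health queries against those system tables:
-- Many small active parts signal insert batches that are too small
SELECT database, table, count() AS active_parts, sum(rows) AS total_rows, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY active_parts DESC
LIMIT 10;

-- Stuck mutations are a common source of mystery background I/O
SELECT database, table, mutation_id, command, is_done, latest_fail_reason
FROM system.mutations
WHERE NOT is_done;

-- Which skip indexes exist, and how much space do they take?
SELECT database, table, name, type, formatReadableSize(data_compressed_bytes) AS size
FROM system.data_skipping_indices;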
Common pitfalls (and succinct fixes)
| Pitfall | Symptom | Fix |
|---|---|---|
| ORDER BY doesn’t match filters | Huge scanned bytes | Align sorting key with your common predicates; reconsider schema |
| Over-partitioning (by day/hour) | “Too many parts”, slow merges | Partition by month (usually) and batch inserts |
| MV everywhere | Fast reads, slow inserts | Only accelerate true hot queries; keep MV logic simple |
| Skip indexes as a hammer | No speedup or slower | Add only after PK is right; test bloom_filter/tokenbf/ngrambf on real data |
| Mutations like OLTP | Long-running background I/O | Prefer insert-only + Replacing/Summing engines; schedule batched corrections |
| Keeper co-located | Replica flaps during load | Move Keeper to dedicated small nodes |
Quick field checklist
- Keys: Is ORDER BY prefixing your filter columns?
- Partitions: Are you using them for lifecycle ops (drop/move), not speed? (ClickHouse)
- Inserts: Are batches ≥ 10k rows? (ClickHouse)
- MVs: Are you precomputing only what’s undeniably hot? (ClickHouse)
- Indexes: Added only when they meaningfully prune? (ClickHouse)
- Storage: Do TTLs move cold data off hot disks? (ClickHouse)
Bonus: vectors in ClickHouse (when you need “AI search” next quarter)
ClickHouse supports exact and approximate vector search (e.g., HNSW) with SQL, so you can combine ANN with classic filters/aggregations in one place—useful for recommendations or semantic search alongside analytics. (ClickHouse)
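A minimal exact-search sketch; search.docs, its embedding column, and the tiny 3-dimensional query vector are all hypothetical, and an approximate (HNSW) vector index on embedding would avoid the full scan:
CREATE TABLE search.docs
(
    id        UInt64,
    category  LowCardinality(String),
    body      String,
    embedding Array(Float32)
)
ENGINE = MergeTree
ORDER BY id;

-- ANN-style question with a classic filter: nearest documents within one category
SELECT id, body, cosineDistance(embedding, [0.12, 0.34, 0.56]) AS dist
FROM search.docs
WHERE category = 'support'
ORDER BY dist ASC
LIMIT 10;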
Internal link ideas (official docs only)
- MergeTree family overview → ClickHouse Docs (Table Engines › MergeTree) (ClickHouse)
- Choosing a primary (ordering) key → Best Practices (ClickHouse)
- Distributed engine → Table Engines › Special/Distributed (ClickHouse)
- Replication & Keeper → Replicated* engines + Keeper guide (ClickHouse)
- Materialized views (incremental vs refreshable) → Docs (ClickHouse)
- Data-skipping indexes → Optimize › Skipping Indexes + Examples (ClickHouse)
- Partitioning best practices → Best Practices + Managing data (ClickHouse)
- TTL & storage policies → Guides + ALTER TTL reference (ClickHouse)
- Inserts & “Too many parts” → Bulk inserts + Tips & Tricks (ClickHouse)
- Kafka ingestion & ClickPipes → Integrations (ClickHouse)
Summary
ClickHouse is a batch-write, read-heavy OLAP beast. You’ll get millisecond-level scans over billions of rows by: (1) picking the right ORDER BY and partitions, (2) batching inserts, (3) using MVs and skip indexes only where they pay rent, and (4) scaling correctly with ReplicatedMergeTree + Distributed. Treat it like an analytics engine, not an OLTP database.
Call to action: Take your hottest dashboard and implement the MV pattern above. Measure scanned bytes, latency, and cost before/after—you’ll know in an afternoon whether you’ve unlocked “ClickHouse speed.”
Image prompt (for DALL·E/Midjourney)
“A clean, modern architecture diagram of a sharded ClickHouse cluster: users → Kafka → MergeTree (hot) with MV to rollup table, ReplicatedMergeTree on each shard, Distributed query fan-out, Keeper quorum, and TTL moving data to cold object storage. Minimalistic, high contrast, isometric 3D.”
Tags
#ClickHouse #DataEngineering #OLAP #MergeTree #MaterializedViews #Sharding #Replication #Indexes #Kafka #PerformanceTuning
More articles to read:
- “ClickHouse Primary vs Sorting Key: How to Design ORDER BY for 10× Less I/O”
- “Avoiding the ‘Too Many Parts’ Trap: Insert Strategies that Scale on S3”
- “Materialized Views in ClickHouse: Incremental vs Refreshable (and When to Use Neither)”
- “Data-Skipping Indexes That Actually Help: Bloom, Token, and N-Gram Explained with Benchmarks”
- “Designing a Real-Time Analytics Stack: Kafka → ClickHouse with Exactly-Once Semantics”
- “ClickHouse Keeper in Production: Topologies, Failover, and Disaster Drills”
- “Vector Search in ClickHouse: HNSW + Metadata Filtering with Pure SQL”




