ClickHouse Primary vs Sorting Key: How to Design ORDER BY for 10× Less I/O
Meta description (158 chars):
A practical guide to ClickHouse primary vs sorting keys—how ORDER BY works, when to narrow the primary key, and real patterns that slash scanned bytes.
The quick why (a relatable moment)
Your dashboard queries are “fast” in dev but scan gigabytes in prod. The culprit isn’t ClickHouse—it’s your key design. Get ORDER BY, primary key, and partitions right, and those same queries drop from seconds to tens of milliseconds.
Concepts first (clear mental model)
Columnar + parts + sparse index
ClickHouse writes immutable parts and keeps a sparse primary index over data sorted by the sorting key. The index points to granules (row ranges), not individual rows—so ClickHouse can jump to the right ranges and skip the rest. (ClickHouse)
Sorting key vs primary key (not the same thing)
- Sorting key: the physical on-disk order (
ORDER BY). - Primary key: the sparse index used for pruning.
If you don’t specifyPRIMARY KEY, it defaults to the sorting key. If you specify both, the primary key must be a prefix of the sorting key. (ClickHouse)
Why you sometimes narrow the primary key
A narrower primary key loads a smaller in-memory index while keeping a richer sort order for compression/merges. This trade-off is useful when filters mostly hit the leading columns. (ClickHouse)
You can change the sorting key (carefully)
ALTER TABLE … MODIFY ORDER BY changes the sorting key metadata; the primary key remains unchanged. It’s powerful but constrained—read the docs before you refactor. (ClickHouse)
Practical patterns (and when to use them)
Goal: align the leftmost key columns with your most selective, most common filters. Keep the rest for locality and compression.
| Use case | Recommended design | Why it works |
|---|---|---|
| User-centric analytics (drill by user, then time) | PRIMARY KEY (user_id, event_time); ORDER BY (user_id, event_time, event_type) | Prunes by user, maintains time locality; extra column helps compression & merges. (ClickHouse) |
| Time-first dashboards (global filters by time, optional narrow dims) | PRIMARY KEY (event_time); ORDER BY (event_time, dim_low_card, …) | Time slicing is king; keep low-card dimensions for locality without bloating the primary index. (ClickHouse) |
| Multi-tenant logs (isolate tenant, then time) | PRIMARY KEY (tenant_id, ts); ORDER BY (tenant_id, ts, level) | Tenant-first pruning prevents cross-tenant scans; sort keeps append-friendly writes. (ClickHouse) |
| High-cardinality search on a few fields | PRIMARY KEY (ts); ORDER BY (ts, …) + data-skipping indexes on the search fields | When filters can’t be on the left, add skipping indexes to prune blocks. (ClickHouse) |
Real DDL you can adapt
-- Events table: user-first queries with time windows
CREATE TABLE app.events
(
event_time DateTime64(3, 'UTC'),
user_id UInt64,
event_type LowCardinality(String),
url String,
status UInt16
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
PRIMARY KEY (user_id, event_time)
ORDER BY (user_id, event_time, event_type);
-- Variant: time-first dashboards, lighter in-memory index
CREATE TABLE app.events_time_first
(
event_time DateTime64(3, 'UTC'),
event_type LowCardinality(String),
user_id UInt64,
status UInt16
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
PRIMARY KEY (event_time) -- Narrower PK for a smaller sparse index
ORDER BY (event_time, event_type, user_id);
Notes:
- Rows are physically ordered by the sorting key; pruning uses the primary key. Defaults tie them together; explicit
PRIMARY KEYlets you narrow the index for memory efficiency. (ClickHouse) - Use monthly partitions mainly for lifecycle ops (drop/move/TTL), not for query speed. (ClickHouse)
A quick “why is my query slow?” checklist
- Leftmost mismatch: Your common filters aren’t on the left of the key → reorder keys.
- Over-wide primary key: You copied the whole
ORDER BYintoPRIMARY KEY→ consider narrowing. (ClickHouse) - Too many tiny parts: Fix ingestion batch size; key design can’t save write amplification. (See MergeTree parts behavior.) (ClickHouse)
- Hard-to-index filters: Add data-skipping indexes for columns not covered by the primary key. (ClickHouse)
- Key refactor needed: Read
MODIFY ORDER BYcaveats before changing sort order in prod. (ClickHouse)
Decision guide: choosing ORDER BY, PRIMARY KEY, and partitions
- Start with filters. Rank the top 3 predicates from real queries; put the most selective first in
ORDER BY. - Keep
PRIMARY KEYas a prefix. If memory is tight, narrow it to just the truly selective prefix. Defaults are OK if you’re unsure. (ClickHouse) - Partition for operations. Use
PARTITION BYfor retention and movement (e.g., monthly). Don’t over-partition to “make queries fast.” (ClickHouse)
Small, targeted examples
Example 1 — Narrow primary key for memory
-- Rich sort order for compression; narrow PK for smaller in-memory index
PRIMARY KEY (account_id)
ORDER BY (account_id, toDate(event_time), event_time, action)
This still sorts by all four columns, but the index only tracks (account_id) granules—often enough for pruning. (ClickHouse)
Example 2 — Adding skip indexes when filters don’t fit the key
ALTER TABLE app.events
ADD INDEX idx_status bloom_filter(status) GRANULARITY 4;
Lets ClickHouse skip blocks when the primary key can’t help. Test on real data. (ClickHouse)
Common pitfalls (and blunt fixes)
- Cargo-cult keys: Throwing every column into
ORDER BYbloats files and rarely helps. Keep it focused. (ClickHouse) - Primary ≠ OLTP uniqueness: Don’t treat ClickHouse primary keys like OLTP PKs; they’re for pruning, not row identity. (ClickHouse)
- Over-partitioning: Daily/hourly partitions cause part explosions. Prefer monthly unless you truly need finer ops. (ClickHouse)
- Unsafe refactors:
MODIFY ORDER BYisn’t a toy; understand constraints and test with copies. (ClickHouse)
Summary (what to remember)
- The sorting key (
ORDER BY) defines on-disk order; the primary key is the sparse index that prunes I/O. Defaults tie them together; you can narrow the primary key for memory. (ClickHouse) - Put your most selective, most common filters leftmost in
ORDER BY. - Use partitions for lifecycle, not speed. Add data-skipping indexes only when the primary key can’t carry you. (ClickHouse)
Call to action: Pull your slowest dashboard query. Create a staging table with the “narrow PK, rich ORDER BY” pattern, reload one day of data, and compare scanned bytes—you’ll know if you earned that 10×.
Internal link ideas (official docs)
- MergeTree engine overview (parts, merges, keys). (ClickHouse)
- Choosing a primary key (differences from OLTP). (ClickHouse)
- Sparse primary indexes guide (narrow PK pattern). (ClickHouse)
- Data-skipping indexes (when PK can’t help). (ClickHouse)
- Choosing a partitioning key (ops vs speed). (ClickHouse)
ALTER … MODIFY ORDER BY(changing sort keys safely). (ClickHouse)
Image prompt
“A crisp architecture diagram showing a ClickHouse MergeTree table: disk layout sorted by ORDER BY, sparse primary index over granules, queries pruning ranges, with an inset contrasting a narrow primary key vs a wider sorting key. Minimalistic, high contrast, isometric 3D.”
Tags
#ClickHouse #PrimaryKey #SortingKey #OrderBy #DataEngineering #OLAP #PerformanceTuning #MergeTree #Indexes #BestPractices













Leave a Reply