Avoiding the “Too Many Parts” Trap: Insert Strategies that Scale on S3 (ClickHouse)
Meta description (159 chars):
Practical ClickHouse guide to fix “Too many parts” on S3: batch sizes, async inserts, backfill windows, and merge-safe settings for high-throughput ingestion.
The problem (and why you hit it on S3)
You kick off a big backfill from S3. Ten minutes later: DB::Exception: Too many parts (N). Merges are slower than inserts. Your cluster’s busy creating thousands of tiny parts per partition; background merges can’t keep up. Result: throttled inserts, rising storage ops, and dashboards slowing to a crawl. ClickHouse is fast—if you feed it fewer, bigger parts and let merges breathe. (ClickHouse)
Mental model: parts, merges, and the S3 twist
- Every INSERT creates a new part (per partition key). Background threads later merge small parts into larger ones; that’s how MergeTree stays efficient. (ClickHouse)
- On S3-backed storage, insert block size ⇒ initial part size. Bigger blocks mean fewer parts, fewer merges, and less I/O churn. (ClickHouse)
- The “Too many parts” error is gated by MergeTree settings:
  - `parts_to_delay_insert` (inserts are slowed above this)
  - `parts_to_throw_insert` (an exception is thrown above this) (ClickHouse)
Key: You don’t “fix” the error by just raising thresholds—you reduce part creation and let merges catch up. (ClickHouse)
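To watch this mechanic on a live table, `system.part_log` records one `NewPart` event per insert-created part and one `MergeParts` event per merge. A quick sketch, assuming the part log is enabled on your server:
-- How many parts were created vs. merged for this table today?
SELECT event_type, count() AS events, sum(rows) AS rows_affected
FROM system.part_log
WHERE table = 'events' AND event_date = today()
GROUP BY event_type
ORDER BY events DESC;
If `NewPart` events vastly outnumber `MergeParts` events, inserts are outrunning merges.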
The short list: what actually works
1) Batch like you mean it
ClickHouse has constant overhead per insert. Batch size is the single most important lever. Aim for 10k–100k rows per insert (more is fine if memory allows). (ClickHouse)
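In practice that just means accumulating rows client-side and shipping them in one statement. A minimal sketch against the `analytics.events` table defined later in this article (values are illustrative):
-- One INSERT carrying many rows beats thousands of single-row INSERTs.
INSERT INTO analytics.events (event_time, user_id, event_type, status, payload)
VALUES
    (now64(3), 1, 'page_view', 200, '{"source":"web"}'),
    (now64(3), 2, 'click',     200, '{"source":"web"}');
-- ...keep appending rows until the batch holds tens of thousands, then send.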
2) Use server-side batching with async inserts
When many clients send small payloads (logs/metrics), enable:
SET async_insert = 1, wait_for_async_insert = 1; -- durable batching
-- For fire-and-forget throughput (less strict durability):
-- SET wait_for_async_insert = 0;
Async inserts buffer on the server and coalesce small writes into bigger parts—exactly what you want under fan-in. (ClickHouse)
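The same settings can also ride along on each statement instead of the session, which is handy when many producers share a connection pool (a sketch; the row is illustrative):
-- Per-statement form: the server buffers this small write and flushes it into a larger part.
INSERT INTO analytics.events (event_time, user_id, event_type, status, payload)
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now64(3), 42, 'login', 200, '{}');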
3) Right-size insert blocks for S3
For INSERT … SELECT (e.g., from S3 table functions), tune block formation to emit fewer, bigger parts:
INSERT INTO analytics.events
SELECT *
FROM s3('https://bucket/logs/2025/11/*.parquet', 'AWS_KEY', 'AWS_SECRET')
SETTINGS
max_insert_threads = 8,
    min_insert_block_size_rows = 0,          -- 0 disables the row threshold; bytes drives block size
    min_insert_block_size_bytes = 256000000; -- ~256 MB target
Larger blocks produce larger initial parts; ClickHouse spends less time merging and more time serving queries. (Adjust sizes to your dataset and memory headroom.) (ClickHouse)
4) Windowed backfills by partition
Because one part is created per partition in the block, broad scans spanning many partitions explode part counts. Backfill per month/day (your PARTITION BY) and run jobs in parallel across partitions, not across the whole table. (ClickHouse)
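A sketch of that per-window loop, reusing the bucket path and block settings from the earlier example (the WHERE guard assumes `event_time` exists in the source files):
-- Load exactly one monthly partition per job; repeat with the next YYYYMM.
INSERT INTO analytics.events
SELECT *
FROM s3('https://bucket/logs/2025/11/*.parquet', 'AWS_KEY', 'AWS_SECRET')
WHERE toYYYYMM(event_time) = 202511
SETTINGS
    max_insert_threads = 8,
    min_insert_block_size_bytes = 256000000;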
5) Treat thresholds as safety rails, not a cure
For planned backfills you can temporarily relax:
ALTER TABLE analytics.events
MODIFY SETTING parts_to_delay_insert = 500, parts_to_throw_insert = 2000;
…but revert after the load. The real cure is bigger, fewer parts. (ClickHouse)
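When the load finishes, recent ClickHouse versions can restore the server defaults with RESET SETTING (a hedged sketch; alternatively, set the original values back explicitly):
-- Put the guard rails back once the planned backfill is done.
ALTER TABLE analytics.events
    RESET SETTING parts_to_delay_insert, parts_to_throw_insert;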
6) Don’t rely on OPTIMIZE … FINAL as a broom
It forces heavy merges and isn’t a general-purpose fix—reserve for admin cases, not routine ingestion. (ClickHouse)
A concrete pattern: S3 → ClickHouse, safely
Table design (example)
CREATE TABLE analytics.events
(
event_time DateTime64(3, 'UTC'),
user_id UInt64,
event_type LowCardinality(String),
status UInt16,
payload JSON
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time);
Why it helps:
- Monthly partitions make ops (drop/move/TTL) manageable and constrain backfill windows.
- Sorting by `(user_id, event_time)` aligns with common filters and keeps merges efficient.
Backfill playbook (repeatable)
- Partition windows. Load one `YYYYMM` at a time.
- Bigger blocks. Use `min_insert_block_size_*` / `max_insert_threads` to emit fat parts. (ClickHouse)
- Async inserts for many producers. (ClickHouse)
- Watch part churn as you go:
-- Parts per partition
SELECT partition, sum(rows) rows, count() parts
FROM system.parts
WHERE table = 'events' AND active
GROUP BY partition
ORDER BY partition; -- shrinking 'parts' = healthy
And for async behavior:
SELECT *
FROM system.asynchronous_insert_log
ORDER BY event_time DESC
LIMIT 50;
(Use system.part_log and system.parts to verify merge progress over time.) (ClickHouse)
S3 specifics: storage, cache, and read path
- Store MergeTree data on S3 via an S3 disk or policy (sketch after this list); you can also enable a local filesystem cache to cut repeated GET latency. (ClickHouse)
- ClickHouse reads/writes parts while the OS/page cache and optional uncompressed block cache keep hot data close; still, the best latency win is fewer, larger parts. (ClickHouse)
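Picking up the first bullet: pointing a MergeTree table at S3 is a one-line table setting once the policy exists. A sketch, assuming a storage policy named `s3_main` has already been declared in the server's storage configuration:
-- Same schema as the earlier example; data parts land on the S3-backed volume of 's3_main'.
CREATE TABLE analytics.events_s3
(
    event_time DateTime64(3, 'UTC'),
    user_id UInt64,
    event_type LowCardinality(String),
    status UInt16,
    payload JSON
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time)
SETTINGS storage_policy = 's3_main';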
Tuning checklist (copy/paste)
- Batching
  - Aim for 10k–100k rows per insert (or higher), or use `async_insert`. (ClickHouse)
- Block sizing for S3 loads
  - Tune `min_insert_block_size_rows`/`bytes` and `max_insert_threads`; verify memory headroom. (ClickHouse)
- Backfill strategy
  - Load by partition windows; avoid cross-month scans in one job. (ClickHouse)
- Thresholds
  - `parts_to_delay_insert`, `parts_to_throw_insert`: use sparingly for migrations. (ClickHouse)
- Ops visibility
  - Track `system.parts`, `system.part_log`, `system.asynchronous_insert_log`. (ClickHouse)
- Don'ts
  - Don't lean on `OPTIMIZE … FINAL` to "clean up" routine ingestion. (ClickHouse)
Troubleshooting: symptom → action
| Symptom | Likely cause | Fast action |
|---|---|---|
| `Too many parts (…)`; merges slower than inserts | Small inserts, wide partition span | Increase batch/block size; run by partition; consider async inserts. (ClickHouse) |
| Inserts throttled but not failing | Hit parts_to_delay_insert | Temporarily raise delay threshold during backfills; revert after. (ClickHouse) |
| Part count stable but not shrinking | Merges starved | Pause producers or window tighter; avoid OPTIMIZE FINAL as routine fix (see the query below). (ClickHouse) |
| Many tiny parts on S3 | Small insert blocks | Increase min_insert_block_size_bytes and threads (watch memory). (ClickHouse) |
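For the "merges starved" row, `system.merges` shows whether merge work is actually in flight; a quick sketch:
-- In-flight merges for the table; an empty result while parts pile up means merges are starved.
SELECT database, table, elapsed, round(progress, 2) AS progress, num_parts, result_part_name
FROM system.merges
WHERE table = 'events';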
Best practices & common pitfalls
Best practices
- Prefer server-side batching (`async_insert = 1`) when you can't batch in clients. (ClickHouse)
- Backfill with bounded time windows matching `PARTITION BY`. (ClickHouse)
- Set realistic batch targets and validate with `system.parts` during a dry run. (ClickHouse)
Pitfalls
- Cranking thresholds and calling it done—it hides symptoms and bites later. (ClickHouse)
- Loading a full year in one `INSERT … SELECT`: it explodes partitions/parts. (ClickHouse)
- Cleaning with `OPTIMIZE … FINAL` after every load: expensive and unnecessary. (ClickHouse)
Internal link ideas (official docs)
- Bulk inserts: why batch size dominates. (ClickHouse)
- Async inserts: mechanics & monitoring. (ClickHouse)
- S3 insert/read performance: block → part mechanics. (ClickHouse)
- MergeTree parts & system tables: `system.parts`, `part_log`. (ClickHouse)
- "Too many parts" thresholds: `parts_to_delay_insert`, `parts_to_throw_insert`. (ClickHouse)
- Avoid Optimize FINAL: when not to use it. (ClickHouse)
Summary & call to action
On S3, the fastest path out of “Too many parts” is simple: bigger blocks ⇒ bigger parts ⇒ fewer merges. Batch aggressively (or enable async inserts), backfill by partition windows, and treat part thresholds as guardrails—not duct tape. Measure part counts and merge progress as you load.
Try this today: pick one hot table, run a one-month backfill with the block settings above, and watch system.parts before/after. If parts/partition drop and insert latency flattens, you’re on the right track. (ClickHouse)
Image prompt
“A clean, modern diagram of ClickHouse on S3 showing client inserts and S3 INSERT … SELECT flows forming large in-memory blocks, creating fewer large parts per partition, background merges consolidating parts, and async inserts buffering small writes — minimalistic, high-contrast, isometric 3D.”
Tags
#ClickHouse #TooManyParts #BulkInserts #S3 #AsyncInserts #MergeTree #DataEngineering #Backfill #Performance #OLAP