Operating QuestDB in Production: Capacity Planning, Write Amplification, Storage Sizing, and HA with Enterprise Replication
Meta description (163 chars):
A pragmatic, mid-level guide to running QuestDB in production: capacity planning, write-amplification monitoring, storage sizing, TTL retention, and Enterprise HA.
Introduction — the 2 a.m. page you’d rather avoid
Your on-call rings: ingest is “green,” but dashboards are stale, disk climbs faster than finance approvals, and failover isn’t… failing over. This guide shows how to plan capacity, watch write amplification, right-size storage, enforce retention, and ship HA replication on QuestDB — in practical, copy-pasteable steps.
We’ll stick to official QuestDB guidance and call out gotchas before they bite you.
What “production-ready” means for QuestDB
- Provision the right IOPS/throughput, CPU, and RAM for your workload.
- Monitor the built-in Prometheus metrics and alert on early signals (WAL lag, write amplification, suspended tables). (QuestDB)
- Control growth with TTL and/or partition drops. (QuestDB)
- Design HA using Enterprise primary-replica replication and point-in-time recovery (PITR). (QuestDB)
Capacity planning (CPU, RAM, disk, filesystem)
Disk: IOPS & throughput first
For cloud block storage (e.g., AWS gp3), target at least ~7,000 IOPS and ~500 MB/s throughput; scale up to ~16,000 IOPS and ~1 GB/s for heavy loads. NVMe beats SATA when you control the hardware. (QuestDB)
Filesystem & compression
QuestDB supports several filesystems, but production compression requires ZFS, and NFS (or similar distributed FS) is not supported for the database directory. (NFS can be used later as an object store for replication — different path, different purpose.) (QuestDB)
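If you do run ZFS for compression, a minimal dataset setup looks roughly like the following; the pool and dataset names, device path, and lz4 choice are illustrative assumptions, not official defaults:
# hypothetical pool/dataset layout for the QuestDB data directory
zpool create -o ashift=12 qdbpool /dev/nvme1n1
zfs create -o compression=lz4 -o mountpoint=/var/lib/questdb qdbpool/questdb
# verify the achieved ratio once real data has landed
zfs get compressratio qdbpool/questdb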
RAM & page sizes
Start at 8 GB (small) and 32 GB (serious). Out-of-order (O3) ingestion across many columns can benefit from smaller O3 memory pages; tune cairo.o3.column.memory.size (128 KB–8 MB) and consider reducing writer page size for “wide-and-shallow” schemas. (QuestDB)
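As a sketch, a conservative starting point for an O3-heavy, many-column schema could look like this; 262144 bytes (256 KB) is an illustrative value inside the documented 128 KB–8 MB range, not a recommendation:
# server.conf
# smaller O3 column memory pages for wide, out-of-order ingestion (tune against your workload)
cairo.o3.column.memory.size=262144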
Write amplification: what it is and how to watch it
Definition (built-in metrics):
write_amplification = questdb_physically_written_rows_total / questdb_committed_rows_total
Higher values mean the engine had to rewrite more rows than it committed — common with O3 ingestion and large partitions. (QuestDB)
Monitoring endpoint:
Enable metrics and scrape /metrics on port 9003:
# server.conf
metrics.enabled=true
(Or QDB_METRICS_ENABLED=TRUE in Docker.) (QuestDB)
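On the Prometheus side, a minimal scrape job might look like this; the questdb:9003 target and 15s interval are assumptions for your environment:
# prometheus.yml
scrape_configs:
  - job_name: questdb
    scrape_interval: 15s
    static_configs:
      - targets: ["questdb:9003"]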
Key metrics to graph/alert:
- questdb_committed_rows_total and questdb_physically_written_rows_total (compute the ratio)
- WAL apply progress: questdb_wal_seq_txn vs. questdb_wal_writer_txn; a widening gap means ingestion can't keep up, or a table is suspended (a lag alert sketch follows the recording rule below). (QuestDB)
Prometheus rule (example):
groups:
  - name: questdb
    rules:
      - record: questdb:write_amp
        expr: rate(questdb_physically_written_rows_total[5m]) / rate(questdb_committed_rows_total[5m])
      - alert: HighWriteAmplification
        expr: questdb:write_amp > 2.5
        for: 15m
        labels: { severity: warning }
        annotations:
          summary: "QuestDB high write amplification (>2.5x)"
          description: "Likely out-of-order or oversized partitions. Investigate partitioning/O3."
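A companion WAL-lag alert can reuse the counters above; the 1,000-transaction threshold and 10m hold are illustrative assumptions to tune:
groups:
  - name: questdb-wal
    rules:
      - alert: WalApplyLagging
        expr: (questdb_wal_seq_txn - questdb_wal_writer_txn) > 1000
        for: 10m
        labels: { severity: warning }
        annotations:
          summary: "QuestDB WAL apply is falling behind"
          description: "Sequencer txn is well ahead of writer txn. Check disk throughput and suspended tables."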
(Metric names and /metrics configuration are from QuestDB’s official Prometheus guide.) (QuestDB)
How to reduce write amp fast:
- Use smaller time partitions when O3 is heavy (e.g., DAY → HOUR); see the sketch after this list.
- Since 7.2, QuestDB can split partitions during heavy O3 merges — still, right-sizing partitions up front helps. (QuestDB)
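A sketch for a new high-frequency table, assuming made-up table and column names; note that changing an existing table's partition unit generally means creating a new table and copying the data across:
-- hypothetical O3-heavy stream: partition by HOUR instead of DAY
CREATE TABLE trades_hf(
  ts TIMESTAMP, symb SYMBOL, px DOUBLE, qty LONG
) timestamp(ts)
PARTITION BY HOUR;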
Storage sizing that won’t explode your budget
Know your tiers (Enterprise)
QuestDB uses a row-based WAL write path and a columnar read path. With Enterprise, you can tier older partitions to Parquet (locally or to object storage) and keep them fully queryable — a big win for cost per TB. (QuestDB)
Back-of-napkin sizing formula (start here):
Raw/day ≈ rows_per_sec × row_bytes × 86,400
Hot tier ≈ Raw/day × hot_days × (compression_factor₁)
Cold tier ≈ Raw/day × cold_days × (compression_factor₂_parquet)
WAL headroom ≈ ingest_rate × average_TX_size × safety_factor
Then check real compression on your data and adjust. (Use ZFS compression for native format; Parquet uses its own.) (QuestDB)
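A worked example with made-up inputs (100,000 rows/s, ~40 bytes/row, 7 hot days, 90 cold days, treating the compression factors as divisors of 3× native and 6× Parquet) shows the shape of the math:
Raw/day   ≈ 100,000 × 40 B × 86,400 ≈ 345.6 GB
Hot tier  ≈ 345.6 GB × 7 ÷ 3        ≈ ~0.8 TB
Cold tier ≈ 345.6 GB × 90 ÷ 6       ≈ ~5.2 TB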
Operational guardrails:
- Watch open file limits when table/partition counts soar; raise ulimit / LimitNOFILE (a systemd sketch follows this list). (QuestDB)
- Keep symbol cardinality reasonable and sample queries to validate memory/cache behavior (see "Design for performance"). (QuestDB)
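For systemd-managed installs, a drop-in override is usually enough; the questdb.service unit name and the 1048576 limit are assumptions for your setup:
# /etc/systemd/system/questdb.service.d/limits.conf
[Service]
LimitNOFILE=1048576
# then: systemctl daemon-reload && systemctl restart questdb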
Retention: TTL vs. scheduled partition drops
Easiest: table TTL (built-in, automatic)
Set a Time-To-Live so expired partitions are dropped automatically during commits (or on restart if idle).
-- New table, 30-day TTL
CREATE TABLE ticks(
ts TIMESTAMP, symb SYMBOL, px DOUBLE
) timestamp(ts)
PARTITION BY DAY
TTL 30 DAYS;
-- Existing table
ALTER TABLE ticks SET TTL 30 DAYS;
Notes: TTL works on partitioned WAL tables and evaluates at partition granularity. (QuestDB)
Manual control: DROP PARTITION
For custom schedules or ad-hoc cleanup:
-- Drop partitions older than 90 days
ALTER TABLE ticks DROP PARTITION
WHERE ts < dateadd('d', -90, now());
Remember: dropping is destructive; the newest partition can’t be dropped. (QuestDB)
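One way to schedule it is to send the SQL to QuestDB's HTTP /exec endpoint from cron. This is a sketch; the cron user, host, port 9000, and the ticks table are assumptions for your environment:
# /etc/cron.d/questdb-retention (runs daily at 03:15)
15 3 * * * qdbops curl -sS -G --data-urlencode "query=ALTER TABLE ticks DROP PARTITION WHERE ts < dateadd('d', -90, now());" http://localhost:9000/exec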
High availability with QuestDB Enterprise replication
Architecture in one minute
Primary uploads WAL segments to an object store (S3, Azure Blob, NFS, GCS). Any number of replicas download and apply WAL continuously (hot) or later (cold), enabling read scaling and PITR. Replicas are eventually consistent and read-only. (QuestDB)
Minimal config (server.conf excerpts)
Primary:
replication.role=primary
replication.object.store=s3::bucket=<BUCKET>;root=<DB_INSTANCE_NAME>;region=<AWS_REGION>;
cairo.snapshot.instance.id=<PRIMARY_UUID>
Replica:
replication.role=replica
replication.object.store=s3::bucket=<BUCKET>;root=<DB_INSTANCE_NAME>;region=<AWS_REGION>;
cairo.snapshot.instance.id=<REPLICA_UUID>
Take a snapshot on the primary, restore it on the replica, then start the replica to catch up from WAL. (Object store strings differ for Azure/NFS/GCS; see docs.) (QuestDB)
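Sketched as a filesystem-level copy, the snapshot/restore step looks roughly like this. The exact statements depend on your QuestDB version (newer releases use CHECKPOINT CREATE/RELEASE, older ones SNAPSHOT PREPARE/COMPLETE), so treat the SQL below as an assumption and confirm against the snapshot docs:
-- on the primary
SNAPSHOT PREPARE;   -- or CHECKPOINT CREATE on newer versions
-- copy the database directory to the replica host (e.g., rsync) while the snapshot is prepared
SNAPSHOT COMPLETE;  -- or CHECKPOINT RELEASE
-- start the replica with replication.role=replica; it catches up from the WAL in the object store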
Hot vs. cold availability
- Hot: keep one or more replicas continuously applying WAL for quick cutover.
- Cold: rebuild a new primary from the latest snapshot + WAL when needed — cheaper, slower RTO. (QuestDB)
Operational alarms for HA
- Replication freshness: alert if WAL-apply lag grows (see WAL counters earlier). (QuestDB)
- Suspended tables: questdb_suspended_tables > 0 ⇒ investigate (an alert sketch follows this list). (QuestDB)
- Object store hygiene: apply lifecycle rules to expire old WAL blobs and snapshots to control costs. (QuestDB)
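Replication freshness can reuse the WAL-lag rule from the monitoring section; a suspended-table alert is a one-liner (the 5m hold and severity are assumptions):
groups:
  - name: questdb-ha
    rules:
      - alert: QuestDBTableSuspended
        expr: questdb_suspended_tables > 0
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "One or more QuestDB tables are suspended"
          description: "WAL apply has stopped for at least one table. Investigate and resume after fixing the cause."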
Runbook: the concise checklist
Plan
- Choose NVMe / gp3 with ≥7k IOPS & ≥500 MB/s to start. (QuestDB)
- Use ZFS (if you want compression) for the DB volume; don’t put the DB on NFS. (QuestDB)
- Partition by DAY (default) — go HOUR if O3 is heavy. (QuestDB)
Monitor
- Enable /metrics and scrape port 9003. Alert on write_amp > 2–3x, WAL lag, and suspended tables. (QuestDB)
Retain
- Prefer TTL n DAYS; fall back to scheduled DROP PARTITION for custom windows. (QuestDB)
Replicate (Enterprise)
- Configure primary + replicas via replication.role and a common replication.object.store.
- Decide hot (lower RTO) vs. cold (lower cost) availability.
- Test PITR regularly. (QuestDB)
Common pitfalls (and blunt fixes)
- Spiking write amplification because partitions are too coarse → shrink partition granularity; watch the ratio trend, not single points. (QuestDB)
- Metrics disabled → you're flying blind. Turn on metrics.enabled=true and ship a dashboard before launch. (QuestDB)
- Replica lag mistaken for data loss → check WAL counters; lag ≠ loss. Tune disks/threads and verify no suspended tables. (QuestDB)
- Using NFS for the DB → not supported; only use NFS (optionally) as the replication object store, not the database filesystem. (QuestDB)
Conclusion & takeaways
If you provision sane IOPS/throughput, watch write amplification, and let TTL clean up behind you, QuestDB stays fast and predictable. Pair it with Enterprise replication for either hot read scaling or cold but cheap recovery — and practice your cutover/PITR path until it’s boring.
Call to action: Pick one table, set TTL 30 DAYS, wire write_amp alerts, and configure a hot replica. Then measure again.
Internal link ideas (for your site)
- “QuestDB WAL & Out-of-Order Ingestion: A Field Guide”
- “Tuning Partition Sizes for O3-Heavy Streams”
- “Building a Grafana Dashboard for QuestDB’s Prometheus Metrics”
- “Designing Snapshots & Lifecycle Policies for QuestDB Replication”
- “From Parquet Tiering to Cheap Long-Term Storage in QuestDB Enterprise”
Image prompt (for DALL·E / Midjourney)
“A crisp, isometric architecture diagram of a production QuestDB deployment: NVMe storage, WAL write path, partitioned tables, Prometheus metrics, TTL cleanup, and Enterprise primary→object store→replicas. Minimalist, high contrast, engineering style.”
Tags
#QuestDB #TimeSeries #CapacityPlanning #WriteAmplification #DataRetention #TTL #HA #Replication #Prometheus #DataEngineering
References (official)
- Capacity planning (IOPS/throughput, filesystem, write amplification, partition splitting, OS limits). (QuestDB)
- Prometheus metrics & /metrics on 9003 (metric names and setup). (QuestDB)
- Monitoring guide (WAL lag counters, suspended tables). (QuestDB)
- Storage engine and Parquet tiering (Enterprise). (QuestDB)
- Replication concepts & operations (primary/replica, object store, snapshots, PITR). (QuestDB)
- Data retention via DROP PARTITION. (QuestDB)
- TTL (concept + SQL). (QuestDB)