QuestDB Schema Design Playbook

QuestDB Schema Design Playbook: Symbols vs Strings, Indexes, Partitions, TTL, and Dedup Keys


Why this matters

Time-series systems live or die by schema choices. In QuestDB, five knobs determine performance and cost: SYMBOL vs STRING, indexes, partitioning, TTL, and dedup keys. Get these right and you’ll hit sub-second scans with lean storage. Get them wrong and you’ll fight heap pressure, slow filters, and noisy duplicates.


Symbols vs Strings (and when to use each)

What is a SYMBOL?
SYMBOL stores repeating strings as interned integers with a side dictionary. This boosts equality filters and reduces storage; you can also cache a symbol’s dictionary in heap or keep it off-heap (NOCACHE). You may set capacity to hint expected cardinality. (QuestDB)
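
To make the dictionary idea concrete, here is a minimal Python sketch of symbol-style interning. It is a conceptual model only, not QuestDB's storage code; the class name `SymbolColumn` and its methods are hypothetical and exist purely for illustration.

```python
class SymbolColumn:
    """Toy dictionary-encoded column: each distinct string is stored once."""

    def __init__(self):
        self.dictionary = {}  # string -> integer code
        self.values = []      # integer code -> string (reverse lookup)
        self.codes = []       # one small integer per row

    def append(self, text):
        code = self.dictionary.get(text)
        if code is None:
            # First time we see this value: assign the next free code.
            code = len(self.values)
            self.dictionary[text] = code
            self.values.append(text)
        self.codes.append(code)

    def rows_equal_to(self, text):
        """Equality filter: resolve the string once, then compare integers."""
        code = self.dictionary.get(text)
        if code is None:
            return []
        return [row for row, c in enumerate(self.codes) if c == code]


col = SymbolColumn()
for sym in ["AAPL", "MSFT", "AAPL", "GOOG", "AAPL"]:
    col.append(sym)

print(col.codes)                  # [0, 1, 0, 2, 0]
print(col.rows_equal_to("AAPL"))  # [0, 2, 4]
```

The per-row data is a list of small integers, and the strings live once in the dictionary. That dictionary is the part that grows with cardinality, which is why the cache and capacity settings matter for high-cardinality columns.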

When to choose which

Use case | Pick | Why
Device IDs, ticker symbols, statuses, enums | SYMBOL | Fast equality (=) filters/joins, compact storage
Free text, long messages, high-entropy payloads | STRING | No dictionary overhead; full text retained
Very high cardinality labels (hundreds of thousands+) | SYMBOL NOCACHE or STRING | Avoid heap blow-ups; off-heap map or raw text (QuestDB)

Helpful syntax

-- Symbol with capacity hint and no heap cache
sym SYMBOL CAPACITY 100000 NOCACHE

Checklist — Symbols vs Strings

  • Will I filter/join on this text column? → SYMBOL.
  • Is the value space bounded or slowly growing? → SYMBOL with capacity.
  • Is cardinality huge (≥100k) and heap tight? → NOCACHE. (QuestDB)
  • Is it long free text or never filtered? → STRING.

Gotchas

  • Underestimate symbol capacity → dictionary resizes + perf dips; grossly overestimate → wasted memory/disk. Tune, don’t guess. (QuestDB)
  • Only SYMBOL columns can be indexed in QuestDB (not STRING). (QuestDB)

Index selection (and capacity)

QuestDB indexes are inverted lists on SYMBOL columns. They speed equality filters at the cost of extra writes and some storage. You can add an index during CREATE TABLE or later via ALTER TABLE ... ADD INDEX. (QuestDB)

Trade-offs you must accept

  • Writes: each row also updates the index.
  • Space: stores row ID lists by symbol value. (QuestDB)
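
The trade-off above can be sketched in a few lines of Python. This is a toy inverted index, assuming an in-memory list stands in for the column; `build_index` is a hypothetical helper and does not mirror QuestDB's on-disk layout.

```python
from collections import defaultdict


def build_index(symbols):
    """Map each symbol value to the list of row IDs where it appears."""
    index = defaultdict(list)
    for row_id, value in enumerate(symbols):
        index[value].append(row_id)  # the extra work paid on every write
    return index


sym_column = ["AAPL", "MSFT", "AAPL", "GOOG", "MSFT", "AAPL"]
index = build_index(sym_column)

# Without an index: look at every row.
scan_hits = [i for i, v in enumerate(sym_column) if v == "MSFT"]

# With an index: jump straight to the stored row IDs.
index_hits = index.get("MSFT", [])

print(scan_hits, index_hits)
```

Both lookups return the same rows; the index simply trades extra writes and storage for not having to touch the other rows at read time.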

Index capacity
There’s an “index capacity” setting that governs how many row IDs go in a block. Defaults are sane—don’t tweak unless you know why; oversizing wastes disk, undersizing adds block hops. (QuestDB)

Helpful syntax

-- Create with index (preferred for hot filters)
CREATE TABLE ticks (
  ts TIMESTAMP, sym SYMBOL INDEX, px DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY;

-- Add index later
ALTER TABLE ticks ALTER COLUMN sym ADD INDEX;

(QuestDB)

Checklist — Indexes

  • Do dashboards filter WHERE sym = ? or IN (...)? → Index it.
  • Is the column STRING? → convert to SYMBOL first if you need an index.
  • Massive write rates and rarely filtered? → skip the index.

Gotchas

  • Indexing a label users rarely filter wastes CPU/disk.
  • You can add an index later, but changing some low-level capacities post-creation may be constrained—plan ahead. (QuestDB)

Partition sizing (HOUR, DAY, WEEK, MONTH, YEAR)

Partitioning is available only when you have a designated timestamp. Valid intervals: NONE, YEAR, MONTH, WEEK, DAY, HOUR. In practice, most production tables pick DAY. (QuestDB)

How to choose the partition

  • HOUR: sub-hour TTL or very spiky writes; many files.
  • DAY: default sweet spot—good pruning, manageable files.
  • WEEK/MONTH: fewer files; good for slow data and long TTLs.
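
As a rough aid for the "many files vs. fewer files" trade-off, the following Python sketch estimates how many partitions a retention window implies. It is back-of-the-envelope arithmetic only; `partitions_retained` is a hypothetical helper, and MONTH and YEAR are left out because their length varies.

```python
# Fixed-length partition sizes, in hours. MONTH/YEAR are omitted on purpose
# because calendar months and years do not have a constant length.
HOURS_PER_PARTITION = {"HOUR": 1, "DAY": 24, "WEEK": 24 * 7}


def partitions_retained(retention_days, partition_by):
    """Approximate number of partitions needed to cover the retention window."""
    hours = retention_days * 24
    size = HOURS_PER_PARTITION[partition_by]
    return -(-hours // size)  # ceiling division


for unit in ("HOUR", "DAY", "WEEK"):
    print(unit, partitions_retained(30, unit))
```

For a 30-day window this gives 720 hourly, 30 daily, or 5 weekly partitions, which is one way to see why DAY is a comfortable middle ground for most tables.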

Helpful syntax

CREATE TABLE ticks (
  ts TIMESTAMP, sym SYMBOL, px DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY;

(QuestDB)

Checklist — Partitions

  • Do queries filter recent ranges (WHERE ts > dateadd('d', -7, now()))? → DAY.
  • Need TTL like “keep 36 hours”? → HOUR (TTL must align; see next).
  • Mostly batch analytics over months? → MONTH.

Gotchas

  • You cannot add partitioning later without copying; choose carefully up front. (Partitions are defined at create time.) (QuestDB)

TTL strategies (automatic retention)

QuestDB can enforce Time To Live (TTL): it drops entire partitions whose time window is older than the TTL. Because drops are partition-granular, TTL must be a whole multiple of the partition size (e.g., DAY table → TTL in days/weeks/months). (QuestDB)

The engine considers the partition’s time window, not each row’s latest timestamp—so a partition is removed only after the entire window is past the TTL horizon. (QuestDB)
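
The two rules above (alignment and whole-window expiry) can be modeled with a short Python sketch. This is an illustration of the rules as described here, not QuestDB's implementation; the function names and the `reference_time` parameter are assumptions introduced for the example.

```python
from datetime import datetime, timedelta

# Fixed-length partition sizes, in hours (MONTH/YEAR omitted: variable length).
PARTITION_HOURS = {"HOUR": 1, "DAY": 24, "WEEK": 168}


def ttl_is_aligned(ttl_hours, partition_by):
    """True when the TTL is a whole multiple of the partition size."""
    return ttl_hours > 0 and ttl_hours % PARTITION_HOURS[partition_by] == 0


def partition_expired(partition_start, partition_by, ttl_hours, reference_time):
    """A partition expires only once its whole window is behind the horizon."""
    partition_end = partition_start + timedelta(hours=PARTITION_HOURS[partition_by])
    horizon = reference_time - timedelta(hours=ttl_hours)
    return partition_end <= horizon


print(ttl_is_aligned(36, "DAY"))   # False: 36 hours is 1.5 daily partitions
print(ttl_is_aligned(36, "HOUR"))  # True
print(ttl_is_aligned(48, "DAY"))   # True

jan1 = datetime(2024, 1, 1)
# Rows from early Jan 1 are already older than 48 hours at this point,
# but the Jan 1 partition window has not fully passed the horizon yet.
print(partition_expired(jan1, "DAY", 48, datetime(2024, 1, 3, 12)))  # False
print(partition_expired(jan1, "DAY", 48, datetime(2024, 1, 4)))      # True
```

The practical consequence is that data can outlive the nominal TTL by up to one partition width, so smaller partitions give tighter retention.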

Helpful syntax

-- Set TTL on an existing table
ALTER TABLE ticks SET TTL 14 DAYS;

-- Table partitioned by HOUR: TTL can be any whole number of hours
ALTER TABLE fine_ticks SET TTL 36 HOURS;

(QuestDB)

Checklist — TTL

  • Pick TTL first, then choose a matching partition.
  • Retain a safety margin for backfills and late events.
  • Automate: treat TTL as code (migration scripts).

Gotchas

  • TTL doesn’t delete partial windows; if you need finer trimming, pick smaller partitions. (QuestDB)

Deduplication keys (UPSERT KEYS) for ingest

QuestDB supports storage-level dedup on WAL tables. You enable it and define UPSERT KEYS—the columns that identify a duplicate. The designated timestamp must be included in the key list. (QuestDB)

Helpful syntax

CREATE TABLE prices (
  ts TIMESTAMP,
  sym SYMBOL,
  px  DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY WAL
  DEDUP UPSERT KEYS (ts, sym);

-- Enable later
ALTER TABLE prices DEDUP ENABLE UPSERT KEYS (ts, sym);

(QuestDB)

Checklist — Dedup keys

  • Define the business notion of uniqueness (e.g., one price per sym per ts).
  • Always include the timestamp in UPSERT KEYS.
  • Keep keys small (int/symbol preferred) to minimize lookup cost.

Gotchas

  • Dedup only affects new inserts after enabling; old dupes remain unless you reprocess. (QuestDB)
  • Keys that are too coarse (e.g., omitting sym) treat distinct rows as duplicates, so later rows overwrite good data.
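
To see why key choice matters, here is a small Python model of upsert-style dedup. It assumes a plain dict keyed by the UPSERT KEYS columns, with a later row replacing an earlier one that has the same key; `upsert` is a hypothetical helper, not a QuestDB API.

```python
def upsert(rows, key_columns):
    """Keep one row per key; a later row with the same key replaces the earlier one."""
    table = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        table[key] = row
    return list(table.values())


rows = [
    {"ts": 1, "sym": "AAPL", "px": 100.0},
    {"ts": 1, "sym": "MSFT", "px": 200.0},
    {"ts": 1, "sym": "AAPL", "px": 100.0},  # exact resend of the first row
]

good = upsert(rows, ["ts", "sym"])  # resend collapses, both symbols survive
coarse = upsert(rows, ["ts"])       # key too coarse: MSFT is overwritten

print(len(good), len(coarse))
```

With `(ts, sym)` the resend is absorbed and two rows remain. With `ts` alone, only one row survives and the MSFT price is lost, which is the failure mode the last gotcha warns about.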

A working template (putting it all together)

CREATE TABLE metrics (
  ts       TIMESTAMP,
  service  SYMBOL CAPACITY 2000,    -- enum-like; filtered often
  host     SYMBOL NOCACHE,          -- high cardinality; avoid heap
  region   SYMBOL,                  -- low cardinality
  status   SYMBOL,                  -- "ok|warn|crit"
  value    DOUBLE
) TIMESTAMP(ts)
  PARTITION BY DAY
  WAL
  DEDUP UPSERT KEYS (ts, host, service);

-- Index the hot filters
ALTER TABLE metrics ALTER COLUMN service ADD INDEX;
ALTER TABLE metrics ALTER COLUMN region  ADD INDEX;

-- Retention: keep last 30 days
ALTER TABLE metrics SET TTL 30 DAYS;

Field checklists (one-pager)

Symbols vs Strings

  • Will you filter/join on it? → SYMBOL
  • Many distinct values and low heap? → SYMBOL NOCACHE
  • Long free text, no filters? → STRING (QuestDB)

Indexes

  • Index only SYMBOLs used in WHERE/IN/JOIN
  • Accept write overhead; measure before/after
  • Leave index capacity at default unless you’re an expert (QuestDB)

Partitions

  • Choose by TTL + query window: HOUR/DAY/MONTH
  • Partitions defined at create time—decide early (QuestDB)

TTL

  • Must be a whole multiple of partition size
  • Drops whole partitions; pick smaller partitions for finer expiry (QuestDB)

Dedup

  • WAL table only; include timestamp in UPSERT KEYS
  • Enable early; old dupes persist (QuestDB)

Common pitfalls (and fixes)

  • Heap spikes on SYMBOL caches → keep CACHE only for hot, bounded labels; move high-cardinality labels to NOCACHE. (QuestDB)
  • Indexing everything → index only columns used in equality filters; otherwise let partitions do the work. (QuestDB)
  • Misaligned TTL → “30 hours” on DAY partitions won’t work; use HOUR partitions or a 2-day TTL. (QuestDB)
  • Dedup keys missing ts → the designated timestamp must be part of UPSERT KEYS, so always include ts alongside the columns that define uniqueness. (QuestDB)

Summary & call-to-action

Design for your query shapes and retention first; the rest follows. Use SYMBOL (carefully cached) for filterable labels, indexes only where they pay for themselves, DAY partitions for most workloads, TTL aligned to partition size, and UPSERT KEYS that match real-world uniqueness. Start with the template above, benchmark, then iterate.

Next step: create a test table with your real labels, load 24h of traffic, and benchmark “p95 filter + aggregate” with and without indexes. Keep what wins.
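
A minimal Python harness for that benchmark might look like the sketch below. It assumes you supply your own `run_query` callable (for example, a function that sends your filter + aggregate query through whichever client you already use); `p95_latency_ms` is a hypothetical helper name.

```python
import statistics
import time


def p95_latency_ms(run_query, repeats=20):
    """Run the callable `repeats` times and return the 95th-percentile latency in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples, n=20)[-1]


# Stand-in workload so the sketch runs on its own; replace with a real query call.
def fake_query():
    return sum(range(10_000))


result = p95_latency_ms(fake_query)
print(f"p95: {result:.3f} ms")
```

Run it once against the table without indexes and once with them, using the same query and data, and compare the two numbers rather than trusting a single run.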




