PostgreSQL Partitioning in Practice: Monthly, Hash, and Hybrid Patterns

Blueprints + automation snippets

Why this matters (a quick hook)

Your events table just crossed 2B rows. Nightly ETL crawls. Queries that used to fly now slog. Index rebuilds take hours. You don’t need a bigger box—you need pruning. PostgreSQL’s native partitioning (PG ≥10, with major improvements in 11–16) lets the planner skip whole chunks of data. Done right, partitioning turns “full table” work into a set of small, cache-friendly scans you can maintain independently.


What is partitioning (and what it buys you)

Partitioning splits one logical table into many physical child tables. The planner prunes partitions that can’t match your predicate. Benefits:

  • Faster queries via partition pruning (e.g., WHERE ts >= '2025-11-01' AND ts < '2025-12-01' → only the November partition).
  • Smaller, targeted maintenance: VACUUM/ANALYZE, REINDEX, CLUSTER, or pg_repack per partition.
  • Cheap retention: DROP TABLE events_2023_07 instead of DELETE + VACUUM.
  • Better parallelism and autovacuum isolation.

Trade-offs:

  • More objects and DDL to manage.
  • No global unique indexes; uniqueness must include the partition key.
  • Some features differ across versions (e.g., foreign-key support and planner behavior improved steadily from PG 11 through 16).
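The uniqueness limitation is worth seeing once. A sketch of what PostgreSQL accepts and rejects on a table partitioned by RANGE (ts):

```sql
-- Rejected: the index doesn't cover the partition key, so PostgreSQL
-- cannot enforce it globally (error to the effect of "unique constraint
-- on partitioned table must include all partitioning columns")
-- CREATE UNIQUE INDEX ON events (event_id);

-- Accepted: the key includes ts, so each partition enforces its own slice
CREATE UNIQUE INDEX ON events (event_id, ts);
```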

Choosing a pattern

  • Monthly (Range). Key: ts (timestamp/date). When it shines: time-series, logs, metrics with time filters. Pros: natural retention; excellent pruning; human-friendly names. Cons: the current partition can get "too hot"; cross-month queries touch multiple partitions.
  • Hash. Key: account_id (or similar). When it shines: uniform spread when queries aren't time-bound (e.g., user-centric workloads). Pros: balances write hotspots; simple shard-like spread. Cons: weaker pruning unless predicates include the hash key.
  • Hybrid (Range → Hash). Key: ts, then subpartition by account_id. When it shines: high-write-volume time-series with per-tenant/account skew. Pros: keeps time-based retention while reducing hot-partition contention. Cons: more objects; more complex DDL and ops.

Blueprint 1 — Monthly range partitioning (time-series workhorse)

Schema

-- Parent: metadata only
CREATE TABLE events (
  event_id      BIGINT GENERATED BY DEFAULT AS IDENTITY,
  ts            TIMESTAMPTZ NOT NULL,
  account_id    BIGINT NOT NULL,
  payload       JSONB,
  PRIMARY KEY (event_id, ts)  -- include ts to satisfy uniqueness across partitions
) PARTITION BY RANGE (ts);

-- One partition per month
CREATE TABLE events_2025_11 PARTITION OF events
FOR VALUES FROM ('2025-11-01') TO ('2025-12-01');

CREATE TABLE events_2025_12 PARTITION OF events
FOR VALUES FROM ('2025-12-01') TO ('2026-01-01');

-- Optional “catch-all” to prevent insert errors when you forget to create next month
CREATE TABLE events_default PARTITION OF events DEFAULT;

Local indexes (per partition)

-- Parent-level index declarations auto-create matching local indexes on every partition (PG ≥11)
CREATE INDEX ON events (ts);
CREATE INDEX ON events (account_id, ts DESC);
CREATE INDEX ON events USING GIN (payload jsonb_path_ops);

Query patterns that prune

-- Prunes to November partition only
SELECT count(*) FROM events
WHERE ts >= '2025-11-01' AND ts < '2025-12-01';

-- Also prunes: date_trunc(..., now()) is stable, not immutable, so the
-- planner applies runtime pruning at execution time (PG ≥11)
SELECT * FROM events
WHERE ts >= date_trunc('month', now())
  AND ts <  (date_trunc('month', now()) + interval '1 month');
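Pruning is easy to verify directly; run EXPLAIN and check which child tables appear in the plan. With the schema above, only events_2025_11 should show up:

```sql
-- Only the November partition should appear among the scanned relations
EXPLAIN (COSTS OFF)
SELECT count(*) FROM events
WHERE ts >= '2025-11-01' AND ts < '2025-12-01';
```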

Automation: create next month, drop old

-- Create the next monthly partition (idempotent-ish)
DO $$
DECLARE
  start_ts date := (date_trunc('month', now()) + interval '1 month')::date;
  end_ts   date := (start_ts + interval '1 month')::date;
  part_name text := format('events_%s', to_char(start_ts, 'YYYY_MM'));
BEGIN
  IF NOT EXISTS (
    SELECT 1 FROM pg_class WHERE relname = part_name
  ) THEN
    EXECUTE format($f$
      CREATE TABLE %I PARTITION OF events
      FOR VALUES FROM (%L) TO (%L)$f$, part_name, start_ts, end_ts);
  END IF;
END$$;

-- Retention: keep 13 months
DO $$
DECLARE
  cutoff date := (date_trunc('month', now()) - interval '13 months')::date;
  r record;
BEGIN
  -- Find monthly partitions by naming convention and drop the old ones
  FOR r IN SELECT c.relname
           FROM pg_inherits i JOIN pg_class c ON c.oid = i.inhrelid
           WHERE i.inhparent = 'events'::regclass
             AND c.relname ~ '^events_\d{4}_\d{2}$' LOOP
    IF to_date(substring(r.relname from '\d{4}_\d{2}'), 'YYYY_MM') < cutoff THEN
      EXECUTE format('DROP TABLE %I;', r.relname);
    END IF;
  END LOOP;
END$$;

Tip: use pg_cron or your scheduler to run these DO blocks monthly.
Tip: always pre-create the next partition before month-end traffic.
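If you keep a DEFAULT partition, it pays to watch it: rows landing there mean a monthly partition was missed. A minimal health check you could wire into monitoring (thresholds and alerting are up to you):

```sql
-- Non-zero means inserts are leaking into the catch-all partition
SELECT count(*) AS stray_rows FROM events_default;
```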


Blueprint 2 — Hash partitioning (uniform writes; tenant scatter)

Schema

CREATE TABLE kv_items (
  key         BIGINT NOT NULL,
  ts          TIMESTAMPTZ NOT NULL,
  value       BYTEA,
  PRIMARY KEY (key, ts)
) PARTITION BY HASH (key);

-- 8-way hash
CREATE TABLE kv_items_p0 PARTITION OF kv_items FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE kv_items_p1 PARTITION OF kv_items FOR VALUES WITH (MODULUS 8, REMAINDER 1);
-- ... p2..p7
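Rather than hand-writing p2..p7, the remaining partitions can be generated in a loop; a sketch following the same naming convention:

```sql
-- Create the remaining hash partitions (remainders 2..7) dynamically
DO $$
BEGIN
  FOR i IN 2..7 LOOP
    EXECUTE format(
      'CREATE TABLE %I PARTITION OF kv_items FOR VALUES WITH (MODULUS 8, REMAINDER %s);',
      'kv_items_p' || i, i);
  END LOOP;
END$$;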

Notes

  • Pruning is effective if your predicate includes key (or equality join on it).
  • Great for smoothing write hotspots across partitions.
  • Retention is not as natural—consider TTL via DELETE … WHERE ts < … per partition + autovacuum tuning.
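The TTL approach from the last bullet can be kept bounded by sweeping one partition at a time; a sketch (the 90-day interval is illustrative):

```sql
-- Bounded delete on one hash partition, followed by an immediate vacuum
DELETE FROM kv_items_p0 WHERE ts < now() - interval '90 days';
VACUUM (ANALYZE) kv_items_p0;
```

Repeat per partition (p1..p7) from your scheduler so no single sweep holds locks or generates dead tuples across the whole table at once.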

Blueprint 3 — Hybrid: Range by month, subpartition by hash (PG ≥11)

Schema

CREATE TABLE metrics (
  account_id  BIGINT NOT NULL,
  ts          TIMESTAMPTZ NOT NULL,
  v           DOUBLE PRECISION NOT NULL,
  PRIMARY KEY (account_id, ts)
) PARTITION BY RANGE (ts);

-- Parent partitions by month
CREATE TABLE metrics_2025_11 PARTITION OF metrics
FOR VALUES FROM ('2025-11-01') TO ('2025-12-01')
PARTITION BY HASH (account_id);

-- Subpartitions per month (4-way hash)
CREATE TABLE metrics_2025_11_h0 PARTITION OF metrics_2025_11
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE metrics_2025_11_h1 PARTITION OF metrics_2025_11
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... h2, h3

-- Repeat for 2025_12 etc. (automate!)
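One way to automate that repetition: create the month and its subpartitions in a single DO block. A sketch, hard-coding December for clarity (in practice you would derive the dates as in Blueprint 1):

```sql
-- Create one monthly range partition plus its 4 hash subpartitions
DO $$
DECLARE
  start_ts date := date '2025-12-01';
  end_ts   date := date '2026-01-01';
  base     text := 'metrics_' || to_char(start_ts, 'YYYY_MM');
BEGIN
  EXECUTE format(
    'CREATE TABLE %I PARTITION OF metrics FOR VALUES FROM (%L) TO (%L) PARTITION BY HASH (account_id);',
    base, start_ts, end_ts);
  FOR i IN 0..3 LOOP
    EXECUTE format(
      'CREATE TABLE %I PARTITION OF %I FOR VALUES WITH (MODULUS 4, REMAINDER %s);',
      base || '_h' || i, base, i);
  END LOOP;
END$$;
```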

Why hybrid?

  • You still get time-based pruning and retention,
  • while distributing the hottest month across several child partitions to reduce contention and index bloat.

Operational best practices (hard-won lessons)

Keys & constraints

  • Put the partition key into every unique index/PK. Global unique indexes don’t exist.
  • Prefer identity over sequence if you need surrogate keys; but natural PKs that include the partition key are cleaner.

Indexing

  • Define indexes on the parent so they auto-create on new partitions.
  • Keep per-partition index count minimal; every index multiplies write cost.
  • For time-series, favor (ts) + (account_id, ts) and a selective GIN on JSONB if needed.

Row movement

  • Native partitioning routes inserts automatically—no triggers needed.
  • UPDATE that changes the partition key becomes delete+insert; avoid it.

Maintenance

  • Use per-partition VACUUM/ANALYZE for predictability.
  • Reindex only the hot partitions.
  • For big deletes on range layouts, prefer detaching and dropping the partition (DETACH PARTITION + DROP TABLE) over DELETE.

Planner & queries

  • Keep predicates sargable: avoid wrapping ts in non-immutable functions that block pruning.
  • Multi-month scans? Consider enable_partitionwise_aggregate and enable_partitionwise_join (both off by default; enable per session and measure).
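Both settings are ordinary GUCs, so they are easy to trial per session before committing them to postgresql.conf:

```sql
-- Off by default; flip them on, then compare EXPLAIN (ANALYZE) output
SET enable_partitionwise_aggregate = on;
SET enable_partitionwise_join = on;
```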

Foreign keys

  • FK to partitioned parents is supported in recent PG, but can be slower on massive child sets. Validate on quieter hours.

How many partitions?

  • Hundreds to low thousands are fine. Tens of thousands start to hurt planning.
  • Monthly is a sweet spot for most; go daily only if your partitions still exceed tens of GBs each.
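Partition count and size are both visible in the catalogs; a sketch for keeping an eye on them:

```sql
-- List a parent's partitions with their on-disk size
SELECT i.inhrelid::regclass AS partition,
       pg_size_pretty(pg_relation_size(i.inhrelid)) AS size
FROM pg_inherits i
WHERE i.inhparent = 'events'::regclass
ORDER BY i.inhrelid::regclass::text;
```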

Automation snippets (real-world glue)

1) Create next month & indexes with pg_cron

-- Enable extension once
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Run at 00:05 on the 25th monthly: pre-create next partition (adjust schedule)
-- Note: nested dollar quotes need distinct tags ($job$ vs $do$);
-- plain $$ inside $$ would not parse.
SELECT cron.schedule('create_next_events_partition',
  '5 0 25 * *',
$job$
DO $do$
DECLARE
  start_ts date := (date_trunc('month', now()) + interval '1 month')::date;
  end_ts   date := (start_ts + interval '1 month')::date;
  pname    text := format('events_%s', to_char(start_ts, 'YYYY_MM'));
BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_class WHERE relname = pname) THEN
    EXECUTE format('CREATE TABLE %I PARTITION OF events FOR VALUES FROM (%L) TO (%L);',
                   pname, start_ts, end_ts);
    -- Indexes declared on parent are auto-created; add extras here if needed
  END IF;
END$do$;
$job$);

2) Retention: drop partitions older than N months

SELECT cron.schedule('events_retention_drop',
  '10 0 1 * *',  -- monthly
$job$
DO $do$
DECLARE
  cutoff date := (date_trunc('month', now()) - interval '13 months')::date;
  r record;
BEGIN
  FOR r IN
    SELECT c.relname
    FROM   pg_inherits i
    JOIN   pg_class   c ON c.oid = i.inhrelid
    WHERE  i.inhparent = 'events'::regclass
    AND    c.relname ~ '^events_\d{4}_\d{2}$'
  LOOP
    -- Derive the partition's month from its name (Blueprint 1's naming convention)
    IF to_date(substring(r.relname from '\d{4}_\d{2}'), 'YYYY_MM') < cutoff THEN
      EXECUTE format('DROP TABLE %I;', r.relname);
    END IF;
  END LOOP;
END$do$;
$job$);

3) Hash fanout change (8→16) with ATTACH/DETACH

You can re-shard by detaching each old modulus-8 partition, attaching the corresponding pair of modulus-16 partitions, and re-inserting the detached rows so they route into the new slices, all under a maintenance window. Keep it rare: fanout churn is operationally expensive.
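A sketch of one remainder group's re-shard (kv_items_p0a/p0b are hypothetical names; this relies on PostgreSQL allowing mixed hash moduli when the smaller modulus divides the larger one):

```sql
-- Split remainder 0 of MODULUS 8 into remainders 0 and 8 of MODULUS 16
BEGIN;
ALTER TABLE kv_items DETACH PARTITION kv_items_p0;
CREATE TABLE kv_items_p0a PARTITION OF kv_items FOR VALUES WITH (MODULUS 16, REMAINDER 0);
CREATE TABLE kv_items_p0b PARTITION OF kv_items FOR VALUES WITH (MODULUS 16, REMAINDER 8);
-- Rows re-route through the parent into the two new slices
INSERT INTO kv_items SELECT * FROM kv_items_p0;
DROP TABLE kv_items_p0;
COMMIT;
```

Repeat for remainders 1 through 7 to finish the 8→16 fanout.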


Migrating an existing big table (zero-ish downtime plan)

  1. Create a new partitioned table with the desired layout (Range/Hash/Hybrid).
  2. Backfill in chunks (INSERT … SELECT … WHERE ts BETWEEN …) oldest to newest.
  3. Dual-write via application or logical replication (e.g., pglogical) during cutover window.
  4. Swap: rename old → _old, new → original name; update sequences/synonyms.
  5. Keep _old read-only for a week, then archive/drop.
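Step 2 might look like this per chunk (events_new and events_old are assumed names, matching the swap in steps 1 and 4):

```sql
-- One backfill chunk: a single month, oldest first
INSERT INTO events_new
SELECT * FROM events_old
WHERE ts >= '2023-01-01' AND ts < '2023-02-01';

-- Keep planner statistics fresh as the new table fills
ANALYZE events_new;
```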

Be honest: true zero-downtime needs app cooperation and rehearsal. Don’t wing it on prod Friday night.


Common pitfalls (and how to dodge them)

  • Forgetting next month’s partition → inserts hit DEFAULT and bloat it. Pre-create on a schedule.
  • Unique index doesn’t include partition key → won’t be enforceable across partitions.
  • Too many tiny partitions → planner overhead > pruning win.
  • Functions around the key (e.g., WHERE date(ts) = …) → kills pruning. Use range predicates.
  • Reindexing everything “just in case” → wasteful. Target hot, large partitions.
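The predicate pitfall, concretely:

```sql
-- Blocks pruning: date(ts) hides the range from the planner
SELECT * FROM events WHERE date(ts) = date '2025-11-15';

-- Prunes: an explicit half-open range on the raw column
SELECT * FROM events
WHERE ts >= '2025-11-15' AND ts < '2025-11-16';
```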

Conclusion & takeaways

  • Start with Monthly Range for time-series.
  • Use Hash when your access is tenant/key oriented without strong time filters.
  • Combine them (Hybrid) for very hot current data with multi-tenant skew.
  • Automate creation and retention via pg_cron (or your scheduler).
  • Keep constraints/indexes partition-key aware to preserve correctness and pruning.

Next step: pick a pilot table, measure before/after with EXPLAIN (ANALYZE, BUFFERS) and pg_stat_statements, and iterate.
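For the before/after measurement, a bounded query with full instrumentation makes a reasonable baseline:

```sql
-- Compare plan shape and buffer counts before and after partitioning
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM events
WHERE ts >= date_trunc('month', now());
```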


Internal link ideas

  • PostgreSQL Indexing Strategy: B-Tree vs GIN for JSONB
  • Autovacuum & Vacuum Tuning for Large Tables
  • Designing Time-Series Schemas in Relational vs NoSQL
  • Query Hygiene: Writing Prune-Friendly Predicates

Image prompt

“A clean, modern data architecture diagram showing a PostgreSQL table partitioned by monthly range with hash subpartitions for the current month. Include arrows indicating partition pruning for time filters and a separate path for retention drops. Minimalistic, high contrast, isometric 3D, blue/gray palette.”

Tags

#PostgreSQL #Partitioning #DataEngineering #TimeSeries #Performance #Scalability #pgcron #SQL #DatabaseDesign #Ops
