PostgreSQL at Scale for Data Engineers: Partitioning, Indexes, and Query Plans That Actually Work
Why this matters
You ship pipelines, not whitepapers. When a fact table jumps from 50M to 5B rows, “we’ll add an index later” turns into 2 a.m. firefighting. PostgreSQL can absolutely power serious analytics and mixed workloads—if you respect its planner, choose the right indexes, and partition with intent. This guide is the pragmatic, mid-level path from “it runs” to “it scales.”
PostgreSQL mental model (for builders)
Think of Postgres as:
- Row store with rich types, transactions, MVCC, and a very smart cost-based planner.
- Disk-first: I/O patterns, statistics, and visibility maps dominate performance.
- Modular: extensions (e.g., pg_stat_statements), FDWs, logical replication.
- Honest: if your stats or indexes are wrong, it will do exactly what you told it to, even if that's a table scan from hell.
Query planning you can trust (and verify)
If you don’t read plans, you’re guessing.
-- Always check actual timing, buffers, and parallelism decisions
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT user_id, sum(amount)
FROM payments
WHERE created_at >= now() - interval '7 days'
GROUP BY 1;
What to look for:
- Seq Scan vs Index Scan: did Postgres use the right path?
- Rows: estimated vs actual. If they differ by 10× or more, your stats need work (ANALYZE, default_statistics_target); see the sketch after this list.
- Buffers: how much I/O? If it's all shared reads, improve your cache hit ratio or narrow the scans.
- Parallel workers: did it go parallel? If not, is max_parallel_workers_per_gather too low, or is the plan not parallel-safe?
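When estimates are badly off, richer statistics are usually the fix. A minimal sketch against the payments table from the example above; the target of 500 is illustrative, not a recommendation:
ALTER TABLE payments ALTER COLUMN created_at SET STATISTICS 500;  -- finer-grained stats for the skewed column
ANALYZE payments;                                                  -- refresh the stats immediately
-- Re-run EXPLAIN (ANALYZE) and compare estimated vs actual rows again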
The index types that matter (and when)
| Index | Best for | Avoid when | Notes |
|---|---|---|---|
| B-tree (default) | Equality/range on sortable columns | Full-text, arrays, JSONB containment | The workhorse for most OLTP/OLAP filters and joins |
| Hash | Pure equality | Anything else | Rarely needed; B-tree often wins |
| GIN | Full-text (tsvector), arrays, JSONB keys | Heavy writes | Great for @>, ?, text search |
| GiST | Spatial, ranges, nearest-neighbor | High-churn tables | Used by PostGIS; flexible tree |
| BRIN | Massive, naturally ordered data (timestamps, IDs) | Small or randomly ordered tables | Tiny indexes; skip irrelevant block ranges |
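The tips below cover B-tree and GIN on JSONB; for the full-text row in the table, a GIN index over a tsvector expression is the usual pattern. A minimal sketch; the articles table and its body column are hypothetical:
-- Expression index: queries must repeat the same expression to use it
CREATE INDEX articles_fts_idx ON articles USING GIN (to_tsvector('english', body));
SELECT id
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'partition & pruning');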
Practical tips
- Composite indexes follow leftmost-prefix rules:
  CREATE INDEX ON events (tenant_id, created_at DESC);
  -- Helps: WHERE tenant_id = ? AND created_at > ?
  -- Won't help: WHERE created_at > ? alone
- Add covering columns to avoid heap lookups:
  CREATE INDEX ON orders (customer_id, status) INCLUDE (total_amount);
- For JSONB, index the paths you filter on:
  CREATE INDEX ON sessions USING GIN (payload jsonb_path_ops);
  -- Speeds up containment: WHERE payload @> '{"country":"US"}'
  -- Note: jsonb_path_ops doesn't support the ? (key-exists) operator; use the default jsonb_ops opclass for that.
Partitioning that pays for itself
Partition when one or more are true:
- Table > ~200–300 GB or > ~1–2 B rows.
- Predicates are time-bounded or tenant-bounded.
- You routinely drop old data.
Range (time) partitioning
CREATE TABLE fact_events (
event_ts timestamptz NOT NULL,
tenant_id int NOT NULL,
payload jsonb
) PARTITION BY RANGE (event_ts);
CREATE TABLE fact_events_2025_11 PARTITION OF fact_events
FOR VALUES FROM ('2025-11-01') TO ('2025-12-01');
-- Add next month ahead of time
CREATE TABLE fact_events_2025_12 PARTITION OF fact_events
FOR VALUES FROM ('2025-12-01') TO ('2026-01-01');
Why it works
- Pruning: planner touches only relevant partitions.
- Maintenance: VACUUM/ANALYZE stats stay accurate per partition.
- Lifecycle: drop old partitions fast (a metadata-only operation).
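To confirm pruning actually happens, read the plan: only the matching partition should appear. A quick sketch against fact_events as defined above:
-- The plan should scan only fact_events_2025_11
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM fact_events
WHERE event_ts >= '2025-11-10' AND event_ts < '2025-11-12';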
Gotchas
- Keep a default partition to catch stragglers (see the sketch below).
- Ensure the partition key appears in your common filters, or pruning won't trigger.
- Index each partition, or create the index once on the partitioned parent (PostgreSQL 11+) and let it cascade to every partition; Postgres has no true global indexes.
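A minimal sketch of both fixes, reusing the fact_events table from above:
-- Catch rows that fall outside every defined range
CREATE TABLE fact_events_default PARTITION OF fact_events DEFAULT;
-- Defined once on the parent, materialized on each partition (PostgreSQL 11+)
CREATE INDEX ON fact_events (tenant_id, event_ts);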
Ingestion that doesn’t melt disks
Use COPY, not a million inserts
COPY staging_payments FROM '/data/payments_2025_11.csv' CSV HEADER;
-- Or from app host:
\copy staging_payments FROM 'payments.csv' CSV HEADER
Upsert into the target (ON CONFLICT requires a unique constraint on id)
INSERT INTO payments (id, user_id, amount, created_at)
SELECT id, user_id, amount, created_at
FROM staging_payments
ON CONFLICT (id) DO UPDATE
SET amount = EXCLUDED.amount;
Throughput boosters
- Disable synchronous commit for bulk loads (SET LOCAL only takes effect inside a transaction; see the sketch after this list): SET LOCAL synchronous_commit = OFF;
- Pre-sort input by partition key to improve locality.
- Keep indexes minimal on hot write tables; add read-only indexes post-load or off-peak.
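Putting the first two boosters together: a minimal sketch that reuses the staging_payments table and CSV path from the COPY example above.
BEGIN;
SET LOCAL synchronous_commit = OFF;  -- reverts automatically at COMMIT
COPY staging_payments FROM '/data/payments_2025_11.csv' CSV HEADER;
COMMIT;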
Replication & HA you won’t regret
- Physical (streaming) replication: byte-level, great for HA and read-only replicas.
- Logical replication: table-level, schema-aware, supports selective replication and version-skewed upgrades.
-- On publisher
CREATE PUBLICATION pub_payments FOR TABLE payments;
-- On subscriber
CREATE SUBSCRIPTION sub_payments
CONNECTION 'host=primary dbname=app user=replicator password=***'
PUBLICATION pub_payments;
Use logical for: blue/green cutovers, feeding downstream services, multi-tenant splits.
Use physical for: simple HA, read scaling, disaster recovery.
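Whichever flavor you run, watch replication lag. A quick check from the primary for physical streaming replicas (logical subscriptions report via pg_stat_subscription on the subscriber):
-- Bytes of WAL each replica has yet to replay
SELECT client_addr, state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;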
Observability and tuning that move the needle
Always install these extensions
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION IF NOT EXISTS pg_trgm; -- for LIKE/ILIKE on text
Daily habits
- Track top normalized queries (pg_stat_statements).
- Pin a slow-query budget (e.g., p95 < 200 ms) and tune to it.
- Vacuum debt is real: monitor pg_stat_all_tables.vacuum_count, bloat estimates, and autovacuum lag.
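A starting query for the first habit; these column names are PostgreSQL 13+ (older releases call them total_time and mean_time):
-- Ten most expensive normalized queries by cumulative time
SELECT left(query, 60) AS query,
       calls,
       round(mean_exec_time::numeric, 1) AS mean_ms,
       shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;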
Sensible starting points (not dogma)
| Setting | Why it matters | Starting heuristic |
|---|---|---|
| work_mem | Sort/hash memory per plan node | 32–128 MB; size to avoid disk spills in typical plans |
| shared_buffers | Page cache | 25% of RAM (capped sensibly) |
| maintenance_work_mem | VACUUM/CREATE INDEX | 512 MB–2 GB during maintenance windows |
| effective_cache_size | Planner hint about available cache | ~50–75% of RAM |
| max_parallel_workers_per_gather | Parallelism | 2–4 for analytics nodes |
| autovacuum_vacuum_scale_factor | Bloat control | Lower for big tables (e.g., 0.05) |
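To apply a heuristic cluster-wide without hand-editing postgresql.conf, ALTER SYSTEM plus a reload works for these settings; the 64MB and 256MB values below are illustrative, not recommendations:
-- Persists to postgresql.auto.conf; work_mem needs only a reload, not a restart
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();
-- Or scope a bigger value to one session for a single heavy query
SET work_mem = '256MB';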
Brutal truth: if you never read EXPLAIN (ANALYZE, BUFFERS) and never tune autovacuum, your “scaling problem” is self-inflicted.
Real example: tenant-sharded + time-partitioned fact table
-- 1) Parent table
CREATE TABLE fact_pageviews (
ts timestamptz NOT NULL,
tenant_id int NOT NULL,
user_id bigint,
page text,
attrib jsonb,
PRIMARY KEY (tenant_id, ts, user_id)
) PARTITION BY RANGE (ts);
-- 2) Monthly partitions + per-partition indexes
-- (repeat per month; automate this; see the sketch below)
CREATE TABLE fact_pageviews_2025_11 PARTITION OF fact_pageviews
FOR VALUES FROM ('2025-11-01') TO ('2025-12-01');
CREATE INDEX ON fact_pageviews_2025_11 (tenant_id, ts DESC);
CREATE INDEX ON fact_pageviews_2025_11 USING BRIN (ts);
-- 3) Query pattern that prunes + uses index
EXPLAIN (ANALYZE, BUFFERS)
SELECT tenant_id, date_trunc('day', ts) AS d, count(*) AS views
FROM fact_pageviews
WHERE tenant_id = 42
AND ts >= '2025-11-15' AND ts < '2025-11-22'
GROUP BY 1, 2;
Why this scales
- Tenant filter hits the leading index column.
- Time filter prunes to one partition, and BRIN skips irrelevant block ranges within it.
- Daily rollups remain index-friendly without full table scans.
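The comment in step 2 says to automate partition creation; a minimal sketch that creates next month's partition for fact_pageviews ahead of time (run it from cron or your scheduler):
-- Create next month's partition if it doesn't exist yet
DO $$
DECLARE
  next_month date := (date_trunc('month', now()) + interval '1 month')::date;
  part_name  text := format('fact_pageviews_%s', to_char(next_month, 'YYYY_MM'));
BEGIN
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS %I PARTITION OF fact_pageviews
       FOR VALUES FROM (%L) TO (%L)',
    part_name, next_month, (next_month + interval '1 month')::date);
  -- Remember the per-partition indexes from step 2 (or define them on the parent instead)
END $$;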
Common pitfalls (seen in the wild)
- Singleton inserts for bulk loads → use COPY or batched inserts.
- JSONB everywhere → model hot filters as proper columns; index only what you query.
- “One giant index to rule them all” → composite indexes tuned to query shapes beat kitchen-sink indexes.
- No partition lifecycle → forgetting to create/drop partitions turns pruning into pain.
- Vacuum starvation → long-running transactions block cleanup and cause bloat; watch for sessions stuck in “idle in transaction” (see the query after this list).
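A quick way to spot those offenders, straight from pg_stat_activity:
-- Idle transactions, oldest first; these hold back vacuum cleanup
SELECT pid, usename,
       now() - xact_start AS xact_age,
       left(query, 80) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;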
Quick checklist for new tables
- Define primary access patterns (filters, joins, sort).
- Choose minimal composite indexes to satisfy them.
- Decide partitioning (usually monthly by time for event data).
- Wire EXPLAIN into CI for heavy queries.
- Install and review pg_stat_statements weekly.
- Automate partition creation and retention.
- Confirm autovacuum thresholds fit table churn.
Internal link ideas (official docs only)
- Table Partitioning (official docs)
- Indexes and Index Types (official docs)
- Query Planning & EXPLAIN (official docs)
- Parallel Query (official docs)
- COPY From/To (official docs)
- Logical Replication (official docs)
- pg_stat_statements (official docs)
- Routine Vacuuming & Autovacuum (official docs)
(When publishing, link each bullet to the corresponding page on postgresql.org.)
Summary & call to action
PostgreSQL isn’t “just OLTP.” With sane indexes, partitioning, and a habit of reading plans, it comfortably powers multi-billion-row warehouses, event pipelines, and mixed workloads. Start with your query shapes, keep your stats honest, and automate the boring lifecycle knobs.
CTA: Pick one painful query today. Run EXPLAIN (ANALYZE, BUFFERS), add the one index it truly needs, and set up pg_stat_statements to keep yourself honest next week.
Image prompt (for DALL·E / Midjourney)
“A clean, modern data architecture diagram illustrating a PostgreSQL cluster with partitioned tables, read replicas, and query plans flowing through indexes — minimalistic, high contrast, 3D isometric style.”
Tags
#PostgreSQL #DataEngineering #DatabasePerformance #Indexes #Partitioning #EXPLAIN #Replication #Analytics #Scalability
Pitch ideas (choose one next)
- “Parallel Query and Work Mem: Making Postgres Use All Those Cores” — practical knobs and before/after plans.
- “Designing Composite Indexes in PostgreSQL: A Data Engineer’s Playbook” — deep on leftmost rules, INCLUDE columns, and plan reading.
- “From Heap to Healthy: Vacuum, Bloat, and MVCC Explained for Humans” — practical autovacuum tuning with measurable thresholds.
- “PostgreSQL Partitioning in Practice: Monthly, Hash, and Hybrid Patterns” — blueprints + automation snippets.
- “Logical vs Physical Replication: Blue/Green, Zero-Downtime Upgrades, and Read Scaling” — real cutover runbooks.
- “Ingesting a Billion Rows: COPY, Staging Tables, and Idempotent Upserts” — reproducible ingestion pipelines.
- “Query Plans that Don’t Lie: Building a Performance Review with pg_stat_statements” — set targets, track regressions.
- “JSONB Without Regret: When to Model vs Index Paths” — decision tree + costed examples.




