Query Plans that Don’t Lie: Building a Performance Review with pg_stat_statements

Your app got slower after the last release, and everyone’s got a theory: “it’s the network,” “it’s JIT,” “it’s that new index.” Opinions don’t fix latency; plans and measurements do. In PostgreSQL, the combination of pg_stat_statements and disciplined plan reviews gives you a repeatable way to set performance targets, spot regressions, and prove the fix, without guesswork.


Why this matters

  • Speed is a feature. Users feel p95, not averages.
  • Regressions hide in plain sight. One small query change can dominate total database time.
  • Query plans never lie—estimates do. Use EXPLAIN (ANALYZE, BUFFERS) to validate reality.
  • pg_stat_statements is your X-ray. It fingerprints queries, aggregates time, I/O, and call counts across the cluster.

Architecture: a lightweight “Performance Review” loop

  1. Collect: Enable pg_stat_statements. Snapshot its metrics every N minutes into a “metrics” schema.
  2. Target: Define SLOs per query family (p95 latency, total DB time budget, rows/second).
  3. Compare: Daily/weekly rollups. Flag WoW deltas > X% in p95 or total_exec_time.
  4. Inspect: For flagged queries, run EXPLAIN (ANALYZE, BUFFERS) on a representative input—plans don’t lie.
  5. Fix: Indexing, rewrite, planner nudges, stats refresh, or config changes (work_mem, jit, parallel).
  6. Prove: Re-run the same plan; watch the metric roll back below target.
  7. Prevent: Gate PRs with a small load-test plus a SQL baseline comparison.

This loop is simple enough for a single engineer, robust enough for an SRE team.


Setup: turn on pg_stat_statements

-- postgresql.conf (or ALTER SYSTEM SET)
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all          -- include nested
pg_stat_statements.track_utility = off  -- focus on DML/SELECT
pg_stat_statements.save = on

-- restart or reload as required, then:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Tip: pg_stat_statements normalizes queries automatically—constants are replaced with placeholders ($1, $2, …)—so parameterized variants of the same statement share one queryid and one row in the view.
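Before building anything on top of the extension, a quick sanity check that it is actually loaded and collecting is worthwhile:

```sql
-- Is the library preloaded? (requires a restart to change)
SHOW shared_preload_libraries;

-- Is the extension created in this database, and at what version?
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_stat_statements';

-- Is it tracking anything yet?
SELECT count(*) AS tracked_statements FROM pg_stat_statements;
```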


What to watch (cheat sheet)

| Field (view)            | What it tells you            | Use it to…                          |
|-------------------------|------------------------------|-------------------------------------|
| queryid, query          | Fingerprint + normalized SQL | Group and label “query families”    |
| calls                   | Frequency                    | Separate “hot” from “rare”          |
| total_exec_time         | Time sink in ms              | Find top offenders by total DB time |
| mean_exec_time, min/max | Averages & spread            | Track shifts vs. baseline           |
| rows                    | Rows returned                | Spot row explosion                  |
| shared_blks_read/hit    | Buffer activity              | Distinguish I/O-bound vs cache-hot  |
| temp_blks_read/written  | Temp files                   | Identify sort/hash spilling         |
| wal_bytes               | Write amplification          | Evaluate write-heavy patterns       |

Column names vary by PostgreSQL version (e.g., total_time before v13 became total_exec_time in v13+); use your cluster’s columns accordingly.
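A practical first query against these columns (PostgreSQL 13+ names assumed) ranks query families by total database time:

```sql
SELECT queryid,
       left(query, 60)                    AS sample,
       calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       shared_blks_read,
       temp_blks_written
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```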


Create a durable metrics pipeline

1) Snapshot table + view

CREATE SCHEMA IF NOT EXISTS perf;

CREATE TABLE IF NOT EXISTS perf.pgss_snapshot (
  captured_at      timestamptz NOT NULL DEFAULT now(),
  userid           oid,
  dbid             oid,
  queryid          bigint,
  query            text,
  calls            bigint,
  total_exec_time  double precision,
  mean_exec_time   double precision,
  rows             bigint,
  shared_blks_hit  bigint,
  shared_blks_read bigint,
  temp_blks_read   bigint,
  temp_blks_written bigint,
  wal_bytes        numeric
);

-- Periodic snapshot (e.g., every 5–10 min via cron or pg_cron):
INSERT INTO perf.pgss_snapshot
SELECT now(), userid, dbid, queryid, query, calls,
       total_exec_time, mean_exec_time, rows,
       shared_blks_hit, shared_blks_read,
       temp_blks_read, temp_blks_written, wal_bytes
FROM pg_stat_statements;
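If pg_cron is available, the snapshot can be scheduled in-database rather than from cron or systemd (this assumes the pg_cron extension is installed and its jobs run in the target database):

```sql
SELECT cron.schedule(
  'pgss-snapshot',   -- job name
  '*/5 * * * *',     -- every 5 minutes
  $$INSERT INTO perf.pgss_snapshot
    SELECT now(), userid, dbid, queryid, query, calls,
           total_exec_time, mean_exec_time, rows,
           shared_blks_hit, shared_blks_read,
           temp_blks_read, temp_blks_written, wal_bytes
    FROM pg_stat_statements$$
);
```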

2) Daily rollup for trend analysis

CREATE MATERIALIZED VIEW IF NOT EXISTS perf.pgss_daily AS
SELECT date_trunc('day', captured_at) AS day,
       queryid,
       max(query) FILTER (WHERE query IS NOT NULL) AS sample_query,
       sum(calls) AS calls,
       sum(total_exec_time) AS total_exec_ms,
       -- p95 over snapshot-level means: a coarse proxy, since pg_stat_statements
       -- keeps no per-call histogram
       percentile_disc(0.95) WITHIN GROUP (ORDER BY mean_exec_time) AS p95_ms,
       sum(shared_blks_read) AS reads,
       sum(shared_blks_hit)  AS hits,
       sum(temp_blks_written) AS temp_writes
FROM perf.pgss_snapshot
GROUP BY 1,2;

-- Refresh nightly:
REFRESH MATERIALIZED VIEW perf.pgss_daily;

3) Regression detector (week-over-week)

WITH curr AS (
  SELECT day, queryid, p95_ms, total_exec_ms
  FROM perf.pgss_daily
  WHERE day >= date_trunc('week', now())
),
prev AS (
  SELECT day, queryid, p95_ms, total_exec_ms
  FROM perf.pgss_daily
  WHERE day >= date_trunc('week', now()) - interval '1 week'
    AND day <  date_trunc('week', now())
),
agg AS (
  SELECT c.queryid,
         percentile_disc(0.95) WITHIN GROUP (ORDER BY c.p95_ms) AS curr_p95,
         percentile_disc(0.95) WITHIN GROUP (ORDER BY p.p95_ms) AS prev_p95,
         sum(c.total_exec_ms) AS curr_total_ms,
         sum(p.total_exec_ms) AS prev_total_ms
  FROM curr c JOIN prev p USING (queryid)
  GROUP BY 1
)
SELECT queryid,
       round((curr_p95 - prev_p95) / NULLIF(prev_p95,0) * 100, 1) AS p95_delta_pct,
       round((curr_total_ms - prev_total_ms)/NULLIF(prev_total_ms,0)*100, 1) AS total_ms_delta_pct
FROM agg
WHERE (curr_p95 > prev_p95 * 1.25) OR (curr_total_ms > prev_total_ms * 1.25)
ORDER BY p95_delta_pct DESC NULLS LAST;

Start with a 25% threshold; tune to your noise level.


Make plans tell the truth

When a query flags as regressed, capture a real plan:

EXPLAIN (ANALYZE, BUFFERS, WAL, VERBOSE, TIMING)
SELECT ...
WHERE ...
ORDER BY ...
LIMIT ...;

Read it like a pro:

  • Node that dominates time: compare each node’s actual total time (actual time × loops), subtracting its children, to find where the time really goes.
  • Rows vs. Rows Removed: Misestimates show up as “expected vs actual” row counts drifting orders of magnitude.
  • Loops: Nested Loop with high loop count + seq scan inside = pain.
  • Buffers: shared read spikes = I/O-bound; temp written = spills.
  • Filters/Join conditions: Look for non-sargable predicates and implicit casts.

Real example: fixing a silent scan

Symptom: WoW detector flags queryid=… with +60% p95 and +90% total_exec_ms.
Plan: Seq Scan on orders with filter created_at >= now() - interval '7 days' and status = 'PAID'::text.

Root cause: Missing composite index + poor selectivity estimate after a data skew.

Fix:

CREATE INDEX CONCURRENTLY idx_orders_paid_recent
ON orders (status, created_at DESC)
INCLUDE (customer_id, total);

Proof: Rerun EXPLAIN (ANALYZE, BUFFERS). Look for Index Only Scan and a 10–100x drop in actual time and shared reads.
Guardrail: Add a test that asserts p95 under your target for a 7-day hot window workload.
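Since the root cause included a stale selectivity estimate, refreshing statistics belongs in the fix as well. A sketch of the proof step (the column and index names mirror the hypothetical example above):

```sql
ANALYZE orders;  -- refresh planner statistics after the data skew

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, total
FROM orders
WHERE status = 'PAID'
  AND created_at >= now() - interval '7 days';
-- Expect an Index Only Scan using idx_orders_paid_recent and far fewer shared reads
```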


SLOs that don’t wiggle

  • Per query family (by queryid or a curated label):
    • p95 latency (e.g., < 40 ms)
    • Total DB time per day (e.g., < 8% of cluster time)
    • No temp spills on hot paths
    • Cache efficiency: hit/read ratio above X
  • Release gate: Reject if any hot query exceeds +25% p95 vs baseline on a 10–15 min synthetic load.
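The cache-efficiency and temp-spill SLOs can be checked straight from the snapshot table; a sketch, per query family over the last day:

```sql
SELECT queryid,
       round(100.0 * sum(shared_blks_hit)
             / NULLIF(sum(shared_blks_hit) + sum(shared_blks_read), 0), 1) AS hit_pct,
       sum(temp_blks_written) AS temp_writes
FROM perf.pgss_snapshot
WHERE captured_at >= now() - interval '1 day'
GROUP BY queryid
ORDER BY hit_pct ASC NULLS LAST;
```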

Best practices

  • Budget by total time, not just p95. One “fast” but ultra-hot query can eat half your CPU.
  • Label query families. Store a curated mapping table perf.query_labels(queryid, label, owner).
  • Reset wisely. Use SELECT pg_stat_statements_reset(); after a deploy to isolate effects—but only after snapshotting.
  • Vacuum/Analyze discipline. Stale stats = wrong plans. Analyze big tables after bulk changes.
  • Guard work_mem. Unbounded sorts/hash joins spill; prefer narrower SELECTs and right-sized work_mem.
  • Parameter stability. Avoid plan flip-flops: stabilize with plan_cache_mode = force_custom_plan only for outliers, or rewrite to keep cardinalities predictable.
  • JIT & parallelism. Test toggles per query. Some OLTP paths hate JIT; some analytics love parallel.
  • Composite indexes > Swiss-army knives. Build for access patterns, not for “maybe someday.”
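The labeling table mentioned above can be as simple as the following (label and owner values are yours to define):

```sql
CREATE TABLE IF NOT EXISTS perf.query_labels (
  queryid bigint PRIMARY KEY,
  label   text NOT NULL,   -- e.g. 'checkout.load_cart'
  owner   text             -- team or engineer responsible
);

-- Join labels into the daily rollup:
SELECT coalesce(l.label, d.queryid::text) AS family,
       sum(d.total_exec_ms) AS total_ms
FROM perf.pgss_daily d
LEFT JOIN perf.query_labels l USING (queryid)
GROUP BY 1
ORDER BY total_ms DESC;
```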

Common pitfalls

  • Chasing estimates. The planner’s estimates are hints. Only ANALYZE time is truth.
  • Averages hide pain. Track p95/p99 and total_exec_time.
  • Ignoring temp IO. Spills show up as temp blocks and crush latency under load.
  • One-shot fire drills. If you don’t snapshot, you can’t prove the fix—or the regression.
  • Mystery parameters. Implicit casts (text vs varchar, timestamp vs timestamptz) can nuke indexes.
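A concrete instance of the last pitfall, assuming a timestamptz column created_at with a plain btree index:

```sql
-- Non-sargable: casting the indexed column defeats the index
SELECT * FROM orders WHERE created_at::date = '2024-06-01';

-- Sargable rewrite: a range predicate on the raw column can use the index
SELECT * FROM orders
WHERE created_at >= '2024-06-01'
  AND created_at <  '2024-06-02';
```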

Minimal Python collector (optional)

import psycopg2

INSERT_SQL = """
INSERT INTO perf.pgss_snapshot
(captured_at, userid, dbid, queryid, query, calls, total_exec_time,
 mean_exec_time, rows, shared_blks_hit, shared_blks_read, temp_blks_read,
 temp_blks_written, wal_bytes)
SELECT now(), userid, dbid, queryid, query, calls, total_exec_time,
       mean_exec_time, rows, shared_blks_hit, shared_blks_read,
       temp_blks_read, temp_blks_written, wal_bytes
FROM pg_stat_statements;
"""

conn = psycopg2.connect("dbname=app user=collector sslmode=require")
try:
    with conn:  # commits on success, rolls back on error
        with conn.cursor() as cur:
            cur.execute(INSERT_SQL)
finally:
    conn.close()  # the connection context manager does not close the connection

Run every 5–10 minutes via cron/systemd/pg_cron.



Conclusion & Takeaways

  • Plan truth > estimate guesses. Always validate with EXPLAIN (ANALYZE, BUFFERS).
  • pg_stat_statements turns noise into numbers. Track p95 and total DB time by query family.
  • Automate snapshots and weekly diffs. Regressions surface themselves.
  • Prove every fix. Same query, same input, better plan, lower time—or it didn’t happen.

Call to action: Start by enabling pg_stat_statements today. Ship the snapshot table, add the WoW regression query, and pick one hot query to fix by end of week. Plans don’t lie—make them work for you.
