Logical vs Physical Replication in PostgreSQL: Blue/Green, Zero-Downtime Upgrades, and Read Scaling (with Real Cutover Runbooks)

Ever scheduled a “maintenance window” that turned into a midnight firefight? You want safer releases, faster failovers, and painless upgrades, all without telling the business to stop clicking. This guide gives exact PostgreSQL runbooks for blue/green deployments, near-zero-downtime major-version upgrades, and read scaling, plus guidance on when to pick physical vs logical replication.


TL;DR

  • Physical replication (WAL shipping) is your default for HA + read replicas. Fast, byte-for-byte, minimal overhead.
  • Logical replication is for cross-version upgrades, selective table sync, multi-region fan-out, and online migrations.
  • Blue/green cutovers are safer with logical (controlled, table-level), but physical is simpler for pure switchover/failover.
  • Real runbooks below: copy/paste ready and opinionated.

Table of Contents

  1. Why this matters
  2. Replication models in plain English
  3. Physical replication: when and how
  4. Logical replication: when and how
  5. Blue/Green deployments
  6. Zero-downtime major upgrades
  7. Read scaling patterns
  8. Real cutover runbooks (physical + logical)
  9. Best practices & pitfalls
  10. Summary & next steps

1) Why This Matters

  • SLAs don’t care that VACUUM ran long.
  • Major upgrades (e.g., 13 → 16) shouldn’t require hours of downtime.
  • Read scaling avoids overprovisioning the primary.
  • Blue/green gives you reversible releases.

2) Replication Models in Plain English

Aspect              | Physical Replication              | Logical Replication
--------------------|-----------------------------------|----------------------------------------------------
Mechanism           | WAL block-level shipping/replay   | Row-level change streams (publications/subscriptions)
Scope               | Whole cluster/DB (binary)         | Selected tables/schemas
Version constraints | Same major version (usually)      | Cross-version supported
Use cases           | HA, read replicas, DR             | Upgrades, partial migration, multi-tenant fan-out
Latency             | Very low                          | Low to moderate (depends on apply delay)
DDL handling        | All structure copied via WAL      | Not replicated; you manage it
Sequences           | Replayed via WAL (in sync)        | Not replicated; handle manually
Large objects       | Handled                           | Not replicated (requires workaround)
Complexity          | Lower                             | Higher (DDL/PK/identity, slot mgmt)

3) Physical Replication: When and How

Use it for:

  • HA + read replicas in same major version.
  • Low-latency read scaling.
  • Simple blue/green within same PG major.

Core setup (primary):

# postgresql.conf
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
wal_keep_size = '2GB'      # tune for network hiccups
# pg_hba.conf
host  replication  repl_user  10.0.0.0/16  scram-sha-256

Create replica (standby):

# On the standby (empty data dir)
pg_basebackup -h PRIMARY_HOST -U repl_user -D /pgdata -X stream -P -R
# -R creates standby.signal and writes primary_conninfo into postgresql.auto.conf
# (recovery.conf is gone as of PG 12); then start the standby

Monitor:

-- On primary
select pid,state,client_addr, sent_lsn, write_lsn, flush_lsn, replay_lsn
from pg_stat_replication;

-- On standby
select now() - pg_last_xact_replay_timestamp() as replay_delay;
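To express lag in bytes rather than time, you can diff LSNs on the primary; a sketch using the standard pg_stat_replication columns:

-- On primary: per-standby replay lag in bytes
select application_name,
       pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) as replay_lag
from pg_stat_replication;

Byte-based lag is often easier to alert on than time-based lag, since replay_delay reads as zero whenever the primary is idle.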

4) Logical Replication: When and How

Use it for:

  • Major version upgrades (e.g., 12 → 16).
  • Migrating a subset of tables.
  • Multi-region selective fan-out.

Requirements & gotchas:

  • Every replicated table must have a PRIMARY KEY or a REPLICA IDENTITY FULL.
  • DDL not auto-replicated—run schema changes on both sides.
  • Sequences & large objects need explicit handling.
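Before enabling a publication, it helps to list the tables that would break UPDATE/DELETE replication. A sketch against the system catalogs (assumes everything lives in schema public):

-- Tables with no primary key and no REPLICA IDENTITY FULL
select c.relname
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where n.nspname = 'public'
  and c.relkind = 'r'
  and c.relreplident <> 'f'
  and not exists (
    select 1 from pg_constraint con
    where con.conrelid = c.oid and con.contype = 'p'
  );

Any table this returns needs a primary key added (preferred) or ALTER TABLE ... REPLICA IDENTITY FULL before you subscribe.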

Enable and create a publication (source):

# postgresql.conf
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10

-- On source DB
create role sub_user with login replication password '***';
grant select on all tables in schema public to sub_user;

-- Publication (all tables or selective)
create publication pub_all for all tables;
-- or: create publication pub_sel for table public.orders, public.customers;

Create subscription (target):

-- Target DB (possibly higher major version)
create subscription sub_migrate
  connection 'host=SRC user=sub_user password=*** dbname=appdb'
  publication pub_all
  with (copy_data = true, create_slot = true, enabled = true);

Lag + health:

select * from pg_stat_subscription;
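On the source side, the replication slot shows how much WAL the subscriber still has to consume, and how much disk it is pinning in the meantime:

-- On source: WAL retained for each slot
select slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as retained_wal
from pg_replication_slots;

Watch this during the initial copy: an inactive slot with growing retained_wal can fill the source's disk.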

5) Blue/Green Deployments

  • Blue = current production (source of truth).
  • Green = new stack (new PG version or new hardware).
  • Use logical to stream data from Blue → Green while you validate.
  • Switch traffic via connection string flip or load balancer.
  • Keep Blue read-only for a short soak; if issues arise, reverse traffic (Green → Blue), provided you planned a reverse replication path.

6) Zero-Downtime Major Upgrades

Two main paths:

  1. pg_upgrade: fast, but requires a hard stop (minutes with --link mode, still downtime).
  2. Logical replication: near-zero downtime; more moving parts.

This article focuses on the logical path, because the goal here is zero downtime.
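For contrast, the pg_upgrade path looks roughly like this (a sketch; the binary and data directory paths are assumptions for a Debian-style layout and will differ on your hosts):

# Dry run first, then the real thing with hard links (fast, minimal copying)
pg_upgrade \
  -b /usr/lib/postgresql/13/bin -B /usr/lib/postgresql/16/bin \
  -d /var/lib/postgresql/13/main -D /var/lib/postgresql/16/main \
  --link --check
# Drop --check to perform the upgrade; both clusters must be stopped first.

The --check dry run catches incompatibilities (extensions, tablespaces) before you commit to the window.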


7) Read Scaling Patterns

  • Primary + hot standbys (physical) behind a read router (PgBouncer/HAProxy).
  • Pin read-heavy queries to replicas; avoid long-running replica queries that conflict with WAL replay (vacuum cleanup arriving from the primary).
  • Use default_transaction_read_only=on on read pools.
  • Beware replica lag for read-your-writes scenarios; route those to primary.
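One common way to split traffic is two logical databases in PgBouncer, one pointed at the primary and one at a replica (hostnames here are placeholders):

; pgbouncer.ini
[databases]
app_rw = host=primary.internal port=5432 dbname=appdb
app_ro = host=replica1.internal port=5432 dbname=appdb

The application then picks app_rw or app_ro per query class; anything with read-your-writes semantics stays on app_rw.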

8) Real Cutover Runbooks

A) Physical Switchover (Same Version, Blue/Green Infra)

Goal: Move writes from Primary (Blue) to Standby (Green) with seconds of downtime.

Pre-checks

  • pg_stat_replication shows lag at or near zero (within your tolerance).
  • synchronous_commit = on plus synchronous_standby_names set (optional, for strict RPO = 0).
  • Client retry logic tested.

Steps

  1. Freeze writes briefly
    • Put app in read-only/maintenance for ~30–60s (or block at DB):
    alter system set default_transaction_read_only = on;
    select pg_reload_conf();
  2. Ensure replica caught up
    • On standby: select pg_last_wal_replay_lsn();
    • On primary: confirm sent_lsn = replay_lsn for the standby in pg_stat_replication.
  3. Promote standby (Green):
    pg_ctl promote -D /pgdata
  4. Repoint clients to Green.
  5. Demote Blue to standby (optional):
    • Stop Blue, pg_rewind Blue from Green, start it as a follower.
  6. Unfreeze writes:
    alter system reset default_transaction_read_only;
    select pg_reload_conf();
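The caught-up check in step 2 can be made mechanical with a single primary-side query; proceed only when the gap is zero:

-- On Blue (primary): bytes the Green standby still has to replay
select application_name,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) as replay_gap_bytes
from pg_stat_replication;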

Rollback

  • If promote fails, keep Blue as primary and reattempt after fixing replication.

B) Logical Blue/Green Cutover (Cross-Version or Selective Tables)

Goal: Keep Blue (old version) live, stream to Green (new version), cut over with <30s write pause.

Prep

  • On Green, create schema manually (DDL not replicated).
  • Ensure primary keys exist for all tables.
  • Migrate sequences: pre-seed Green with high ranges, or switch to generated always as identity columns (available since PG 10); either way, capture last values before the cut.
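The range approach can be as simple as bumping every Green sequence well past anything Blue will plausibly reach before cutover. A sketch for one sequence (orders_id_seq and the 1,000,000 offset are hypothetical examples):

-- On Green, before cutover
select setval('public.orders_id_seq',
              (select last_value from public.orders_id_seq) + 1000000);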

Steps

  1. Start replication
    -- On Blue
    create publication pub_all for all tables;
    -- On Green
    create subscription sub_all
      connection 'host=BLUE user=sub_user password=*** dbname=appdb'
      publication pub_all
      with (copy_data = true, create_slot = true);
  2. Wait to catch up
    • select * from pg_stat_subscription; (should show minimal latency).
  3. Quiesce writes (short freeze)
    • Put app in maintenance or block writes by revoking permissions temporarily:
    revoke insert, update, delete on all tables in schema public from app_role;
  4. Apply final DDL & sequence sync
    • Recheck schema drift.
    • For sequences, generate setval() statements on Blue and run the output on Green:
    -- On Blue
    select format('select setval(%L, %s);',
                  schemaname || '.' || sequencename, last_value)
    from pg_sequences
    where schemaname = 'public' and last_value is not null;
    -- Run the resulting statements on Green
  5. Stop subscription on Green:
    alter subscription sub_all disable;
  6. Flip traffic to Green
    • Update service DNS/connection strings.
  7. Re-enable writes on Green:
    grant insert, update, delete on all tables in schema public to app_role;
  8. Keep Blue read-only for soak (optional):
    alter system set default_transaction_read_only = on;
    select pg_reload_conf();

Rollback

  • If errors on Green, point traffic back to Blue, re-enable subscription (or recreate), and investigate.

C) Read Scaling with Physical Replicas

Goal: Add read replicas safely and route analytics there.

Steps

  1. Create standby via pg_basebackup.
  2. Tag replica pool in your router (PgBouncer: separate database section).
  3. Ensure long-running queries don’t block WAL replay: set hot_standby_feedback=on (with caution—can bloat primary) or cap query timeouts on replicas.
  4. Add monitoring on replica lag and conflicts:
select * from pg_stat_database_conflicts;

9) Best Practices & Common Pitfalls

Physical

  • Keep wal_keep_size big enough to ride out network stalls; replicas that fall behind past the retained WAL must be re-seeded from scratch.
  • Use replication slots to prevent WAL removal—but monitor disk!
  • Test pg_rewind regularly.

Logical

  • DDL drift kills cutovers. Use a migrations gate to apply schema to both sides.
  • Sequences are not replicated. Plan ranges or a final sync step.
  • Replica identity: ensure PK or REPLICA IDENTITY FULL before enabling.
  • Large initial copies can throttle the primary—schedule outside peak hours and consider max_sync_workers_per_subscription.
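To keep the initial copy from saturating the source, the subscriber-side sync parallelism can be dialed down (subscriber postgresql.conf):

# Limit parallel initial table-sync workers per subscription (default is 2)
max_sync_workers_per_subscription = 1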

Blue/Green

  • Treat connection cutover as its own change with rehearsals.
  • Keep a reversible path for at least one day.

Observability

  • Dashboards for pg_stat_replication, pg_stat_subscription, WAL usage, replication slot lag, conflicts, and disk.
  • Alert on apply lag and WAL retention headroom.

Security

  • Separate least-privilege roles for replication vs application.
  • Restrict pg_hba.conf to exact subnets/hosts. TLS if crossing networks.

10) Conclusion & Takeaways

  • Default to physical replication for HA and read replicas.
  • Use logical replication for cross-version upgrades and precise blue/green migrations.
  • Practice the cutover. The first time should not be production.

Your action plan this week:

  1. Stand up a test cluster and rehearse both runbooks.
  2. Automate checks for DDL drift and sequence sync.
  3. Script a 1-click cutover that toggles app permissions and connection strings.

Internal Link Ideas (for your blog/wiki)

  • “PostgreSQL Indexing Deep Dive: B-tree vs BRIN vs GIN”
  • “Designing Idempotent Migrations with Sqitch/Flyway”
  • “Connection Pooling: PgBouncer Transaction Mode vs Session Mode”
  • “Disaster Recovery: RPO/RTO Targets and WAL Archival to S3”
  • “Kafka + Postgres CDC: When Logical Decoding Beats Triggers”

Appendix: Handy Commands & Queries

# Create replication role
createuser --replication repl_user
psql -c "alter role repl_user with replication login password '***';"

# Total WAL generated since cluster init (a rough WAL-volume insight)
psql -c "select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'));"
-- Who’s replicating from me?
select application_name, client_addr, state, sync_state, 
       sent_lsn, write_lsn, flush_lsn, replay_lsn
from pg_stat_replication;

-- Subscription apply lag (column set varies slightly by version)
select subname, received_lsn, latest_end_lsn,
       now() - latest_end_time as apply_lag
from pg_stat_subscription;

-- Conflicts on replicas
select datname, confl_tablespace, confl_lock, confl_snapshot, confl_bufferpin
from pg_stat_database_conflicts;


Tags

#PostgreSQL #Replication #BlueGreen #ZeroDowntime #LogicalReplication #PhysicalReplication #DatabaseUpgrades #ReadScaling #HighAvailability #DataEngineering
