Logical vs Physical Replication in PostgreSQL: Blue/Green, Zero-Downtime Upgrades, and Read Scaling (with Real Cutover Runbooks)
Ever scheduled a “maintenance window” that turns into a midnight firefight? You want safer releases, faster failovers, and painless upgrades—without telling the business to stop clicking. This guide shows exact PostgreSQL runbooks for blue/green, major-version upgrades with near-zero downtime, and read scaling—plus when to pick physical vs logical replication.
TL;DR
- Physical replication (WAL shipping) is your default for HA + read replicas. Fast, byte-for-byte, minimal overhead.
- Logical replication is for cross-version upgrades, selective table sync, multi-region fan-out, and online migrations.
- Blue/green cutovers are safer with logical (controlled, table-level), but physical is simpler for pure switchover/failover.
- Real runbooks below: copy/paste ready and opinionated.
Table of Contents
- Why this matters
- Replication models in plain English
- Physical replication: when and how
- Logical replication: when and how
- Blue/Green deployments
- Zero-downtime major upgrades
- Read scaling patterns
- Real cutover runbooks (physical + logical)
- Best practices & pitfalls
- Summary & next steps
1) Why This Matters
- SLAs don’t care that VACUUM ran long.
- Major upgrades (e.g., 13 → 16) shouldn’t require hours of downtime.
- Read scaling avoids overprovisioning the primary.
- Blue/green gives you reversible releases.
2) Replication Models in Plain English
| Aspect | Physical Replication | Logical Replication |
|---|---|---|
| Mechanism | WAL block-level shipping/replay | Row-level change streams (publications/subscriptions) |
| Scope | Whole cluster/DB (binary) | Selected tables/schemas |
| Version Constraints | Same major version (usually) | Cross-version supported |
| Use Cases | HA, read replicas, DR | Upgrades, partial migration, multi-tenant fan-out |
| Latency | Very low | Low to moderate (depends on apply delay) |
| DDL Handling | All structure copied via WAL | DDL not replicated; you manage it |
| Sequences | Replayed WAL (in sync) | Not replicated; handle manually |
| Large Objects | Handled | Not replicated via logical (requires workaround) |
| Complexity | Lower | Higher (DDL/PK/identity, slot mgmt) |
3) Physical Replication: When and How
Use it for:
- HA + read replicas in same major version.
- Low-latency read scaling.
- Simple blue/green within same PG major.
Core setup (primary):
# postgresql.conf
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
wal_keep_size = '2GB' # tune for network hiccups
# pg_hba.conf
host replication repl_user 10.0.0.0/16 md5
Create replica (standby):
# On the standby (empty data dir)
pg_basebackup -h PRIMARY_HOST -U repl_user -D /pgdata -X stream -P
# Start the standby; recovery.conf is now in postgresql.auto.conf via standby.signal
Monitor:
-- On primary
select pid,state,client_addr, sent_lsn, write_lsn, flush_lsn, replay_lsn
from pg_stat_replication;
-- On standby
select now() - pg_last_xact_replay_timestamp() as replay_delay;
4) Logical Replication: When and How
Use it for:
- Major version upgrades (e.g., 12 → 16).
- Migrating a subset of tables.
- Multi-region selective fan-out.
Requirements & gotchas:
- Every replicated table must have a PRIMARY KEY or a REPLICA IDENTITY FULL.
- DDL not auto-replicated—run schema changes on both sides.
- Sequences & large objects need explicit handling.
Enable and create a publication (source):
# postgresql.conf
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
-- On source DB
create role sub_user with login replication password '***';
grant select on all tables in schema public to sub_user;
-- Publication (all tables or selective)
create publication pub_all for all tables;
-- or: create publication pub_sel for table public.orders, public.customers;
Create subscription (target):
-- Target DB (possibly higher major version)
create subscription sub_migrate
connection 'host=SRC user=sub_user password=*** dbname=appdb'
publication pub_all
with (copy_data = true, create_slot = true, enabled = true);
Lag + health:
select * from pg_stat_subscription;
5) Blue/Green Deployments
- Blue = current production (source of truth).
- Green = new stack (new PG version or new hardware).
- Use logical to stream data from Blue → Green while you validate.
- Switch traffic via connection string flip or load balancer.
- Keep Blue in read-only for a short soak; if issues, reverse (Green → Blue) if you planned for it.
6) Zero-Downtime Major Upgrades
Two main paths:
- pg_upgrade (fast, requires downtime; can be minutes with link mode, but still some downtime).
- Logical replication (near-zero downtime; more moving parts).
This article focuses on logical because you asked for zero-downtime.
7) Read Scaling Patterns
- Primary + hot standbys (physical) behind a read router (PgBouncer/HAProxy).
- Pin read-heavy queries to replicas; avoid long transactions that block VACUUM on standbys.
- Use
default_transaction_read_only=onon read pools. - Beware replica lag for read-your-writes scenarios; route those to primary.
8) Real Cutover Runbooks
A) Physical Switchover (Same Version, Blue/Green Infra)
Goal: Move writes from Primary (Blue) to Standby (Green) with seconds of downtime.
Pre-checks
pg_stat_replicationshows 0–X seconds lag.synchronous_commit = onwith sync replica (optional for strict RPO=0).- Client retry logic tested.
Steps
- Freeze writes briefly
- Put app in read-only/maintenance for ~30–60s (or block at DB):
alter system set default_transaction_read_only = on; select pg_reload_conf(); - Ensure replica caught up
- On standby:
select pg_last_wal_replay_lsn(); - On primary: ensure
sent_lsn == replay_lsnon standby.
- On standby:
- Promote standby (Green)
pg_ctl promote -D /pgdata - Repoint clients to Green.
- Demote Blue to standby (optional):
- Stop Blue,
pg_rewindBlue from Green, start as follower.
- Stop Blue,
- Unfreeze writes
alter system reset default_transaction_read_only; select pg_reload_conf();
Rollback
- If promote fails, keep Blue as primary and reattempt after fixing replication.
B) Logical Blue/Green Cutover (Cross-Version or Selective Tables)
Goal: Keep Blue (old version) live, stream to Green (new version), cut over with <30s write pause.
Prep
- On Green, create schema manually (DDL not replicated).
- Ensure primary keys exist for all tables.
- Migrate sequences: pre-seed on Green with high ranges or move to generated always as identity in v10+; capture last values before cut.
Steps
- Start replication
-- On Blue create publication pub_all for all tables; -- On Green create subscription sub_all connection 'host=BLUE user=sub_user password=*** dbname=appdb' publication pub_all with (copy_data = true, create_slot = true); - Wait to catch up
select * from pg_stat_subscription;(should show minimallatency).
- Quiesce writes (short freeze)
- Put app in maintenance or block writes by revoking permissions temporarily:
revoke insert, update, delete on all tables in schema public from app_role; - Apply final DDL & sequence sync
- Recheck schema drift.
- For sequences:
-- On Blue select 'select setval('''||relname||''','||last_value||');' from pg_sequences where schemaname='public'; -- Run results on Green - Stop subscription on Green
alter subscription sub_all disable; - Flip traffic to Green
- Update service DNS/connection strings.
- Re-enable writes on Green
grant insert, update, delete on all tables in schema public to app_role; - Keep Blue read-only for soak (optional).
alter system set default_transaction_read_only = on; select pg_reload_conf();
Rollback
- If errors on Green, point traffic back to Blue, re-enable subscription (or recreate), and investigate.
C) Read Scaling with Physical Replicas
Goal: Add read replicas safely and route analytics there.
Steps
- Create standby via
pg_basebackup. - Tag replica pool in your router (PgBouncer: separate database section).
- Ensure long-running queries don’t block WAL replay: set
hot_standby_feedback=on(with caution—can bloat primary) or cap query timeouts on replicas. - Add monitoring on replica lag and conflicts:
select * from pg_stat_database_conflicts;
9) Best Practices & Common Pitfalls
Physical
- Keep
wal_keep_sizebig enough for network stalls; otherwise replicas fall off the cliff. - Use replication slots to prevent WAL removal—but monitor disk!
- Test
pg_rewindregularly.
Logical
- DDL drift kills cutovers. Use a migrations gate to apply schema to both sides.
- Sequences are not replicated. Plan ranges or a final sync step.
- Replica identity: ensure PK or
REPLICA IDENTITY FULLbefore enabling. - Large initial copies can throttle the primary—schedule outside peak hours and consider
max_sync_workers_per_subscription.
Blue/Green
- Treat connection cutover as its own change with rehearsals.
- Keep a reversible path for at least one day.
Observability
- Dashboards for
pg_stat_replication,pg_stat_subscription, WAL usage, replication slot lag, conflicts, and disk. - Alert on apply lag and WAL retention headroom.
Security
- Separate least-privilege roles for replication vs application.
- Restrict
pg_hba.confto exact subnets/hosts. TLS if crossing networks.
10) Conclusion & Takeaways
- Default to physical replication for HA and read replicas.
- Use logical replication for cross-version upgrades and precise blue/green migrations.
- Practice the cutover. The first time should not be production.
Your action plan this week:
- Stand up a test cluster and rehearse both runbooks.
- Automate checks for DDL drift and sequence sync.
- Script a 1-click cutover that toggles app permissions and connection strings.
Internal Link Ideas (for your blog/wiki)
- “PostgreSQL Indexing Deep Dive: B-tree vs BRIN vs GIN”
- “Designing Idempotent Migrations with Sqitch/Flyway”
- “Connection Pooling: PgBouncer Transaction Mode vs Session Mode”
- “Disaster Recovery: RPO/RTO Targets and WAL Archival to S3”
- “Kafka + Postgres CDC: When Logical Decoding Beats Triggers”
Appendix: Handy Commands & Queries
# Create replication role
createuser --replication repl_user
psql -c "alter role repl_user with replication login password '***';"
# WAL size insight
psql -c "select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'));"
-- Who’s replicating from me?
select application_name, client_addr, state, sync_state,
sent_lsn, write_lsn, flush_lsn, replay_lsn
from pg_stat_replication;
-- Subscription lag
select subname, status, bytes_lag, lags.apply_lag
from pg_stat_subscription lags;
-- Conflicts on replicas
select datname, confl_tablespace, confl_lock, confl_snapshot, confl_bufferpin
from pg_stat_database_conflicts;
Image Prompt
“A clean, modern diagram contrasting physical vs logical replication in PostgreSQL. Show a Blue (primary) environment streaming WAL to read replicas, and a Green environment receiving logical changes via publication/subscription. Include a cutover switch. Minimalistic, high contrast, 3D isometric style.”
Tags
#PostgreSQL #Replication #BlueGreen #ZeroDowntime #LogicalReplication #PhysicalReplication #DatabaseUpgrades #ReadScaling #HighAvailability #DataEngineering








Leave a Reply