Logical vs Physical Replication in PostgreSQL: Blue/Green, Zero-Downtime Upgrades, and Read Scaling (with Real Cutover Runbooks)

Ever scheduled a “maintenance window” that turned into a midnight firefight? You want safer releases, faster failovers, and painless upgrades, all without telling the business to stop clicking. This guide gives exact PostgreSQL runbooks for blue/green deployments, near-zero-downtime major-version upgrades, and read scaling, plus guidance on when to pick physical vs logical replication.


TL;DR

  • Physical replication (WAL shipping) is your default for HA + read replicas. Fast, byte-for-byte, minimal overhead.
  • Logical replication is for cross-version upgrades, selective table sync, multi-region fan-out, and online migrations.
  • Blue/green cutovers are safer with logical (controlled, table-level), but physical is simpler for pure switchover/failover.
  • Real runbooks below: copy/paste ready and opinionated.

Table of Contents

  1. Why this matters
  2. Replication models in plain English
  3. Physical replication: when and how
  4. Logical replication: when and how
  5. Blue/Green deployments
  6. Zero-downtime major upgrades
  7. Read scaling patterns
  8. Real cutover runbooks (physical + logical)
  9. Best practices & pitfalls
  10. Summary & next steps

1) Why This Matters

  • SLAs don’t care that VACUUM ran long.
  • Major upgrades (e.g., 13 → 16) shouldn’t require hours of downtime.
  • Read scaling avoids overprovisioning the primary.
  • Blue/green gives you reversible releases.

2) Replication Models in Plain English

Aspect              | Physical Replication              | Logical Replication
--------------------|-----------------------------------|----------------------------------------------------
Mechanism           | WAL block-level shipping/replay   | Row-level change streams (publications/subscriptions)
Scope               | Whole cluster/DB (binary)         | Selected tables/schemas
Version constraints | Same major version (usually)      | Cross-version supported
Use cases           | HA, read replicas, DR             | Upgrades, partial migration, multi-tenant fan-out
Latency             | Very low                          | Low to moderate (depends on apply delay)
DDL handling        | All structure copied via WAL      | Not replicated; you manage it
Sequences           | Replayed via WAL (in sync)        | Not replicated; handle manually
Large objects       | Handled                           | Not replicated (requires workaround)
Complexity          | Lower                             | Higher (DDL/PK/identity, slot mgmt)

3) Physical Replication: When and How

Use it for:

  • HA + read replicas in same major version.
  • Low-latency read scaling.
  • Simple blue/green within same PG major.

Core setup (primary):

# postgresql.conf
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
wal_keep_size = '2GB'      # tune for network hiccups
# pg_hba.conf
host  replication  repl_user  10.0.0.0/16  scram-sha-256

Create replica (standby):

# On the standby (empty data dir)
pg_basebackup -h PRIMARY_HOST -U repl_user -D /pgdata -X stream -P -R
# -R creates standby.signal and writes primary_conninfo into postgresql.auto.conf
# (recovery.conf is gone as of PG 12); then start the standby

Monitor:

-- On primary
select pid,state,client_addr, sent_lsn, write_lsn, flush_lsn, replay_lsn
from pg_stat_replication;

-- On standby
select now() - pg_last_xact_replay_timestamp() as replay_delay;
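To express lag in bytes rather than time, you can diff LSNs on the primary; a sketch using the standard pg_stat_replication columns:

-- On primary: per-standby replay lag in bytes
select application_name,
       pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) as replay_lag
from pg_stat_replication;

Byte-based lag is often easier to alert on than time-based lag, since replay_delay reads as zero whenever the primary is idle.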

4) Logical Replication: When and How

Use it for:

  • Major version upgrades (e.g., 12 → 16).
  • Migrating a subset of tables.
  • Multi-region selective fan-out.

Requirements & gotchas:

  • Every replicated table must have a PRIMARY KEY or a REPLICA IDENTITY FULL.
  • DDL not auto-replicated—run schema changes on both sides.
  • Sequences & large objects need explicit handling.
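Before enabling a publication, it helps to list the tables that would break UPDATE/DELETE replication. A sketch against the system catalogs (assumes everything lives in schema public):

-- Tables with no primary key and no REPLICA IDENTITY FULL
select c.relname
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where n.nspname = 'public'
  and c.relkind = 'r'
  and c.relreplident <> 'f'
  and not exists (
    select 1 from pg_constraint con
    where con.conrelid = c.oid and con.contype = 'p'
  );

Any table this returns needs a primary key added (preferred) or ALTER TABLE ... REPLICA IDENTITY FULL before you subscribe.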

Enable and create a publication (source):

# postgresql.conf
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10

-- On source DB
create role sub_user with login replication password '***';
grant select on all tables in schema public to sub_user;

-- Publication (all tables or selective)
create publication pub_all for all tables;
-- or: create publication pub_sel for table public.orders, public.customers;

Create subscription (target):

-- Target DB (possibly higher major version)
create subscription sub_migrate
  connection 'host=SRC user=sub_user password=*** dbname=appdb'
  publication pub_all
  with (copy_data = true, create_slot = true, enabled = true);

Lag + health:

select * from pg_stat_subscription;
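On the source side, the replication slot shows how much WAL the subscriber still has to consume, and how much disk it is pinning in the meantime:

-- On source: WAL retained for each slot
select slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as retained_wal
from pg_replication_slots;

Watch this during the initial copy: an inactive slot with growing retained_wal can fill the source's disk.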

5) Blue/Green Deployments

  • Blue = current production (source of truth).
  • Green = new stack (new PG version or new hardware).
  • Use logical to stream data from Blue → Green while you validate.
  • Switch traffic via connection string flip or load balancer.
  • Keep Blue read-only for a short soak; if issues arise, reverse traffic (Green → Blue), provided you planned a reverse replication path.

6) Zero-Downtime Major Upgrades

Two main paths:

  1. pg_upgrade: fast, but requires a hard stop (minutes with --link mode, still downtime).
  2. Logical replication: near-zero downtime; more moving parts.

This article focuses on the logical path, because the goal here is zero downtime.
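For contrast, the pg_upgrade path looks roughly like this (a sketch; the binary and data directory paths are assumptions for a Debian-style layout and will differ on your hosts):

# Dry run first, then the real thing with hard links (fast, minimal copying)
pg_upgrade \
  -b /usr/lib/postgresql/13/bin -B /usr/lib/postgresql/16/bin \
  -d /var/lib/postgresql/13/main -D /var/lib/postgresql/16/main \
  --link --check
# Drop --check to perform the upgrade; both clusters must be stopped first.

The --check dry run catches incompatibilities (extensions, tablespaces) before you commit to the window.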


7) Read Scaling Patterns

  • Primary + hot standbys (physical) behind a read router (PgBouncer/HAProxy).
  • Pin read-heavy queries to replicas; avoid long-running replica queries that conflict with WAL replay (vacuum cleanup arriving from the primary).
  • Use default_transaction_read_only=on on read pools.
  • Beware replica lag for read-your-writes scenarios; route those to primary.
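One common way to split traffic is two logical databases in PgBouncer, one pointed at the primary and one at a replica (hostnames here are placeholders):

; pgbouncer.ini
[databases]
app_rw = host=primary.internal port=5432 dbname=appdb
app_ro = host=replica1.internal port=5432 dbname=appdb

The application then picks app_rw or app_ro per query class; anything with read-your-writes semantics stays on app_rw.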

8) Real Cutover Runbooks

A) Physical Switchover (Same Version, Blue/Green Infra)

Goal: Move writes from Primary (Blue) to Standby (Green) with seconds of downtime.

Pre-checks

  • pg_stat_replication shows lag at or near zero (within your tolerance).
  • synchronous_commit = on plus synchronous_standby_names set (optional, for strict RPO = 0).
  • Client retry logic tested.

Steps

  1. Freeze writes briefly
    • Put app in read-only/maintenance for ~30–60s (or block at DB):
    alter system set default_transaction_read_only = on;
    select pg_reload_conf();
  2. Ensure replica caught up
    • On standby: select pg_last_wal_replay_lsn();
    • On primary: confirm sent_lsn = replay_lsn for the standby in pg_stat_replication.
  3. Promote standby (Green):
    pg_ctl promote -D /pgdata
  4. Repoint clients to Green.
  5. Demote Blue to standby (optional):
    • Stop Blue, pg_rewind Blue from Green, start it as a follower.
  6. Unfreeze writes:
    alter system reset default_transaction_read_only;
    select pg_reload_conf();
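The caught-up check in step 2 can be made mechanical with a single primary-side query; proceed only when the gap is zero:

-- On Blue (primary): bytes the Green standby still has to replay
select application_name,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) as replay_gap_bytes
from pg_stat_replication;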

Rollback

  • If promote fails, keep Blue as primary and reattempt after fixing replication.

B) Logical Blue/Green Cutover (Cross-Version or Selective Tables)

Goal: Keep Blue (old version) live, stream to Green (new version), cut over with <30s write pause.

Prep

  • On Green, create schema manually (DDL not replicated).
  • Ensure primary keys exist for all tables.
  • Migrate sequences: pre-seed Green with high ranges, or switch to generated always as identity columns (available since PG 10); either way, capture last values before the cut.
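The range approach can be as simple as bumping every Green sequence well past anything Blue will plausibly reach before cutover. A sketch for one sequence (orders_id_seq and the 1,000,000 offset are hypothetical examples):

-- On Green, before cutover
select setval('public.orders_id_seq',
              (select last_value from public.orders_id_seq) + 1000000);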

Steps

  1. Start replication
    -- On Blue
    create publication pub_all for all tables;
    -- On Green
    create subscription sub_all
      connection 'host=BLUE user=sub_user password=*** dbname=appdb'
      publication pub_all
      with (copy_data = true, create_slot = true);
  2. Wait to catch up
    • select * from pg_stat_subscription; (should show minimal latency).
  3. Quiesce writes (short freeze)
    • Put app in maintenance or block writes by revoking permissions temporarily:
    revoke insert, update, delete on all tables in schema public from app_role;
  4. Apply final DDL & sequence sync
    • Recheck schema drift.
    • For sequences, generate setval() statements on Blue and run the output on Green:
    -- On Blue
    select format('select setval(%L, %s);',
                  schemaname || '.' || sequencename, last_value)
    from pg_sequences
    where schemaname = 'public' and last_value is not null;
    -- Run the resulting statements on Green
  5. Stop subscription on Green:
    alter subscription sub_all disable;
  6. Flip traffic to Green
    • Update service DNS/connection strings.
  7. Re-enable writes on Green:
    grant insert, update, delete on all tables in schema public to app_role;
  8. Keep Blue read-only for soak (optional):
    alter system set default_transaction_read_only = on;
    select pg_reload_conf();

Rollback

  • If errors on Green, point traffic back to Blue, re-enable subscription (or recreate), and investigate.

C) Read Scaling with Physical Replicas

Goal: Add read replicas safely and route analytics there.

Steps

  1. Create standby via pg_basebackup.
  2. Tag replica pool in your router (PgBouncer: separate database section).
  3. Ensure long-running queries don’t block WAL replay: set hot_standby_feedback=on (with caution—can bloat primary) or cap query timeouts on replicas.
  4. Add monitoring on replica lag and conflicts:
select * from pg_stat_database_conflicts;

9) Best Practices & Common Pitfalls

Physical

  • Keep wal_keep_size big enough to ride out network stalls; replicas that fall behind past the retained WAL must be re-seeded from scratch.
  • Use replication slots to prevent WAL removal—but monitor disk!
  • Test pg_rewind regularly.

Logical

  • DDL drift kills cutovers. Use a migrations gate to apply schema to both sides.
  • Sequences are not replicated. Plan ranges or a final sync step.
  • Replica identity: ensure PK or REPLICA IDENTITY FULL before enabling.
  • Large initial copies can throttle the primary—schedule outside peak hours and consider max_sync_workers_per_subscription.
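To keep the initial copy from saturating the source, the subscriber-side sync parallelism can be dialed down (subscriber postgresql.conf):

# Limit parallel initial table-sync workers per subscription (default is 2)
max_sync_workers_per_subscription = 1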

Blue/Green

  • Treat connection cutover as its own change with rehearsals.
  • Keep a reversible path for at least one day.

Observability

  • Dashboards for pg_stat_replication, pg_stat_subscription, WAL usage, replication slot lag, conflicts, and disk.
  • Alert on apply lag and WAL retention headroom.

Security

  • Separate least-privilege roles for replication vs application.
  • Restrict pg_hba.conf to exact subnets/hosts. TLS if crossing networks.

10) Conclusion & Takeaways

  • Default to physical replication for HA and read replicas.
  • Use logical replication for cross-version upgrades and precise blue/green migrations.
  • Practice the cutover. The first time should not be production.

Your action plan this week:

  1. Stand up a test cluster and rehearse both runbooks.
  2. Automate checks for DDL drift and sequence sync.
  3. Script a 1-click cutover that toggles app permissions and connection strings.

Internal Link Ideas (for your blog/wiki)

  • “PostgreSQL Indexing Deep Dive: B-tree vs BRIN vs GIN”
  • “Designing Idempotent Migrations with Sqitch/Flyway”
  • “Connection Pooling: PgBouncer Transaction Mode vs Session Mode”
  • “Disaster Recovery: RPO/RTO Targets and WAL Archival to S3”
  • “Kafka + Postgres CDC: When Logical Decoding Beats Triggers”

Appendix: Handy Commands & Queries

# Create replication role
createuser --replication repl_user
psql -c "alter role repl_user with replication login password '***';"

# Total WAL generated since cluster init (a rough WAL-volume insight)
psql -c "select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'));"
-- Who’s replicating from me?
select application_name, client_addr, state, sync_state, 
       sent_lsn, write_lsn, flush_lsn, replay_lsn
from pg_stat_replication;

-- Subscription apply lag (column set varies slightly by version)
select subname, received_lsn, latest_end_lsn,
       now() - latest_end_time as apply_lag
from pg_stat_subscription;

-- Conflicts on replicas
select datname, confl_tablespace, confl_lock, confl_snapshot, confl_bufferpin
from pg_stat_database_conflicts;


Tags

#PostgreSQL #Replication #BlueGreen #ZeroDowntime #LogicalReplication #PhysicalReplication #DatabaseUpgrades #ReadScaling #HighAvailability #DataEngineering
