Multi-Region, Multi-Account Snowflake: Data Sharing, Replication & Failover — Design Choices, RPO/RTO, and Access Control at Scale

Hook: Your exec just asked, “If us-east-1 goes dark, how fast can Finance be back online, and who can still read PII?” If your Snowflake footprint spans regions and accounts, the right answer is designed, not guessed. This guide lays out the real-world options—data sharing vs. replication vs. failover—and how they map to RPO/RTO and access control requirements without lighting your spend on fire.

Why this matters (in one minute)

Multi-region gives you regulatory coverage, latency control, and disaster recovery.
Multi-account enables strong blast-radius boundaries, autonomy for BU/geo teams, and least-privilege governance.
The hard part: choosing between live sharing, periodic replication, and promotable failover (per database or as grouped objects) while keeping RPO/RTO and RBAC honest.

The building blocks (concepts & architecture)

1) Cross-Account Data Sharing (direct share / listings)

What it is: Provider grants read-only access to specific objects to a consumer account (same or different region/cloud).
Why you use it: Zero-ETL distribution to many consumers; no copy jobs; consumers query “live” provider data.
RPO/RTO: Effectively RPO≈0 for published objects; RTO≈0 for read queries (no promotion step).
Caveat: No write access on consumer; not a DR solution for writes.

Governance fit: Tight, object-level grants; overlay row access / masking for PII; create separate shares per audience.

2) Database Replication (Primary → Secondary)

What it is: Snowflake replicates a read-only secondary database in a target account/region.
Why you use it: Off-region analytics copies; warm standby for DR; minimize egress and operational glue.
RPO/RTO: RPO = replication interval (minutes to hours); RTO = promotion time (seconds to minutes).
Caveat: Secondary is read-only until you promote it. Writes resume after switchover/failover.

3) Failover (Promote Secondary → Primary)

What it is: Turn a replicated secondary into the new primary (planned switchover or unplanned failover).
Why you use it: True continuity for write workloads when a region/account is impaired.
RPO/RTO: Driven by your replication schedule (RPO) and the promotion + connection cutover time (RTO).

4) (Advanced) Grouping Objects for Org-Wide DR

What it is: Treat multiple databases + grants + other selected objects as a unit so they replicate and fail over together (minimizes “oops, forgot that database” incidents).
Why you use it: Consistency across apps that span several databases and shared objects.

Brutal truth: If your DR plan requires engineers to remember five manual steps under stress, you don’t have a DR plan—you have a fantasy. Group things.

Choosing the right pattern (cheat-sheet)

Use case	Pattern	RPO	RTO	Notes
Share curated facts to 10+ consumer accounts	Data Sharing	~0	~0	Perfect for distribution; not for write continuity
Analytics read copies closer to users/BI	Database Replication	Minutes–Hours	Minutes	Consumers read from secondary; primary continues writes
Full DR for write workloads	Replication + Failover	Minutes	Minutes	Test switchover quarterly; automate connection flips
Multi-tenant BUs with strict isolation	Multi-Account + Sharing	~0	~0	Each BU owns compute; central team curates datasets
Regulated PII with geo boundaries	Per-region primaries + sharing by region	~0	~0	Overlay row/mask policies per jurisdiction

Reference architectures (visual mental model)

Hub-and-Spoke (most common):
A central “Hub” account in Region A publishes Shares to regional “Spokes” for read workloads. Critical write databases replicate to a paired DR account in Region B with a defined failover runbook.
Active/Active Read, Active/Passive Write:
Reads served locally via replicated secondaries in each region; a single write primary with failover rights to the secondary region.
Producer → Exchange → Consumers:
Producers publish via listings/shares to many consumers (partners, BUs). DR for producer handled by replication/failover; consumers are unaffected.

Practical SQL snippets (illustrative, trimmed to the essentials)

Note: You need appropriate org-level/admin privileges. Exact options vary by account/edition—treat these as patterns.

A) Share data cross-account (read-only)

-- Provider account
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE SALES TO SHARE sales_share;
GRANT USAGE ON SCHEMA SALES.MART TO SHARE sales_share;
GRANT SELECT ON ALL TABLES IN SCHEMA SALES.MART TO SHARE sales_share;

-- Include future tables to avoid drift
GRANT SELECT ON FUTURE TABLES IN SCHEMA SALES.MART TO SHARE sales_share;

-- Allow a target consumer (by account locator or org name + account)
ALTER SHARE sales_share ADD ACCOUNTS = ('ORG1.ACCOUNT_B_ANALYTICS');

Consumer side:

-- In consumer account
CREATE DATABASE SALES_REMOTE FROM SHARE ORG1.PROVIDER_ACCOUNT.sales_share;
-- Now query like a normal DB (read-only)
SELECT * FROM SALES_REMOTE.MART.ORDERS_DAILY;

B) Replicate a database to another region/account

-- On the primary account
ALTER DATABASE SALES ENABLE REPLICATION TO ACCOUNTS = ('ORG1.DR_ACCOUNT');

-- On the DR account: create secondary and refresh on schedule
CREATE DATABASE SALES_DR AS REPLICA OF ORG1.PRIMARY_ACCOUNT.SALES;
-- Later, pull changes (you’ll automate this)
ALTER DATABASE SALES_DR REFRESH;

Planned switchover / unplanned failover:

-- In DR account, promote the secondary to primary when needed
ALTER DATABASE SALES_DR PRIMARY;   -- promotion step

-- (Re-point roles/warehouses/connections to the new database name/endpoint)

C) Access control overlays (masking & row access)

-- Example: role-based dynamic masking policy for PII
CREATE MASKING POLICY mask_email AS (val STRING) RETURNS STRING ->
  CASE WHEN current_role() IN ('PII_READ') THEN val
       ELSE REGEXP_REPLACE(val, '(^.).+(@.*$)', '\\1***\\2') END;

ALTER TABLE SALES.CUSTOMERS MODIFY COLUMN EMAIL SET MASKING POLICY mask_email;

-- Row-level filter for region-restricted data
CREATE ROW ACCESS POLICY eu_only AS (region STRING) RETURNS BOOLEAN ->
  region = 'EU' OR current_role() IN ('GLOBAL_AUDIT');

ALTER TABLE SALES.CUSTOMERS ADD ROW ACCESS POLICY eu_only ON (REGION);

RPO/RTO by design (not hope)

Data Sharing
- RPO: ~0 for the published objects (consumers see current provider state).
- RTO: ~0 for reads; no write continuity (by design).
Replication (no promotion)
- RPO: Your refresh cadence (e.g., 5–15 minutes).
- RTO: ~0 for read queries on the secondary (already online).
Failover (promotion)
- RPO: Last successful replication.
- RTO: Minutes—dominated by promotion + client routing (JDBC/ODBC URLs, secrets, BI connections, dbt profiles).
- Rule of thumb: If you haven’t rehearsed connection flips and secrets rotation, add 15–60 minutes of chaos to your RTO.

Best practices (what works in production)

Separate blast radii with multi-account.
- Central ORGADMIN/SECURITYADMIN only; each BU gets ACCOUNTADMIN in its own account without org-wide rights.
Use shares for distribution, replication for resilience.
- Don’t use replication to feed 20 consumers—share once, not copy 20×.
Group what fails over together.
- Bundle all databases that form one app boundary; include required grants and reference data. Treat it as a unit.
Automate the boring parts.
- Scheduled REFRESH; pre-created warehouses/roles in DR; IaC (Terraform/Snowflake Provider) for all grants.
Make connection cutover trivial.
- Use DNS aliases or connection managers so BI/dbt/Apps swap endpoints via one variable/secret.
Tighten least privilege before you replicate.
- If prod roles are messy, DR copies the mess. Enforce RBAC, masking, row policies, network policies, session parameters now.
Test quarterly.
- Run a game day: stop replication, promote secondary, re-run SLA queries, validate lineage, and fail back.
Cost control
- Replicate only hot marts; leave raw/bronze as shared or re-creatable. Review warehouse sizes in DR (smaller until an event).

Common pitfalls (don’t do these)

Relying on shares for DR. Shares are fantastic for read distribution, not for write continuity.
Forgetting future grants. New tables won’t be visible if you omit GRANT … ON FUTURE TABLES.
Orphaned policies. Masking/row access policies must exist in the target; replication handles database objects, but check dependencies.
Manual runbooks. Clicking through the UI in an outage is how you miss steps. Script it.
Ignoring metadata consumers. Lineage, quality dashboards, and audits need to follow the cutover too.

Real-world mini example (putting it together)

Goal: US primary writes; EU read latency < 100ms; DR in EU with 15-minute RPO.

US Primary Account

SALES database → ENABLE REPLICATION to EU_DR_ACCOUNT.
Publish sales_share (curated marts) to EU Analytics account.

EU Analytics Account

CREATE DATABASE SALES_REMOTE FROM SHARE ….sales_share for read-only BI.
No local copies = no ETL drift.

EU DR Account

CREATE DATABASE SALES_DR AS REPLICA OF ….SALES.
CRON refresh every 15 minutes; keep small warehouses idled.

Runbook

If US region fails: ALTER DATABASE SALES_DR PRIMARY;
Flip a single DNS alias / secret used by BI and apps → new Snowflake account URL.
Post-incident: perform failback after US recovers and replication is healthy.

Outcome:

EU BI never stopped (reads via share).
Write systems resumed in minutes in EU DR with RPO ≤ 15 min.

Access control at scale (multi-account hygiene)

Org-level: Centralize identity via SSO/SCIM; standardize role names across accounts (APP_READ, APP_WRITE, PII_READ).
Per account:
- Keep ACCOUNTADMIN for ops; no data access.
- App roles inherit from functional roles; grants only to roles (never to users).
- Apply masking/row policies in the data layers (schema closest to consumers).
Secrets & endpoints: Store in a central secrets manager; reference by logical alias so failover is configuration, not code changes.

Internal link ideas (to build your content cluster)

Snowflake RBAC Deep Dive: Roles, Grants, and Policy Layers
Row Access vs. Dynamic Masking: When to Use Which
Designing a Gold/Silver/Bronze Lakehouse in Snowflake
Automating Snowflake with Terraform + CI/CD
Testing DR: Switchover, Failover, and Failback Runbooks

Conclusion & takeaways

Use data sharing for distribution, replication for warm copies, and failover for write continuity.
Define RPO/RTO explicitly and back them with automation (refresh schedules, scripted promotion, endpoint flips).
Bake in RBAC + masking/row policies before you copy data across regions/accounts.
Test quarterly. If you haven’t practiced, your RTO is fiction.

Call to action: Want a concrete blueprint? Tell me your regions, accounts, and target RPO/RTO, and I’ll draft an IaC-backed runbook with SQL + Terraform you can ship this week.

Image prompt (for DALL·E / Midjourney)

“A clean, modern architecture diagram of a multi-region, multi-account Snowflake deployment showing: a primary account in Region A, read shares to analytics accounts, database replication to a DR account in Region B, and a failover promotion path. Minimalistic, high contrast, isometric 3D style with labeled arrows for ‘share’, ‘replication refresh’, and ‘failover’.”

Data/ML Engineer Blog