Streams, Tasks, and Dynamic Tables: The Modern Snowflake ELT Playbook
Build incremental pipelines with zero cron jobs
Introduction — why this matters
Cron-driven data jobs rot. They overrun, collide, and lie when they say “green.” Snowflake gives you three primitives that kill most cron scripts: Streams for change capture, Tasks for dependency-aware orchestration, and Dynamic Tables for continuously maintained transformations. Put them together and you get incremental ELT that’s event-driven, debuggable, and cost-aware—without a farm of brittle schedulers.
This guide shows the end-to-end pattern: ingest → raw → bronze/silver/gold using Streams+Tasks or Dynamic Tables, with no cron math, only data-driven triggers and target lags.
The mental model
- Streams = lightweight CDC over a table/view (the “delta feed”).
- Tasks = serverless schedulers that chain via AFTER and WHEN (run only when there’s work).
- Dynamic Tables (DTs) = continuously maintained tables with a TARGET_LAG (how fresh you want it) that propagate changes through a DAG.
Think: Streams for ingest deltas, Tasks for imperative merges/ops, Dynamic Tables for declarative transforms.
Architecture at a glance
Landing → Raw
- Files arrive in an external/internal stage (e.g., S3). Use Snowpipe or COPY INTO to load RAW_ tables.
Incremental Transform
Two idioms—pick one per layer:
- Streams + Tasks (explicit control, great for upserts/SCD):
- A stream over RAW captures new/changed rows.
- A task fires only when the stream has data and runs a MERGE.
- Downstream tasks chain with AFTER.
- Dynamic Tables (declarative, graph-managed):
- Define DTs for BRONZE/SILVER/GOLD with TARGET_LAG and WAREHOUSE.
- Snowflake incrementally refreshes them; downstream DTs update automatically.
You can mix them: Streams+Tasks for stateful upserts, DTs for stateless transforms/joins.
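For instance, nothing stops a declarative DT from reading a table that a MERGE task maintains. A minimal sketch of that mix, assuming the bronze_orders table and warehouses built out in the examples below (the DT name here is illustrative):
-- Declarative rollup over the task-maintained bronze table
create or replace dynamic table dt_gold_from_bronze
  target_lag = '5 minutes'
  warehouse = etl_small
as
select
  date_trunc('day', ts) as order_date,
  sum(amount) as total_sales
from bronze_orders
group by 1;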
Quick comparison
| Feature | Streams + Tasks | Dynamic Tables |
|---|---|---|
| Control | Fine-grained, imperative SQL | Declarative; Snowflake manages refresh |
| Latency | Task cadence / event-driven | TARGET_LAG (e.g., 1 minute) |
| Best for | Upserts, SCD, idempotent merges, audit trails | Joins, filters, rollups, materialized views of logic |
| Orchestration | AFTER chain; WHEN guards | Automatic DAG from DT dependencies |
| Cost tuning | Choose warehouse per task; sleep if no work | Choose warehouse per DT; Snowflake prunes work |
| Pitfalls | Stream consumption semantics; multiple readers | Long join chains; “hidden” compute via cascades |
Real example: Ingest → Upsert (Streams + Tasks, zero cron)
1) Raw load (Snowpipe or COPY)
-- Raw landing table
create or replace table raw_orders (
order_id varchar,
customer_id varchar,
ts timestamp_ntz,
amount number(12,2),
_load_file varchar,
_ingested_at timestamp_ntz default current_timestamp()
);
-- Example COPY (Snowpipe auto-ingest recommended in production)
copy into raw_orders
from @ingest_stage/orders/
file_format = (type = csv field_optionally_enclosed_by='"' skip_header=1)
pattern='.*\.csv';
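If you go the auto-ingest route the comment above recommends, the same COPY can live inside a pipe. A minimal sketch, assuming @ingest_stage is an external stage with cloud event notifications already wired up (the pipe name is illustrative):
-- Snowpipe: files landing in the stage trigger the COPY automatically
create or replace pipe pipe_raw_orders
  auto_ingest = true
as
copy into raw_orders
from @ingest_stage/orders/
file_format = (type = csv field_optionally_enclosed_by='"' skip_header=1)
pattern = '.*\.csv';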
2) Create a stream on raw
create or replace stream raw_orders_stream
on table raw_orders
append_only = false; -- capture inserts + updates + deletes
3) Upsert into bronze via a Task that runs only when there’s work
create or replace table bronze_orders like raw_orders;
create or replace task t_upsert_bronze_orders
warehouse = etl_xs
-- No schedule: we’ll “chain-trigger” from a lightweight root,
-- and guard with WHEN to avoid idle runs
when system$stream_has_data('RAW_ORDERS_STREAM')
as
merge into bronze_orders b
using (
  -- updates arrive in the stream as delete + insert pairs; keep the insert side
  -- and de-dup to one row per key so the merge source is deterministic
  select order_id, customer_id, ts, amount
  from raw_orders_stream
  where metadata$action = 'INSERT'
  qualify row_number() over (partition by order_id order by ts desc) = 1
) d
on b.order_id = d.order_id
when matched then update set
  b.customer_id = d.customer_id,
  b.ts = d.ts,
  b.amount = d.amount
when not matched then insert (order_id, customer_id, ts, amount)
values (d.order_id, d.customer_id, d.ts, d.amount);
4) Minimal root “tick” task (can be very infrequent)
We still need some heartbeat to evaluate WHEN. Make it infrequent and cheap—no cron tuning, just a guard.
-- Fires every 5 minutes but does no work itself
create or replace task t_tick
schedule = '5 minute'
warehouse = etl_xs
as
call system$wait(1); -- waits one second and does no real work; running the root lets the downstream WHEN guards be evaluated
-- Chain the upsert after the tick
alter task t_upsert_bronze_orders add after t_tick;
-- Turn on the DAG
alter task t_tick resume;
alter task t_upsert_bronze_orders resume;
Why this is “zero cron jobs” in practice: you’re not hand-scheduling business logic. The tick just wakes the graph; WHEN system$stream_has_data ensures no compute if there’s no data. Make the tick 1, 5, or 15 minutes—whatever your freshness SLO needs.
Real example: Declarative pipeline (Dynamic Tables)
1) Bronze as a DT over raw
create or replace dynamic table dt_bronze_orders
target_lag = '1 minute'
warehouse = etl_xs
as
select
order_id,
customer_id,
ts,
amount
from raw_orders;
2) Silver: joins + cleaning
create or replace dynamic table dt_silver_orders
target_lag = '2 minutes'
warehouse = etl_small
as
select
b.order_id,
b.customer_id,
date_trunc('day', b.ts) as order_date,
coalesce(b.amount, 0) as amount -- treat missing amounts as zero
from dt_bronze_orders b;
3) Gold: daily aggregates
create or replace dynamic table dt_gold_daily_sales
target_lag = '5 minutes'
warehouse = etl_small
as
select
order_date,
sum(amount) as total_sales,
count(*) as orders
from dt_silver_orders
group by order_date;
Turn them on:
alter dynamic table dt_bronze_orders resume;
alter dynamic table dt_silver_orders resume;
alter dynamic table dt_gold_daily_sales resume;
Result: Snowflake maintains the DAG automatically; changes in raw_orders roll forward to gold within your TARGET_LAG windows.
Query patterns & indexing (Snowflake-style)
Snowflake doesn’t have user-managed indexes; performance hinges on:
- Clustering for large tables (a quick way to check it is shown after this list): alter table bronze_orders cluster by (to_date(ts), customer_id);
- Micro-partition pruning: design partition-friendly keys (e.g., dates, tenants).
- Result caching and Warehouse right-sizing for bursty loads.
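To check whether a clustering key is earning its keep, inspect clustering depth before and after reclustering. A quick sketch against the bronze_orders table above (the expressions must match the key you chose):
-- Reports clustering depth and partition overlap for the given expressions
select system$clustering_information('bronze_orders', '(to_date(ts), customer_id)');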
SCD Type 2 with Streams + Tasks (concise pattern)
-- Assume dim_customer_scd2(history table) and a stream on staging_customers
create or replace stream stg_customers_stream on table staging_customers;
create or replace task t_dim_customer_scd2
warehouse = etl_xs
when system$stream_has_data('STG_CUSTOMERS_STREAM')
as
merge into dim_customer_scd2 d
using (
select * from stg_customers_stream
) s
on d.customer_id = s.customer_id
and d.is_current = true
when matched and (hash(d.name, d.segment) <> hash(s.name, s.segment)) then
update set d.is_current = false, d.valid_to = current_timestamp()
when not matched then
insert (customer_id, name, segment, is_current, valid_from, valid_to)
values (s.customer_id, s.name, s.segment, true, current_timestamp(), null);
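Note that this MERGE closes the old current row but does not insert the replacement version for changed customers; a second DML is needed. Because a stream’s offset only advances when the transaction that read it commits, run both statements inside one explicit transaction (for example, from a stored procedure the task calls) so they see the same delta. A minimal sketch of that second step, assuming the same columns and one row per customer in the delta:
-- Step 2: open a new current version for customers whose old row was just closed.
-- New customers already got a current row from the MERGE, and unchanged customers
-- still have one, so NOT EXISTS targets only the changed customers.
insert into dim_customer_scd2 (customer_id, name, segment, is_current, valid_from, valid_to)
select s.customer_id, s.name, s.segment, true, current_timestamp(), null
from stg_customers_stream s
where not exists (
  select 1
  from dim_customer_scd2 d
  where d.customer_id = s.customer_id
    and d.is_current = true
);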
Monitoring & observability
- TASK_HISTORY() & INFORMATION_SCHEMA.TASK_HISTORY for outcomes/duration (example queries follow this list).
- SHOW DYNAMIC TABLES and SYSTEM$EXPLAIN_DYNAMIC_TABLE_REFRESH_PLAN to inspect the DT DAG and its work.
- Add telemetry tables: write a row per run with row counts from SYSTEM$STREAM_HAS_DATA, RESULT_SCAN() diagnostics, and QUERY_HISTORY() links.
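A couple of starting-point queries along those lines; column names reflect the INFORMATION_SCHEMA table functions as currently documented, so adjust if your account differs:
-- Recent task runs: outcome, error, and timing
select name, state, error_message, scheduled_time, completed_time
from table(information_schema.task_history(result_limit => 100))
order by scheduled_time desc;
-- Recent dynamic table refreshes and their timing
select name, state, refresh_start_time, refresh_end_time
from table(information_schema.dynamic_table_refresh_history())
order by refresh_start_time desc;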
Cost control playbook
- Use tiny warehouses (XS) for frequent, quick tasks; scale up only where joins/materializations demand it (a sample definition follows this list).
- Prefer WHEN system$stream_has_data and DT TARGET_LAG over tight schedules.
- Cluster only the big boys, and recluster on demand, not reflexively.
- Keep stream retention aligned to replay windows (governed by the table’s data retention and MAX_DATA_EXTENSION_TIME_IN_DAYS, default 14 days; verify for your account).
- Avoid multiple consumers of the same stream; fan-out with tables or views, not duplicate reads.
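As a concrete starting point, the kind of warehouse the task examples assume looks roughly like this (the name etl_xs matches the examples above; size and timeouts are suggestions, not prescriptions):
-- Small, auto-suspending warehouse for frequent, short-running ELT tasks
create warehouse if not exists etl_xs
  warehouse_size = 'XSMALL'
  auto_suspend = 90          -- seconds idle before suspend; see the pitfalls below
  auto_resume = true
  initially_suspended = true;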
Common pitfalls (and fixes)
- Consuming a stream twice → the second consumer sees nothing. Fix: a single consumer reads the stream and writes a staging table for others (see the sketch after this list).
- Orphaned streams with retention expired → deltas lost. Fix: monitor STALE_AFTER metrics; don’t pause upstream for long.
- DT surprise compute from long chains → Fix: set per-DT warehouses, widen TARGET_LAG where freshness is overkill, break up expensive joins.
- Non-idempotent merges → duplicates on retries. Fix: ensure deterministic MERGE keys and guard with de-dup staging CTEs.
- Warehouse thrash (auto-suspend/auto-resume flapping) → Fix: tune auto-suspend to 60–120s on chatty workloads.
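A minimal sketch of the single-consumer fan-out fix from the first pitfall, reusing raw_orders_stream and the tick task from earlier (the orders_changes table and task name are illustrative):
-- One task owns the stream and materializes each cycle's delta for everyone else
create or replace table orders_changes like raw_orders;
create or replace task t_fanout_orders_changes
  warehouse = etl_xs
  after t_tick
  when system$stream_has_data('RAW_ORDERS_STREAM')
as
insert overwrite into orders_changes
select order_id, customer_id, ts, amount, _load_file, _ingested_at
from raw_orders_stream;
-- Downstream tasks (bronze merge, audits, metrics) chain AFTER this task and read
-- orders_changes instead of the stream, so the stream is consumed exactly once per cycle.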
Putting it together (reference blueprint)
Option A: Event-driven ELT with Streams + Tasks
- Snowpipe loads → RAW_* tables.
- STREAM on raw → WHEN system$stream_has_data tasks.
- Chain tasks with AFTER into bronze/silver/gold merges.
- One tiny tick task wakes the graph; business logic only runs if there’s work.
Option B: Declarative ELT with Dynamic Tables
- Define DTs for bronze/silver/gold with explicit TARGET_LAG.
- Snowflake maintains the DAG and incremental refresh.
- Use Streams+Tasks selectively where stateful upserts/SCD are required.
Internal link ideas (for your site)
- “External Tables & Snowpipe: Designing the Landing Zone”
- “SCD Type 2 Patterns in Snowflake (Streams vs. Dynamic Tables)”
- “Cost Tuning Snowflake Warehouses for ELT”
- “Idempotent SQL: Writing Safe MERGE Statements”
- “Observability: Query History, Task History, and DT Refresh Plans”
Conclusion + Takeaways
Ditch cron tinkering. Anchor your Snowflake pipelines to data and freshness targets:
- Streams expose precisely what changed.
- Tasks run only when work exists and chain cleanly.
- Dynamic Tables maintain transformation graphs to an SLO you pick.
You’ll ship faster, debug less, and keep compute proportional to actual changes.
Image prompt
“A clean, modern data architecture diagram showing Snowflake stages ingesting to RAW, a Stream feeding a Task that MERGEs to BRONZE, and a parallel path of Dynamic Tables (BRONZE → SILVER → GOLD) with target lags. Minimalistic, high contrast, 3D isometric style, labeled arrows, Snowflake logo accents.”
Tags
#Snowflake #DataEngineering #ELT #DynamicTables #Streams #Tasks #CDC #Scalability #Architecture #BestPractices