Snowflake Procedures & Tasks: The Reliable Automation Backbone for Your Data Platform
Hook: You’ve got hourly ingestions, daily dimensional upserts, and a weekly backfill that only runs if yesterday’s load succeeded. If you’re still wiring this together in your orchestrator only, you’re leaving reliability (and visibility) on the table. In Snowflake, procedures do the work and tasks schedule and chain it—right where the data lives.
Why this matters (fast context)
- Closer to the data = fewer moving parts. No flaky network hops, no external runners just to execute DML.
- Built-in DAGs and lineage in Snowflake. You can schedule, chain, and observe with native primitives.
- Polyglot execution. Use SQL (Snowflake Scripting), JavaScript, or Python (Snowpark) for the best-fit job.
- Idempotent, transactional ELT. Wrap units of work with atomicity and retries.
Core concepts at a glance
- Stored Procedures: Reusable server-side routines that can contain control flow, error handling, and DML/DDL.
- Languages: SQL (Snowflake Scripting), JavaScript, Python (Snowpark).
- Execution rights: CALLER'S RIGHTS vs. OWNER'S RIGHTS matters for privileges and object access.
- Tasks: Schedules (CRON or interval) and DAG chaining via AFTER. Each task runs a single SQL statement (often CALL my_proc(...)) on a warehouse or serverless compute.
Architecture: How Procedures and Tasks fit together
- Ingestion lands raw data (stages/files → external/internal tables).
- Stream tracks changes (insert-only or CDC); see the stream sketch below.
- Procedure does a targeted merge/upsert into curated tables, with validation and metrics.
- Task schedules that procedure and chains dependencies:
load_raw → transform_dim → transform_fact → quality_gate → publish.
Visually, think of a Snowflake-native DAG: each node is a TASK calling a PROCEDURE. Success/failure is recorded for audit and re-runs.
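The examples later in this post consume a change-tracking stream (STG.CUSTOMER_STREAM); a minimal sketch of creating one, assuming the raw table is named RAW.CUSTOMER:
-- Illustrative: the stream records inserts/updates/deletes on the raw table since its last read,
-- so the downstream merge only touches what changed.
CREATE OR REPLACE STREAM STG.CUSTOMER_STREAM
ON TABLE RAW.CUSTOMER;
-- For insert-only sources, add APPEND_ONLY = TRUE to track inserts only.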
When to choose which procedure language
| Criterion | SQL (Snowflake Scripting) | JavaScript | Python (Snowpark) |
|---|---|---|---|
| Primary workload | DDL/DML-heavy, set-based ops | Mixed control-flow + SQL | DataFrames, UDFs, Python libs |
| Learning curve | Easiest for SQL folks | Moderate | Moderate (package mgmt) |
| Perf profile | Great for set-based SQL | Great for orchestrating SQL | Great for data transforms in DF API |
| External deps | None | None (usually) | Requires PACKAGES, version mgmt |
| Use-cases | MERGE, loops, error handling in SQL | Multi-step jobs, conditional SQL | Complex transforms, ML prep, feature eng |
Rule of thumb: If it’s mostly set-based SQL, use SQL procedures. If you need flexible control flow around SQL, JavaScript is fine. If you want DataFrame APIs or Python libs inside Snowflake, use Snowpark Python.
Concrete examples
1) Snowflake Scripting Procedure (SQL) — Idempotent upsert with quality checks
CREATE OR REPLACE PROCEDURE DW.LOAD_DIM_CUSTOMER()
RETURNS VARCHAR
LANGUAGE SQL
EXECUTE AS OWNER
AS
$$
DECLARE
v_rows_loaded NUMBER := 0;
v_error_msg VARCHAR;
BEGIN
-- Upsert from stream into dimension
MERGE INTO DW.DIM_CUSTOMER d
USING (
SELECT * FROM STG.CUSTOMER_STREAM -- stream on raw/customer
) s
ON d.CUSTOMER_ID = s.CUSTOMER_ID
WHEN MATCHED THEN UPDATE SET
EMAIL = s.EMAIL,
UPDATED_AT = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN INSERT (
CUSTOMER_ID, EMAIL, CREATED_AT, UPDATED_AT
) VALUES (
s.CUSTOMER_ID, s.EMAIL, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP()
);
v_rows_loaded := SQLROWCOUNT;
-- Simple quality gate
IF v_rows_loaded = 0 THEN
RETURN 'No changes found — nothing to load.';
END IF;
-- Record a metric for observability
INSERT INTO DW.METRICS_LOAD_AUDIT(JOB_NAME, ROWS_AFFECTED, RUNTIME_TS)
VALUES ('LOAD_DIM_CUSTOMER', v_rows_loaded, CURRENT_TIMESTAMP());
RETURN 'Loaded rows: ' || v_rows_loaded;
EXCEPTION
WHEN OTHER THEN
-- Capture the error message in a local variable so it can be bound in the INSERT
v_error_msg := SQLERRM;
INSERT INTO DW.METRICS_LOAD_AUDIT(JOB_NAME, ERROR, RUNTIME_TS)
VALUES ('LOAD_DIM_CUSTOMER', :v_error_msg, CURRENT_TIMESTAMP());
RAISE;
END;
$$;
2) JavaScript Procedure — Orchestrate conditional logic
CREATE OR REPLACE PROCEDURE DW.RUN_FACT_LOAD()
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS OWNER
AS
$$
var sqlCmd;
sqlCmd = "CALL DW.LOAD_DIM_CUSTOMER()";
snowflake.execute({sqlText: sqlCmd});
sqlCmd = "SELECT COUNT(*) AS C FROM STG.SALES_STREAM";
var rs = snowflake.execute({sqlText: sqlCmd});
rs.next();
var pending = rs.getColumnValue('C');
if (pending > 0) {
snowflake.execute({sqlText: "CALL DW.LOAD_FACT_SALES()"});
return "FACT load executed. Pending rows: " + pending;
} else {
return "No sales changes; FACT load skipped.";
}
$$;
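The procedure above also calls DW.LOAD_FACT_SALES(), which isn't defined in this post; a minimal sketch of what it could look like, assuming a STG.SALES_STREAM feeding DW.FACT_SALES (column names are illustrative):
CREATE OR REPLACE PROCEDURE DW.LOAD_FACT_SALES()
RETURNS VARCHAR
LANGUAGE SQL
EXECUTE AS OWNER
AS
$$
BEGIN
-- Append pending sales changes from the stream into the fact table
INSERT INTO DW.FACT_SALES (SALE_ID, CUSTOMER_ID, AMOUNT, SOLD_AT, LOADED_AT)
SELECT SALE_ID, CUSTOMER_ID, AMOUNT, SOLD_AT, CURRENT_TIMESTAMP()
FROM STG.SALES_STREAM;
RETURN 'Loaded rows: ' || SQLROWCOUNT;
END;
$$;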
3) Snowpark Python Procedure — DataFrame transform with package pinning
CREATE OR REPLACE PROCEDURE DW.AGG_SESSIONS_DAILY()
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'main'
EXECUTE AS OWNER
AS
$$
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as ssum
def main(session: Session) -> str:
    df = session.table("RAW.SESSIONS")
    agg = (df.group_by(col("event_date"))
             .agg(ssum(col("duration")).alias("total_duration")))
    agg.write.save_as_table("DW.SESSIONS_DAILY", mode="overwrite")
    return "Aggregated daily sessions."
$$;
Scheduling & chaining with Tasks
Basic time-based schedule (CRON)
CREATE OR REPLACE TASK DW.TASK_LOAD_DIM_CUSTOMER
WAREHOUSE = ETL_WH
SCHEDULE = 'USING CRON 0 * * * * UTC' -- top of each hour
AS
CALL DW.LOAD_DIM_CUSTOMER();
ALTER TASK DW.TASK_LOAD_DIM_CUSTOMER RESUME;
Build a DAG with AFTER
CREATE OR REPLACE TASK DW.TASK_RUN_FACT_LOAD
WAREHOUSE = ETL_WH
AFTER DW.TASK_LOAD_DIM_CUSTOMER
AS
CALL DW.RUN_FACT_LOAD();
-- Enable the chain
ALTER TASK DW.TASK_LOAD_DIM_CUSTOMER RESUME;
ALTER TASK DW.TASK_RUN_FACT_LOAD RESUME;
Serverless Tasks (no warehouse selection)
CREATE OR REPLACE TASK DW.TASK_AGG_SESSIONS_DAILY
SCHEDULE = 'USING CRON 5 2 * * * UTC'
USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
AS
CALL DW.AGG_SESSIONS_DAILY();
Notes
- Use serverless when you don’t want to manage a warehouse; Snowflake bills per-second for compute used.
- For DAG enable/disable, resume/suspend the root and its children, or use SYSTEM$TASK_DEPENDENTS_ENABLE to resume a root and all of its dependents in one call (sketch below).
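A quick sketch using the task from the examples above:
-- Resume the root task and all of its downstream dependents in one call
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('DW.TASK_LOAD_DIM_CUSTOMER');
-- Suspending the root stops future DAG runs; children never fire without it
ALTER TASK DW.TASK_LOAD_DIM_CUSTOMER SUSPEND;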
Observability & operations
- Run history: query SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY (or the INFORMATION_SCHEMA.TASK_HISTORY table function) for executions, states, and error messages; a sample query follows this list.
- Procedure calls appear in QUERY_HISTORY with text like CALL DW.PROC(...).
- Manual re-run: EXECUTE TASK for an immediate run, or CALL the procedure directly.
- Pausing the DAG: ALTER TASK ... SUSPEND; (and RESUME;). Suspend the root to halt downstream tasks.
- Least surprise: configure failure notifications (e.g., an error notification integration on the task) or hook in external observability (webhooks, Slack, etc.) via your orchestrator.
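A sample run-history check, assuming the task name from the examples above:
-- Executions of one task over the last 7 days, most recent first
SELECT NAME, STATE, ERROR_MESSAGE, SCHEDULED_TIME, COMPLETED_TIME
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
  SCHEDULED_TIME_RANGE_START => DATEADD('day', -7, CURRENT_TIMESTAMP()),
  TASK_NAME => 'TASK_LOAD_DIM_CUSTOMER'))
ORDER BY SCHEDULED_TIME DESC;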
Best practices (what actually works in production)
Architecture
- Thin tasks, fat procedures. Keep tasks as simple callers; put logic in procedures where you can test and reuse.
- Small transactional units. Prefer multiple targeted procedures over one giant “do everything” job; easier retries and lower blast radius.
- Idempotency first. Use Streams or natural keys to ensure re-runs don’t double-count. Write MERGEs, not blind INSERTs.
- Use OWNER'S RIGHTS for platform-owned pipelines to avoid per-caller grant headaches. For self-service utilities, use CALLER'S RIGHTS.
Performance & cost
- Choose warehouse sizes intentionally; many short tasks → smaller size + auto-suspend.
- For spiky schedules, serverless tasks simplify ops and can be cheaper.
- Batch small steps into one procedure call to amortize startup overhead.
Reliability
- Add row-count assertions and write to an audit table per job (input rows, affected rows, duration, watermark); a sketch of such a table follows this list.
- Put quality gates (e.g., “fail if < X rows on weekday”) to catch silent upstream issues.
- Avoid circular dependencies in AFTER chains; keep DAGs shallow and legible.
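A minimal sketch of the audit table the first procedure writes to (columns match that example; extend with duration and a watermark as needed):
CREATE TABLE IF NOT EXISTS DW.METRICS_LOAD_AUDIT (
JOB_NAME VARCHAR,
ROWS_AFFECTED NUMBER,
ERROR VARCHAR,
RUNTIME_TS TIMESTAMP_LTZ
);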
Security
- Separate execution roles (tasks/procs) from developer roles; see the grant sketch after this list.
- Use EXECUTE AS OWNER plus a dedicated role to centralize privileges.
- Keep secrets in Snowflake secrets management; don't embed credentials in code.
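A minimal sketch of that role separation, assuming a dedicated ETL_ROLE (role and warehouse names are illustrative):
CREATE ROLE IF NOT EXISTS ETL_ROLE;
GRANT USAGE ON WAREHOUSE ETL_WH TO ROLE ETL_ROLE;
GRANT EXECUTE TASK ON ACCOUNT TO ROLE ETL_ROLE; -- required to run tasks
GRANT EXECUTE MANAGED TASK ON ACCOUNT TO ROLE ETL_ROLE; -- required for serverless tasks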
Deployability
- Treat procedures as code: version them, review PRs, run integration tests that CALL procs in a dev schema.
- Pin Snowpark RUNTIME_VERSION and PACKAGES for reproducibility.
- Use CREATE OR REPLACE with care in prod; a two-step deploy (new name → cutover) avoids breaking in-flight callers (sketch below).
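One way that two-step cutover can look (the _V2 name and the rename steps are illustrative, not a prescribed pattern):
-- Step 1: deploy the new version under a temporary name and test it
CREATE OR REPLACE PROCEDURE DW.LOAD_DIM_CUSTOMER_V2()
RETURNS VARCHAR
LANGUAGE SQL
EXECUTE AS OWNER
AS
$$
BEGIN
RETURN 'new logic goes here';
END;
$$;
CALL DW.LOAD_DIM_CUSTOMER_V2(); -- integration test before cutover
-- Step 2: cut over; callers keep using DW.LOAD_DIM_CUSTOMER()
ALTER PROCEDURE DW.LOAD_DIM_CUSTOMER() RENAME TO DW.LOAD_DIM_CUSTOMER_OLD;
ALTER PROCEDURE DW.LOAD_DIM_CUSTOMER_V2() RENAME TO DW.LOAD_DIM_CUSTOMER;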
Common pitfalls
- Warehouse misconfiguration (or none for non-serverless tasks) → task never runs.
- Wrong rights mode → proc works in dev but fails in prod for other roles.
- Non-idempotent loads → duplicates on retries.
- Long transactions → lock contention and timeouts; chunk large merges.
- Hidden failures when tasks are suspended after errors; always check history or alerting.
A simple, production-ready DAG you can copy
-- 1) Root task: hourly dimension load
CREATE OR REPLACE TASK DW.T_DIM_CUSTOMER
WAREHOUSE = ETL_WH
SCHEDULE = 'USING CRON 0 * * * * UTC'
AS
CALL DW.LOAD_DIM_CUSTOMER();
-- 2) Downstream fact load only if dim succeeded
CREATE OR REPLACE TASK DW.T_FACT_SALES
WAREHOUSE = ETL_WH
AFTER DW.T_DIM_CUSTOMER
AS
CALL DW.RUN_FACT_LOAD();
-- 3) Daily aggregates via serverless
CREATE OR REPLACE TASK DW.T_AGG_SESSIONS_DAILY
AFTER DW.T_FACT_SALES
USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
AS
CALL DW.AGG_SESSIONS_DAILY();
ALTER TASK DW.T_DIM_CUSTOMER RESUME;
ALTER TASK DW.T_FACT_SALES RESUME;
ALTER TASK DW.T_AGG_SESSIONS_DAILY RESUME;
Internal link ideas (for your site)
- Streams & Change Data Capture in Snowflake (how to build append-only patterns)
- Snowpark Python: Packaging, UDFs, and Stored Procedures
- Data Quality Gates with SQL: Assertions & Audit Tables
- Serverless vs Warehouse-Backed Tasks: Cost & Performance
- Designing Idempotent MERGE Patterns for ELT
Summary & call to action
Bottom line: Put your routine ELT where your data lives. Use procedures to encapsulate idempotent, testable logic and tasks to schedule and chain it into DAGs. You’ll get fewer gremlins, faster recovery, and better observability—without duct-taping external runners.
Want a quick review of your current Snowflake DAG or help converting Python-only jobs into Snowpark procedures with serverless tasks? Share your schema and run cadence—I’ll critique it ruthlessly and propose a hardened design.
Image prompt
“A clean, modern data architecture diagram showing a Snowflake-native DAG: tasks (nodes) triggering stored procedures across raw → stage → dim → fact layers, with Streams feeding changes and audit tables capturing metrics — minimalistic, high contrast, isometric style.”
Tags
#Snowflake #DataEngineering #ELT #Snowpark #StoredProcedures #Tasks #DataPipelines #SQL #Scalability #Reliability




