Snowflake Cost-Saving Tactics

Snowflake Cost-Saving Tactics: Real SQL Techniques Using Dynamic Date Ranges and Partition Pruning

How to Stop Paying for Data You Don’t Need

Snowflake’s elasticity is a double-edged sword. Without discipline, costs spiral as queries scan terabytes of irrelevant data. But with smart SQL design, you can turn Snowflake’s architecture into a cost-saving superpower. Let’s dive into the exact techniques.

1. Dynamic Date Ranges: The $10M Mistake Most Teams Make

Hardcoding date filters (e.g., WHERE date BETWEEN ‘2024-01-01’ AND ‘2024-03-31’) is a silent budget killer. Why?

Queries break when dates change.
Full-table scans occur if dates aren’t aligned with partitions.
Engineers waste hours updating code.

The Fix: Dynamic Date Logic in SQL

Use Snowflake’s date functions to automate range selection.

Example 1: Rolling 7-Day Window

SELECT 
  user_id, 
  SUM(revenue) AS total_revenue  
FROM fact_sales  
WHERE sale_date >= CURRENT_DATE() - 7  -- Automatically adjusts daily  
  AND sale_date < CURRENT_DATE()  
GROUP BY 1;

Impact: Scans only 7 days of data instead of entire history.

Example 2: Month-to-Date (MTD) Analysis

SELECT  
  product_id,  
  COUNT(*) AS orders  
FROM orders  
WHERE order_date >= DATE_TRUNC('MONTH', CURRENT_DATE())  -- First day of current month  
GROUP BY 1;

Impact: Avoids scanning prior months’ data.

2. Partition Pruning: Snowflake’s Secret Weapon

Snowflake auto-partitions data into micro-partitions, but it can’t prune what it can’t see. Your SQL must give Snowflake hints to skip irrelevant partitions.

How to Force Pruning

Step 1: Cluster Tables by Date

-- Create a clustered table  
CREATE TABLE fact_sales  
CLUSTER BY (sale_date)  
AS  
SELECT * FROM raw_sales;

Clustering by sale_date physically groups data, making pruning efficient.

Step 2: Filter on Cluster Keys

SELECT *  
FROM fact_sales  
WHERE sale_date = '2024-05-15';  -- Snowflake skips all other partitions

Step 3: Verify with EXPLAIN

Check if pruning worked using EXPLAIN:

EXPLAIN  
SELECT ...  
WHERE sale_date >= CURRENT_DATE - 7;

Look for “Partition pruning: 1000/1000 partitions” in the plan.

3. Advanced Tactics: Subquery Pruning & Semi-Structured Data

Tactic 1: Prune with Subqueries

Instead of scanning entire tables, filter early:

WITH active_users AS (  
  SELECT user_id  
  FROM dim_users  
  WHERE last_login_date >= CURRENT_DATE - 30  
)  
SELECT  
  u.user_id,  
  SUM(s.revenue)  
FROM fact_sales s  
JOIN active_users u ON s.user_id = u.user_id  -- Prunes sales table during join  
WHERE s.sale_date >= CURRENT_DATE - 30;

Why It Works: Snowflake prunes fact_sales using both sale_date and the join.

Tactic 2: Semi-Structured Data Optimization

For JSON/XML data, use LATERAL FLATTEN with filters:

SELECT  
  event_data:user_id::STRING AS user_id,  
  event_data:amount::FLOAT AS amount  
FROM raw_events  
WHERE event_date >= CURRENT_DATE - 3  
  AND PARSE_JSON(event_data):amount > 100;  -- Filter before flattening

Impact: Filters JSON rows before flattening, reducing compute.

4. Common Pitfalls (and How to Avoid Them)

Pitfall 1: Functions on Partitioned Columns

This disables pruning:

SELECT *  
FROM fact_sales  
WHERE YEAR(sale_date) = 2024;  -- Snowflake can’t map this to partitions

Fix:

WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31';

Pitfall 2: Over-Partitioning

Too many partitions slow metadata lookups. Cluster by date + high-cardinality columns (e.g., (sale_date, region_id)).

5. Real-World Savings: A Case Study

A retail client reduced monthly Snowflake costs from 85kto85kto49k by:

Replacing static date filters with dynamic ranges.
Clustering 10 TB of sales data by sale_date.
Rewriting 60+ reports to use partition-aware joins.

Key Metrics:

Average query scan size: ↓ 72%
Warehouse runtime: ↓ 58%

Your Action Plan

Audit Queries: Use QUERY_HISTORY to find full-table scans:
Implement Dynamic Dates: Refactor hardcoded filters.
Cluster & Verify: Recluster tables and check EXPLAIN plans.

Final Thought

Snowflake’s pay-as-you-go model rewards the efficient. By mastering dynamic ranges and pruning, you’re not just saving money—you’re building a culture of data engineering excellence.

What’s your top Snowflake cost-saving hack? Share in the comments!

Hashtags

Data/ML Engineer Blog