Skip to content
  • Wednesday, 9 July 2025
  • 11:39:57 PM
  • Follow Us
Data Engineer

Data/ML Engineer Blog

  • Home
  • AL/ML Engineering
    • AWS AI/ML Services
    • Compute & Deployment
    • Core AI & ML Concepts
      • Data Processing & ETL
      • Decision Trees
      • Deep Learning
      • Generative AI
      • K-Means Clustering
      • Machine Learning
      • Neural Networks
      • Reinforcement Learning
      • Supervised Learning
      • Unsupervised Learning
    • Database & Storage Services
    • Emerging AI Trends
    • Evaluation Metrics
    • Industry Applications of AI
    • MLOps & DevOps for AI
    • Model Development & Optimization
    • Prompting Techniques
      • Adversarial Prompting
      • Chain-of-Thought Prompting
      • Constitutional AI Prompting
      • Few-Shot Prompting
      • Instruction Prompting
      • Multi-Agent Prompting
      • Negative Prompting
      • Prompt Templates
      • ReAct Prompting
      • Retrieval-Augmented Generation (RAG)
      • Self-Consistency Prompting
      • Zero-Shot Prompting
    • Security & Compliance
      • AWS KMS
      • AWS Macie
      • Azure Key Vault
      • Azure Purview
      • BigID
      • Cloud DLP
      • Collibra Privacy & Risk
      • HashiCorp Vault
      • Immuta
      • Okera
      • OneTrust
      • Privacera
      • Satori
  • Data Engineering
    • Cloud Platforms & Services
      • Alibaba Cloud
      • AWS (Amazon Web Services)
      • Azure Microsoft
      • Google Cloud Platform (GCP)
      • IBM Cloud
      • Oracle Cloud
    • Containerization & Orchestration
      • Amazon EKS
      • Apache Oozie
      • Azure Kubernetes Service (AKS)
      • Buildah
      • Containerd
      • Docker
      • Docker Swarm
      • Google Kubernetes Engine (GKE)
      • Kaniko
      • Kubernetes
      • Podman
      • Rancher
      • Red Hat OpenShift
    • Data Catalog & Governance
      • Amundsen
      • Apache Atlas
      • Apache Griffin
      • Atlan
      • AWS Glue
      • Azure Purview
      • Collibra
      • Databand
      • DataHub
      • Deequ
      • Google Data Catalog
      • Google Dataplex
      • Great Expectations
      • Informatica
      • Marquez
      • Monte Carlo
      • OpenLineage
      • OpenMetadata
      • Soda SQL
      • Spline
    • Data Ingestion & ETL
      • Apache Kafka Connect
      • Apache NiFi
      • Census
      • Confluent Platform
      • Debezium
      • Fivetran
      • Hightouch
      • Informatica PowerCenter
      • Kettle
      • Matillion
      • Microsoft SSIS
      • Omnata
      • Polytomic
      • Stitch
      • StreamSets
      • Striim
      • Talend
    • Data Lakes & File Standards
      • Amazon S3
      • Apache Arrow
      • Apache Avro
      • Apache Iceberg
      • Azure Data Lake Storage
      • CSV
      • Databricks Delta Lake
      • Dremio
      • Dremio
      • Feather
      • Google Cloud Storage
      • JSON
      • ORC
      • Parquet
    • Data Platforms
      • Cloud Data Warehouses
        • ClickHouse
        • Databricks
        • Snowflake
          • Internal and External Staging in Snowflake
          • Network Rules in Snowflake
          • Procedures + Tasks
          • Snowflake administration and configuration
          • Snowflake Cloning
      • Cloudera Data Platform
      • NoSQL Databases
      • On-Premises Data Warehouses
        • DuckDB
      • Relational Databases
        • Amazon Aurora
        • Azure SQL Database
        • Google Cloud SQL
        • MariaDB
        • Microsoft SQL Server
        • MySQL
        • Oracle Database
        • PostgreSQL
    • Data Streaming & Messaging
      • ActiveMQ
      • Aiven for Kafka
      • Amazon Kinesis
      • Amazon MSK
      • Apache Kafka
      • Apache Pulsar
      • Azure Event Hubs
      • Confluent Platform
      • Google Pub/Sub
      • IBM Event Streams
      • NATS
      • Protocol Buffers
      • RabbitMQ
      • Red Hat AMQ Streams
    • Data Warehouse Design
      • Data Governance and Management (DGaM)
        • Compliance Requirements
        • Data Lineage
        • Data Retention Policies
        • Data Stewardship
        • Master Data Management
      • Data Warehouse Architectures (DWA)
        • Enterprise Data Warehouse vs. Data Marts
        • Hub-and-Spoke Architecture
        • Logical vs. Physical Data Models
        • ODS (Operational Data Store)
        • Staging Area Design
      • Data Warehouse Schemas (DWS)
        • Data Vault
        • Galaxy Schema (Fact Constellation)
        • Inmon (Normalized) Approach
        • Kimball (Dimensional) Approach
        • Snowflake Schema
        • Star Schema
      • Database Normalization
      • Dimensional Modeling Techniques (DMT)
        • Bridge Tables
        • Conformed Dimensions
        • Degenerate Dimensions
        • Junk Dimensions
        • Mini-Dimensions
        • Outrigger Dimensions
        • Role-Playing Dimensions
      • ETL/ELT Design Patterns
        • Change Data Capture (CDC)
        • Data Pipeline Architectures
        • Data Quality Management
        • Error Handling
        • Metadata Management
      • Fact Table Design Patterns(FTDP)
        • Accumulating Snapshot Fact Tables
        • Aggregate Fact Tables
        • Factless Fact Tables
        • Periodic Snapshot Fact Tables
        • Transaction Fact Tables
      • Modern Data Warehouse Concepts (MDWC)
        • Data Lakehouse
        • Medallion Architecture
        • Multi-modal Persistence
        • Polyglot Data Processing
        • Real-time Data Warehousing
      • Performance Optimization (PO)
        • Compression Techniques
        • Indexing Strategies
        • Materialized Views
        • Partitioning
        • Query Optimization
      • Slowly Changing Dimensions(SCD)
        • SCD Type 0
        • SCD Type 1
        • SCD Type 2
        • SCD Type 3
        • SCD Type 4
        • SCD Type 6
        • SCD Type 7
    • Distributed Data Processing
      • Apache Beam
      • Apache Flink
      • Apache Hadoop
      • Apache Hive
      • Apache Pig
      • Apache Pulsar
      • Apache Samza
      • Apache Sedona
      • Apache Spark
      • Apache Storm
      • Presto/Trino
      • Spark Streaming
    • Infrastructure as Code & Deployment
      • Ansible
      • Argo CD
      • AWS CloudFormation
      • Azure Resource Manager Templates
      • Chef
      • CircleCI
      • GitHub Actions
      • GitLab CI/CD
      • Google Cloud Deployment Manager
      • Jenkins
      • Pulumi
      • Puppet: Configuration Management Tool for Modern Infrastructure
      • Tekton
      • Terraform
      • Travis CI
    • Monitoring & Logging
      • AppDynamics
      • Datadog
      • Dynatrace
      • ELK Stack
      • Fluentd
      • Graylog
      • Loki
      • Nagios
      • New Relic
      • Splunk
      • Vector
      • Zabbix
    • Operational Systems (OS)
      • Ubuntu
        • Persistent Tasks on Ubuntu
      • Windows
    • Programming Languages
      • Go
      • Java
      • Julia
      • Python
        • Dask
        • NumPy
        • Pandas
        • PySpark
        • SQLAlchemy
      • R
      • Scala
      • SQL
    • Visualization Tools
      • Grafana
      • Kibana
      • Looker
      • Metabase
      • Mode
      • Power BI
      • QuickSight
      • Redash
      • Superset
      • Tableau
    • Workflow Orchestration
      • Apache Airflow
      • Apache Beam Python SDK
      • Azkaban
      • Cron
      • Dagster
      • Dagster Change
      • DBT (data build tool)
      • Jenkins Job Builder
      • Keboola
      • Luigi
      • Prefect
      • Rundeck
      • Temporal
  • Home
  • Snowflake Cost-Saving Tactics
Data Snowflake

Snowflake Cost-Saving Tactics

Alex Apr 10, 2025 0
Snowflake Cost-Saving Tactics: Real SQL Techniques Using Dynamic Date Ranges and Partition Pruning

Snowflake Cost-Saving Tactics: Real SQL Techniques Using Dynamic Date Ranges and Partition Pruning

How to Stop Paying for Data You Don’t Need

Snowflake’s elasticity is a double-edged sword. Without discipline, costs spiral as queries scan terabytes of irrelevant data. But with smart SQL design, you can turn Snowflake’s architecture into a cost-saving superpower. Let’s dive into the exact techniques.


1. Dynamic Date Ranges: The $10M Mistake Most Teams Make

Hardcoding date filters (e.g., WHERE date BETWEEN ‘2024-01-01’ AND ‘2024-03-31’) is a silent budget killer. Why?

  • Queries break when dates change.
  • Full-table scans occur if dates aren’t aligned with partitions.
  • Engineers waste hours updating code.

The Fix: Dynamic Date Logic in SQL

Use Snowflake’s date functions to automate range selection.

Example 1: Rolling 7-Day Window

SELECT 
  user_id, 
  SUM(revenue) AS total_revenue  
FROM fact_sales  
WHERE sale_date >= CURRENT_DATE() - 7  -- Automatically adjusts daily  
  AND sale_date < CURRENT_DATE()  
GROUP BY 1; 

Impact: Scans only 7 days of data instead of entire history.

Example 2: Month-to-Date (MTD) Analysis

SELECT  
  product_id,  
  COUNT(*) AS orders  
FROM orders  
WHERE order_date >= DATE_TRUNC('MONTH', CURRENT_DATE())  -- First day of current month  
GROUP BY 1; 

Impact: Avoids scanning prior months’ data.


2. Partition Pruning: Snowflake’s Secret Weapon

Snowflake auto-partitions data into micro-partitions, but it can’t prune what it can’t see. Your SQL must give Snowflake hints to skip irrelevant partitions.

How to Force Pruning

Step 1: Cluster Tables by Date

-- Create a clustered table  
CREATE TABLE fact_sales  
CLUSTER BY (sale_date)  
AS  
SELECT * FROM raw_sales; 

Clustering by sale_date physically groups data, making pruning efficient.

Step 2: Filter on Cluster Keys

SELECT *  
FROM fact_sales  
WHERE sale_date = '2024-05-15';  -- Snowflake skips all other partitions 

Step 3: Verify with EXPLAIN

Check if pruning worked using EXPLAIN:

EXPLAIN  
SELECT ...  
WHERE sale_date >= CURRENT_DATE - 7; 

Look for “Partition pruning: 1000/1000 partitions” in the plan.


3. Advanced Tactics: Subquery Pruning & Semi-Structured Data

Tactic 1: Prune with Subqueries

Instead of scanning entire tables, filter early:

WITH active_users AS (  
  SELECT user_id  
  FROM dim_users  
  WHERE last_login_date >= CURRENT_DATE - 30  
)  
SELECT  
  u.user_id,  
  SUM(s.revenue)  
FROM fact_sales s  
JOIN active_users u ON s.user_id = u.user_id  -- Prunes sales table during join  
WHERE s.sale_date >= CURRENT_DATE - 30; 

Why It Works: Snowflake prunes fact_sales using both sale_date and the join.

Tactic 2: Semi-Structured Data Optimization

For JSON/XML data, use LATERAL FLATTEN with filters:

SELECT  
  event_data:user_id::STRING AS user_id,  
  event_data:amount::FLOAT AS amount  
FROM raw_events  
WHERE event_date >= CURRENT_DATE - 3  
  AND PARSE_JSON(event_data):amount > 100;  -- Filter before flattening 

Impact: Filters JSON rows before flattening, reducing compute.


4. Common Pitfalls (and How to Avoid Them)

Pitfall 1: Functions on Partitioned Columns

This disables pruning:

SELECT *  
FROM fact_sales  
WHERE YEAR(sale_date) = 2024;  -- Snowflake can’t map this to partitions 

Fix:

WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'; 

Pitfall 2: Over-Partitioning

Too many partitions slow metadata lookups. Cluster by date + high-cardinality columns (e.g., (sale_date, region_id)).


5. Real-World Savings: A Case Study

A retail client reduced monthly Snowflake costs from 85kto85kto49k by:

  1. Replacing static date filters with dynamic ranges.
  2. Clustering 10 TB of sales data by sale_date.
  3. Rewriting 60+ reports to use partition-aware joins.

Key Metrics:

  • Average query scan size: ↓ 72%
  • Warehouse runtime: ↓ 58%

Your Action Plan

  1. Audit Queries: Use QUERY_HISTORY to find full-table scans:
  2. Implement Dynamic Dates: Refactor hardcoded filters.
  3. Cluster & Verify: Recluster tables and check EXPLAIN plans.

Final Thought

Snowflake’s pay-as-you-go model rewards the efficient. By mastering dynamic ranges and pruning, you’re not just saving money—you’re building a culture of data engineering excellence.

What’s your top Snowflake cost-saving hack? Share in the comments!


Hashtags


BigDataCloudComputingCostOptimizationDataEngineeringFinOpssnowflakeSQL
Alex

Website: https://www.kargin-utkin.com

Related Story
The Great ETL Migration
Data ETL/ELT
The Great ETL Migration
Alex Jul 6, 2025
June 2025: The Month Data Engineering Got Seriously Competitive
Data Monthly
June 2025
Alex Jul 2, 2025
AI Copilots Are Replacing
AI Data ETL/ELT
How AI Copilots Are Replacing Manual Data Pipeline
Alex Jun 28, 2025
IaC Horror Stories
Data
IaC Horror Stories
Alex Jun 26, 2025
Building a Sub-Second Analytics Platform
ClickHouse Data OpenSource
Building a Sub-Second Analytics Platform
Alex Jun 24, 2025
The Evolution of Data Architecture
Data Structure
The Evolution of Data Architecture
Alex Jun 21, 2025
Data Modeling Revolution: Why Old Rules Are Killing Your Performance
Data DataLake
Data Modeling Concepts
Alex Jun 20, 2025
Data Mesh
Data DataLake ETL/ELT
The Hidden Economics of Data Mesh
Alex Jun 19, 2025
The Hidden Psychology of ETL
Data ETL/ELT
The Hidden Psychology of ETL
Alex Jun 18, 2025
The Unstructured Data Breakthrough
Data
The Unstructured Data Breakthrough
Alex Jun 17, 2025

Leave a Reply
Cancel reply

Your email address will not be published. Required fields are marked *

Recent Posts

  • The Great ETL Migration
  • June 2025
  • How AI Copilots Are Replacing Manual Data Pipeline
  • IaC Horror Stories
  • Building a Sub-Second Analytics Platform

Recent Comments

  1. smortergiremal on Comparison of Equivalent Cloud Services Across AWS, Google Cloud, and Azure
  2. Ustas on The Genius of Snowflake’s Hybrid Architecture: Revolutionizing Data Warehousing

Archives

  • July 2025
  • June 2025
  • May 2025
  • April 2025
  • March 2025
  • February 2025
  • January 2025
  • November 2024
  • October 2024
  • September 2024
  • August 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023

Categories

  • AI
  • Analytics
  • AWS
  • ClickHouse
  • Data
  • Databricks
  • DataLake
  • DuckDB
  • ETL/ELT
  • Future
  • ML
  • Monthly
  • OpenSource
  • Snowflake
  • StarRock
  • Structure
  • VS
YOU MAY HAVE MISSED
The Great ETL Migration
Data ETL/ELT
The Great ETL Migration
Alex Jul 6, 2025
June 2025: The Month Data Engineering Got Seriously Competitive
Data Monthly
June 2025
Alex Jul 2, 2025
AI Copilots Are Replacing
AI Data ETL/ELT
How AI Copilots Are Replacing Manual Data Pipeline
Alex Jun 28, 2025
IaC Horror Stories
Data
IaC Horror Stories
Alex Jun 26, 2025

(c) Data/ML Engineer Blog