Skip to content
  • Saturday, 21 June 2025
  • 10:13:28 PM
  • Follow Us
Data Engineer

Data/ML Engineer Blog

  • Home
  • AL/ML Engineering
    • AWS AI/ML Services
    • Compute & Deployment
    • Core AI & ML Concepts
      • Data Processing & ETL
      • Decision Trees
      • Deep Learning
      • Generative AI
      • K-Means Clustering
      • Machine Learning
      • Neural Networks
      • Reinforcement Learning
      • Supervised Learning
      • Unsupervised Learning
    • Database & Storage Services
    • Emerging AI Trends
    • Evaluation Metrics
    • Industry Applications of AI
    • MLOps & DevOps for AI
    • Model Development & Optimization
    • Prompting Techniques
      • Adversarial Prompting
      • Chain-of-Thought Prompting
      • Constitutional AI Prompting
      • Few-Shot Prompting
      • Instruction Prompting
      • Multi-Agent Prompting
      • Negative Prompting
      • Prompt Templates
      • ReAct Prompting
      • Retrieval-Augmented Generation (RAG)
      • Self-Consistency Prompting
      • Zero-Shot Prompting
    • Security & Compliance
      • AWS KMS
      • AWS Macie
      • Azure Key Vault
      • Azure Purview
      • BigID
      • Cloud DLP
      • Collibra Privacy & Risk
      • HashiCorp Vault
      • Immuta
      • Okera
      • OneTrust
      • Privacera
      • Satori
  • Data Engineering
    • Cloud Platforms & Services
      • Alibaba Cloud
      • AWS (Amazon Web Services)
      • Azure Microsoft
      • Google Cloud Platform (GCP)
      • IBM Cloud
      • Oracle Cloud
    • Containerization & Orchestration
      • Amazon EKS
      • Apache Oozie
      • Azure Kubernetes Service (AKS)
      • Buildah
      • Containerd
      • Docker
      • Docker Swarm
      • Google Kubernetes Engine (GKE)
      • Kaniko
      • Kubernetes
      • Podman
      • Rancher
      • Red Hat OpenShift
    • Data Catalog & Governance
      • Amundsen
      • Apache Atlas
      • Apache Griffin
      • Atlan
      • AWS Glue
      • Azure Purview
      • Collibra
      • Databand
      • DataHub
      • Deequ
      • Google Data Catalog
      • Google Dataplex
      • Great Expectations
      • Informatica
      • Marquez
      • Monte Carlo
      • OpenLineage
      • OpenMetadata
      • Soda SQL
      • Spline
    • Data Ingestion & ETL
      • Apache Kafka Connect
      • Apache NiFi
      • Census
      • Confluent Platform
      • Debezium
      • Fivetran
      • Hightouch
      • Informatica PowerCenter
      • Kettle
      • Matillion
      • Microsoft SSIS
      • Omnata
      • Polytomic
      • Stitch
      • StreamSets
      • Striim
      • Talend
    • Data Lakes & File Standards
      • Amazon S3
      • Apache Arrow
      • Apache Avro
      • Apache Iceberg
      • Azure Data Lake Storage
      • CSV
      • Databricks Delta Lake
      • Dremio
      • Dremio
      • Feather
      • Google Cloud Storage
      • JSON
      • ORC
      • Parquet
    • Data Platforms
      • Cloud Data Warehouses
        • ClickHouse
        • Databricks
        • Snowflake
          • Internal and External Staging in Snowflake
          • Network Rules in Snowflake
          • Procedures + Tasks
          • Snowflake administration and configuration
          • Snowflake Cloning
      • Cloudera Data Platform
      • NoSQL Databases
      • On-Premises Data Warehouses
        • DuckDB
      • Relational Databases
        • Amazon Aurora
        • Azure SQL Database
        • Google Cloud SQL
        • MariaDB
        • Microsoft SQL Server
        • MySQL
        • Oracle Database
        • PostgreSQL
    • Data Streaming & Messaging
      • ActiveMQ
      • Aiven for Kafka
      • Amazon Kinesis
      • Amazon MSK
      • Apache Kafka
      • Apache Pulsar
      • Azure Event Hubs
      • Confluent Platform
      • Google Pub/Sub
      • IBM Event Streams
      • NATS
      • Protocol Buffers
      • RabbitMQ
      • Red Hat AMQ Streams
    • Data Warehouse Design
      • Data Governance and Management (DGaM)
        • Compliance Requirements
        • Data Lineage
        • Data Retention Policies
        • Data Stewardship
        • Master Data Management
      • Data Warehouse Architectures (DWA)
        • Enterprise Data Warehouse vs. Data Marts
        • Hub-and-Spoke Architecture
        • Logical vs. Physical Data Models
        • ODS (Operational Data Store)
        • Staging Area Design
      • Data Warehouse Schemas (DWS)
        • Data Vault
        • Galaxy Schema (Fact Constellation)
        • Inmon (Normalized) Approach
        • Kimball (Dimensional) Approach
        • Snowflake Schema
        • Star Schema
      • Database Normalization
      • Dimensional Modeling Techniques (DMT)
        • Bridge Tables
        • Conformed Dimensions
        • Degenerate Dimensions
        • Junk Dimensions
        • Mini-Dimensions
        • Outrigger Dimensions
        • Role-Playing Dimensions
      • ETL/ELT Design Patterns
        • Change Data Capture (CDC)
        • Data Pipeline Architectures
        • Data Quality Management
        • Error Handling
        • Metadata Management
      • Fact Table Design Patterns(FTDP)
        • Accumulating Snapshot Fact Tables
        • Aggregate Fact Tables
        • Factless Fact Tables
        • Periodic Snapshot Fact Tables
        • Transaction Fact Tables
      • Modern Data Warehouse Concepts (MDWC)
        • Data Lakehouse
        • Medallion Architecture
        • Multi-modal Persistence
        • Polyglot Data Processing
        • Real-time Data Warehousing
      • Performance Optimization (PO)
        • Compression Techniques
        • Indexing Strategies
        • Materialized Views
        • Partitioning
        • Query Optimization
      • Slowly Changing Dimensions(SCD)
        • SCD Type 0
        • SCD Type 1
        • SCD Type 2
        • SCD Type 3
        • SCD Type 4
        • SCD Type 6
        • SCD Type 7
    • Distributed Data Processing
      • Apache Beam
      • Apache Flink
      • Apache Hadoop
      • Apache Hive
      • Apache Pig
      • Apache Pulsar
      • Apache Samza
      • Apache Sedona
      • Apache Spark
      • Apache Storm
      • Presto/Trino
      • Spark Streaming
    • Infrastructure as Code & Deployment
      • Ansible
      • Argo CD
      • AWS CloudFormation
      • Azure Resource Manager Templates
      • Chef
      • CircleCI
      • GitHub Actions
      • GitLab CI/CD
      • Google Cloud Deployment Manager
      • Jenkins
      • Pulumi
      • Puppet: Configuration Management Tool for Modern Infrastructure
      • Tekton
      • Terraform
      • Travis CI
    • Monitoring & Logging
      • AppDynamics
      • Datadog
      • Dynatrace
      • ELK Stack
      • Fluentd
      • Graylog
      • Loki
      • Nagios
      • New Relic
      • Splunk
      • Vector
      • Zabbix
    • Operational Systems (OS)
      • Ubuntu
        • Persistent Tasks on Ubuntu
      • Windows
    • Programming Languages
      • Go
      • Java
      • Julia
      • Python
        • Dask
        • NumPy
        • Pandas
        • PySpark
        • SQLAlchemy
      • R
      • Scala
      • SQL
    • Visualization Tools
      • Grafana
      • Kibana
      • Looker
      • Metabase
      • Mode
      • Power BI
      • QuickSight
      • Redash
      • Superset
      • Tableau
    • Workflow Orchestration
      • Apache Airflow
      • Apache Beam Python SDK
      • Azkaban
      • Cron
      • Dagster
      • Dagster Change
      • DBT (data build tool)
      • Jenkins Job Builder
      • Keboola
      • Luigi
      • Prefect
      • Rundeck
      • Temporal
  • Home
  • Snowflake Cost-Saving Tactics
Data Snowflake

Snowflake Cost-Saving Tactics

Alex Apr 10, 2025 0
Snowflake Cost-Saving Tactics: Real SQL Techniques Using Dynamic Date Ranges and Partition Pruning

Snowflake Cost-Saving Tactics: Real SQL Techniques Using Dynamic Date Ranges and Partition Pruning

How to Stop Paying for Data You Don’t Need

Snowflake’s elasticity is a double-edged sword. Without discipline, costs spiral as queries scan terabytes of irrelevant data. But with smart SQL design, you can turn Snowflake’s architecture into a cost-saving superpower. Let’s dive into the exact techniques.


1. Dynamic Date Ranges: The $10M Mistake Most Teams Make

Hardcoding date filters (e.g., WHERE date BETWEEN ‘2024-01-01’ AND ‘2024-03-31’) is a silent budget killer. Why?

  • Queries break when dates change.
  • Full-table scans occur if dates aren’t aligned with partitions.
  • Engineers waste hours updating code.

The Fix: Dynamic Date Logic in SQL

Use Snowflake’s date functions to automate range selection.

Example 1: Rolling 7-Day Window

SELECT 
  user_id, 
  SUM(revenue) AS total_revenue  
FROM fact_sales  
WHERE sale_date >= CURRENT_DATE() - 7  -- Automatically adjusts daily  
  AND sale_date < CURRENT_DATE()  
GROUP BY 1; 

Impact: Scans only 7 days of data instead of entire history.

Example 2: Month-to-Date (MTD) Analysis

SELECT  
  product_id,  
  COUNT(*) AS orders  
FROM orders  
WHERE order_date >= DATE_TRUNC('MONTH', CURRENT_DATE())  -- First day of current month  
GROUP BY 1; 

Impact: Avoids scanning prior months’ data.


2. Partition Pruning: Snowflake’s Secret Weapon

Snowflake auto-partitions data into micro-partitions, but it can’t prune what it can’t see. Your SQL must give Snowflake hints to skip irrelevant partitions.

How to Force Pruning

Step 1: Cluster Tables by Date

-- Create a clustered table  
CREATE TABLE fact_sales  
CLUSTER BY (sale_date)  
AS  
SELECT * FROM raw_sales; 

Clustering by sale_date physically groups data, making pruning efficient.

Step 2: Filter on Cluster Keys

SELECT *  
FROM fact_sales  
WHERE sale_date = '2024-05-15';  -- Snowflake skips all other partitions 

Step 3: Verify with EXPLAIN

Check if pruning worked using EXPLAIN:

EXPLAIN  
SELECT ...  
WHERE sale_date >= CURRENT_DATE - 7; 

Look for “Partition pruning: 1000/1000 partitions” in the plan.


3. Advanced Tactics: Subquery Pruning & Semi-Structured Data

Tactic 1: Prune with Subqueries

Instead of scanning entire tables, filter early:

WITH active_users AS (  
  SELECT user_id  
  FROM dim_users  
  WHERE last_login_date >= CURRENT_DATE - 30  
)  
SELECT  
  u.user_id,  
  SUM(s.revenue)  
FROM fact_sales s  
JOIN active_users u ON s.user_id = u.user_id  -- Prunes sales table during join  
WHERE s.sale_date >= CURRENT_DATE - 30; 

Why It Works: Snowflake prunes fact_sales using both sale_date and the join.

Tactic 2: Semi-Structured Data Optimization

For JSON/XML data, use LATERAL FLATTEN with filters:

SELECT  
  event_data:user_id::STRING AS user_id,  
  event_data:amount::FLOAT AS amount  
FROM raw_events  
WHERE event_date >= CURRENT_DATE - 3  
  AND PARSE_JSON(event_data):amount > 100;  -- Filter before flattening 

Impact: Filters JSON rows before flattening, reducing compute.


4. Common Pitfalls (and How to Avoid Them)

Pitfall 1: Functions on Partitioned Columns

This disables pruning:

SELECT *  
FROM fact_sales  
WHERE YEAR(sale_date) = 2024;  -- Snowflake can’t map this to partitions 

Fix:

WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'; 

Pitfall 2: Over-Partitioning

Too many partitions slow metadata lookups. Cluster by date + high-cardinality columns (e.g., (sale_date, region_id)).


5. Real-World Savings: A Case Study

A retail client reduced monthly Snowflake costs from 85kto85kto49k by:

  1. Replacing static date filters with dynamic ranges.
  2. Clustering 10 TB of sales data by sale_date.
  3. Rewriting 60+ reports to use partition-aware joins.

Key Metrics:

  • Average query scan size: ↓ 72%
  • Warehouse runtime: ↓ 58%

Your Action Plan

  1. Audit Queries: Use QUERY_HISTORY to find full-table scans:
  2. Implement Dynamic Dates: Refactor hardcoded filters.
  3. Cluster & Verify: Recluster tables and check EXPLAIN plans.

Final Thought

Snowflake’s pay-as-you-go model rewards the efficient. By mastering dynamic ranges and pruning, you’re not just saving money—you’re building a culture of data engineering excellence.

What’s your top Snowflake cost-saving hack? Share in the comments!


Hashtags


BigDataCloudComputingCostOptimizationDataEngineeringFinOpssnowflakeSQL
Alex

Website: https://www.kargin-utkin.com

Related Story
The Evolution of Data Architecture
Data Structure
The Evolution of Data Architecture
Alex Jun 21, 2025
Data Modeling Revolution: Why Old Rules Are Killing Your Performance
Data DataLake
Data Modeling Concepts
Alex Jun 20, 2025
Data Mesh
Data DataLake ETL/ELT
The Hidden Economics of Data Mesh
Alex Jun 19, 2025
The Hidden Psychology of ETL
Data ETL/ELT
The Hidden Psychology of ETL
Alex Jun 18, 2025
The Unstructured Data Breakthrough
Data
The Unstructured Data Breakthrough
Alex Jun 17, 2025
Databricks vs. Snowflake: The Performance Edge They Hide
Databricks Snowflake VS
Databricks vs. Snowflake: The Performance Edge They Hide
Alex Jun 16, 2025
GenAI-Assisted Data Cleaning: Beyond Rule-Based Approaches
AI Data
GenAI-Assisted Data Cleaning
Alex Jun 14, 2025
Iceberg vs. Hudi vs. Delta Lake
Data VS
Iceberg vs. Hudi vs. Delta Lake
Alex Jun 13, 2025
The Great Cloud Vendor War
Data VS
The Great Cloud Vendor War
Alex Jun 12, 2025
Observability-Driven Data Engineering
Data
Observability-Driven Data Engineering
Alex Jun 10, 2025

Leave a Reply
Cancel reply

Your email address will not be published. Required fields are marked *

Recent Posts

  • The Evolution of Data Architecture
  • Data Modeling Concepts
  • The Hidden Economics of Data Mesh
  • The Hidden Psychology of ETL
  • The Unstructured Data Breakthrough

Recent Comments

  1. Ustas on The Genius of Snowflake’s Hybrid Architecture: Revolutionizing Data Warehousing

Archives

  • June 2025
  • May 2025
  • April 2025
  • March 2025
  • February 2025
  • January 2025
  • November 2024
  • October 2024
  • September 2024
  • August 2024
  • July 2024
  • June 2024
  • May 2024
  • April 2024
  • March 2024
  • February 2024
  • January 2024
  • December 2023
  • November 2023
  • October 2023
  • September 2023
  • August 2023
  • July 2023
  • June 2023
  • May 2023

Categories

  • AI
  • Analytics
  • AWS
  • ClickHouse
  • Data
  • Databricks
  • DataLake
  • DuckDB
  • ETL/ELT
  • Future
  • ML
  • Monthly
  • OpenSource
  • Snowflake
  • StarRock
  • Structure
  • VS
YOU MAY HAVE MISSED
The Evolution of Data Architecture
Data Structure
The Evolution of Data Architecture
Alex Jun 21, 2025
Data Modeling Revolution: Why Old Rules Are Killing Your Performance
Data DataLake
Data Modeling Concepts
Alex Jun 20, 2025
Data Mesh
Data DataLake ETL/ELT
The Hidden Economics of Data Mesh
Alex Jun 19, 2025
The Hidden Psychology of ETL
Data ETL/ELT
The Hidden Psychology of ETL
Alex Jun 18, 2025

(c) Data/ML Engineer Blog