Data/ML Engineer Blog

Archive by category "Data"

Categories: Data, RDS

Secure by Default

Alex, Sep 26, 2025

Secure by Default: AAD Only, Private Endpoints, and Auditing for Azure SQL
A zero-trust, least-privilege checklist you can actually run…

Categories: Data, VS

Lakehouse vs Data Warehouse: Choosing the Right Architecture for Power BI in 2025

Alex, Sep 23, 2025

Introduction: The $468,000 Decision That Could Define Your Data Strategy
Every data leader faces a moment of reckoning: your existing…

Categories: AI, Data

Agentic AI in Data Engineering

Alex, Sep 16, 2025

Agentic AI in Data Engineering: How AI Agents Are Automating the Entire Data Lifecycle
Introduction: The Dawn of Autonomous Data…

Categories: Data, RDS

Query Store Playbook

Alex, Sep 13, 2025

Query Store Playbook: Finding Regressions and Forcing the Right Plan in Azure SQL
Why this matters (a quick story)
Yesterday…

Categories: Data

Alert Fatigue in DevOps

Alex, Sep 12, 2025

Alert Fatigue in DevOps: How to Design Monitoring Alerts People Don’t Ignore
If every on-call shift feels like babysitting a…

Categories: Data

Parallel Execution Deep Dive

Alex, Sep 10, 2025

Parallel Execution Deep Dive: DOP math, skew handling, and monitoring PX servers like a pro
Hook: Your query “flies” in…

Categories: Data, PostgreSQL, RDS

Logical vs Physical Replication in PostgreSQL

Alex, Sep 8, 2025

Logical vs Physical Replication in PostgreSQL: Blue/Green, Zero-Downtime Upgrades, and Read Scaling (with Real Cutover Runbooks)
Ever scheduled a “maintenance…

Categories: Analytics, Data, OpenSource, Structure

Designing Partitions & Buckets in Apache Doris

Alex, Sep 3, 2025

Designing Partitions & Buckets in Apache Doris: Rules of Thumb to Auto Partition for Time-Series
A practical…

Categories: Data, RDS

From RDS to Aurora

Alex, Sep 1, 2025

From RDS to Aurora: A Migration Checklist for Mid-Size Teams
Version checks, storage limits (128–256 TiB), endpoint switching, and load…

Categories: Data, PostgreSQL, RDS

PostgreSQL Partitioning in Practice

Alex, Aug 27, 2025

PostgreSQL Partitioning in Practice: Monthly, Hash, and Hybrid Patterns
Blueprints + automation snippets
Why this matters (a quick hook)
Your…


(c) Data/ML Engineer Blog