If you've been anywhere near the data engineering ecosystem in the past two years, you've heard the arguments. Slack channels light up, conference talks draw crowds, and vendor marketing teams work overtime. The open table format war between Apache Iceberg, Delta Lake, and Apache Hudi is real, and the stakes are high: whoever wins shapes how the next generation of data lakehouses stores, queries, and governs petabyte-scale datasets.
I've spent the better part of 2025 evaluating all three formats across production workloads at two different companies. One was a 40TB Spark-heavy pipeline on AWS, the other a 200TB mixed workload on a multi-cloud setup. This article is what I wish someone had handed me before I started.
Key Takeaways (TL;DR)
- Apache Iceberg has the strongest momentum heading into 2026, with the broadest engine compatibility and cleanest separation between compute and storage.
- Delta Lake remains the default if you're all-in on Databricks, but the gap has narrowed significantly since Delta went fully open-source.
- Apache Hudi excels at incremental ingestion and CDC workloads, but its complexity and smaller community make it harder to justify for greenfield projects.
- All three now support time travel, schema evolution, ACID transactions, and partition evolution. The differences are in how they implement these features and who supports them.
- If you're starting fresh today, Iceberg is the safest bet. If you're already on Databricks, Delta Lake is perfectly fine. If you have heavy CDC requirements and Hudi expertise on your team, stick with it.
What Problem Do Open Table Formats Actually Solve?
Before we compare, let's be clear about why these formats exist. Traditional data lakes built on Parquet or ORC files have no concept of transactions. If a Spark job writes 500 files and crashes at file 347, your table is corrupt. There's no consistent snapshot, no rollback, no schema enforcement. You end up writing custom cleanup scripts and hoping your downstream consumers don't read half-written data.
Open table formats add a metadata layer on top of your existing file formats (still Parquet under the hood, usually) that provides:
- ACID transactions for concurrent reads and writes
- Time travel to query historical snapshots
- Schema evolution without rewriting data
- Partition evolution without data migration
- Row-level updates and deletes (critical for GDPR compliance)
Think of them as adding a "table of contents" and "version history" to what was previously just a directory full of files.
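The "crash at file 347" failure mode disappears because readers never trust a directory listing; they follow a single metadata pointer that is swapped atomically only after every file has landed. A minimal sketch of that idea in plain Python (illustrative names, not any format's actual API):

```python
# Minimal sketch of atomic-commit-by-pointer-swap, the core idea shared by
# all three table formats. Names are illustrative, not a real API.

class Table:
    def __init__(self):
        self.snapshots = {0: []}   # snapshot_id -> list of committed data files
        self.current = 0           # the "pointer" readers follow

    def read(self):
        # Readers only ever see a fully committed snapshot.
        return list(self.snapshots[self.current])

    def write(self, new_files, fail_after=None):
        committed = self.read()
        written = []
        for i, f in enumerate(new_files):
            if fail_after is not None and i >= fail_after:
                raise IOError("writer crashed mid-job")  # orphaned, never visible
            written.append(f)
        # Commit = one atomic pointer swap, only after every file is written.
        new_id = self.current + 1
        self.snapshots[new_id] = committed + written
        self.current = new_id

t = Table()
t.write(["f1.parquet", "f2.parquet"])
try:
    t.write(["f3.parquet", "f4.parquet"], fail_after=1)  # crash before commit
except IOError:
    pass
assert t.read() == ["f1.parquet", "f2.parquet"]  # readers never saw f3/f4
```

A crashed writer leaves orphan files on storage (cleaned up later), but readers keep seeing the last committed snapshot, which is exactly the guarantee raw Parquet directories cannot give you.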
Architecture Deep Dive
Apache Iceberg: The Catalog-Centric Approach
Iceberg uses a three-layer metadata architecture that's elegant in its simplicity. At the top, a catalog (Hive Metastore, AWS Glue, Nessie, or REST catalog) points to the current metadata file. That metadata file contains a list of manifest lists, each representing a snapshot. Each manifest list points to manifest files that track individual data files along with column-level statistics (min/max values, null counts).
This design means Iceberg can plan queries by reading only the metadata files it needs, skipping entire manifest groups based on partition pruning and column statistics. For a 10TB table, query planning might read 50KB of metadata instead of listing millions of files from object storage.
The key architectural decision: Iceberg tracks data at the file level with rich statistics. This makes it engine-agnostic by design — any engine that can read the metadata format can query the table without needing a specific runtime.
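To make the planning benefit concrete, here is a toy model of statistics-based file pruning: the planner reads small metadata records (per-column min/max) and skips data files that cannot match the predicate. The structures are illustrative, not Iceberg's actual classes:

```python
# Toy model of Iceberg-style pruning: compare the query range against each
# file's min/max stats and never open files that cannot contain matches.

files = [
    {"path": "a.parquet", "event_ts_min": "2025-01-01", "event_ts_max": "2025-01-31"},
    {"path": "b.parquet", "event_ts_min": "2025-02-01", "event_ts_max": "2025-02-28"},
    {"path": "c.parquet", "event_ts_min": "2025-03-01", "event_ts_max": "2025-03-31"},
]

def prune(files, lo, hi):
    """Keep only files whose [min, max] range overlaps the query range."""
    return [f["path"] for f in files
            if f["event_ts_max"] >= lo and f["event_ts_min"] <= hi]

# A query for early February touches exactly one file.
assert prune(files, "2025-02-05", "2025-02-10") == ["b.parquet"]
```

The real implementation does this at two levels (manifest lists, then manifests), which is why planning cost scales with the metadata you keep, not with the number of files in object storage.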
Delta Lake: The Transaction Log
Delta Lake takes a different approach centered around a JSON-based transaction log (the _delta_log directory). Every operation — write, delete, schema change, compaction — appends a new JSON file to this log. Periodically, Delta creates checkpoint files (Parquet format) that consolidate the log entries for faster reads.
To reconstruct the current table state, a reader starts from the latest checkpoint and replays any subsequent JSON log entries. This is conceptually similar to a write-ahead log in databases.
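The replay mechanism can be sketched in a few lines of Python. This is a toy model of the idea, not Delta's actual log schema:

```python
# Toy replay of a Delta-style transaction log: start from the latest
# checkpoint, then apply each "add"/"remove" file action in commit order.

checkpoint = {"live_files": {"part-000.parquet", "part-001.parquet"}}

log_entries = [  # JSON actions committed after the checkpoint
    {"add": "part-002.parquet"},
    {"remove": "part-000.parquet"},  # e.g., the file was compacted away
    {"add": "part-003.parquet"},
]

def current_state(checkpoint, log_entries):
    live = set(checkpoint["live_files"])
    for action in log_entries:
        if "add" in action:
            live.add(action["add"])
        if "remove" in action:
            live.discard(action["remove"])
    return live

assert current_state(checkpoint, log_entries) == {
    "part-001.parquet", "part-002.parquet", "part-003.parquet"
}
```

Checkpoints exist precisely so this replay stays cheap: without them, a long-lived table would force readers to walk thousands of small JSON files on every query.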
Delta's architecture was originally tightly coupled to Spark and the Databricks runtime, which gave it performance advantages in that ecosystem but limited adoption elsewhere. Fully open-sourcing the format (Delta Lake 2.0) and the UniForm initiative in 3.0 have been Databricks' answer to this criticism, but the Spark-first DNA is still visible in the codebase.
Apache Hudi: The Timeline
Hudi (Hadoop Upserts Deletes and Incrementals) organizes everything around a timeline of actions stored in the .hoodie metadata directory. Each action (commit, compaction, clean, rollback) is recorded as an instant on this timeline.
What makes Hudi architecturally distinct is its two storage types:
- Copy-on-Write (CoW): Updates rewrite entire files. Read performance is excellent because there are no merge operations at query time. Write amplification can be significant.
- Merge-on-Read (MoR): Updates go to delta log files that get merged during reads or background compaction. Write performance is better, but reads require merging base files with deltas.
This dual-storage model gives Hudi fine-grained control over the read/write performance tradeoff, which is why it shines in near-real-time ingestion scenarios. But it also adds operational complexity — you now have to think about compaction strategies, cleaning policies, and clustering configurations.
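As a sketch of what that operational surface looks like, here is a MoR write configuration with inline compaction. The option keys are real Hudi configs, but the values are illustrative starting points, not recommendations for your workload:

```python
# Sketch: Hudi Merge-on-Read write options with inline compaction.
# Keys are real Hudi configs; the values are illustrative, tune per workload.
mor_options = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.recordkey.field": "event_id",
    "hoodie.datasource.write.precombine.field": "event_ts",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    # Merge delta logs into base files every N delta commits:
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "5",
    # How many commits the cleaner retains before reclaiming old file slices:
    "hoodie.cleaner.commits.retained": "10",
}

# Applied to a DataFrame write:
# df.write.format("hudi").options(**mor_options).mode("append").save("/data/hudi/events")
```

Notice that compaction and cleaning show up in the write path itself; with Iceberg or Delta those concerns live in separate maintenance jobs or platform services.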
Feature Comparison Table
| Feature | Apache Iceberg | Delta Lake | Apache Hudi |
|---|---|---|---|
| ACID Transactions | Yes (optimistic concurrency) | Yes (optimistic concurrency) | Yes (timeline-based) |
| Time Travel | Snapshot-based, configurable retention | Version-based; 30-day log retention and 7-day VACUUM defaults | Timeline-based, configurable |
| Schema Evolution | Full (add, drop, rename, reorder columns) | Add, rename, drop (rename/drop need column mapping); mergeSchema on write | Add columns, limited type promotion |
| Partition Evolution | Yes, without rewriting data | No (requires full rewrite) | No (requires full rewrite) |
| Hidden Partitioning | Yes (transforms: year, month, day, hour, bucket, truncate) | No (Liquid Clustering as alternative) | No |
| Row-Level Deletes | Copy-on-write and merge-on-read | Deletion vectors (since 3.0) | CoW and MoR native |
| Streaming Ingest | Supported (via Flink, Spark Structured Streaming) | Strong (native Spark Structured Streaming) | Excellent (DeltaStreamer, Flink) |
| Engine Support | Spark, Flink, Trino, Presto, Dremio, Snowflake, BigQuery, Athena, StarRocks | Spark, Flink (limited), Trino (via connectors), Databricks | Spark, Flink, Presto, Trino, Hive |
| Cloud Catalog | REST catalog, Glue, Nessie, Polaris, Unity (via UniForm) | Unity Catalog, Glue (limited) | Hive Metastore, Glue |
| Governance | Apache Software Foundation | Linux Foundation (Databricks-led) | Apache Software Foundation |
Code Examples: Common Operations
Let's see how each format handles the same fundamental operations. All examples use PySpark 3.5+.
Creating a Table
```python
# Apache Iceberg
spark.sql("""
    CREATE TABLE catalog.db.events (
        event_id BIGINT,
        user_id BIGINT,
        event_type STRING,
        event_ts TIMESTAMP,
        payload STRING
    )
    USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# Delta Lake — Delta can't partition by an expression, so derive a date
# column (here a generated column) and partition on that instead
spark.sql("""
    CREATE TABLE delta_db.events (
        event_id BIGINT,
        user_id BIGINT,
        event_type STRING,
        event_ts TIMESTAMP,
        payload STRING,
        event_date DATE GENERATED ALWAYS AS (CAST(event_ts AS DATE))
    )
    USING delta
    PARTITIONED BY (event_date)
""")

# Apache Hudi — typically done via DataFrame write
# (assumes df carries a derived event_date column; partitioning on the raw
# timestamp would create one partition per distinct value)
df.write.format("hudi") \
    .option("hoodie.table.name", "events") \
    .option("hoodie.datasource.write.recordkey.field", "event_id") \
    .option("hoodie.datasource.write.precombine.field", "event_ts") \
    .option("hoodie.datasource.write.partitionpath.field", "event_date") \
    .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE") \
    .mode("overwrite") \
    .save("/data/hudi/events")
```
Notice how Iceberg's days(event_ts) is a hidden partition transform — users query event_ts directly, and Iceberg figures out the partitioning. With Hudi, you immediately see the verbosity: five Hoodie-specific options just to create a table.
Upserts (Merge)
```python
# Apache Iceberg (Spark 3.4+)
spark.sql("""
    MERGE INTO catalog.db.events t
    USING updates s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Delta Lake
spark.sql("""
    MERGE INTO delta_db.events t
    USING updates s
    ON t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Apache Hudi
updates_df.write.format("hudi") \
    .option("hoodie.table.name", "events") \
    .option("hoodie.datasource.write.recordkey.field", "event_id") \
    .option("hoodie.datasource.write.precombine.field", "event_ts") \
    .option("hoodie.datasource.write.operation", "upsert") \
    .mode("append") \
    .save("/data/hudi/events")
```
Iceberg and Delta have nearly identical MERGE syntax (both follow SQL standard). Hudi's DataFrame API works but is harder to reason about for complex merge logic.
Time Travel
```python
# Apache Iceberg — by snapshot ID or timestamp
spark.sql("SELECT * FROM catalog.db.events VERSION AS OF 123456789")
spark.sql("SELECT * FROM catalog.db.events TIMESTAMP AS OF '2025-11-01 00:00:00'")
# Also: inspect snapshot history
spark.sql("SELECT * FROM catalog.db.events.snapshots")

# Delta Lake — by version or timestamp
spark.sql("SELECT * FROM delta_db.events VERSION AS OF 42")
spark.sql("SELECT * FROM delta_db.events TIMESTAMP AS OF '2025-11-01'")
# Also: audit log
spark.sql("DESCRIBE HISTORY delta_db.events")

# Apache Hudi — timestamp-based
spark.read.format("hudi") \
    .option("as.of.instant", "20251101000000000") \
    .load("/data/hudi/events")
```
Schema Evolution
```python
# Apache Iceberg — full schema evolution
spark.sql("ALTER TABLE catalog.db.events ADD COLUMNS (device STRING)")
spark.sql("ALTER TABLE catalog.db.events RENAME COLUMN payload TO event_payload")
spark.sql("ALTER TABLE catalog.db.events DROP COLUMN device")
# Safe type promotion is also supported (e.g., widening an INT column):
# spark.sql("ALTER TABLE catalog.db.events ALTER COLUMN some_int_col TYPE BIGINT")

# Delta Lake — add and rename (rename requires column mapping to be enabled)
spark.sql("ALTER TABLE delta_db.events ADD COLUMNS (device STRING)")
spark.sql("ALTER TABLE delta_db.events RENAME COLUMN payload TO event_payload")
# Column drop supported since Delta 3.0 with column mapping enabled

# Apache Hudi — adding columns is the broadly supported path
spark.sql("ALTER TABLE hudi_db.events ADD COLUMNS (device STRING)")
```
Iceberg clearly leads on schema evolution. The ability to rename, reorder, and drop columns without rewriting data is a genuine operational advantage. I've had production scenarios where a column rename in Delta required a full table rewrite because column mapping wasn't enabled from the start — a mistake you can't easily undo.
Why Iceberg Is Winning
Let me be direct: if you asked me to pick a single table format for a new project in 2026, I'd pick Iceberg. Here's why, and I'll try to be honest about the caveats.
Engine Neutrality Is Not Just Marketing
The single biggest advantage of Iceberg is that it genuinely works well across engines. I've queried the same Iceberg tables from Spark, Trino, Flink, and Athena in production. The same table, the same catalog, consistent results. Try doing that with Delta Lake — you'll hit connector quirks, version mismatches, and features that only work on the Databricks runtime.
Snowflake natively reads Iceberg tables. BigQuery added Iceberg support. AWS made Iceberg the default for Athena and EMR. Apple, Netflix, and LinkedIn run Iceberg at massive scale. When the three largest cloud providers and some of the largest data organizations all converge on one format, that's a signal worth paying attention to.
Partition Evolution Solves a Real Pain Point
This is the feature that sold me. In a previous role, we had a Delta table partitioned by day that needed to switch to hourly partitioning as data volume grew. That meant: create a new table, backfill all historical data with new partitioning, swap downstream consumers, and delete the old table. A weekend project that took two weeks due to edge cases.
With Iceberg, you run:
```python
# Swap the daily partition spec for an hourly one — metadata-only change
spark.sql("""
    ALTER TABLE catalog.db.events
    REPLACE PARTITION FIELD days(event_ts) WITH hours(event_ts)
""")
# New data uses hourly partitions
# Old data stays in daily partitions
# Queries work correctly across both — Iceberg handles it transparently
```
No data rewrite. No downtime. Historical queries still work. New data uses the new scheme. This alone justifies the format choice for any table that might change partitioning over its lifetime (which is most of them).
The REST Catalog Standard
Iceberg's REST catalog specification is quietly becoming the standard API for table format catalogs. It provides a vendor-neutral HTTP interface for catalog operations, which means your tooling doesn't need format-specific connectors. Snowflake's managed Iceberg tables, Tabular (now part of Databricks, ironically), and Dremio's Arctic all implement this specification. Even Databricks Unity Catalog now exposes Iceberg-compatible metadata through UniForm.
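Wiring an engine up to a REST catalog is a handful of Spark settings. The config keys below are the standard Iceberg-Spark ones; the catalog name (`lake`) and URI are placeholders for your deployment:

```python
# Sketch: pointing Spark at an Iceberg REST catalog. Keys are standard
# Iceberg-Spark settings; the catalog name and URI are placeholders.
rest_catalog_conf = {
    "spark.sql.extensions":
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.lake": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lake.type": "rest",
    "spark.sql.catalog.lake.uri": "https://catalog.example.com/api",  # placeholder
}

# Applied when building the session:
# builder = SparkSession.builder
# for key, value in rest_catalog_conf.items():
#     builder = builder.config(key, value)
```

The same table is then addressable as `lake.db.events` from any engine that speaks the REST protocol, which is the whole point.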
The Caveats
Iceberg isn't perfect. Small file management still requires attention — you need to run compaction regularly. The ecosystem for Iceberg-native CDC ingestion isn't as mature as Hudi's DeltaStreamer. And if you're on Databricks, Delta Lake's integration is still smoother for day-to-day operations because the optimizations are built into the runtime.
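In practice, that routine compaction work boils down to a couple of Iceberg's built-in Spark procedures run on a schedule. A sketch, assuming a catalog named `catalog` as in the earlier examples; the parameters are illustrative and should be tuned per table:

```python
# Sketch: periodic Iceberg maintenance via built-in Spark procedures.
# Procedure names are real Iceberg procedures; parameters are illustrative.
maintenance_sql = [
    # Compact small files toward the table's target file size:
    "CALL catalog.system.rewrite_data_files(table => 'db.events')",
    # Expire old snapshots to bound metadata growth and reclaim data files:
    "CALL catalog.system.expire_snapshots(table => 'db.events', retain_last => 10)",
]

def run_maintenance(spark, statements):
    for stmt in statements:
        spark.sql(stmt)
```

Scheduling these (Airflow, a cron'd Spark job, or a managed compaction service) is the operational tax Iceberg asks for in exchange for engine neutrality.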
Iceberg's community is also still catching up on documentation. The official docs are comprehensive but dry, and many "Apache Iceberg tutorial" results online are either outdated or vendor-specific. The learning curve is real, even if the concepts are sound.
When Delta Lake Still Makes Sense
I don't want to make this a one-sided argument. Delta Lake is a mature, battle-tested format with real advantages:
- Databricks-native performance: Photon engine optimizations, predictive I/O, auto-compaction, and Liquid Clustering are genuinely impressive. If Databricks is your platform, Delta tables will be faster out of the box.
- Liquid Clustering: This is Delta's answer to partition evolution, and in some ways it's more elegant — it automatically reorganizes data based on query patterns without manual partition management.
- Simpler mental model: The transaction log approach is easier to explain and debug than Iceberg's multi-layer metadata.
- UniForm: Delta tables can now expose Iceberg-compatible metadata, so you can write with Delta and read with Iceberg-compatible engines. It's not perfect, but it reduces lock-in concerns.
If your entire data stack runs on Databricks and there's no plan to change, Delta Lake is the pragmatic choice. Don't migrate for the sake of following industry trends.
When Hudi Is the Right Choice
Hudi gets overlooked in the table format comparison discourse, which is a shame because it solves a specific class of problems better than the alternatives:
- High-frequency CDC ingestion: Hudi's DeltaStreamer and its native understanding of change data capture make it excellent for streaming database replicas into a data lake. If you're syncing hundreds of MySQL/Postgres tables in near-real-time, Hudi's tooling is purpose-built for this.
- Fine-grained write control: The CoW/MoR choice, combined with configurable indexing (Bloom, HBase, bucket) and compaction strategies, gives experienced teams knobs that Iceberg and Delta don't expose.
- Record-level indexing: Hudi can maintain indexes on record keys, which makes point lookups and upserts on specific records faster than file-level scanning.
The tradeoff is complexity. A production Hudi deployment requires more configuration, more monitoring, and more expertise than either Iceberg or Delta. The community is smaller, Stack Overflow answers are fewer, and the documentation assumes more background knowledge.
Migration Considerations
If you're planning a migration between formats — or from raw Parquet to any of these — here's what I've learned the hard way:
From Raw Parquet/ORC
Start with a metadata-only migration if possible. Both Iceberg and Delta support "in-place" conversion that adds the metadata layer without rewriting data files. This is dramatically faster than a full rewrite.
```python
# Iceberg: migrate existing Parquet table in place
spark.sql("""
    CALL catalog.system.migrate('db.existing_parquet_table')
""")

# Delta: convert existing Parquet in place
spark.sql("""
    CONVERT TO DELTA parquet.`/path/to/parquet/table`
    PARTITIONED BY (date STRING)
""")
```
Between Table Formats
There's no magic converter between formats. The practical approach is:
- Create the target table with the desired schema and partitioning.
- Read from source format, write to target format using Spark.
- Validate row counts, schema, and partition structure.
- Update downstream consumers (this is always the hardest part).
- Keep the source table around for at least two weeks as a rollback option.
For large tables (50TB+), consider migrating partition by partition rather than all at once. This keeps the operation resumable and reduces blast radius.
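The partition-by-partition approach can be sketched as a simple resumable loop. The table names and partition column here are placeholders for your schema, and the "already migrated" check assumes a partition is either fully present or fully absent in the target:

```python
# Sketch: resumable, partition-by-partition copy between formats.
# Table names and the partition column are placeholders for your schema.
def migrate_by_partition(spark, src_table, dst_table, partition_dates):
    migrated = []
    for d in partition_dates:
        # Skip partitions the target already holds, so a failed run can resume.
        if spark.table(dst_table).where(f"event_date = '{d}'").count() > 0:
            continue
        (spark.table(src_table)
              .where(f"event_date = '{d}'")
              .write.mode("append")
              .insertInto(dst_table))
        migrated.append(d)
    return migrated
```

Running this from an orchestrator with one task per date range also gives you per-partition retries and a clear progress view for free.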
Testing Your Migration
```python
# Validate row counts after migration
source_count = spark.table("old_format.events").count()
target_count = spark.table("new_format.events").count()
assert source_count == target_count, f"Row count mismatch: {source_count} vs {target_count}"

# Validate schema compatibility (build the target name set once, not per field)
source_schema = spark.table("old_format.events").schema
target_names = {f.name for f in spark.table("new_format.events").schema}
for field in source_schema:
    assert field.name in target_names, f"Missing field: {field.name}"

# Spot-check data quality on key columns
spark.sql("""
    SELECT
        (SELECT COUNT(DISTINCT user_id) FROM old_format.events) AS old_users,
        (SELECT COUNT(DISTINCT user_id) FROM new_format.events) AS new_users
""").show()
```
Performance: What the Benchmarks Don't Tell You
Every vendor publishes benchmarks showing their format is fastest. Here's what actually matters in production:
Query planning overhead: Iceberg's rich file-level statistics mean the query planner can skip irrelevant files without reading them. For tables with thousands of partitions, this is measurable — I've seen Iceberg plan queries 3-5x faster than Delta on wide tables with 500+ columns.
Write amplification: For update-heavy workloads, Hudi's MoR mode has the lowest write amplification because it appends delta logs instead of rewriting files. Iceberg and Delta (with deletion vectors) have improved significantly, but Hudi still edges them out for workloads with >20% update ratio.
Small file problem: All three formats suffer from it, all three have compaction mechanisms. In my experience, Delta's auto-compaction on Databricks is the most hands-off. Iceberg requires explicit compaction jobs (or using the Iceberg-managed compaction service). Hudi's compaction is deeply configurable but demands attention.
Concurrent writers: Iceberg handles concurrent writers well with optimistic concurrency and retry mechanisms. Delta's conflict resolution is tightly integrated with Spark. Hudi supports concurrent writes but requires careful configuration of lock providers.
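For Iceberg, the retry behavior is tunable per table. A sketch, assuming the `catalog.db.events` table from earlier; the property names are real Iceberg table properties, but the values are illustrative:

```python
# Sketch: raising Iceberg's optimistic-concurrency retry budget for a table
# with many concurrent writers. Property names are real Iceberg table
# properties; the values are illustrative.
tune_retries = """
ALTER TABLE catalog.db.events SET TBLPROPERTIES (
  'commit.retry.num-retries' = '10',
  'commit.retry.min-wait-ms' = '100'
)
"""
# spark.sql(tune_retries)
```

Raising the retry count trades commit latency under contention for fewer outright commit failures, which is usually the right trade for batch writers.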
The Convergence Trend
Here's the thing nobody in the vendor marketing departments wants to admit: these formats are converging. Delta added deletion vectors (an Iceberg concept). Iceberg added merge-on-read (a Hudi concept). Hudi added metadata-based file listing (an Iceberg concept). Databricks acquired Tabular (the Iceberg company). Delta's UniForm writes Iceberg-compatible metadata.
In three years, the format you choose might matter less than it does today. The interoperability layers are getting better, and the feature gaps are shrinking. The question isn't "which format is best forever" but "which format gives me the best experience for my current stack and the most options for my future stack."
Practical Recommendations
After living with all three formats in production, here's my decision framework:
- Choose Iceberg if: You use multiple query engines, want partition evolution, care about long-term vendor neutrality, or are starting a greenfield data platform. Also if Snowflake, BigQuery, or Athena are in your stack.
- Choose Delta Lake if: You're on Databricks and plan to stay. The native integration, auto-optimization features, and Liquid Clustering make it the path of least resistance. UniForm mitigates lock-in concerns.
- Choose Hudi if: Your primary use case is streaming CDC ingestion from transactional databases, you need fine-grained control over write performance, and your team has (or can build) Hudi expertise.
- Don't migrate for the sake of migrating: If your current format works, the cost of migration (engineering time, risk, downstream changes) often outweighs the theoretical benefits of switching.
The table format war isn't really about which format is objectively "best" — it's about which format fits your architecture, your team, and your trajectory. Iceberg has the momentum and the broadest compatibility. Delta has the deepest single-vendor integration. Hudi has the most specialized ingestion tooling. Pick the one that solves your actual problems, not the one that wins Twitter arguments.