Key Takeaways
- The Modern Data Stack (MDS) era is over. The Fivetran + Snowflake + dbt + Looker combination that defined 2020-2023 data engineering has fractured under its own weight: cost explosions, integration nightmares, and vendor lock-in drove teams to seek alternatives.
- The 2024-2025 startup graveyard is real. At least a dozen well-funded MDS companies shut down, pivoted, or were acqui-hired as the market corrected from peak hype.
- Consolidated platforms are winning. Databricks Unity Catalog, Snowflake's expanding surface area, and open-source stacks like DuckDB + dbt-core are replacing the "best of breed" philosophy with "fewer tools, less pain."
- Cost matters more than architecture diagrams. Teams that spent $40K-$120K/month on a full MDS stack are cutting bills by 60-80% with lakehouse architectures and embedded analytics.
- The post-modern data stack is boring, consolidated, and AI-native. And that is exactly what production data teams need.
I Built the Cathedral. Then I Watched It Burn.
In 2021, I architected what I thought was the perfect data platform for a Series B fintech company. Fivetran for ingestion (47 connectors). Snowflake for warehousing. dbt Cloud for transformations. Looker for BI. Census for reverse ETL. Monte Carlo for observability. Great Expectations for data quality. Airflow for orchestration. Eight tools, eight vendor relationships, eight billing dashboards.
The architecture diagram looked beautiful. It won me compliments at a data engineering meetup. And for about six months, it worked brilliantly. Then reality set in.
Snowflake costs tripled when the marketing team discovered they could run ad-hoc queries. Fivetran's pricing model meant every new data source added thousands to the monthly bill. dbt Cloud's IDE was fine, but the team kept hitting edge cases that required dropping down to dbt-core anyway. Looker's LookML learning curve meant only two people in the company could actually build dashboards. And when something broke at 2 AM, debugging across eight different systems with eight different logging formats was a special kind of hell.
By 2024, I had ripped out four of those eight tools. By 2025, the architecture looked nothing like what the modern data stack evangelists had promised. And I was not alone.
What Was the Modern Data Stack, Really?
Let's define what we are eulogizing. The Modern Data Stack was a philosophy as much as a technology choice. Born around 2019-2020, it was built on a few core beliefs:
- Cloud-native everything. No on-premise servers. Fully managed SaaS. Scale to zero when idle, scale to infinity when needed.
- Best of breed over suite. Pick the best tool for each layer of the stack. Specialization wins. Monoliths lose.
- ELT over ETL. Load raw data first, transform in the warehouse later. Storage is cheap. Compute is elastic.
- SQL as the lingua franca. Analytics engineers write SQL (via dbt), not Python. Democratize data transformation.
- Separation of storage and compute. Pay for what you use. Snowflake and BigQuery proved this model.
The canonical stack looked something like this:
| Layer | Canonical Tool | Alternatives |
|---|---|---|
| Ingestion | Fivetran | Airbyte, Stitch, Hevo |
| Warehouse | Snowflake | BigQuery, Redshift, Databricks SQL |
| Transformation | dbt Cloud | Dataform, SQLMesh |
| BI / Analytics | Looker | Mode, Metabase, Lightdash, Preset |
| Reverse ETL | Census / Hightouch | Polytomic, RudderStack |
| Orchestration | Airflow / Dagster | Prefect, Mage |
| Data Quality | Monte Carlo | Great Expectations, Soda, Anomalo |
| Data Catalog | Atlan / Alation | DataHub, Amundsen, Select Star |
At peak hype in 2022, a fully loaded MDS stack could easily span 8-12 different vendors. VCs poured billions into the category. Every YC batch had three or four "the X for your data stack" startups. Conference talks were dominated by increasingly intricate architecture diagrams with more arrows than a battle plan.
It was glorious. And it was unsustainable.
Why the Modern Data Stack Failed
1. The Cost Explosion Was Real
This is the elephant in the room that nobody in vendor-sponsored conference talks wanted to address. A mid-market company running the canonical MDS stack was looking at:
| Tool | Monthly Cost (typical) |
|---|---|
| Snowflake | $15,000 - $60,000 |
| Fivetran | $5,000 - $20,000 |
| dbt Cloud (Team) | $2,500 - $8,000 |
| Looker | $5,000 - $15,000 |
| Monte Carlo | $5,000 - $15,000 |
| Census / Hightouch | $2,000 - $8,000 |
| Atlan / Catalog | $3,000 - $10,000 |
| Total | $37,500 - $136,000/month |
That is $450K to $1.6M per year before you pay a single data engineer's salary. For a Series A or B company with $20M in revenue, the data stack could consume 2-8% of total revenue. CFOs noticed. And when the interest rate environment shifted in 2023, "optimize cloud spend" became the top directive at nearly every company I consulted for.
Snowflake's consumption-based pricing, which seemed liberating at first ("only pay for what you use!"), turned into a nightmare when there was no natural ceiling on usage. I have personally seen Snowflake bills jump 4x in a single quarter because an analyst accidentally left a warehouse running, or because a dbt model went quadratic on a join.
2. Integration Debt Replaced Technical Debt
The "best of breed" philosophy sounds great in a vendor pitch. In practice, it means you are now the systems integrator. Every tool has its own authentication model, its own API versioning, its own concept of a "connection." When Fivetran changes its schema handling behavior and it breaks your dbt models, whose problem is it? When Looker's caching layer serves stale data because dbt's run finished 30 seconds late, who debugs that?
I spent more time in 2023 debugging the seams between tools than I spent on actual data engineering. The connective tissue between eight SaaS products became the most fragile part of the entire platform.
"We replaced one complex system (the legacy on-prem warehouse) with eight simple systems that created a complex system anyway. Except now the complexity is distributed across eight vendors' roadmaps that we don't control."
-- A staff data engineer at a Fortune 500, anonymized, who summarized it better than I ever could.
3. Vendor Lock-In Wore a Different Disguise
The MDS was supposed to free us from vendor lock-in. "It's all SQL! It's all in your warehouse! You can swap any layer!" In theory. In practice, migrating off Snowflake once you have 200 dbt models using Snowflake-specific SQL functions, Looker dashboards with Snowflake connection strings baked in, and Fivetran schemas that assume Snowflake's type system is a 6-month project. I know because I quoted one.
The lock-in didn't go away. It just got distributed. Instead of being locked into one vendor, you were locked into a specific combination of vendors. Arguably worse, because at least Oracle would pick up the phone when something broke.
4. The Talent Problem
Finding an engineer who deeply understood Snowflake optimization, dbt best practices, Airflow DAG design, Fivetran connector quirks, AND Looker's LookML was like finding a unicorn. So most teams had specialists for each layer, which meant a 5-person "data team" had no redundancy. When the dbt person went on vacation, transformations stopped evolving. When the Airflow person quit, pipelines became unmaintainable. The MDS required a breadth of specialized knowledge that small and mid-size teams simply could not staff.
The 2024-2025 Startup Graveyard
The market correction was brutal. Here is an incomplete list of what happened to MDS-era companies:
- Stitch (acquired by Talend, then Qlik): Effectively shelved. The free tier that made it popular is gone.
- Mode Analytics: Acquired by ThoughtSpot in 2023 for a fraction of its peak valuation. Product direction uncertain.
- Dataform: Acquired by Google, folded into BigQuery. No longer an independent option.
- Select Star: Quietly wound down operations in 2024 after failing to raise a Series B.
- Polytomic: Pivoted hard from reverse ETL as the category compressed.
- Mage: Open-source orchestrator that struggled to find a monetization path. Team shrunk significantly.
- Anomalo: Narrowed focus, cut staff. Data observability as a standalone category proved difficult to sustain.
- Numerous dbt ecosystem startups: The "dbt for X" playbook stopped working when dbt Labs expanded its own surface area and customers consolidated vendors.
The pattern is clear: standalone tools serving a single layer of the MDS could not build sustainable businesses. The category collapse was not a failure of the technology. It was a failure of the market structure. When every layer is a $5K-$15K/month SaaS product, customers inevitably ask: "Can we get three of these layers from one vendor instead?"
The answer, increasingly, was yes.
What Replaced the Modern Data Stack
Pattern 1: Lakehouse Convergence
The single biggest shift has been the convergence of data lakes and data warehouses into the lakehouse architecture. Databricks and Snowflake both realized that their customers wanted one platform, not a "warehouse for BI" plus a "lake for ML." By 2025, the distinction between a data lake and a data warehouse had become academic.
Databricks Unity Catalog is the most complete expression of this pattern. One platform handles:
- Ingestion (Databricks ingestion pipelines, Delta Live Tables)
- Storage (Delta Lake on your object storage)
- Transformation (SQL, Python, Spark, dbt-core)
- Governance (Unity Catalog, lineage, access control)
- ML and AI (MLflow, Model Serving, Feature Store)
- BI (Databricks SQL + Lakeview dashboards, or connect your own BI tool)
Is it perfect? No. Databricks is expensive at scale, and you are absolutely trading one form of lock-in for another. But the operational complexity drops dramatically. One authentication system. One audit log. One billing dashboard. One support contract. For a 5-person data team, that difference is existential.
Pattern 2: The DuckDB + dbt-core Revolution
On the opposite end of the spectrum, something remarkable happened at the small-to-medium end of the market. Teams realized they didn't need a cloud warehouse at all.
```yaml
# dbt_project.yml -- the $0/month data stack
name: 'my_analytics'
version: '1.0.0'
profile: 'duckdb_local'
```

```yaml
# profiles.yml
duckdb_local:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: 'analytics.duckdb'
      threads: 8
```
DuckDB as a dbt backend, reading directly from Parquet files on S3, transformed with dbt-core (free, open source), served to a BI tool like Evidence or Metabase. Total infrastructure cost: the S3 storage bill. For many companies processing under 500 GB of data, this pattern delivers 90% of the functionality at 5% of the cost.
```sql
-- DuckDB reads directly from S3 Parquet files
-- No warehouse needed, no ingestion tool needed
-- (requires the httpfs extension and S3 credentials)
INSTALL httpfs;
LOAD httpfs;

CREATE TABLE raw_events AS
SELECT *
FROM read_parquet('s3://my-bucket/events/2026/01/*.parquet');

-- Run dbt models on top of this
-- Total cost: ~$3/month in S3 storage
```
I have moved three clients to this pattern in the last year. The performance is shockingly good. One client went from $14,000/month on Snowflake + Fivetran to $200/month on S3 + a single EC2 instance running DuckDB. The dashboards load faster. The dbt runs complete in 40 seconds instead of 4 minutes. The engineers are happier because they can run the entire stack locally.
Pattern 3: Single-Vendor Platforms
Google's BigQuery has quietly become one of the most compelling data platforms precisely because it never fully embraced the MDS unbundling philosophy. BigQuery includes native ingestion (BigQuery Data Transfer Service), transformation (Dataform, now built-in), ML (BigQuery ML), BI (Looker, now tightly integrated), and streaming (BigQuery's streaming insert API). It is a single platform where you can go from raw data to dashboard without leaving the console.
Snowflake, recognizing the threat, has been on an acquisition and feature-building spree: Snowpark for Python, Cortex for AI, Streamlit for apps, native data quality features, and a marketplace that attempts to replace standalone data catalog products. They are not selling you a warehouse anymore. They are selling you a platform.
The irony is thick: the MDS was a rebellion against monolithic platforms like Informatica and Oracle. Ten years later, we are building monolithic platforms again. But this time they run in the cloud and have better APIs. Progress, I suppose.
Pattern 4: AI-Native Data Tools
The final pattern reshaping the data stack is AI-native tooling. Not "we added a chatbot to our BI tool" AI. Genuinely different approaches to data work:
- Text-to-SQL that actually works. Tools where analysts describe what they want in English and get correct, optimized SQL. This is reducing the need for dedicated analytics engineers writing dbt models for every business question.
- Automated data pipeline generation. Describe your source and destination, and an AI agent builds the ingestion pipeline, including schema mapping and error handling. Fivetran's moat looks thinner when an LLM can write a custom connector in 20 minutes.
- Self-healing pipelines. Systems that detect schema changes upstream, automatically adjust transformations, and notify stakeholders. This category barely existed in 2023 and is becoming table stakes in 2026.
- Embedded analytics powered by LLMs. Instead of building dashboards, embed a natural language interface directly in your product. Users ask questions; the LLM queries the database and returns answers. No Looker, no LookML, no dashboard maintenance.
I am not suggesting AI replaces data engineers. But it is collapsing layers of the stack. When an LLM can generate a dbt model, test it, and deploy it with human approval, you need fewer specialized tools and fewer specialized people to operate those tools.
Cost Comparison: MDS vs. Consolidated
Let me make this concrete. Here is the same workload -- a mid-market SaaS company with 15 data sources, 500 GB of data, 50 business users, and a 4-person data team -- priced across three architectures:
| Component | Classic MDS (Monthly) | Databricks Unified (Monthly) | DuckDB + OSS (Monthly) |
|---|---|---|---|
| Ingestion | Fivetran: $8,000 | Delta Live Tables: included | Custom Python: $0 |
| Warehouse / Compute | Snowflake: $25,000 | Databricks SQL: $12,000 | EC2 + DuckDB: $400 |
| Transformation | dbt Cloud: $5,000 | dbt-core (free): $0 | dbt-core (free): $0 |
| BI | Looker: $10,000 | Lakeview + Streamlit: included | Evidence / Metabase: $0-$500 |
| Data Quality | Monte Carlo: $8,000 | Lakehouse Monitoring: included | dbt tests + Soda OSS: $0 |
| Orchestration | Managed Airflow: $2,000 | Databricks Workflows: included | Dagster OSS: $0 |
| Storage | Included in Snowflake | S3: $200 | S3: $200 |
| Total | $58,000 | $12,200 | $600-$1,100 |
Yes, the DuckDB + OSS column looks absurdly low. It is real. The trade-off is operational: you need engineers who can maintain open-source tools, handle upgrades, and debug without a vendor support line. But for a 4-person team that already has those skills, the math is overwhelming.
Even the Databricks option, which is by no means cheap, represents a 79% cost reduction from the classic MDS. And that reduction comes with less operational complexity, not more.
The Post-Modern Data Stack: What 2026 Actually Looks Like
After spending the last two years helping companies dismantle their MDS architectures, here is what I see winning in production:
For Startups and Small Teams (sub-500 GB)
```
Source APIs/DBs
      |
      v
Python ingestion scripts (scheduled via GitHub Actions or cron)
      |
      v
S3 / GCS (Parquet files)
      |
      v
DuckDB + dbt-core (transform and serve)
      |
      v
Evidence or Streamlit (dashboards)
      |
      v
LLM layer for ad-hoc questions
```
Total cost: under $500/month. Time to set up: one week. Maintainable by a single engineer.
For Mid-Market (500 GB - 10 TB)
```
Source APIs/DBs
      |
      v
Airbyte OSS or Databricks ingestion
      |
      v
Delta Lake / Iceberg on object storage
      |
      v
Databricks or Snowflake (single platform)
      |
      v
dbt-core for transformations
      |
      v
Embedded BI (Preset, Lightdash, or native)
      |
      v
Unity Catalog or Polaris for governance
```
Total cost: $8K-$25K/month. Maintained by a 3-5 person team. One primary vendor relationship.
For Enterprise (10 TB+)
Honestly? Enterprise stacks still look messy. But the trend is toward platform consolidation around Databricks or Snowflake as the core, with open table formats (Apache Iceberg is winning this race) providing an escape hatch against lock-in. The key difference from the MDS era is that enterprises are actively reducing their vendor count rather than expanding it.
My Controversial Takes for 2026
I will end with a few opinions that tend to generate heated responses at conferences. I stand by all of them.
Reverse ETL as a category is dead. Every warehouse now has native connectors to push data to SaaS tools. Census and Hightouch built real businesses, but the functionality is being absorbed into the platforms. It is the same pattern as monitoring being absorbed into cloud providers.
Data observability as a standalone product is unsustainable. Monte Carlo raised $200M+ but the long-term home for data observability is inside the warehouse or lakehouse platform. Databricks Lakehouse Monitoring and Snowflake's native data quality features are good enough for 80% of use cases.
dbt Cloud's moat is thinner than people think. dbt-core is the real product. The cloud offering adds CI/CD, scheduling, and a nice IDE, but GitHub Actions + VSCode + a cron job replicates 90% of that for free. SQLMesh is a credible open-source alternative that handles state management better. dbt Labs knows this, which is why they are pivoting toward the "semantic layer" and "dbt Mesh" -- but those features need to ship and mature before the window closes.
Apache Iceberg will be the default table format by end of 2026. Snowflake adopted it. Databricks is supporting it alongside Delta (and their recent moves suggest they see the writing on the wall). BigQuery supports it. Iceberg's open governance model and multi-engine compatibility make it the obvious convergence point. The table format wars are effectively over.
Most companies do not need a data warehouse at all. This is my spiciest take and I will not back down. If you have under 500 GB of data and fewer than 20 analysts, DuckDB on a single machine with dbt-core will outperform a cloud warehouse on speed, cost, and developer experience. The warehouse vendors have convinced an entire generation of data engineers that they need distributed cloud compute for workloads that fit comfortably in the RAM of a $200/month server.
Where Do We Go from Here?
The modern data stack was not a mistake. It was a necessary phase. It proved that cloud-native, SQL-first, modular data platforms were superior to the old Informatica-Oracle-Cognos monoliths. It created dbt, which genuinely transformed how we think about data transformation. It forced warehouse vendors to compete on price and developer experience. It trained a generation of analytics engineers.
But the "best of breed" philosophy was taken too far. We ended up with too many tools, too much integration overhead, and too little accountability when things broke. The market corrected, as markets do.
What replaced the modern data stack is not a single architecture. It is a set of principles:
- Consolidate ruthlessly. Every additional tool must justify its operational overhead, not just its feature set. Default to fewer vendors.
- Open formats over open source. Apache Iceberg and Parquet matter more than whether your warehouse is open-source. Data portability beats code portability.
- Right-size your compute. Stop defaulting to distributed cloud warehouses. Evaluate whether DuckDB, ClickHouse, or a single Postgres instance solves your problem first.
- AI as a collapsing function. Use LLMs to eliminate entire categories of tools, not just to add chat interfaces to existing ones.
- Total cost of ownership over sticker price. A "free" open-source tool that requires a full-time engineer to maintain costs $180K/year. A $5K/month managed service that just works costs $60K/year. Do the math for your situation.
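The arithmetic in that last bullet, as a tiny sketch; the loaded-salary figure is an assumption for illustration, not a benchmark.

```python
def annual_tco(monthly_license: float,
               maintenance_fte_fraction: float,
               loaded_salary: float = 180_000) -> float:
    """Annual total cost of ownership: twelve months of license
    fees plus the share of an engineer's loaded salary spent
    keeping the tool alive."""
    return monthly_license * 12 + maintenance_fte_fraction * loaded_salary

# "Free" OSS tool that consumes a full-time engineer:
oss_cost = annual_tco(monthly_license=0, maintenance_fte_fraction=1.0)

# $5K/month managed service that just works:
managed_cost = annual_tco(monthly_license=5_000, maintenance_fte_fraction=0.0)
```

Plug in your own salary and maintenance estimates; the point is that the labor term usually dominates the license term at small team sizes.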
The post-modern data stack is less exciting than its predecessor. There are fewer architecture diagrams to show off at meetups. The conference talks are about cost optimization and operational simplicity rather than elegant abstractions and category-creating vendor pitches. It is boring. And boring is exactly what data infrastructure should be.
Your data stack should be a utility, not a science project. Build it like plumbing: reliable, invisible, and as simple as the requirements allow. The modern data stack forgot that. The market remembered.



