Over the past eighteen months I have deployed all three of these engines in production at different companies. One was a product analytics platform doing 40 billion events per day, another was an ad-tech bidding pipeline, and the third was a log analytics stack replacing Elasticsearch. Each time I had to answer the same question: which real-time analytics database should we bet on?
The answer is never simple, and anyone who tells you "just use ClickHouse" without asking about your ingestion patterns, query shapes, and operational capacity is doing you a disservice. In this article I am going to give you the honest comparison I wish I had when I started evaluating these systems. I will cover architecture, ingestion, query performance, operations, cloud offerings, SQL compatibility, and give you a concrete decision framework at the end.
Architecture Overview: Three Philosophies for the Same Problem
All three are columnar, distributed, MPP (massively parallel processing) OLAP database systems designed for sub-second analytical queries on large datasets. But they arrive at that goal from very different starting points.
ClickHouse
ClickHouse was born at Yandex for web analytics. It is a C++ monolith in the best sense: a single binary that handles storage, compute, and coordination. Data is stored in its own columnar format using the MergeTree family of table engines. Vectorized execution processes data in batches of columns rather than row-by-row, and the query planner is tightly coupled to the storage layer. This tight coupling is both its greatest strength (extreme scan performance) and its biggest limitation (less flexibility in decoupling storage from compute).
Apache Druid
Druid was created at Metamarkets (now part of Snap) for interactive analytics on event streams. It uses a microservices architecture with distinct process roles: Coordinator, Overlord, Historical, MiddleManager, Broker, and Router. Data is stored in immutable "segments" in deep storage (S3, HDFS, or local disk) and cached on Historical nodes. Druid was designed around streaming ingestion from Kafka from day one, with native support for real-time and batch indexing tasks. It pre-aggregates data at ingestion time using "rollup," which can dramatically reduce storage at the cost of losing raw row-level detail.
StarRocks
StarRocks (originally DorisDB, a fork of Apache Doris) is the newest of the three. It is a C++ system with a MySQL-compatible frontend (FE) and backend (BE) architecture. StarRocks is explicitly designed to be a "unified analytics database" that handles both real-time ingestion and complex ad-hoc queries including multi-table joins. It uses a cost-based optimizer (CBO), vectorized execution, and a columnar storage engine. If ClickHouse optimized for scan throughput and Druid optimized for streaming ingestion, StarRocks tries to optimize for query flexibility.
Architecture Comparison Table
| Feature | ClickHouse | Apache Druid | StarRocks |
|---|---|---|---|
| Language | C++ | Java | C++ (BE) / Java (FE) |
| Storage format | MergeTree (proprietary) | Segments (proprietary) | Columnar (proprietary) |
| Execution model | Vectorized, pipeline | Scatter-gather on segments | Vectorized, pipeline, CBO |
| Storage-compute separation | SharedMergeTree (Cloud) | Deep storage + cache | Shared-data (3.0+) |
| Process roles | 1 (clickhouse-server) | 6+ distinct services | 2 (FE + BE) |
| Native streaming ingestion | Kafka engine (limited) | Native Kafka Supervisor | Routine Load (Kafka) |
| Join support | Limited (hash, no CBO) | Very limited (lookup joins) | Full joins with CBO |
| MySQL protocol | Yes (read-only wire) | No (HTTP + Avatica) | Yes (full compatibility) |
Ingestion Patterns: Batch vs Streaming
This is where the choice often gets made for you. If your data arrives in Kafka and needs to be queryable within seconds, your evaluation looks very different than if you are loading hourly Parquet files from S3.
ClickHouse Ingestion
ClickHouse excels at batch ingestion. Loading Parquet files, CSV dumps, or doing INSERT INTO ... SELECT from external tables is fast and straightforward. For streaming, ClickHouse offers a Kafka table engine, but in my experience it is fragile. Consumer group rebalancing, exactly-once semantics, and offset management all require careful tuning. Most production ClickHouse deployments I have seen use an external consumer (a Python or Go service) that buffers events and does batch inserts every few seconds.
-- ClickHouse: Batch insert from S3 Parquet
INSERT INTO events
SELECT *
FROM s3(
    'https://my-bucket.s3.amazonaws.com/events/2026-02-01/*.parquet',
    'AWS_KEY', 'AWS_SECRET',
    'Parquet'
);
-- ClickHouse: Kafka engine (streaming)
CREATE TABLE events_kafka (
    event_id String,
    user_id UInt64,
    event_type String,
    properties String,
    timestamp DateTime64(3)
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker1:9092,broker2:9092',
         kafka_topic_list = 'user_events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow';

CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT * FROM events_kafka;
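The external-consumer pattern most teams end up with instead of the Kafka engine -- buffer events, flush a batch every few seconds or every N rows -- can be sketched in a few lines of Python. This is a hypothetical micro-batcher, not a specific library; the flush callback would wrap whatever ClickHouse client you use for the actual batch INSERT.

```python
import time

class MicroBatcher:
    """Buffer incoming events and flush them in batches -- the pattern
    most production ClickHouse pipelines use instead of the Kafka engine."""

    def __init__(self, flush_fn, max_rows=10_000, max_seconds=5.0):
        self.flush_fn = flush_fn          # e.g. wraps a ClickHouse batch INSERT
        self.max_rows = max_rows
        self.max_seconds = max_seconds
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, event):
        self.buffer.append(event)
        if (len(self.buffer) >= self.max_rows
                or time.monotonic() - self.last_flush >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)    # one batch insert instead of many tiny ones
            self.buffer = []
        self.last_flush = time.monotonic()

# Usage: collect flushed batches in a list instead of writing to ClickHouse
batches = []
b = MicroBatcher(batches.append, max_rows=3, max_seconds=60)
for i in range(7):
    b.add({"event_id": i})
b.flush()  # drain the tail on shutdown or consumer rebalance
```

The important details in production are the ones this sketch glosses over: committing Kafka offsets only after a successful flush, and draining the buffer on rebalance so you do not lose the tail.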
Druid Ingestion
Druid's Kafka ingestion is production-grade. You define a supervisor spec and Druid handles consumer management, segment building, handoff from real-time to historical nodes, and exactly-once semantics. Data is typically queryable within 1-2 seconds of arriving in Kafka. The tradeoff is that Druid's ingestion spec is verbose JSON, and configuring dimensions, metrics, rollup granularity, and segment size takes a lot of trial and error.
Druid also supports "rollup" at ingestion time, which pre-aggregates rows that share the same dimension values and time granularity. For high-cardinality event streams, rollup can reduce storage by 10-100x. But you lose the ability to query individual raw events.
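To make rollup concrete, here is a small Python simulation (an illustration of the idea, not Druid code): raw events that share the same truncated timestamp and dimension values collapse into a single pre-aggregated row, which is exactly why storage shrinks when dimension cardinality is low relative to event volume.

```python
from collections import defaultdict

def rollup(events, granularity_s=60):
    """Simulate Druid-style ingestion rollup: group events sharing a
    truncated timestamp + dimension values, keep only the aggregates."""
    agg = defaultdict(lambda: {"count": 0, "revenue": 0.0})
    for e in events:
        bucket = e["ts"] - e["ts"] % granularity_s   # truncate to the minute
        key = (bucket, e["country"], e["event_type"])
        agg[key]["count"] += 1
        agg[key]["revenue"] += e["revenue"]
    return agg

# 10,000 raw events spread over 10 minutes, 2 countries, 2 event types
raw = [{"ts": i % 600,
        "country": "US" if i % 2 else "DE",
        "event_type": "view" if i % 4 < 3 else "click",
        "revenue": 0.01}
       for i in range(10_000)]
rolled = rollup(raw)
print(len(raw), "raw rows ->", len(rolled), "rollup rows")
```

With synthetic low-cardinality dimensions like these, thousands of raw rows collapse to a few dozen aggregate rows; with a high-cardinality dimension (say, user_id) in the key, rollup buys you almost nothing -- which is why dimension selection is the critical design decision.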
StarRocks Ingestion
StarRocks sits in the middle. Its "Routine Load" feature connects directly to Kafka and is more reliable than ClickHouse's Kafka engine in my testing. It also supports "Stream Load" (HTTP push) and batch loading from S3/HDFS. The MySQL protocol compatibility means you can also just use standard MySQL connectors for smaller loads. Latency from Kafka to queryable is typically 3-5 seconds, slightly behind Druid but well ahead of ClickHouse's Kafka engine.
-- StarRocks: Routine Load from Kafka
CREATE ROUTINE LOAD events_load ON events
COLUMNS (event_id, user_id, event_type, properties, event_time)
PROPERTIES (
    "desired_concurrent_number" = "3",
    "max_error_number" = "100",
    "format" = "json",
    "jsonpaths" = "[\"$.event_id\",\"$.user_id\",\"$.event_type\",\"$.properties\",\"$.timestamp\"]"
)
FROM KAFKA (
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "user_events",
    "property.group.id" = "starrocks_consumer"
);
Query Performance: Where It Actually Matters
I ran benchmarks on all three systems using the same hardware (8-node cluster, 32 vCPU / 128 GB RAM per node, NVMe SSDs) and the same dataset: 30 days of product analytics events, 12 billion rows, ~2.4 TB uncompressed. Here are the results for seven representative query patterns.
Benchmark Results
| Query Type | ClickHouse (s) | Druid (s) | StarRocks (s) |
|---|---|---|---|
| Simple count with filter | 0.08 | 0.12 | 0.11 |
| GROUP BY with 10K cardinality | 0.31 | 0.45 | 0.38 |
| Time-series rollup (hourly, 30 days) | 0.22 | 0.09 | 0.26 |
| Top-N with high cardinality (1M+) | 1.40 | 0.72 | 1.15 |
| Multi-table JOIN (events + users + products) | 4.80 | N/A (not supported) | 1.90 |
| Approximate distinct (HyperLogLog) | 0.05 | 0.04 | 0.07 |
| Complex funnel query (3-step, 7-day window) | 3.20* | N/A | 2.50 |
* ClickHouse funnel query uses windowFunnel() which is a specialized function. Druid lacks native funnel support. StarRocks uses window functions with CBO optimization.
A few things stand out. ClickHouse wins on simple scans and aggregations because its vectorized engine and tight storage coupling minimize overhead. Druid wins on time-series rollups and Top-N queries because those are literally the query shapes it was designed for; its segment index and bitmap structures are optimized for this pattern. StarRocks wins on joins and complex analytical queries because its cost-based optimizer can choose between hash join, broadcast join, and shuffle join strategies.
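The CBO's join-strategy choice comes down to a cost comparison. Here is a toy Python sketch of the decision -- a deliberate simplification with made-up cost units and a made-up threshold, not StarRocks's actual cost model:

```python
def choose_join_strategy(left_rows, right_rows, nodes,
                         broadcast_threshold=10_000_000):
    """Toy distributed-join cost model, loosely in the spirit of a CBO:
    broadcast the small side, otherwise shuffle both sides by join key."""
    # Broadcast: copy the right table to every node -> cost ~ right * nodes
    broadcast_cost = right_rows * nodes
    # Shuffle: repartition both sides once over the network
    shuffle_cost = left_rows + right_rows
    if right_rows <= broadcast_threshold and broadcast_cost < shuffle_cost:
        return "broadcast"
    return "shuffle"

# Small dimension table (users) joined to a huge fact table (events):
# copying 5M rows to 8 nodes is far cheaper than shuffling 12B rows.
print(choose_join_strategy(12_000_000_000, 5_000_000, nodes=8))
```

A fixed-strategy engine (ClickHouse's historical default of building a hash table from the right side in memory) pays the worst-case price whenever the right side is large; a cost-based choice is what makes the multi-table join row in the benchmark table above diverge so sharply.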
The Same Query in Three Dialects
Let me show you a common analytics query -- daily active users by platform for the last 7 days -- in each system.
-- ClickHouse
SELECT
    toDate(event_time) AS day,
    platform,
    uniqExact(user_id) AS dau
FROM events
WHERE event_time >= now() - INTERVAL 7 DAY
  AND event_type = 'session_start'
GROUP BY day, platform
ORDER BY day, platform;
-- Apache Druid (via the Druid SQL layer)
SELECT
    TIME_FLOOR(__time, 'P1D') AS day,
    platform,
    COUNT(DISTINCT user_id) AS dau
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
  AND event_type = 'session_start'
GROUP BY 1, 2
ORDER BY 1, 2;
-- StarRocks
SELECT
    DATE(event_time) AS day,
    platform,
    COUNT(DISTINCT user_id) AS dau
FROM events
WHERE event_time >= NOW() - INTERVAL 7 DAY
  AND event_type = 'session_start'
GROUP BY day, platform
ORDER BY day, platform;
The StarRocks version is standard MySQL syntax. The ClickHouse version uses its own function set (toDate, uniqExact). Druid's SQL layer works for simple queries like this but breaks down for anything involving subqueries, CTEs, or complex joins -- you end up dropping into Druid's native JSON query API which has a steep learning curve.
Materialized Views and Pre-Aggregation
All three systems support some form of pre-computation, but the implementations differ significantly.
ClickHouse Materialized Views
ClickHouse materialized views are triggers that execute on insert. They are incredibly powerful: you can transform, aggregate, and route data as it arrives. The AggregatingMergeTree engine lets you maintain incrementally updated aggregates.
-- ClickHouse: Incremental daily stats materialized view
CREATE MATERIALIZED VIEW daily_stats_mv
ENGINE = AggregatingMergeTree()
ORDER BY (day, platform)
AS SELECT
    toDate(event_time) AS day,
    platform,
    uniqState(user_id) AS users_state,
    countState() AS events_state
FROM events
GROUP BY day, platform;

-- Query the materialized view
SELECT
    day,
    platform,
    uniqMerge(users_state) AS unique_users,
    countMerge(events_state) AS total_events
FROM daily_stats_mv
GROUP BY day, platform;
The -State / -Merge combinator pattern is unique to ClickHouse and extremely efficient. It takes time to learn but once you internalize it, you can build real-time dashboards that query pre-aggregated data in single-digit milliseconds.
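Under the hood, -State / -Merge is ordinary partial aggregation: each insert batch produces a small mergeable state, and queries combine states instead of rescanning raw rows. A rough Python analogy (plain sets standing in for ClickHouse's compact sketch states -- this is the concept, not the internals):

```python
def uniq_state(user_ids):
    """Build a partial state from one insert batch (like uniqState)."""
    return set(user_ids)

def uniq_merge(states):
    """Combine partial states at query time (like uniqMerge)."""
    merged = set()
    for s in states:
        merged |= s
    return len(merged)

# Two insert batches landing in the same (day, platform) group:
# the merged distinct count is exact even though neither batch
# saw all the users.
batch1 = uniq_state([1, 2, 3, 3])
batch2 = uniq_state([3, 4])
print(uniq_merge([batch1, batch2]))
```

The key property is that merge is associative: background merges can combine states in any order, which is what lets AggregatingMergeTree fold new inserts into existing aggregates without recomputing from raw data.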
Druid Rollup
Druid's equivalent is rollup at ingestion time. You define the query granularity and metrics in your ingestion spec, and Druid pre-aggregates during segment creation. This is more rigid than ClickHouse's approach -- you must decide your aggregation dimensions upfront -- but it is simpler to operate because there is no separate materialized view to manage.
StarRocks Materialized Views
StarRocks supports both synchronous materialized views (similar to ClickHouse, updated on insert) and asynchronous materialized views (refreshed on a schedule or triggered). The async MVs can reference multiple tables with joins, which neither ClickHouse nor Druid can do natively.
-- StarRocks: Async MV with join (refreshed every 5 minutes)
CREATE MATERIALIZED VIEW user_event_summary
REFRESH ASYNC EVERY (INTERVAL 5 MINUTE)
AS SELECT
    u.country,
    u.signup_source,
    DATE(e.event_time) AS day,
    COUNT(DISTINCT e.user_id) AS unique_users,
    COUNT(*) AS total_events
FROM events e
JOIN users u ON e.user_id = u.user_id
WHERE e.event_time >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY u.country, u.signup_source, DATE(e.event_time);
Approximate Query Processing
When you are dealing with billions of rows, exact counts are expensive. All three systems support approximate algorithms, but the depth varies.
- ClickHouse: HyperLogLog (uniq), quantile sketches (quantileTDigest), sampling (SAMPLE 0.1), and a rich library of approximate functions. This is where ClickHouse's breadth really shines.
- Druid: HyperLogLog, Theta sketches (for set operations like intersection/union on distinct counts), quantile sketches via DataSketches integration. Druid's Theta sketch support is best-in-class for approximate set operations.
- StarRocks: HyperLogLog (approx_count_distinct), percentile approximation. Functional but less comprehensive than the other two.
If your workload heavily depends on approximate distinct counts with set operations (for example, computing the intersection of two user segments), Druid with Theta sketches is the clear winner.
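For intuition about how these sketches trade accuracy for space, here is a minimal k-minimum-values (KMV) distinct-count estimator in Python. It illustrates the general sketch idea -- hash items uniformly, keep a tiny summary, estimate from the summary -- not any of the three engines' actual implementations (ClickHouse uses a HyperLogLog variant; Druid uses the DataSketches library).

```python
import hashlib

def kmv_estimate(items, k=256):
    """k-minimum-values sketch: if the k-th smallest normalized hash is h,
    the stream has roughly (k - 1) / h distinct items.
    Relative error shrinks as ~1/sqrt(k)."""
    hashes = set()
    for item in items:
        digest = hashlib.sha256(str(item).encode()).digest()
        # Map the hash to a float in [0, 1) -- uniform for distinct items
        hashes.add(int.from_bytes(digest[:8], "big") / 2**64)
    smallest = sorted(hashes)[:k]
    if len(smallest) < k:
        return len(smallest)          # fewer than k distinct items: exact
    return int((k - 1) / smallest[-1])

true_n = 10_000
estimate = kmv_estimate(range(true_n), k=256)
print(f"true={true_n} estimate={estimate}")
```

A real sketch keeps only the k smallest hashes (constant memory) rather than the full hash set used here for brevity, and -- like Druid's Theta sketches -- the k-smallest-values summaries of two sets can be combined to estimate unions and intersections, which is what makes segment-overlap queries cheap.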
Operational Complexity: The Hidden Cost
This is where I have the strongest opinions, because this is where I have lost the most sleep.
ClickHouse Operations
Self-hosted ClickHouse is deceptively simple to get started. A single binary, a config file, and you are running. But scaling a ClickHouse cluster is where complexity creeps in. You need to manage ZooKeeper (or ClickHouse Keeper) for replication coordination, handle rebalancing shards manually, deal with merge pressure under heavy write loads, and monitor dozens of system tables. Schema migrations on large tables can take hours. The mutation system (ALTER TABLE UPDATE/DELETE) is asynchronous and can pile up.
My rule of thumb: if your team has fewer than two engineers who can dedicate time to ClickHouse operations, go with ClickHouse Cloud instead of self-hosting.
Druid Operations
Druid is the most operationally complex of the three. Six to eight distinct process types, each with its own JVM tuning, memory configuration, and failure modes. You need ZooKeeper, a metadata store (PostgreSQL or MySQL), and deep storage (S3 or HDFS). Segment management, compaction tasks, coordinator balancing, and MiddleManager capacity planning are all ongoing concerns. I have seen Druid clusters where the operations runbook was 40 pages long.
The flip side is that once a Druid cluster is properly tuned, it is remarkably stable. Historical nodes are stateless caches that can be killed and restarted at will. The architecture is designed for large-scale, long-running deployments.
StarRocks Operations
StarRocks is the simplest to operate. Two process types (FE and BE), no external dependency on ZooKeeper, built-in replication and rebalancing. Adding a node is ALTER SYSTEM ADD BACKEND and StarRocks handles data redistribution automatically. Schema changes are online. The MySQL protocol means every monitoring tool, BI connector, and MySQL client just works.
The downside is a smaller community and less battle-tested operational knowledge. When things go wrong with ClickHouse, Stack Overflow and the Altinity blog have your answer. With StarRocks, you might be reading Chinese-language GitHub issues.
Cloud Offerings in 2026
The managed cloud story has matured significantly for all three.
| Aspect | ClickHouse Cloud | Imply Polaris (Druid) | CelerData (StarRocks) |
|---|---|---|---|
| Provider | ClickHouse Inc. | Imply | CelerData |
| Cloud platforms | AWS, GCP, Azure | AWS, GCP | AWS, GCP |
| Separation of storage/compute | Yes (SharedMergeTree) | Yes (deep storage + elastic query) | Yes (shared-data mode) |
| Auto-scaling | Yes (vertical + horizontal) | Yes (query tier) | Yes (compute nodes) |
| Pricing model | Compute + storage + I/O | Compute + storage + ingestion | Compute + storage |
| Free tier | Yes (limited) | Trial only | Trial only |
| Maturity | GA since 2022, very mature | GA since 2023, solid | GA since 2024, growing |
ClickHouse Cloud is the most mature and feature-complete. It handles the SharedMergeTree engine transparently, auto-scales, and has solid observability built in. Imply Polaris removes most of Druid's operational pain and adds a SQL-first interface on top. CelerData is the newest but benefits from StarRocks already being simpler to operate.
SQL Compatibility: A Practical Concern
If your analysts and data scientists write SQL (and they do), SQL compatibility determines adoption speed.
- StarRocks: Full MySQL protocol and syntax. Your existing MySQL queries, BI tools (Tableau, Looker, Metabase, Superset), and JDBC/ODBC connectors work out of the box. CTEs, window functions, subqueries, correlated subqueries -- all supported with the CBO.
- ClickHouse: Its own SQL dialect with MySQL wire protocol for reads. Most standard SQL works, but you will hit edge cases: no correlated subqueries (until recently), different JOIN semantics (default is hash join with the right table in memory), and a non-standard type system. ClickHouse added EXPLAIN PLAN and improved JOIN handling in recent versions, but it still feels different from standard SQL.
- Druid: SQL layer translates to native JSON queries. Simple SELECT/GROUP BY/ORDER BY works well. Anything beyond that -- CTEs, multiple levels of subqueries, non-equi joins, HAVING on complex expressions -- may fail or produce incorrect results. For complex analytics, you often need to drop into the native query language.
Decision Framework: Picking the Right Engine
After running all three in production, here is my decision framework. The right choice depends on your specific workload pattern, team expertise, and operational appetite.
Choose ClickHouse if:
- Your primary workload is log analytics or observability. ClickHouse's scan speed on wide, append-only tables is unmatched. There is a reason Cloudflare, Uber, and dozens of observability companies built on it.
- You do batch/micro-batch ingestion. If your data arrives in files or you can buffer streaming data into batches, ClickHouse's insert path is the fastest.
- You need the broadest ecosystem. ClickHouse has the largest community, the most integrations (Grafana, dbt, Airbyte, Vector, Fluent Bit), and the most operational knowledge available.
- Your queries are mostly single-table scans and aggregations. If you do not need complex multi-table joins, ClickHouse's simpler query planner is not a limitation.
Choose Apache Druid if:
- You need sub-second ingestion-to-query latency from Kafka. Druid's streaming ingestion is the most mature and battle-tested for high-throughput Kafka pipelines.
- Your query pattern is time-series + Top-N + rollups. Druid's segment architecture and inverted indexes are purpose-built for these shapes. If your dashboards are mostly "show me the top 10 X by Y over the last Z hours," Druid will be the fastest.
- You can pre-define your query dimensions at ingestion time. Rollup gives you massive storage savings but requires upfront schema design.
- You are in ad-tech or real-time bidding. Druid's combination of streaming ingestion, pre-aggregation, and approximate set operations is almost purpose-built for this domain.
Choose StarRocks if:
- You need complex joins and ad-hoc analytical queries. If your analysts need to join event data with dimension tables, run funnel analyses, or write complex window function queries, StarRocks's CBO makes these practical at scale.
- You want the simplest operations. If you are a small team that cannot dedicate engineers to database operations, StarRocks has the lowest operational burden.
- MySQL compatibility matters. If your existing tooling is built around MySQL, StarRocks is a drop-in analytical backend.
- You are building a product analytics or BI backend. StarRocks is increasingly used as the analytical engine behind user-facing analytics features where query flexibility is more important than raw scan speed.
Quick Reference Matrix
| Use Case | Best Pick | Runner-Up |
|---|---|---|
| Log / observability analytics | ClickHouse | StarRocks |
| Product analytics (funnels, retention) | StarRocks | ClickHouse |
| Real-time dashboards from Kafka | Druid | StarRocks |
| Ad-tech / bidding analytics | Druid | ClickHouse |
| BI / ad-hoc SQL analytics | StarRocks | ClickHouse |
| IoT time-series | ClickHouse | Druid |
| Multi-tenant SaaS analytics | StarRocks | ClickHouse Cloud |
What I Actually Run Today
In my current stack, I run ClickHouse for our internal observability pipeline (logs, metrics, traces -- about 8 TB/day ingestion) and StarRocks as the analytical backend for our customer-facing product analytics dashboards. We evaluated Druid for the product analytics use case but the lack of join support and SQL limitations made it a non-starter for the ad-hoc query patterns our product team needed.
If I were starting fresh in 2026 and had to pick exactly one, I would pick StarRocks for most general-purpose real-time analytics workloads. It has the best balance of query power, SQL compatibility, and operational simplicity. But I would still reach for ClickHouse for high-volume append-only workloads like logs and metrics, and I would still consider Druid for pure streaming ingestion use cases where sub-second data freshness is non-negotiable.
The real-time analytics database space is consolidating fast. ClickHouse is adding better joins and a cost-based optimizer. StarRocks is improving scan performance. Druid is expanding its SQL layer. In two years these systems may converge significantly. But today, in early 2026, the differences are still large enough that picking the right one for your workload can save you months of engineering time and significant infrastructure cost.
Choose based on your query shapes and ingestion patterns, not on benchmark marketing. And if you can, run a proof-of-concept with your actual data and your actual queries before committing. The two weeks you spend on a PoC will save you six months of regret.