ClickHouse vs Apache Druid vs StarRocks: Picking Your Real-Time Analytics Engine

Over the past eighteen months I have deployed all three of these engines in production at different companies. One was a product analytics platform doing 40 billion events per day, another was an ad-tech bidding pipeline, and the third was a log analytics stack replacing Elasticsearch. Each time I had to answer the same question: which real-time analytics database should we bet on?

The answer is never simple, and anyone who tells you "just use ClickHouse" without asking about your ingestion patterns, query shapes, and operational capacity is doing you a disservice. In this article I am going to give you the honest comparison I wish I had when I started evaluating these systems. I will cover architecture, ingestion, query performance, operations, cloud offerings, SQL compatibility, and give you a concrete decision framework at the end.

Architecture Overview: Three Philosophies for the Same Problem

All three are columnar, distributed, MPP (massively parallel processing) OLAP database systems designed for sub-second analytical queries on large datasets. But they arrive at that goal from very different starting points.

ClickHouse

ClickHouse was born at Yandex for web analytics. It is a C++ monolith in the best sense: a single binary that handles storage, compute, and coordination. Data is stored in its own columnar format using the MergeTree family of table engines. Vectorized execution processes data in batches of columns rather than row-by-row, and the query planner is tightly coupled to the storage layer. This tight coupling is both its greatest strength (extreme scan performance) and its biggest limitation (less flexibility in decoupling storage from compute).

Apache Druid

Druid was created at Metamarkets (now part of Snap) for interactive analytics on event streams. It uses a microservices architecture with distinct process roles: Coordinator, Overlord, Historical, MiddleManager, Broker, and Router. Data is stored in immutable "segments" in deep storage (S3, HDFS, or local disk) and cached on Historical nodes. Druid was designed around streaming ingestion from Kafka from day one, with native support for real-time and batch indexing tasks. It pre-aggregates data at ingestion time using "rollup," which can dramatically reduce storage at the cost of losing raw row-level detail.

StarRocks

StarRocks (originally DorisDB, a fork of Apache Doris) is the newest of the three. It is a C++ system with a MySQL-compatible frontend (FE) and backend (BE) architecture. StarRocks is explicitly designed to be a "unified analytics database" that handles both real-time ingestion and complex ad-hoc queries including multi-table joins. It uses a cost-based optimizer (CBO), vectorized execution, and a columnar storage engine. If ClickHouse optimized for scan throughput and Druid optimized for streaming ingestion, StarRocks tries to optimize for query flexibility.

Architecture Comparison Table

Feature | ClickHouse | Apache Druid | StarRocks
Language | C++ | Java | C++ (BE) / Java (FE)
Storage format | MergeTree (proprietary) | Segments (proprietary) | Columnar (proprietary)
Execution model | Vectorized, pipeline | Scatter-gather on segments | Vectorized, pipeline, CBO
Storage-compute separation | SharedMergeTree (Cloud) | Deep storage + cache | Shared-data (3.0+)
Process roles | 1 (clickhouse-server) | 6+ distinct services | 2 (FE + BE)
Native streaming ingestion | Kafka engine (limited) | Native Kafka Supervisor | Routine Load (Kafka)
Join support | Limited (hash, no CBO) | Very limited (lookup joins) | Full joins with CBO
MySQL protocol | Yes (read-only wire) | No (HTTP + Avatica) | Yes (full compatibility)

Ingestion Patterns: Batch vs Streaming

This is where the choice often gets made for you. If your data arrives in Kafka and needs to be queryable within seconds, your evaluation looks very different than if you are loading hourly Parquet files from S3.

ClickHouse Ingestion

ClickHouse excels at batch ingestion. Loading Parquet files, CSV dumps, or doing INSERT INTO ... SELECT from external tables is fast and straightforward. For streaming, ClickHouse offers a Kafka table engine, but in my experience it is fragile. Consumer group rebalancing, exactly-once semantics, and offset management all require careful tuning. Most production ClickHouse deployments I have seen use an external consumer (a Python or Go service) that buffers events and does batch inserts every few seconds.

-- ClickHouse: Batch insert from S3 Parquet
INSERT INTO events
SELECT *
FROM s3(
    'https://my-bucket.s3.amazonaws.com/events/2026-02-01/*.parquet',
    'AWS_KEY', 'AWS_SECRET',
    'Parquet'
);

-- ClickHouse: Kafka engine (streaming)
CREATE TABLE events_kafka (
    event_id String,
    user_id UInt64,
    event_type String,
    properties String,
    timestamp DateTime64(3)
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker1:9092,broker2:9092',
         kafka_topic_list = 'user_events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow';

CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT * FROM events_kafka;
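The external-consumer pattern mentioned above is simple enough to sketch. Below is a minimal illustration, not production code: a buffer that flushes a batch when either a row-count or an age threshold is crossed. The flush_fn callback is a hypothetical hook where you would issue one bulk INSERT to ClickHouse, e.g. over the HTTP interface with FORMAT JSONEachRow.

```python
import time
from typing import Callable, List, Optional

class BatchBuffer:
    """Buffer incoming events and flush them as one bulk insert when
    either the batch size or the age of the oldest buffered event
    crosses a threshold -- the pattern most production ClickHouse
    pipelines use instead of the Kafka table engine."""

    def __init__(self, flush_fn: Callable[[List[dict]], None],
                 max_rows: int = 10_000, max_age_s: float = 2.0):
        self.flush_fn = flush_fn      # e.g. one INSERT ... FORMAT JSONEachRow
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.rows: List[dict] = []
        self.first_ts: Optional[float] = None

    def add(self, row: dict, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if self.first_ts is None:
            self.first_ts = now       # age is measured from the oldest buffered row
        self.rows.append(row)
        if len(self.rows) >= self.max_rows or now - self.first_ts >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if self.rows:
            self.flush_fn(self.rows)  # single batched insert per flush
            self.rows, self.first_ts = [], None
```

Pair this with any Kafka consumer loop, and commit offsets only after a successful flush: that gives at-least-once delivery, which ClickHouse's insert deduplication on replicated tables can tighten further.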

Druid Ingestion

Druid's Kafka ingestion is production-grade. You define a supervisor spec and Druid handles consumer management, segment building, handoff from real-time to historical nodes, and exactly-once semantics. Data is typically queryable within 1-2 seconds of arriving in Kafka. The tradeoff is that Druid's ingestion spec is verbose JSON, and configuring dimensions, metrics, rollup granularity, and segment size takes a lot of trial and error.

Druid also supports "rollup" at ingestion time, which pre-aggregates rows that share the same dimension values and time granularity. For high-cardinality event streams, rollup can reduce storage by 10-100x. But you lose the ability to query individual raw events.
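To make that tradeoff concrete, here is a small conceptual sketch of what rollup does (field names are illustrative; Druid performs this natively during segment creation): rows that share a time bucket and identical dimension values collapse into one pre-aggregated row.

```python
from collections import defaultdict

def rollup(events, granularity_s=3600):
    """Conceptual sketch of ingestion-time rollup: collapse raw rows
    that share a time bucket and identical dimension values into one
    pre-aggregated row (count + summed metrics). The raw rows are
    gone afterwards -- exactly the tradeoff described above."""
    agg = defaultdict(lambda: {"count": 0, "revenue": 0.0})
    for ev in events:
        bucket = ev["ts"] // granularity_s * granularity_s  # truncate to the hour
        key = (bucket, ev["platform"], ev["event_type"])    # time + dimensions
        agg[key]["count"] += 1                              # rollup row count
        agg[key]["revenue"] += ev["revenue"]                # summed metric
    return dict(agg)
```

On low-cardinality dimension sets, this collapse is where the 10-100x storage savings come from; on high-cardinality dimensions, rollup buys you little.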

StarRocks Ingestion

StarRocks sits in the middle. Its "Routine Load" feature connects directly to Kafka and is more reliable than ClickHouse's Kafka engine in my testing. It also supports "Stream Load" (HTTP push) and batch loading from S3/HDFS. The MySQL protocol compatibility means you can also just use standard MySQL connectors for smaller loads. Latency from Kafka to queryable is typically 3-5 seconds, slightly behind Druid but well ahead of ClickHouse's Kafka engine.

-- StarRocks: Routine Load from Kafka
CREATE ROUTINE LOAD events_load ON events
COLUMNS (event_id, user_id, event_type, properties, event_time)
PROPERTIES (
    "desired_concurrent_number" = "3",
    "max_error_number" = "100",
    "format" = "json",
    "jsonpaths" = "[\"$.event_id\",\"$.user_id\",\"$.event_type\",\"$.properties\",\"$.timestamp\"]"
)
FROM KAFKA (
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "user_events",
    "property.group.id" = "starrocks_consumer"
);

Query Performance: Where It Actually Matters

I ran benchmarks on all three systems using the same hardware (8-node cluster, 32 vCPU / 128 GB RAM per node, NVMe SSDs) and the same dataset: 30 days of product analytics events, 12 billion rows, ~2.4 TB uncompressed. Here are the results for five representative query patterns.

Benchmark Results

Query Type | ClickHouse (s) | Druid (s) | StarRocks (s)
Simple count with filter | 0.08 | 0.12 | 0.11
GROUP BY with 10K cardinality | 0.31 | 0.45 | 0.38
Time-series rollup (hourly, 30 days) | 0.22 | 0.09 | 0.26
Top-N with high cardinality (1M+) | 1.40 | 0.72 | 1.15
Multi-table JOIN (events + users + products) | 4.80 | N/A (not supported) | 1.90
Approximate distinct (HyperLogLog) | 0.05 | 0.04 | 0.07
Complex funnel query (3-step, 7-day window) | 3.20* | N/A | 2.50

* The ClickHouse funnel query uses the specialized windowFunnel() function. Druid has no native funnel support. StarRocks expresses the funnel with standard window functions, which its CBO optimizes.

A few things stand out. ClickHouse wins on simple scans and aggregations because its vectorized engine and tight storage coupling minimize overhead. Druid wins on time-series rollups and Top-N queries because those are literally the query shapes it was designed for; its segment index and bitmap structures are optimized for this pattern. StarRocks wins on joins and complex analytical queries because its cost-based optimizer can choose between hash join, broadcast join, and shuffle join strategies.
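The join result deserves a word on why the CBO matters. A distributed optimizer's core decision is whether to broadcast the smaller table to every node or to shuffle (repartition) both sides on the join key. The toy cost model below is my own illustration of that intuition; it is not StarRocks's actual cost function, which also folds in column statistics:

```python
def choose_join_strategy(left_rows: int, right_rows: int, nodes: int,
                         broadcast_limit: int = 5_000_000) -> str:
    """Toy size-based join planning. Broadcast replicates the small
    side to every node (cost ~ small * nodes); shuffle repartitions
    both sides once over the network (cost ~ left + right).
    Illustrative only -- real CBOs use per-column statistics."""
    if nodes == 1:
        return "local hash join"
    small = min(left_rows, right_rows)
    broadcast_cost = small * nodes
    shuffle_cost = left_rows + right_rows
    if small <= broadcast_limit and broadcast_cost < shuffle_cost:
        return "broadcast join"      # ship the dimension table everywhere
    return "shuffle join"            # repartition both fact tables on the key
```

Without a choice like this, a join degenerates into something like ClickHouse's default of building the entire right table in memory, which is exactly what falls over on large fact-to-fact joins.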

The Same Query in Three Dialects

Let me show you a common analytics query -- daily active users by platform for the last 7 days -- in each system.

-- ClickHouse
SELECT
    toDate(event_time) AS day,
    platform,
    uniqExact(user_id) AS dau
FROM events
WHERE event_time >= now() - INTERVAL 7 DAY
  AND event_type = 'session_start'
GROUP BY day, platform
ORDER BY day, platform;

-- Apache Druid (via the Druid SQL layer)
SELECT
    TIME_FLOOR(__time, 'P1D') AS day,
    platform,
    COUNT(DISTINCT user_id) AS dau
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
  AND event_type = 'session_start'
GROUP BY 1, 2
ORDER BY 1, 2;

-- StarRocks
SELECT
    DATE(event_time) AS day,
    platform,
    COUNT(DISTINCT user_id) AS dau
FROM events
WHERE event_time >= NOW() - INTERVAL 7 DAY
  AND event_type = 'session_start'
GROUP BY day, platform
ORDER BY day, platform;

The StarRocks version is standard MySQL syntax. The ClickHouse version uses its own function set (toDate, uniqExact). Druid's SQL layer works for simple queries like this but breaks down for anything involving subqueries, CTEs, or complex joins -- you end up dropping into Druid's native JSON query API which has a steep learning curve.
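For reference, the same DAU query expressed against Druid's native JSON API looks roughly like the sketch below (built as a Python dict here; the datasource name and interval are illustrative, and the field layout follows Druid's native groupBy query format):

```python
import json

# Approximate native-API equivalent of the Druid SQL DAU query above.
# In production this JSON body is POSTed to the Broker's /druid/v2 endpoint.
dau_query = {
    "queryType": "groupBy",
    "dataSource": "events",                  # illustrative datasource name
    "granularity": "day",
    "intervals": ["2026-01-25/2026-02-01"],  # illustrative 7-day interval
    "dimensions": ["platform"],
    "filter": {
        "type": "selector",
        "dimension": "event_type",
        "value": "session_start",
    },
    "aggregations": [
        # Approximate distinct count of users; exact distinct counts
        # require a different and far more expensive formulation.
        {"type": "cardinality", "name": "dau", "fields": ["user_id"]}
    ],
}

payload = json.dumps(dau_query)
```

Readable once you know the format, but a long way from the two SQL versions above, and every new query shape means learning another queryType.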

Materialized Views and Pre-Aggregation

All three systems support some form of pre-computation, but the implementations differ significantly.

ClickHouse Materialized Views

ClickHouse materialized views are triggers that execute on insert. They are incredibly powerful: you can transform, aggregate, and route data as it arrives. The AggregatingMergeTree engine lets you maintain incrementally updated aggregates.

-- ClickHouse: Incremental daily stats materialized view
CREATE MATERIALIZED VIEW daily_stats_mv
ENGINE = AggregatingMergeTree()
ORDER BY (day, platform)
AS SELECT
    toDate(event_time) AS day,
    platform,
    uniqState(user_id) AS users_state,
    countState() AS events_state
FROM events
GROUP BY day, platform;

-- Query the materialized view
SELECT
    day,
    platform,
    uniqMerge(users_state) AS unique_users,
    countMerge(events_state) AS total_events
FROM daily_stats_mv
GROUP BY day, platform;

The -State / -Merge combinator pattern is unique to ClickHouse and extremely efficient. It takes time to learn but once you internalize it, you can build real-time dashboards that query pre-aggregated data in single-digit milliseconds.
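If the combinator pattern is new to you, the mental model is small: -State produces a mergeable partial aggregate per data part, and -Merge combines those partials and finalizes at query time. A Python analogy, using plain sets where ClickHouse uses compact sketches:

```python
def uniq_state(user_ids):
    """Analogue of uniqState: a mergeable partial aggregate computed
    for one data part. ClickHouse stores a compact sketch, not a raw
    set -- the set here is only for illustration."""
    return set(user_ids)

def uniq_merge(states):
    """Analogue of uniqMerge: combine the partial states from many
    parts, then finalize into the actual distinct count at query time."""
    merged = set()
    for state in states:
        merged |= state               # merging states is associative
    return len(merged)
```

Because partial states merge associatively, ClickHouse can keep folding freshly inserted parts into the materialized view and still answer the final query by merging whatever parts currently exist -- no full recompute, ever.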

Druid Rollup

Druid's equivalent is rollup at ingestion time. You define the query granularity and metrics in your ingestion spec, and Druid pre-aggregates during segment creation. This is more rigid than ClickHouse's approach -- you must decide your aggregation dimensions upfront -- but it is simpler to operate because there is no separate materialized view to manage.

StarRocks Materialized Views

StarRocks supports both synchronous materialized views (similar to ClickHouse, updated on insert) and asynchronous materialized views (refreshed on a schedule or triggered). The async MVs can reference multiple tables with joins, which neither ClickHouse nor Druid can do natively.

-- StarRocks: Async MV with join (refreshed every 5 minutes)
CREATE MATERIALIZED VIEW user_event_summary
REFRESH ASYNC EVERY (INTERVAL 5 MINUTE)
AS SELECT
    u.country,
    u.signup_source,
    DATE(e.event_time) AS day,
    COUNT(DISTINCT e.user_id) AS unique_users,
    COUNT(*) AS total_events
FROM events e
JOIN users u ON e.user_id = u.user_id
WHERE e.event_time >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY u.country, u.signup_source, DATE(e.event_time);

Approximate Query Processing

When you are dealing with billions of rows, exact counts are expensive. All three systems support approximate algorithms, but the depth varies.

  • ClickHouse: HyperLogLog (uniq), quantile sketches (quantileTDigest), sampling (SAMPLE 0.1), and a rich library of approximate functions. This is where ClickHouse's breadth really shines.
  • Druid: HyperLogLog, Theta sketches (for set operations like intersection/union on distinct counts), quantile sketches via DataSketches integration. Druid's Theta sketch support is best-in-class for approximate set operations.
  • StarRocks: HyperLogLog (approx_count_distinct), percentile approximation. Functional but less comprehensive than the other two.

If your workload heavily depends on approximate distinct counts with set operations (for example, computing the intersection of two user segments), Druid with Theta sketches is the clear winner.
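To give a feel for how these estimators work, here is a K-Minimum-Values sketch, one of the simpler members of the same family (a Theta sketch is essentially KMV plus set algebra over the retained hashes). This is a didactic sketch, not any engine's actual implementation:

```python
import hashlib

def kmv_estimate(items, k=256):
    """K-Minimum-Values distinct-count estimate: hash every item into
    [0, 1), keep only the k smallest distinct hashes, and infer how
    many distinct items it took to pack k hashes that tightly near
    zero. Estimator: (k - 1) / kth_smallest_hash."""
    hashes = sorted({
        int.from_bytes(hashlib.md5(str(x).encode()).digest()[:8], "big") / 2**64
        for x in items
    })
    if len(hashes) <= k:
        return len(hashes)            # fewer distinct values than k: exact
    return int((k - 1) / hashes[k - 1])
```

With k=256 the typical relative error is around 6%. Production engines use HLL or Theta sketches for better accuracy per byte, but the principle -- keep a bounded summary instead of every value -- is identical, and intersecting two segments' retained hash sets is what gives Theta sketches their set operations.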

Operational Complexity: The Hidden Cost

This is where I have the strongest opinions, because this is where I have lost the most sleep.

ClickHouse Operations

Self-hosted ClickHouse is deceptively simple to get started. A single binary, a config file, and you are running. But scaling a ClickHouse cluster is where complexity creeps in. You need to manage ZooKeeper (or ClickHouse Keeper) for replication coordination, handle rebalancing shards manually, deal with merge pressure under heavy write loads, and monitor dozens of system tables. Schema migrations on large tables can take hours. The mutation system (ALTER TABLE UPDATE/DELETE) is asynchronous and can pile up.

My rule of thumb: if your team has fewer than two engineers who can dedicate time to ClickHouse operations, go with ClickHouse Cloud instead of self-hosting.

Druid Operations

Druid is the most operationally complex of the three. Six to eight distinct process types, each with its own JVM tuning, memory configuration, and failure modes. You need ZooKeeper, a metadata store (PostgreSQL or MySQL), and deep storage (S3 or HDFS). Segment management, compaction tasks, coordinator balancing, and MiddleManager capacity planning are all ongoing concerns. I have seen Druid clusters where the operations runbook was 40 pages long.

The flip side is that once a Druid cluster is properly tuned, it is remarkably stable. Historical nodes are stateless caches that can be killed and restarted at will. The architecture is designed for large-scale, long-running deployments.

StarRocks Operations

StarRocks is the simplest to operate. Two process types (FE and BE), no external dependency on ZooKeeper, built-in replication and rebalancing. Adding a node is ALTER SYSTEM ADD BACKEND and StarRocks handles data redistribution automatically. Schema changes are online. The MySQL protocol means every monitoring tool, BI connector, and MySQL client just works.

The downside is a smaller community and less battle-tested operational knowledge. When things go wrong with ClickHouse, Stack Overflow and the Altinity blog have your answer. With StarRocks, you might be reading Chinese-language GitHub issues.

Cloud Offerings in 2026

The managed cloud story has matured significantly for all three.

Aspect | ClickHouse Cloud | Imply Polaris (Druid) | CelerData (StarRocks)
Provider | ClickHouse Inc. | Imply | CelerData
Cloud platforms | AWS, GCP, Azure | AWS, GCP | AWS, GCP
Separation of storage/compute | Yes (SharedMergeTree) | Yes (deep storage + elastic query) | Yes (shared-data mode)
Auto-scaling | Yes (vertical + horizontal) | Yes (query tier) | Yes (compute nodes)
Pricing model | Compute + storage + I/O | Compute + storage + ingestion | Compute + storage
Free tier | Yes (limited) | Trial only | Trial only
Maturity | GA since 2022, very mature | GA since 2023, solid | GA since 2024, growing

ClickHouse Cloud is the most mature and feature-complete. It handles the SharedMergeTree engine transparently, auto-scales, and has solid observability built in. Imply Polaris removes most of Druid's operational pain and adds a SQL-first interface on top. CelerData is the newest but benefits from StarRocks already being simpler to operate.

SQL Compatibility: A Practical Concern

If your analysts and data scientists write SQL (and they do), SQL compatibility determines adoption speed.

  • StarRocks: Full MySQL protocol and syntax. Your existing MySQL queries, BI tools (Tableau, Looker, Metabase, Superset), and JDBC/ODBC connectors work out of the box. CTEs, window functions, subqueries, correlated subqueries -- all supported with the CBO.
  • ClickHouse: Its own SQL dialect with MySQL wire protocol for reads. Most standard SQL works, but you will hit edge cases: no correlated subqueries (until recently), different JOIN semantics (default is hash join with right table in memory), non-standard type system. ClickHouse added EXPLAIN PLAN and improved JOIN handling in recent versions, but it still feels different from standard SQL.
  • Druid: SQL layer translates to native JSON queries. Simple SELECT/GROUP BY/ORDER BY works well. Anything beyond that -- CTEs, multiple levels of subqueries, non-equi joins, HAVING on complex expressions -- may fail or produce incorrect results. For complex analytics, you often need to drop into the native query language.

Decision Framework: Picking the Right Engine

After running all three in production, here is my decision framework. The right choice depends on your specific workload pattern, team expertise, and operational appetite.

Choose ClickHouse if:

  1. Your primary workload is log analytics or observability. ClickHouse's scan speed on wide, append-only tables is unmatched. There is a reason Cloudflare, Uber, and dozens of observability companies built on it.
  2. You do batch/micro-batch ingestion. If your data arrives in files or you can buffer streaming data into batches, ClickHouse's insert path is the fastest.
  3. You need the broadest ecosystem. ClickHouse has the largest community, the most integrations (Grafana, dbt, Airbyte, Vector, Fluent Bit), and the most operational knowledge available.
  4. Your queries are mostly single-table scans and aggregations. If you do not need complex multi-table joins, ClickHouse's simpler query planner is not a limitation.

Choose Apache Druid if:

  1. You need sub-second ingestion-to-query latency from Kafka. Druid's streaming ingestion is the most mature and battle-tested for high-throughput Kafka pipelines.
  2. Your query pattern is time-series + Top-N + rollups. Druid's segment architecture and inverted indexes are purpose-built for these shapes. If your dashboards are mostly "show me the top 10 X by Y over the last Z hours," Druid will be the fastest.
  3. You can pre-define your query dimensions at ingestion time. Rollup gives you massive storage savings but requires upfront schema design.
  4. You are in ad-tech or real-time bidding. Druid's combination of streaming ingestion, pre-aggregation, and approximate set operations is almost purpose-built for this domain.

Choose StarRocks if:

  1. You need complex joins and ad-hoc analytical queries. If your analysts need to join event data with dimension tables, run funnel analyses, or write complex window function queries, StarRocks's CBO makes these practical at scale.
  2. You want the simplest operations. If you are a small team that cannot dedicate engineers to database operations, StarRocks has the lowest operational burden.
  3. MySQL compatibility matters. If your existing tooling is built around MySQL, StarRocks is a drop-in analytical backend.
  4. You are building a product analytics or BI backend. StarRocks is increasingly used as the analytical engine behind user-facing analytics features where query flexibility is more important than raw scan speed.

Quick Reference Matrix

Use Case | Best Pick | Runner-Up
Log / observability analytics | ClickHouse | StarRocks
Product analytics (funnels, retention) | StarRocks | ClickHouse
Real-time dashboards from Kafka | Druid | StarRocks
Ad-tech / bidding analytics | Druid | ClickHouse
BI / ad-hoc SQL analytics | StarRocks | ClickHouse
IoT time-series | ClickHouse | Druid
Multi-tenant SaaS analytics | StarRocks | ClickHouse Cloud

What I Actually Run Today

In my current stack, I run ClickHouse for our internal observability pipeline (logs, metrics, traces -- about 8 TB/day ingestion) and StarRocks as the analytical backend for our customer-facing product analytics dashboards. We evaluated Druid for the product analytics use case but the lack of join support and SQL limitations made it a non-starter for the ad-hoc query patterns our product team needed.

If I were starting fresh in 2026 and had to pick exactly one, I would pick StarRocks for most general-purpose real-time analytics workloads. It has the best balance of query power, SQL compatibility, and operational simplicity. But I would still reach for ClickHouse for high-volume append-only workloads like logs and metrics, and I would still consider Druid for pure streaming ingestion use cases where sub-second data freshness is non-negotiable.

The real-time analytics database space is consolidating fast. ClickHouse is adding better joins and a cost-based optimizer. StarRocks is improving scan performance. Druid is expanding its SQL layer. In two years these systems may converge significantly. But today, in early 2026, the differences are still large enough that picking the right one for your workload can save you months of engineering time and significant infrastructure cost.

Choose based on your query shapes and ingestion patterns, not on benchmark marketing. And if you can, run a proof-of-concept with your actual data and your actual queries before committing. The two weeks you spend on a PoC will save you six months of regret.
