Azure Cosmos DB Performance Explained: Partition Keys, RUs, and Consistency (A Practical Guide)

Meta description (under 160 characters):
Master Cosmos DB performance: pick the right partition key, budget RUs, choose consistency, and avoid costly pitfalls—with clear examples and checklists.

Introduction: the on-call 2 a.m. scenario

Your product just went viral. Reads are fine, but writes throttle, hot partitions light up, and the CFO pings you about skyrocketing costs. Cosmos DB can deliver predictable performance—but only if you pick solid partition keys, budget RUs realistically, and set the right consistency level. This guide shows you how, with practical rules you can apply today.


Cosmos DB building blocks (what actually matters)

  • Partitioning: Data is split by a logical partition key (your choice). Azure manages physical partitions behind the scenes; you control the key and item distribution. (Microsoft Learn)
  • RUs (Request Units): The currency of work. Every operation consumes RUs; throughput is provisioned in RU/s (or billed on demand in serverless). A 1-KB point read ≈ 1 RU (a measurement sketch follows this list). (Microsoft Learn)
  • Consistency: Five levels—Strong, Bounded staleness, Session, Consistent prefix, Eventual—trading latency/availability for read correctness. Most apps default to Session. (Microsoft Learn)
  • Global distribution: Add regions for low-latency reads and resilience; enable multi-region writes for active-active apps. (Microsoft Learn)
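
RU charges are fully observable: every SDK response reports what the operation cost, which is the quickest way to sanity-check the "1-KB point read ≈ 1 RU" figure against your own documents. A minimal Node.js sketch (the database, container, id, and key values are hypothetical):

const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient(process.env.COSMOS_CS);
const container = client.database("app").container("feed");

async function measurePointRead() {
  // A point read addresses an item by id + partition key value: the
  // cheapest way to fetch a document in Cosmos DB.
  const { resource, requestCharge } = await container.item("evt_001", "u_123").read();
  console.log(`read ${resource?.id} for ${requestCharge} RUs`);
}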

Partition keys: how to choose one you won’t regret

Goal: Spread reads and writes evenly across many partition key values, while keeping your most common queries scoped to a single key.

Heuristics that work:

  • Prefer high-cardinality attributes with natural locality (e.g., userId, deviceId, tenantId); if no single attribute qualifies, build a synthetic key (see the sketch after this list).
  • Avoid time-bucket keys alone (e.g., yyyy-mm-dd)—they create daily hot spots.
  • Ensure your top query filters include the key (so queries can be scoped to one partition).
  • Watch item size: fewer, larger items per key can pressure a single partition.
  • For IoT/telemetry: pair deviceId with time in the document (not in the key) to keep writes spread and queries filterable. (Microsoft Learn)
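
When no single attribute has enough cardinality, a synthetic key built by concatenating properties spreads the load. A sketch, assuming a hypothetical multi-tenant model where tenantId alone would concentrate writes on a few values:

// The container would be created with partition key path "/pk".
function withSyntheticKey(doc) {
  // tenantId alone is too coarse; tenantId + userId yields many values.
  return { ...doc, pk: `${doc.tenantId}:${doc.userId}` };
}

const item = withSyntheticKey({ id: "evt_001", tenantId: "t_42", userId: "u_123" });
// item.pk === "t_42:u_123"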

Quick check: if you can answer "Yes" to both of these, you're likely safe:

  1. "Is my write load evenly distributed across many key values?"
  2. "Do my top queries include the key?"

RU budgeting: from “vibes” to numbers

RUs map to CPU/IO/memory. You can run provisioned throughput (manual or autoscale) or serverless:

  • Provisioned (manual RU/s): best for steady traffic; bills a fixed RU/s. Pros: predictable performance, lowest unit price at scale. Watch-out: over-provisioning during off-hours.
  • Provisioned (autoscale): best for variable traffic; bills the highest RU/s scaled to each hour, between 10% and 100% of a max RU/s you set. Pros: handles bursts, fewer 429s. Watch-out: set the max realistically, since you're billed for at least 10% of it even when idle.
  • Serverless: best for spiky, dev, or low-QPS workloads; bills per-request RU consumption. Pros: no capacity planning, great for intermittently used apps. Watch-out: higher per-RU cost; no multi-region writes.

Estimation workflow (practical):

  1. Measure representative operations (point reads, write size, query shapes).
  2. Use the Capacity Planner with sample doc sizes and ops per second to estimate RU/s and cost. Start with 20–30% headroom. (Microsoft Learn)
  3. Autoscale early if you expect bursty traffic (a provisioning sketch follows this list); later, consider reserved capacity or manual tuning. (Microsoft Learn)
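
Autoscale is declared per container (or database) through a throughput ceiling. A sketch with the Node.js SDK, using the maxThroughput option at container creation (the container name and ceiling are illustrative):

const { CosmosClient } = require("@azure/cosmos");
const client = new CosmosClient(process.env.COSMOS_CS);

async function createAutoscaleContainer() {
  // Autoscale: Cosmos DB scales between 10% and 100% of maxThroughput.
  const { container } = await client.database("app").containers.createIfNotExists({
    id: "feed",
    partitionKey: { paths: ["/userId"] },
    maxThroughput: 4000   // ceiling in RU/s; the floor is 10% (400 RU/s)
  });
  return container;
}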

Rule of thumb examples (approximate; validate with the planner):

  • 1-KB point read ≈ 1 RU; 1-KB write ≈ 5+ RUs depending on indexing. Reduce index paths to cut write cost. (Microsoft Learn)

Consistency: choose it like a product decision (not a default)

  • Strong: linearizable reads at the highest latency; requires a single write region (not compatible with multi-region writes); use for critical configs or financial balances.
  • Bounded staleness: reads lag by time/updates; globally consistent ordering, good for multi-region reads that must be close to real-time.
  • Session (default): per-session monotonic reads; great for user-centric apps.
  • Consistent prefix: never out-of-order, but may be stale.
  • Eventual: lowest latency/cost; acceptable for counters, feeds, or non-critical displays. (Microsoft Learn)

Tactic: Keep the account default at Session and relax per-request (e.g., to Eventual) for reads that tolerate staleness. Per-request overrides can only weaken the account default, never strengthen it, so reads that truly need Strong require a Strong account default (or a separate account). (Microsoft Learn)
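
A sketch of such a per-request relaxation with the Node.js SDK, assuming the account default is Session (ids and names are hypothetical):

const { CosmosClient } = require("@azure/cosmos");
const container = new CosmosClient(process.env.COSMOS_CS).database("app").container("feed");

async function readFeedItemRelaxed() {
  // This read tolerates staleness, so drop from Session to Eventual.
  // Per-request overrides may only weaken the default, never strengthen it.
  const { resource } = await container.item("evt_001", "u_123").read({
    consistencyLevel: "Eventual"
  });
  return resource;
}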


Indexing: pay only for what you query

Cosmos DB indexes every property by default in the API for NoSQL—great for agility, expensive on write-heavy paths.
Tune with an indexing policy: exclude large arrays and fields you never filter or sort on, and add composite indexes for common ORDER BY + filter combinations. (Microsoft Learn)

Example (trim the fat; the catch-all "/*" exclusion drops every path you don't explicitly include):

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {"path": "/userId/?"},
    {"path": "/ts/?"}
  ],
  "excludedPaths": [
    {"path": "/*"}  // exclude everything else
  ],
  "compositeIndexes": [
    [
      {"path": "/userId", "order": "ascending"},
      {"path": "/ts", "order": "descending"}
    ]
  ]
}
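
To apply the policy, pass it at container creation (a sketch; changing the policy on an existing container instead triggers an online re-index, which itself consumes RUs):

// Inside an async function; "policy" is the indexing-policy JSON above,
// loaded as a JS object, and the client is set up as in earlier sketches.
const { container } = await client.database("app").containers.createIfNotExists({
  id: "feed",
  partitionKey: { paths: ["/userId"] },
  indexingPolicy: policy
});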

Real example: modeling an activity feed

Goal: Per-user timeline with low-latency writes, efficient reads of “my latest N items.”

Model

{
  "id": "evt_7f9...",
  "userId": "u_123",            // partition key
  "ts": "2025-11-20T14:05:03Z", // ISO timestamp
  "type": "like",
  "payload": {"postId":"p_88"}
}
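
Writing an event is a single-partition operation because the partition key value travels inside the document; a sketch (inside an async function, with hypothetical values):

const event = {
  id: "evt_7f9a1",
  userId: "u_123",                // partition key value
  ts: new Date().toISOString(),
  type: "like",
  payload: { postId: "p_88" }
};
// Lands in the "u_123" logical partition; roughly 5+ RUs for a ~1-KB item.
await container.items.create(event);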

Query (NoSQL ‘SQL’ language): “latest 20 for this user”

SELECT TOP 20 *
FROM c
WHERE c.userId = @uid
ORDER BY c.ts DESC

This stays within one logical partition (cheap, consistent RU cost). (Microsoft Learn)

SDK snippet (Node.js)

const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient(process.env.COSMOS_CS);
const container = client.database("app").container("feed");

async function latestActivity(uid) {
  // Single-partition query: the @uid filter pins one logical partition.
  const { resources } = await container.items.query({
    query: "SELECT TOP 20 * FROM c WHERE c.userId = @uid ORDER BY c.ts DESC",
    parameters: [{ name: "@uid", value: uid }]
  }, { consistencyLevel: "Session" }).fetchAll();
  return resources;
}

Global distribution & multi-region writes: when you need it

If you serve users across continents or require write availability during region outages, enable multi-region writes and pick the lightest consistency that still meets correctness needs (often Session). Plan for conflict resolution (LWW or custom). (Microsoft Learn)
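
Conflict resolution is declared per container. A last-writer-wins sketch keyed on the server timestamp (the property names follow the JS SDK's ContainerDefinition; verify the exact shape against your SDK version):

// Inside an async function, with the client set up as in earlier sketches.
const { container } = await client.database("app").containers.createIfNotExists({
  id: "feed",
  partitionKey: { paths: ["/userId"] },
  conflictResolutionPolicy: {
    mode: "LastWriterWins",
    conflictResolutionPath: "/_ts"   // conflicting write with the highest _ts wins
  }
});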


Real-time pipelines: use the Change Feed (it’s your durable event log)

Change feed gives you an ordered stream of inserts/updates you can consume to build projections, push to queues, or run ETL—no extra CDC infra. There are multiple modes (e.g., all versions & deletes vs latest-version) with different retention/metadata. (Microsoft Learn)
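
A pull-model sketch with the Node.js SDK (this assumes a recent @azure/cosmos release that exposes getChangeFeedIterator; production consumers usually prefer the change feed processor or an Azure Functions trigger, which manage leases and checkpoints for you):

const { ChangeFeedStartFrom, StatusCodes } = require("@azure/cosmos");

async function drainChangeFeed(container) {
  const iterator = container.items.getChangeFeedIterator({
    changeFeedStartFrom: ChangeFeedStartFrom.Beginning()
  });
  while (iterator.hasMoreResults) {
    const response = await iterator.readNext();
    if (response.statusCode === StatusCodes.NotModified) break; // caught up
    for (const doc of response.result) {
      console.log("changed:", doc.id);   // project, enqueue, or run ETL here
    }
  }
}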


Analytics without ETL: analytical store, Synapse Link & Fabric mirroring

Enable the analytical store to run large, isolated analytics via Synapse/SQL Serverless/Spark while keeping OLTP hot path fast. Microsoft now recommends evaluating Fabric mirroring for NoSQL API, which offers improved analytical performance and open Delta Parquet access in OneLake. (Microsoft Learn)


Operational tips (best practices)

  • Pick the key early, with your top queries in mind; schema follows. Revisit as workloads evolve. (Microsoft Learn)
  • Use autoscale if you have bursty traffic; pair with index tuning to control write RU cost. (Microsoft Learn)
  • Scope queries to a partition (include the key) and page results; avoid cross-partition fan-out unless necessary. (Microsoft Learn)
  • Apply TTL for data that naturally expires (sessions, events) to reclaim storage automatically (a sketch follows this list). (Microsoft Learn)
  • Monitor RU consumption & 429s; raise max RU/s or optimize queries/indexing when throttling appears. (Capacity planner + metrics) (Microsoft Learn)
  • Set Session as the account default; relax consistency per-request where staleness is acceptable (overrides can only weaken the default). (Microsoft Learn)
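
A TTL sketch: set a container-level default, then override per item where needed (ttl is in seconds; ttl: -1 on an item means never expire):

// Inside an async function, with the client set up as in earlier sketches.
const { container } = await client.database("app").containers.createIfNotExists({
  id: "sessions",
  partitionKey: { paths: ["/userId"] },
  defaultTtl: 60 * 60 * 24 * 30   // default: expire 30 days after last write
});
await container.items.upsert({ id: "s_1", userId: "u_123", ttl: 3600 }); // 1 hour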

Common pitfalls (and how to avoid them)

  • Hot partition: Low-cardinality key (e.g., country) concentrates writes. Fix: switch to a per-user/device/tenant key; consider synthetic keys if needed. (Microsoft Learn)
  • Runaway RU costs: Indexing everything + large documents. Fix: exclude unused paths; keep docs lean; compress large arrays. (Microsoft Learn)
  • Cross-partition queries everywhere: Missing key in filters. Fix: redesign query patterns or add a materialized projection keyed by the access pattern. (Microsoft Learn)
  • Wrong capacity mode: Manual RU/s for spiky traffic or serverless for 24/7 heavy load. Fix: autoscale for bursty; manual with schedules for steady. (Microsoft Learn)

TL;DR & next steps

  • Choose a high-cardinality partition key that matches your access patterns.
  • Estimate RUs with the capacity planner; prefer autoscale if traffic is unpredictable.
  • Default to Session consistency; relax per-request where staleness is acceptable (overrides can only weaken the default).
  • Tune indexing to cut RU cost; use TTL to trim storage.
  • For global apps, plan multi-region writes and conflict resolution.

Call to action: Sketch your top 3 queries, pick a partition key that scopes each query, and run your workload through the capacity planner before the next release.


Image prompt

“A clean, modern architecture diagram of Azure Cosmos DB showing logical partitions across multiple regions, RU throughput meters, and a decision slider for consistency levels—minimalistic, high contrast, isometric 3D, azure-blue accents.”

Tags

#NoSQL #Azure #CosmosDB #DataEngineering #Scalability #Partitioning #RUs #Consistency #CloudArchitecture #BestPractices

