Serverless vs Provisioned vCore in Azure SQL: Performance, Cold-Start, and Cost Traps

Buying decision TL;DR:
Choose Serverless when usage is intermittent and you can tolerate warm-up lag after idle periods. Choose Provisioned vCore when usage is steady or predictable, when you can’t afford cold starts, or when you need tight control over performance; its unit price per vCore-hour is also typically lower. (Microsoft Learn)

Introduction — “Why is my API fast at noon and sluggish at 7am?”

You spin up a small SaaS. Nights are quiet; mid-morning traffic spikes. On Serverless, your database naps to save money, then takes a minute or so to wake up. On Provisioned, it never sleeps, but you pay whether anyone shows up or not. This guide gives you the practical trade-offs, typical cold-start expectations (plus a way to measure your own), and workload profiles so you can decide with eyes open.

The Concepts (and where the gotchas live)

What “Serverless” really means in Azure SQL

Auto-scales CPU (vCores) within a configurable min–max range. You also set an auto-pause delay; during pause you pay storage only. Billing is per-second for compute, and depends on vCores and memory used. (Microsoft Learn)
Available in General Purpose and Hyperscale (but auto-pause/resume is GP-only for now). Hyperscale Serverless auto-scales but does not currently auto-pause. (Microsoft Learn)
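Cache reclamation: when CPU or cache utilization is low, Serverless may reclaim memory from the SQL cache (the buffer pool; on Hyperscale the RBPEX cache is also affected, and details vary by tier), which can hurt first-touch latency after idle. Provisioned does not reclaim cache because of idleness. (Microsoft Learn)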

What “Provisioned vCore” gives you

You pick a fixed vCore size and pay per hour regardless of activity; you get immediate responsiveness (no warm-up) and no cache trimming due to idleness. (Microsoft Learn)
Unit price per vCore-hour is lower than Serverless; the trade is that you pay even when idle. (Microsoft Learn)

Feature & Behavior Comparison (fast scan)

Dimension	Serverless	Provisioned vCore
Usage pattern	Intermittent, spiky, unpredictable	Steady or predictable; many 24×7 apps
Scaling	Automatic within min–max vCores	Manual (or scheduled)
Warm-up after idle	Lower responsiveness after idle; may auto-resume from pause	Immediate
Auto-pause	GP tier only; configurable 15–10,080 min; can disable	N/A
Billing	Per-second for compute; $0 compute while paused (storage billed)	Per-hour for provisioned compute
Cache behavior	Can reclaim caches on low usage → risk of cold-cache hits	No such reclamation due to idleness
Where available	GP (full), Hyperscale (autoscale only; no pause/resume yet)	All service tiers
Unit price	Higher per unit time, but burst-friendly	Lower per unit time, pay when idle

Sources: Microsoft Learn and Azure SQL team posts. (Microsoft Learn)

Hard Numbers You’ll Care About

Auto-pause delay options: 15 to 10,080 minutes (7 days); -1 disables auto-pause. Default is 60 minutes. (Microsoft Learn)
Serverless GP (Gen5) limits & memory: e.g., GP_S_Gen5_8 allows 1–8 vCores with ~3–24 GB memory; see full min–max tables per SLO. (Microsoft Learn)
Resume time (“cold-start”): expect ~1 minute to auto-resume after pause; commonly reported 1–2 minutes depending on size/cache/state. (This is typical, not guaranteed.) (Stack Overflow)
Billing reality: Serverless bills compute per second, and when usage dips below your configured minimum it still bills the minimum vCores & memory; while paused, compute cost is zero. (Microsoft Learn)
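
To make the billing rule concrete, here is a minimal sketch of how per-second compute might be tallied. It assumes each online second bills the largest of the configured minimum vCores, the vCores used, and memory used scaled at 3 GB per vCore (the ratio implied by the 3–24 GB range for 1–8 vCores above). The function name, the sample hour, and the unit price are hypothetical; check current pricing before relying on any number here.

```python
def billed_vcore_seconds(samples, min_vcores, gb_per_vcore=3.0):
    """Sum billed vCore-seconds over per-second usage samples.

    samples: iterable with one entry per second, either a
             (vcores_used, memory_gb_used) tuple while online or None while paused.
    Assumption: an online second bills the largest of the configured minimum,
    the vCores used, and the memory used expressed in vCore units.
    """
    total = 0.0
    for sample in samples:
        if sample is None:  # paused: no compute charge, storage only
            continue
        vcores_used, memory_gb_used = sample
        total += max(min_vcores, vcores_used, memory_gb_used / gb_per_vcore)
    return total


# Hypothetical hour: 10 busy minutes, 20 near-idle minutes, 30 paused minutes.
hour = (
    [(2.0, 4.5)] * 600      # busy: 2 vCores used, 4.5 GB memory
    + [(0.1, 1.0)] * 1200   # near idle: usage below the 0.5 vCore minimum
    + [None] * 1800         # paused
)
vcore_seconds = billed_vcore_seconds(hour, min_vcores=0.5)

PRICE_PER_VCORE_SECOND = 0.000145  # placeholder, not a quoted price
cost = vcore_seconds * PRICE_PER_VCORE_SECOND
```

In this made-up hour the busy stretch bills 1,200 vCore-seconds and the near-idle stretch still bills 600 at the 0.5 vCore floor, for 1,800 in total; the paused half hour adds nothing to compute.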

Workload Profiles (which one are you?)

Dev/Test, Internal Tools, Admin Portals
- Traffic: Sporadic bursts, long idle nights/weekends.
- Pick: Serverless with a sensible auto-pause.
- Risk: First request of the day may be slow; pre-warm before demos.
B2C SaaS with diurnal peaks
- Traffic: Predictable morning rush, sleepy nights.
- Pick: If strict SLOs, Provisioned (or Serverless with auto-pause disabled plus higher min vCores).
- Risk: With Serverless, cache trimming + resume lag can violate P95.
Event-driven batch/ETL windows
- Traffic: Heavy for 2–4 hours, idle otherwise.
- Pick: Serverless can shine; schedule jobs close together so each one starts before the database auto-pauses, and avoid frequent pause/resume cycles.
- Risk: If jobs kick off after long idle, account for resume + cold cache.
24×7 APIs / Latency-sensitive
- Traffic: Constant.
- Pick: Provisioned (or Hyperscale Provisioned) to avoid warm-ups and control throughput headroom.
- Risk: Overpaying if you routinely provision far above need—right-size or use reserved capacity.

Measured Cold-Start: How to Test (and what we typically see)

You should measure in your environment. Typical observations in the community: ~1 minute from first connection attempt to readiness after auto-pause; small DBs often resume in 1–2 minutes. Don’t rely on this as an SLA; test. (Stack Overflow)

Quick PowerShell harness (run from build/ops agent) to measure resume time:

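# Times how long a paused database takes to accept a connection and answer a query.
# Resolution is limited by the 5-second retry sleep below.
$connString = "Server=tcp:<yourserver>.database.windows.net,1433;Database=<db>;User ID=<user>;Password=<pwd>;Encrypt=True"
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$ready = $false
while (-not $ready) {
  $conn = New-Object System.Data.SqlClient.SqlConnection $connString
  try {
    $conn.Open()   # the first attempt against a paused database triggers the resume and normally fails
    $cmd = $conn.CreateCommand()
    $cmd.CommandText = "SELECT TOP (1) name FROM sys.objects"
    $cmd.ExecuteScalar() | Out-Null
    $ready = $true
  } catch {
    # Give up after 10 minutes so bad credentials or a wrong server name don't loop forever.
    if ($sw.Elapsed.TotalMinutes -gt 10) { throw }
    Start-Sleep -Seconds 5   # still resuming; retry
  } finally {
    $conn.Dispose()          # release the connection on success and on failure
  }
}
$sw.Stop()
"Resume+first query latency: {0:N1} seconds" -f $sw.Elapsed.TotalSeconds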

Tip: Run it twice—first hit measures resume + cold cache; second hit measures warm cache. Expect a large delta on Serverless after long idle due to cache reclamation. (Microsoft Learn)

Configuration Recipes

Create Serverless (General Purpose) with guardrails

az sql db create -g <rg> -s <server> -n <db> \
  -e GeneralPurpose --compute-model Serverless -f Gen5 \
  --min-capacity 0.5 -c 4 --auto-pause-delay 120

Sets 0.5–4 vCores and auto-pause 120 min. Tune min vCores up if you see excessive cold-cache penalties. (Microsoft Learn)

Disable auto-pause (still serverless autoscale, but no sleeping)

az sql db update -g <rg> -s <server> -n <db> \
  --edition GeneralPurpose --compute-model Serverless --family Gen5 \
  --min-capacity 1 --capacity 4 --auto-pause-delay -1

Keeps compute hot to avoid resume lag while still allowing autoscaling within your range. (Microsoft Learn)

Move between tiers (T-SQL)

-- Provisioned → Serverless (GP)
ALTER DATABASE MyDb
MODIFY ( SERVICE_OBJECTIVE = 'GP_S_Gen5_4' );

-- Serverless → Provisioned (GP 2 vCores example)
ALTER DATABASE MyDb
MODIFY ( SERVICE_OBJECTIVE = 'GP_Gen5_2' );

Defaults apply for min vCores and auto-pause when using T-SQL; adjust later via portal/CLI/PowerShell. (Microsoft Learn)

Cost Patterns (read before you swipe)

Provisioned is usually cheaper per vCore-hour but always on—great when average utilization is decent. (Microsoft Learn)
Serverless wins when idle dominates and you can accept resume lag. Remember: usage below your configured minimum still bills the minimum; full pause bills storage only. (Microsoft Learn)
If you’re hovering at high average CPU all day, Serverless can be more expensive than same-size Provisioned due to unit pricing and cache behaviors that nudge you to raise the minimum. (Guidance derived from Microsoft docs on billing mechanics and cache reclamation.) (Microsoft Learn)
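
A back-of-envelope break-even makes these three patterns easier to compare. The sketch below assumes a flat per-vCore-hour price for each model and that Serverless, while online, averages the same vCore count as the Provisioned size you would otherwise buy. Both prices are placeholders chosen for round numbers, not quoted rates, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # common billing approximation


def provisioned_monthly_cost(vcores, price_per_vcore_hour, hours=HOURS_PER_MONTH):
    """Provisioned bills the full size for every hour, busy or idle."""
    return vcores * price_per_vcore_hour * hours


def serverless_monthly_cost(avg_billed_vcores, price_per_vcore_hour, online_hours):
    """Serverless bills only online (non-paused) hours, at the average billed vCores."""
    return avg_billed_vcores * price_per_vcore_hour * online_hours


def break_even_online_hours(provisioned_price, serverless_price, hours=HOURS_PER_MONTH):
    """Online hours per month at which both models cost the same,
    assuming equal vCores while online."""
    return hours * provisioned_price / serverless_price


PROVISIONED_PRICE = 0.25  # placeholder $/vCore-hour
SERVERLESS_PRICE = 0.50   # placeholder $/vCore-hour (higher unit price)

break_even = break_even_online_hours(PROVISIONED_PRICE, SERVERLESS_PRICE)
always_on = provisioned_monthly_cost(4, PROVISIONED_PRICE)
mostly_idle = serverless_monthly_cost(4, SERVERLESS_PRICE, online_hours=200)
mostly_busy = serverless_monthly_cost(4, SERVERLESS_PRICE, online_hours=600)
```

With these placeholder prices the break-even sits at 365 online hours, half the month: the 200-hour case comes in under the always-on bill and the 600-hour case well over it. Substitute your region's real prices and your measured online hours before drawing conclusions.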

Best Practices & Common Pitfalls

Do this

Classify your workload (see profiles) and test with real traffic traces.
On Serverless, set min vCores high enough to avoid thrash during business hours; allow pause only when it truly saves money.
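Pre-warm before critical windows (e.g., a synthetic ping shortly before the window opens) if auto-pause is enabled. Note that a ping repeated every 10–15 min, at an interval shorter than the auto-pause delay, keeps the database from ever pausing, which has the same billing effect as disabling auto-pause.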
Track P95/P99 latencies separately for “first after idle” vs steady-state—they are different worlds.
For Hyperscale Serverless, remember no auto-pause yet—plan accordingly. (TECHCOMMUNITY.MICROSOFT.COM)
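
For the advice above about tracking "first after idle" latency separately, one possible approach is to tag each request by the gap since the previous one and compute percentiles per group. This is a sketch under assumptions: the one-hour idle threshold, the sample data, and the function names are hypothetical, and the nearest-rank percentile is just one of several common definitions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def split_latencies(requests, idle_gap_s):
    """Split latencies into (first_after_idle, steady_state).

    requests: list of (timestamp_s, latency_ms), sorted by timestamp.
    A request counts as 'first after idle' when the gap since the previous
    request exceeds idle_gap_s; the very first request is treated the same way.
    """
    first_after_idle, steady = [], []
    prev_ts = None
    for ts, latency_ms in requests:
        if prev_ts is None or ts - prev_ts > idle_gap_s:
            first_after_idle.append(latency_ms)
        else:
            steady.append(latency_ms)
        prev_ts = ts
    return first_after_idle, steady


# Made-up trace: two requests land after long idle gaps.
requests = [(0, 62000), (10, 40), (20, 35), (30, 50), (7200, 58000), (7210, 45)]
cold, warm = split_latencies(requests, idle_gap_s=3600)
cold_p95 = percentile(cold, 95)
warm_p95 = percentile(warm, 95)
```

Reporting the two P95 values side by side keeps a handful of resume-dominated requests from either hiding inside a blended percentile or distorting the steady-state picture.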

Avoid this

Expecting Serverless to behave like always-hot Provisioned—it won’t.
Setting auto-pause to 15 minutes on a dashboard used “every so often”—you’ll pay in user pain.
Ignoring cache reclamation—long idle + big working set = surprise cold cache.
Moving to Serverless purely for “cost savings” when your DB is busy all day—you may pay more.
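
To see how the auto-pause delay trades user pain against paused time, a rough simulation over request timestamps can help. This sketch assumes the database pauses exactly when the delay elapses after the last request and ignores resume latency, lingering sessions, and background activity; the access pattern and function name are hypothetical.

```python
def simulate_auto_pause(request_times_min, auto_pause_delay_min):
    """Return (resumes, paused_minutes) for a sorted list of request times.

    Simplification: the database pauses auto_pause_delay_min after the last
    request, so any longer gap ends with one request hitting a paused database.
    """
    resumes = 0
    paused_minutes = 0
    for prev, cur in zip(request_times_min, request_times_min[1:]):
        gap = cur - prev
        if gap > auto_pause_delay_min:
            resumes += 1
            paused_minutes += gap - auto_pause_delay_min
    return resumes, paused_minutes


# Made-up dashboard usage over one working day (minutes since first open).
dashboard_opens = [0, 40, 95, 300, 330, 480]

short_delay = simulate_auto_pause(dashboard_opens, auto_pause_delay_min=15)
long_delay = simulate_auto_pause(dashboard_opens, auto_pause_delay_min=60)
```

For this pattern the 15-minute delay makes every return visit a cold start (5 resumes, 405 paused minutes), while the 60-minute delay cuts that to 2 resumes at the price of 170 fewer paused minutes. Run it against your own access logs to pick a delay deliberately.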

Conclusion & Takeaways

If latency SLOs matter at all times: pick Provisioned vCore (or Serverless with auto-pause disabled and a sensible minimum).
If you’re idle most of the day: Serverless is your friend—pause aggressively and accept a ~1 minute first-hit penalty. (Stack Overflow)
Measure before committing: run the PowerShell harness in your pipeline; compare steady-state cost vs idle savings using your actual usage curve.
Don’t underestimate caches: warm vs cold behavior is often the real cost of Serverless.

Data/ML Engineer Blog