Serverless vs Provisioned vCore in Azure SQL: Performance, Cold-Start, and Cost Traps
Buying decision TL;DR:
Choose Serverless when usage is intermittent and you can tolerate warm-up lag after idle periods. Choose Provisioned vCore when usage is predictable/steady, when you can’t afford cold-starts, or when you need tight control over performance (and often lower unit price per hour). (Microsoft Learn)
Introduction — “Why is my API fast at noon and sluggish at 7am?”
You spin up a small SaaS. Nights are quiet; mid-morning traffic spikes. On Serverless, your database naps to save money—then takes a minute to wake up. On Provisioned, it never sleeps, but you pay whether anyone shows up or not. This guide gives you the practical trade-offs, measured cold-start expectations, and workload profiles so you can decide with eyes open.
The Concepts (and where the gotchas live)
What “Serverless” really means in Azure SQL
- Auto-scales CPU (vCores) within a configurable min–max range. You also set an auto-pause delay; during pause you pay storage only. Billing is per-second for compute, and depends on vCores and memory used. (Microsoft Learn)
- Available in General Purpose and Hyperscale (but auto-pause/resume is GP-only for now). Hyperscale Serverless auto-scales but does not currently auto-pause. (Microsoft Learn)
- Cache reclamation: during low usage, Serverless may trim caches (e.g., buffer pool/RBPEX behavior varies), which can hurt first-touch latency after idle. Provisioned doesn’t do this. (Microsoft Learn)
What “Provisioned vCore” gives you
- You pick a fixed vCore size and pay per hour regardless of activity; you get immediate responsiveness (no warm-up) and no cache trimming due to idleness. (Microsoft Learn)
- Unit price per vCore-hour is lower than Serverless; the trade is that you pay even when idle. (Microsoft Learn)
Feature & Behavior Comparison (fast scan)
| Dimension | Serverless | Provisioned vCore |
|---|---|---|
| Usage pattern | Intermittent, spiky, unpredictable | Steady or predictable; many 24×7 apps |
| Scaling | Automatic within min–max vCores | Manual (or scheduled) |
| Warm-up after idle | Lower responsiveness after idle; may auto-resume from pause | Immediate |
| Auto-pause | GP tier only; configurable 15–10,080 min; can disable | N/A |
| Billing | Per-second for compute; $0 compute while paused (storage billed) | Per-hour for provisioned compute |
| Cache behavior | Can reclaim caches on low usage → risk of cold-cache hits | No such reclamation due to idleness |
| Where available | GP (full), Hyperscale (autoscale only; no pause/resume yet) | All service tiers |
| Unit price | Higher per unit time, but burst-friendly | Lower per unit time, pay when idle |
Sources: Microsoft Learn and Azure SQL team posts. (Microsoft Learn)
Hard Numbers You’ll Care About
- Auto-pause delay options: 15 to 10,080 minutes (7 days); -1 disables auto-pause. Default is 60 minutes. (Microsoft Learn)
- Serverless GP (Gen5) limits & memory: e.g., GP_S_Gen5_8 allows 1–8 vCores with ~3–24 GB memory; see full min–max tables per SLO. (Microsoft Learn)
- Resume time (“cold-start”): expect ~1 minute to auto-resume after pause; commonly reported 1–2 minutes depending on size/cache/state. (This is typical, not guaranteed.) (Stack Overflow)
- Billing reality: Serverless bills compute per second, and when usage dips below your configured minimum it still bills the minimum vCores & memory; while paused, compute cost is zero. (Microsoft Learn)
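The billing rule above can be sketched in a few lines of Python. This is an illustration only, assuming memory is normalized to vCores at 3 GB per vCore (consistent with the 1–8 vCore / 3–24 GB range quoted above); the function names and the price in the usage note are hypothetical placeholders, not Azure list prices.

```python
GB_PER_VCORE = 3.0  # assumption: memory is normalized to vCores at 3 GB per vCore


def billed_vcores(cpu_vcores_used, memory_gb_used, min_vcores, paused=False):
    """vCores billed for one second of serverless compute (sketch)."""
    if paused:
        # While paused, compute is not billed; storage is billed separately.
        return 0.0
    # Bill the largest of: configured minimum, CPU used, normalized memory used.
    return max(min_vcores, cpu_vcores_used, memory_gb_used / GB_PER_VCORE)


def compute_cost(samples, min_vcores, price_per_vcore_second):
    """Sum the cost over per-second samples of (cpu_vcores, memory_gb, paused)."""
    total_vcore_seconds = sum(
        billed_vcores(cpu, mem, min_vcores, paused) for cpu, mem, paused in samples
    )
    return total_vcore_seconds * price_per_vcore_second
```

For example, with a 0.5 vCore minimum, a second at 0.2 vCores and 1 GB still bills 0.5 vCores, while a second at 1 vCore and 6 GB bills 2 vCores because memory dominates. Plug in the per-second price for your region from the pricing page.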
Workload Profiles (which one are you?)
- Dev/Test, Internal Tools, Admin Portals
  - Traffic: Sporadic bursts, long idle nights/weekends.
  - Pick: Serverless with a sensible auto-pause.
  - Risk: First request of the day may be slow; pre-warm before demos.
- B2C SaaS with diurnal peaks
  - Traffic: Predictable morning rush, sleepy nights.
  - Pick: With strict SLOs, Provisioned (or Serverless with auto-pause disabled plus higher min vCores).
  - Risk: With Serverless, cache trimming + resume lag can violate P95.
- Event-driven batch/ETL windows
  - Traffic: Heavy for 2–4 hours, idle otherwise.
  - Pick: Serverless can shine; run jobs back to back while the database is still awake, and keep the auto-pause delay longer than the gaps between jobs to avoid repeated pause/resume cycles.
  - Risk: If jobs kick off after long idle, account for resume + cold cache.
- 24×7 APIs / Latency-sensitive
  - Traffic: Constant.
  - Pick: Provisioned (or Hyperscale Provisioned) to avoid warm-ups and control throughput headroom.
  - Risk: Overpaying if you routinely provision far above need; right-size or use reserved capacity.
Measured Cold-Start: How to Test (and what we typically see)
Measure in your own environment. Community reports typically put auto-resume at about 1 minute from the first connection attempt to readiness, with 1–2 minutes commonly seen depending on database size and cache state. Treat this as an observation, not an SLA. (Stack Overflow)
Quick PowerShell harness (run from build/ops agent) to measure resume time:
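# Measures wall-clock time from the first connection attempt until a query succeeds.
$connString = "Server=tcp:<yourserver>.database.windows.net,1433;Database=<db>;User ID=<user>;Password=<pwd>;Encrypt=True"
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$ready = $false
while (-not $ready) {
    $conn = New-Object System.Data.SqlClient.SqlConnection $connString
    try {
        # Against a paused database, early attempts are expected to fail while it resumes.
        $conn.Open()
        $cmd = $conn.CreateCommand()
        $cmd.CommandText = "SELECT TOP (1) name FROM sys.objects"
        $cmd.ExecuteScalar() | Out-Null
        $ready = $true
    } catch {
        # Retry every 5 seconds; this also sets the resolution of the measurement.
        Start-Sleep -Seconds 5
    } finally {
        # Release the connection on both success and failure.
        $conn.Dispose()
    }
}
$sw.Stop()
"Resume+first query latency: {0:N1} seconds" -f $sw.Elapsed.TotalSeconds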
Tip: Run it twice—first hit measures resume + cold cache; second hit measures warm cache. Expect a large delta on Serverless after long idle due to cache reclamation. (Microsoft Learn)
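For pipelines that do not run PowerShell, the same retry-until-ready loop can be sketched in Python. To avoid assuming any particular database driver, the connection attempt is passed in as a callable; `time_until_ready` and its parameters are hypothetical names for this sketch, and `probe` stands for whatever function opens a connection and runs a trivial query in your stack.

```python
import time


def time_until_ready(probe, retry_delay_s=5.0, timeout_s=300.0, sleep=time.sleep):
    """Call probe() until it succeeds; return (elapsed_seconds, attempts).

    probe is any zero-argument callable that raises while the database
    is still resuming and returns normally once a query succeeds.
    """
    start = time.monotonic()
    attempts = 0
    while True:
        attempts += 1
        try:
            probe()
            return time.monotonic() - start, attempts
        except Exception as exc:
            # Give up if the next retry would exceed the overall budget.
            if time.monotonic() - start + retry_delay_s >= timeout_s:
                raise TimeoutError(
                    f"database not ready after {attempts} attempts"
                ) from exc
            sleep(retry_delay_s)
```

Calling it twice in a row gives the cold and warm numbers from the tip above: the first call includes resume time, the second should return after a single attempt.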
Configuration Recipes
Create Serverless (General Purpose) with guardrails
az sql db create -g <rg> -s <server> -n <db> \
-e GeneralPurpose --compute-model Serverless -f Gen5 \
--min-capacity 0.5 -c 4 --auto-pause-delay 120
- Sets 0.5–4 vCores and auto-pause 120 min. Tune min vCores up if you see excessive cold-cache penalties. (Microsoft Learn)
Disable auto-pause (still serverless autoscale, but no sleeping)
az sql db update -g <rg> -s <server> -n <db> \
--edition GeneralPurpose --compute-model Serverless --family Gen5 \
--min-capacity 1 --capacity 4 --auto-pause-delay -1
- Keeps compute hot to avoid resume lag while still allowing autoscaling within your range. (Microsoft Learn)
Move between tiers (T-SQL)
-- Provisioned → Serverless (GP)
ALTER DATABASE MyDb
MODIFY ( SERVICE_OBJECTIVE = 'GP_S_Gen5_4' );
-- Serverless → Provisioned (GP 2 vCores example)
ALTER DATABASE MyDb
MODIFY ( SERVICE_OBJECTIVE = 'GP_Gen5_2' );
Defaults apply for min vCores and auto-pause when using T-SQL; adjust later via portal/CLI/PowerShell. (Microsoft Learn)
Cost Patterns (read before you swipe)
- Provisioned is usually cheaper per vCore-hour but always on—great when average utilization is decent. (Microsoft Learn)
- Serverless wins when idle dominates and you can accept resume lag. Remember: usage below your configured minimum still bills the minimum; full pause bills storage only. (Microsoft Learn)
- If you’re hovering at high average CPU all day, Serverless can be more expensive than same-size Provisioned due to unit pricing and cache behaviors that nudge you to raise the minimum. (Guidance derived from Microsoft docs on billing mechanics and cache reclamation.) (Microsoft Learn)
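The trade-off in these bullets reduces to a break-even calculation. The sketch below assumes the Serverless database is fully paused outside its active hours (so compute cost is zero then), ignores storage, and uses a 730-hour month; the function names are hypothetical and the prices in the usage note are round placeholder numbers, not Azure list prices.

```python
HOURS_PER_MONTH = 730.0  # assumption: common billing approximation for one month


def monthly_compute_costs(active_hours, avg_billed_vcores,
                          serverless_price_per_vcore_hour,
                          provisioned_vcores, provisioned_price_per_vcore_hour):
    """Return (serverless_cost, provisioned_cost) for one month of compute."""
    # Serverless: pay only for active hours, at the average billed vCores.
    serverless = active_hours * avg_billed_vcores * serverless_price_per_vcore_hour
    # Provisioned: pay for every hour at the fixed size.
    provisioned = HOURS_PER_MONTH * provisioned_vcores * provisioned_price_per_vcore_hour
    return serverless, provisioned


def break_even_active_hours(avg_billed_vcores, serverless_price_per_vcore_hour,
                            provisioned_vcores, provisioned_price_per_vcore_hour):
    """Active hours per month above which Serverless costs more than Provisioned."""
    provisioned = HOURS_PER_MONTH * provisioned_vcores * provisioned_price_per_vcore_hour
    serverless_per_active_hour = avg_billed_vcores * serverless_price_per_vcore_hour
    # Capped at the month length: beyond that, Serverless never costs more.
    return min(HOURS_PER_MONTH, provisioned / serverless_per_active_hour)
```

With placeholder prices of 0.50 per vCore-hour for Serverless and 0.25 for Provisioned, and 2 vCores on both sides, the break-even is 365 active hours per month, or about half the month. Substitute the current prices for your region and the average billed vCores from your own metrics.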
Best Practices & Common Pitfalls
Do this
- Classify your workload (see profiles) and test with real traffic traces.
- On Serverless, set min vCores high enough to avoid thrash during business hours; allow pause only when it truly saves money.
- Pre-warm before critical windows (e.g., synthetic ping every 10–15 min) if auto-pause is enabled.
- Track P95/P99 latencies separately for “first after idle” vs steady-state—they are different worlds.
- For Hyperscale Serverless, remember no auto-pause yet—plan accordingly. (TECHCOMMUNITY.MICROSOFT.COM)
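One way to track “first after idle” separately from steady-state is to split requests by the gap since the previous request. This is a minimal sketch: the function names are hypothetical, the input is assumed to be a list of `(timestamp_seconds, latency_ms)` pairs sorted by time, and the idle threshold is something you would set close to your auto-pause delay.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def split_by_idle_gap(requests, idle_threshold_s):
    """Split latencies into (first_after_idle, steady_state).

    A request counts as first-after-idle when it is the first one seen
    or arrives at least idle_threshold_s after the previous request.
    """
    first_after_idle, steady = [], []
    previous_ts = None
    for ts, latency_ms in requests:
        if previous_ts is None or ts - previous_ts >= idle_threshold_s:
            first_after_idle.append(latency_ms)
        else:
            steady.append(latency_ms)
        previous_ts = ts
    return first_after_idle, steady
```

Report `percentile(steady, 95)` and `percentile(first_after_idle, 95)` as separate metrics; blending them hides the resume penalty inside an otherwise healthy P95.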
Avoid this
- Expecting Serverless to behave like always-hot Provisioned—it won’t.
- Setting auto-pause to 15 minutes on a dashboard used “every so often”—you’ll pay in user pain.
- Ignoring cache reclamation—long idle + big working set = surprise cold cache.
- Moving to Serverless purely for “cost savings” when your DB is busy all day—you may pay more.
Conclusion & Takeaways
- If latency SLOs matter at all times: pick Provisioned vCore (or Serverless with auto-pause disabled and a sensible minimum).
- If you’re idle most of the day: Serverless is your friend—pause aggressively and accept a ~1 minute first-hit penalty. (Stack Overflow)
- Measure before committing: run the PowerShell harness in your pipeline; compare steady-state cost vs idle savings using your actual usage curve.
- Don’t underestimate caches: warm vs cold behavior is often the real cost of Serverless.
Tags
#AzureSQL #Serverless #vCore #CloudDatabases #Performance #ColdStart #CostOptimization #DataEngineering #Hyperscale #SaaS
Key sources & further reading:
- Serverless compute tier overview (autoscaling, auto-pause, billing, cache reclamation, defaults). (Microsoft Learn)
- vCore model & Provisioned vs Serverless (compute billing semantics, hardware, tiers). (Microsoft Learn)
- Pricing model notes (unit price lower for Provisioned per unit time). (Microsoft Learn)
- Hyperscale Serverless GA (no auto-pause yet). (TECHCOMMUNITY.MICROSOFT.COM)
- Typical resume times from community reports (1–2 minutes). (Stack Overflow)







