Parallel Execution Deep Dive: DOP math, skew handling, and monitoring PX servers like a pro
Hook: Your query “flies” in dev, crawls in prod, and half the PX slaves are idle while one is melting the CPU. That’s not “Oracle being Oracle.” It’s you missing three things: how DOP is actually calculated, how to kill data skew, and how to monitor PX like a surgeon. Let’s fix that.
Why this matters
- Parallel Execution (PX) is the biggest performance lever you have for large scans/joins.
- Get DOP wrong → you thrash CPUs, saturate interconnects, and trigger queuing.
- Ignore skew → one or two slaves do all the work while the rest sip coffee.
- Skip monitoring → you’ll tune blind and blame the wrong layer.
Concept & architecture (clear mental model)
The PX cast
- QC (Query Coordinator): Serial session orchestrating the parallel plan.
- PX Servers: Worker processes in sets (producer/consumer). A 2-set plan often means DOP workers produce rows → redistribute → DOP workers consume.
- Granules: Units of work (block/partition granules). Good granularity = balanced work.
- TQ (Table Queue): The “pipe” between producer and consumer sets. Distribution methods: HASH, RANGE, BROADCAST, RANDOM.
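Want to see this cast in a real plan? Run a parallel statement, then dump the cursor plan with the PARALLEL detail; the TQ, IN-OUT (P->P, P->S) and PQ Distrib columns map straight onto the producer/consumer sets above. A minimal sketch (big_fact is a placeholder table, not from your schema):

SELECT /*+ parallel(bf 8) */ COUNT(*) FROM big_fact bf;

SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'BASIC +PARALLEL'));
-- With no sql_id given, DISPLAY_CURSOR reports on the last statement this session ran.
-- Read the TQ, IN-OUT and PQ Distrib columns line by line.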
Where bottlenecks occur
- CPU-bound: DOP > usable CPU → run-queue spikes, regressions.
- Interconnect-bound (RAC): heavy HASH or BROADCAST across instances.
- I/O-bound: Too few granules or slow storage → PX servers stall.
- Skew-bound: One key or partition hoards rows → one server becomes the tail.
DOP math (the part people hand-wave and then suffer)
Think of Effective DOP as the minimum of several caps.
Key caps
| Cap | Formula / Setting | Notes |
|---|---|---|
| CPU cap | cpu_count * parallel_threads_per_cpu * instances | Upper bound per cluster. Practically lower due to other workloads. |
| Policy cap | parallel_degree_limit or AUTO algorithm | Hard or adaptive cap. |
| Servers availability | parallel_max_servers, parallel_servers_target | Target enforces queuing when reached (AUTO). |
| Statement/segment DOP | Object/statement DOP, hints, AUTO DOP | The DOP you asked for is only a proposal. |
| Resource Manager | Consumer group directives | Can downscale or queue PX. |
Effective DOP = min( requested_DOP, policy_cap, available_servers_cap, cpu_cap, resource_manager_cap )
Practical examples
Example 1: single-instance
- cpu_count=32, parallel_threads_per_cpu=2 → CPU cap ≈ 64
- parallel_degree_limit=32 → Policy cap = 32
- parallel_servers_target=80, currently 64 busy → Available-servers cap ≈ 16
- Statement requests DOP 64 → Effective DOP = min(64, 32, 16, 64) = 16
Example 2: RAC (2 nodes)
- Each node: cpu_count=24, parallel_threads_per_cpu=2 → cluster CPU cap ≈ 96
- Policy cap: AUTO estimates 48 for this SQL.
- Target nearly reached; only 20 servers free → Effective DOP = 20
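You can pull every input to this arithmetic straight from the instance instead of guessing; a quick sketch using v$parameter:

SELECT name, value
FROM   v$parameter
WHERE  name IN ('cpu_count', 'parallel_threads_per_cpu',
                'parallel_degree_limit', 'parallel_degree_policy',
                'parallel_max_servers', 'parallel_servers_target',
                'parallel_min_time_threshold')
ORDER  BY name;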
Hints and knobs that actually change results
- Request DOP: /*+ parallel(t 16) parallel(j 16) */ or alter session force parallel query parallel 16;
- Cap DOP: alter session set parallel_degree_limit=16;
- Adaptive DOP & queuing: alter system set parallel_degree_policy=AUTO; plus alter system set parallel_min_time_threshold=10; (only parallelize long ops) and alter system set parallel_servers_target=NNN; (start queuing at this level)
Brutal truth: If you rely only on hints, Production will still cap/queue you. Know the system caps.
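One quick sanity check: compare how many PX servers the statement asked for with how many it actually got. A sketch, assuming the statement is (or was) captured in v$sql_monitor (Tuning Pack):

SELECT sql_id, status, sql_exec_start,
       px_servers_requested, px_servers_allocated
FROM   v$sql_monitor
WHERE  sql_id = '&sql_id'
ORDER  BY sql_exec_start DESC;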
Skew handling like a pro
How to detect skew fast
- Runtime TQ stats (populated only in the session that ran the query, right after it finishes):
  SELECT dfo_number, tq_id, server_type,
         MIN(num_rows) min_rows, MAX(num_rows) max_rows,
         ROUND(MAX(num_rows)/NULLIF(MIN(num_rows),0),2) skew_ratio
  FROM   v$pq_tqstat
  GROUP  BY dfo_number, tq_id, server_type
  ORDER  BY 1, 2, 3;
  A skew_ratio > 2 is already suspicious; > 5 is pain.
- SQL Monitor (per-slave bytes/rows):
  SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(sql_id => '&sql_id', type => 'ACTIVE') FROM dual;
- ASH samples per PX server: if one server owns most of the samples, you’re skewed (query below).
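A sketch of that ASH check, assuming Diagnostics Pack licensing for v$active_session_history:

-- Samples per PX slave for one SQL; a lopsided count means skew.
SELECT session_id, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  sql_id = '&sql_id'
AND    qc_session_id IS NOT NULL   -- rows belonging to a parallel execution
GROUP  BY session_id
ORDER  BY samples DESC;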
Typical skew causes → surgical fixes
| Cause | Symptom | Fix |
|---|---|---|
| Hot key (e.g., many rows with status='ACTIVE') | One PX server huge, others tiny | Salting: append a pseudo-random suffix to the key before hash redistribution (see the salted GROUP BY example below). |
| Uneven partitions | One partition >> others | Re-partition or subpartition heavy partitions; or range-hash composite. |
| Tiny # of granules | Early idle slaves | Increase granularity: lower PARALLEL_MIN_PARTITION, enable block granules, or split partitions. |
| Broadcast of huge table | Network/one set overloaded | Force hash distribution with the PQ_DISTRIBUTE hint; if broadcast is really needed, make sure the smaller input is the one broadcast. |
| Correlated predicates | Hash buckets uneven | Pre-aggregate by key first; or create histograms to guide distribution. |
| Top-N GROUP BY | A few groups dominate | Two-phase aggregation: partial aggregate per slave → rehash on salted key → final aggregate. |
Hints you’ll actually use
- Force distribution:
  /*+ pq_distribute(t hash hash) */ (both sides hash-redistribute)
  /*+ pq_distribute(t broadcast none) */ (broadcast t to avoid the hash)
- Granule control: /*+ qb_name(q) px_granule(block) */ or px_granule(partition)
- Salt for group-by:
  SELECT salted_key, SUM(val)
  FROM  ( SELECT (key || ':' || MOD(ORA_HASH(key), 16)) AS salted_key, val
          FROM   fact )
  GROUP BY salted_key;
  -- Then roll up salted_key back to key if needed (sketch below).
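That roll-up phase looks like this; a minimal sketch that assumes your key values never contain the ':' separator:

-- Phase 2: strip the salt and aggregate the partial sums back to the real key.
SELECT SUBSTR(salted_key, 1, INSTR(salted_key, ':') - 1) AS key,
       SUM(partial_sum)                                  AS total_val
FROM  ( SELECT (key || ':' || MOD(ORA_HASH(key), 16)) AS salted_key,
               SUM(val)                                AS partial_sum
        FROM   fact
        GROUP  BY (key || ':' || MOD(ORA_HASH(key), 16)) )
GROUP  BY SUBSTR(salted_key, 1, INSTR(salted_key, ':') - 1);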
Monitoring PX servers (no guesswork)
Quick health checklist
- Are we queued? Too many statements waiting at the PX coordinator → lower requested DOP or increase parallel_servers_target (if you own the box). See the queue check below.
- Is DOP what we think? Compare plan DOP vs actual in SQL Monitor.
- Is work balanced? Inspect v$pq_tqstat min/max rows and bytes.
- Is interconnect hot? In RAC, check GC/Net waits per session and gv$pq_tqstat cross-instance flow.
- Are we CPU or I/O bound? Look at the DB time split (DB CPU vs cell single block read / direct path read).
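For the queue check, one sketch is to look for sessions parked on the parallel statement queuing wait; this assumes parallel_degree_policy=AUTO, where the queuing event is typically resmgr:pq queued:

-- Statements currently held in the parallel statement queue.
SELECT sid, sql_id, event, seconds_in_wait
FROM   v$session
WHERE  event = 'resmgr:pq queued';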
Views and scripts you should keep
Active PX processes
SELECT * FROM v$px_process ORDER BY server_name;
SELECT status, COUNT(*) FROM v$px_process GROUP BY status;
PX sessions (who’s working on what)
SELECT s.sid, s.serial#, q.qcinst_id, q.server_set, q.req_degree, q.degree, s.sql_id
FROM v$px_session q JOIN v$session s ON q.sid = s.sid
ORDER BY q.server_set, q.req_degree;
TQ stats (distribution balance)
SELECT dfo_number, tq_id, server_type, MIN(bytes) min_b, MAX(bytes) max_b,
ROUND(MAX(bytes)/NULLIF(MIN(bytes),0),2) skew_ratio
FROM v$pq_tqstat
GROUP BY dfo_number, tq_id, server_type
ORDER BY 1,2,3;
System saturation (queuing threshold)
SELECT statistic, value FROM v$px_process_sysstat
WHERE statistic LIKE 'Servers%';
SQL Monitor link (HTML report)
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(sql_id => '&sql_id', type => 'HTML', report_level => 'ALL')
FROM dual;
Tip: Save “golden” reports for good runs. Diff them against regressions to see what changed (DOP, distribution, waits).
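One way to persist those golden runs is a small baseline table you insert a text report into after each run; a sketch, where sqlmon_baseline is a made-up name, not an Oracle object:

CREATE TABLE sqlmon_baseline (
  captured_at DATE,
  sql_id      VARCHAR2(13),
  report      CLOB
);

INSERT INTO sqlmon_baseline (captured_at, sql_id, report)
VALUES (SYSDATE, '&sql_id',
        DBMS_SQLTUNE.REPORT_SQL_MONITOR(sql_id => '&sql_id', type => 'TEXT', report_level => 'ALL'));
COMMIT;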
Real example: fixing a 30→4 minute hash join
Symptoms
- Requested DOP 32, but SQL Monitor shows effective DOP = 12 (target reached).
- v$pq_tqstat shows max_bytes / min_bytes ≈ 7.9 on a hash TQ → skew.
- A single status='ACTIVE' chunk dominates one server.
Fix
- Add salting on the join key for the large side:
  SELECT /*+ parallel(32) pq_distribute(bf hash hash) */ ...
  FROM   big_fact bf
  JOIN   dim d
    ON   (bf.key || ':' || MOD(ORA_HASH(bf.key), 16)) = d.salted_key
  or apply the salting only in a redistribution subquery plus a final roll-up.
- Increase granules by switching to block granules on the big table: /*+ px_granule(block) */
- Lower requested DOP to 24 to avoid queuing thrash.
Result
- Skew ratio drops to <1.5.
- Interconnect bytes smooth.
- Elapsed time drops from ~30 minutes to ~4 minutes; CPU stable, no queuing.
Best practices (and honest pitfalls)
Best practices
- Size DOP against usable CPU, not theoretical caps. Leave 20–30% headroom.
- Prefer HASH redistribution for large joins, BROADCAST only if the smaller input is truly small.
- Keep granules small enough to oversubscribe (more granules than slaves).
- Use AUTO DOP with parallel_min_time_threshold to avoid thrashing short queries.
- Capture SQL Monitor for every major batch path. Build a simple diff workflow.
Pitfalls
- Forcing DOP 64 everywhere: You’ll hit queuing and make latency worse.
- Broadcasting a “small” table that isn’t small in prod: Explodes network + memory.
- Ignoring histograms: The optimizer assumes uniformity and drives skew (see the stats sketch below).
- RAC unaware design: Heavy hash redistributions across nodes can be slower than single-instance.
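The histogram fix is one DBMS_STATS call; a minimal sketch assuming the hot column is status on big_fact (names taken from the examples above, not your schema):

BEGIN
  -- Gather a histogram on the skewed column so the optimizer sees the hot value.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'BIG_FACT',
    method_opt => 'FOR COLUMNS status SIZE 254',
    degree     => 8   -- gather the stats in parallel too
  );
END;
/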
Conclusion & takeaways
- DOP math is deterministic. Compute your effective DOP from caps before you run.
- Skew is the silent killer. Detect it with v$pq_tqstat and fix it with salting, better partitioning, and correct distribution.
- Monitor like a pro. SQL Monitor + PX views will tell you exactly where the time and bytes go.
Call to action: Grab one painful SQL today, run the monitoring queries, compute its real DOP, and test a skew fix. Screenshot the before/after SQL Monitor—your future self will thank you.
Internal link ideas
- “Designing Keys to Avoid Skew in Hash Joins”
- “When to Broadcast vs Hash Redistribute in PX”
- “Adaptive DOP: Tuning parallel_degree_policy the Right Way”
- “RAC Interconnect Considerations for PX Queries”
Image prompt
“A clean, modern data architecture diagram showing an Oracle-style PX plan: a QC coordinating two PX server sets with a hash redistribution TQ between them, DOP annotated, and skew highlighted on one worker. Minimalist, high-contrast, isometric 3D, labeled arrows for producer→consumer, with a small inset showing DOP caps (CPU/policy/servers).”
Tags
#ParallelExecution #OracleDatabase #DOP #SQLTuning #DataEngineering #Performance #PX #Skew #RAC #QueryOptimization