In a significant validation for open-source AI systems, NVIDIA's AI-Q research agent has taken the top spot on two major industry benchmarks: DeepResearch Bench and DeepResearch Bench II. The achievement demonstrates that a single, configurable architecture can produce state-of-the-art results, offering enterprises a transparent and adaptable alternative to closed systems.
AI-Q functions as an open blueprint for building agents that analyze enterprise and web data to deliver cited answers. Its core is a multi-agent system (an orchestrator, a planner, and a researcher) built using the NVIDIA NeMo Agent Toolkit and fine-tuned Nemotron 3 models. This design allows each component to use a different AI model and enables specialized sub-agents to gather evidence from various analytical angles.
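To make the division of roles concrete, here is a minimal Python sketch of an orchestrator that hands a question to a planner and then to one researcher per analytical angle. It does not use the NeMo Agent Toolkit API; every name in it (`make_stub_model`, `plan`, `research`, `orchestrate`, the `stub://` sources, and the model labels) is hypothetical, and the "models" are stubs that simply echo their prompt.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
ModelFn = Callable[[str], str]


@dataclass
class Finding:
    angle: str   # the analytical angle this finding covers
    text: str    # the researcher's answer
    source: str  # citation attached to the answer


def make_stub_model(name: str) -> ModelFn:
    """Return a fake model that tags its output with its own name."""
    return lambda prompt: f"[{name}] {prompt}"


def plan(question: str, planner: ModelFn, angles: List[str]) -> List[str]:
    """Planner role: produce one sub-query per analytical angle."""
    return [planner(f"{angle}: {question}") for angle in angles]


def research(sub_query: str, angle: str, researcher: ModelFn) -> Finding:
    """Researcher role: answer one sub-query and attach a (stub) citation."""
    return Finding(angle=angle, text=researcher(sub_query), source=f"stub://{angle}")


def orchestrate(question: str, planner: ModelFn, researcher: ModelFn,
                writer: ModelFn, angles: List[str]) -> str:
    """Orchestrator role: plan, fan out to researchers, assemble a cited report."""
    sub_queries = plan(question, planner, angles)
    findings = [research(q, a, researcher) for q, a in zip(sub_queries, angles)]
    body = "\n".join(f"- {f.text} (source: {f.source})" for f in findings)
    return writer(f"Report on: {question}") + "\n" + body


# Each role is backed by a different (stub) model, mirroring the per-component choice.
report = orchestrate(
    "What is X?",
    planner=make_stub_model("planner-model"),
    researcher=make_stub_model("researcher-model"),
    writer=make_stub_model("orchestrator-model"),
    angles=["market", "technical"],
)
```

Because each role is just a callable, swapping the model behind one role leaves the others untouched, which is the property the design above is meant to provide.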
The two benchmarks test different strengths. DeepResearch Bench evaluates the overall quality and narrative polish of a final report. DeepResearch Bench II uses over 70 specific checks to grade factual recall, analysis, and presentation. Leading on both indicates the system is proficient at both high-level synthesis and granular accuracy.
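A checklist-style evaluation of this kind can be pictured with a small Python sketch. The benchmark's real scoring procedure is not described here, so the sketch assumes the simplest possible scheme: each check is pass/fail, belongs to one dimension, and a dimension's score is the fraction of its checks that passed. The function name `score_by_dimension` and the sample checks are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def score_by_dimension(checks: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the pass rate per dimension (an assumed scheme, not the benchmark's own)."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for dimension, ok in checks:
        total[dimension] += 1
        passed[dimension] += int(ok)
    return {dimension: passed[dimension] / total[dimension] for dimension in total}


# Illustrative data: four checks across the three graded dimensions.
sample_checks = [
    ("recall", True),
    ("recall", False),
    ("analysis", True),
    ("presentation", True),
]
scores = score_by_dimension(sample_checks)
```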
Key to this performance was fine-tuning a Nemotron 3 model on approximately 67,000 high-quality research trajectories. These training examples, filtered by a principle-based judge model, taught the model to carry out multi-step search, synthesis, and citation. The architecture also incorporates custom middleware to keep long, complex research runs reliable, handling issues such as tool-call limits and incomplete outputs.
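The two reliability concerns named above, tool-call limits and incomplete outputs, can be illustrated with a short Python sketch. This is not AI-Q's middleware: `ToolCallLimiter`, `ToolBudgetExceeded`, `is_complete`, and `generate_with_retry` are hypothetical names, and the completeness test (non-empty text ending in sentence punctuation) is a deliberately crude assumption.

```python
from typing import Callable


class ToolBudgetExceeded(Exception):
    """Raised when a tool is called more often than its budget allows."""


class ToolCallLimiter:
    """Wrap a tool function and enforce a maximum number of calls."""

    def __init__(self, tool: Callable[[str], str], max_calls: int) -> None:
        self.tool = tool
        self.max_calls = max_calls
        self.calls = 0

    def __call__(self, query: str) -> str:
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(f"limit of {self.max_calls} calls reached")
        self.calls += 1
        return self.tool(query)


def is_complete(report: str) -> bool:
    """Assumed heuristic: a complete report is non-empty and ends a sentence."""
    return bool(report.strip()) and report.rstrip().endswith((".", "!", "?"))


def generate_with_retry(generate: Callable[[int], str], max_attempts: int = 3) -> str:
    """Call generate(attempt) until the output looks complete or attempts run out."""
    last = ""
    for attempt in range(max_attempts):
        last = generate(attempt)
        if is_complete(last):
            return last
    return last  # best effort: return the final attempt even if incomplete


# Usage: a search tool limited to two calls, and a generator that is cut off once.
search = ToolCallLimiter(lambda q: f"results for {q}", max_calls=2)
first = search("a")
second = search("b")
try:
    search("c")
    third_call_blocked = False
except ToolBudgetExceeded:
    third_call_blocked = True

final = generate_with_retry(lambda attempt: "cut off mid" if attempt == 0 else "Done.")
```

Raising an exception on budget exhaustion lets the calling agent decide whether to stop searching and write up what it has, rather than failing silently.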
For maximum quality, an optional ensemble layer runs multiple research pipelines in parallel and merges their findings, while a refiner step polishes the final report. The entire stack is open and configurable via YAML files, letting developers swap models and tools.
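The ensemble-then-refine idea can be sketched in a few lines of Python. The sketch assumes each pipeline returns a mapping from a claim to the set of sources supporting it, that merging means taking the union of sources per claim, and that refining is simple deterministic formatting; the real system's merge and refiner logic is not described here. `run_ensemble`, `merge_findings`, `refine`, and the two toy pipelines are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set

# Assumed pipeline shape: question in, {claim: set of supporting sources} out.
Findings = Dict[str, Set[str]]
Pipeline = Callable[[str], Findings]


def run_ensemble(question: str, pipelines: List[Pipeline]) -> List[Findings]:
    """Run every pipeline on the same question in parallel threads."""
    with ThreadPoolExecutor(max_workers=max(1, len(pipelines))) as pool:
        # map() returns results in the same order as the input pipelines.
        return list(pool.map(lambda pipeline: pipeline(question), pipelines))


def merge_findings(results: List[Findings]) -> Findings:
    """Combine results, pooling the sources of claims found by several pipelines."""
    merged: Findings = {}
    for result in results:
        for claim, sources in result.items():
            merged.setdefault(claim, set()).update(sources)
    return merged


def refine(merged: Findings) -> str:
    """Stand-in for the refiner: emit claims in a stable order with sorted citations."""
    lines = [f"{claim} [{', '.join(sorted(sources))}]"
             for claim, sources in sorted(merged.items())]
    return "\n".join(lines)


# Two toy pipelines that partly agree.
def pipeline_a(question: str) -> Findings:
    return {"A": {"s1"}}


def pipeline_b(question: str) -> Findings:
    return {"A": {"s2"}, "B": {"s3"}}


merged = merge_findings(run_ensemble("What is X?", [pipeline_a, pipeline_b]))
final_report = refine(merged)
```

A claim found by more than one pipeline ends up with more citations, which is one simple way an ensemble can strengthen the evidence behind a report.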
The results suggest that powerful, agentic AI research doesn't require a black box. NVIDIA will present further details on this evaluation-driven approach at GTC in San Jose the week of March 16, 2026.
Source: Hugging Face Blog