Blackwell Ultra NVL72 Runs 20x More AI Agents Per Megawatt

New AgentPerf benchmark reveals dramatic efficiency gains for agentic workloads that chain dozens of LLM calls together.

Omega Editorial · June 13, 2026 · 3 min read

NVIDIA's GB300 NVL72 platform delivers 20 times more AI agents per megawatt than its previous-generation Hopper architecture, according to results from AgentPerf, the first industry benchmark designed specifically for agentic AI workloads.

The benchmark, developed by Artificial Analysis, measures a different kind of work from traditional AI inference tests. Where a conversational exchange involves a single large language model (LLM) call and response, agentic AI breaks complex goals into dozens or hundreds of chained LLM calls, each passing a growing context to the next while executing tool calls for code compilation, database searches, and web browsing.
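To make the effect of a growing context concrete, here is a minimal sketch of how input tokens add up across a chain of calls. The function name, its parameters, and the numbers in the usage note are illustrative assumptions, not part of AgentPerf, and the model is deliberately simple: it assumes every call re-reads the full context and ignores prefix caching, which a real serving stack may use to avoid some of that work.

```python
def chained_input_tokens(initial_context: int,
                         tokens_added_per_step: int,
                         steps: int) -> int:
    """Total input tokens read across a chain of LLM calls (hypothetical model).

    Each call reads the whole context accumulated so far. After each call,
    the context grows by tokens_added_per_step (model output plus tool results).
    """
    total = 0
    context = initial_context
    for _ in range(steps):
        total += context                   # this call reads the full current context
        context += tokens_added_per_step   # context grows before the next call
    return total


if __name__ == "__main__":
    # Illustrative numbers only: a 1,000-token task, 500 tokens added per step.
    print(chained_input_tokens(1000, 500, 1))   # single call
    print(chained_input_tokens(1000, 500, 50))  # 50 chained calls
```

Under these assumed numbers, a single call reads 1,000 input tokens while a 50-step chain reads 662,500, because the total grows roughly with the square of the number of steps rather than linearly.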

A New Class of Benchmark

Existing AI inference benchmarks measure how quickly a system responds to individual requests and how many simultaneous requests it can handle. Those metrics fail to capture the multiplicative complexity of agentic workloads, where chained calls, tool delays, and growing context windows stress computing infrastructure in entirely different ways.

AgentPerf addresses this gap by measuring real coding agent trajectories drawn from public repositories across more than 12 programming languages. Agents receive tasks, read files, write and edit code, execute commands, and iterate on the results. Throughout, the benchmark tracks how many concurrent agentic tasks a platform can support while still meeting defined thresholds for responsiveness and output token rate.
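The idea of "maximum concurrency under thresholds" can be sketched as a simple search. This is not AgentPerf's published methodology: the `Measurement` fields, the `measure` callback, and the toy platform below are all hypothetical, chosen only to show the shape of the metric.

```python
from typing import Callable, NamedTuple


class Measurement(NamedTuple):
    response_latency_s: float    # stand-in for "responsiveness"
    output_tokens_per_s: float   # per-agent output token rate


def max_concurrent_agents(measure: Callable[[int], Measurement],
                          max_latency_s: float,
                          min_tokens_per_s: float,
                          limit: int) -> int:
    """Largest concurrency level (up to limit) that meets both thresholds.

    Scans upward and stops at the first level that violates a threshold,
    assuming performance only degrades as concurrency rises.
    """
    best = 0
    for n in range(1, limit + 1):
        m = measure(n)
        if m.response_latency_s > max_latency_s:
            break
        if m.output_tokens_per_s < min_tokens_per_s:
            break
        best = n
    return best


def toy_platform(n: int) -> Measurement:
    """Made-up platform: latency rises and per-agent rate falls with load."""
    return Measurement(response_latency_s=0.5 + 0.1 * n,
                       output_tokens_per_s=100.0 / n)


if __name__ == "__main__":
    print(max_concurrent_agents(toy_platform, max_latency_s=2.0,
                                min_tokens_per_s=18.0, limit=100))
```

In a real harness, `measure` would run actual agent trajectories at each concurrency level instead of evaluating a formula.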

In initial results using DeepSeek V4 Pro, a large mixture-of-experts model representing frontier-class capabilities, the GB300 NVL72 achieved the highest scores in the benchmark.

Full-Stack Architecture Advantage

The performance gains stem from coordinated design across NVIDIA's entire stack. The GB300 NVL72 connects 72 GPUs into a single rack-scale system, enabling large mixture-of-experts models to distribute execution efficiently at scale. CUDA kernels overlap communication and compute operations, absorbing coordination costs rather than adding latency. NVIDIA TensorRT LLM maintains efficiency as concurrent agent sessions scale by separating input processing from output generation for independent optimization.

Why It Matters

For enterprises deploying AI agents at production scale, these metrics translate directly into infrastructure economics. The number of concurrent agentic tasks per accelerator and per megawatt determines how much productive work a given capital and power investment can deliver. As agents move from experimental deployments to production systems handling customer service, code generation, and business process automation, understanding true operational efficiency becomes critical for budgeting and capacity planning.
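For capacity planning, the per-megawatt figure is simple arithmetic once concurrency and power draw are known. The sketch below uses hypothetical function names and placeholder inputs; none of the numbers are measured results from the benchmark or vendor specifications.

```python
def agents_per_megawatt(concurrent_agents: float, power_draw_kw: float) -> float:
    """Concurrent agents supported per megawatt of power drawn."""
    if power_draw_kw <= 0:
        raise ValueError("power_draw_kw must be positive")
    # 1 MW = 1,000 kW, so scale the agent count up to a full megawatt.
    return concurrent_agents * 1000.0 / power_draw_kw


def efficiency_ratio(new_per_mw: float, old_per_mw: float) -> float:
    """How many times more agents per megawatt the newer system supports."""
    if old_per_mw <= 0:
        raise ValueError("old_per_mw must be positive")
    return new_per_mw / old_per_mw


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not benchmark data.
    older = agents_per_megawatt(concurrent_agents=50, power_draw_kw=100)
    newer = agents_per_megawatt(concurrent_agents=400, power_draw_kw=100)
    print(older, newer, efficiency_ratio(newer, older))
```

Dividing a site's available power by the per-agent power cost in the same way gives a first estimate of how many agents a facility can run, before accounting for cooling and other overhead.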

Production Deployments Underway

Inference providers including Baseten, DeepInfra, and Together AI are already serving agentic workloads on Blackwell infrastructure. Together AI powers real-time inference for Cursor, an AI-powered coding platform where agents debug issues and generate features while developers work. DeepInfra runs Pam.ai, an AI workforce platform for car dealerships that deploys agents to book service appointments and handle sales campaigns.

NVIDIA's Vera Rubin architecture is now in full production, bringing additional infrastructure capacity for agentic AI demands.

These details were first reported by NVIDIA.

This is an original analysis by the Omega editorial team. Source reporting: AI Watch.
