NVIDIA XR AI Framework Brings Multimodal Agents to AR Glasses
Public beta enables developers to build spatially aware AI systems that perceive environments, access enterprise data, and assist workers in real time.
NVIDIA has released a developer framework designed to embed AI agents directly into augmented reality glasses and extended reality devices, moving artificial intelligence assistance from screens into physical work environments.
The NVIDIA XR AI platform, now available in public beta, provides libraries and infrastructure for building multimodal agents that can process video, audio, depth, and sensor data from AR devices while connecting to enterprise knowledge systems and reasoning models. The framework addresses a technical challenge: creating AI systems that can perceive changing environments, interpret spatial context, retrieve relevant information, and respond with low enough latency to support rather than distract workers.
Why it matters
As AI capabilities expand beyond text generation, enterprises need practical ways to deploy agents in settings where workers can't stop to type queries or read lengthy responses. AR-based AI assistance represents a shift from software that requires dedicated attention to systems that operate within existing workflows—particularly valuable in manufacturing, healthcare, and laboratory settings where hands-free operation is essential.
Technical architecture
The XR AI framework integrates four core components. It ingests real-world signals from AR and XR devices, including video streams, audio, depth mapping, pose tracking, and sensor data. The platform connects these inputs to specialized tools including NVIDIA Metropolis for video search and summarization, and NVIDIA NeMo Retriever for enterprise knowledge retrieval and retrieval-augmented generation.
The system supports multiple AI models, including NVIDIA Nemotron reasoning models and NVIDIA Cosmos Reason, along with other compatible foundation models. Agent orchestration runs through NVIDIA NeMo Agent Toolkit, which handles tool use, reasoning workflows, and multi-agent coordination. The framework operates across NVIDIA's accelerated computing platforms—DGX Spark, DGX Station, and RTX PRO systems—enabling deployment in cloud, data center, and edge environments.
Early enterprise implementations
Siemens is exploring the framework in research contexts for factory maintenance, allowing engineers wearing lightweight glasses to query AI agents about programmable logic controller issues while receiving real-time guidance connected to industrial systems and digital twins.
Rana, an AutoBio company, has built its LabOS system on XR AI to provide hands-free guidance for stem cell therapy and gene-editing research at Stanford University School of Medicine and Princeton University. The system helps researchers identify samples and CRISPR gene editors, guides experimental steps, and captures structured records as humans, robots, and AI systems work together.
The University of Pittsburgh Medical Center's Surreality Lab demonstrated surgical applications, running on XR AI and DGX Station to provide context-aware assistance that surfaces information while preserving the surgeon's focus on the patient.
VITURE has integrated the framework into wearable interfaces for hands-free workplace assistance. Innoactive is using DGX Spark-powered systems to capture data during automotive design reviews and preserve context from immersive workflows. Atlantic Studios deployed XR AI to create an interactive exploration of the Titanic wreck site, where voice prompts guide users through the underwater model.
Developer access
The platform is now available to developers through NVIDIA's developer resources portal. The framework provides the foundation for building agents that combine environmental perception, enterprise knowledge access, and real-time reasoning—capabilities the company positions as requirements for AI systems that function as digital workers rather than conversational interfaces.
These details were first reported by NVIDIA in a company blog post announcing the public beta.
This is an original analysis by the Omega editorial team. Source reporting: AI Watch.
Want systems like this working for your business?
Book a Call

