MIT Develops Spatial Memory System for Robots Using Natural Language

DAAAM framework lets robots build detailed, queryable maps of large environments and answer questions like 'Where did I leave my keys?'

Omega Editorial · June 17, 2026 · 3 min read

Robots gain human-like spatial memory

MIT researchers have developed a spatial memory framework that enables robots to build and query detailed mental models of large-scale environments using natural language. The system, called Describe Anything, Anywhere, at Any Moment (DAAAM), could allow workers to send robotic assistants on errands with simple commands like "go grab the component we started assembling last night."

The breakthrough bridges two traditionally separate domains: computer vision models that richly describe objects but process limited data, and robotic mapping systems that create 3D maps but lack detailed object descriptions or require excessive computation.

How the system works

As a robot explores its environment, DAAAM attaches rich descriptions to objects it encounters. The system might note that a building is the Stata Center with specific architectural features, or that a bike rack holds five bicycles including a red one with a flat tire. This information gets stored in a 3D map organized spatially, grouping objects into regions.
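The storage idea described above can be pictured as a small data structure. This is only a minimal sketch under stated assumptions: the class names (`MapObject`, `Region`, `SpatialMemory`), the field names, and the region label are hypothetical and are not taken from DAAAM itself. It shows descriptions attached to objects with 3D positions, grouped by region.

```python
from dataclasses import dataclass, field


@dataclass
class MapObject:
    """One observed object: a text description plus a 3D position (hypothetical schema)."""
    description: str
    position: tuple[float, float, float]


@dataclass
class Region:
    """A named group of nearby objects."""
    name: str
    objects: list[MapObject] = field(default_factory=list)


@dataclass
class SpatialMemory:
    """Toy map that files each object under a region name."""
    regions: dict[str, Region] = field(default_factory=dict)

    def add(self, region_name: str, obj: MapObject) -> None:
        # Create the region on first use, then append the object to it.
        self.regions.setdefault(region_name, Region(region_name)).objects.append(obj)

    def objects_in(self, region_name: str) -> list[MapObject]:
        # Unknown regions yield an empty list rather than an error.
        region = self.regions.get(region_name)
        return list(region.objects) if region else []


memory = SpatialMemory()
memory.add("campus_north", MapObject("bike rack holding five bicycles", (80.0, 40.0, 0.0)))
memory.add("campus_north", MapObject("red bicycle with a flat tire", (80.5, 40.0, 0.0)))
```

Grouping by region keeps lookups local: a question about one area only has to scan the objects filed under it.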

The key innovation addresses a critical speed bottleneck. Existing techniques that capture rich descriptions typically take several seconds to annotate just a few objects, which is far too slow when a robot might encounter hundreds of objects during minutes of exploration.

DAAAM solves this by aggregating nearby objects and using optimization to select key frames for annotation. Key frames are the images offering the clearest view of multiple objects, which lets the system describe several items at once. This approach speeds up computation tenfold compared to previous methods.
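To make the key-frame idea concrete, here is a rough sketch that swaps in a simple greedy covering heuristic. It is not the optimization the researchers use, which the article does not detail. The function name `select_keyframes`, the `min_score` threshold, and the visibility scores are all illustrative assumptions. Each frame maps object names to a score for how clearly that object is seen.

```python
def select_keyframes(visibility: dict[str, dict[str, float]],
                     min_score: float = 0.5) -> list[str]:
    """Greedily pick frames until every clearly visible object is covered.

    visibility[frame][obj] is an assumed clarity score in [0, 1];
    views below min_score are treated as too poor to annotate from.
    """
    # Objects that at least one frame shows clearly enough.
    uncovered = {
        obj
        for views in visibility.values()
        for obj, score in views.items()
        if score >= min_score
    }
    chosen: list[str] = []

    def gain(frame: str) -> float:
        # Total clarity this frame adds for objects not yet covered.
        return sum(
            score
            for obj, score in visibility[frame].items()
            if obj in uncovered and score >= min_score
        )

    while uncovered:
        best = max(visibility, key=gain)
        chosen.append(best)
        uncovered -= {
            obj
            for obj, score in visibility[best].items()
            if score >= min_score
        }
    return chosen


# Made-up scores for illustration only.
frames = {
    "frame_01": {"bike_rack": 0.9, "red_bike": 0.4},
    "frame_02": {"bike_rack": 0.8, "red_bike": 0.9, "bench": 0.7},
    "frame_03": {"sculpture": 0.95},
}
keyframes = select_keyframes(frames)  # ["frame_02", "frame_03"]
```

In this toy input, two frames cover all four objects, so only two images would be sent for annotation instead of three. The saving grows when many frames overlap.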

When retrieving information, DAAAM employs a large language model that calls specialized tools to quickly search the database. If asked about a sculpture near a campus building, the system can retrieve information based on the word "sculpture" or the building's location.
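The retrieval step might look roughly like the sketch below, in which a language model would pick a tool by name and supply its arguments. Everything here is an assumption made for illustration: the tool names (`search_by_keyword`, `search_near`), the `run_tool_call` dispatcher, and the sample entries and coordinates. No actual language model is called.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    description: str
    position: tuple[float, float, float]


# Made-up sample data standing in for the robot's stored map.
DATABASE = [
    Entry("bronze sculpture on a stone plinth", (12.0, 4.0, 0.0)),
    Entry("Stata Center building", (10.0, 5.0, 0.0)),
    Entry("bike rack holding five bicycles", (80.0, 40.0, 0.0)),
]


def search_by_keyword(keyword: str) -> list[Entry]:
    """Return entries whose description contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [e for e in DATABASE if needle in e.description.lower()]


def search_near(position: tuple[float, float, float], radius: float) -> list[Entry]:
    """Return entries within `radius` of `position` (same units as the map)."""
    return [e for e in DATABASE if math.dist(e.position, position) <= radius]


# Registry a language model could choose from by tool name.
TOOLS = {"search_by_keyword": search_by_keyword, "search_near": search_near}


def run_tool_call(name: str, **arguments):
    """Dispatch a tool call given as a name plus keyword arguments."""
    return TOOLS[name](**arguments)


# "A sculpture near a campus building" can be answered by either route.
by_word = run_tool_call("search_by_keyword", keyword="sculpture")
by_place = run_tool_call("search_near", position=(10.0, 5.0, 0.0), radius=5.0)
```

Both calls return the sculpture entry here. The keyword route finds it by description, and the proximity route finds it alongside the building it stands next to.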

Why it matters

This advance represents a fundamental shift in how robots interact with their environments and humans. Current industrial robots lack the spatiotemporal memory that allows human workers to naturally recall where they left items or what happened in specific locations. DAAAM's ability to process natural language queries while maintaining real-time performance makes human-robot collaboration more practical in manufacturing, warehousing, and service environments. Beyond robotics, the framework could power augmented reality systems for maintenance workers performing anomaly detection or help commuters navigate complex spaces.

Performance and future development

In testing, DAAAM achieved 21 to 53 percent higher accuracy than competing methods, depending on query type. The system answers user queries in just seconds while operating at speeds suitable for real-time mobile robot deployment.

Lead author Nicolas Gorlo, an MIT graduate student, and his colleagues now plan to expand DAAAM to capture significant events in environments and incorporate confidence levels into system responses. "Ultimately, we want to have robots that can help with any sort of tasks," Gorlo said.

The research was presented at the Conference on Computer Vision and Pattern Recognition and first reported by MIT News. The work was funded in part by the U.S. Army Research Laboratory and the Office of Naval Research. Luca Carlone, associate professor in MIT's Department of Aeronautics and Astronautics and director of the MIT SPARK Laboratory, leads the project alongside Gorlo and Lukas Schmid, now a professor at the University of Technology Nuremberg.

This is an original analysis by the Omega editorial team. Source reporting: AI Watch.