XDOF Raises $70M to Build Robot Training Data Infrastructure

The startup aims to solve AI's physical-world bottleneck by collecting, cleaning, and annotating the manipulation data that robotics models need to learn.

Omega Editorial · June 17, 2026 · 3 min read

Key takeaways

XDOF raised $70 million to build data collection infrastructure for robotics training, working with 20 customers including multiple frontier AI labs.
The startup is releasing ABC, a dataset with 130,000 robot manipulation trajectories—the largest high-quality robotics training collection available to academia.
Unlike language models that trained on existing internet text, robotics requires new physical interaction data that barely exists and must be purpose-built.
XDOF plans to operate warehouses with hundreds of robots and trained teleoperators to collect data at three tiers, from deployment-specific to general egocentric footage.
The company emerged from UC Berkeley research on GELLO, a low-cost teleoperation system that became influential for addressing robotics data bottlenecks.

A new startup is betting that the next major constraint in artificial intelligence won't be compute power or model architecture, but something more fundamental: the training data needed to teach robots how to interact with objects in the physical world.

XDOF emerged from stealth this week with $70 million in funding from Thrive Capital, Spark Capital, Andreessen Horowitz, Lux Capital, and WndrCo. The company is building data collection infrastructure, annotation systems, and teleoperation tools specifically designed for robotics applications—work that frontier AI labs are pursuing but struggling to execute at scale.

According to co-founder and CEO Philippe Wu, XDOF is already working with 20 customers, including several major AI research organizations, though he declined to name them. The timing aligns with OpenAI's recent announcement that it would restart its robotics program after shuttering it in 2021.

Why it matters

Large language models succeeded in part because they could train on vast quantities of existing text scraped from the internet. Robotics has no equivalent data reservoir. YouTube videos and footage from gig workers lack the precision and physical grounding needed to train manipulation models. This data scarcity creates an infrastructure opportunity that could determine which companies lead in physical AI—and XDOF is positioning itself as the picks-and-shovels provider for that race.

From academic research to commercial infrastructure

Wu encountered the data problem firsthand as a PhD student at UC Berkeley, where he focused on teaching robots to learn skills from large datasets. The challenge was circular: without substantial training data, researchers couldn't even begin building foundation models for robotics.

With co-founder and CTO Fred Shentu, Wu developed GELLO, a low-cost teleoperation system that allows human operators to control robotic arms and generate training trajectories. The project became influential in robotics research because it addressed a widespread bottleneck.

The founders launched XDOF in October 2024 with third co-founder and COO Nemo Jin, and the company employs roughly 60 people. It is partnering with UC Berkeley's AI Research lab to release what it describes as the largest collection of high-quality robot training data ever assembled. The dataset, called ABC, contains 130,000 trajectories of robot manipulation data, 300 hours of simulation, and 100 hours of evaluations. Researchers have already used it to train robots on benchmark tasks including folding T-shirts, flattening boxes, and loading AirPods into cases.

A three-tier data strategy

XDOF plans to operate across three levels of data collection. The highest-value tier involves teleoperation data gathered on the specific robot being deployed in production. The second tier uses robots controlled through teleoperation systems like GELLO to collect more general manipulation data. The third tier captures "egocentric" data from humans performing everyday tasks, for which XDOF is developing its own wearable sensors.

Hardware design matters significantly in this work. Camera selection affects data quality, which in turn influences how well hand-tracking algorithms perform. Physical parameters must be carefully calibrated, and operators require proper training.

The company intends to hire and train large teams of teleoperators and data collectors globally—a labor-intensive model that raises questions about why major labs aren't handling this work internally. Wu's answer is operational: the infrastructure requires warehouses spanning hundreds of thousands of square feet, hundreds of robots, ongoing maintenance, calibration, and trained personnel. Most AI labs would prefer to outsource that complexity.

The company's name references "degrees of freedom," the robotics term for the number of independent motions a system can perform. A human arm has seven degrees of freedom from shoulder to wrist; Figure.AI's latest humanoid robot has 30. The X represents unlimited scope.

These details were first reported by TechCrunch.

#robotics · #training data · #physical ai · #teleoperation · #machine learning infrastructure · #xdof

This is an original analysis by the Omega editorial team. Source reporting: AI Watch.
