Teaching AI to Learn From Failure, Not Just Success

A University of Texas researcher is training autonomous systems by showing them what goes wrong instead of what goes right.

Omega Editorial · June 29, 2026 · 4 min read

A new approach to machine learning

For decades, artificial intelligence training has followed a simple formula: show the system how experts succeed, then let it mimic those patterns. Google DeepMind's AlphaGo was initially trained on millions of board positions drawn from games between strong human players. Robots learned navigation by watching motion-capture data of humans solving mazes. The underlying assumption was that AI needs a perfect model to copy.

Yongcan Cao at The University of Texas at San Antonio is inverting that paradigm. His research focuses on teaching autonomous systems by cataloging what goes wrong rather than what goes right—essentially giving machines the human ability to fail forward.

"Humans take risks and learn from failure," said Cao, who holds the Mary Lou Clarke Endowed Distinguished Professorship. "Think about a baby learning to walk. They stand up, they fall down, and they learn from that fall. We are looking at how to give that same mechanism to AI."

Why it matters

Training AI with expert demonstrations is expensive and time-intensive, requiring hundreds of hours to create perfect training scenarios. In dangerous or novel environments, such demonstrations may be impossible to obtain. A framework that learns from abundant failure data instead could dramatically reduce training costs across manufacturing, autonomous vehicles, and robotics—while producing systems that better recognize and avoid catastrophic errors before they occur.

Solving the sparse reward bottleneck

Reinforcement learning, in which an AI agent learns through trial and error, struggles with what researchers call the "sparse reward problem." In complex tasks, an agent receives feedback only when it completes the entire objective. If the task is sufficiently difficult, the system might attempt millions of variations without ever succeeding, leaving it with no reward signal to learn from.
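
To make the bottleneck concrete, here is a minimal sketch of a sparse reward in a toy setting. Everything in it is an assumption made for illustration and does not come from Cao's research: a one-dimensional corridor, a goal at position 20, a 30-step limit, and the hypothetical names sparse_reward, run_episode, and count_informative_episodes. The agent moves at random and is rewarded only if it lands exactly on the goal.

```python
import random

GOAL = 20        # hypothetical goal position on a 1-D corridor
MAX_STEPS = 30   # hypothetical episode length


def sparse_reward(position: int) -> float:
    """Return 1.0 only at the goal; every other position yields no feedback."""
    return 1.0 if position == GOAL else 0.0


def run_episode(rng: random.Random) -> float:
    """Random walk starting at 0; returns the total reward collected."""
    position = 0
    for _ in range(MAX_STEPS):
        position += rng.choice((-1, 1))
        reward = sparse_reward(position)
        if reward > 0:
            return reward  # the only moment the agent ever gets a signal
    return 0.0


def count_informative_episodes(n_episodes: int, seed: int = 0) -> int:
    """Count the episodes that produced any nonzero reward at all."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_episodes) if run_episode(rng) > 0)


if __name__ == "__main__":
    episodes = 200
    hits = count_informative_episodes(episodes)
    print(f"{hits} of {episodes} episodes produced a learning signal")
```

In a setup like this, reaching the goal by chance is rare, so nearly every episode ends with a total reward of zero. An agent that learns only from reward has almost nothing to update on, which is the gap that expert demonstrations have traditionally filled.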

Traditionally, engineers bridge this gap with expert demonstrations. But Cao's framework, called On-Policy Reinforcement Learning from Failure (On-F), takes a different path. According to his award-winning abstract presented at the 2025 International Conference on Autonomous Agents and Multiagent Systems, the system uses a "discriminator" that constantly compares the AI's actions against a database of known failures.

Through "reward densification," the AI receives continuous incremental feedback. When its current approach resembles a previous failure, the discriminator issues a penalty, pushing the agent to explore genuinely different strategies. "If you imagine a drone flying a specific flight path and failing to locate a target, you don't want the drone to retrace the same route or fly just a few feet to the left or right," Cao explained. "You'd want the drone to try a significantly new approach, such as changing altitude or switching to a wide-angle view."
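
The penalty idea can be sketched in a few lines. This is an illustration under loud assumptions, not the On-F implementation: the article does not say how the discriminator is built or trained, so a fixed distance-based similarity score stands in for it here. The names trajectory_distance, failure_similarity, and densified_reward, the scale and penalty_weight parameters, and the sample flight paths are all hypothetical.

```python
import math
from typing import List, Tuple

Trajectory = List[Tuple[float, float]]  # a path as a list of (x, y) points


def trajectory_distance(a: Trajectory, b: Trajectory) -> float:
    """Mean point-wise Euclidean distance over the steps both paths share."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("trajectories must be non-empty")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / n


def failure_similarity(traj: Trajectory, failures: List[Trajectory],
                       scale: float = 1.0) -> float:
    """Score in (0, 1]; 1.0 means the path matches a stored failure exactly.

    Stand-in for a discriminator: a simple nearest-failure distance,
    not a learned model.
    """
    nearest = min(trajectory_distance(traj, f) for f in failures)
    return math.exp(-nearest / scale)


def densified_reward(task_reward: float, traj: Trajectory,
                     failures: List[Trajectory],
                     penalty_weight: float = 1.0) -> float:
    """Task reward minus a penalty that grows as the path resembles a failure."""
    return task_reward - penalty_weight * failure_similarity(traj, failures)


# Hypothetical flight paths echoing the drone example in the text.
failed_path = [(float(x), 0.0) for x in range(10)]   # a route known to fail
nudged_path = [(float(x), 0.1) for x in range(10)]   # the same route, barely shifted
new_path = [(float(x), 5.0) for x in range(10)]      # a substantially different route

if __name__ == "__main__":
    for name, path in [("retrace", failed_path), ("nudged", nudged_path),
                       ("new", new_path)]:
        print(name, densified_reward(0.0, path, [failed_path]))
```

Even when the task reward is zero for all three paths, the agent now receives a graded signal: retracing the failed route scores worst, a slight shift is penalized almost as heavily, and the substantially different route is barely penalized. That ordering is what nudges exploration away from known failures.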

Performance validates the concept

In simulated environments using the Gymnasium suite—including the PointMaze navigation challenge—AI models trained with the On-F framework matched or exceeded the performance of systems trained on expensive expert data. When the framework combined failure learning with traditional demonstrations, outcomes improved further.

The research is supported by a $502,051 grant from the Office of Naval Research, part of a multi-year project to make autonomous systems as efficient at decision-making as humans. The ONR funding runs through July 2026.

Industry implications

The ability to train AI on failure data could reshape multiple sectors. Manufacturing and robotics operations could slash training costs by eliminating the need for meticulously crafted success scenarios. Autonomous vehicles and drones could develop more robust obstacle-avoidance systems by recognizing collision patterns before impact.

Cao's team is now exploring how to refine these discriminator systems to handle more subjective failures, potentially enabling AI assistance in healthcare, logistics, and disaster response—domains where success manuals don't exist, only histories of what didn't work.

These details were first reported by The University of Texas at San Antonio.


This is an original analysis by the Omega editorial team. Source reporting: AI Watch.
