University of Texas at San Antonio researcher develops framework to train artificial intelligence using failure data

A UTSA researcher won a $502,051 Navy grant to build an AI framework training drones and robots on failure data. The On-F model penalizes repeated mistakes to cut training costs.

For years, training artificial intelligence meant showing it how to succeed. Now a researcher at The University of Texas at San Antonio is building systems that learn from failure instead. Yongcan Cao, who holds the Mary Lou Clarke Endowed Distinguished Professorship in the Margie and Bill Klesse College of Engineering and Integrated Design, has developed a framework called On-Policy Reinforcement Learning from Failure, or On-F. The work, backed by a $502,051 grant from the Office of Naval Research, could lower the cost of training autonomous systems and make drones, robots, and vehicles more adaptive in unfamiliar environments.

The sparse reward problem

In reinforcement learning, an AI agent often learns through trial and error, receiving a "reward" only when it completes a task. For complex tasks, that reward may never come, leaving the agent with no data to improve. Engineers traditionally bridge this gap with expert demonstrations: pre-recorded, perfect examples of how to do the job. But that data is expensive, time-consuming to produce, and sometimes impossible to gather for new or dangerous environments.
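To make the sparse reward problem concrete, here is a minimal sketch that is not taken from Cao's work. It assumes a hypothetical one-dimensional corridor in which the agent starts at position 0, moves one step left or right, and is rewarded only if it reaches position 10. The function name `sparse_return` and all of the numbers are illustrative assumptions.

```python
from itertools import product


def sparse_return(actions, goal=10):
    """Return 1.0 if the agent ever reaches `goal`, otherwise 0.0.

    Hypothetical toy task: the agent starts at position 0 and each
    action is a step of -1 (left) or +1 (right).
    """
    position = 0
    for step in actions:
        position += step
        if position == goal:
            return 1.0  # the only reward signal in the whole episode
    return 0.0


# Enumerate every possible 10-step episode and count how many are rewarded.
horizon = 10
returns = [sparse_return(seq) for seq in product((-1, 1), repeat=horizon)]
rewarded = sum(returns)
total = len(returns)

print(f"{int(rewarded)} of {total} episodes receive any reward")
# prints "1 of 1024 episodes receive any reward"
```

In this toy setting, an agent exploring at random would see a reward in roughly one episode out of a thousand, and every other episode would give it nothing to learn from. Real tasks are far larger, which is why the gap is usually filled with demonstrations.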

Cao's On-F framework takes a different approach. It introduces a "discriminator" that acts as a judge. Instead of waiting for a success, the system constantly compares the AI's current actions to a database of known failures, data that Cao calls "cheap and abundant." If an action looks too similar to a past mistake, the discriminator issues a penalty, pushing the agent to try something new. This process, called reward densification, keeps the AI learning even when it hasn't succeeded yet.
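The penalty idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the On-F implementation: the article does not say how the discriminator is built, so the sketch substitutes a simple nearest-neighbor similarity score for a trained model. The names `failure_similarity`, `densified_reward`, `bandwidth`, and `penalty_weight` are all hypothetical.

```python
import math


def failure_similarity(state_action, failure_buffer, bandwidth=1.0):
    """Score in [0, 1]; 1.0 means identical to a logged failure.

    Stand-in for a learned discriminator (an assumption of this sketch):
    similarity decays with distance to the nearest recorded failure.
    """
    if not failure_buffer:
        return 0.0
    nearest = min(math.dist(state_action, f) for f in failure_buffer)
    return math.exp(-((nearest / bandwidth) ** 2))


def densified_reward(env_reward, state_action, failure_buffer, penalty_weight=1.0):
    """Environment reward minus a penalty for resembling past failures."""
    penalty = penalty_weight * failure_similarity(state_action, failure_buffer)
    return env_reward - penalty


# Hypothetical failure log: two (state, action) pairs that ended badly.
failures = [(0.0, 0.0), (1.0, 0.0)]

# The environment itself gives no reward (0.0) in either case.
near = densified_reward(0.0, (0.05, 0.0), failures)  # close to a known failure
far = densified_reward(0.0, (5.0, 5.0), failures)    # far from every failure

print(near < far)  # prints "True"
```

Even though the environment returns zero reward in both cases, the agent now gets a graded signal: actions that resemble logged failures score close to -1, and novel actions score close to 0. That difference is what lets learning continue before the first success.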

"Humans take risks and learn from failure," Cao said. "Think about a baby learning to walk. They stand up, they fall down, and they learn from that fall. We are looking at how to give that same mechanism to AI."

"If you imagine a drone flying a specific flight path and failing to locate a target, you don't want the drone to retrace the same route or fly just a few feet to the left or right. You'd want the drone to try a significantly new approach, such as changing altitude or switching to a wide-angle view," Cao said.

Competitive results without expert data

The findings suggest the fail-forward approach is more than theory. In simulated environments where digital robots learned to stand or walk, models using On-F performed as well as or better than models trained on expensive expert demonstrations. The framework was validated using the Gymnasium simulation suite, specifically on tasks like PointMaze, where an agent must navigate a labyrinth to reach a target with very limited feedback.

"Failure alone can be used to learn desirable actions even if we don't have an expert model," Cao said. He added that mixing learning from failure with learning from demonstration produces "even better outcomes."

Where On-F could make a difference

The ability to train AI on failure data could reshape several industries. In manufacturing and robotics, it could lower the cost of training new systems because engineers no longer need to spend hundreds of hours creating perfect training scenarios. For autonomous vehicles and drones, the technology could lead to navigation systems that recognize the signature of a potential collision before it happens.

Cao's team at UT San Antonio is exploring how to refine these judging systems to handle more subjective failures. That could open the door for AI that assists in healthcare, logistics, and complex disaster-response scenarios where there is no manual for success, only a history of mistakes to avoid. The research with the Office of Naval Research continues through July 2026.

"As humans, we don't take information at face value; we think critically," Cao said. "From a robotics side, if a drone is pursuing a target and one path is risky, the system needs to analyze that risk based on what it knows could go wrong."

Why this matters for IT, development, and research professionals

For teams building or training AI systems, On-F points to a practical shift: failure logs, which are often discarded, can become a training asset. Instead of investing in costly expert demonstrations, engineers can use historical error data to guide models toward better decisions. The approach is particularly relevant for those working on autonomous systems, robotics, or any domain where real-world testing is risky or expensive. As the framework matures, it may offer a template for training models that adapt faster and require less hand-holding, a direct benefit for development cycles and research timelines.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Office of Naval Research.
