ReasoningBank: Why Your Agent Should Learn from Its Mistakes

Agents are everywhere now — browsing the web, fixing code, booking flights. But here’s the thing nobody talks about: they’re terrible at learning from experience.

You’d think after failing at the same task a hundred times, an agent would figure out what went wrong. Nope. Most of them just keep making the same mistakes, because they have no mechanism to remember and reason about what happened.

Google’s new ReasoningBank framework, presented at ICLR 2026, tries to fix this. And honestly, it’s one of the more sensible approaches I’ve seen.

The problem with current agent memory

Most agent memory systems today fall into two camps:

  • Trajectory memory — saving every single action taken, like a raw log file. Synapse does this.
  • Workflow memory — documenting only successful workflows, like Agent Workflow Memory.

Both have the same blind spot: they’re storing the what but not the why. A trajectory log tells you “clicked button X at time Y” but not “I clicked button X because the page was loading slowly and I assumed X would trigger the next step.” That’s the kind of reasoning pattern you actually want to transfer to new tasks.

There’s a second blind spot, and it’s worse: by focusing only on successes, these systems ignore failures, which is where most of the learning actually happens. If you’ve ever debugged code or cooked a bad meal, you know that.

How ReasoningBank works

ReasoningBank stores what it calls “structured memory items.” Each one has three parts:

  • A title — short identifier for the strategy
  • A description — summary of what this memory covers
  • The content — actual reasoning steps, decision rationales, or operational insights

The key difference? These aren’t action logs. They’re distilled, generalizable lessons.

For example, instead of recording “clicked ‘Load More’ button”, ReasoningBank might store: “always verify the current page identifier first to avoid infinite scroll traps before attempting to load more results.” See the difference? That’s a reusable strategy, not a one-off action.
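To make the three-part structure concrete, here is a minimal sketch of a memory item as a Python dataclass. The class name `MemoryItem`, the `as_prompt_text` helper, and the rendering format are my own assumptions for illustration, not the paper’s code; only the three fields come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    """One structured memory item: title, description, content."""
    title: str        # short identifier for the strategy
    description: str  # summary of what this memory covers
    content: str      # reasoning steps, rationale, or operational insight

    def as_prompt_text(self) -> str:
        # Hypothetical rendering for injecting the item into an agent prompt.
        return f"## {self.title}\n{self.description}\n{self.content}"


item = MemoryItem(
    title="Verify page before loading more",
    description="Guards against infinite scroll traps on paginated results.",
    content=(
        "Always verify the current page identifier first to avoid infinite "
        "scroll traps before attempting to load more results."
    ),
)
print(item.as_prompt_text())
```

Note that nothing in the item mentions a specific button or timestamp. That is the point: the lesson should still make sense on a different site.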

The workflow runs in a continuous loop:

  1. Before acting, the agent retrieves relevant memories from ReasoningBank
  2. It interacts with the environment
  3. An LLM-as-a-judge self-assesses the trajectory — noting both successes and failures
  4. It extracts insights from that assessment into new memory items
  5. Those get appended back into the ReasoningBank
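The five steps above can be sketched as a small loop. This is a toy under stated assumptions, not the actual implementation: `ReasoningBankSketch`, `run_task`, and the three stub functions are hypothetical names, retrieval here is plain word overlap standing in for whatever the real system uses, and the judge and extractor are hard-coded stubs where a real system would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MemoryItem:
    title: str
    description: str
    content: str


@dataclass
class ReasoningBankSketch:
    items: List[MemoryItem] = field(default_factory=list)

    def retrieve(self, task: str, k: int = 3) -> List[MemoryItem]:
        # Toy retrieval: rank items by word overlap with the task text.
        task_words = set(task.lower().split())

        def score(m: MemoryItem) -> int:
            words = set((m.title + " " + m.description).lower().split())
            return len(task_words & words)

        ranked = sorted(self.items, key=score, reverse=True)
        return [m for m in ranked[:k] if score(m) > 0]

    def append(self, new_items: List[MemoryItem]) -> None:
        # Plain append, with no deduplication or forgetting.
        self.items.extend(new_items)


def run_task(task: str, bank: ReasoningBankSketch,
             act: Callable, judge: Callable, extract: Callable) -> bool:
    memories = bank.retrieve(task)                    # 1. retrieve
    trajectory = act(task, memories)                  # 2. interact
    succeeded = judge(task, trajectory)               # 3. self-assess
    new_items = extract(task, trajectory, succeeded)  # 4. distill
    bank.append(new_items)                            # 5. append
    return succeeded


memories_seen: List[int] = []


def fake_act(task, memories):
    # Stub agent: records how many memories it was given.
    memories_seen.append(len(memories))
    return ["opened results page", "clicked 'Load More'"]


def fake_judge(task, trajectory):
    return False  # stub: pretend the judge marked the attempt as failed


def fake_extract(task, trajectory, succeeded):
    label = "Success pattern" if succeeded else "Failure lesson"
    return [MemoryItem(
        title=f"{label}: {task}",
        description=f"Lesson distilled from the task: {task}",
        content="Verify the page identifier before loading more results.",
    )]


bank = ReasoningBankSketch()
run_task("find cheapest flight", bank, fake_act, fake_judge, fake_extract)
run_task("find cheapest flight", bank, fake_act, fake_judge, fake_extract)
print(memories_seen, len(bank.items))
```

On the second run the agent is handed the lesson written after the first one, which is the whole loop in miniature.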

What I find interesting is that the self-judgment doesn’t need to be perfect. The paper shows ReasoningBank is robust against judgment noise. That’s important because in practice, LLM-as-a-judge is never 100% accurate. But it doesn’t need to be — the system learns from aggregate patterns over time.

Why failures matter more than you think

The really clever part is that ReasoningBank actively mines failures for counterfactual signals. Most systems just throw away failed trajectories. ReasoningBank asks: “What went wrong here? What should the agent have done instead?”

This turns failures into “preventative lessons” — strategic guardrails that stop the agent from repeating the same mistake. Over time, the agent builds a kind of institutional knowledge about what not to do, which is often more valuable than knowing what to do.
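One way to picture the failure-mining step is as a different extraction prompt depending on the judge’s verdict. The sketch below is an assumption on my part: the function name `build_extraction_prompt` and the prompt wording are invented for illustration and are not the prompts used in the paper.

```python
from typing import List


def build_extraction_prompt(task: str, trajectory: List[str],
                            succeeded: bool) -> str:
    """Build a hypothetical prompt asking an LLM to distill a lesson."""
    steps = "\n".join(
        f"{i}. {step}" for i, step in enumerate(trajectory, start=1)
    )
    if succeeded:
        instruction = (
            "The attempt succeeded. Summarize the strategy that made it "
            "work as a reusable lesson."
        )
    else:
        # Counterfactual framing: ask what should have happened instead.
        instruction = (
            "The attempt failed. Explain what went wrong and what the agent "
            "should have done instead, phrased as a preventative lesson."
        )
    return (
        f"Task: {task}\n"
        f"Trajectory:\n{steps}\n\n"
        f"{instruction}\n"
        "Answer with three fields: title, description, content."
    )


steps_taken = ["opened account menu", "clicked 'Load More' repeatedly"]
failure_prompt = build_extraction_prompt(
    "find the order history page", steps_taken, succeeded=False
)
success_prompt = build_extraction_prompt(
    "find the order history page", steps_taken, succeeded=True
)
print(failure_prompt)
```

The failed trajectory is not discarded. It goes through the same pipeline with a question that pulls out the guardrail.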

I’ve seen this pattern in human expertise too. Senior engineers aren’t just good at writing code — they’re good at knowing which approaches will fail before trying them. ReasoningBank tries to encode that same intuition.

Does it actually work?

According to the paper, yes. On web browsing and software engineering benchmarks, ReasoningBank improved both success rates and efficiency — agents completed more tasks in fewer steps compared to baselines.

That’s the headline result Google wants you to see. What I want to point out is that this approach has been tried before in various forms: case-based reasoning, episodic memory, even old-school expert systems. The difference here is the scale and the use of LLMs to do the distillation automatically. That’s new.

But I have some reservations. The paper notes they just append new memories directly to the ReasoningBank. No deduplication, no conflict resolution, no forgetting mechanism. As the memory bank grows, retrieval quality could degrade. The authors acknowledge this is future work, but it’s a real concern for long-running agents.
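To show what even a crude guard against duplicates could look like, here is a sketch using Python’s standard `difflib`. To be clear, this is not something the paper does (it appends directly); `append_if_novel`, the string-similarity check, and the 0.9 threshold are all my own assumptions, and a real system would more plausibly compare embeddings.

```python
from difflib import SequenceMatcher
from typing import List


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants compare equal.
    return " ".join(text.lower().split())


def append_if_novel(bank: List[str], candidate: str,
                    threshold: float = 0.9) -> bool:
    """Append candidate unless it is nearly identical to a stored lesson."""
    cand = _normalize(candidate)
    for existing in bank:
        ratio = SequenceMatcher(None, cand, _normalize(existing)).ratio()
        if ratio >= threshold:
            return False  # near-duplicate: skip it
    bank.append(candidate)
    return True


lessons: List[str] = []
added_first = append_if_novel(
    lessons, "Verify the page identifier before loading more results."
)
added_dup = append_if_novel(
    lessons, "  verify the page identifier before loading MORE results. "
)
added_new = append_if_novel(
    lessons, "Check login state before submitting forms."
)
print(added_first, added_dup, added_new, len(lessons))
```

This does nothing about conflicting lessons or forgetting, and the linear scan gets slow as the bank grows. It only illustrates how cheap a first line of defense could be.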

Also, the LLM-as-a-judge step adds cost and latency. Every trajectory needs a judgment call, plus an extraction pass, before anything can be learned from it. For high-throughput systems, that overhead might not be worth it.

What this means for the field

ReasoningBank is a step in the right direction. Agents need to learn from experience, and they need to learn from failures. The structured memory format is smarter than raw action logs, and the continuous loop means the agent gets better over time without human intervention.

I’d like to see this integrated into production agent systems — the ones booking meetings, writing code, or managing cloud infrastructure. Right now, most of those agents are stateless or use primitive memory. ReasoningBank could give them the ability to actually improve.

The code is on GitHub, so you can try it yourself. I’m curious to see how it handles long-running scenarios with thousands of memory items. That’s where the real test will be.

For now, it’s a solid research contribution with practical implications. And it’s refreshing to see a paper that doesn’t pretend its system is perfect — the authors are honest about the limitations. That alone deserves respect.
