ReasoningBank: Therapy for Machines That Keep Messing Up

Google just taught its AI how to remember and, worse, how to reflect.
They call it ReasoningBank, and at first glance, it sounds like the next predictable step in the AI hype cycle. Another shiny acronym, another “scaling breakthrough.” But if you actually read the paper, it’s not hype. It’s something far stranger. Google basically gave their models a therapist.
The Problem They’re Fixing
Every AI system today is cursed with amnesia.
It can process millions of words but forgets the moment the session ends. Each new problem starts from scratch, like a goldfish with a PhD.
Google’s new work changes that.
They built a framework called ReasoningBank, which lets an AI agent learn from its own history — the good, the bad, and the absolutely dumb.
It doesn’t just store data. It distills reasoning patterns from every past attempt, including the failed ones, and turns them into reusable “memory items” — kind of like notes to its future self.
When the model faces a new task, it pulls out the relevant lessons, acts on them, and when it’s done, writes back what it learned.
Successes become strategy. Failures become warning labels.
That’s the loop. Retrieval, action, reflection, repeat.
Over time, the agent self-evolves without retraining, without new data.
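To make that loop concrete, here's a rough Python sketch of the idea. None of this is the paper's actual code: the class names, the keyword-overlap retrieval, and the bare-bones reflect step are my own stand-ins for what the paper does with an embedding-based retriever and an LLM-driven judge.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """A distilled lesson, not a raw log: a short principle plus whether it
    came from a success (a strategy) or a failure (a warning label)."""
    title: str
    principle: str
    from_failure: bool


class ReasoningBankSketch:
    """Toy version of the retrieve -> act -> reflect -> write-back loop."""

    def __init__(self):
        self.items: list[MemoryItem] = []

    def retrieve(self, task: str, top_k: int = 3) -> list[MemoryItem]:
        # Naive relevance: keyword overlap between the task and stored
        # principles. The real system would use an embedding-based retriever.
        words = set(task.lower().split())
        ranked = sorted(
            self.items,
            key=lambda m: len(words & set(m.principle.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]

    def reflect(self, task: str, trajectory: str, succeeded: bool) -> MemoryItem:
        # In the paper an LLM judges the attempt and distills the lesson;
        # here we just tag the raw trajectory so the loop is visible end to end.
        kind = "strategy" if succeeded else "warning"
        return MemoryItem(
            title=f"{kind}: {task[:40]}",
            principle=trajectory,
            from_failure=not succeeded,
        )

    def solve(self, task: str, agent) -> bool:
        lessons = self.retrieve(task)                          # 1. retrieval
        trajectory, succeeded = agent(task, lessons)           # 2. action
        new_item = self.reflect(task, trajectory, succeeded)   # 3. reflection
        self.items.append(new_item)                            # 4. write-back
        return succeeded
```

Here, `agent` is just any callable that takes a task plus the retrieved lessons and returns a trajectory string and a success flag; in the paper it's a full browser- or code-using LLM agent.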
What MaTTS Adds to the Mix
Now, ReasoningBank by itself is interesting.
But Google didn’t stop there. They added a twist called MaTTS, short for Memory-aware Test-Time Scaling.
You’ve probably heard of test-time scaling: run more samples, pick the best one.
MaTTS takes that chaos and gives it purpose.
It spins up multiple attempts (parallel scaling) or self-refines a single one (sequential scaling), compares those experiences, and uses the contrast to build better memory.
Basically: fail more, learn faster.
It’s like teaching an AI to argue with its own past mistakes until it figures out why it was wrong.
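Here's a similarly hedged sketch of the two MaTTS modes, building on the toy bank above (the `bank` argument is an instance of that `ReasoningBankSketch`). The function names and the simplified write-back are illustrative assumptions, not the paper's implementation; the real system has the model itself contrast the trajectories and distill what that contrast reveals.

```python
def matts_parallel(task, agent, bank, n_rollouts=4):
    """Parallel scaling sketch: run several independent attempts with the same
    retrieved lessons, write each outcome back, and return the best attempt.
    (The paper additionally contrasts the trajectories with the model itself
    to distill sharper memory; that step is omitted here.)"""
    lessons = bank.retrieve(task)
    attempts = [agent(task, lessons) for _ in range(n_rollouts)]
    for trajectory, succeeded in attempts:
        # Successes become strategies, failures become warning labels.
        bank.items.append(bank.reflect(task, trajectory, succeeded))
    # Best-of-n answer: prefer any attempt that actually succeeded.
    return max(attempts, key=lambda attempt: attempt[1])


def matts_sequential(task, agent, bank, n_rounds=3):
    """Sequential scaling sketch: refine one attempt, feeding each round's
    self-critique back in as extra guidance for the next try."""
    lessons = bank.retrieve(task)
    trajectory, succeeded = agent(task, lessons)
    for _ in range(n_rounds - 1):
        critique = bank.reflect(task, trajectory, succeeded)
        trajectory, succeeded = agent(task, lessons + [critique])
    bank.items.append(bank.reflect(task, trajectory, succeeded))
    return trajectory, succeeded
```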
The Results (The Receipts)
Here’s where it gets real.
Across WebArena, Mind2Web, and SWE-Bench-Verified, three serious agent benchmarks, ReasoningBank didn’t just edge out baselines. It demolished them.
- On WebArena, success rates jumped by +8.3, +7.2, and +4.6 points, depending on the model backbone.
- On SWE-Bench-Verified, Gemini-2.5-Flash climbed from 34.2% → 38.8% and Gemini-2.5-Pro from 54.0% → 57.4%, while cutting the average steps per task by about 16%.
- On Mind2Web, it dominated cross-domain generalization, meaning it didn’t just memorize, it actually adapted.
And the wildest part?
When they fed it failure data, the kind every ML engineer throws away, it got better.
Success jumped from 46.5% → 49.7% when failed attempts were included.
The machine literally learned from screwing up.
The Real Meaning: AI That Self-Reflects
Let’s be clear, this isn’t about more memory.
It’s about metacognition.
Google didn’t make the context window bigger; they taught the model to think, “Hey, I’ve seen this kind of disaster before.”
Each “memory item” is like a reusable principle, a distilled reasoning unit that can guide the next decision.
It’s not saving logs, it’s saving lessons.
Think of it like this.
We’re watching AI transition from information processing to experience processing.
It’s not what it knows anymore, it’s how it learns from what it’s done.
The Catch: Memory Becomes Baggage
But here’s the part that makes me uneasy.
When you give a machine a past, you also give it baggage.
ReasoningBank doesn’t just record the wins, it archives the failures too.
That’s how it gets better, but that’s also how it can get stuck. If the memory bank itself picks up bad reasoning, bias, or half-truths, the system might just start repeating them faster, more confidently, and with fewer steps.
You’re not just building intelligence.
You’re building habit.
And if you’ve ever worked with humans, you know that habits are harder to debug than code.
Scaling Experience, Not Compute
Everyone else in the AI race is obsessed with scaling parameters or context windows.
Google just quietly shifted the axis.
They’re scaling experience.
They’re asking: what if, instead of making the model bigger, we made it wiser?
Instead of throwing teraflops at every new problem, let it build memory-driven feedback loops and figure out patterns itself.
That’s a paradigm shift, and a slightly terrifying one.
Because experience doesn’t just scale wisdom, it scales trauma too.

So What’s the Point of All This?
Maybe the big takeaway isn’t about models or benchmarks at all.
It’s about how close we’re getting to making machines that reflect, that pause and ask, “Why did I fail?”
We’ve taught AI to write code, draw art, and simulate emotion.
Now we’re teaching it to remember its own mistakes.
That’s either the next great leap in intelligence, or the start of a recursive therapy loop that never ends.
Conclusion
Intelligence isn’t about how much you remember.
It’s about what you choose to forget.
ReasoningBank is a hell of a step forward.
But when the machine starts deciding what’s worth keeping, that’s when it stops being a tool and starts being a thinker.
References
- Google Cloud AI, “ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory,” arXiv:2509.25140v1, 2025.
- WebArena Benchmark - GitHub Repository
- Mind2Web Dataset - Official Site
- SWE-Bench-Verified - GitHub Repository
