Anthropic designs three-agent framework to support long-running autonomous software development

Anthropic built a three-agent system to handle multi-hour coding tasks, splitting work between planning, generation, and evaluation agents. The design fixes context loss and self-grading bias that typically derail long autonomous sessions.

Categorized in: AI News, IT and Development
Published on: Apr 05, 2026

Anthropic's Three-Agent System Tackles Long-Running AI Development Tasks

Anthropic has introduced a multi-agent framework designed to handle extended autonomous development sessions, addressing fundamental problems that cause AI systems to lose coherence over multi-hour workflows. The approach divides work among three specialized agents: one for planning, one for generation, and one for evaluation.

The framework targets both frontend design and full-stack software creation. Anthropic engineers built it to solve two critical failures in autonomous coding: context loss between sessions and premature task termination.

How the System Maintains State

Rather than compacting context, a technique that preserves information but makes models cautious as they approach token limits, Anthropic uses structured handoff artifacts. When one agent completes its work, it passes a defined state to the next agent, allowing the workflow to continue without amnesia.

This matters because models operating near context limits often perform worse. The handoff approach sidesteps the problem entirely by resetting context between agents while maintaining continuity through explicit artifacts.
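The idea of a handoff artifact can be sketched in a few lines. Anthropic has not published its schema, so the field names below are illustrative assumptions, not the actual format:

```python
import json
from dataclasses import dataclass, asdict, field

# Hypothetical handoff artifact: explicit state one agent serializes for the
# next, so each agent starts with a fresh context but full continuity.
@dataclass
class HandoffArtifact:
    phase: str                       # which agent produced this ("planner", ...)
    completed_steps: list = field(default_factory=list)
    remaining_steps: list = field(default_factory=list)
    working_state: dict = field(default_factory=dict)  # e.g. last good commit

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

# The planner hands a fresh-context generator everything it needs:
artifact = HandoffArtifact(
    phase="planner",
    completed_steps=["drafted component tree"],
    remaining_steps=["implement header", "implement footer"],
    working_state={"commit": "abc123", "tests_passing": True},
)
restored = HandoffArtifact(**json.loads(artifact.to_json()))
```

Because the artifact is plain JSON, the receiving agent reconstructs the exact state regardless of what was in the previous agent's context window.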

Separating Judgment From Execution

Agents routinely overestimate the quality of their own outputs, especially on subjective tasks like design. Anthropic addressed this by creating a separate evaluator agent, calibrated with specific scoring criteria and few-shot examples.

For frontend work, the evaluator uses four grading criteria: design quality, originality, craft, and functionality. It interacts with live pages using Playwright, then provides detailed feedback that guides the generator through iterative refinement cycles.
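A separated evaluator can be reduced to a rubric plus an aggregation rule. The four criteria below come from the article; the 0-10 scale, equal weighting, and function names are assumptions, and a real evaluator would drive live pages (e.g. via Playwright) and ask a model to produce the scores:

```python
# The evaluator's grading criteria for frontend work, per the article.
CRITERIA = ("design_quality", "originality", "craft", "functionality")

def aggregate_scores(scores: dict) -> float:
    """Average per-criterion scores (assumed 0-10 scale) into one grade."""
    missing = set(CRITERIA) - scores.keys()
    if missing:
        raise ValueError(f"evaluator must score every criterion: {missing}")
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)

def needs_another_iteration(scores: dict, threshold: float = 8.0) -> bool:
    """The generator keeps refining until the grade clears the bar."""
    return aggregate_scores(scores) < threshold

review = {"design_quality": 7, "originality": 9, "craft": 8, "functionality": 6}
```

Forcing a score for every criterion is what keeps the evaluator honest: the generator receives structured, per-dimension feedback rather than a single pass/fail verdict.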

Prithvi Rajasekaran, engineering lead at Anthropic Labs, said: "Separating the agent doing the work from the agent judging it proves to be a strong lever to address this issue."

Results From Extended Sessions

Iteration cycles range from five to fifteen per run, with some sessions lasting up to four hours. Each cycle produces progressively refined outputs that combine visual distinction with functional accuracy.

The structured approach enables clear task decomposition. Planning, generation, and evaluation remain separate responsibilities with defined handoffs, making it easier to track progress and identify where breakdowns occur.
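The decomposition above can be sketched as a single loop with explicit handoffs. The agent functions here are toy stand-ins (assumptions for illustration), not Anthropic's interface:

```python
# Minimal sketch of the generate → evaluate refinement loop with a bounded
# iteration budget, mirroring the 5-15 cycles described in the article.
def run_pipeline(task, generate, evaluate, max_iters=15, threshold=8.0):
    """Iterate until the evaluator's score clears the threshold
    or the budget runs out; feedback flows through explicit handoffs."""
    artifact, history = None, []
    for _ in range(max_iters):
        artifact = generate(task, feedback=history[-1] if history else None)
        score, feedback = evaluate(artifact)
        history.append(feedback)
        if score >= threshold:
            break
    return artifact, len(history)

# Toy agents: quality starts at 5 and each round of feedback adds a point.
def toy_generate(task, feedback=None):
    return 5 if feedback is None else feedback + 1

def toy_evaluate(artifact):
    return artifact, artifact  # (score, feedback); feedback carries state

result, rounds = run_pipeline("landing page", toy_generate, toy_evaluate)
```

Keeping generation and evaluation as separate callables is what makes breakdowns traceable: each iteration leaves a scored artifact behind.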

What Practitioners Are Seeing

Industry observers have noted the framework's practical advantages. The separation of evaluation from generation improves reliability by removing a conflict of interest: the agent generating code no longer judges its own work.

The structure itself (JSON specifications, enforced testing, commit-by-commit progress) prevents the context amnesia that typically derails long-running agents. Every new session starts from a known working state.

Operational Considerations

Teams implementing this framework need to establish evaluation criteria upfront and calibrate scoring mechanisms. Agents execute evaluations automatically, but human oversight remains necessary for initial setup and quality validation.

The workflow supports both parallel and sequential agent execution, depending on task dependencies. This flexibility allows teams to distribute processing across multiple agents or run them in sequence as needed.
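The sequential-versus-parallel choice comes down to task dependencies, which a short sketch makes concrete. The agent functions are again toy stand-ins, not Anthropic's API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_sequential(agents, state):
    """Each step depends on the previous one's output (plan → generate → ...)."""
    for agent in agents:
        state = agent(state)
    return state

def run_parallel(agents, state):
    """Independent subtasks fan out over the same starting state."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda agent: agent(state), agents))

plan     = lambda s: s + ["planned"]
generate = lambda s: s + ["generated"]
evaluate = lambda s: s + ["evaluated"]

seq = run_sequential([plan, generate, evaluate], [])
par = run_parallel([plan, generate, evaluate], [])
```

Dependent phases must run in order (`seq` accumulates all three steps), while independent subtasks, such as evaluating several page variants, can run concurrently (`par` yields one result per agent).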

What Comes Next

As models improve, the harness's role will shift. Some tasks may move directly to next-generation models without requiring multi-agent coordination. Simultaneously, better models enable the harness to tackle more complex work.

Engineers should experiment with harness configurations, monitor execution traces, decompose tasks carefully, and adjust workflows as model capabilities evolve. The optimal combination of agents and responsibilities will continue to change.


