Ship, Observe, Unship: How Anthropic Builds Claude Code at AI Speed

Anthropic ships Claude Code with an AI assistant writing roughly 90% of production code. Short loops of shipping, observing, and adjusting beat heavy upfront plans, and they make feedback speed, input latency, and shell state the real engineering work.

Published on: Nov 21, 2025

Developing Claude Code at AI Speed: What Shipping Reveals That Planning Doesn't

At QCon San Francisco 2025, Adam Wolff shared how Anthropic builds Claude Code with an AI coding assistant at the center of daily work. Roughly 90% of production code is written with or by Claude Code. Releases go out continuously to internal users and on weekdays to external users.

With an assistant that can generate and refactor code plus tests in minutes, the bottleneck shifts. As Wolff put it: "Implementation used to be the expensive part of the loop. With AI in the workflow, the limiting factor is how fast you can collect and respond to feedback."

Owning the Input Layer: Keystrokes, Unicode, and Latency

Claude Code needs rich terminal input: slash commands, file mentions, and keystroke-specific behavior. Conventional advice says avoid rebuilding text input because users expect a huge set of editing shortcuts. The team ignored that advice and took full control of every keystroke.

The first version introduced a virtual Cursor class: an immutable model of the text buffer and cursor position. It started as a few hundred lines of TypeScript with a solid test suite. Another engineer later added Vim mode in a single pull request, with hundreds of lines of logic and tests generated with Claude Code.
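The actual Cursor class is not public, so the names and operations below are assumptions, but the core idea can be sketched in a few lines: every edit is a pure function that returns a new state, which makes behavior easy to test and safe to refactor.

```typescript
// Hypothetical sketch of an immutable cursor model; names are illustrative.
type CursorState = { readonly text: string; readonly offset: number };

// Insert characters at the cursor, returning a new state (no mutation).
function insert(state: CursorState, chars: string): CursorState {
  return {
    text: state.text.slice(0, state.offset) + chars + state.text.slice(state.offset),
    offset: state.offset + chars.length,
  };
}

// Delete the character before the cursor, if any.
function backspace(state: CursorState): CursorState {
  if (state.offset === 0) return state; // nothing to delete
  return {
    text: state.text.slice(0, state.offset - 1) + state.text.slice(state.offset),
    offset: state.offset - 1,
  };
}

// Move the cursor one position left, clamped at the start of the buffer.
function moveLeft(state: CursorState): CursorState {
  return { ...state, offset: Math.max(0, state.offset - 1) };
}

// Usage: build up a buffer through pure functions.
let s: CursorState = { text: "", offset: 0 };
s = insert(s, "helo");
s = moveLeft(s);     // cursor now sits before the "o"
s = insert(s, "l");  // fix the typo
console.log(s.text, s.offset); // "hello" 4
```

Because each operation is a plain state-in, state-out function, tests reduce to comparing values, which is exactly the kind of code an assistant can generate and verify in bulk.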

As usage grew across languages, Unicode bugs appeared. The team added grapheme clustering and later refactored to cut worst-case latency from seconds per keystroke to a few milliseconds by deferring work and using better search strategies. Pain decreased over time, and the architecture kept supporting fast changes.

  • Own the keystroke if your UX depends on it.
  • Model editor state immutably for safer refactors and easy test generation.
  • Treat Unicode correctly with grapheme clusters, not code points or bytes. See Unicode UAX #29 text segmentation.
  • Optimize for worst-case latency, not averages.

Shell Execution: From One Long-Lived Process to Snapshots

The first shell design used a PersistentShell: one long-running process behind a command queue. It preserved natural shell semantics (working directory and environment changes carried over) at the cost of serializing every command. A few hundred lines of code handled queuing, recovery, and pseudo-terminal behavior.

Then the team introduced a batch tool to let the model run many commands at once. The single queue became a bottleneck. They switched to one fresh shell per command for parallelism. After shipping, complaints surfaced about missing aliases and functions. The final move: snapshot aliases/functions once in the user shell and source that script before each transient command.
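The snapshot-and-replay idea can be sketched as follows; this assumes bash and Node's `child_process`, and the file path and function names are illustrative, since the actual Claude Code implementation is not public:

```typescript
// Sketch: capture implicit shell state once, replay it per command.
import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const snapshotPath = join(tmpdir(), "shell-snapshot.sh");

// 1. Dump aliases and functions from the user's shell to a script.
//    (A real implementation would run the shell so it loads rc files.)
function snapshotShellState(): void {
  const dump = execFileSync("bash", ["-c", "alias; declare -f"], {
    encoding: "utf8",
  });
  writeFileSync(snapshotPath, dump);
}

// 2. Each command gets a fresh shell (so many can run in parallel),
//    but sources the snapshot first so aliases and functions still exist.
function runCommand(cmd: string): string {
  return execFileSync("bash", ["-c", `source ${snapshotPath}; ${cmd}`], {
    encoding: "utf8",
  });
}

snapshotShellState();
console.log(runCommand("echo hello").trim()); // prints "hello"
```

Fresh shells buy parallelism; the snapshot buys back the "sticky" state users expect from a persistent shell.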

As Wolff put it, "you do not plan this kind of design, you discover it through experimentation."

  • Design for parallelism early; queues hide bottlenecks until agents go wide.
  • Snapshot implicit state (aliases, functions) and replay it for transient shells.
  • Ship, listen, adjust. Be willing to unship without drama.

Persistence: JSONL Beats SQLite (Here)

Conversation history started as append-only JSONL files on disk. No external services, no install hurdles, and it worked in production. The team wanted stronger querying and migrations, so they adopted SQLite with a type-safe ORM.
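The appeal of append-only JSONL is how little machinery it needs; the sketch below uses illustrative file and field names (the real on-disk format is not specified in the talk):

```typescript
// Minimal append-only JSONL persistence: one JSON object per line.
import { appendFileSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type Message = { role: "user" | "assistant"; content: string };

const logPath = join(tmpdir(), "conversation.jsonl");
writeFileSync(logPath, ""); // start fresh for the demo

// Appending is a single write: no schema, no migration, no native driver.
function appendMessage(msg: Message): void {
  appendFileSync(logPath, JSON.stringify(msg) + "\n");
}

// Reading back is a line-by-line parse.
function loadMessages(): Message[] {
  return readFileSync(logPath, "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Message);
}

appendMessage({ role: "user", content: "hi" });
appendMessage({ role: "assistant", content: "hello!" });
console.log(loadMessages().length); // 2
```

Everything here is in the platform's standard library, which is exactly the property that made JSONL survive where a native SQLite driver did not.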

Problems landed fast. Native SQLite drivers failed to install on some systems, especially with strict package managers: "native dependencies basically do not work for this distribution model." And SQLite's locking behavior under concurrent writers didn't match developer expectations set by row-level locking in server databases. Within 15 days, they removed SQLite and went back to JSONL. For this client-distributed tool, the simpler choice won.

  • Prefer zero-native dependencies for widely distributed clients.
  • Know your database's locking model; SQLite serializes writers by locking the whole database file, which surprises teams used to row-level locking. Reference: SQLite locking documentation.
  • Make changes reversible; the ability to undo is a feature.

The Operating Rhythm: Build, Ship, Observe, Adjust

"Shipping small changes frequently, and being willing to unship when needed, is central to our use of AI in development. The loop of build, ship, observe, and adjust is where most of the value appears."

With AI collapsing implementation time, planning shifts from detailed upfront design to shaping fast experiments. The team ships, watches real behavior, and updates the plan. The outcome is a system that converges through feedback, not heavyweight design docs.

Practical Checklist for Teams Using AI Coding Assistants

  • Set a hard latency budget for key interactions; optimize tail latency.
  • Instrument everything you ship; observe before you theorize.
  • Adopt immutable models for core editor state to enable safe refactors and AI-generated tests.
  • If your agent runs shell commands, snapshot implicit shell state for each invocation.
  • Default to storage without native dependencies for client installs; keep migrations behind a feature flag.
  • Keep the release cadence tight; practice fast rollbacks.
  • Use the assistant to generate tests and refactors, then review with focused human checks.

Where to Go Next

Expect the most progress from short loops: ship, observe, adjust.

