Ship, observe, adjust: building Claude Code at AI speed with 90% AI-written code

At QConSF 2025, Adam Wolff described a development workflow with Claude Code at its center: roughly 90% of code written with AI, daily releases, and a bias for shipping over planning. The lessons: own your input handling, use transient shells, and revert fast.

Categorized in: AI News, IT and Development
Published on: Nov 21, 2025

QConSF 2025: Developing Claude Code at AI Speed

At QCon San Francisco 2025, Adam Wolff shared how the Claude Code team puts an AI coding assistant at the center of its workflow. Around 90% of production code is written with, or by, Claude Code. The team ships continuously to internal users and aims for weekday releases externally. Planning matters less than how fast the team can ship, observe real behavior, and revise its understanding of the requirements.

"Implementation used to be the expensive part of the loop. With AI in the workflow, the limiting factor is how fast you can collect and respond to feedback."

Bet 1: Own the input to unlock keystroke-level control

Claude Code needs rich terminal input: slash commands, file mentions, and keystroke-specific behavior. Conventional advice says "don't rebuild text input": users expect their shortcuts to keep working. The team took control anyway, because every keystroke mattered.

The first version shipped with a virtual Cursor class: an immutable model of the text buffer and cursor position in a few hundred lines of TypeScript with a solid test suite. A single pull request later, Vim mode landed: hundreds of lines of logic and tests, much of it generated with Claude Code. As usage grew, Unicode issues surfaced. The team added grapheme clustering, then refactored to cut worst-case latency from seconds per keystroke to milliseconds by deferring work and using smarter search.
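
A minimal sketch of what such an immutable cursor might look like, assuming UTF-16 code-unit offsets and grapheme-aware movement via Intl.Segmenter; the class and method names are illustrative, not Claude Code's actual internals:

```typescript
// Illustrative sketch only; not the actual Claude Code Cursor class.
// Offsets are UTF-16 code units; movement respects grapheme clusters.
const graphemes = new Intl.Segmenter(undefined, { granularity: "grapheme" });

class Cursor {
  constructor(readonly text: string, readonly offset: number) {}

  // Insert a string at the cursor; returns a new Cursor, never mutates.
  insert(s: string): Cursor {
    const text = this.text.slice(0, this.offset) + s + this.text.slice(this.offset);
    return new Cursor(text, this.offset + s.length);
  }

  // Move left by one user-perceived character (grapheme cluster),
  // so emoji and combining marks are never split.
  left(): Cursor {
    let prev = 0;
    for (const seg of graphemes.segment(this.text)) {
      if (seg.index >= this.offset) break;
      prev = seg.index; // start of the grapheme preceding the cursor
    }
    return new Cursor(this.text, prev);
  }
}
```

Note that `left()` scans the buffer from the start: a naive per-keystroke scan like this is exactly the kind of worst-case cost the team later cut to milliseconds by deferring work and searching smarter.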

Bet 2: Rethink shell execution for agent throughput

The initial design used a PersistentShell: one long-running process behind a queue, so the working directory and environment stayed consistent across commands. It worked for humans, but broke down for agents. Once batch execution arrived, the internal queue serialized everything and throttled agent throughput.
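
In outline, that pattern is a single spawned shell with every command funneled through a promise chain. This is a hypothetical reconstruction, not the team's code, but it shows why "parallel" batch calls degrade to sequential runs:

```typescript
import { spawn } from "node:child_process";

// Hypothetical sketch of the original pattern: one long-running bash
// process, with all commands serialized through a promise chain.
class PersistentShell {
  private shell = spawn("bash", [], { stdio: ["pipe", "pipe", "inherit"] });
  private queue: Promise<unknown> = Promise.resolve();

  run(command: string): Promise<string> {
    // Every caller waits on the previous command, so an agent's batch
    // of tool calls executes strictly one at a time.
    const result = this.queue.then(() => this.exec(command));
    this.queue = result.catch(() => undefined); // keep the chain alive on errors
    return result;
  }

  private exec(command: string): Promise<string> {
    return new Promise((resolve) => {
      const sentinel = `__DONE_${Date.now()}__`;
      let out = "";
      const onData = (chunk: Buffer) => {
        out += chunk.toString();
        if (out.includes(sentinel)) {
          this.shell.stdout.off("data", onData);
          resolve(out.slice(0, out.indexOf(sentinel)));
        }
      };
      this.shell.stdout.on("data", onData);
      // cwd and env persist inside the single shell between commands.
      this.shell.stdin.write(`${command}\necho ${sentinel}\n`);
    });
  }
}
```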

The team flipped the model: one fresh shell per command. Users complained about lost aliases and functions, so they introduced a snapshot approach: capture the shell config once, then source it before each transient command. You don't plan that on a whiteboard. You discover it by shipping and listening.
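
A sketch of the transient approach under those assumptions; the snapshot path and helper name are hypothetical, but the shape is one fresh bash per command, sourcing the captured config first:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical snapshot path: the user's aliases, functions, and exports,
// captured once at startup and written to a file.
const SNAPSHOT = "/tmp/shell-snapshot.sh";

// One fresh shell per command: source the snapshot, then run the command.
// With no shared process, commands can safely run in parallel.
async function runCommand(command: string, cwd: string): Promise<string> {
  const { stdout } = await execFileAsync(
    "bash",
    ["-c", `source ${SNAPSHOT} 2>/dev/null; ${command}`],
    { cwd },
  );
  return stdout;
}

// Batch execution becomes trivial:
// await Promise.all(commands.map((c) => runCommand(c, process.cwd())));
```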

Bet 3: Keep storage boring until it isn't

Conversation history started as append-only JSONL files. No external services, no special install steps, and it worked in production. Wanting richer queries and migrations, the team moved to SQLite with a type-safe ORM.

It failed fast. Native drivers caused install issues on some systems, especially with strict package managers. SQLite locks at the database level, which under concurrency didn't match expectations set by row-level locking models. Within 15 days, they removed SQLite and went back to JSONL. Shipping exposed fragility early and made the rollback simple.
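
A minimal sketch of the append-only JSONL pattern they kept: one JSON object per line, no native dependencies, nothing special to install (the field names are illustrative):

```typescript
import { appendFileSync, readFileSync } from "node:fs";

// Append one conversation event as a single JSON line. For a single-writer
// CLI this is simple, portable, and needs no native drivers.
function appendEvent(path: string, event: { role: string; content: string }): void {
  appendFileSync(path, JSON.stringify({ ...event, ts: Date.now() }) + "\n");
}

// Read events back line by line; blank or partial trailing lines are
// skipped, which makes interrupted writes easy to tolerate.
function readEvents(path: string): unknown[] {
  const events: unknown[] = [];
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (!line.trim()) continue;
    try {
      events.push(JSON.parse(line));
    } catch {
      // ignore a corrupt line rather than failing the whole read
    }
  }
  return events;
}
```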

The filter: detour or dead end?

Across these stories, the key question kept coming up: what does shipping reveal that planning can't? The team treated each design choice like a hypothesis and evaluated it by the direction of pain.

  • If each iteration reduces bugs and latency (Cursor and graphemes), keep going.
  • If a change exposes a better composition method (transient shells + snapshots), keep the learning and move forward.
  • If fragility and user impact grow with no clear path to relief (SQLite), undo it quickly.

"Shipping small changes frequently-and being willing to unship-is central to our use of AI. The loop of build, ship, observe, and adjust is where most of the value appears."

Practical takeaways for engineering teams

  • Put the AI assistant in the loop for code, refactors, and tests. Treat it like a pair programmer with superhuman throughput.
  • Optimize for feedback speed: ship daily, measure real usage, and be ready to revert without drama.
  • Model input as data. An explicit, immutable cursor plus thorough tests let you layer modes and fix Unicode issues without fear.
  • Assume agents need parallel shell execution. Snapshot the user's shell state once and source it per command.
  • Prefer reversible bets. JSONL that "just installs" can beat a fancier datastore if your distribution model punishes native dependencies.
  • Judge architecture by latency at the edges. Milliseconds at the keystroke change how the tool feels; users notice.
