GitHub adds dual-model review feature to Copilot CLI to catch coding errors before deployment

GitHub's Copilot CLI now includes Rubber Duck, a feature that runs a second AI model to review code plans before execution. In testing, it closed 74.7% of the gap between Claude Sonnet and the more capable Claude Opus.

Published on: Apr 08, 2026

GitHub Adds Rubber Duck Mode to Copilot CLI for Code Review

GitHub introduced Rubber Duck, an experimental feature in the Copilot CLI that uses a second AI model to review code plans before execution. The tool launched April 6 as part of GitHub's effort to catch errors that primary coding agents might miss.

Rubber Duck works by pairing a primary coding model with a separate model from a different AI family. The second model acts as an independent reviewer, flagging assumptions, edge cases, and architectural problems the first model overlooked.
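The pairing can be sketched as a simple propose-then-review loop. This is a hypothetical illustration of the pattern GitHub describes, not Copilot CLI's actual API; `primary_model` and `reviewer_model` stand in for calls to two models from different families.

```python
# Hypothetical sketch of a dual-model review loop. The model callables
# here are placeholders, not a real Copilot CLI interface.

def propose_plan(primary_model, task: str) -> str:
    """Ask the primary coding model for an implementation plan."""
    return primary_model(f"Plan the implementation for: {task}")

def review_plan(reviewer_model, plan: str) -> list[str]:
    """Have an independent model from a different family critique the plan."""
    critique = reviewer_model(
        "Review this plan for hidden assumptions, edge cases, "
        f"and architectural problems:\n{plan}"
    )
    # Return each non-empty critique line as a separate objection.
    return [line for line in critique.splitlines() if line.strip()]

def plan_with_review(primary_model, reviewer_model, task: str):
    """Produce a plan, then attach the second model's objections."""
    plan = propose_plan(primary_model, task)
    objections = review_plan(reviewer_model, plan)
    return plan, objections
```

In this sketch the reviewer only raises objections; how the primary model incorporates them (revise the plan, proceed, or ask the developer) is left to the orchestrating agent.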

Developers access Rubber Duck through the /experimental command in the Copilot CLI alongside other experimental features.

Performance on Real-World Problems

GitHub tested Rubber Duck on SWE-Bench Pro, a benchmark using actual coding problems from open-source repositories. Claude Sonnet 4.6 paired with Rubber Duck running GPT-5.4 closed 74.7% of the performance gap between Sonnet and the more capable Claude Opus 4.6.

The second reviewer proved most useful on difficult problems spanning multiple files. For problems requiring 70 or more steps to solve, Sonnet with Rubber Duck scored 3.8% higher than Sonnet alone. On the hardest problems tested, the improvement reached 4.8%.

What Rubber Duck Catches

GitHub documented three types of issues the second model identified:

  • Architectural problems: Rubber Duck flagged a proposed scheduler that would start and immediately exit without running any jobs, then identified an infinite loop in one of the scheduled tasks.
  • Logic bugs: The tool caught a loop that silently overwrote the same dictionary key each iteration, causing three of four Solr facet categories to drop from search queries with no error message.
  • Cross-file conflicts: Rubber Duck identified three files reading from a Redis key that new code stopped writing to, which would have silently broken confirmation UI and cleanup paths on deployment.
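The overwritten-key bug is a classic pattern worth seeing concretely. The snippet below is an invented minimal reconstruction of that bug class, not code from GitHub's report; the facet names are made up.

```python
# Illustrative reconstruction of the overwritten-key bug class: the
# facet data here is invented for the example.

facet_rows = [
    ("category", "books"),
    ("author", "smith"),
    ("format", "ebook"),
    ("language", "en"),
]

# Buggy version: every iteration writes to the same hard-coded key,
# so only the last facet survives and the rest silently vanish.
buggy = {}
for name, value in facet_rows:
    buggy["facet"] = value  # same key overwritten each iteration

# Fixed version: key the dictionary by the facet name itself.
fixed = {}
for name, value in facet_rows:
    fixed[name] = value
```

No exception is raised in the buggy version; the data simply disappears, which is exactly why this class of error is hard for a single agent to notice in its own output.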

The feature addresses a common problem in AI code generation: agents can produce syntactically correct code that fails logically or architecturally. A second opinion from a different model family can catch assumptions the primary agent embedded in its solution.

For IT and development professionals, Rubber Duck represents a practical approach to AI-assisted coding: using multiple models to reduce blind spots rather than trusting a single agent.
