Grok 4 Leads the Way as Four AI Language Models Sweep the Opening Round of the Kaggle Chess Tournament
Grok 4 leads Day 1 of Google's Kaggle AI chess tournament, sweeping Gemini 2.5 Flash 4-0. Four LLMs advance to the Semifinals after dominant wins.

Grok 4 Leads AI Chess Tournament Day 1, Advances Alongside Gemini 2.5 Pro, o4-mini, and o3
The first day of the AI chess exhibition tournament in Google's new Kaggle Game Arena project delivered clear winners. Four large language models (LLMs), Gemini 2.5 Pro, o4-mini, Grok 4, and o3, each swept their opponents 4-0 to secure spots in the Semifinals. They defeated Claude 4 Opus, DeepSeek R1, Gemini 2.5 Flash, and Kimi k2, respectively.
The tournament continues on Wednesday, August 6, starting at 1 p.m. ET / 19:00 CEST / 10:30 p.m. IST.
Kaggle Arena Chess Exhibition Tournament Bracket
- Kimi k2 0-4 o3
- DeepSeek R1 0-4 o4-mini
- Gemini 2.5 Pro 4-0 Claude 4 Opus
- Grok 4 4-0 Gemini 2.5 Flash
The Kaggle Game Arena is a new initiative by Kaggle, a Google-owned platform serving data scientists and machine learning professionals worldwide. This project evaluates how LLMs like Gemini, ChatGPT, and DeepSeek perform in a competitive setting. Google sees this experiment as a way to gauge the problem-solving capabilities of these models, offering a glimpse into their strategic thinking and progression toward Artificial General Intelligence (AGI).
To launch the arena, Kaggle partnered with DeepMind, a Google company known for its impact on chess with AlphaZero in 2017. Kaggle provides the neutral playing field while DeepMind designs the tournament structure to ensure scientific rigor.
Unlike traditional computer chess tournaments that use specialized engines, this event features general-purpose LLMs built for writing, coding, and reasoning. Their chess skills are not on par with dedicated engines, but the event reveals how these models approach complex challenges.
The format is single-elimination with eight leading LLMs: Gemini 2.5 Pro, Gemini 2.5 Flash, o3, o4-mini, Claude 4 Opus, Grok 4, DeepSeek R1, and Kimi k2. The models play through DeepMind's "harness," a universal controller that presents board positions to them and submits their moves. Each model has four attempts to produce a legal move; if all four fail, it loses the game.
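The four-attempt rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not DeepMind's actual harness: the function name `request_move`, the `propose` callback standing in for the model, and the plain list of legal moves are all assumptions made for the example.

```python
from typing import Callable, Iterable, Optional

MAX_ATTEMPTS = 4  # from the article: four tries to produce a legal move


def request_move(
    propose: Callable[[int], str],
    legal_moves: Iterable[str],
    max_attempts: int = MAX_ATTEMPTS,
) -> Optional[str]:
    """Ask a (hypothetical) model for a move, retrying on illegal answers.

    `propose` receives the attempt number (1-based) and returns a move string.
    Returns the first legal move, or None if every attempt was illegal,
    which under the tournament rule means the model loses the game.
    """
    legal = set(legal_moves)
    for attempt in range(1, max_attempts + 1):
        candidate = propose(attempt).strip()
        if candidate in legal:
            return candidate
    return None  # all attempts failed: forfeit


# A scripted stand-in for a model that gets it right on the third try.
scripted = iter(["Ke9", "Qxz4", "e4"])
move = request_move(lambda attempt: next(scripted), ["e4", "d4", "Nf3"])
print(move if move is not None else "forfeit")
```

A real harness would generate the legal-move list from the current position and would likely tell the model why its previous answer was rejected; both are left out here to keep the retry logic visible.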
Match Recaps
Kimi k2 0-4 o3
This was the most one-sided match. None of the games lasted beyond eight moves, and Kimi k2 forfeited all four by failing to find a legal move within four attempts. Kimi k2 could follow opening theory for a few moves, but once a game left familiar lines it lost track of the position and repeatedly proposed illegal moves.
DeepSeek R1 0-4 o4-mini
The match between OpenAI's o4-mini and DeepSeek R1 started promisingly, with strong opening moves. The quality declined sharply in the middlegame, however, because of hallucinations and blunders. Still, o4-mini managed to deliver checkmate twice, which is notable given how much difficulty these models have visualizing the entire chessboard.
Gemini 2.5 Pro 4-0 Claude 4 Opus
This was the only match with more games ending in checkmate than in forfeits. Gemini 2.5 Pro's margin of victory was clear, though it is uncertain how much of it stemmed from its own strength and how much from Claude 4 Opus's mistakes. In one game, Gemini 2.5 Pro held a 32-point material advantage (including two queens), yet it still lost pieces on the way to delivering checkmate.
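For readers unfamiliar with material counts, a figure like "32 points" is usually obtained by adding up conventional piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) for each side and taking the difference. The article does not say exactly how the figure was computed, so the sketch below assumes those conventional values. The function name `material_balance` is hypothetical, and the example position is made up for illustration; it is not taken from the actual game.

```python
# Conventional piece values (an assumption; kings are not counted).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_balance(fen: str) -> int:
    """Return White's material minus Black's for a position given in FEN."""
    placement = fen.split()[0]  # first FEN field: piece placement
    balance = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # skip digits (empty squares), '/' separators, and kings
        # Uppercase letters are White pieces, lowercase are Black.
        balance += value if ch.isupper() else -value
    return balance


# Made-up position: White has two queens, two rooks, and four pawns
# against a lone Black king.
example = "4k3/8/8/8/8/8/PPPP4/RQ2K1QR w - - 0 1"
print(material_balance(example))  # 9 + 9 + 5 + 5 + 4 = 32
```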
Grok 4 4-0 Gemini 2.5 Flash
Grok 4 delivered the strongest performance of the day, winning all four games decisively. While Gemini 2.5 Flash made errors that simplified Grok 4's task, Grok 4 actively identified and exploited undefended pieces. That kind of deliberate play set it apart from the other LLMs on the day.
Across the board, the LLMs showed three main weaknesses in chess: visualizing the full board, grasping how pieces interact, and consistently making legal moves. Grok 4 appears less affected by these issues so far, which makes its upcoming matches especially interesting.
The tournament continues with the Semifinals. The event is broadcast live on GM Hikaru Nakamura's Twitch and YouTube channels. Games and updates can also be followed on the official event page.