Researchers use Battleship to teach AI better scientific decision-making
Researchers tested how well AI models make decisions under resource constraints by having them play a collaborative version of Battleship, benchmarking the models against each other and against human players. The work, presented at the International Conference on Learning Representations in April, offers a framework for improving how AI assists with scientific research.
The core problem: scientists must choose which hypotheses to pursue and which experiments to run with limited budgets. "You can get only so much data because getting data is either expensive or time-consuming," said Valerio Pepe, the research scientist who led the project before joining OpenAI.
The researchers designed a two-player version of Battleship where one participant asked questions about ship locations while the other answered. By tracking how many rounds it took to sink all ships, they compared how large language models (LLMs) performed against each other and 42 human players.
Results
Initially, humans sank all ships in fewer rounds than Meta's Llama-4-Scout, an efficiency-focused AI model. OpenAI's GPT-5 outperformed both.
The researchers then optimized their models using Bayesian experimental design, a statistical approach that weighs possible outcomes against prior beliefs to pick the most informative next question. They trained the models to ask questions that maximized both accuracy and information gain, while also planning one move ahead.
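The article does not include the researchers' code; a minimal sketch of the underlying idea, assuming the model's beliefs are represented as weighted candidate boards and candidate questions as yes/no predicates (the function names and data representation here are illustrative, not from the paper):

```python
import math

def entropy(weights):
    """Shannon entropy (in bits) of a set of unnormalized weights."""
    total = sum(weights)
    return -sum(w / total * math.log2(w / total) for w in weights if w > 0)

def expected_information_gain(hypotheses, question):
    """Expected entropy reduction from asking a yes/no question.

    hypotheses: list of (board, weight) pairs - current beliefs over boards.
    question:   callable taking a board and returning True or False.
    """
    prior = entropy([w for _, w in hypotheses])
    # Partition the belief distribution by the question's answer.
    yes = [w for b, w in hypotheses if question(b)]
    no = [w for b, w in hypotheses if not question(b)]
    total = sum(yes) + sum(no)
    gain = prior
    for split in (yes, no):
        p = sum(split) / total  # probability of this answer
        if p > 0:
            gain -= p * entropy(split)  # expected remaining uncertainty
    return gain

def best_question(hypotheses, questions):
    """One-step lookahead: ask the question expected to be most informative."""
    return max(questions, key=lambda q: expected_information_gain(hypotheses, q))
```

A question that splits the candidate boards evenly yields one bit of expected information; a question whose answer is already certain yields none, so `best_question` never picks it.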
The breakthrough came when players switched from natural language to code snippets for communication. Accuracy jumped significantly.
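The article does not reproduce the actual snippets; a hypothetical illustration of the idea, assuming the hidden board is a mapping from grid cells to ship occupancy, would be a question submitted as a predicate the answerer runs on the board rather than as free-form text:

```python
# Hypothetical "question as code": instead of asking in natural language
# "Is there a ship in the top-left quadrant?", the asker submits a
# predicate. The board representation here is an assumption, not the
# paper's: a dict mapping (row, col) -> True where a ship cell sits.

def question(board):
    """Does any ship cell lie in the top-left 4x4 quadrant?"""
    return any(board.get((r, c), False) for r in range(4) for c in range(4))

hidden_board = {(1, 2): True, (6, 6): True}
answer = question(hidden_board)  # True: cell (1, 2) is inside the quadrant
```

Because the predicate is executable, its meaning is unambiguous to the answerer, which is one plausible reason accuracy improved over natural-language questions.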
After optimization, Llama-4-Scout beat GPT-5 two-thirds of the time while costing roughly one hundredth as much to run. It also won in seven fewer moves than human players on average.
Application to science
Battleship is far simpler than real scientific problems. Chemical and biological samples don't yield clear answers the way a game board does. But the decision-making strategies the models learned should transfer to actual research work, Pepe said.
Yuanqi Du, a Cornell researcher focused on AI for chemistry, sees value in the framework. "The framework will be very useful to measure whether language models are really making progress in deciding which hypotheses to pursue among all possibilities," Du said. "Understanding the whole hypothesis space you're searching, that's the hardest part."