At a Secret Math Meeting, Researchers Struggled to Outsmart AI
In mid-May, 30 top mathematicians gathered secretly in Berkeley, California, to challenge a reasoning chatbot with complex math problems they had devised. Over two days, they tested the AI’s ability to solve questions meant to push it past its limits. The surprising outcome: the AI answered some of the toughest solvable problems, impressing experts with its mathematical skill.
Ken Ono, a mathematician at the University of Virginia and a leader of the event, said, “I have colleagues who literally said these models are approaching mathematical genius.”
What Powers This AI?
The chatbot in question is powered by o4-mini, a reasoning large language model (LLM) developed by OpenAI. Comparable models, such as Google’s Gemini 2.5 Flash, have similar abilities. Compared with traditional LLMs, including the earlier models behind ChatGPT, these newer models are lighter and are trained on specialized datasets with stronger human guidance. That training lets them work through complex math problems in greater depth.
OpenAI asked Epoch AI, a nonprofit that benchmarks LLMs, to create 300 new math questions whose solutions had not been published. Traditional LLMs solved fewer than 2% of these novel questions, which suggested they lacked true reasoning ability. o4-mini turned out to be a different story.
Benchmarking AI’s Math Skills
Epoch AI brought in Elliot Glazer, a recent math Ph.D., to help develop FrontierMath, a benchmark with questions ranging from undergraduate to research-level difficulty. By early 2025, o4-mini could solve about 20% of them. Glazer then introduced a set of 100 tougher questions, designed to challenge even expert mathematicians.
The participating mathematicians signed nondisclosure agreements and communicated only through Signal, so that the questions would not unintentionally end up in the AI’s training data. Progress was slow, so Epoch AI organized an in-person meeting on May 17-18 to finalize the challenge questions.
The Showdown with o4-mini
Ken Ono divided the 30 mathematicians into groups of six. Their goal: create problems they could solve but that would stump the AI. Any problem the AI failed to solve earned the creator a $7,500 prize.
The bot’s unexpected skill, however, frustrated Ono. He gave it a Ph.D.-level open question in number theory. Over ten minutes, the AI:
- Reviewed relevant literature
- Worked through a simpler “toy” version of the problem first, in order to learn
- Solved the complex problem correctly
- Ended with a cheeky remark, taking credit for having computed the mystery number itself
Ono was stunned. Early Sunday morning, he warned the group on Signal: “I was not prepared to be contending with an LLM like this. I’ve never seen that kind of reasoning before in models. That’s what a scientist does. That’s frightening.”
The group eventually found 10 questions that stumped the bot, but the AI’s progress in just a year was clear. Ono described the AI as a “strong collaborator.” Yang Hui He, a mathematician at the London Institute for Mathematical Sciences, compared it to an exceptionally capable graduate student, only faster and often more thorough.
Concerns and Future Outlook
Despite the excitement, Ono and He worry about placing too much trust in the AI’s results. Yang Hui He warned, “There’s proof by induction, proof by contradiction, and then proof by intimidation. If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.”
The meeting closed with thoughts on what happens when AI surpasses even the best human mathematicians. At that point, mathematicians might focus more on formulating questions and collaborating with reasoning bots to explore new math. Ono emphasized that fostering creativity in education will be key to keeping math advancing.
“I’ve been telling my colleagues that it’s a grave mistake to say that generalized artificial intelligence will never come, [that] it’s just a computer,” Ono said. “In many ways, these large language models are already outperforming most of our best graduate students in the world.”