Mathematicians Stunned as AI Outsmarts Experts at Secret Berkeley Showdown

Thirty mathematicians tested an AI on tough math problems and were surprised when it solved many Ph.D.-level questions. Some experts now see it as a strong research collaborator.

Published on: Jun 07, 2025
At a Secret Math Meeting, Researchers Struggled to Outsmart AI

In mid-May, thirty top mathematicians gathered secretly in Berkeley, California, to challenge a reasoning chatbot with complex math problems they designed. Over two days, they tested the AI’s ability to solve questions that stretched its limits. The surprising outcome: the AI answered some of the toughest solvable problems, impressing experts with its mathematical skill.

Ken Ono, a mathematician at the University of Virginia and a leader of the event, said, “I have colleagues who literally said these models are approaching mathematical genius.”

What Powers This AI?

The AI in question runs on o4-mini, a reasoning large language model (LLM) developed by OpenAI. Google’s Gemini 2.5 Flash has similar capabilities. Compared with earlier LLMs, such as those behind early versions of ChatGPT, these newer models are lighter and trained on specialized datasets with stronger human guidance. This training lets them work through complex math problems in greater depth.

OpenAI asked Epoch AI, a nonprofit that benchmarks LLMs, to create 300 new math questions without published solutions. Traditional LLMs solved fewer than 2% of these novel questions, which suggested they lacked true reasoning ability. But o4-mini performed very differently.

Benchmarking AI’s Math Skills

Epoch AI brought in Elliot Glazer, a recent math Ph.D., to help develop FrontierMath, a benchmark with questions ranging from undergraduate to research-level difficulty. By early 2025, o4-mini could solve about 20% of these. Glazer then introduced a set of 100 even tougher questions, designed to challenge even expert mathematicians.

The mathematicians signed nondisclosure agreements and communicated only through Signal so that the questions could not inadvertently end up in the AI’s training data. The team made slow progress, so Epoch AI organized an in-person meeting on May 17-18 to finalize the challenges.

The Showdown with o4-mini

Ken Ono divided the 30 mathematicians into groups of six. Their goal: create problems they could solve but that would stump the AI. Any problem the AI failed to solve earned the creator a $7,500 prize.

But the bot’s unexpected skill kept foiling the group, to Ono’s frustration. He gave it a problem he considered a Ph.D.-level open question in number theory. Over ten minutes, the AI:

  • Reviewed relevant literature
  • Tested a simpler “toy” problem to learn
  • Solved the complex problem correctly
  • Ended with a cheeky remark, claiming credit for computing the mystery number itself

Ono was stunned. Early Sunday morning, he warned the group on Signal: “I was not prepared to be contending with an LLM like this. I’ve never seen that kind of reasoning before in models. That’s what a scientist does. That’s frightening.”

The group eventually found 10 questions that stumped the bot, but the AI’s progress in just a year was clear. Ono described the AI as a “strong collaborator.” Yang Hui He, a mathematician at the London Institute for Mathematical Sciences, compared it to an exceptionally capable graduate student, only faster and often more thorough.

Concerns and Future Outlook

Despite the excitement, Ono and Yang Hui He worry that the AI’s results may be trusted too much. Yang Hui He warned, “There’s proof by induction, proof by contradiction, and then proof by intimidation. If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.”

The meeting closed with thoughts on what happens when AI surpasses even the best human mathematicians. At that point, mathematicians might focus more on formulating questions and collaborating with reasoning bots to explore new math. Ono emphasized that fostering creativity in education will be key to keeping math advancing.

“I’ve been telling my colleagues that it’s a grave mistake to say that generalized artificial intelligence will never come, [that] it’s just a computer,” Ono said. “In many ways, these large language models are already outperforming most of our best graduate students in the world.”
