AI Outsmarts Top Mathematicians at Secret Berkeley Showdown

An AI chatbot solved complex math problems faster than top mathematicians at a secret Berkeley meeting. Experts warn against overtrusting its confident proofs.

Published on: Jul 13, 2025
Artificial Intelligence Outsmarts Top Mathematicians in California Meeting

In mid-May, thirty leading mathematicians gathered secretly in Berkeley, California, to challenge an advanced AI chatbot on complex mathematical problems. Some attendees traveled from as far as the U.K. Their goal was to test the AI’s reasoning capabilities using questions they designed themselves.

Over two days, the group presented professor-level problems to the AI, powered by o4-mini—a reasoning large language model (LLM) developed by OpenAI. The bot’s ability to solve some of the hardest solvable math problems left the experts astonished.

What is o4-mini and How Does It Work?

o4-mini is a lighter, more agile LLM than its predecessors, trained on specialized datasets with stronger reinforcement from humans. Where earlier models leaned mainly on predicting the next word, o4-mini is built to work through multi-step logical deductions, allowing it to tackle deeper and more complex mathematical questions than traditional LLMs.

Google’s Gemini 2.5 Flash offers similar capabilities, reflecting a new class of AI designed to excel in reasoning tasks.

Benchmarking AI’s Mathematical Skills

To evaluate o4-mini’s potential, OpenAI collaborated with Epoch AI, a nonprofit that benchmarks LLMs. They developed FrontierMath, a set of 300 novel math questions whose solutions were previously unpublished. These questions spanned undergraduate to research-level challenges.

Earlier LLMs solved fewer than 2% of these questions, showing limited reasoning ability. By April 2025, o4-mini could solve about 20% of them, and Epoch AI moved on to a fourth tier: questions challenging even for academic mathematicians.
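
To make those percentages concrete, here is a minimal Python sketch that converts a solve rate into an approximate question count. It assumes the quoted rates apply to the full 300-question set; the function name `solved_count` is hypothetical, and this is an illustration rather than Epoch AI's actual scoring code.

```python
# Illustrative only: converts the solve rates quoted above into approximate
# question counts, assuming they refer to the full 300-question set.
TOTAL_QUESTIONS = 300  # size of FrontierMath as stated in the article


def solved_count(solve_rate: float, total: int = TOTAL_QUESTIONS) -> int:
    """Return the approximate number of questions solved at a given rate."""
    if not 0.0 <= solve_rate <= 1.0:
        raise ValueError("solve_rate must be between 0 and 1")
    return round(solve_rate * total)


earlier_upper_bound = solved_count(0.02)  # earlier LLMs solved fewer than this
o4_mini_estimate = solved_count(0.20)     # o4-mini solved about this many
print(earlier_upper_bound, o4_mini_estimate)
```

Under that assumption, the jump is from a handful of questions to roughly one in five of the set.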

The Secret Meeting and Its Surprising Outcomes

To prevent data contamination, participating mathematicians signed nondisclosure agreements and communicated exclusively via the encrypted app Signal. Each problem the AI could not solve earned its creator a $7,500 reward. The group split into teams to devise problems that would stump the AI.

However, the AI consistently impressed. Ken Ono, a University of Virginia mathematician and judge at the meeting, recounts presenting a Ph.D.-level open question in number theory. Within ten minutes, o4-mini researched relevant literature, tried simplified versions of the problem, and delivered a correct and even cheeky solution.

“I was not prepared to be contending with an LLM like this,” Ono said. The bot’s reasoning mimicked a scientist’s approach, and its speed far outpaced human experts, handling tasks in minutes that would take weeks or months for mathematicians to complete.

Achievements and Concerns

Despite the AI’s skill, the mathematicians found 10 problems that remained unsolved by o4-mini. Still, the group recognized the AI as a strong collaborator rather than just a tool.

Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences, compared the AI’s performance to that of an exceptional graduate student. However, both He and Ono caution against overtrusting the AI’s results.

Yang-Hui He noted, “There’s proof by induction, proof by contradiction, and then proof by intimidation. The model says everything with such confidence that people might be intimidated into accepting its answers.”

Looking Ahead: The Future Role of Mathematicians

The meeting ended with reflections on what “tier five” questions—beyond even the best mathematicians’ reach—might mean. If AI reaches this level, mathematicians may shift to posing problems and guiding AI tools to discover new truths, similar to how professors work with graduate students.

Ono predicts that fostering creativity in higher education will be essential to sustain mathematics. He warned against dismissing AI’s progress, emphasizing that these models already outperform most top graduate students worldwide.

Key Takeaways:
  • AI models like o4-mini can solve advanced mathematical problems at speeds unmatched by humans.
  • Such AI demonstrates reasoning that closely resembles scientific thinking.
  • Experts caution about overreliance on AI-generated proofs due to confidence-based persuasion.
  • The future role of mathematicians may focus more on creativity and collaboration with AI.