2,400-year-old math riddle challenges ChatGPT, revealing improvisation and limits
Researchers used Plato's square-doubling puzzle to test if ChatGPT reasons or recalls. A rectangle variant exposed a confident error, urging guided prompts and proof checks.

Can ChatGPT "learn" like a human? A 2,400-year-old math test offers clues
Plato recorded Socrates testing a student with the "doubling the square" problem: how to construct a square with twice the area of a given square. The trick is non-obvious until you realize that each side of the new square equals the original square's diagonal, which introduces a scale factor of √2.
Researchers from the University of Cambridge and the Hebrew University of Jerusalem used this classic puzzle to probe a modern question: does an LLM derive new mathematical reasoning, or simply retrieve patterns from text? Because models like ChatGPT are trained primarily on language, the researchers chose a geometry problem whose exact solution is unlikely to appear verbatim in training data.
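The diagonal argument can be written out as a short derivation. The symbols here are illustrative notation, not taken from the study: s is the side of the original square, d its diagonal, and A the area.

```latex
\begin{align*}
d^2 &= s^2 + s^2 = 2s^2 && \text{(Pythagorean theorem on half of the square)} \\
d &= \sqrt{2}\, s \\
A_{\text{new}} &= d^2 = 2s^2 = 2A_{\text{old}}
\end{align*}
```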
The twist came when they extended the task. In a study published Sept. 17 in the International Journal of Mathematical Education in Science and Technology, the team asked ChatGPT to double the area of a rectangle using similar reasoning. The model asserted that, because the diagonal of a rectangle cannot be used to double its area, no geometric solution exists. That conclusion is wrong: a geometric solution does exist. The error suggests the model improvised from the prior discussion of the square rather than recalling a known method.
"When we face a new problem, our instinct is often to try things out based on our past experience," said visiting scholar Nadav Marco. "In our experiment, ChatGPT seemed to do something similar. Like a learner or scholar, it appeared to come up with its own hypotheses and solutions."
The authors connect this behavior to the zone of proximal development (ZPD), the gap between what a learner already knows and what they can reach with guidance. With the right prompts, the model appeared to push into that zone, sometimes productively and sometimes by making confident mistakes.
This is a reminder that AI reasoning remains a black box and that its "proofs" are not guaranteed to be correct. As professor Andreas Stylianides noted, students and researchers must evaluate AI-generated arguments rather than accept them at face value.
Practical takeaways for scientists, mathematicians, and educators
- Treat AI outputs as hypotheses. Require definitions, lemmas, and a checkable proof outline. Verify steps independently.
- Prompt for process over answers: "Let's explore this together. Propose two distinct approaches, state assumptions, and test for counterexamples."
- Cross-modality helps. Pair LLMs with dynamic geometry systems or formal theorem provers to validate constructions and claims.
- Probe generalization. After a solution, modify constraints (e.g., from square to rectangle) and ask the model to reconcile differences.
- Demand citations or source attributions when the model claims known results. Flag anything uncited for manual review.
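As a minimal sketch of the "treat outputs as hypotheses" idea, the snippet below tests candidate square-doubling rules against random inputs. The function names are hypothetical, chosen for illustration, and a numerical check like this can only refute a claim, never prove it. The first candidate, doubling the side, is the student's initial wrong guess in Plato's dialogue.

```python
import math
import random


def find_counterexample(candidate, trials=1000, rel_tol=1e-9, seed=0):
    """Return a side length for which `candidate` fails to double the
    square's area, or None if no failure is found in `trials` samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        side = rng.uniform(0.1, 10.0)
        new_side = candidate(side)
        # A correct rule must give new_side^2 == 2 * side^2 (up to rounding).
        if not math.isclose(new_side ** 2, 2 * side ** 2, rel_tol=rel_tol):
            return side
    return None


def double_the_side(side):
    # Hypothesis 1: doubling the side doubles the area (it quadruples it).
    return 2 * side


def use_the_diagonal(side):
    # Hypothesis 2: the new side is the diagonal of the original square.
    return math.hypot(side, side)


wrong = find_counterexample(double_the_side)      # a failing side length
right = find_counterexample(use_the_diagonal)     # None: no failure found
```

A returned value of `None` only means no counterexample turned up in the sampled range; the claim still needs a proof or a formal check.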
Why the rectangle example matters
Doubling area is a scaling problem: multiply linear dimensions by √2. There are geometric constructions to achieve this for rectangles, too. The model's confident refusal points to analogy-driven improvisation rather than grounded retrieval. That improvisation can be useful for research, but only if you keep verification in the loop.
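One way to see that a rectangle can be doubled geometrically is to reuse the square trick on each side: the diagonal of a square built on a side of length w has length √2·w, which is constructible with compass and straightedge. The sketch below checks this numerically. It is an illustration under that assumption, not necessarily the construction the study's authors had in mind, and `double_rectangle` is a hypothetical helper name.

```python
import math


def double_rectangle(width, height):
    """Scale a rectangle by sqrt(2) in each direction.

    Each new side is the diagonal of a square built on the old side,
    so both lengths are constructible from the original rectangle.
    """
    new_width = math.hypot(width, width)     # sqrt(2) * width
    new_height = math.hypot(height, height)  # sqrt(2) * height
    return new_width, new_height


w, h = 3.0, 5.0
new_w, new_h = double_rectangle(w, h)

area_ratio = (new_w * new_h) / (w * h)              # approximately 2
same_shape = math.isclose(new_w / new_h, w / h)     # aspect ratio preserved
```

Scaling both sides by the same factor keeps the new rectangle similar to the original, which mirrors how the doubled square is still a square.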
Research directions
- Benchmark newer models on out-of-distribution geometry tasks and compare against curated training corpora to separate retrieval from synthesis.
- Integrate LLMs with theorem provers and dynamic geometry tools to produce verifiable, constructible outputs in STEM settings.
- Develop classroom protocols for evaluating AI-generated proofs as a core skill in mathematics education.
Study reference: International Journal of Mathematical Education in Science and Technology.