Study shows OpenAI's latest model performs no better on law school exams than last year's version

OpenAI's GPT-5.5 failed to beat the older o3 model on law school exams and may have scored worse. The newer model did not improve on its predecessor's A+ to B grades despite extra-high inference-time compute.

Categorized in: AI News, Legal
Published on: Jun 30, 2026

OpenAI's latest reasoning model, GPT-5.5, did not outperform its predecessor on law school exams, and may have even scored worse, according to a new study from the University of Maryland Francis King Carey School of Law. The finding challenges assumptions that steady model upgrades automatically translate into better legal reasoning, a capability many in the profession are watching closely as AI tools proliferate.

Last year, researchers gave OpenAI's o3 model the same final exams that law professors administer to students, grading the AI's answers on the same curve. The model earned grades ranging from A+ to B, a performance the study's authors described as impressive. When the team repeated the experiment this year with the updated GPT-5.5 model, the results did not improve. In fact, the newer model's performance may have dipped.

The study's findings

The researchers noted that the plateau emerged despite using a more advanced model and allocating extra-high inference-time compute (the computational resources devoted to generating each answer). "This apparent plateau, despite a better model and using extra-high inference-time compute, may reflect trends observed in other legal benchmarks for AI and makes theoretical sense," they wrote. The team plans to continue the experiment in future semesters to track whether the pattern holds.

The study, covered by the TaxProf Blog, adds to a small but growing body of work testing AI on legal tasks. Other benchmarks have similarly suggested that raw model scaling does not reliably translate into better performance on complex legal analysis.

Why this matters for legal professionals

For lawyers, judges, and legal educators, the plateau signals that the newest AI model is not automatically the best for legal work. It underscores the risk of assuming that each update brings a meaningful leap in reasoning ability. Professionals who rely on AI for research, drafting, or exam preparation should test outputs against their own standards rather than trusting that a newer version will produce stronger results. The study also reinforces the value of domain-specific evaluations, which can reveal gaps that general benchmarks miss.

