Automatic, Not Autopilot: GPT-5's Promise, Misfires, and Legal AI Governance
GPT-5 improves speed and reasoning but reduces manual control and still stumbles. Legal teams should enforce governance: pin models, test outputs, keep human review.

The AI Law Professor: When the new AI model disappoints
GPT-5's rollout is a reminder: as AI gets more autonomous and changes how we work with it, legal teams need principled governance, not just experiments. The promise is big; the practice is where risk lives.
Key takeaways
- GPT-5 will make a difference. It's a unified system that routes your prompt to different reasoning modes. Performance improves, but you give up some manual control you've grown to trust.
- Errors persist. Benchmarks looked strong, yet public misfires (like error-filled maps) show limits that matter in real work.
- Disillusionment is real. Gartner's Hype Cycle places GenAI on the slide into the trough of disillusionment, which makes governance and expectation-setting essential for legal teams.
Automatic transmission, less stick-shift
If you learned on a manual transmission, you remember the feeling of control. GPT-5 feels like switching to automatic. It's faster and, in many contexts, smarter, but it decides when to "shift."
OpenAI describes GPT-5 as one system with a router that picks between fast and deeper "thinking" modes. Microsoft echoed that framing across Copilot. For early-adopter lawyers, that's both helpful and irritating. It smooths routine work, but it moves a slice of craft from your hands to the platform.
Expectations meet a colder reality
On launch day, we heard that GPT-5 is like having a team of PhD-level experts in your pocket. Then the internet lit up with flubs: invented state names on a U.S. map, scrambled presidential timelines in graphics. If you promise doctorates, basic geography errors feel like malpractice.
Part of this is on us. We amplify great demos, project personality onto models, and then confuse style with reliability. Yes, GPT-5 shows fewer hallucinations and stronger scores in math, coding, and multimodal tasks. But no benchmark guarantees competence on every odd, mixed-media request you throw at it.
Control, relationship, capability
Many lawyers didn't just use prior models; they built relationships with them. When GPT-5 launched, OpenAI retired several choices in the model picker and auto-mapped old threads to new equivalents. That broke habits and felt personal for paying users who relied on each model's tone and tempo. After pushback, some options returned, proof that the complaint was about more than UI.
Capability isn't uniform. GPT-5's gains are real, and the brittleness at the edges is real too. "Best average" isn't "best for your task." Think comparative advantage, not universal claims.
The market is cooling, and governance matters more
Gartner's recent analysis places GenAI on the slide into disillusionment. Many 2024 initiatives under-delivered, and fewer than a third of AI leaders say their CEOs are happy with returns. GPT-5 landed right in the middle of that mood. In this climate, one over-promised demo or clumsy deprecation can overshadow a long list of real improvements.
For practicing lawyers, the lesson is simple. Expect steady progress, not magic. Expect shorter change windows. Expect platform shifts that force you to adapt. Your governance has to flex without compromising duties to clients and courts.
What legal teams can do now
- Select specific versions of AI models. If you need consistent behavior, insist on model pinning, announced change windows, and rollback. You may need the OpenAI API instead of consumer ChatGPT; a minimal pinning sketch follows this list. If you use ChatGPT Business or Enterprise, learn the legacy model access policy and set internal migration timelines. These details can mean the difference between sound analysis and slop.
- Test like you bill. Keep a simple checklist of the tasks you give the tool: e-discovery summaries, brief outlines, citation checks, transcript cleanups, RFP drafts. Define what "good" looks like, then score accuracy, completeness, and consistency, not how persuasive the tone feels (see the scorecard sketch after this list).
- Separate tone from truth. A model that feels right isn't necessarily more reliable. A blunter model may surface uncertainty more honestly. Treat tone as a configuration issue, not a proxy for validity.
- Keep human control visible. Anchor to four principles: transparency, autonomy, reliability, visibility. The router may help with autonomy and sometimes reliability; you must enforce transparency and visibility. Log prompts and outputs, add review points, keep humans in the loop, and make the model's blind spots explicit to the supervising lawyer (a minimal logging sketch appears after this list).
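
To make the pinning point concrete, here is a minimal sketch using the OpenAI Python SDK. The dated snapshot name is illustrative only; substitute whatever snapshot appears in your vendor's current model list.

```python
# Minimal sketch: pin a dated model snapshot instead of a floating alias.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative snapshot name; check your vendor's current model list.
PINNED_MODEL = "gpt-4o-2024-08-06"

response = client.chat.completions.create(
    model=PINNED_MODEL,  # a dated snapshot behaves consistently until deprecated
    temperature=0,       # reduce run-to-run variation for repeatable testing
    messages=[
        {"role": "system", "content": "You are a careful legal research assistant."},
        {"role": "user", "content": "Summarize the attached deposition excerpt."},
    ],
)
print(response.choices[0].message.content)
```

Pinning a dated snapshot, plus a temperature of zero, gives you stable behavior to test against; when the vendor announces a deprecation, you migrate on your schedule rather than the platform's.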
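For the testing checklist, a sketch of a task-level scorecard in the same spirit. The task names, 0-5 scale, and passing floor are assumptions; define them for your own practice group.

```python
# Sketch: a task-level scorecard for recurring AI tasks.
# Task names and criteria are placeholders; define "good" for your own practice.
from dataclasses import dataclass

@dataclass
class TaskResult:
    task: str          # e.g. "citation check", "e-discovery summary"
    accuracy: int      # 0-5: factual and legal correctness
    completeness: int  # 0-5: nothing material omitted
    consistency: int   # 0-5: stable across reruns of the same prompt
    notes: str = ""

def passes(result: TaskResult, floor: int = 4) -> bool:
    """A result passes only if every dimension meets the floor; tone is not scored."""
    return min(result.accuracy, result.completeness, result.consistency) >= floor

results = [
    TaskResult("citation check", accuracy=5, completeness=4, consistency=4),
    TaskResult("brief outline", accuracy=3, completeness=5, consistency=4,
               notes="missed a controlling case"),
]
for r in results:
    print(f"{r.task}: {'PASS' if passes(r) else 'REVIEW'} {r.notes}")
```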
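And for visible logging, a minimal append-only audit-log sketch. The file name and record fields are assumptions, not a prescribed schema; adapt them to your firm's retention policy.

```python
# Sketch: append-only JSONL audit log for prompts, outputs, and sign-off.
# File name and fields are assumptions; adapt to your retention policy.
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("ai_audit_log.jsonl")

def log_interaction(model: str, prompt: str, output: str,
                    reviewed_by: str | None = None) -> None:
    """Record one model interaction; reviewed_by stays None until a lawyer signs off."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "output": output,
        "reviewed_by": reviewed_by,
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_interaction("gpt-5", "Outline arguments for the motion to dismiss.",
                "(model output here)")
```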
Right-sizing expectations
Was GPT-5 overhyped? Probably. Are we complicit? Also yes. We want one model to be perfect writer, paralegal, researcher, designer, and cartographer. We treat the best average performer as a sure thing everywhere, then feel betrayed when it stumbles.
Adopt a stance of modesty plus rigor. Take GPT-5's real gains: better reasoning modes and fewer hallucinations on many prompts. Keep manual control where it matters. Don't let a router, however clever, become a hidden change agent inside your practice. If GPT-5 is the automatic transmission, keep your hand near the gearshift. Know when to let it shift, and when to downshift yourself.
Next column, we'll examine how to build an AI governance policy that actually works.