Why AI still slips on logic, and the Saarland team aiming to fix it
News Release - 20 Nov 2025
Saarland University | Grant and Award Announcement
AI assistants are useful, but they still get basic reasoning wrong. Michael Hahn, Professor of Computational Linguistics at Saarland University, argues the issue is architectural: more data and better prompts won't fix it. Backed by €1.4 million from the German Research Foundation's Emmy Noether Programme, his new research group will study these limits and prototype alternatives.
The core issue: architecture, not just data
Modern large language models (LLMs) are built on the transformer architecture, which uses attention to weight the most relevant parts of the input and learns associations across massive datasets. That approach is strong for pattern matching, but it can hard-code mistakes when the learned associations are wrong, and its fixed-depth networks cap what the model can compute.
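To make the fixed-depth point concrete, here is a minimal sketch of a single attention layer in Python (illustrative only; real transformers add multi-head attention, feed-forward blocks, and normalization). The key observation is in the loop: the number of layers, and hence the depth of sequential computation, is a constant fixed before training, no matter how long the input is.

```python
import numpy as np

def attention(x, Wq, Wk, Wv):
    """One self-attention layer over a sequence x of shape (seq_len, d)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # weighted mix of values

rng = np.random.default_rng(0)
d, seq_len, n_layers = 8, 16, 4          # n_layers is fixed before training
x = rng.normal(size=(seq_len, d))
for _ in range(n_layers):                # depth never scales with the input
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    x = x + attention(x, Wq, Wk, Wv)     # residual connection, as in transformers
```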
Hahn's team has shown mathematically that these systems make systematic errors that training alone can't remove. The result: persistent failures in areas where precision matters most.
Three failure modes holding LLMs back
- Poor handling of changing conditions. In tests where two books move around a group, accuracy drops as the number of passes grows (see the first sketch after this list). In medicine, this maps to timeline errors: mixing up sequences of symptoms, tests, diagnoses, and medications can lead to harmful recommendations.
- Weak logical reasoning. Tasks like choosing a medication for a condition require inferring symptom-disease links and applying exclusion rules (second sketch below). Today's models struggle with rule-based inference even when the facts are available.
- Breakdowns on nested, layered inputs. Legal liability assessments need both doctrinal rules and a correct chronology. These multi-step chains regularly trip up current neural networks.
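The book-passing test from the first bullet is easy to reproduce in spirit. The generator below is a hedged reconstruction, not the team's actual benchmark: it builds an episode, tracks the ground truth exactly, and lets you score a model's answer as the number of passes grows.

```python
# Hypothetical reconstruction of a state-tracking probe: two books are
# passed around a group, and the model must say who holds which book at
# the end. Ground truth is trivial to compute in code; LLM accuracy
# tends to drop as the number of passes grows.
import random

def make_episode(people, n_passes, seed=0):
    rng = random.Random(seed)
    holders = {"book A": people[0], "book B": people[1]}  # initial state
    lines = [f"{holders['book A']} has book A. {holders['book B']} has book B."]
    for _ in range(n_passes):
        book = rng.choice(list(holders))
        receiver = rng.choice([p for p in people if p != holders[book]])
        lines.append(f"{holders[book]} gives {book} to {receiver}.")
        holders[book] = receiver
    question = "Who holds book A, and who holds book B?"
    return " ".join(lines), question, dict(holders)

prompt, question, answer = make_episode(["Ana", "Ben", "Cleo", "Dan"], n_passes=12)
print(prompt, question)
print("ground truth:", answer)   # compare the model's reply against this
```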
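The second bullet's medication example can likewise be phrased as a tiny symbolic check (all symptoms, drugs, and rules here are invented for illustration). Code like this gets the inference exactly right, which is the baseline an LLM is being measured against.

```python
# Toy rule-based medication check: infer a condition from symptoms, then
# apply exclusion rules. A symbolic pass like this is exact where LLM
# rule-following is not.
SYMPTOM_TO_CONDITION = {frozenset({"fever", "cough"}): "flu"}
TREATMENTS = {"flu": ["oseltamivir", "ibuprofen"]}
EXCLUSIONS = {"ibuprofen": {"stomach ulcer"}}   # drug -> contraindications

def recommend(symptoms, history):
    condition = SYMPTOM_TO_CONDITION.get(frozenset(symptoms))
    if condition is None:
        return None, []
    safe = [drug for drug in TREATMENTS[condition]
            if not EXCLUSIONS.get(drug, set()) & set(history)]
    return condition, safe

print(recommend({"fever", "cough"}, {"stomach ulcer"}))
# ('flu', ['oseltamivir'])  - ibuprofen excluded by the ulcer history
```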
What this project will do
The group will first build a tighter mathematical theory of how transformers compute. That includes analyzing which functions are learnable, where depth limits bite, and how many layers are needed for stronger reasoning behavior.
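One way to see why depth limits bite, stated informally here as a standard style of argument in this literature rather than the group's specific theorem: each layer computes a token's state from an attention-weighted aggregate of the states one layer below, so information can be composed at most as many times as there are layers.

```latex
% Informal depth-composition sketch (constants and conditions elided).
\[
  h_i^{(\ell)} = f\!\Big( h_i^{(\ell-1)},\ \sum_{j=1}^{n} \alpha_{ij}^{(\ell)}\, h_j^{(\ell-1)} \Big),
  \qquad \ell = 1, \dots, L .
\]
% Each state depends only on the layer below, so end-to-end composition
% happens at most L times. A chain of k sequential dependencies that
% cannot be collapsed in parallel therefore needs depth that grows with
% k, while L is fixed when the model is built.
```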
Next, they will explore hybrid systems and potentially new architectures with more predictable capabilities. The goal: models that reason more reliably than today's LLMs.
Practical takeaways for researchers and developers
- Treat time as a first-class variable. Add explicit state tracking or external memory for dynamic tasks (first sketch below). Stress-test models with counterfactual timeline changes, not just longer prompts.
- Wrap neural outputs with logic checks. Use symbolic post-validators or constraint solvers for rule-heavy tasks (second sketch below). Don't rely on prompting to enforce inference rules.
- Flatten nested inputs where possible. Break cases into modular steps with schemas (facts, rules, chronology), then verify each step (third sketch below). Chain verification beats one-shot answers.
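First sketch, for the state-tracking takeaway: keep the world state in an external structure that deterministic code updates, so the model never has to re-derive a long timeline from raw text. The event format and regex are assumptions for illustration.

```python
# External state tracking (sketch): a plain dict holds who has which
# book; deterministic code applies each event, and the model reads the
# resulting state instead of reconstructing it from the prompt.
import re

PASS = re.compile(r"(\w+) gives (book \w) to (\w+)\.")

def track(events):
    state = {}
    for line in events:
        m = PASS.match(line)
        if m:
            giver, book, receiver = m.groups()
            state[book] = receiver            # deterministic update
    return state

events = ["Ana gives book A to Ben.", "Ben gives book A to Cleo."]
print(track(events))   # {'book A': 'Cleo'} - exact, regardless of length
```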
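Second sketch, for the logic-check takeaway: a post-validator that rejects any model-proposed answer violating hard rules. The rules shown are invented; in practice they would come from domain logic or a constraint solver.

```python
# Post-validation (sketch): never let a rule-violating answer through,
# no matter how confident the model sounds.
RULES = {
    "ibuprofen": lambda patient: "stomach ulcer" not in patient["history"],
    "oseltamivir": lambda patient: patient["age"] >= 1,
}

def validate(drug: str, patient: dict) -> bool:
    check = RULES.get(drug)
    return check is not None and check(patient)

patient = {"history": {"stomach ulcer"}, "age": 40}
for drug in ("ibuprofen", "oseltamivir"):    # e.g. candidates from an LLM
    print(drug, "->", "accept" if validate(drug, patient) else "reject")
```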
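Third sketch, for the modular-steps takeaway: represent a case as typed steps, each with its own verifier, so a failure is caught at the step where it occurs rather than buried in one long answer. The step contents are placeholders.

```python
# Step-schema verification (sketch): split a case into steps (facts,
# chronology, rules) and verify each output before the next step runs.
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    produce: callable     # builds this step's output from prior results
    verify: callable      # cheap deterministic check of that output

def run_chain(steps):
    results = {}
    for step in steps:
        out = step.produce(results)
        if not step.verify(out):
            raise ValueError(f"step '{step.name}' failed verification")
        results[step.name] = out
    return results

chain = [
    Step("facts", lambda r: {"events": 3}, lambda o: o["events"] > 0),
    Step("chronology", lambda r: list(range(r["facts"]["events"])),
         lambda o: o == sorted(o)),           # order must be consistent
    Step("rules", lambda r: "liable" if len(r["chronology"]) >= 3 else "not liable",
         lambda o: o in {"liable", "not liable"}),
]
print(run_chain(chain))
```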
Funding, team, and context
Funding: €1.4 million via the German Research Foundation's Emmy Noether Programme. The group at Saarland University will include five doctoral researchers and focus on "Understanding and Overcoming Architectural Limitations in Neural Language Models."
Context: This is the third Emmy Noether group approved for computer science in Saarbrücken in 2025, alongside two at the Max Planck Institute for Informatics. For comparison, only three computer science-focused Emmy Noether groups were funded nationwide last year.
Learn more
- DFG Emmy Noether Programme
- Transformer architecture: Attention Is All You Need (arXiv)
- Department of Language Science and Technology, Saarland University
- Prof. Michael Hahn's website
Contact
Prof. Michael Hahn
Language, Computation and Cognition Lab
Tel. +49 681 302-4343
Email: mhahn@lst.uni-saarland.de