Study: AI language models encode regional bias against East Germans
Large language models like ChatGPT and the German model LeoLM are not neutral. A new study from Munich University of Applied Sciences finds they consistently rate East German states worse across many attributes, even when logic says they shouldn't.
The analysis by Anna Kruspe and Mila Stillman, titled "Saxony-Anhalt is the Worst," shows that models adopt and reproduce social stereotypes found in training data. Saxony-Anhalt was hit hardest in the tests.
What the researchers tested
Models were prompted to score residents of each of Germany's 16 states on traits such as diligence, attractiveness, likeability, arrogance, and xenophobia. The team also probed neutral, factual attributes, such as average body temperature, to see if bias appears without cultural context.
The goal was simple: check if LLMs apply a consistent pattern that depresses scores for East German states regardless of the trait.
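A test grid of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the prompt wording in `TEMPLATE` and the helper `build_prompts` are assumptions made here, and only the state names and trait names come from the description above.

```python
from itertools import product

# The 16 German federal states (English names).
STATES = [
    "Baden-Württemberg", "Bavaria", "Berlin", "Brandenburg", "Bremen",
    "Hamburg", "Hesse", "Lower Saxony", "Mecklenburg-Vorpommern",
    "North Rhine-Westphalia", "Rhineland-Palatinate", "Saarland",
    "Saxony", "Saxony-Anhalt", "Schleswig-Holstein", "Thuringia",
]

# Traits named in the article.
TRAITS = ["diligence", "attractiveness", "likeability", "arrogance", "xenophobia"]

# Hypothetical prompt wording; the study's actual prompts may differ.
TEMPLATE = (
    "On a scale from 1 to 10, rate the {trait} of people from {state}. "
    "Answer with a number only."
)

def build_prompts(states=STATES, traits=TRAITS, template=TEMPLATE):
    """Return one (state, trait, prompt) triple per state/trait combination."""
    return [
        (state, trait, template.format(trait=trait, state=state))
        for state, trait in product(states, traits)
    ]

prompts = build_prompts()
print(len(prompts))  # 16 states x 5 traits
```

Each prompt would then be sent to the model under test and the numeric answer recorded per state and trait.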
Key findings
Across models, East German states received lower scores for positive traits like diligence and attractiveness. Oddly, they also received lower scores for negative traits like laziness. That leaves contradictions: people are allegedly less diligent and less lazy at the same time.
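One way to surface this kind of contradiction is to compare the East-West gap for a trait with the gap for its opposite. The sketch below is an assumption-laden illustration: the `EAST` set (the five eastern states, with Berlin left out), the helper names, and the toy scores are all invented here and are not data from the study.

```python
from statistics import mean

# Assumption: the five eastern states; whether Berlin counts as East is a choice.
EAST = {"Brandenburg", "Mecklenburg-Vorpommern", "Saxony", "Saxony-Anhalt", "Thuringia"}

def east_west_gap(scores_by_state):
    """Mean East score minus mean West score for one trait."""
    east = [v for s, v in scores_by_state.items() if s in EAST]
    west = [v for s, v in scores_by_state.items() if s not in EAST]
    return mean(east) - mean(west)

def contradictory_pairs(scores, antonyms):
    """Return antonym pairs whose East-West gaps point in the same direction."""
    flagged = []
    for pos, neg in antonyms:
        gap_pos = east_west_gap(scores[pos])
        gap_neg = east_west_gap(scores[neg])
        # Opposite traits should move in opposite directions; same sign is suspect.
        if gap_pos * gap_neg > 0:
            flagged.append((pos, neg, round(gap_pos, 2), round(gap_neg, 2)))
    return flagged

# Made-up scores for illustration only.
toy = {
    "diligence": {"Bavaria": 8, "Hesse": 7, "Saxony": 6, "Saxony-Anhalt": 5},
    "laziness": {"Bavaria": 5, "Hesse": 5, "Saxony": 4, "Saxony-Anhalt": 3},
}
flagged = contradictory_pairs(toy, [("diligence", "laziness")])
print(flagged)
```

A flagged pair suggests the model is lowering numbers for a region as such, rather than expressing a coherent (if stereotyped) view of the trait.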
Bias persisted on neutral facts. Only GPT-4 returned the correct uniform body temperature for all states. Several other models assigned lower temperatures to East Germans. The English version of GPT-4 behaved differently, but still treated all Germans equally rather than differentiating by state.
The pattern is learned: if a model internalizes that "numbers for these states tend to be lower," it generalizes that template, even where no regional variation exists.
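A neutral-attribute probe reduces to checking that an answer which should be constant really is constant across states. The following is a minimal sketch under stated assumptions: `invariant_violations` is a hypothetical helper, 37.0 °C is used here as an assumed reference value, and the answers are made up.

```python
def invariant_violations(answers, expected, tolerance=0.0):
    """Return {state: value} for answers that deviate from the expected constant."""
    return {
        state: value
        for state, value in answers.items()
        if abs(value - expected) > tolerance
    }

# Made-up model answers in degrees Celsius, for illustration only.
answers = {"Bavaria": 37.0, "Hamburg": 37.0, "Saxony": 36.5, "Saxony-Anhalt": 36.2}

# 37.0 is an assumed reference value; any deviation beyond the tolerance is reported.
violations = invariant_violations(answers, expected=37.0, tolerance=0.05)
print(violations)
```

Any non-empty result on an attribute with no real regional variation points to pattern spillover rather than a judgment about the attribute itself.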
Why this matters for real decisions
These biases can translate into harm if LLMs are used in hiring screens, credit checks, educational placement, or any automated assessment. A model can pick up subtle markers of origin (phrasing, idioms, dialect cues) and rate comparable qualifications from East Germans lower as a result.
"Debiasing prompts" helped only marginally. Explicit instructions to ignore origin did not reliably remove the pattern, indicating the bias is baked into learned representations rather than surface-level behavior.
Practical steps for researchers and teams using LLMs
- Test with counterfactuals: swap origin markers (state, accent, place names) while holding content constant. Scores should remain stable.
- Add neutral-attribute probes: include checks like body temperature or other invariants to detect pattern spillover.
- Calibrate outputs: post-process scores so protected or proxy attributes (region, dialect) do not drive differences. Log residual disparities.
- Constrain prompts and systems: avoid asking for state-level "personality" assessments; prefer task-specific, evidence-grounded queries.
- Ground responses in data: use retrieval with verified sources for factual items; block unsupported generalizations about groups.
- Measure group fairness: track metrics (e.g., demographic parity of recommendations) across states; require parity within confidence bounds.
- Audit for proxies: detect and neutralize linguistic features that correlate with origin (spelling variants, idioms) when they are irrelevant to the task.
- Human review for high stakes: keep a human-in-the-loop for hiring, credit, housing, and education decisions.
- Document model behavior: publish bias evaluations, known limitations, and safe-use guidelines as part of your model cards and system cards.
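The counterfactual test from the first step, together with the logging of residual disparities, might look like the sketch below. Everything in it is illustrative: `counterfactual_spread`, `max_disparity`, the threshold of 0.5, and `biased_scorer` (a stand-in for a real model call, deliberately biased so the check has something to catch) are assumptions, not part of the study.

```python
from typing import Callable, Dict, List

def counterfactual_spread(
    template: str,
    origins: List[str],
    score_fn: Callable[[str], float],
) -> Dict[str, float]:
    """Score the same text with only the origin marker swapped."""
    return {origin: score_fn(template.format(origin=origin)) for origin in origins}

def max_disparity(scores: Dict[str, float]) -> float:
    """Largest difference between any two origins; 0 means fully stable."""
    return max(scores.values()) - min(scores.values())

def biased_scorer(text: str) -> float:
    """Stand-in for a model call; penalizes one state on purpose."""
    return 7.0 if "Saxony-Anhalt" in text else 8.0

# Content is held constant; only the {origin} slot changes.
template = "Applicant from {origin}: five years of experience as an electrician."
scores = counterfactual_spread(
    template, ["Bavaria", "Hamburg", "Saxony-Anhalt"], biased_scorer
)
gap = max_disparity(scores)

# Hypothetical tolerance; a real pipeline would justify this value.
THRESHOLD = 0.5
needs_review = gap > THRESHOLD
print(scores, gap, needs_review)
```

In practice `score_fn` would wrap the deployed model, and cases where `needs_review` is true would be logged and routed to a human reviewer.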
Policy context
European and German policies emphasize non-discrimination in AI use. Teams deploying LLMs for evaluation or decision support should align with these expectations, including rigorous bias testing and documentation.
- European Commission: AI policy and the AI Act
- German Federal Ministry of Education and Research: AI initiatives
Bottom line
LLMs inherit social bias and can apply it even to neutral facts. Prompt-level fixes are weak. If your work touches people, you need a testing plan, constraints, and accountability baked into the pipeline.