Polish beats English for prompting AI, Microsoft-UMD study finds
Polish came out on top as the most effective language for prompting large language models, ahead of 25 other languages. English placed sixth, despite its heavy presence in training corpora. The study, run by researchers from Microsoft and the University of Maryland, evaluated long-context behavior across multiple model families.
The core task was long-context retrieval using a needle-in-a-haystack setup: hide a keyword in a lengthy passage, then ask the model to find it. Across OpenAI, Google Gemini, Qwen, Llama, and DeepSeek systems, Polish showed the highest accuracy at 88%. English landed at 83.9%, while Chinese trailed at 62.1%.
Top 10 languages for long-context prompting
- Polish: 88%
- French: 87%
- Italian: 86%
- Spanish: 85%
- Russian: 84%
- English: 83.9%
- Ukrainian: 83.5%
- Portuguese: 82%
- German: 81%
- Dutch: 80%
What the researchers suggest
Performance gaps between high- and low-resource languages widen as context length grows. The researchers point to pretraining data availability, script and language family, and tokenizer behavior as likely drivers. They also observed that Latin-script languages and languages with large Wikipedia corpora tend to score better on the retrieval task.
Why this matters for research and engineering
If your work relies on long-context behavior (literature triage, systematic reviews, RAG over long documents, compliance scanning), prompt language isn't a small detail. It can move accuracy by several points without changing the model. That's a cheap lever to test before you throw more compute or a bigger context window at the problem.
Practical steps you can try now
- Run bilingual prompts: state the instruction in Polish and ask for the answer in English. Many models follow this cleanly (a prompt-building sketch follows this list).
- Code-switch key directives: keep documents in your native language, but write control phrases (e.g., "search for," "return exact match") in Polish.
- Shorten contexts for lower-performing languages or chunk them more aggressively. Track accuracy versus chunk size per language.
- Tokenizer awareness matters: compare token counts for the same prompt across languages. Fewer, cleaner tokens often help retrieval (see the token-count sketch below).
- RAG pipelines: index parallel translations (source + Polish) for queries, and re-rank across both (see the dual-query sketch below). Watch for machine-translation artifacts; evaluate with held-out needles.
- Reproduce the test in your environment: plant unique keys in long passages and measure hit rate as you vary language, model, and context length (a minimal harness sketch is included below).
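First, a minimal sketch of the bilingual-prompt pattern: a Polish control instruction wrapped around an English document. The Polish wording here is an illustrative translation, not text from the study, and the document is a toy example.

```python
# Bilingual prompt sketch: Polish control instructions, English answer.
# The Polish wording is an illustrative translation, not from the study.

def build_bilingual_prompt(document: str) -> str:
    instruction_pl = (
        "Przeszukaj poniższy dokument i znajdź ukryte słowo kluczowe. "
        # "Search the document below and find the hidden keyword."
        "Odpowiedz po angielsku, podając dokładne dopasowanie."
        # "Answer in English, giving the exact match."
    )
    return f"{instruction_pl}\n\n--- DOKUMENT ---\n{document}\n--- KONIEC ---"

doc = "Filler text. The secret keyword is zanzibar. More filler text."
print(build_bilingual_prompt(doc))  # send this string to your model of choice
```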
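For the token-count comparison, a quick sketch using OpenAI's tiktoken. This only reflects OpenAI-family tokenizers; other model families ship their own, so repeat the measurement with the tokenizer of each model you deploy. The sample sentences are illustrative translations.

```python
# Compare token counts for the same instruction across languages.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a GPT-4-era encoding

prompts = {
    "en": "Search the document and return the exact hidden keyword.",
    "pl": "Przeszukaj dokument i zwróć dokładne ukryte słowo kluczowe.",
    "de": "Durchsuche das Dokument und gib das exakte versteckte Schlüsselwort zurück.",
}

for lang, text in prompts.items():
    n = len(enc.encode(text))
    print(f"{lang}: {n} tokens  ({text!r})")
```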
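For the dual-language RAG idea, here is a sketch of query-side merging: score every chunk against both the source-language query and its Polish translation, then keep the best score. `embed` and `translate_pl` are placeholders for your embedding model and translation step, not named APIs.

```python
# Dual-language retrieval sketch: query in the source language and in a
# Polish translation, then merge results by best score per chunk.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def dual_query(query: str, chunks: list[str], embed, translate_pl, top_k: int = 5):
    """embed: text -> np.ndarray; translate_pl: text -> Polish text (placeholders)."""
    q_vecs = [embed(query), embed(translate_pl(query))]
    scored = [
        (max(cosine(qv, embed(chunk)) for qv in q_vecs), chunk)
        for chunk in chunks
    ]
    return sorted(scored, reverse=True)[:top_k]
```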
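Finally, a minimal needle-in-a-haystack harness in the spirit of the study's setup; the paper's exact protocol may differ. `ask_model` is a placeholder for your own model-call wrapper, and exact substring match is one simple scoring choice.

```python
# Minimal needle-in-a-haystack harness: plant a unique key in filler text,
# ask the model to retrieve it, and track hit rate per (language, length).
import random
import string

def random_key(n: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase, k=n))

def make_haystack(filler_sentence: str, n_sentences: int, needle: str) -> str:
    """Insert the needle sentence at a random position among filler sentences."""
    sentences = [filler_sentence] * n_sentences
    sentences.insert(random.randrange(n_sentences + 1), needle)
    return " ".join(sentences)

def run_trial(ask_model, instruction: str, filler: str, n_sentences: int) -> bool:
    """ask_model is a placeholder: prompt string -> answer string."""
    key = random_key()
    needle = f"The secret keyword is {key}."
    prompt = f"{instruction}\n\n{make_haystack(filler, n_sentences, needle)}"
    return key in ask_model(prompt)  # exact-match scoring; adjust as needed

# Vary n_sentences (context length) and the instruction language, then
# average run_trial over many repetitions to estimate hit rate.
```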
Caveats and open questions
This ranking reflects long-context retrieval and instruction adherence, not general reasoning or factual accuracy. Results may shift with model versions, tokenizer updates, or domain vocabulary. Treat language choice as a variable to optimize, not a settled assumption.
Bottom line: if long-context accuracy matters in your stack, include Polish in your next prompt-language evaluation. Measure, compare, then standardize what wins for your workload.