Qalb: An Urdu-first large language model built to serve 230M+ speakers
A Pakistani student in the United States has launched Qalb, an AI model built exclusively for Urdu. The goal is simple: bring high-quality language technology to users who work, learn, and build in Urdu every day.
Most mainstream models skew toward English. That leaves gaps in accuracy, context, and access for teams shipping products in Pakistan, India, and global diaspora markets.
What we know
"Qalb is now recognized as the world's largest Large Language Model created exclusively for the Urdu language," said developer Taimoor Hassan. Trained on 1.97 billion tokens and benchmarked across seven-plus international evaluation frameworks, the team says Qalb outperforms existing Urdu-focused models on key real-world indicators.
This is a development model for now. The next phase includes mobile and web apps so people can try "Qalb ChatGPT."
Hassan completed his undergraduate degree at FAAST University Peshawar and is pursuing a master's in computer science and software engineering at Auburn University. He's joined by collaborators Jawad Ahmed and Muhammad Awais, both FAAST graduates currently studying in Germany.
Why it matters for engineers and product teams
Urdu is spoken by more than 230 million people across Pakistan, India, and beyond, yet it's still under-represented in advanced AI systems. An Urdu-first model can reduce friction across intent detection, generation quality, and domain accuracy, especially where users mix Urdu and English in the same sentence.
For teams building local interfaces, better handling of idioms, honorifics, and right-to-left scripts means less guesswork and fewer edge cases in production.
Practical use cases
- Support and chat: intent classification, slot filling, and NER on mixed Urdu-English messages.
- Education: tutoring flows, study help, and curriculum Q&A in the learner's native language.
- Voice services: IVR, call routing, captions, and summaries when paired with ASR/TTS for Urdu.
- Public services: form assistance, benefits guidance, and localized FAQs.
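To make the support-and-chat case concrete, here is a minimal sketch of routing mixed Urdu-English messages to intents. It is illustrative only: the intent names and keyword lists are hypothetical, and a production system would call the model itself rather than match keywords.

```python
# Illustrative only: a tiny keyword-based router for mixed Urdu-English
# support messages. The intents and keywords below are hypothetical;
# a real deployment would use the model's intent classification instead.
INTENT_KEYWORDS = {
    "billing": ["bill", "invoice", "بل", "ادائیگی"],
    "delivery": ["order", "deliver", "آرڈر", "ڈیلیوری"],
    "complaint": ["complaint", "issue", "شکایت", "مسئلہ"],
}

def route_intent(message: str) -> str:
    """Return the first intent whose keywords appear in the message."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw.lower() in text for kw in keywords):
            return intent
    return "other"
```

Even a toy router like this surfaces the key design issue: the keyword lists must cover Urdu script, English, and romanized Urdu for the same concept, which is exactly the gap an Urdu-first model is meant to close.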
Integration notes for developers
- Evaluation: build Urdu-centric test sets. Consider cross-lingual benchmarks like MTEB for standardized comparisons.
- Retrieval: use RAG with Urdu corpora; normalize diacritics and Arabic-script variants; handle ZWNJ; test right-to-left rendering in UI components.
- Code-switching: many users blend Urdu and English. Create mixed-language evaluation suites to reflect real traffic.
- Fine-tuning: use PEFT/LoRA on domain data. Watch for small-corpus overfitting and ensure privacy controls if customer data is involved.
- Latency and cost: look beyond training tokens. Parameter count, context window, quantization, and throughput will decide feasibility.
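The retrieval note above (normalize diacritics and Arabic-script variants, handle ZWNJ) can be sketched as a small normalizer for building index keys. This is one possible set of mappings, not a standard: the yeh/kaf unifications follow common Arabic-script practice, and stripping ZWNJ/ZWJ is a matching choice that should be validated against your corpus before use.

```python
import re

# Sketch: normalize Urdu text into a canonical form for retrieval keys.
# Mappings are assumptions based on common Arabic-script unification;
# verify against your own corpus before relying on them.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # harakat + superscript alef
ZW_CHARS = re.compile(r"[\u200C\u200D]")           # ZWNJ / ZWJ
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh  -> Farsi yeh (ی)
    "\u0649": "\u06CC",  # alef maksura -> Farsi yeh
    "\u0643": "\u06A9",  # Arabic kaf  -> keheh (ک)
})

def normalize_urdu(text: str) -> str:
    text = DIACRITICS.sub("", text)  # drop optional vowel marks
    text = ZW_CHARS.sub("", text)    # ZWNJ affects rendering, not identity
    return text.translate(CHAR_MAP)
```

Note that ZWNJ is still needed at render time for correct shaping; stripping it here applies only to the keys used for matching, while the displayed text keeps its original form.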
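For the code-switching note, a mixed-language evaluation suite first needs to bucket traffic by script mix so metrics can be reported separately for pure-Urdu, pure-English, and code-switched messages. A minimal sketch of that bucketing, using Unicode character names, might look like this (the bucket labels are our own):

```python
import unicodedata

# Sketch: classify a message by script mix so an evaluation suite can
# report metrics per bucket. Bucket names are illustrative choices.
def script_profile(text: str) -> str:
    arabic = latin = 0
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            arabic += 1
        elif name.startswith("LATIN"):
            latin += 1
    if arabic and latin:
        return "code-switched"
    if arabic:
        return "urdu-script"
    if latin:
        return "latin"  # English or romanized Urdu; needs a language ID pass
    return "other"
```

One caveat this sketch makes visible: romanized Urdu lands in the "latin" bucket alongside English, so a real suite would add a language-identification step on Latin-script text to separate the two.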
What the team says
"Trained on a massive dataset of 1.97 billion tokens and benchmarked across seven-plus international evaluation frameworks, Qalb outperforms existing Urdu-focused AI models… setting a new standard for natural language processing in Pakistan," Hassan said. He added, "Together with my undergraduate roommates and teammates, Jawad Ahmed and Muhammad Awais, we are committed to continuously fine-tuning localized models for niche industries."
On access, Hassan noted: "Technology is no longer locked behind big budgets or big teams. With the right mindset, even a small group can build products that educate, automate, and serve millions."
What's next
The team plans to ship mobile and web apps. For engineering leaders, the key questions will be API availability, model weights access, licensing, and deployment options (cloud, on-prem, edge).
Resources
- Urdu language overview
- MTEB leaderboard (evaluation reference)