Africa’s Missing Languages in AI: Why Inclusion Matters for Millions

Africa hosts over a quarter of the world’s languages, yet many are missing from AI development due to limited data. The African Next Voices project created open-access datasets for 18 languages to bridge this gap.

Published on: Sep 06, 2025

The Linguistic Diversity of Africa and Its Absence in AI Development

Africa is home to more than a quarter of the world’s languages, yet many of them are absent from artificial intelligence (AI) development. The gap stems from limited investment and a lack of accessible data. Most AI tools, including widely used ones such as ChatGPT, are trained mainly on English, other European languages, and Chinese, all of which have vast amounts of text available online.

Many African languages are predominantly spoken rather than written, which leaves too little text to train AI models effectively. As a result, millions of speakers of these languages are shut out of AI-driven tools and services.

Addressing the Data Gap: The African Next Voices Project

Researchers recently released what is described as the largest dataset of African languages to date. The African Next Voices project brought together linguists and computer scientists to build AI-ready datasets covering 18 African languages. That is only a fraction of the estimated 2,000+ languages spoken on the continent, but it is a critical first step, and there are plans to expand it.

Over two years, the team recorded 9,000 hours of speech in Kenya, Nigeria, and South Africa, capturing real-life scenarios in farming, healthcare, and education. The languages include Kikuyu and Dholuo (Kenya), Hausa and Yoruba (Nigeria), and isiZulu and Tshivenda (South Africa), some of which are spoken by millions of people.

Prof Vukosi Marivate of the University of Pretoria, who led the South African research, highlights the importance of this work: “We think and dream in our own languages. If technology doesn’t reflect that, a whole group risks being left behind.” Kenyan computational linguist Lilian Wanzare adds that the project ensured inclusivity by recording voices from diverse regions, ages, and backgrounds to capture authentic language use.

Open Access Data and Real-World Applications

The project was funded by a $2.2 million grant from the Gates Foundation, and its data will be openly available to developers. That makes it possible to build AI tools that translate, transcribe, and respond in African languages.

One practical example is AI-Farmer, an app used by Kelebogile Mosime, a farmer in South Africa’s Rustenburg region. The app recognizes several South African languages, including Setswana, Sesotho, isiZulu, and Afrikaans, and helps users tackle farming problems. Mosime explains that using her home language, Setswana, in the app helps her diagnose plant diseases and find ways to control insects, which is critical support for someone with limited exposure to technology.

Breaking Down Language Barriers in Business and Services

Lelapa AI, a South African startup, builds AI tools in African languages for banks and telecom companies. CEO Pelonomi Moiloa points out that English often acts as a barrier, excluding many people from essential services such as healthcare and banking. She stresses the need for AI solutions in indigenous languages to improve access.

Preserving Culture and Knowledge Through Language

Beyond practical uses, language is a vessel for culture, knowledge, and perspectives. Prof Marivate warns that excluding African languages from AI risks losing more than data: it risks losing entire ways of seeing and understanding the world.

For IT and development professionals, this highlights an urgent opportunity: building AI systems that reflect linguistic diversity. Incorporating African languages into AI models not only expands market reach but also supports inclusion and helps preserve cultural identity. Practical starting points include:

  • Focus on collecting diverse, high-quality data that reflects real speech patterns.
  • Collaborate with linguists and local communities to ensure cultural accuracy.
  • Leverage open access datasets like African Next Voices to accelerate development.
  • Build AI applications that address specific local challenges, enhancing adoption.
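Collecting diverse data and building on open datasets can start with something as simple as auditing how recorded hours are distributed before training, in the spirit of the project’s effort to record speakers from different regions, ages, and backgrounds. The sketch below is a minimal illustration only: it assumes a hypothetical manifest in which each clip carries language, region, age-group, and duration fields (the `Clip` type, its field names, the toy durations, and the 10% threshold are all assumptions, not the format of the African Next Voices release), and it flags groups whose share of total hours falls below a chosen minimum.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Clip:
    """One recorded utterance in a HYPOTHETICAL speech-corpus manifest.

    Field names are illustrative assumptions, not a real dataset schema.
    """
    language: str
    region: str
    age_group: str
    duration_sec: float


def hours_by(clips, attribute):
    """Sum recorded hours for each value of `attribute` (e.g. "region")."""
    totals = defaultdict(float)
    for clip in clips:
        totals[getattr(clip, attribute)] += clip.duration_sec / 3600
    return dict(totals)


def underrepresented(clips, attribute, min_share=0.1):
    """List values of `attribute` whose share of total hours is below `min_share`.

    The 10% default is an arbitrary illustrative threshold.
    """
    totals = hours_by(clips, attribute)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # empty manifest: nothing to compare
    return sorted(value for value, hours in totals.items()
                  if hours / grand_total < min_share)


if __name__ == "__main__":
    # Toy manifest with made-up durations, for illustration only.
    clips = [
        Clip("Kikuyu", "Nyeri", "18-30", 5400.0),
        Clip("Kikuyu", "Nairobi", "31-50", 1800.0),
        Clip("Dholuo", "Kisumu", "51+", 360.0),
    ]
    print(hours_by(clips, "language"))
    print(underrepresented(clips, "language"))  # ['Dholuo']
    print(underrepresented(clips, "region"))    # ['Kisumu']
```

A fixed share threshold is a blunt tool: a language or region with fewer speakers may reasonably hold a smaller share, so where to set the cutoff is a judgment call best made together with linguists and local communities.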
