The world of artificial intelligence (AI) has long been dominated by just a few major players. As AI creeps into more and more of daily life—from how we search for information to the very fabric of our public discourse—it’s become abundantly clear that this technology is built primarily for a specific demographic. For those who speak English, Mandarin and a handful of commercially valuable languages, AI has been ready to go from its very conception. Most of us take this for granted. It’s all too easy to overlook the privilege of navigating an online landscape that was inherently designed, optimised and built with you in mind.
But what about the speakers of the thousands of other languages that these models can’t comprehend? The implications of AI’s failure to account for minority languages are profound and far-reaching. They are also grossly under-researched.
A different vision for AI
Researchers, engineers and public institutions across the Global South are increasingly challenging the notion that AI leadership must be the exclusive domain of linguistic and technological superpowers.
Nowhere is this more evident than in Africa. The continent, which spans almost a fifth of planet Earth’s landmass, boasts over 2,000 languages. However, a March 2025 study introducing AfroBench, a benchmark assessing large language model (LLM) performance across 64 African languages, revealed “significant performance disparities” between English and the majority of the languages examined.
Odunayo Ogundepo, one of AfroBench’s researchers, told RESET that he wasn’t shocked at all. “Our initial reaction was that this was unsurprising, though still sobering to see the extent of the gaps.” He explained that the training of AI models is dependent on the amount of web data available. This means that languages which are less represented online pose a hurdle for AI optimisation as there’s simply less training material.
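To make that disparity concrete, here is a minimal sketch in Python of how a gap of the kind AfroBench reports could be quantified: score a model on each language’s evaluation set, then measure every language’s shortfall against English. The accuracy figures below are hypothetical placeholders for illustration, not AfroBench’s actual results.

```python
# Sketch: quantifying per-language performance gaps of the kind multilingual
# benchmarks report. The scores below are HYPOTHETICAL placeholders; in
# practice they would come from running an LLM on each language's test set.

HYPOTHETICAL_ACCURACY = {
    "English": 0.88,
    "Swahili": 0.61,
    "Hausa": 0.54,
    "Yoruba": 0.49,
    "isiZulu": 0.46,
    "isiXhosa": 0.43,
}

def performance_gaps(scores: dict, reference: str = "English") -> dict:
    """Return each language's accuracy shortfall relative to the reference language."""
    ref = scores[reference]
    return {lang: ref - acc for lang, acc in scores.items() if lang != reference}

if __name__ == "__main__":
    gaps = performance_gaps(HYPOTHETICAL_ACCURACY)
    # Largest gaps first: these are the languages the model serves worst.
    for lang, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{lang:<10} trails English by {gap:.0%} accuracy")
```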
Ogundepo also points to market dynamics as an explanation for the huge gap in performance. “Much of the [global] consumer market speaks dominant global languages”. It is therefore more appealing for companies to focus on markets where they can, put simply, make the most profit from their products and services. And even though Africa is a linguistically diverse continent of almost 1.5 billion people, a number that is growing rapidly, “a large portion of the African community already speaks English, French or Portuguese, which may reduce the perceived urgency around local language support.” But, “this does not change the fact that it is important to address these gaps.”
AI’s lack of linguistic diversity has real-life implications
Societies whose languages lack representation in today’s digital systems are becoming ensnared in a repeating cycle of exclusion, left unable to share in the benefits of AI and the advancement it brings. For many countries in the Global South, the stakes are geopolitical, too. Dependence on Western or Chinese AI infrastructure likely means distorted global narratives and diminished sovereignty over crucial information.
The consequences can be life-threatening. For example, local governments in rural areas of Indonesia that tried to implement health IT systems found serious translation errors in the minority Javanese and Sundanese languages, which resulted in dangerous misunderstandings about medication dosages.
The business case for enhanced linguistic diversity in AI is also compelling. Organisations looking to expand into emerging markets are discovering that linguistic inclusion boosts adoption rates. Research by Common Sense Advisory in 2025 showed that more than three-quarters of online consumers prefer buying products with information written in their mother tongue.
This technological bias sends a subtle but powerful message that some languages—and by extension, their speakers—matter less. And, as always, it’s the same speakers who matter more. “If English dominates the training process, the answers will be filtered through a Western lens,” says Mekki Habib, a robotics professor at the American University in Cairo. Without deliberate intervention to enhance linguistic diversity in AI, we’re heading into a technological monoculture that will mirror and magnify existing power imbalances, rather than mitigate them.
InkubaLM is a blueprint for small, powerful AI models
However, important work is being done to close the gap. InkubaLM, developed by Lelapa AI, is a generative model trained on five African languages: IsiZulu, Yoruba, Hausa, Swahili and IsiXhosa, which together are spoken by roughly 364 million people across the continent. Named after the Zulu word for dung beetle—an animal capable of moving 250 times its own weight—InkubaLM aims to provide a blueprint for the power and efficiency of smaller AI models.
InkubaLM-0.4B is trained from scratch on a dataset totalling 2.4 billion tokens, 1.9 billion of which come from the African languages, complemented by English and French data. The model marks the first of hopefully many initiatives to ensure African communities can access tools for Machine Translation, Sentiment Analysis, Named Entity Recognition (NER), Part-of-Speech Tagging (POS), Question Answering and Topic Classification in their own languages.
As Ogundepo puts it, it’s not only the volume of data that needs to improve for these models to be usable by communities. It’s vital that creators of “low-resource language” LLMs account for the specific use cases of the regions they will be deployed in. “The single most urgent action the global AI community needs to take is to create high-quality evaluation datasets in African languages that truly reflect how these models would be used in real-world contexts.” It’s little use for a team of developers in Silicon Valley to build an LLM for speakers of Yoruba in Benin without understanding how they will actually use it. “We need benchmarks that capture actual utility—not just academic exercises, but evaluations that represent genuine use cases for African communities.”
InkubaLM leverages two key datasets: Inkuba-Mono, a monolingual collection of the five African languages as well as English and French, and Inkuba-Instruct, designed to enhance instruction understanding in these languages. The model was developed in collaboration with local linguists and communities so that its functionality extends beyond raw language ability to account for both resource efficiency and cultural relevance, making it genuinely useful for African communities.
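For readers who want to experiment, the sketch below shows how a small open model like InkubaLM-0.4B could be loaded and prompted with the Hugging Face transformers library. The repository id, the trust_remote_code flag and the sample Swahili prompt are assumptions made for illustration; Lelapa AI’s official model card is the authoritative reference for exact usage.

```python
# Sketch only: loading a small open model such as InkubaLM with the Hugging
# Face "transformers" library. The repository id and the need for
# trust_remote_code are assumptions - check the official Lelapa AI model
# card for the exact identifier and recommended usage.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "lelapa/InkubaLM-0.4B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# Illustrative Swahili prompt (roughly "Today I want to..."); any of the
# five supported languages could be used here.
prompt = "Leo ninataka ku"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding keeps the example deterministic and cheap to run.
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```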
Can AI ever really do speakers of minority languages justice?
AI has already done a great job of entrenching unjust global power structures, deepening economic divides and eroding culture. Linguistic exclusion is just one example; AI bias in image generation, photo recognition and the use of literature as training data without consent are all compounding factors. But the implications stretch still further.
Speakers of minority languages, already struggling for digital inclusion, are often on the frontlines of the climate crisis. And, AI’s hidden ecological toll exacerbates their vulnerability in concerning ways.
Let’s be clear: AI is incredibly resource-intensive. The sophisticated LLMs we marvel at—the very ones that struggle with Xhosa or Yoruba—demand staggering amounts of energy and, as we’re just beginning to understand, water. Data centres, the giant, whirring brains that AIs need to function, require colossal volumes of water for cooling. Reports indicate that global water demand from AI could reach 4.2-6.6 billion cubic metres by 2027. That exceeds 50 percent of the UK’s annual water use in 2023.
Now, consider where these data centres are often built. Regions that can offer vast swathes of land in climates with lower humidity are often the most appealing for data centre planners. These regions also tend to be ones that are already water-scarce. This means that AI’s insatiable thirst is directly competing with local communities for vital water resources, exacerbating existing droughts.

In 2004, campaigners in Uruguay successfully fought for the right to fresh drinking water. Two decades later, they’re invoking that very legislation to challenge Google’s new data centre, projected to consume an estimated two million gallons of water daily, all while Uruguay endures its most severe drought in 70 years.
Read more here: Uncovering the Hidden Water Footprint of AI: Solutions for Quenching Its Insatiable Thirst
The mining of rare earth minerals for AI hardware also comes with a heavy environmental and social cost. These essential materials are notoriously difficult to extract and purify, and are often found in countries with weaker environmental regulations and labour protections. Communities living near these mines, often indigenous or minority groups, regularly face land degradation, water contamination and human rights abuses. Much of this harm can be traced directly to the demand for AI hardware, yet the connection is rarely made.
And then, what happens to this hardware once it’s cooked by heat and rendered useless? Why, it’s regurgitated as e-waste back into low-resource communities in the Global South, of course.
Let that sink in. Whilst it’s undoubtedly the Global North that develops, uses and benefits from AI, the finite resources hacked out of the land in the Global South come back to the region as dumped e-waste, without it benefiting at all.

The engines behind many popular AI applications demand vast quantities of computational hardware for data processing and iterative training. This hardware has an extremely limited lifespan. It’s not uncommon for data centre Graphics Processing Units (GPUs) used for AI to last for just a couple of years.
Read more here: Tomorrow’s AI, Today’s Problem: How Toxic GAI E-Waste Could Engulf the Planet
Is a green, equitable future for AI possible?
A green and equitable future for AI is not only possible but imperative. The current trajectory, where the AI ecosystem often overlooks minority languages, is deeply problematic. It’s not merely an issue of linguistic exclusion; it’s also ecologically exploitative, propelling us toward a future where the advantages of a select few are maintained at the expense of the many—and the planet.
Local-led initiatives offer a powerful and practical starting point for a more sustainable path. While these models “operate at a much smaller scale compared to their global counterparts due to capacity restrictions,” as Ogundepo points out, this very constraint has a surprising benefit: it forces them to be smarter.
“The reality is that limited computational resources force African researchers and developers to prioritise efficiency over scale, which could lead to innovations in model optimisation and resource-limited approaches. This necessity-driven focus on efficiency could serve as a valuable model for the global AI community to adopt greener practices.”
However, for a truly sustainable and just future, AI development needs a revolution. There are various perspectives on how to achieve this, or whether it’s even possible. Ultimately, AI must be decolonised from Western hegemony, democratised so that its benefits are fairly distributed, and held accountable for its ecological footprint. Time will tell whether the claws of Big Tech will loosen enough to allow this to happen.