Will training AI to be more “African” undermine our literary ownership?

20 May, 2024


By Seth Onyango, bird story agency

Recent pushback by Singaporean writers against state efforts to use their work for AI training has cast the spotlight on a similar conundrum in Africa.

As efforts to imbue Large Language Models (LLMs) with a deeper understanding of African perspectives and narratives gain traction, the African literary community stands at a crossroads.

Will they protect their intellectual property like their Singaporean counterparts, or seize the opportunity to influence the burgeoning field of AI? Is it possible to do both?

Africa boasts a vibrant literary tradition that has significantly influenced global literary circles.

Prominent African writers such as Chinua Achebe, Wole Soyinka, Chimamanda Ngozi Adichie, and Ngũgĩ wa Thiong’o have garnered international acclaim, contributing profoundly to world literature.

Their works offer rich, nuanced portrayals of African life, culture, and history, challenging stereotypes and broadening the global understanding of the continent.

This rich literary landscape presents a unique opportunity for Africa to shape the future of AI. By incorporating African narratives into LLMs, AI systems can become more inclusive and representative of the diverse human experience. However, this endeavour is not without its challenges and controversies.

Miguel Botero, Director of Social Impact at Biografika, highlights a critical disparity in AI training data. “African languages make up only about 0.1% of the languages represented on the internet,” Botero told bird story agency.

“That means anything you ask an LLM will be driven by data predominantly from the global north, marginalizing Africa’s influence in the responses these models generate.”

Africa, home to roughly a third of the world’s languages, represents an unparalleled linguistic diversity. Approximately 2,000 languages are spoken across the continent, 75 of them by populations exceeding one million people.

However, many African languages remain oral rather than written, posing significant challenges in developing digital databases and training LLMs, which require large amounts of high-quality data.

This complexity, coupled with high costs, threatens to skew representations of global knowledge and experiences. Without significantly more training on African content, including nuanced narratives that represent a fast-modernising Africa, there is a danger that LLMs will skew narratives of Africa in much the same way that international media has in the past.

Despite the challenges, the integration of African perspectives into LLMs presents enormous opportunities. By contributing their work to AI training, African writers can help shape AI technologies that accurately reflect their cultures and societies. This could lead to more inclusive and representative AI systems, fostering a global understanding that truly encompasses the diversity of human experiences.

Botero posits that addressing this disparity is not only crucial for AI’s effectiveness but also for rectifying the narrow, often inaccurate portrayal of cultures from the global south.

“The current imbalance perpetuates a vision of humanity that is heavily influenced by data from the global north,” he noted. “This limits the AI’s ability to accurately represent and respond to the diverse cultures and societies of the global south.”

Yet African content producers, much like the Singaporean literary community, are increasingly wary of how their works might be used. In Singapore, writers such as Gwee Li Sui have voiced strong opposition to government plans to train AI on their publications without clear assurances on copyright protection and compensation.

This resistance is part of a global trend where creators are pushing back against the use of their works to train AI technologies without proper consent or remuneration.

In Africa, the stakes are similarly high. A lack of clear guidelines on copyright protection and the absence of detailed plans for how work might be safeguarded add to the general apprehension.

The situation in Singapore offers a cautionary tale. Plans by the Singaporean government’s National Multimodal LLM Programme (NMLP) to train AI systems on local literary works to better reflect the nation’s history and culture received an overwhelmingly negative response, due to concerns over intellectual property rights and the lack of consultation.

African writers are similarly wary of the impact of AI, particularly when it comes to representation.

“Technology can be used as a solution to those issues that have plagued Africa for so long. But technology can also be a monster. It can be a beast, a source of destruction,” Nnedi Okorafor, a Nigerian-American writer known for her Africanfuturist works, told Huawei Editor-in-Chief Gavin Allen at the 81st World Science Fiction Convention (Worldcon) last November.

“AI can’t create those things that are from the human heart. They can only create what’s already out there. I know the way I write, the reason that I write, and where it comes from. That’s not something that AI can imitate. It’s not possible,” she added.

For African AI initiatives to succeed, developers will be forced to address the concerns of the literary community head-on.

This means ensuring robust copyright protections, providing fair compensation, and engaging in transparent dialogue with writers and publishers.

Without these assurances, experts warn, the risk of alienating the very creators whose works are essential to building culturally relevant AI systems remains high.

As Peter Schoppert, director of National University of Singapore Press, noted, there are still many “gray areas” in the legal status of training LLMs on copyrighted content. The African literary community, together with government, will have to navigate these complexities carefully to avoid potential pitfalls.

The dilemma facing African writers and content producers is reflective of a broader global debate about the role of intellectual property in the age of AI. As efforts to build LLMs that reflect African perspectives gain traction, the literary community’s response will be crucial in shaping the future of these technologies.

Ultimately, the path forward will require a delicate balance between innovation and protection.

bird story agency
