The post explores AI’s impact on translating internet freedom resources from the perspective of organizations such as Localization Lab. While AI offers faster translations, it struggles with minority languages due to limited data. The blog post highlights the risk of bias and inaccuracy in these translations, potentially harming end-users. It calls for regulation, bias reduction, and ethical data collection, emphasizing community involvement and open-source tools to ensure equitable localization for all languages.
We have been hearing people in the community ask about AI in the work that we do: whether we will be incorporating it, how we would do so, or simply for an opinion on its potential impact.
So, let’s talk about AI – it’s long overdue anyway. It’s everywhere these days, literally being treated as this shiny new toy promising to change how we do, well, pretty much everything. One area where it’s making inroads is localization – translating, in our case, tools and resources geared towards enhancing Internet Freedom so they can be understood in different languages and within local contexts.
Sounds great, right? Faster translations, potentially reaching more people quicker… but what about the smaller languages, the ones not spoken by millions? That’s where things get tricky. You see, AI, especially Large Language Models (LLMs) like those from Big Tech, learns by gobbling up tons of text data. The more data a model has for a language, the better it gets at translating it. Languages like English, Spanish, or Chinese are already well represented, so the AI can do a decent job. But what about, say, Shona, Zulu, Nambya, or even smaller regional dialects? There just isn’t as much data out there for them, and this is where the issue lies.
If AI systems are trained primarily on major languages, they will most likely struggle with minority languages on the basis of data availability alone. What’s even more worrying is that they might introduce errors or biases from the dominant languages into the translations of minority ones. Imagine a crucial healthcare document translated into a minority language with AI, but containing inaccuracies because the AI was “influenced” by its training in a more common language. The consequences could be detrimental to both end users and the owners of the document.
Article by Chido Musodza
Programme Associate
Localization Lab
So, what can we do? Localization Lab, along with other digital rights and internet freedom advocates, has a big role to play. We need to start asking questions around the following areas:
Regulation: How can we encourage the responsible use of AI in localization? Do we need guidelines or regulations to make sure minority languages aren’t left behind?
Addressing Bias: Big Tech’s LLMs often carry biases stemming mainly from the people responsible for training the platforms, and these biases are usually shaped from the perspective of the “middle-aged white man” who makes up the bulk of the demographic working within Silicon Valley. How can we push for more inclusive training data and algorithms that don’t perpetuate these biases in translations?
Data: More data for minority languages! But how do we collect it ethically, respecting privacy and community ownership? Can we create secure datasets that AI can learn from without compromising sensitive information or undercutting the people who play a major role in collecting the data?
Community-driven data collection: Finding ways to create or support initiatives where communities themselves contribute to language datasets, ensuring their voices and nuances are captured accurately.
Human-in-the-loop systems: Even with AI, we still need human translators, especially for minority languages, to review and correct the AI’s output and ensure it is within the correct context. This ensures quality control and cultural sensitivity.
Open-source tools and resources: Promoting the development of open-source AI tools specifically designed to support minority languages. This can help reduce reliance on Big Tech’s potentially biased systems.
This isn’t about rejecting AI altogether; we cannot do that, unfortunately, as it’s a revolution that has already arrived, and it has the potential to be a powerful tool for localization. It’s about making sure that these tools are used responsibly, fairly, and in a way that benefits all languages, not just the dominant ones. We need to be proactive, raise our voices, and work together to ensure that the future of localization is inclusive and equitable. Let’s get the conversation going.
What are your thoughts?