A world where language isn't a barrier
In recent years, the popularity and quality of translation engines have increased enormously. With a few clicks of the mouse, a text can be translated into good-quality English, Russian or Mandarin in a matter of seconds. With some languages, however, things aren't so easy. Many languages have never been digitised, for the simple reason that there's no commercial incentive. That can create problems for refugees trying to start new lives in unfamiliar surroundings. The Travis Foundation wants to change that. Its mission is to promote equal opportunities for all through language digitisation. Digitisation of the Tigrinya language is the foundation's first project. Cornelis Jansen, the Travis Foundation's Managing Director, talks about the digitisation process, Travis's ambitions and collaboration with SIDN Fund.
The Travis Foundation was set up in 2017 by Travis B.V., a Dutch company that makes a handy translation device. You speak into the gadget, and it immediately responds with audio in your chosen target language. "The translator came out in 2017 and was a big hit," says Cornelis. "We soon had Dutch government agencies and aid organisations getting in touch to ask whether we could add support for Tigrinya." Europe alone has about half a million Eritrean refugees, for whom digitisation of their native Tigrinya would be a big help. Tigrinya is spoken in Eritrea and the north of Ethiopia. "If a language hasn't been digitised, it isn't available in applications such as Google Translate or any of the other language and translation tools," Cornelis explains. Ultimately, what the Travis Foundation wants is to remove language-based communication barriers. Its vision is a world where everyone understands everyone else, regardless of their mother tongue. As Cornelis emphasises, "Lowering language barriers promotes integration. So addressing the language problem enables you to get to grips with a host of other problems."
How does it work?
"Digitisation of a language has three stages. First, you have to gather a vast amount of data. That's because you need a 'corpus' -- a huge body of sentences about all sorts of different things -- in order to achieve the fullest possible coverage of all the ways the language can be used. The sentences have to be in matching pairs -- in this case Tigrinya and English. We currently have a corpus of about sixty to eighty thousand sentence pairs, and it's growing every day. The next stage involves feeding the paired sentences into the computer and applying machine learning technology. That means getting the computer to identify patterns of correlation, without any knowledge of the grammar. Basically, the computer teaches itself the language. The product of the machine learning process is algorithms, which can be used in stage 3 to power a translation engine, such as Google Translate," Cornelis adds. Going through the three stages will enable the Travis Foundation to offer text-to-text translation.
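To give a feel for what "identifying patterns of correlation, without any knowledge of the grammar" can mean, here is a minimal Python sketch. It is not the Travis Foundation's method: real translation engines use far more sophisticated machine learning, and this toy swaps in a simple co-occurrence score (the Dice coefficient) instead. The four sentence pairs are invented for illustration, and the target-side tokens ("aaa", "bbb" and so on) are placeholders, not real Tigrinya. The function names are also hypothetical.

```python
from collections import Counter

# Toy parallel corpus of (English, target) sentence pairs.
# The target tokens are invented placeholders, NOT real Tigrinya.
corpus = [
    ("water is cold", "aaa bbb"),
    ("water is good", "aaa ccc"),
    ("bread is good", "ddd ccc"),
    ("bread is cold", "ddd bbb"),
]


def dice_table(pairs):
    """Score every (source word, target word) pair by how often the two
    words appear in matching sentences, relative to how often each
    appears overall (Dice coefficient, between 0 and 1)."""
    src_count, tgt_count, cooc = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        cooc.update((s, t) for s in src_words for t in tgt_words)
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in cooc.items()
    }


def best_match(word, table):
    """Return the target token that correlates most strongly with `word`."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get)


table = dice_table(corpus)
print(best_match("water", table))  # aaa
print(best_match("cold", table))   # bbb
```

Even with four sentence pairs and no grammar rules, the counts alone link "water" to "aaa" and "cold" to "bbb". That also hints at why the corpus has to be so large: every word and turn of phrase needs to turn up in enough different sentences for such patterns to stand out.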
The Sentence Society
Digitising a language is a complex and time-consuming business. So Cornelis and his team came up with an innovative approach. "Initially, we hired a number of Eritrean refugees to help us translate as many sentences as possible," says Cornelis. "It was great fun working together and a great way to get to know the people and their culture. But it turned out to be a very inefficient way of building our corpus. So we decided to use our Eritrean colleagues as community ambassadors to recruit people from all over the world to contribute to the corpus-building process." At the beginning of March, the Travis Foundation launched The Sentence Society to accelerate the translation process. The Sentence Society is a game, through which people anywhere who speak both English and Tigrinya can lend a hand. "We present the 'player' with an English sentence and ask them to provide a translation in Tigrinya. That's because translating from your second or third language into your native language is easier than doing it the other way around. After submitting their own translation, the player is asked to rate two other translations by giving each of them one, two or three thumbs up or thumbs down. It means players can score other translations anonymously, providing us with a peer-review-based quality assurance tool," explains Cornelis. Any translation that gets a thumbs down or just one thumb up from more than two players is removed from the database.
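The removal rule can be written down in a few lines of Python. This is only a sketch of the rule as described in this article, not the game's actual code. It assumes that each rating is stored as a whole number from -3 to 3 (negative for thumbs down, positive for thumbs up, never zero) and that "more than two players" means at least three low ratings. The function names are hypothetical.

```python
def is_low_rating(rating):
    """A rating counts as low if it is any number of thumbs down
    (negative) or exactly one thumb up (+1)."""
    if rating == 0 or not -3 <= rating <= 3:
        raise ValueError("rating must be between -3 and 3, excluding 0")
    return rating <= 1


def should_remove(ratings, max_low_ratings=2):
    """Return True when more than `max_low_ratings` players gave the
    translation a low rating (assumed reading of the rule)."""
    low = sum(1 for rating in ratings if is_low_rating(rating))
    return low > max_low_ratings


print(should_remove([3, 1, -2]))      # False: only two low ratings
print(should_remove([3, 1, -2, -1]))  # True: three low ratings
```

Counting low ratings rather than averaging all scores means that a few enthusiastic ratings cannot rescue a translation that several players have flagged as poor.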
Financial support by SIDN Fund
Realisation of the game was made possible partly by a grant from SIDN Fund. "An acquaintance told me about SIDN Fund," recalls Cornelis. "And I was delighted to find that the Fund liked the sound of our initiative. The grant enabled us to balance the budget for the project, realise the game and raise the profile of our activities. It wasn't only financial support that we received, though: the Fund also gave us access to its network. For example, when a legal question cropped up about ownership of the translations, SIDN Fund referred us to a specialist lawyer for advice. They also ran a brainstorming session on follow-up funding."
Marieke van der Kruijs, Project Coordinator at SIDN Fund: "The use of machine learning for translation isn't itself anything new, but using it for minority or threatened languages such as Tigrinya is an innovation. That's one of the aspects that attracted us to the project. It has demonstrable social significance. If it succeeds, it'll be a valuable addition to the public domain and it can be extended to support other minority and threatened languages. So there's ample scope for innovation. What's more, there's the prospect of improving life for a significant group of new arrivals in the Netherlands."
At the start of April, Cornelis hopes to bring out the first version of Travis's translation device. However, that doesn't represent the end of the road for the Foundation. The next step is to add voice functionality. "We want to offer spoken translations as well," says Cornelis. "To do that, we need to collect about two hundred hours of spoken language. That'll then be processed by self-teaching computer systems, in much the same way as the written language. We'd also like to build a smartphone app, so that you can say an English sentence, and the app will come back with the equivalent in Tigrinya."
Many more languages remain to be digitised as well. "From our contact with refugee organisations, we know that there's great demand for the digitisation of Kurmancî, for example. That's one of the most widely spoken Kurdish languages. Then there's Pashto, which is used mainly in Afghanistan and Pakistan." The Travis Foundation believes that everyone should have the right to be understood, so there's a very long way to go with its mission to bring down language barriers.
Want to know more about the Travis Foundation initiative? Visit https://travis.foundation/.