Translation between languages, whether by human or machine, is an inherently lossy process, converting concepts, structures, narrative devices and even latent emotions that may have no ready counterpart in the target language. Machine translation has made enormous strides in recent years with the transition from SMT (statistical machine translation) to NMT (neural machine translation) and the introduction of advanced architectures like Transformers, but the results are still heavily dependent on the training data.
To illustrate how differences in training data can affect the resulting translations, we started with the following English sentence, which Google Translate produced from a Spanish-language article in El Heraldo:
- "The UN Human Rights Office in Colombia also spoke out about the strikers saying that we follow up on their situation and continue to call for spaces for dialogue."
We then used Google Translate to translate the sentence above into each of the languages below. For each language, we verified that translating the result back into English with both Google Translate and Microsoft Bing Translate (where it supported the language) yielded an identical or highly similar version of the original English sentence that included all of the key details. We also manually spot-checked the phrase against a translation guide to ensure it retained those details. Finally, we translated each Google Translate output into English using a prominent Transformer-based pretrained open NMT system.
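The reverse-translation check described above could be approximated automatically along the following lines. This is only a minimal sketch under stated assumptions, not the procedure actually used here: the function names, the word-level comparison, and the 0.8 threshold are all hypothetical choices, and the back-translated text is assumed to have already been obtained from whichever translation service is being verified.

```python
from difflib import SequenceMatcher

# The original English sentence from the article.
ORIGINAL = (
    "The UN Human Rights Office in Colombia also spoke out about the "
    "strikers saying that we follow up on their situation and continue "
    "to call for spaces for dialogue."
)


def tokenize(text):
    """Lowercase the text and split it into whitespace-separated words."""
    return text.lower().split()


def round_trip_similarity(original, back_translation):
    """Return a 0..1 word-level similarity between two sentences."""
    matcher = SequenceMatcher(
        None, tokenize(original), tokenize(back_translation), autojunk=False
    )
    return matcher.ratio()


def passes_round_trip(original, back_translation, threshold=0.8):
    """Hypothetical acceptance rule: similarity at or above the threshold."""
    return round_trip_similarity(original, back_translation) >= threshold
```

A score like this only measures surface overlap, so it complements rather than replaces the manual spot check: a back translation can score highly while still dropping a key detail such as the country name.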
The languages with the highest-quality translations according to human review are listed below. All of them contain the key details about the UN Human Rights Office in Colombia monitoring strikers and calling for dialogue, with the exception of the Georgian translation, which lacks any mention of Colombia despite the Google Translate sentence clearly containing the Georgian word for Colombia ("კოლუმბიაში"). At the same time, each of the sentences below expresses the text slightly differently, with Bengali notably placing Colombia before the UN in its sentence ordering.
Of course, it is important to recognize that these different formulations represent the combined impact of Google Translate rendering the sentence into the target language and the open NMT model translating it back into English, so the differences are not exclusively the result of the open NMT model. Regardless, these slight variations remind us of the lossy nature of translation and the myriad ways of saying the same thing.
Afrikaans | The UN Human Rights Office in Colombia also spoke about the strikers, saying that we follow up their situation and keep asking for room for dialogue. |
Albanian | The UN Office for Human Rights in Colombia also spoke of strikers saying that we follow their situation and continue to call for space for dialogue. |
Arabic | The United Nations Office for Human Rights in Colombia also spoke of the strikers, saying that we are following their situation and continuing to advocate for spaces for dialogue. |
Armenian | The U.N. government authorities of the United Nations also publicly said we're following their situation, and we're still calling the places of communication. |
Bengali | In Colombia, the UN Human Rights Office also told the strikers that we are following their situation and calling for a place to dialogue. |
Bulgarian | The United Nations Human Rights Office in Colombia also speaks of strikers, saying that we are monitoring their situation and continue to call for a space for dialogue. |
Chinese Simplified | The United Nations Human Rights Office in Colombia also addressed the strikers, saying that we would follow up on their situation and continue to call for space for dialogue. |
Chinese Traditional | The United Nations Human Rights Office in Colombia has also spoken to strikers, claiming that we will follow their lead and continue to call for dialogue space. |
Czech | The United Nations Human Rights Office in Colombia also spoke of the strikers and said that we were monitoring their situation and continued to call for a space for dialogue. |
Danish | The UN Human Rights Office in Colombia also spoke about the strikes and said that we are following up their situation and continuing to call for room for dialogue. |
Dutch | The UN Human Rights Office in Colombia also expressed its views on the strikers and said that we are monitoring their situation and continue to call for room for dialogue. |
Estonian | The UN Human Rights Office in Colombia also spoke about strikers, saying that we will monitor their situation and continue to call for dialogue rooms. |
Finnish | The UN Human Rights Office in Colombia also spoke about the strike that we are monitoring their situation and continue to call for places for dialogue. |
French | The United Nations Human Rights Office in Colombia also spoke about strikers, saying that we are monitoring their situation and continue to call for spaces for dialogue. |
Galician | The Rights Office of the U.N. at Colombia also spoke about the strikers saying that we followed their situation and we kept asking for spaces for the dialogue. |
Georgian | The United Nations Office of Human Rights also spoke of the possibility that we observe their situation and continue to call for a dialogue. |
German | The UN Human Rights Bureau in Colombia also spoke about the strikers and said that we are continuing to pursue their situation and continue to call for spaces for dialogue. |
Hungarian | The United Nations Office on Human Rights in Colombia also spoke about the strikers, saying that we are monitoring their situation and that we are still asking for a dialogue. |
Icelandic | The United Nations Office of Human Rights in Colombia also reported that the strikers said we watch their circumstances and continue calling for room for discussion. |
Indonesian | The United Nations Human Rights Office in Colombia also spoke of the strikers who said that we followed up on their situation and kept calling for room for dialogue. |
Irish | The United Nations Office for Human Rights in Colombia spoke about the strikers saying that we continue with their situation and continue to demand spaces for dialogues. |
Italian | The United Nations Human Rights Office in Colombia also spoke of the strikers saying that we follow their situation and continue to call for spaces for dialogue. |
Korean | The United Nations Human Rights Office in Colombia also said that we're going to keep track of the situation and keep asking for space to talk about the strike. |
Latvian | The UN Human Rights Office in Colombia also spoke of strikers saying that we are following their situation and continuing to call for the creation of spaces for dialogue. |
Macedonian | The UN Office for Human Rights in Colombia also spoke about strikers saying that we are following their situation and continue to call for room for dialogue. |
Maltese | The UN Human Rights Office in Colombia also spoke of the attackers and said that we follow their situation and continue to call spaces for dialogue. |
Polish | The United Nations Human Rights Office in Colombia also spoke about strikers who said that we were following their situation and we continue to call for space for dialogue. |
Russian | The United Nations Office for Human Rights in Colombia has also referred to strikes, stating that we are following their situation and continue to call for dialogue. |
Slovak | The United Nations Human Rights Office in Colombia has also spoken of the strikers, which said that we are following their situation and continue to call for a space for dialogue. |
Swedish | The UN Human Rights Office in Colombia also talked about the strikes saying that we are following up their situation and continuing to call for spaces for dialogue. |
Turkish | The UN Office for Human Rights in Colombia also talked about the strikers and said we were monitoring their situation and continued to demand space for dialogue. |
Ukrainian | The UN Office on Human Rights in Colombia also reported on the strikers, stating that we are watching their situation and continue to call for a dialogue space. |
Vietnamese | The U.N. Human Rights Office in Colombia also spoke up about the strikers saying that we were following their situation and continuing to call for space for conversation. |
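The review of key details above was done by hand, but a rough automated screen can flag candidates for closer inspection. The sketch below is an illustration only: the `KEY_DETAILS` table and its word stems are assumptions chosen for this one sentence, and simple substring matching will miss paraphrases that a human reviewer would accept.

```python
# Hypothetical key details for the example sentence, each with the
# lowercase stems that count as a mention of that detail.
KEY_DETAILS = {
    "Colombia": ("colombia",),
    "strikers": ("striker", "strike"),
    "dialogue": ("dialogue", "dialog"),
}


def missing_details(translation):
    """Return the names of key details with no matching stem in the text."""
    lowered = translation.lower()
    return [
        name
        for name, stems in KEY_DETAILS.items()
        if not any(stem in lowered for stem in stems)
    ]


afrikaans = (
    "The UN Human Rights Office in Colombia also spoke about the strikers, "
    "saying that we follow up their situation and keep asking for room for "
    "dialogue."
)
georgian = (
    "The United Nations Office of Human Rights also spoke of the possibility "
    "that we observe their situation and continue to call for a dialogue."
)

print(missing_details(afrikaans))  # []
print(missing_details(georgian))   # ['Colombia', 'strikers']
```

A screen like this cannot judge fluency or word order, so it is best treated as a first pass that narrows down which translations need human review.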
Conversely, the following translations of the same sentence have lower fidelity, are missing key details or are less fluent. In some cases the automatic language detection of the NMT model failed to correctly recognize the language, forcing it to be manually specified. Many of the errors below clearly result from incorrect translation of noun phrases, suggesting the model's dictionary did not encompass the words in question. This is despite the sentence being selected specifically for its use of common words and a so-called "well-known entity" of the kind that is typically well represented in the bitext and monolingual data used to train open translation models.
Notably, most of the languages below have less training data than those above. One notable exception is Japanese, for which the translation is largely fluent with the exception of the mistranslation of "strikers" into "strike car," suggesting that particular word was not well-represented in its training data.
Basque | The United States Bureau of the United States of the United States of the United States spoke of the strikers also and said that we track their situation and that we are still asking for space for the interview. |
Catalan | The United Nations Human Rights Office in Colombia also spoke about the vains saying that we follow their situation and continue demanding spaces for the dialogue. |
Cebuano | The UN Human Rights Office of Colombia also spoke of welists speaking that we would follow up with their plight and continue to call for the spaces for the dial. |
Chichewa | The UN Human Rights Office in Colombia also said of those who have left behind that we are following what they have met and getting a place to talk. |
Filipino | The UN Human Rights Office in Colombia also spoke of welgists who claimed that we were going to monitor their situation and keep calling for gaps for a trickologist. |
Hausa | The world's staff of the World's Service Committee on earth in SSyct and speaks of its workers as he thinks we are following their traits and continue to call them where we talk. |
Hindi | In Colombia the United Nations Rights office also spoke of strikers saying we follow their position and ask for space for conversation. |
Japanese | The United Nations Human Rights Agency in Colombia also said that regarding the strike car, we are following up their situation and continuing to seek space for dialogue. |
Malayalam | The U.S. government of the United States spoke about human rights, asking for places where we would follow their position and talk. |
Punjabi | The United Nations rights authority in Colombia also said that we follow their position and require places for communication. |
Samoan | The UN Human Rights Office, also spoken of as opposers, told us to keep their position and keep calling for opportunities to open up. |
Sesotho | The UN Human Rights Office in Colombia also spoke of opposers saying that we are following their position in search of interviews. |
Shona | The UN Human Rights Office office in Colombia also spoke of demonstrations saying that we test their situation and keep shouting places for communication. |
Urdu | In Colombia, the United Nations Human Rights Department spoke to the United Nations Department, saying that we are doing their situation and ask for a conversation. |
Welsh | The United States spoke of the UN Rights were also raising because we were returned to their situation and continued to call a dialog for a dialog. |
Xhosa | The Office of the UN Human Rights in Colombia has also said about the protesters that we are following their situation and continue to call the parts of the debate. |
Yoruba | THAT is how Columbiao THAT is said to have been published by the IMOO OAMO OAA OAGA.. |
Hopefully this has offered a useful look at the myriad ways one can say the same thing and the impact of training data on translation quality.