Experiments With Machine Translation: When Machines Hallucinate & Write Their Own Stories

Yesterday we presented a number of experiments translating the output of Google Translate back into English through a prominent open NMT translation system. Highly resourced languages, for which there is ample machine-friendly training data, performed very well, largely yielding slight variations on the original input sentence. Languages with far less, and less varied, training data performed more poorly, dropping key nouns or introducing grammatical errors. Yet even the worst of these translations at least bore a passing resemblance to the source text, or contained enough obvious errors to make clear that the sentence lay outside the scope of the model's encoding of that language.
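The original post does not identify the open NMT system used, so as a purely illustrative sketch, the back-translation half of this round-trip pipeline might look like the following, assuming the open facebook/m2m100_418M model from the Hugging Face transformers library as a stand-in (an assumption, not necessarily the system tested here):

    # Illustrative back-translation step, assuming the open facebook/m2m100_418M
    # model via Hugging Face transformers (a stand-in for the unnamed NMT system).
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
    tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

    def back_translate(text: str, src_lang: str) -> str:
        """Translate a Google Translate output back into English."""
        tokenizer.src_lang = src_lang  # e.g. "az" (Azerbaijani) or "mr" (Marathi)
        encoded = tokenizer(text, return_tensors="pt")
        generated = model.generate(
            **encoded,
            forced_bos_token_id=tokenizer.get_lang_id("en"),  # decode into English
        )
        return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

Feeding each Google Translate output through such a function yields back-translations like those examined below.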

For two languages, however, a most interesting phenomenon emerged: the models hallucinated entirely new stories from the input translations.

Recall that our English-language source sentence was:

  • "The UN Human Rights Office in Colombia also spoke out about the strikers saying that we follow up on their situation and continue to call for spaces for dialogue."

Azerbaijani

Azerbaijani translation (Google Translate):

  • "BMT -nin Kolumbiyadakı İnsan Hüquqları Bürosu, vəziyyətlərini izlədiyimizi və dialoq üçün yerlər axtarmağa davam etdiyimizi söyləyən tətilçilər haqqında da danışdı."

NMT model translation of above back into English:

  • "The U.S.A. also spoke of directors who told us that we were able to look for Braille, accommodations, and a dimension."

The Colombian UN Human Rights Office becomes the "USA," "strikers" become "directors," and the remainder of the sentence, about "Braille, accommodations, and a dimension," is fabricated out of thin air. In short, the model took the input sentence and hallucinated an entirely new story from it, though its nonsensical nature offers human readers an immediate clue that something is wrong.

Marathi

Marathi translation (Google Translate):

  • "कोलंबियातील संयुक्त राष्ट्र मानवाधिकार कार्यालयानेही स्ट्राइकर्सबद्दल सांगितले की आम्ही त्यांच्या परिस्थितीचा पाठपुरावा करतो आणि संवादासाठी जागा मागवत राहतो."

NMT model translation of above back into English:

  • "In the United States, for example, a number of children are raised by their parents from the age of 11 and are now serving at the United States branch office of Jehovah's Witnesses in the United States."

This is an extraordinary example of hallucination in action: the sentence above bears no resemblance whatsoever to the source sentence. Yet, unlike the Azerbaijani example above, it is so perfectly fluent that if one were not aware of the source sentence being translated, it would appear to be a nearly flawless translation rather than one invented from whole cloth. Notably, searching Google for the sentence above yields zero results, suggesting it was not simply harvested from the open web (though it may be present in unindexed training data).

Both of these examples unsurprisingly hail from lower-resourced languages for which training data is more limited and less varied. Yet the perfect fluency of the latter offers a textbook warning of the dangers of neural machine translation compared with traditional statistical architectures, and of the need for vigilance when relying on its output.
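One crude, illustrative form of such vigilance (a sketch of our own, not a method from these experiments) is to score a round-trip back-translation against the original source sentence with a simple word-overlap measure; a fluent hallucination like the Marathi example shares almost no vocabulary with its source and is immediately flagged for human review:

    # Crude illustrative sanity check (our own sketch, not part of the original
    # experiments): Jaccard word overlap between a source sentence and its
    # round-trip back-translation; a fluent hallucination scores very low.
    import string

    def word_overlap(source: str, back_translation: str) -> float:
        """Return the Jaccard overlap (0-1) of the two sentences' word sets."""
        strip = str.maketrans("", "", string.punctuation)
        src = set(source.lower().translate(strip).split())
        back = set(back_translation.lower().translate(strip).split())
        return len(src & back) / max(len(src | back), 1)

    source = ("The UN Human Rights Office in Colombia also spoke out about the "
              "strikers saying that we follow up on their situation and continue "
              "to call for spaces for dialogue.")
    hallucinated = ("In the United States, for example, a number of children are "
                    "raised by their parents from the age of 11 and are now serving "
                    "at the United States branch office of Jehovah's Witnesses in "
                    "the United States.")

    # Roughly 0.1: almost no shared vocabulary, so flag for human review.
    print(word_overlap(source, hallucinated))

Such a check is far too blunt to catch subtle mistranslations, but it illustrates how even a trivial automated comparison can surface the wholesale hallucinations that perfect fluency would otherwise hide from a human reader.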