Experiments With Machine Translation: The Perils Of Fluency & Untrained Languages

One of the greatest dangers with neural machine translation (NMT) is the way in which the seemingly human-like fluency of its translations can mask deep flaws in their fidelity to the source material, including outright hallucination. In other words, an NMT system can generate an English translation that appears at first glance to be nearly flawless prose, but that on closer inspection either has absolutely nothing to do with the source material (hallucination) or changes its factual information so egregiously that it is entirely wrong.

This "peril of fluency" is especially dangerous when it comes to translating languages on which the NMT system has not been trained. Take the Abkhazian language, written in Cyrillic script. Given the entry for Dmitry Iosifovich Gulia, one major state-of-the-art NMT system misidentifies the language as Tatar and translates the first few lines as: "On the 21st of July, 1874, Drymith Iosif-i-Galiya was sentenced to life imprisonment. The issue's end has the recaptured Doomsday in the control of the KGB again and again. He was diagnosed with cancer in 1877, and is survived by his wife, a son-in-law, a son-in-law, a son-in- law , a son-in- law , a son-in- law , a son-in-law. In the wake of the Russian uprising, he was released on bail in 1878. "It simply came to our notice then." Other lines include ""The rescue squad wasn't called for him," she told the Associated Press. "Spanish" jagiit sandui sabi. Sarah's mother-in-law, Sarah, has been charged with felony criminal mischief for firing on a sculpture with a shotgun. I call it the Sage Sage Sage. " Middle – Ivani Ekaterinei cancer."

At first glance, this appears to be an extremely poor but serviceable translation that at least captures some basic details and the general gist of his life. Yet a closer look shows that the date should have been February 21st rather than July 21st, that the opening sentence refers to his birth rather than to his being "sentenced to life imprisonment," that the reference to the "KGB" is in fact a hyperlink to the entry for the nation of Turkey, and so on.

Thus, what at first glance appears to be a heavily stilted but serviceable translation offering the basic gist of the person's life actually bears no resemblance to the factual details of their entry. The translation tells of a criminal sentenced to life imprisonment on his birthday for capturing Doomsday on behalf of the KGB, who died of cancer three years later, leaving behind a wife and extended family, and who was then released on bail a year after dying, after being noticed. His actual life has nothing in common with this story.
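The run of repeated "a son-in-law" phrases in the output quoted earlier is one symptom of hallucination that can be flagged mechanically. The following is a minimal sketch of such a check, not a method used in these experiments: the function name `longest_repeat_run`, its `max_phrase_len` parameter, and the shortened sample sentence are all hypothetical choices made for illustration. It only measures how many times a short phrase repeats back to back.

```python
import re
from typing import Tuple


def longest_repeat_run(text: str, max_phrase_len: int = 4) -> Tuple[str, int]:
    """Find the phrase of 1..max_phrase_len tokens with the longest
    run of immediate, back-to-back repetitions.

    Returns (phrase, run_length). A run_length of 1 means that no
    phrase is repeated consecutively, and the phrase is then "".
    Hypothetical helper, written for illustration only.
    """
    # Lowercase and keep hyphenated words such as "son-in-law" whole.
    tokens = re.findall(r"[\w-]+", text.lower())
    best_phrase, best_run = "", 1
    for n in range(1, max_phrase_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = tokens[i:i + n]
            run = 1
            j = i + n
            # Count how many times the same n tokens follow immediately.
            while tokens[j:j + n] == phrase:
                run += 1
                j += n
            if run > best_run:
                best_phrase, best_run = " ".join(phrase), run
    return best_phrase, best_run


# Shortened, whitespace-normalized excerpt of the output quoted above.
sample = (
    "He is survived by his wife, a son-in-law, a son-in-law, "
    "a son-in-law, a son-in-law, a son-in-law, a son-in-law."
)
print(longest_repeat_run(sample))  # -> ('a son-in-law', 6)
```

A long run like this is a reason to distrust a translation, but the absence of one proves nothing. The most damaging errors described above, such as the wrong month and a birth rendered as a life sentence, contain no repetition at all and would pass this check untouched.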