Experiments With Machine Translation: Loanwords

Loanwords are an often underappreciated feature of the world's languages when it comes to news analysis, especially for names and fast-breaking events, where a name from one language may be reproduced without transliteration in another, including one written in an entirely different script. Sometimes a loanword persists indefinitely without formal adaptation into the language; other times it is entirely displaced by a replacement term; and in some cases both the loanword and the new word continue to be used.

In simple terms, this can be seen in the use of Arabic numerals in online news coverage written in non-Latin scripts, in place of those scripts' own numerals, a trend visible in the Global Numeric Graph. Some languages, like English, rarely incorporate loanwords from non-Latin scripts: finding an Arabic, Burmese, Chinese or Russian name in its original script in an English article is quite rare, even though names from Latin-script languages are routinely reproduced as-is. Instead, such names are almost always transliterated into English. In contrast, untransliterated Latin-script names are often found in text in other languages, posing unique challenges for both language identification and translation, since the translation graph must incorporate the loanword into its understanding of the surrounding native-language content in order to fully understand the sentence.
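
To make the identification challenge concrete, here is a minimal sketch of one way untransliterated Latin-script names might be spotted inside text whose dominant script is not Latin. It is an illustration only, not the method used in these experiments: the function names and the "more than half the letters" threshold are assumptions, and script membership is approximated by checking whether a character's Unicode name begins with "LATIN".

```python
import unicodedata
from typing import List


def is_latin_letter(ch: str) -> bool:
    """Approximate script test: a letter whose Unicode name starts with 'LATIN'."""
    return ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")


def embedded_latin_tokens(text: str) -> List[str]:
    """Return runs of Latin letters found in text that is mostly non-Latin.

    Hypothetical helper: if at least half of the letters are Latin, the text is
    treated as a Latin-script document and nothing is reported.
    """
    letters = [ch for ch in text if ch.isalpha()]
    latin_count = sum(is_latin_letter(ch) for ch in letters)
    if not letters or latin_count * 2 >= len(letters):
        return []

    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        if is_latin_letter(ch):
            current.append(ch)
        elif current:
            # Any non-Latin character (including hyphens and spaces) ends a run.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# A Russian sentence containing two untransliterated Latin-script names.
print(embedded_latin_tokens("Компания Google представила новый Pixel"))
```

A check like this only flags candidate loanwords by script. It cannot tell which language a Latin-script token comes from, and it says nothing about how the token should be handled in translation, which is where the harder problems described above begin.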