Experiments With Machine Translation: The Difficulties In Teaching Neural Machine Translation New Words

Despite their enormous fluency advantages over traditional Statistical Machine Translation (SMT) systems, Neural Machine Translation (NMT) systems have a critical weakness: the extreme difficulty of adding new words and phrases to their vocabularies to keep pace with the world's ever-changing linguistic landscape. Unlike SMT systems, whose models can be readily expanded over time to absorb new terminology, neural models are extremely brittle. Some architectures can, in theory, be adjusted to learn new words, but the addition of a single word or context can have a catastrophic impact on the model: in some cases it can cause an entirely unrelated region of the vocabulary to drop dramatically in performance or even be forgotten entirely. The impact of a vocabulary addition is almost impossible to predict, and many architectures do not support ongoing additions at all, requiring a full retraining instead.
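
To make the failure mode concrete, consider how a fixed subword vocabulary handles a term coined after that vocabulary was frozen. The sketch below is purely illustrative: the tiny vocabulary and greedy matcher are invented stand-ins for the far larger BPE/SentencePiece vocabularies that real NMT systems learn once, before training, and never update.

```python
# Illustrative only: a toy vocabulary and greedy longest-match splitter
# standing in for a real learned (and frozen) subword vocabulary.
VOCAB = {"co", "vid", "corona", "virus", "19", "-", "news", "today"}

def greedy_subword_split(word: str) -> list[str]:
    """Split a word into the longest in-vocabulary pieces, falling back
    to single characters for anything the vocabulary has never seen."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest match first
            if word[i:j].lower() in VOCAB:
                pieces.append(word[i:j].lower())
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character passes through alone
            i += 1
    return pieces

# A post-freeze coinage shatters into fragments the model has only ever
# seen in unrelated contexts, so the decoder may render each fragment
# with an unrelated word ("co"+"vid" -> "cow"+"video", and so on).
print(greedy_subword_split("Covid-19"))     # ['co', 'vid', '-', '19']
print(greedy_subword_split("coronavirus"))  # ['corona', 'virus']
```

Because "corona" literally means "crown," a model that never saw the full compound in training has no reason to treat "corona" + "virus" as a single new entity rather than two familiar pieces.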

The end result is that today, nearly two years after the start of the Covid-19 pandemic, many of the world's most advanced commercial and research NMT architectures still do not accurately translate words relating to Covid-19, even in high-resource languages with vast amounts of pandemic-related training material. Mainstream Thai news mentions of "Covid-19" are still frequently translated as "cow video 19," while Chinese news mentions of "coronavirus" are frequently rendered as "crown virus" or "royal virus," to name but two common examples.

As machine translation continues to seep into daily life, there is a critical need for new architectures that can readily and robustly learn new vocabulary and contexts over time in a stable, consistent manner, one that permits purely additive (rather than today's subtractive) learning.

These needs are among the reasons that Translingual 2.0 uses a hybrid neural-statistical architecture that allows the system to continually learn new vocabulary in an additive fashion, with a separate high-order semantic representation of entities providing a secondary layer of translation robustness.
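
The internal design of Translingual 2.0 is not spelled out here, but the following sketch shows one generic way a hybrid system can make vocabulary learning purely additive: a statistical glossary layer masks known entities before the neural pass and restores curated translations afterward. Every name in it (GlossaryOverlay, the toy backend) is hypothetical, invented solely for illustration.

```python
# Hypothetical sketch, NOT the actual Translingual 2.0 design: a glossary
# layer wrapped around a black-box neural backend. Adding an entry never
# touches the neural weights, so vocabulary learning is purely additive.
from typing import Callable

class GlossaryOverlay:
    def __init__(self, backend: Callable[[str], str]):
        self.backend = backend          # any black-box NMT system
        self.glossary: dict[str, str] = {}

    def add_term(self, source: str, target: str) -> None:
        self.glossary[source] = target  # additive update: no retraining

    def translate(self, text: str) -> str:
        # Shield glossary terms behind placeholders the model passes through.
        restore = {}
        for n, (src, tgt) in enumerate(self.glossary.items()):
            if src in text:
                token = f"__ENT{n}__"
                text = text.replace(src, token)
                restore[token] = tgt
        out = self.backend(text)
        # Swap the curated translations back into the neural output.
        for token, tgt in restore.items():
            out = out.replace(token, tgt)
        return out

# Toy backend that word-swaps a Thai phrase, mangling the unseen term.
toy_nmt = lambda s: (s.replace("ข่าว", "news")
                      .replace("โควิด-19", "cow video 19")
                      .replace("วันนี้", "today"))

mt = GlossaryOverlay(toy_nmt)
print(mt.translate("ข่าว โควิด-19 วันนี้"))  # 'news cow video 19 today'
mt.add_term("โควิด-19", "Covid-19")
print(mt.translate("ข่าว โควิด-19 วันนี้"))  # 'news Covid-19 today'
```

Whatever the system's actual mechanism, the appeal of such a hybrid is visible even in this toy: once an entity is recognized and abstracted away from its surface form, its translation no longer depends on how a frozen neural vocabulary happens to fragment that form.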