The GDELT Project

Experiments With Machine Translation: Neural Machine Translation, Fidelity, Entity Recognition And Grammatical Fluency

Neural machine translation (NMT) has emerged as the dominant approach to machine translation today, replacing statistical (SMT) approaches, which in turn had replaced rules-based (RBMT) systems. NMT has become the go-to translation architecture because of its ability to achieve seemingly human-like fluency on many texts and across highly dissimilar language pairs. Yet this high degree of apparent fluency masks deeper challenges that complicate its use in at-scale news translation.

Fidelity To Source Text

SMT systems yielded high fidelity to the original source text, since they translated in short phrases with minimal and highly localized clause restructuring. While the resulting translations may have exhibited poor grammatical structure, they did not reach beyond the confines of the source material. In contrast, NMT systems exhibit far more interpretive behavior akin to a human translator and can even hallucinate entirely unrelated passages.

Smaller NMT models tend to be more faithful to the original text, preserving more of the original argument structure, while larger models, especially state-of-the-art models, tend to rewrite arguments and add rhetorical flourishes absent from the original text that can fundamentally alter its meaning. For example, one major commercial system rendered a Spanish sentence whose literal translation is "He didn't comment." as "He made it clear to everyone who would listen that he would never make a public statement that leaned either way."

Both statements convey that the individual did not provide comment, but the latter ascribes a set of actions and a state of mind that are unsupported by the original text. In fact, we have observed that such rhetorical or hallucinated expansions are surprisingly common in state-of-the-art commercial systems, meaning their translations frequently deviate from the source text in ways that imply events, actions, beliefs or intent that go beyond (often far beyond) the original text.
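One inexpensive guard against such expansions is to compare source and translation lengths, since hallucinated passages frequently balloon far past anything the source could support. The Python sketch below illustrates the idea; the thresholds, and the Spanish rendering of the example sentence, are illustrative assumptions rather than values from our pipeline.

# Sketch of a simple fidelity heuristic: flag translations whose token
# count deviates sharply from the source, a frequent symptom of
# hallucinated expansion. Thresholds are illustrative assumptions.

def length_ratio_flag(source: str, translation: str,
                      low: float = 0.5, high: float = 2.0) -> bool:
    """Return True if the translation length looks suspicious."""
    src_tokens = source.split()
    tgt_tokens = translation.split()
    if not src_tokens:
        return bool(tgt_tokens)
    ratio = len(tgt_tokens) / len(src_tokens)
    return ratio < low or ratio > high

# Illustrative source (an assumed back-translation of "He didn't comment.")
src = "No hizo comentarios."
hyp = ("He made it clear to everyone who would listen that he would "
       "never make a public statement that leaned either way.")
print(length_ratio_flag(src, hyp))  # True: roughly 7x longer than the source

Such a check obviously cannot catch meaning-level hallucinations of similar length, but it is cheap enough to run on every sentence in a high-volume pipeline.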

Human translators can also add such rhetorical flourishes, interpretations and "added intent" when translating works for a contemporary or unfamiliar audience to contextualize the ideas within, especially in the field of literature. However, when it comes to news coverage, the need to precisely recover the intent, mindset and nuance expressed in the source text requires a high degree of fidelity to the source text and minimal interpretation. Today's NMT models typically do not offer an inference-time parameter to adjust the degree of rewriting they perform – this must be tuned during training through a combination of train/test data and model design/parameters. Simpler models, especially those tuned for mobile use, tend to exhibit less interpretation of source material, but the degree to which NMT systems imply large-scale meaning that is not present (as opposed to connotative and imperfect translations) is an unsolved challenge with significant implications for downstream analytic tasks.
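To make this concrete, the sketch below shows the closest levers a typical open NMT toolkit actually exposes at inference time – generic decoding parameters such as beam width and length penalty – using a small, publicly available Marian model from Hugging Face. These settings constrain search breadth and output length, but, as noted above, none of them directly controls the degree of interpretive rewriting; treating them as rough proxies for literalness is our assumption.

# Sketch using a small, publicly available Marian NMT checkpoint.
# num_beams and length_penalty shape the search, but no parameter here
# directly dials down interpretive rewriting.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-es-en"  # small Spanish-to-English model
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

inputs = tokenizer(["No hizo comentarios."], return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    num_beams=4,          # wider beams explore more candidate rewrites
    length_penalty=0.8,   # <1.0 mildly favors shorter, more literal output
    max_new_tokens=64,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))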

Entity Recognition

At first glance, the output of any major commercial translation system will typically appear almost human-like, with a natural grammatical structure that avoids the stilted and abrupt transitions of SMT systems and that reads fluidly at the sentence level. Look again, however, and that fluency can mask a major limitation: inaccurate translation of proper names and other entities.
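A simple post-translation audit can surface some of these failures automatically. The sketch below assumes a Latin-script source language, where proper names usually survive translation verbatim, and flags PERSON and ORG entities in the English output that never appear in the source text; the exact-match heuristic and the example sentence are illustrative assumptions.

# Sketch of a post-translation entity audit: names that appear in the
# English output but nowhere in the (Latin-script) source are flagged
# as possible mistranslations. Exact string matching is a simplifying
# assumption; transliterated scripts would need fuzzier matching.
import spacy

nlp = spacy.load("en_core_web_sm")

def audit_entities(source: str, translation: str) -> list[str]:
    flagged = []
    for ent in nlp(translation).ents:
        if ent.label_ in ("PERSON", "ORG") and ent.text not in source:
            flagged.append(ent.text)
    return flagged

print(audit_entities(
    "El presidente Pedro Sánchez visitó Madrid.",
    "President Peter Sanchez visited Madrid.",  # anglicized name
))  # likely ['Peter Sanchez'], depending on the tagger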

Speed

Separate from accuracy, it is worth noting that even a very small, highly optimized NMT model with highly tuned pipelining can fully saturate an A100 or V100 GPU at just a few hundred translated words per second or less, depending on the language, and requires complex memory management and adaptive batching to achieve maximal throughput. This means that even with extremely small and highly optimized models and access to very large quantities of GPU accelerators, it can be difficult to scale NMT systems to handle realtime content firehoses. GPU-equipped VMs also do not currently support live migration, and GPU memory pressure can result in unpredictable application failure patterns at scale, requiring careful pipelining and making it more difficult to provide fixed-time latency guarantees.
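Length-bucketed adaptive batching is one of the key techniques here: sorting sentences by length and grouping them under a fixed padded-token budget minimizes the padding waste that otherwise dominates GPU time. A minimal sketch follows; the token budget is an illustrative assumption rather than a tuned figure.

# Sketch of length-bucketed adaptive batching: each yielded batch keeps
# its padded cost (batch size x longest sentence) under a token budget,
# reducing wasted computation on padding. Budget is an assumed value.
from typing import Iterable, Iterator

def token_budget_batches(sentences: Iterable[str],
                         max_tokens: int = 4096) -> Iterator[list[str]]:
    ordered = sorted(sentences, key=lambda s: len(s.split()))
    batch: list[str] = []
    batch_max = 0
    for sent in ordered:
        n = len(sent.split())
        if batch and (len(batch) + 1) * max(batch_max, n) > max_tokens:
            yield batch
            batch, batch_max = [], 0
        batch.append(sent)
        batch_max = max(batch_max, n)
    if batch:
        yield batch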

Importantly, unlike SMT systems, current NMT architectures do not widely support adaptive model modulation strategies to cope with time or resource pressure. SMT systems can dynamically monitor latency, throughput, memory pressure, CPU utilization and hardware availability to trade translation accuracy for speed at the resolution of a single clause – a strategy we rely on heavily to maintain fixed-time pipeline latency during unexpected volume surges. In contrast, the primary strategy for time/space tradeoffs in NMT systems is to maintain multiple models of different sizes, yet even here the tradeoff is far coarser than with SMT systems, which can be statistically pruned in place based on real-world observed model utilization – another major strategy we use.
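The sketch below shows the multi-model strategy in its simplest form: requests are routed to a smaller, faster model whenever the input queue backs up past a threshold. The threshold and the model interface are illustrative assumptions; a production pipeline would also watch GPU memory, latency and hardware availability.

# Sketch of queue-depth-based model fallback. The translate() interface
# and the threshold are assumed for illustration.
class AdaptiveTranslator:
    def __init__(self, large_model, small_model, queue_threshold: int = 500):
        self.large = large_model
        self.small = small_model
        self.threshold = queue_threshold

    def translate(self, text: str, queue_depth: int) -> str:
        # Trade accuracy for speed only while under a volume surge.
        model = self.small if queue_depth > self.threshold else self.large
        return model.translate(text)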

Conclusion

While NMT systems have emerged as the go-to solution for machine translation today, they present a number of challenges for the news domain, where source text fidelity and entity recovery are of particular importance.