Generative AI: Using LLMs To Produce Culturally Recent Translations Vs Classical NMT – "Dropping" A Song

Last month we explored the ability of Large Language Models (LLMs) to produce higher-quality translations than traditional Neural Machine Translation (NMT) solutions for social media posts. The LLMs achieved superior fluency, adapting the underlying meaning of the source language to the grammatical structure and wording more common in the target language. A key limitation of traditional NMT systems is that they tend to generate overly formal and literal translations and typically struggle with colloquial and emergent expressions. LLMs, on the other hand, are proving uniquely capable of both interpreting and producing such language, creating fundamentally new possibilities for the use of translation in everyday life.

Take for example the contemporary synonym of "release": to "drop", with new products being "dropped" rather than "released". How do NMT and LLM translations handle this colloquial use? To add to the complexity, we'll make the band releasing the song The Eagles, so that the models face the added ambiguity of resolving whether the text is about a famous American band releasing a new album or a group of large birds falling from the sky.

The sentence we will translate is:

The Eagle's new song dropped yesterday.

Let's first try translating into Spanish, which is a high-resource language with large volumes of informal speech readily available for ML training.

Google Translate yields:

La nueva canción de The Eagle salió ayer.

While Bing Translator yields:

La nueva canción de The Eagle se lanzó ayer.

Both correctly render "dropped" as "released" in this context, demonstrating that the underlying NMT training data is recent enough to capture this usage and to map the English slang onto the equivalent Spanish meaning.

What about ChatGPT? The prompt used was:

Translate into Spanish: "The Eagle's new song dropped yesterday."
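For readers wishing to script this experiment rather than use the web interface, a minimal sketch follows. It reproduces the prompt format used above; the commented-out API call assumes an OpenAI-style chat-completions client, and the model name shown is illustrative, not prescriptive.

```python
# Sketch: constructing the article's translation prompt programmatically.
# The actual LLM call is commented out since it requires an API key;
# only the prompt construction is exercised here.

def build_translation_prompt(target_language: str, text: str) -> str:
    """Wrap the source text in the same instruction used in this article."""
    return f'Translate into {target_language}: "{text}"'

sentence = "The Eagle's new song dropped yesterday."
for language in ("Spanish", "Estonian"):
    prompt = build_translation_prompt(language, sentence)
    print(prompt)
    # Hypothetical call (assumes `pip install openai` and an API key):
    # from openai import OpenAI
    # client = OpenAI()
    # response = client.chat.completions.create(
    #     model="gpt-4o-mini",  # illustrative model name
    #     messages=[{"role": "user", "content": prompt}],
    # )
    # print(response.choices[0].message.content)
```

Scripting the prompt this way makes it easy to sweep the same sentence across many target languages and compare outputs side by side.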

ChatGPT yields:

La nueva canción del Águila se lanzó ayer.

Here ChatGPT has translated the sentence identically to Bing, but has incorrectly translated The Eagles into the literal bird (the band name remains "The Eagles" in Spanish). Thus, even with its recent knowledge cutoff, it struggled, likely because The Eagles have less representation in more modern, social-era colloquial datasets in Spanish.

At the same time, Spanish is highly represented in most ML training datasets and slang Spanish is likely to be captured reasonably well due to its prevalence in the kinds of sources that many ML models are trained on today.

In contrast, Estonian typically has far poorer representation in ML training datasets. How does it perform?

Google Translate:

Eile langes The Eagle'i uus laul.

Here "dropped yesterday" is translated as "eile langes", which means literally to fall, as in falling temperatures.

Bing Translator:

Kotka uus laul langes eile.

This is even worse, combining the literal "eile langes" with incorrectly translating Eagles as the bird (the band name remains "The Eagles" in Estonian).

In contrast, this is ChatGPT's translation:

Kotkas laskis eile välja uue laulu.

Eagles is again incorrectly translated as the bird, just as in Bing's translation. However, uniquely among the three tools, ChatGPT's Estonian rendering correctly means "released a new song yesterday." While it formalizes the English slang, it correctly preserves the overall meaning, much as Google and Bing did for Spanish.

Thus, in both languages ChatGPT correctly preserved the meaning of a new song being released, but in both cases it failed to recognize The Eagles as a band, instead translating the name as a group of birds. This is a fascinating example of "understanding" versus "replication". A model that truly "understands" what it reads would recognize from context that The Eagles is the proper name of a musical band, and that the name should therefore be transliterated only if needed and otherwise preserved as a loanword, rather than translated into the literal word for a bird. ChatGPT instead merely learns patterns in word correlations, and thus fails to apply the concept of transliteration and loanwords to proper names. Given how entrenched transliterated loanwords are across the world's languages, it is notable that ChatGPT was unable to extrapolate that concept to this specific usage, reflecting once again the brittleness of these models.

In the end, LLMs appear to do a better job of encoding and applying colloquial language, but can struggle just as much as traditional NMT systems with the rules for when to translate a name versus preserve it as a transliterated loanword.