The GDELT Project

Experiments With Meta's SeamlessM4T Open Machine Translation Model: Medium Model Outperforming Large Model

Continuing our series evaluating Meta's new SeamlessM4T multimodal translation model, we turn to the fact that the model ships in two versions: large and medium. Until now we have been testing the large model, on the assumption that it would provide the highest accuracy while the medium model traded accuracy for speed. Here we put that assumption to the test. Surprisingly, the medium model correctly handles sentences that the large model struggles with: where the large model enters a failure state or truncates its output, the medium model produces a more complete translation. The medium model is not flawless, however: it too truncates and mistranslates portions of the text, and its translations read as more stilted. Equally surprising, both models take roughly the same time to load and translate on a V100 GPU, despite their different sizes. Overall, the medium model delivers substantially better results than the large model, an unexpected reminder that for AI models, larger is not always better and organizations should not automatically assume that the largest available model will yield the best possible results.
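The timing observation above comes from prefixing the CLI invocations with `time`, which measures model load and translation together as one process. To attribute the two phases separately, a small harness like the following can be used. This is a sketch: `load_model` is a placeholder stub standing in for the real model constructor, not part of any library.

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn(*args, **kwargs), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed

def load_model(name):
    # Placeholder stub: swap in the real model constructor (e.g. the
    # seamless_communication translator for `name`) to reproduce the comparison.
    return {"name": name}

for model_name in ("seamlessM4T_large", "seamlessM4T_medium"):
    # Timing load separately from inference shows whether the two similarly
    # timed end-to-end runs differ in where the time is spent.
    model, load_seconds = timed(f"load {model_name}", load_model, model_name)
```

With the stub replaced by real load and translate calls, this separates "time to load" from "time to translate" for each model rather than reporting a single combined wall-clock figure.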

Returning to the full-length news article we examined previously, let's take one of its longer sentences and run it through both the large and medium models:

time m4t_predict "李志辉告诉新黄河记者,他目前在沧州,女朋友是刁窝镇东辛庄村人,8月1日几位家人有的已经搬到附近的白塔村住,有的还在东辛庄村,跟张俊一样,李志辉女朋友一家也没想到,当地的水涨得如此快。" t2tt eng --src_lang cmn --model_name seamlessM4T_large
time m4t_predict "李志辉告诉新黄河记者,他目前在沧州,女朋友是刁窝镇东辛庄村人,8月1日几位家人有的已经搬到附近的白塔村住,有的还在东辛庄村,跟张俊一样,李志辉女朋友一家也没想到,当地的水涨得如此快。" t2tt eng --src_lang cmn --model_name seamlessM4T_medium

Below are the results. Note how the large model truncates the last part of the sentence about his girlfriend's family not expecting the water to rise so quickly. In contrast, the medium model yields more stilted results and transliterates the names differently, but translates the complete passage, matching Google Translate's results more closely:

What about the sentence that caused the large model to fail? Recall that one of the sentences yielded the phrase "the Little River" repeated over and over again instead of the correct translation. Could the medium model yield correct results?

time m4t_predict "目前,涿州境内北拒马河、小清河、白沟河等多条河流流量较大,小清河分洪区、兰沟洼蓄滞洪区已相继启动。" t2tt eng --src_lang cmn --model_name seamlessM4T_large
time m4t_predict "目前,涿州境内北拒马河、小清河、白沟河等多条河流流量较大,小清河分洪区、兰沟洼蓄滞洪区已相继启动。" t2tt eng --src_lang cmn --model_name seamlessM4T_medium

Below are the results. Note how the large model enters a failure state and cannot translate the text. The medium model, in contrast, partially translates the sentence, but it truncates a substantial portion and mistranslates several of the names, suggesting this sentence is problematic for both models:
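The `m4t_predict` invocations above can also be driven from Python, which makes side-by-side model comparisons like these easier to script. The sketch below assumes the `Translator` class that the seamless_communication library exposed at release; the exact module path, constructor arguments (including the vocoder name), and `predict` signature may differ across library versions, so treat it as an outline to verify against the installed release rather than a definitive implementation.

```python
MODELS = ("seamlessM4T_large", "seamlessM4T_medium")

# Placeholder: substitute the actual Chinese sentence from the commands above.
SENTENCE = "..."

def translate(text, model_name, src_lang="cmn", tgt_lang="eng"):
    """Text-to-text translation via the (assumed) seamless_communication API."""
    # Imported lazily so the sketch can be read without the heavy dependencies.
    import torch
    from seamless_communication.models.inference import Translator  # assumed path

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # "vocoder_36langs" is only used for speech output; t2tt ignores it.
    translator = Translator(model_name, "vocoder_36langs", device)
    # For the t2tt task the returned waveform and sample rate are None.
    translated_text, _, _ = translator.predict(text, "t2tt", tgt_lang, src_lang=src_lang)
    return str(translated_text)

if __name__ == "__main__":
    for model_name in MODELS:
        print(model_name, "->", translate(SENTENCE, model_name))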