Experiments With Machine Translation: Global Geography & Unexpected Places In The News

Prior to the pandemic, the city of Wuhan in China, despite a population of more than 11 million, was hardly a household name across most of the world and was rarely, if ever, mentioned in much of the world's news landscape. Because our Translingual 1.0 translation models were built on years of news monitoring in each language, they do well with locations commonly discussed in those languages (typically locations within regions where a language is spoken or with which there are strong economic, political or cultural ties), but struggle with locations they have never seen in that language. Thus, while Wuhan was well represented in our Chinese-language translation models, it was poorly represented in many of our lower-resourced languages. At the same time, Wuhan's overnight rise to global prominence reminds us how suddenly the most unexpected places can become central to understanding breaking events.

To this end, Translingual 2.0 includes a dedicated effort to ensure that it recognizes the majority of medium-sized cities around the world in each of our supported languages, which we hope will also open the door to new kinds of geographic analysis.
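
As a rough illustration of what checking this kind of coverage could look like, the sketch below audits a small gazetteer against the place names each language's models have already seen. It is not a description of how Translingual works: the `City` record, the `coverage_gaps` function, the 100,000-person cutoff for "medium-sized" and the sample data are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class City:
    """Hypothetical gazetteer entry (not an actual Translingual structure)."""
    name: str        # canonical English name
    population: int
    # Language code -> spelling of the city in that language.
    names_by_lang: Dict[str, str] = field(default_factory=dict)


def coverage_gaps(
    cities: List[City],
    seen_by_lang: Dict[str, Set[str]],
    min_pop: int = 100_000,  # assumed cutoff for "medium-sized"
) -> Dict[str, List[str]]:
    """For each language, list the cities at or above min_pop whose
    local spelling is unknown or absent from that language's seen names."""
    gaps: Dict[str, List[str]] = {}
    for lang, seen in seen_by_lang.items():
        missing = []
        for city in cities:
            if city.population < min_pop:
                continue  # below the size threshold, not audited
            local = city.names_by_lang.get(lang)
            if local is None or local not in seen:
                missing.append(city.name)
        gaps[lang] = missing
    return gaps


# Toy data for illustration only; populations are rough placeholders.
CITIES = [
    City("Wuhan", 11_000_000, {"zh": "武汉", "sw": "Wuhan"}),
    City("Nairobi", 4_000_000, {"zh": "内罗毕", "sw": "Nairobi"}),
    City("Smallville", 5_000, {}),  # fictional town below the cutoff
]

# Place names each language's (hypothetical) model has already seen.
SEEN = {
    "zh": {"武汉", "内罗毕"},
    "sw": {"Nairobi"},
}

if __name__ == "__main__":
    for lang, missing in coverage_gaps(CITIES, SEEN).items():
        print(lang, missing)
```

With this toy data, the Swahili list flags Wuhan as a gap while the Chinese list comes back empty, mirroring the pre-pandemic situation described above. A real audit would also need to handle spelling variants, transliteration and inflected forms, which a simple exact-match lookup like this ignores.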