Discrimination, Bias & Western Utopia In LLM-Based Machine Translation

Yesterday we explored how LLM-based machine translation poses novel challenges to translation workflows, encoding a set of "Western values" that these models apply to their translation tasks. Classical rules-based, SMT, and NMT machine translation systems neutrally and transparently translated texts regardless of what they said. In contrast, the emergent world of LLM-based machine translation is infused with explicit "values" that the model uses to decide whether or not to translate a given passage. If a news article conflicts with the LLM's encoded values, it will simply refuse to translate it.

The values-based refusals of LLMs are designed to reduce societal harms, but they actually end up increasing them, because the topics LLMs refuse to translate cluster around specific kinds of content. For example, news coverage of Islam and non-Western religions, indigenous peoples, LGBTQ+ and women's issues, documentation of discrimination, and similar subjects are all blocked by LLMs at vastly higher rates than other topics. Feed an LLM translation system a stream of daily news coverage and the generated output will typically reflect a Western utopia, with negative stories that undermine that image (documentation of discrimination, calls for greater rights for underrepresented groups, the existence of religions other than Christianity, and so on) quietly excluded for violating "corporate values." This means that when translating news coverage at scale, whole swaths of perspectives and lived experiences are simply wiped away as inconvenient.
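To make this failure mode concrete, here is a minimal sketch of what a refusal audit might look like in a news translation pipeline. The `llm_translate` function and the refusal phrases are hypothetical placeholders, not any particular vendor's API; the point is that a values-based refusal arrives as ordinary text, so a pipeline that does not explicitly check for it will silently drop the article.

```python
from dataclasses import dataclass

# Hypothetical refusal markers; real models phrase refusals in many ways,
# and a production audit would need a far more robust refusal classifier.
REFUSAL_MARKERS = (
    "i can't translate",
    "i cannot translate",
    "unable to assist",
    "violates our content policy",
)

@dataclass
class Article:
    url: str
    topic: str   # e.g. "religion", "lgbtq", "sports"
    text: str

def llm_translate(text: str, target_lang: str = "en") -> str:
    """Placeholder for a call to an LLM translation endpoint."""
    raise NotImplementedError("wire up your model/API here")

def translate_with_audit(articles: list[Article]) -> dict[str, list[str]]:
    """Translate a batch of articles and record refusals, grouped by topic."""
    refusals: dict[str, list[str]] = {}
    for article in articles:
        output = llm_translate(article.text)
        # A refusal is just text: without this check the article would flow
        # downstream as if it had been "translated", when in fact it was dropped.
        if any(marker in output.lower() for marker in REFUSAL_MARKERS):
            refusals.setdefault(article.topic, []).append(article.url)
    return refusals
```

Comparing per-topic refusal rates from an audit like this against a baseline system that translates everything (such as a classical NMT model) is one way to quantify the topical skew described above.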

What will it mean for the future of society as the tools through which we see the world increasingly enforce "values" defined by private corporations?