When American television news channels broadcast a clip of someone speaking a language other than English, they typically begin with a few seconds of the original audio and then switch to English overdubbing, in which a spoken English translation dominates the soundtrack while the original audio continues faintly underneath it. Though faint, that original audio is typically clear enough that in the past we've been able to use audio fingerprinting to track State of the Union and debate quotes across global television.
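To make that kind of matching concrete, the sketch below shows one common approach: comparing Chromaprint acoustic fingerprints via the pyacoustid bindings, sliding a fingerprinted quote across a fingerprinted broadcast to find the best alignment. This is an illustration of the general technique rather than our actual pipeline, and the file names are placeholders.

```python
# Sketch: detect a known quote under an overdub by fingerprint similarity.
# Requires: pip install pyacoustid (which also installs the chromaprint
# module) plus the Chromaprint "fpcalc" binary on the PATH.
import acoustid
import chromaprint

def raw_fingerprint(path):
    """Fingerprint an audio file and decode to Chromaprint's raw 32-bit frames."""
    _duration, fp = acoustid.fingerprint_file(path)
    raw, _algorithm = chromaprint.decode_fingerprint(fp)
    return raw

def similarity(a, b):
    """Fraction of agreeing bits between two equal-length raw fingerprints."""
    errors = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return 1.0 - errors / (32.0 * len(a))

def best_offset_score(quote_fp, broadcast_fp):
    """Slide the quote across the broadcast; return the best similarity found.
    Assumes the broadcast fingerprint is at least as long as the quote's."""
    window = len(quote_fp)
    return max(
        similarity(quote_fp, broadcast_fp[i:i + window])
        for i in range(len(broadcast_fp) - window + 1)
    )

# Unrelated audio hovers around ~0.5 bit agreement by chance, so a score
# well above that suggests the original quote survives under the overdub.
score = best_offset_score(raw_fingerprint("quote.wav"),
                          raw_fingerprint("broadcast.wav"))
print(f"best alignment similarity: {score:.2f}")
```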
We've never before been able to examine how the rest of the world handles multilingual content in television news broadcasts, because we've never had automatic speech recognition (ASR) tools capable of seamlessly transcribing multiple intermixed languages in a single video. With the advent of ASR LSMs (Large Speech Models) like Chirp, for the first time in history we're able to begin exploring this question at global scale. As we begin examining the Internet Archive's TV News Archive's 2.5 million hours of global television news, spanning 50 countries over portions of the last 20 years and transcribed through Chirp, a fascinating early finding is that outside of the United States, most of the broadcasts we're examining don't use dubbing: instead, they play the original foreign-language audio and display the translation as subtitles in the chyron or other onscreen text.
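For readers curious what this looks like in practice, below is a minimal sketch of multilingual transcription with Chirp through the Google Cloud Speech-to-Text v2 API. The project ID, region, and "auto" language setting are assumptions (language auto-detection support varies by Chirp model version and region), not our production configuration, and the synchronous recognize() call shown here handles only short clips; full-length broadcasts would go through the batch recognition path instead.

```python
# Sketch: transcribe a short multilingual clip with Chirp (Speech-to-Text v2).
from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = "my-project"  # placeholder
REGION = "us-central1"     # Chirp is served from regional endpoints

client = SpeechClient(
    client_options=ClientOptions(api_endpoint=f"{REGION}-speech.googleapis.com")
)

config = cloud_speech.RecognitionConfig(
    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
    language_codes=["auto"],  # assumption: let the model identify the language
    model="chirp",
)

with open("clip.wav", "rb") as f:
    audio = f.read()

response = client.recognize(
    request=cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
        config=config,
        content=audio,
    )
)

# Each result segment carries its own detected language code, which is what
# makes it possible to follow a broadcast as it switches languages mid-video.
for result in response.results:
    print(result.language_code, result.alternatives[0].transcript)
```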
As we scale up our analysis to the complete archive we'll learn far more about this phenomenon, but thus far it is fascinating to see this strong divergence between overdubbing and subtitling of multilingual content across the world.