The GDELT Project

Comparing Human Vs Machine Transcription And Translation Of A Russian Television News Broadcast

Yesterday the Russian Media Monitor tweeted out a clip from a Russian television news broadcast with human-translated English-language subtitles. Given the enormous advances in speech recognition and machine translation, could a fully automated transcription and translation come close to those human subtitles? To test this, we used OpenAI's Whisper ASR to translate the original broadcast into English and compared its output with the Russian Media Monitor's translation. The result: Whisper's fully automated English translation is so similar to the Russian Media Monitor's human translation that the two are nearly interchangeable. We observed only five significant differences, two of which change the underlying meaning.

Although the Russian Media Monitor's clip appears to be a single excerpt from the broadcast, it is actually composed of nine separate clips from the broadcast, woven together out of order. While we are working on fully autonomous clip search and deconstruction analysis, for this comparison the TV News Archive's founder Roger Macdonald manually broke the Russian Media Monitor clip down into its component excerpts from the original broadcast so that the transcripts could be compared directly.
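The deconstruction described above was done by hand. As a rough illustration of how part of it might be automated, the sketch below assumes you already have a word-level transcript of the full broadcast and a separate transcript for each excerpt, and uses Python's `difflib.SequenceMatcher` in a sliding window to find where each excerpt best matches the broadcast. The function names `locate_excerpts` and `is_out_of_order`, and the 0.6 similarity threshold, are hypothetical choices for illustration, not part of GDELT's tooling.

```python
from difflib import SequenceMatcher


def locate_excerpts(broadcast_words, excerpts, min_ratio=0.6):
    """Hypothetical helper: for each excerpt (a list of words), return the
    start index of the best-matching window in broadcast_words, or None if
    no window reaches min_ratio similarity."""
    positions = []
    for excerpt in excerpts:
        n = len(excerpt)
        best_start, best_ratio = None, 0.0
        # Slide a window the same length as the excerpt across the broadcast.
        for start in range(len(broadcast_words) - n + 1):
            window = broadcast_words[start:start + n]
            # autojunk=False so common words are never discarded as "junk".
            ratio = SequenceMatcher(None, excerpt, window, autojunk=False).ratio()
            if ratio > best_ratio:
                best_start, best_ratio = start, ratio
        positions.append(best_start if best_ratio >= min_ratio else None)
    return positions


def is_out_of_order(positions):
    """Hypothetical helper: True if the located excerpts do not appear in
    broadcast order (ignoring excerpts that could not be located)."""
    found = [p for p in positions if p is not None]
    return any(later < earlier for earlier, later in zip(found, found[1:]))


if __name__ == "__main__":
    broadcast = "a b c d e f g h".split()
    # The second excerpt comes from earlier in the broadcast than the first.
    positions = locate_excerpts(broadcast, [["f", "g"], ["a", "b", "c"]])
    print(positions, is_out_of_order(positions))
```

The brute-force window scan is quadratic and only meant to show the idea; a production system working on hour-long broadcasts would more likely use audio fingerprinting or an indexed text search.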

In addition to running Whisper over the original NTV broadcast, we also ran it over the excerpted video posted by the Russian Media Monitor (RMM), in case the edits in that video clipped any words from the original broadcast in a way that would change how Whisper translated them.
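One simple way to check whether the two Whisper runs (or the Whisper output and the human subtitles) diverge is a word-level diff. The sketch below is an assumption about how such a check could be done, not a description of the process used here; the differences reported in this post were judged by people. It lowercases both texts, strips punctuation, and uses `difflib.SequenceMatcher.get_opcodes()` to list every inserted, deleted, or replaced run of words. The function names are hypothetical.

```python
import difflib
import re


def tokenize(text):
    # Lowercase and keep only word characters and apostrophes, so that
    # punctuation and capitalization differences are not reported.
    return re.findall(r"[\w']+", text.lower())


def translation_differences(a_text, b_text):
    """Hypothetical helper: return (similarity ratio, list of differences),
    where each difference is (tag, words_in_a, words_in_b)."""
    a, b = tokenize(a_text), tokenize(b_text)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return matcher.ratio(), diffs


if __name__ == "__main__":
    human = "The enemy has retreated from the city."
    machine = "The enemy retreated from the town."
    ratio, diffs = translation_differences(human, machine)
    print(round(ratio, 3), diffs)
```

A diff like this only flags where the wording differs; deciding whether a difference changes the meaning (as two of the five differences in this comparison did) still requires a human reader.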

Original Broadcast:

View The Entire Broadcast In Visual Explorer.
Whisper-Generated Transcription (Broadcast).
Whisper-Generated Translation Into English (Broadcast).
Google STT-Generated Transcript (Broadcast).

Twitter Video:

View The Russian Media Monitor Twitter Video.
Whisper-Generated Translation Into English (Twitter Video). [SRT] [VTT]
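The SRT and VTT files above are subtitle formats built from timed segments. In the openai-whisper Python package, `model.transcribe(path, task="translate")` returns a dictionary whose `"segments"` list holds entries with `"start"` and `"end"` times in seconds and a `"text"` string. The package has its own output writers; the standalone sketch below only illustrates what the two formats look like and is not necessarily how these particular files were generated. The function names are hypothetical, and the sample segments are made up.

```python
def format_timestamp(seconds, decimal_marker=","):
    # SRT uses HH:MM:SS,mmm while WebVTT uses HH:MM:SS.mmm
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def segments_to_srt(segments):
    """Hypothetical helper: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def segments_to_vtt(segments):
    """Hypothetical helper: WEBVTT header followed by unnumbered cues."""
    cues = []
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "WEBVTT\n\n" + "\n".join(cues)


if __name__ == "__main__":
    # Made-up segments in the shape Whisper's transcribe() returns.
    segments = [
        {"start": 0.0, "end": 2.5, "text": " Hello."},
        {"start": 3661.007, "end": 3662.0, "text": " World."},
    ]
    print(segments_to_srt(segments))
    print(segments_to_vtt(segments))
```

Because both formats carry the same timings and text, either file can be lined up against the human-subtitled clip for a segment-by-segment comparison.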

Comparison