Transcribing 9.5M Minutes Of Global Television News Through Google's Chirp

The TV News Visual Explorer, built in collaboration with the Internet Archive's Television News Archive, encompasses selections of television news from 108 channels spanning 50 countries and territories in 35 languages and dialects over 20 years on 5 continents, totaling more than 5 million broadcasts over 3.4 million hours of programming. For the past year and a half we have been live transcribing and translating six of those channels covering the Russian invasion of Ukraine, demonstrating the power of automated speech transcription systems to make television news "searchable" and analyzable by journalists and scholars. Today we are excited to unveil the next major leap towards that vision: complete machine-generated transcripts spanning more than 9.5 million minutes of airtime across 14 channels from 12 countries and territories, generated through Google's Chirp large speech transcription model. They offer a first glimpse at the new research questions and journalistic possibilities such transcripts can enable across massive global multilingual audio-visual archives. Already, these transcripts have begun to shed light on just how common codeswitching and multilingual broadcasts are in parts of the world.

You can see the complete list of processed channels below and view them in the Visual Explorer. Start dates vary by channel, but transcripts for all channels end around September 14, 2023. We are currently working to extend these transcription archives further back in time across a larger set of channels and to enable them in real time for all active channels. Note that for many of these channels, the majority of the broadcasts are not viewable at this time, but the transcript can be used alongside the thumbnails to understand a broadcast's focus. While broadcasts are not translated, if you launch the Visual Explorer in the Google Chrome browser, you can use its Google Translate integration to automatically translate broadcasts into your own language as you view them.

  • Algeria's Canal Algerie: January 2022 to mid-September 2023
  • Catalonia's TV3: January 2023 to mid-September 2023
  • China's CCTV-1 & CCTV-13: January 2023 to mid-September 2023
  • Iran's Press TV: January 2023 to mid-September 2023
  • Jordan's Jordan TV: January 2023 to mid-September 2023
  • Palestine's Palestinian Satellite Channel: January 2023 to mid-September 2023
  • Portugal's RTP Internacional: January 2023 to mid-September 2023
  • Republic of Congo's Tele Congo: January 2020 to mid-September 2023
  • South Sudan's Southern Sudan Television: January 2020 to mid-September 2023
  • Sudan's Sudan State Television: January 2020 to mid-September 2023
  • Taiwan's CTV & TTV: January 2023 to mid-September 2023
  • United Arab Emirates' Sharjah TV: January 2023 to mid-September 2023

Given that for most of these channels the broadcasts are not viewable at this time, you can choose to look at just China's CCTV-1 and CCTV-13 and Taiwan's CTV and TTV, from the start of this year through September 14, 2023, to see the transcripts alongside the video.

Launch The Visual Explorer.
Launch The Visual Explorer With Just Chinese & Taiwanese Channels.