Applying GCP's Speech-to-Text ASR To 101 Sample TV News Broadcasts Spanning 50 Countries

How well does state-of-the-art automatic speech recognition (ASR), such as Google's Speech-to-Text (STT) API, perform across the rich diversity of languages, dialects and accents found in television news from around the world? The Internet Archive's Television News Archive offers an ideal testbed for exploring real-world ASR performance, with global holdings spanning more than 100 channels across 50 countries and territories on 5 continents in at least 35 languages and dialects over 20 years. What would it look like to process one sample broadcast from each of these channels through the STT API? To explore this further, today we are releasing 100 fully automated transcripts generated by the STT API across a selection of television news broadcasts from around the world spanning two decades.

In collaboration with the Television News Archive, we selected one representative broadcast from each of the 100 channels available in the Visual Explorer. The majority of the Archive's international channels do not have web-playable video clips, meaning that you will only have the thumbnail gallery in the Visual Explorer to examine alongside the STT-generated transcript. However, for some international channels the Archive has over the years made one or two broadcasts playable as part of special collections, such as the 9/11 Archive, and in those cases that is the broadcast we examined here. This means that for some channels the specific broadcast examined may be extremely short or less representative of the channel's overall coverage, but it has the benefit of allowing the transcript to be compared with the actual audio of the broadcast. For the other channels we emphasized older broadcasts in many cases to test STT's ability to handle poorer-quality audio. Each broadcast below includes a notation beside it as to whether it has a playable video clip or not.

The audio of each broadcast was extracted from the MP4 container via ffmpeg to generate a FLAC file:

time find *.mp4 | parallel --eta 'ffmpeg -nostdin -hide_banner -loglevel panic -i ./{} -filter_complex "[0:a]channelsplit=channel_layout=stereo:channels=FL[left]" -map "[left]" -f flac ./{.}.flac'
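
For readers without GNU parallel, the same extraction can be run one file at a time. The sketch below is an illustration rather than part of the original pipeline: the function names `flac_path` and `extract_audio` are hypothetical, while the ffmpeg options are copied unchanged from the command above. As in that command, only the left channel of the stereo audio is written to the FLAC file.

```shell
# Derive the output name by swapping the final extension for .flac,
# mirroring what GNU parallel's {.} placeholder does in the command above.
flac_path() {
  printf '%s.flac\n' "${1%.*}"
}

# Extract the left stereo channel of one MP4 into a FLAC file.
# (Hypothetical wrapper; the ffmpeg options are those used in the post.)
extract_audio() {
  ffmpeg -nostdin -hide_banner -loglevel panic -i "./$1" \
    -filter_complex "[0:a]channelsplit=channel_layout=stereo:channels=FL[left]" \
    -map "[left]" -f flac "./$(flac_path "$1")"
}

# Example (sequential, so slower than the parallel version):
#   for f in *.mp4; do extract_audio "$f"; done
```

Keeping the name derivation in its own function makes it easy to check the output path before any audio is processed.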

We then submitted each extracted FLAC file to the STT API using the following query:

curl -s -H "Content-Type: application/json" -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H "x-goog-user-project: [YOURPROJECTID]" --data "{
  'config': {
    'encoding': 'FLAC',
    'languageCode': '[LANGCODE]',
    'enableWordTimeOffsets': true,
    'enableWordConfidence': true,
    'enableAutomaticPunctuation': true,
    'maxAlternatives': 30,
    'model': '[MODEL]'
  },
  'audio': {
  'output_config': {

Where possible, we use the "latest_long" model, which selects the most recent available model tailored for long-form spoken-word content and is roughly equivalent to the "video" model of the Video API. For some languages and dialects only the "default" model is available, which may result in reduced accuracy.
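
The model choice can be written down as a small lookup. The sketch below simply encodes the language codes that appear with the "default" model in the results table of this post; the function name `model_for` is hypothetical, and falling back to "latest_long" for any other code is an assumption that may not match what the API supports at any given time.

```shell
# Return the model used in this post for a given language code.
# The "default" list is taken from the results table only; it is not
# an authoritative statement of current API model availability.
model_for() {
  case "$1" in
    az-AZ|zh|am-ET|el-GR|fa-IR|en-NG|sr-RS|sv-SE|fr-CH)
      echo default ;;
    *)
      # Assumption: every other language code supports latest_long.
      echo latest_long ;;
  esac
}

# Example:
#   model_for fa-IR
#   model_for en-US
```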

In cases where the API determines multiple possible transcriptions for a given utterance, we request up to 30 alternatives ordered by confidence. To avoid receiving a single massive blob of text, we enable automatic punctuation, which splits the text into sentences. We also ask the API to return the precise timestamp of each recognized word and its confidence in its recognition of that word.
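
To make the mapping between these options and the request explicit, the sketch below prints the recognition settings as a JSON object for a given language code and model. It is a minimal illustration, not the exact request used here: the function name `build_config` is hypothetical, it emits only the contents of the 'config' object, and it does not escape special characters in its arguments.

```shell
# Print the recognition settings described above as a JSON object.
#   $1 = language code (e.g. fr-FR), $2 = model (e.g. latest_long)
build_config() {
  cat <<EOF
{
  "encoding": "FLAC",
  "languageCode": "$1",
  "enableWordTimeOffsets": true,
  "enableWordConfidence": true,
  "enableAutomaticPunctuation": true,
  "maxAlternatives": 30,
  "model": "$2"
}
EOF
}

# Example:
#   build_config fr-FR latest_long
```

Here `maxAlternatives` corresponds to the request for up to 30 alternatives, `enableAutomaticPunctuation` to the sentence splitting, and `enableWordTimeOffsets` and `enableWordConfidence` to the per-word timestamps and confidence scores.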

You can see the final results in the table below, with the language code and model used for each broadcast, along with a link to the STT-generated transcript JSON.

| Channel | LangCode | Model | Visual Explorer | STT Transcript |
|---|---|---|---|---|
| ABC (KGO) | en-US | latest_long | View | KGO_20120102_013000_ABC_World_News_With_David_Muir |
| Algeria's Canal Algérie | fr-FR | latest_long | View | CANALALGERIE_20120101_070000 |
| Azerbaijan's AzTV | az-AZ | default | View | AZTV_20150330_120000_Azerbaijani_Russian_and_English_Programming_from_Azerbaijan |
| BBC News London | en-GB | latest_long | View | BBCNEWS_20120101_170000_BBC_NEWS |
| Belarus 24 | ru-RU | latest_long | View | BELARUSTV_20221005_161500 |
| Bloomberg | en-US | latest_long | View | BLOOMBERG_20200212_183000_Bloomberg_Markets_Americas |
| CBS (KPIX) | en-US | latest_long | View | KPIX_20181018_003000_CBS_Evening_News_with_Jeff_Glor |
| China's CCTV News | en-US | latest_long | View | CCTVNEWS_20120916_131312 |
| China's CCTV-3 | zh | default | View | CCTV3_20010830_123000_China_Central_TV |
| China's CCTV-4 | zh | default | View | CCTV4_20090903_190000 |
| China's CCTV-9 News | en-US | latest_long | View | CCTV9_20120101_103000 |
| CNBC | en-US | latest_long | View | CNBC_20200212_100000_Worldwide_Exchange |
| CNN | en-US | latest_long | View | CNNW_20120101_220000_CNN_Newsroom |
| Cubavision International | es-US | latest_long | View | CUBA_20110315_233000 |
| Deutsche Welle (DW) English | en-US | latest_long | View | DW_20181017_200000_DW_News_-_News |
| Dubai TV | ar-AE | latest_long | View | DUBAI_20111229_080000 |
| Egypt's Al Masriyah | ar-EG | latest_long | View | ESC1_20110806_190000 |
| Ethiopia's ETV | am-ET | default | View | ETV_20181011_100000 |
| FOX (KTVU) | en-US | latest_long | View | KTVU_20120102_010000_News_at_5pm |
| Fox Business | en-US | latest_long | View | FBC_20200212_170000_Cavuto_Coast_to_Coast |
| Fox News | en-US | latest_long | View | FOXNEWSW_20200213_010000_Tucker_Carlson_Tonight |
| France 24 | en-US | latest_long | View | FRANCE24_20120101_170000 |
| France's ARTE | de-DE | latest_long | View | ARTEDE_20130103_230000 |
| France's TV5Monde | fr-FR | latest_long | View | TV5MONDE_20090617_113000_Le_Journal_de_la_RTBF |
| Germany's ARD | de-DE | latest_long | View | ARD_20130103_213000 |
| Germany's WDR | de-DE | latest_long | View | WDR_20120101_181000_Aktuelle_Stunde |
| Greece's ANT1 | el-GR | default | View | ANT1_20010914_043000_Antenna_1_Greece |
| India's NDTV | en-IN | latest_long | View | NDTV_20111230_183000_India |
| India's Zee TV | hi-IN | latest_long | View | ZEETV_20120101_050000_Hindi_New |
| Iran's Al-Alam | fa-IR | default | View | ALALAM_20121028_130000 |
| Iran's IRIB TV2 | fa-IR | default | View | IRIB2_20120101_070000 |
| Iran's IRINN | fa-IR | default | View | IRINN_20120101_053000 |
| Iran's Press TV | fa-IR | default | View | PRESSTV_20111228_130000 |
| Iran's Simaye Azadi | fa-IR | default | View | SAMAYEAZADI_20120101_140100 |
| Iraq TV | ar-IQ | latest_long | View | IRAQ_20010917_043000_Iraq_Satellite_Channel |
| Iraq's Al Forat Network | ar-IQ | latest_long | View | ALFORAT_20111229_183000 |
| Iraq's Al Iraqiya | ar-IQ | latest_long | View | ALIRAQUIA_20120101_050000 |
| Iraq's Al-Etejah TV | ar-IQ | latest_long | View | ALETEJAHTV_20130817_133000 |
| Iraq's Al-Fayhaa TV | ar-IQ | latest_long | View | ALFAYHAA_20120101_050100 |
| Italy's RAI 1 | it-IT | latest_long | View | RAI1_20130102_050000 |
| Italy's RAI International | it-IT | latest_long | View | RAI_20010313_003000_Telegiornale_RAI |
| Italy's RAI News | it-IT | latest_long | View | RAINEWS_20130101_230000 |
| Jordan TV | ar-JO | latest_long | View | JORDANTV_20120101_030000 |
| KRON (MyNetworkTV) | en-US | latest_long | View | KRON_20120102_040000_KRON_4_News_at_9 |
| Kurdistan Region's Kurdsat | ar-EG | latest_long | View | KURDSAT_20120101_170100 |
| Kuwait Television | ar-KW | latest_long | View | KUWAIT_20090809_210000 |
| Lebanon's Al Jadeed (New TV) | ar-LB | latest_long | View | NEWTV_20111228_120000 |
| Lebanon's Future Television | ar-LB | latest_long | View | FUTURE_20111229_183000 |
| Libya's LJBC | ar-EG | latest_long | View | LIBYA_20100910_170000 |
| Mexico’s TV Azteca | es-ES | latest_long | View | AZT_20010917_030000_Noticiario_Hechos |
| Morocco's Al Maghribia | ar-MA | latest_long | View | ALMAGHRIBIA_20120101_090000 |
| Morocco's 2M Monde | ar-MA | latest_long | View | M2MOROCCO_20120101_140100 |
| MSNBC | en-US | latest_long | View | MSNBCW_20120101_190000_Meet_the_Press |
| NBC (KNTV) | en-US | latest_long | View | KNTV_20120119_013000_NBC_Nightly_News |
| Nigeria's NTA International | en-NG | default | View | NTA_20120101_201500 |
| North Macedonia's MRT Sat | mk-MK | latest_long | View | MKTV_20121024_210000 |
| Oman TV | ar-OM | latest_long | View | OMAN_20120101_183000 |
| Palestine Satellite Channel | ar-PS | latest_long | View | PSC_20120101_163000 |
| PBS (KQED) | en-US | latest_long | View | KQED_20111231_020000_PBS_NewsHour |
| Portugal’s RTP Internacional (RTPi) | pt-PT | latest_long | View | RTPI_20120101_201600 |
| Qatar TV | ar-QA | latest_long | View | QATARTV_20120101_160000 |
| Qatar's Al Jazeera English | en-US | latest_long | View | ALJAZ_20120101_070100 |
| Radio Television of Serbia | sr-RS | default | View | RTSSAT_20120419_060000 |
| Republic of Congo’s Télé Congo | fr-FR | latest_long | View | TELECONGO_20120101_200100 |
| Romania's TVR Info | ro-RO | latest_long | View | TVRI_20120101_183100 |
| Russia 1 | ru-RU | latest_long | View | RUSSIA1_20221005_143000_60_minut |
| Russia 24 | ru-RU | latest_long | View | RUSSIA24_20221005_170200_Vesti_s_Alekseem_Kazakovim |
| Russia Today | en-US | latest_long | View | RT_20120101_180100 |
| Russia's 1TV | ru-RU | latest_long | View | 1TV_20221005_062000_AntiFeik |
| Russia's NTV | ru-RU | latest_long | View | NTV_20221005_160000_Segodnya |
| Russia's TV Rain | ru-RU | latest_long | View | TVRAIN_20180420_020000 |
| Saudi Arabia's Al Saudiya | ar-SA | latest_long | View | SAUDI_20120101_190000 |
| SCOLA Jordan News | ar-JO | latest_long | View | SCOLA_20120102_193000_Jordan_News |
| SCOLA Lebanon News | ar-LB | latest_long | View | SCOLA3_20120101_060000_Lebanon_News |
| SCOLA Qatar News | ar-QA | latest_long | View | SCOLA2_20120102_213000_Qatar_News |
| SCOLA Syria News | ar-EG | latest_long | View | SCOLA4_20120102_223000_Syria_News |
| SCOLA UAE News | ar-AE | latest_long | View | SCOLA5_20120101_235500_United_Arab_Emirates |
| Senegal's RTS Diaspora | fr-FR | latest_long | View | RTSDIASPORA_20110805_033000 |
| South Korea's KBS World | ko-KR | latest_long | View | KBSWORLD_20100613_040000_KBS_News_9 |
| South Korea's MBC | ko-KR | latest_long | View | MBC_20111230_145000_MBCNewsDesk |
| Southern Sudan Television | ar-EG | latest_long | View | SOUTHERNSUDAN_20120101_190000 |
| Sudan State TV | ar-EG | latest_long | View | SUDAN_20120101_150000 |
| Sweden's SVT1 | sv-SE | default | View | SVT1_20111027_140500_Gomorron_Sverige |
| Switzerland's TSR 1 | fr-CH | default | View | TSR1_20120101_103000_Le_Journal |
| Syria TV | ar-EG | latest_long | View | SYRIANTV_20120101_190000 |
| Telemundo (KSTS) | es-US | latest_long | View | KSTS_20200213_013000_Noticiero_Telemundo_48 |
| Thailand's Thai TV Global Network | th-TH | latest_long | View | TGN_20120102_003100 |
| Tunisia's El Watania 1 | ar-TN | latest_long | View | TV7TUNIS_20120101_190000 |
| Turkey's TRT 1 | tr-TR | latest_long | View | TRT1_20120101_000100 |
| Turkey's TRT Türk | tr-TR | latest_long | View | TRTTURK_20120101_173100 |
| Ukraine's Espreso TV | uk-UA | latest_long | View | ESPRESO_20221005_143000 |
| United Arab Emirates' Sharjah TV | ar-AE | latest_long | View | SHARJAHTV_20120101_200000 |
| United Kingdom's BBC Arabic Television | ar-EG | latest_long | View | BBCARABIC_20111229_161000 |
| United Kingdom's Sky News | en-GB | latest_long | View | SKY_20090618_160000_Live_At_Five_With_Jeremy_Thompson |
| Univision (KDTV) | es-US | latest_long | View | KDTV_20120101_170000_Al_Punto |
| US-based Galavisión | es-US | latest_long | View | GALA_20121005_070000_Hasta_Que_el_Dinero_Nos_Separe |
| Venezuela's teleSUR | es-US | latest_long | View | TELESUR_20120101_133000 |
| VietFace TV | vi-VN | latest_long | View | VIETFACETV_20120101_070100 |
| Vietnam's VTV4 | vi-VN | latest_long | View | VTV4_20111230_170000_VTV4Newsreel |
| Yemen TV | ar-YE | latest_long | View | YEMENTV_20120101_130000 |
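
The transcript identifiers in the table appear to follow a common layout: a station code, an 8-digit date, a 6-digit time and, for some broadcasts, a show title, all joined by underscores. The sketch below splits an identifier into those parts using only POSIX shell parameter expansion. The layout is inferred from the table rather than from any documented specification, and the function name `parse_id` is hypothetical.

```shell
# Split an identifier into station, date, time and optional title,
# printed as one tab-separated line.
# Assumes the STATION_YYYYMMDD_HHMMSS[_Title] layout seen in the table.
parse_id() {
  id=$1
  station=${id%%_*}            # text before the first underscore
  rest=${id#*_}
  day=${rest%%_*}              # YYYYMMDD
  rest=${rest#*_}
  clock=${rest%%_*}            # HHMMSS
  case "$rest" in
    *_*) title=${rest#*_} ;;   # everything after the time, if present
    *)   title= ;;             # no title in this identifier
  esac
  printf '%s\t%s\t%s\t%s\n' "$station" "$day" "$clock" "$title"
}

# Example:
#   parse_id DW_20181017_200000_DW_News_-_News
```

For the example identifier this prints the fields DW, 20181017, 200000 and DW_News_-_News separated by tabs. Identifiers without a title, such as CANALALGERIE_20120101_070000, yield an empty fourth field.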