A First Look At The Linguistic Geography Of The Internet Archive's TV News Archive: 2.5M Hours Spanning 159 Languages

We are tremendously excited to unveil today a first look at the linguistic geography of the Internet Archive's TV News Archive, spanning more than 100 channels from 50 countries on 5 continents over portions of the last 24 years. In collaboration with the Archive, we recently completed the machine transcription of all 2.5 million uncaptioned hours using GCP's Chirp ASR model, built on Google's Universal Speech Model. We believe this is the single largest application of automatic speech recognition (ASR) to global television ever performed. Like most large speech model (LSM) transcription systems, Chirp provides only a textual transcript of each broadcast; it does not identify the language of each word. After extensively testing a range of language detection tools, we found CLD2 to be the most robust across the highly multilingual, code-switching world of global television news, and we used it to process the entire archive.
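As a rough illustration of how per-language totals can be tallied once each stretch of transcript carries a language label, here is a minimal Python sketch. The `Segment` record, its field names and the sample text are hypothetical. The language detector itself is left out, since any CLD2 binding would sit upstream of this step. Words are counted by splitting on whitespace, which is unreliable for languages such as Chinese that do not separate words with spaces.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    show_id: str   # hypothetical broadcast identifier
    language: str  # language name as a detector such as CLD2 might label it
    text: str      # transcribed text for this stretch of the broadcast


def tally_by_language(segments):
    """Roll labeled segments up into per-language word/char/byte/show totals."""
    totals = defaultdict(lambda: {"words": 0, "chars": 0, "bytes": 0, "shows": set()})
    for seg in segments:
        t = totals[seg.language]
        # Whitespace tokenization: a crude word proxy that breaks down for
        # scriptio continua languages such as Chinese or Thai.
        t["words"] += len(seg.text.split())
        t["chars"] += len(seg.text)                  # Unicode code points
        t["bytes"] += len(seg.text.encode("utf-8"))  # UTF-8 storage size
        t["shows"].add(seg.show_id)                  # each show counted once per language
    return {lang: {**t, "shows": len(t["shows"])} for lang, t in totals.items()}


def show_shares(segments, totals):
    """Percent of all distinct broadcasts in which each language appears."""
    n_shows = len({seg.show_id for seg in segments})
    return {lang: 100.0 * t["shows"] / n_shows for lang, t in totals.items()}


segments = [
    Segment("show1", "ENGLISH", "good evening and welcome"),
    Segment("show1", "FRENCH", "bonsoir et bienvenue"),
    Segment("show2", "ENGLISH", "the weather tonight"),
    Segment("show2", "ARABIC", "مرحبا بكم"),
]
totals = tally_by_language(segments)
print(totals["ENGLISH"])  # {'words': 7, 'chars': 43, 'bytes': 43, 'shows': 2}
print(show_shares(segments, totals))  # {'ENGLISH': 100.0, 'FRENCH': 50.0, 'ARABIC': 50.0}
```

Because a broadcast counts toward every language detected in it, show percentages computed this way can sum to well over 100%, and the same pattern appears in the shows column of the table.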

Below we present the findings of that massive analysis of the Archive's linguistic landscape. In all, 75.2 billion characters (107GB of text) were transcribed, spanning just under 2.5 million hours of airtime across 4.28 million broadcasts, with CLD2 identifying text in 159 languages. Using Chirp's precise begin/end timestamps for each transcribed word, we calculated the percentage of total airtime during which a word was being spoken, as opposed to the brief pauses between words and non-speech airtime such as music and silence. In all, 80.53% of the archive's total airtime contains spoken words.

Breaking the Archive into "words" is complicated both by the presence of scriptio continua languages (languages such as Chinese that do not delimit words with spaces) and by Chirp's tendency, when transcribing such languages, to alternate unpredictably between correct transcription and inserting a space between every character or "word". With those caveats, the transcribed Archive contains an estimated 13.7 billion words.

Across the entire archive, there were 9.44 million language transitions, such as code switching, in which at least 10 characters of one language were followed by at least 10 characters of a different language. Shorter transitions were ignored as likely Chirp errors.
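The two calculations described above can be sketched in Python as follows. This is one plausible reading of the stated rules, not the pipeline actually used. For speech coverage we assume word timestamps arrive as `(start, end)` pairs in seconds and merge overlapping words so no second is counted twice. For transitions we assume each transcript has been split into ordered `(language, text)` spans, merge adjacent same-language spans into runs, drop runs shorter than 10 characters, and count language changes between the surviving runs. All function names here are hypothetical.

```python
def spoken_seconds(word_times):
    """Seconds of airtime covered by at least one word.

    word_times: (start, end) pairs in seconds, one per transcribed word.
    Overlapping or touching words are merged so no second is counted twice;
    gaps between words (pauses, music, silence) are not counted.
    """
    total = 0.0
    run_start = run_end = None
    for start, end in sorted(word_times):
        if run_end is None or start > run_end:
            if run_end is not None:
                total += run_end - run_start  # close the previous spoken run
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)       # extend the current run
    if run_end is not None:
        total += run_end - run_start
    return total


def speech_fraction(broadcasts):
    """Archive-wide share of airtime containing speech.

    broadcasts: (word_times, duration_seconds) pairs. Summing seconds before
    dividing weights long broadcasts more than short ones.
    """
    spoken = sum(spoken_seconds(words) for words, _ in broadcasts)
    airtime = sum(duration for _, duration in broadcasts)
    return spoken / airtime if airtime else 0.0


def count_transitions(spans, min_chars=10):
    """Count language switches between runs of at least `min_chars` characters.

    spans: ordered (language, text) pairs for one transcript.
    """
    # Merge adjacent spans that share a language into [language, length] runs.
    runs = []
    for lang, text in spans:
        if runs and runs[-1][0] == lang:
            runs[-1][1] += len(text)
        else:
            runs.append([lang, len(text)])
    # Drop runs too short to trust (treated here as likely transcription noise).
    kept = [lang for lang, length in runs if length >= min_chars]
    # Count changes of language between consecutive surviving runs.
    return sum(1 for a, b in zip(kept, kept[1:]) if a != b)


words = [(0.0, 0.5), (1.0, 2.0), (1.5, 2.5), (5.0, 6.0)]
print(spoken_seconds(words))            # 3.0
print(speech_fraction([(words, 10.0)]))  # 0.3
spans = [("ENGLISH", "the president said "), ("FRENCH", "oui"),
         ("ENGLISH", "that the talks would "), ("FRENCH", "reprendre demain matin")]
print(count_transitions(spans))         # 1
```

In the example, the three-character French interjection is discarded, so the two English runs around it collapse into one and only the final switch to French is counted.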

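The complete breakdown below is ordered by number of transcribed characters. It shows that Arabic dominates the Archive's holdings with a third of all transcribed characters, followed by English at roughly a quarter. Each row gives the CLD2 language name, then the number of words, characters, bytes and broadcasts ("shows") in which that language was detected, each with its share of the archive total, and finally the estimated minutes of airtime. Some of the detected languages, such as Klingon (X_KLINGON), are clearly incorrect, but with a surprising twist: CLD2's Klingon detections appear to correspond to specific character sequences, such as certain romanization forms and certain examples of code switching.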

ARABIC: 4632207900 (33.80%) words / 25234212170 (33.55%) chars / 45701278667 (42.64%) bytes / 1867174 (43.67%) shows  46852078.25 est min
ENGLISH: 3366543905 (24.56%) words / 19066809344 (25.35%) chars / 19069447127 (17.79%) bytes / 1771467 (41.44%) shows  35401130.71 est min
FRENCH: 1266182850 (9.24%) words / 7339612272 (9.76%) chars / 7541470343 (7.04%) bytes / 543328 (12.71%) shows  13627375.65 est min
RUSSIAN: 509762027 (3.72%) words / 3369440924 (4.48%) chars / 6087842033 (5.68%) bytes / 223986 (5.24%) shows  6256003.11 est min
VIETNAMESE: 723764782 (5.28%) words / 3160572303 (4.20%) chars / 4164899298 (3.89%) bytes / 189701 (4.44%) shows  5868199.09 est min
SPANISH: 408968223 (2.98%) words / 2395315786 (3.19%) chars / 2441154610 (2.28%) bytes / 179220 (4.19%) shows  4447355.91 est min
PORTUGUESE: 414537019 (3.02%) words / 2315845903 (3.08%) chars / 2389300151 (2.23%) bytes / 115175 (2.69%) shows  4299805.07 est min
PERSIAN: 471736156 (3.44%) words / 2272381886 (3.02%) chars / 4057767386 (3.79%) bytes / 201412 (4.71%) shows  4219105.92 est min
CATALAN: 345530535 (2.52%) words / 1916145929 (2.55%) chars / 1960532401 (1.83%) bytes / 41722 (0.98%) shows  3557686.62 est min
SERBIAN: 273376479 (1.99%) words / 1607559976 (2.14%) chars / 1660302069 (1.55%) bytes / 167582 (3.92%) shows  2984738.55 est min
THAI: 228211015 (1.67%) words / 1004056162 (1.34%) chars / 2539790759 (2.37%) bytes / 115826 (2.71%) shows  1864219.80 est min
AMHARIC: 193110032 (1.41%) words / 955156690 (1.27%) chars / 2466377081 (2.30%) bytes / 81426 (1.90%) shows  1773428.70 est min
AZERBAIJANI: 112629437 (0.82%) words / 829236097 (1.10%) chars / 959657808 (0.90%) bytes / 43016 (1.01%) shows  1539633.35 est min
UKRAINIAN: 117033606 (0.85%) words / 760686433 (1.01%) chars / 1373954937 (1.28%) bytes / 58136 (1.36%) shows  1412357.95 est min
TURKISH: 58798038 (0.43%) words / 417295862 (0.55%) chars / 453601664 (0.42%) bytes / 46678 (1.09%) shows  774788.54 est min
HINDI: 81059033 (0.59%) words / 381613682 (0.51%) chars / 961058132 (0.90%) bytes / 45634 (1.07%) shows  708537.84 est min
Chinese: 177538144 (1.30%) words / 367039503 (0.49%) chars / 716412923 (0.67%) bytes / 75834 (1.77%) shows  681478.12 est min
KURDISH: 56276929 (0.41%) words / 352781116 (0.47%) chars / 644677721 (0.60%) bytes / 54561 (1.28%) shows  655004.73 est min
SWEDISH: 46299294 (0.34%) words / 247884830 (0.33%) chars / 260280513 (0.24%) bytes / 22263 (0.52%) shows  460244.98 est min
GERMAN: 31837734 (0.23%) words / 204136636 (0.27%) chars / 206683727 (0.19%) bytes / 76791 (1.80%) shows  379018.20 est min
CROATIAN: 32243292 (0.24%) words / 176330240 (0.23%) chars / 181661517 (0.17%) bytes / 76298 (1.78%) shows  327390.38 est min
ChineseT: 33827008 (0.25%) words / 155207673 (0.21%) chars / 386886450 (0.36%) bytes / 45913 (1.07%) shows  288172.34 est min
ITALIAN: 20743468 (0.15%) words / 125129589 (0.17%) chars / 126261582 (0.12%) bytes / 35447 (0.83%) shows  232326.70 est min
BOSNIAN: 12039961 (0.09%) words / 69900165 (0.09%) chars / 71722013 (0.07%) bytes / 65685 (1.54%) shows  129782.85 est min
LINGALA: 11360058 (0.08%) words / 69827047 (0.09%) chars / 69990455 (0.07%) bytes / 40274 (0.94%) shows  129647.09 est min
Korean: 15474593 (0.11%) words / 58742821 (0.08%) chars / 138457229 (0.13%) bytes / 16748 (0.39%) shows  109067.14 est min
Unknown: 6975564 (0.05%) words / 45844805 (0.06%) chars / 83080431 (0.08%) bytes / 163977 (3.84%) shows  85119.53 est min
MACEDONIAN: 6753962 (0.05%) words / 41121677 (0.05%) chars / 74255914 (0.07%) bytes / 10219 (0.24%) shows  76350.16 est min
BELARUSIAN: 4473715 (0.03%) words / 30337943 (0.04%) chars / 54979302 (0.05%) bytes / 23209 (0.54%) shows  56328.12 est min
URDU: 6172542 (0.05%) words / 26301844 (0.03%) chars / 46377402 (0.04%) bytes / 17624 (0.41%) shows  48834.34 est min
SOMALI: 3691127 (0.03%) words / 24929192 (0.03%) chars / 24945750 (0.02%) bytes / 14229 (0.33%) shows  46285.75 est min
Japanese: 8673444 (0.06%) words / 20057383 (0.03%) chars / 41708482 (0.04%) bytes / 28366 (0.66%) shows  37240.32 est min
OROMO: 2245302 (0.02%) words / 18229302 (0.02%) chars / 18408856 (0.02%) bytes / 12282 (0.29%) shows  33846.14 est min
NORWEGIAN: 2729029 (0.02%) words / 13915265 (0.02%) chars / 14297447 (0.01%) bytes / 14491 (0.34%) shows  25836.32 est min
TIGRINYA: 2281939 (0.02%) words / 11647609 (0.02%) chars / 30298238 (0.03%) bytes / 5147 (0.12%) shows  21625.98 est min
DANISH: 2165171 (0.02%) words / 11167591 (0.01%) chars / 11490496 (0.01%) bytes / 48444 (1.13%) shows  20734.74 est min
SWAHILI: 1602058 (0.01%) words / 10435696 (0.01%) chars / 10467739 (0.01%) bytes / 18919 (0.44%) shows  19375.84 est min
HEBREW: 1730032 (0.01%) words / 9070395 (0.01%) chars / 16075986 (0.02%) bytes / 11732 (0.27%) shows  16840.90 est min
GALICIAN: 1526426 (0.01%) words / 8719818 (0.01%) chars / 8917868 (0.01%) bytes / 24178 (0.57%) shows  16189.99 est min
GREEK: 1282677 (0.01%) words / 7628951 (0.01%) chars / 13787652 (0.01%) bytes / 10376 (0.24%) shows  14164.59 est min
HAUSA: 1156431 (0.01%) words / 6111825 (0.01%) chars / 6141307 (0.01%) bytes / 10043 (0.23%) shows  11347.76 est min
INDONESIAN: 804657 (0.01%) words / 4909057 (0.01%) chars / 4930359 (0.00%) bytes / 44582 (1.04%) shows  9114.59 est min
SLOVAK: 867248 (0.01%) words / 4837321 (0.01%) chars / 4969729 (0.00%) bytes / 36644 (0.86%) shows  8981.40 est min
ROMANIAN: 830513 (0.01%) words / 4607275 (0.01%) chars / 4937805 (0.00%) bytes / 16408 (0.38%) shows  8554.28 est min
POLISH: 545121 (0.00%) words / 3587387 (0.00%) chars / 3800446 (0.00%) bytes / 8699 (0.20%) shows  6660.66 est min
DUTCH: 603703 (0.00%) words / 3167569 (0.00%) chars / 3177929 (0.00%) bytes / 33540 (0.78%) shows  5881.19 est min
FINNISH: 446762 (0.00%) words / 3026870 (0.00%) chars / 3193928 (0.00%) bytes / 7982 (0.19%) shows  5619.96 est min
WOLOF: 454207 (0.00%) words / 2378414 (0.00%) chars / 2440538 (0.00%) bytes / 7924 (0.19%) shows  4415.97 est min
HUNGARIAN: 313051 (0.00%) words / 2048764 (0.00%) chars / 2216896 (0.00%) bytes / 8225 (0.19%) shows  3803.92 est min
KINYARWANDA: 310624 (0.00%) words / 2014520 (0.00%) chars / 2035423 (0.00%) bytes / 13110 (0.31%) shows  3740.34 est min
TAGALOG: 287971 (0.00%) words / 1637419 (0.00%) chars / 1643592 (0.00%) bytes / 9788 (0.23%) shows  3040.18 est min
MALAY: 238709 (0.00%) words / 1540066 (0.00%) chars / 1558961 (0.00%) bytes / 21913 (0.51%) shows  2859.42 est min
NORWEGIAN_N: 272633 (0.00%) words / 1455994 (0.00%) chars / 1498078 (0.00%) bytes / 21943 (0.51%) shows  2703.33 est min
IGBO: 244491 (0.00%) words / 1412956 (0.00%) chars / 1503972 (0.00%) bytes / 7804 (0.18%) shows  2623.42 est min
BURMESE: 234518 (0.00%) words / 1346703 (0.00%) chars / 3566609 (0.00%) bytes / 2617 (0.06%) shows  2500.41 est min
ALBANIAN: 232183 (0.00%) words / 1252130 (0.00%) chars / 1347718 (0.00%) bytes / 3548 (0.08%) shows  2324.82 est min
PASHTO: 234669 (0.00%) words / 1203068 (0.00%) chars / 2139946 (0.00%) bytes / 15251 (0.36%) shows  2233.72 est min
UZBEK: 183341 (0.00%) words / 1173488 (0.00%) chars / 1600230 (0.00%) bytes / 20558 (0.48%) shows  2178.80 est min
CZECH: 187892 (0.00%) words / 1081360 (0.00%) chars / 1160102 (0.00%) bytes / 15431 (0.36%) shows  2007.75 est min
LAOTHIAN: 238298 (0.00%) words / 1070721 (0.00%) chars / 2712285 (0.00%) bytes / 11994 (0.28%) shows  1988.00 est min
NEPALI: 176148 (0.00%) words / 910517 (0.00%) chars / 2345062 (0.00%) bytes / 6068 (0.14%) shows  1690.55 est min
SLOVENIAN: 155365 (0.00%) words / 906667 (0.00%) chars / 930074 (0.00%) bytes / 7164 (0.17%) shows  1683.40 est min
ARMENIAN: 133145 (0.00%) words / 866072 (0.00%) chars / 1565679 (0.00%) bytes / 1809 (0.04%) shows  1608.03 est min
YORUBA: 176688 (0.00%) words / 863219 (0.00%) chars / 941756 (0.00%) bytes / 5985 (0.14%) shows  1602.73 est min
GUARANI: 129844 (0.00%) words / 837548 (0.00%) chars / 851616 (0.00%) bytes / 9103 (0.21%) shows  1555.07 est min
SINDHI: 175040 (0.00%) words / 835381 (0.00%) chars / 1474890 (0.00%) bytes / 10135 (0.24%) shows  1551.04 est min
KHMER: 182590 (0.00%) words / 833962 (0.00%) chars / 2125508 (0.00%) bytes / 3442 (0.08%) shows  1548.41 est min
BULGARIAN: 125511 (0.00%) words / 772579 (0.00%) chars / 1295363 (0.00%) bytes / 12619 (0.30%) shows  1434.44 est min
MARATHI: 122444 (0.00%) words / 667032 (0.00%) chars / 1666988 (0.00%) bytes / 9837 (0.23%) shows  1238.47 est min
SANSKRIT: 99107 (0.00%) words / 632241 (0.00%) chars / 1417199 (0.00%) bytes / 11459 (0.27%) shows  1173.87 est min
LATIN: 92278 (0.00%) words / 585398 (0.00%) chars / 599444 (0.00%) bytes / 23733 (0.56%) shows  1086.90 est min
ICELANDIC: 101928 (0.00%) words / 566075 (0.00%) chars / 628970 (0.00%) bytes / 2700 (0.06%) shows  1051.03 est min
AFAR: 64885 (0.00%) words / 524631 (0.00%) chars / 530677 (0.00%) bytes / 7072 (0.17%) shows  974.08 est min
OCCITAN: 88282 (0.00%) words / 504840 (0.00%) chars / 514991 (0.00%) bytes / 5078 (0.12%) shows  937.33 est min
AFRIKAANS: 98207 (0.00%) words / 490272 (0.00%) chars / 492626 (0.00%) bytes / 4063 (0.10%) shows  910.28 est min
LITHUANIAN: 71589 (0.00%) words / 463378 (0.00%) chars / 493664 (0.00%) bytes / 6922 (0.16%) shows  860.35 est min
BENGALI: 69914 (0.00%) words / 443360 (0.00%) chars / 1094148 (0.00%) bytes / 4894 (0.11%) shows  823.18 est min
NYANJA: 61162 (0.00%) words / 409403 (0.00%) chars / 411780 (0.00%) bytes / 2568 (0.06%) shows  760.13 est min
SCOTS_GAELIC: 65091 (0.00%) words / 381410 (0.00%) chars / 388152 (0.00%) bytes / 12203 (0.29%) shows  708.16 est min
LATVIAN: 59656 (0.00%) words / 369067 (0.00%) chars / 393014 (0.00%) bytes / 7856 (0.18%) shows  685.24 est min
XHOSA: 48070 (0.00%) words / 348361 (0.00%) chars / 353109 (0.00%) bytes / 7063 (0.17%) shows  646.80 est min
ZULU: 44839 (0.00%) words / 346612 (0.00%) chars / 347931 (0.00%) bytes / 2943 (0.07%) shows  643.55 est min
GANDA: 46769 (0.00%) words / 317285 (0.00%) chars / 319389 (0.00%) bytes / 4774 (0.11%) shows  589.10 est min
BIHARI: 66942 (0.00%) words / 309739 (0.00%) chars / 766344 (0.00%) bytes / 6486 (0.15%) shows  575.09 est min
BASQUE: 50153 (0.00%) words / 306354 (0.00%) chars / 313363 (0.00%) bytes / 9479 (0.22%) shows  568.80 est min
MALAGASY: 43714 (0.00%) words / 304258 (0.00%) chars / 310058 (0.00%) bytes / 13646 (0.32%) shows  564.91 est min
TAJIK: 44468 (0.00%) words / 288368 (0.00%) chars / 510163 (0.00%) bytes / 3874 (0.09%) shows  535.41 est min
SHONA: 36520 (0.00%) words / 249561 (0.00%) chars / 254836 (0.00%) bytes / 3222 (0.08%) shows  463.36 est min
HAWAIIAN: 32856 (0.00%) words / 235956 (0.00%) chars / 239080 (0.00%) bytes / 4794 (0.11%) shows  438.10 est min
JAVANESE: 36285 (0.00%) words / 234151 (0.00%) chars / 240560 (0.00%) bytes / 8811 (0.21%) shows  434.75 est min
GEORGIAN: 31990 (0.00%) words / 219467 (0.00%) chars / 574288 (0.00%) bytes / 1722 (0.04%) shows  407.48 est min
CORSICAN: 33716 (0.00%) words / 203982 (0.00%) chars / 206947 (0.00%) bytes / 4361 (0.10%) shows  378.73 est min
SCOTS: 36027 (0.00%) words / 199255 (0.00%) chars / 202998 (0.00%) bytes / 10884 (0.25%) shows  369.95 est min
GUJARATI: 38757 (0.00%) words / 188435 (0.00%) chars / 482937 (0.00%) bytes / 1973 (0.05%) shows  349.87 est min
IRISH: 34390 (0.00%) words / 174082 (0.00%) chars / 177748 (0.00%) bytes / 13023 (0.30%) shows  323.22 est min
TELUGU: 25260 (0.00%) words / 172346 (0.00%) chars / 461564 (0.00%) bytes / 446 (0.01%) shows  319.99 est min
SINHALESE: 29174 (0.00%) words / 170058 (0.00%) chars / 442607 (0.00%) bytes / 1415 (0.03%) shows  315.74 est min
ESTONIAN: 25813 (0.00%) words / 161687 (0.00%) chars / 165538 (0.00%) bytes / 4770 (0.11%) shows  300.20 est min
MONGOLIAN: 26888 (0.00%) words / 160527 (0.00%) chars / 283249 (0.00%) bytes / 2943 (0.07%) shows  298.05 est min
KYRGYZ: 24140 (0.00%) words / 147539 (0.00%) chars / 260045 (0.00%) bytes / 3322 (0.08%) shows  273.93 est min
MAORI: 24570 (0.00%) words / 134334 (0.00%) chars / 138822 (0.00%) bytes / 4954 (0.12%) shows  249.42 est min
TAMIL: 17379 (0.00%) words / 131552 (0.00%) chars / 354369 (0.00%) bytes / 812 (0.02%) shows  244.25 est min
X_PIG_LATIN: 15217 (0.00%) words / 121385 (0.00%) chars / 125180 (0.00%) bytes / 8475 (0.20%) shows  225.37 est min
MAURITIAN_CREOLE: 19286 (0.00%) words / 117019 (0.00%) chars / 126407 (0.00%) bytes / 6215 (0.15%) shows  217.27 est min
FIJIAN: 18018 (0.00%) words / 116984 (0.00%) chars / 122014 (0.00%) bytes / 2888 (0.07%) shows  217.20 est min
MALTESE: 17449 (0.00%) words / 113334 (0.00%) chars / 117271 (0.00%) bytes / 3025 (0.07%) shows  210.43 est min
WARAY_PHILIPPINES: 16832 (0.00%) words / 112434 (0.00%) chars / 114800 (0.00%) bytes / 3804 (0.09%) shows  208.75 est min
KANNADA: 15912 (0.00%) words / 102226 (0.00%) chars / 273192 (0.00%) bytes / 637 (0.01%) shows  189.80 est min
FAROESE: 14257 (0.00%) words / 98389 (0.00%) chars / 105263 (0.00%) bytes / 4018 (0.09%) shows  182.68 est min
KAZAKH: 15456 (0.00%) words / 98093 (0.00%) chars / 174058 (0.00%) bytes / 2001 (0.05%) shows  182.13 est min
TSONGA: 15649 (0.00%) words / 97303 (0.00%) chars / 98504 (0.00%) bytes / 2689 (0.06%) shows  180.66 est min
QUECHUA: 14147 (0.00%) words / 97160 (0.00%) chars / 99729 (0.00%) bytes / 2534 (0.06%) shows  180.40 est min
BISLAMA: 16153 (0.00%) words / 97012 (0.00%) chars / 98946 (0.00%) bytes / 6381 (0.15%) shows  180.12 est min
SUNDANESE: 14526 (0.00%) words / 94915 (0.00%) chars / 96378 (0.00%) bytes / 2844 (0.07%) shows  176.23 est min
X_KLINGON: 17914 (0.00%) words / 93549 (0.00%) chars / 97196 (0.00%) bytes / 4619 (0.11%) shows  173.69 est min
TONGA: 15828 (0.00%) words / 92460 (0.00%) chars / 95396 (0.00%) bytes / 4614 (0.11%) shows  171.67 est min
LUXEMBOURGISH: 16509 (0.00%) words / 91879 (0.00%) chars / 95734 (0.00%) bytes / 8980 (0.21%) shows  170.59 est min
BRETON: 15468 (0.00%) words / 90101 (0.00%) chars / 94280 (0.00%) bytes / 8276 (0.19%) shows  167.29 est min
TSWANA: 16585 (0.00%) words / 89916 (0.00%) chars / 91423 (0.00%) bytes / 4313 (0.10%) shows  166.95 est min
WELSH: 15195 (0.00%) words / 89820 (0.00%) chars / 92195 (0.00%) bytes / 7940 (0.19%) shows  166.77 est min
KHASI: 15711 (0.00%) words / 84530 (0.00%) chars / 87079 (0.00%) bytes / 6334 (0.15%) shows  156.95 est min
SAMOAN: 14553 (0.00%) words / 81696 (0.00%) chars / 85054 (0.00%) bytes / 4890 (0.11%) shows  151.68 est min
VOLAPUK: 14909 (0.00%) words / 81637 (0.00%) chars / 86758 (0.00%) bytes / 4581 (0.11%) shows  151.57 est min
INTERLINGUE: 13985 (0.00%) words / 80611 (0.00%) chars / 81823 (0.00%) bytes / 6297 (0.15%) shows  149.67 est min
SESOTHO: 14406 (0.00%) words / 78874 (0.00%) chars / 80143 (0.00%) bytes / 2307 (0.05%) shows  146.44 est min
TATAR: 13535 (0.00%) words / 76716 (0.00%) chars / 96882 (0.00%) bytes / 5853 (0.14%) shows  142.44 est min
ESPERANTO: 11985 (0.00%) words / 76544 (0.00%) chars / 78622 (0.00%) bytes / 5319 (0.12%) shows  142.12 est min
RHAETO_ROMANCE: 14373 (0.00%) words / 75813 (0.00%) chars / 78989 (0.00%) bytes / 4704 (0.11%) shows  140.76 est min
AKAN: 14851 (0.00%) words / 75523 (0.00%) chars / 80647 (0.00%) bytes / 2824 (0.07%) shows  140.22 est min
TURKMEN: 10626 (0.00%) words / 72122 (0.00%) chars / 93092 (0.00%) bytes / 3972 (0.09%) shows  133.91 est min
MANX: 12096 (0.00%) words / 70503 (0.00%) chars / 72957 (0.00%) bytes / 4909 (0.11%) shows  130.90 est min
SESELWA: 11179 (0.00%) words / 67878 (0.00%) chars / 69159 (0.00%) bytes / 3674 (0.09%) shows  126.03 est min
ORIYA: 29783 (0.00%) words / 66972 (0.00%) chars / 142411 (0.00%) bytes / 17574 (0.41%) shows  124.35 est min
RUNDI: 9932 (0.00%) words / 65632 (0.00%) chars / 67552 (0.00%) bytes / 3494 (0.08%) shows  121.86 est min
HMONG: 9571 (0.00%) words / 65223 (0.00%) chars / 69673 (0.00%) bytes / 5223 (0.12%) shows  121.10 est min
INTERLINGUA: 9772 (0.00%) words / 61043 (0.00%) chars / 62136 (0.00%) bytes / 3005 (0.07%) shows  113.34 est min
GREENLANDIC: 8502 (0.00%) words / 53984 (0.00%) chars / 59299 (0.00%) bytes / 2678 (0.06%) shows  100.23 est min
CEBUANO: 8080 (0.00%) words / 51297 (0.00%) chars / 51761 (0.00%) bytes / 1207 (0.03%) shows  95.24 est min
PEDI: 8130 (0.00%) words / 47294 (0.00%) chars / 47984 (0.00%) bytes / 1631 (0.04%) shows  87.81 est min
FRISIAN: 6823 (0.00%) words / 46524 (0.00%) chars / 49229 (0.00%) bytes / 2992 (0.07%) shows  86.38 est min
HAITIAN_CREOLE: 8965 (0.00%) words / 46324 (0.00%) chars / 47654 (0.00%) bytes / 4119 (0.10%) shows  86.01 est min
BASHKIR: 6182 (0.00%) words / 46180 (0.00%) chars / 73117 (0.00%) bytes / 3583 (0.08%) shows  85.74 est min
AYMARA: 6453 (0.00%) words / 39335 (0.00%) chars / 43159 (0.00%) bytes / 2066 (0.05%) shows  73.03 est min
PUNJABI: 8290 (0.00%) words / 38158 (0.00%) chars / 97679 (0.00%) bytes / 236 (0.01%) shows  70.85 est min
SISWANT: 5303 (0.00%) words / 37007 (0.00%) chars / 38575 (0.00%) bytes / 939 (0.02%) shows  68.71 est min
VENDA: 6329 (0.00%) words / 34679 (0.00%) chars / 35240 (0.00%) bytes / 2391 (0.06%) shows  64.39 est min
UIGHUR: 4243 (0.00%) words / 28092 (0.00%) chars / 45386 (0.00%) bytes / 2037 (0.05%) shows  52.16 est min
NAURU: 4373 (0.00%) words / 27147 (0.00%) chars / 27749 (0.00%) bytes / 1689 (0.04%) shows  50.40 est min
ABKHAZIAN: 3178 (0.00%) words / 25299 (0.00%) chars / 43191 (0.00%) bytes / 925 (0.02%) shows  46.97 est min
INUPIAK: 1373 (0.00%) words / 12027 (0.00%) chars / 12078 (0.00%) bytes / 635 (0.01%) shows  22.33 est min
KASHMIRI: 2317 (0.00%) words / 9898 (0.00%) chars / 15906 (0.00%) bytes / 763 (0.02%) shows  18.38 est min
ZHUANG: 1506 (0.00%) words / 9100 (0.00%) chars / 9429 (0.00%) bytes / 841 (0.02%) shows  16.90 est min
MALAYALAM: 1107 (0.00%) words / 8106 (0.00%) chars / 21810 (0.00%) bytes / 202 (0.00%) shows  15.05 est min
SANGO: 1115 (0.00%) words / 6531 (0.00%) chars / 6727 (0.00%) bytes / 296 (0.01%) shows  12.13 est min
ASSAMESE: 223 (0.00%) words / 679 (0.00%) chars / 1528 (0.00%) bytes / 146 (0.00%) shows  1.26 est min
YIDDISH: 96 (0.00%) words / 458 (0.00%) chars / 776 (0.00%) bytes / 51 (0.00%) shows  0.85 est min
NDEBELE: 72 (0.00%) words / 444 (0.00%) chars / 448 (0.00%) bytes / 14 (0.00%) shows  0.82 est min
TIBETAN: 1 (0.00%) words / 13 (0.00%) chars / 24 (0.00%) bytes / 1 (0.00%) shows  0.02 est min
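Because the listing above is plain text, a short parser makes it easy to check how its columns relate. The sketch below uses a regular expression of our own, written against the row layout shown. It parses a few rows copied verbatim from the table and prints two ratios. Bytes per character reflects UTF-8 encoding: ASCII letters take one byte, while Arabic and Cyrillic letters take two. This is why Arabic's 42.64% share of bytes exceeds its 33.55% share of characters. Characters per estimated minute comes out at roughly 538.6 for each of these rows. That suggests the est min column may apportion airtime in proportion to character count, but this is our inference from the numbers, not a documented method.

```python
import re

# Our own pattern for one row of the listing; not an official format spec.
ROW = re.compile(
    r"^(?P<lang>[^:\s]+): (?P<words>\d+) \([\d.]+%\) words / "
    r"(?P<chars>\d+) \([\d.]+%\) chars / "
    r"(?P<bytes>\d+) \([\d.]+%\) bytes / "
    r"(?P<shows>\d+) \([\d.]+%\) shows\s+(?P<minutes>[\d.]+) est min$"
)

SAMPLE = """
ARABIC: 4632207900 (33.80%) words / 25234212170 (33.55%) chars / 45701278667 (42.64%) bytes / 1867174 (43.67%) shows  46852078.25 est min
ENGLISH: 3366543905 (24.56%) words / 19066809344 (25.35%) chars / 19069447127 (17.79%) bytes / 1771467 (41.44%) shows  35401130.71 est min
RUSSIAN: 509762027 (3.72%) words / 3369440924 (4.48%) chars / 6087842033 (5.68%) bytes / 223986 (5.24%) shows  6256003.11 est min
Chinese: 177538144 (1.30%) words / 367039503 (0.49%) chars / 716412923 (0.67%) bytes / 75834 (1.77%) shows  681478.12 est min
""".strip().splitlines()


def parse_row(line):
    """Turn one listing row into a dict of typed counts."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    d = m.groupdict()
    return {
        "lang": d["lang"],
        "words": int(d["words"]),
        "chars": int(d["chars"]),
        "bytes": int(d["bytes"]),
        "shows": int(d["shows"]),
        "minutes": float(d["minutes"]),
    }


rows = [parse_row(line) for line in SAMPLE]
for r in rows:
    print(f"{r['lang']:<8} {r['bytes'] / r['chars']:.2f} bytes/char  "
          f"{r['chars'] / r['minutes']:.1f} chars per est. minute")
```

The same parser can be pointed at the full listing to recompute any column's share of the total or to compare byte and character rankings.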