We are tremendously excited to unveil today a first look at the linguistic geography of the Internet Archive's TV News Archive, which spans more than 100 channels from 50 countries on 5 continents over portions of the last 24 years. In collaboration with the Archive, we recently completed the machine transcription of all 2.5 million uncaptioned hours using GCP's Chirp (Universal Speech Model) ASR system, in what we believe to be the single largest application of ASR to global television ever performed. Like most LSM transcription systems, Chirp provides only a textual transcript of each broadcast; it does not identify the language of each word. Thus, after extensive testing of a range of language detection tools, we found CLD2 to be the most robust across the highly multilingual and code-switching world of global television news and used it to process the entire Archive.
Today we present the findings of that analysis of the linguistic landscape of the Archive. In all, 75.2 billion characters totaling 107GB of text were transcribed, spanning just under 2.5 million hours of airtime across 4.28 million broadcasts, with CLD2 identifying text in 159 languages. Using Chirp's precise begin/end timestamps for each transcribed word, we calculated the percentage of airtime in which speech was being uttered, as opposed to the tiny pauses between words and non-spoken airtime such as music and silence. In all, 80.53% of the total airtime of the Archive contains a spoken-word utterance. Breaking the Archive into "words" is complicated both by the presence of scriptio continua languages (non-space-delimited languages like Chinese) and by Chirp's propensity, when transcribing such languages, to alternate unpredictably between correct transcription and inserting spaces between every character or "word". With those caveats, the transcribed Archive contains an estimated 13.7 billion words. Across the entire Archive, there were 9.44 million language transitions, such as code switching, in which at least 10 characters of one language were followed by at least 10 characters of a different language (shorter transitions were ignored as likely Chirp errors).
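To make the two measurements above concrete, here is a minimal Python sketch, not the pipeline actually used. It assumes word timings arrive as non-overlapping `(start, end)` pairs in seconds and that language detection has already produced an ordered list of `(language, char_count)` runs; the function names and input shapes are hypothetical. It also takes the most literal reading of the transition rule: a transition is counted only when two adjacent runs in different languages are each at least 10 characters long.

```python
from typing import List, Tuple


def speech_fraction(words: List[Tuple[float, float]], total_seconds: float) -> float:
    """Share of airtime covered by word utterances.

    `words` holds (start, end) times in seconds for each transcribed word.
    Assumes the intervals do not overlap, so durations can simply be summed.
    """
    if total_seconds <= 0:
        return 0.0
    spoken = sum(max(0.0, end - start) for start, end in words)
    return spoken / total_seconds


def count_transitions(runs: List[Tuple[str, int]], min_chars: int = 10) -> int:
    """Count language transitions between adjacent runs of text.

    `runs` is an ordered list of (language, char_count). Consecutive runs in
    the same language are merged first; a transition is then counted when
    both neighbouring runs have at least `min_chars` characters.
    """
    merged: List[Tuple[str, int]] = []
    for lang, n in runs:
        if merged and merged[-1][0] == lang:
            merged[-1] = (lang, merged[-1][1] + n)
        else:
            merged.append((lang, n))

    transitions = 0
    for (_, prev_n), (_, next_n) in zip(merged, merged[1:]):
        # After merging, adjacent runs always differ in language,
        # so only the length threshold needs checking.
        if prev_n >= min_chars and next_n >= min_chars:
            transitions += 1
    return transitions


if __name__ == "__main__":
    timings = [(0.0, 0.5), (0.6, 1.0), (3.0, 4.0)]
    print(f"{speech_fraction(timings, 10.0):.2%} of airtime is speech")

    runs = [("ARABIC", 120), ("ENGLISH", 4), ("ARABIC", 80),
            ("FRENCH", 35), ("FRENCH", 10), ("ENGLISH", 12)]
    print(count_transitions(runs), "transitions")
```

In the toy input, the 4-character English run is too short to count on either side, so only the Arabic-to-French and French-to-English changes are tallied. A stricter or looser rule (for example, dropping short runs before comparing neighbours) would give different totals.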
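The complete breakdown appears below, ordered by number of transcribed characters. Each entry gives the language's word, character, byte and show counts (each with its share of the Archive total), followed by its estimated minutes of airtime. Arabic is the largest single language at roughly a third of all characters, followed by English at about a quarter. Some of the languages, such as Klingon, are clearly incorrect, but with a surprising twist: CLD2's Klingon detections appear to correspond to specific character sequences, such as certain Romanization forms and certain examples of code switching.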
ARABIC: 4632207900 (33.80%) words / 25234212170 (33.55%) chars / 45701278667 (42.64%) bytes / 1867174 (43.67%) shows 46852078.25 est min
ENGLISH: 3366543905 (24.56%) words / 19066809344 (25.35%) chars / 19069447127 (17.79%) bytes / 1771467 (41.44%) shows 35401130.71 est min
FRENCH: 1266182850 (9.24%) words / 7339612272 (9.76%) chars / 7541470343 (7.04%) bytes / 543328 (12.71%) shows 13627375.65 est min
RUSSIAN: 509762027 (3.72%) words / 3369440924 (4.48%) chars / 6087842033 (5.68%) bytes / 223986 (5.24%) shows 6256003.11 est min
VIETNAMESE: 723764782 (5.28%) words / 3160572303 (4.20%) chars / 4164899298 (3.89%) bytes / 189701 (4.44%) shows 5868199.09 est min
SPANISH: 408968223 (2.98%) words / 2395315786 (3.19%) chars / 2441154610 (2.28%) bytes / 179220 (4.19%) shows 4447355.91 est min
PORTUGUESE: 414537019 (3.02%) words / 2315845903 (3.08%) chars / 2389300151 (2.23%) bytes / 115175 (2.69%) shows 4299805.07 est min
PERSIAN: 471736156 (3.44%) words / 2272381886 (3.02%) chars / 4057767386 (3.79%) bytes / 201412 (4.71%) shows 4219105.92 est min
CATALAN: 345530535 (2.52%) words / 1916145929 (2.55%) chars / 1960532401 (1.83%) bytes / 41722 (0.98%) shows 3557686.62 est min
SERBIAN: 273376479 (1.99%) words / 1607559976 (2.14%) chars / 1660302069 (1.55%) bytes / 167582 (3.92%) shows 2984738.55 est min
THAI: 228211015 (1.67%) words / 1004056162 (1.34%) chars / 2539790759 (2.37%) bytes / 115826 (2.71%) shows 1864219.80 est min
AMHARIC: 193110032 (1.41%) words / 955156690 (1.27%) chars / 2466377081 (2.30%) bytes / 81426 (1.90%) shows 1773428.70 est min
AZERBAIJANI: 112629437 (0.82%) words / 829236097 (1.10%) chars / 959657808 (0.90%) bytes / 43016 (1.01%) shows 1539633.35 est min
UKRAINIAN: 117033606 (0.85%) words / 760686433 (1.01%) chars / 1373954937 (1.28%) bytes / 58136 (1.36%) shows 1412357.95 est min
TURKISH: 58798038 (0.43%) words / 417295862 (0.55%) chars / 453601664 (0.42%) bytes / 46678 (1.09%) shows 774788.54 est min
HINDI: 81059033 (0.59%) words / 381613682 (0.51%) chars / 961058132 (0.90%) bytes / 45634 (1.07%) shows 708537.84 est min
Chinese: 177538144 (1.30%) words / 367039503 (0.49%) chars / 716412923 (0.67%) bytes / 75834 (1.77%) shows 681478.12 est min
KURDISH: 56276929 (0.41%) words / 352781116 (0.47%) chars / 644677721 (0.60%) bytes / 54561 (1.28%) shows 655004.73 est min
SWEDISH: 46299294 (0.34%) words / 247884830 (0.33%) chars / 260280513 (0.24%) bytes / 22263 (0.52%) shows 460244.98 est min
GERMAN: 31837734 (0.23%) words / 204136636 (0.27%) chars / 206683727 (0.19%) bytes / 76791 (1.80%) shows 379018.20 est min
CROATIAN: 32243292 (0.24%) words / 176330240 (0.23%) chars / 181661517 (0.17%) bytes / 76298 (1.78%) shows 327390.38 est min
ChineseT: 33827008 (0.25%) words / 155207673 (0.21%) chars / 386886450 (0.36%) bytes / 45913 (1.07%) shows 288172.34 est min
ITALIAN: 20743468 (0.15%) words / 125129589 (0.17%) chars / 126261582 (0.12%) bytes / 35447 (0.83%) shows 232326.70 est min
BOSNIAN: 12039961 (0.09%) words / 69900165 (0.09%) chars / 71722013 (0.07%) bytes / 65685 (1.54%) shows 129782.85 est min
LINGALA: 11360058 (0.08%) words / 69827047 (0.09%) chars / 69990455 (0.07%) bytes / 40274 (0.94%) shows 129647.09 est min
Korean: 15474593 (0.11%) words / 58742821 (0.08%) chars / 138457229 (0.13%) bytes / 16748 (0.39%) shows 109067.14 est min
Unknown: 6975564 (0.05%) words / 45844805 (0.06%) chars / 83080431 (0.08%) bytes / 163977 (3.84%) shows 85119.53 est min
MACEDONIAN: 6753962 (0.05%) words / 41121677 (0.05%) chars / 74255914 (0.07%) bytes / 10219 (0.24%) shows 76350.16 est min
BELARUSIAN: 4473715 (0.03%) words / 30337943 (0.04%) chars / 54979302 (0.05%) bytes / 23209 (0.54%) shows 56328.12 est min
URDU: 6172542 (0.05%) words / 26301844 (0.03%) chars / 46377402 (0.04%) bytes / 17624 (0.41%) shows 48834.34 est min
SOMALI: 3691127 (0.03%) words / 24929192 (0.03%) chars / 24945750 (0.02%) bytes / 14229 (0.33%) shows 46285.75 est min
Japanese: 8673444 (0.06%) words / 20057383 (0.03%) chars / 41708482 (0.04%) bytes / 28366 (0.66%) shows 37240.32 est min
OROMO: 2245302 (0.02%) words / 18229302 (0.02%) chars / 18408856 (0.02%) bytes / 12282 (0.29%) shows 33846.14 est min
NORWEGIAN: 2729029 (0.02%) words / 13915265 (0.02%) chars / 14297447 (0.01%) bytes / 14491 (0.34%) shows 25836.32 est min
TIGRINYA: 2281939 (0.02%) words / 11647609 (0.02%) chars / 30298238 (0.03%) bytes / 5147 (0.12%) shows 21625.98 est min
DANISH: 2165171 (0.02%) words / 11167591 (0.01%) chars / 11490496 (0.01%) bytes / 48444 (1.13%) shows 20734.74 est min
SWAHILI: 1602058 (0.01%) words / 10435696 (0.01%) chars / 10467739 (0.01%) bytes / 18919 (0.44%) shows 19375.84 est min
HEBREW: 1730032 (0.01%) words / 9070395 (0.01%) chars / 16075986 (0.02%) bytes / 11732 (0.27%) shows 16840.90 est min
GALICIAN: 1526426 (0.01%) words / 8719818 (0.01%) chars / 8917868 (0.01%) bytes / 24178 (0.57%) shows 16189.99 est min
GREEK: 1282677 (0.01%) words / 7628951 (0.01%) chars / 13787652 (0.01%) bytes / 10376 (0.24%) shows 14164.59 est min
HAUSA: 1156431 (0.01%) words / 6111825 (0.01%) chars / 6141307 (0.01%) bytes / 10043 (0.23%) shows 11347.76 est min
INDONESIAN: 804657 (0.01%) words / 4909057 (0.01%) chars / 4930359 (0.00%) bytes / 44582 (1.04%) shows 9114.59 est min
SLOVAK: 867248 (0.01%) words / 4837321 (0.01%) chars / 4969729 (0.00%) bytes / 36644 (0.86%) shows 8981.40 est min
ROMANIAN: 830513 (0.01%) words / 4607275 (0.01%) chars / 4937805 (0.00%) bytes / 16408 (0.38%) shows 8554.28 est min
POLISH: 545121 (0.00%) words / 3587387 (0.00%) chars / 3800446 (0.00%) bytes / 8699 (0.20%) shows 6660.66 est min
DUTCH: 603703 (0.00%) words / 3167569 (0.00%) chars / 3177929 (0.00%) bytes / 33540 (0.78%) shows 5881.19 est min
FINNISH: 446762 (0.00%) words / 3026870 (0.00%) chars / 3193928 (0.00%) bytes / 7982 (0.19%) shows 5619.96 est min
WOLOF: 454207 (0.00%) words / 2378414 (0.00%) chars / 2440538 (0.00%) bytes / 7924 (0.19%) shows 4415.97 est min
HUNGARIAN: 313051 (0.00%) words / 2048764 (0.00%) chars / 2216896 (0.00%) bytes / 8225 (0.19%) shows 3803.92 est min
KINYARWANDA: 310624 (0.00%) words / 2014520 (0.00%) chars / 2035423 (0.00%) bytes / 13110 (0.31%) shows 3740.34 est min
TAGALOG: 287971 (0.00%) words / 1637419 (0.00%) chars / 1643592 (0.00%) bytes / 9788 (0.23%) shows 3040.18 est min
MALAY: 238709 (0.00%) words / 1540066 (0.00%) chars / 1558961 (0.00%) bytes / 21913 (0.51%) shows 2859.42 est min
NORWEGIAN_N: 272633 (0.00%) words / 1455994 (0.00%) chars / 1498078 (0.00%) bytes / 21943 (0.51%) shows 2703.33 est min
IGBO: 244491 (0.00%) words / 1412956 (0.00%) chars / 1503972 (0.00%) bytes / 7804 (0.18%) shows 2623.42 est min
BURMESE: 234518 (0.00%) words / 1346703 (0.00%) chars / 3566609 (0.00%) bytes / 2617 (0.06%) shows 2500.41 est min
ALBANIAN: 232183 (0.00%) words / 1252130 (0.00%) chars / 1347718 (0.00%) bytes / 3548 (0.08%) shows 2324.82 est min
PASHTO: 234669 (0.00%) words / 1203068 (0.00%) chars / 2139946 (0.00%) bytes / 15251 (0.36%) shows 2233.72 est min
UZBEK: 183341 (0.00%) words / 1173488 (0.00%) chars / 1600230 (0.00%) bytes / 20558 (0.48%) shows 2178.80 est min
CZECH: 187892 (0.00%) words / 1081360 (0.00%) chars / 1160102 (0.00%) bytes / 15431 (0.36%) shows 2007.75 est min
LAOTHIAN: 238298 (0.00%) words / 1070721 (0.00%) chars / 2712285 (0.00%) bytes / 11994 (0.28%) shows 1988.00 est min
NEPALI: 176148 (0.00%) words / 910517 (0.00%) chars / 2345062 (0.00%) bytes / 6068 (0.14%) shows 1690.55 est min
SLOVENIAN: 155365 (0.00%) words / 906667 (0.00%) chars / 930074 (0.00%) bytes / 7164 (0.17%) shows 1683.40 est min
ARMENIAN: 133145 (0.00%) words / 866072 (0.00%) chars / 1565679 (0.00%) bytes / 1809 (0.04%) shows 1608.03 est min
YORUBA: 176688 (0.00%) words / 863219 (0.00%) chars / 941756 (0.00%) bytes / 5985 (0.14%) shows 1602.73 est min
GUARANI: 129844 (0.00%) words / 837548 (0.00%) chars / 851616 (0.00%) bytes / 9103 (0.21%) shows 1555.07 est min
SINDHI: 175040 (0.00%) words / 835381 (0.00%) chars / 1474890 (0.00%) bytes / 10135 (0.24%) shows 1551.04 est min
KHMER: 182590 (0.00%) words / 833962 (0.00%) chars / 2125508 (0.00%) bytes / 3442 (0.08%) shows 1548.41 est min
BULGARIAN: 125511 (0.00%) words / 772579 (0.00%) chars / 1295363 (0.00%) bytes / 12619 (0.30%) shows 1434.44 est min
MARATHI: 122444 (0.00%) words / 667032 (0.00%) chars / 1666988 (0.00%) bytes / 9837 (0.23%) shows 1238.47 est min
SANSKRIT: 99107 (0.00%) words / 632241 (0.00%) chars / 1417199 (0.00%) bytes / 11459 (0.27%) shows 1173.87 est min
LATIN: 92278 (0.00%) words / 585398 (0.00%) chars / 599444 (0.00%) bytes / 23733 (0.56%) shows 1086.90 est min
ICELANDIC: 101928 (0.00%) words / 566075 (0.00%) chars / 628970 (0.00%) bytes / 2700 (0.06%) shows 1051.03 est min
AFAR: 64885 (0.00%) words / 524631 (0.00%) chars / 530677 (0.00%) bytes / 7072 (0.17%) shows 974.08 est min
OCCITAN: 88282 (0.00%) words / 504840 (0.00%) chars / 514991 (0.00%) bytes / 5078 (0.12%) shows 937.33 est min
AFRIKAANS: 98207 (0.00%) words / 490272 (0.00%) chars / 492626 (0.00%) bytes / 4063 (0.10%) shows 910.28 est min
LITHUANIAN: 71589 (0.00%) words / 463378 (0.00%) chars / 493664 (0.00%) bytes / 6922 (0.16%) shows 860.35 est min
BENGALI: 69914 (0.00%) words / 443360 (0.00%) chars / 1094148 (0.00%) bytes / 4894 (0.11%) shows 823.18 est min
NYANJA: 61162 (0.00%) words / 409403 (0.00%) chars / 411780 (0.00%) bytes / 2568 (0.06%) shows 760.13 est min
SCOTS_GAELIC: 65091 (0.00%) words / 381410 (0.00%) chars / 388152 (0.00%) bytes / 12203 (0.29%) shows 708.16 est min
LATVIAN: 59656 (0.00%) words / 369067 (0.00%) chars / 393014 (0.00%) bytes / 7856 (0.18%) shows 685.24 est min
XHOSA: 48070 (0.00%) words / 348361 (0.00%) chars / 353109 (0.00%) bytes / 7063 (0.17%) shows 646.80 est min
ZULU: 44839 (0.00%) words / 346612 (0.00%) chars / 347931 (0.00%) bytes / 2943 (0.07%) shows 643.55 est min
GANDA: 46769 (0.00%) words / 317285 (0.00%) chars / 319389 (0.00%) bytes / 4774 (0.11%) shows 589.10 est min
BIHARI: 66942 (0.00%) words / 309739 (0.00%) chars / 766344 (0.00%) bytes / 6486 (0.15%) shows 575.09 est min
BASQUE: 50153 (0.00%) words / 306354 (0.00%) chars / 313363 (0.00%) bytes / 9479 (0.22%) shows 568.80 est min
MALAGASY: 43714 (0.00%) words / 304258 (0.00%) chars / 310058 (0.00%) bytes / 13646 (0.32%) shows 564.91 est min
TAJIK: 44468 (0.00%) words / 288368 (0.00%) chars / 510163 (0.00%) bytes / 3874 (0.09%) shows 535.41 est min
SHONA: 36520 (0.00%) words / 249561 (0.00%) chars / 254836 (0.00%) bytes / 3222 (0.08%) shows 463.36 est min
HAWAIIAN: 32856 (0.00%) words / 235956 (0.00%) chars / 239080 (0.00%) bytes / 4794 (0.11%) shows 438.10 est min
JAVANESE: 36285 (0.00%) words / 234151 (0.00%) chars / 240560 (0.00%) bytes / 8811 (0.21%) shows 434.75 est min
GEORGIAN: 31990 (0.00%) words / 219467 (0.00%) chars / 574288 (0.00%) bytes / 1722 (0.04%) shows 407.48 est min
CORSICAN: 33716 (0.00%) words / 203982 (0.00%) chars / 206947 (0.00%) bytes / 4361 (0.10%) shows 378.73 est min
SCOTS: 36027 (0.00%) words / 199255 (0.00%) chars / 202998 (0.00%) bytes / 10884 (0.25%) shows 369.95 est min
GUJARATI: 38757 (0.00%) words / 188435 (0.00%) chars / 482937 (0.00%) bytes / 1973 (0.05%) shows 349.87 est min
IRISH: 34390 (0.00%) words / 174082 (0.00%) chars / 177748 (0.00%) bytes / 13023 (0.30%) shows 323.22 est min
TELUGU: 25260 (0.00%) words / 172346 (0.00%) chars / 461564 (0.00%) bytes / 446 (0.01%) shows 319.99 est min
SINHALESE: 29174 (0.00%) words / 170058 (0.00%) chars / 442607 (0.00%) bytes / 1415 (0.03%) shows 315.74 est min
ESTONIAN: 25813 (0.00%) words / 161687 (0.00%) chars / 165538 (0.00%) bytes / 4770 (0.11%) shows 300.20 est min
MONGOLIAN: 26888 (0.00%) words / 160527 (0.00%) chars / 283249 (0.00%) bytes / 2943 (0.07%) shows 298.05 est min
KYRGYZ: 24140 (0.00%) words / 147539 (0.00%) chars / 260045 (0.00%) bytes / 3322 (0.08%) shows 273.93 est min
MAORI: 24570 (0.00%) words / 134334 (0.00%) chars / 138822 (0.00%) bytes / 4954 (0.12%) shows 249.42 est min
TAMIL: 17379 (0.00%) words / 131552 (0.00%) chars / 354369 (0.00%) bytes / 812 (0.02%) shows 244.25 est min
X_PIG_LATIN: 15217 (0.00%) words / 121385 (0.00%) chars / 125180 (0.00%) bytes / 8475 (0.20%) shows 225.37 est min
MAURITIAN_CREOLE: 19286 (0.00%) words / 117019 (0.00%) chars / 126407 (0.00%) bytes / 6215 (0.15%) shows 217.27 est min
FIJIAN: 18018 (0.00%) words / 116984 (0.00%) chars / 122014 (0.00%) bytes / 2888 (0.07%) shows 217.20 est min
MALTESE: 17449 (0.00%) words / 113334 (0.00%) chars / 117271 (0.00%) bytes / 3025 (0.07%) shows 210.43 est min
WARAY_PHILIPPINES: 16832 (0.00%) words / 112434 (0.00%) chars / 114800 (0.00%) bytes / 3804 (0.09%) shows 208.75 est min
KANNADA: 15912 (0.00%) words / 102226 (0.00%) chars / 273192 (0.00%) bytes / 637 (0.01%) shows 189.80 est min
FAROESE: 14257 (0.00%) words / 98389 (0.00%) chars / 105263 (0.00%) bytes / 4018 (0.09%) shows 182.68 est min
KAZAKH: 15456 (0.00%) words / 98093 (0.00%) chars / 174058 (0.00%) bytes / 2001 (0.05%) shows 182.13 est min
TSONGA: 15649 (0.00%) words / 97303 (0.00%) chars / 98504 (0.00%) bytes / 2689 (0.06%) shows 180.66 est min
QUECHUA: 14147 (0.00%) words / 97160 (0.00%) chars / 99729 (0.00%) bytes / 2534 (0.06%) shows 180.40 est min
BISLAMA: 16153 (0.00%) words / 97012 (0.00%) chars / 98946 (0.00%) bytes / 6381 (0.15%) shows 180.12 est min
SUNDANESE: 14526 (0.00%) words / 94915 (0.00%) chars / 96378 (0.00%) bytes / 2844 (0.07%) shows 176.23 est min
X_KLINGON: 17914 (0.00%) words / 93549 (0.00%) chars / 97196 (0.00%) bytes / 4619 (0.11%) shows 173.69 est min
TONGA: 15828 (0.00%) words / 92460 (0.00%) chars / 95396 (0.00%) bytes / 4614 (0.11%) shows 171.67 est min
LUXEMBOURGISH: 16509 (0.00%) words / 91879 (0.00%) chars / 95734 (0.00%) bytes / 8980 (0.21%) shows 170.59 est min
BRETON: 15468 (0.00%) words / 90101 (0.00%) chars / 94280 (0.00%) bytes / 8276 (0.19%) shows 167.29 est min
TSWANA: 16585 (0.00%) words / 89916 (0.00%) chars / 91423 (0.00%) bytes / 4313 (0.10%) shows 166.95 est min
WELSH: 15195 (0.00%) words / 89820 (0.00%) chars / 92195 (0.00%) bytes / 7940 (0.19%) shows 166.77 est min
KHASI: 15711 (0.00%) words / 84530 (0.00%) chars / 87079 (0.00%) bytes / 6334 (0.15%) shows 156.95 est min
SAMOAN: 14553 (0.00%) words / 81696 (0.00%) chars / 85054 (0.00%) bytes / 4890 (0.11%) shows 151.68 est min
VOLAPUK: 14909 (0.00%) words / 81637 (0.00%) chars / 86758 (0.00%) bytes / 4581 (0.11%) shows 151.57 est min
INTERLINGUE: 13985 (0.00%) words / 80611 (0.00%) chars / 81823 (0.00%) bytes / 6297 (0.15%) shows 149.67 est min
SESOTHO: 14406 (0.00%) words / 78874 (0.00%) chars / 80143 (0.00%) bytes / 2307 (0.05%) shows 146.44 est min
TATAR: 13535 (0.00%) words / 76716 (0.00%) chars / 96882 (0.00%) bytes / 5853 (0.14%) shows 142.44 est min
ESPERANTO: 11985 (0.00%) words / 76544 (0.00%) chars / 78622 (0.00%) bytes / 5319 (0.12%) shows 142.12 est min
RHAETO_ROMANCE: 14373 (0.00%) words / 75813 (0.00%) chars / 78989 (0.00%) bytes / 4704 (0.11%) shows 140.76 est min
AKAN: 14851 (0.00%) words / 75523 (0.00%) chars / 80647 (0.00%) bytes / 2824 (0.07%) shows 140.22 est min
TURKMEN: 10626 (0.00%) words / 72122 (0.00%) chars / 93092 (0.00%) bytes / 3972 (0.09%) shows 133.91 est min
MANX: 12096 (0.00%) words / 70503 (0.00%) chars / 72957 (0.00%) bytes / 4909 (0.11%) shows 130.90 est min
SESELWA: 11179 (0.00%) words / 67878 (0.00%) chars / 69159 (0.00%) bytes / 3674 (0.09%) shows 126.03 est min
ORIYA: 29783 (0.00%) words / 66972 (0.00%) chars / 142411 (0.00%) bytes / 17574 (0.41%) shows 124.35 est min
RUNDI: 9932 (0.00%) words / 65632 (0.00%) chars / 67552 (0.00%) bytes / 3494 (0.08%) shows 121.86 est min
HMONG: 9571 (0.00%) words / 65223 (0.00%) chars / 69673 (0.00%) bytes / 5223 (0.12%) shows 121.10 est min
INTERLINGUA: 9772 (0.00%) words / 61043 (0.00%) chars / 62136 (0.00%) bytes / 3005 (0.07%) shows 113.34 est min
GREENLANDIC: 8502 (0.00%) words / 53984 (0.00%) chars / 59299 (0.00%) bytes / 2678 (0.06%) shows 100.23 est min
CEBUANO: 8080 (0.00%) words / 51297 (0.00%) chars / 51761 (0.00%) bytes / 1207 (0.03%) shows 95.24 est min
PEDI: 8130 (0.00%) words / 47294 (0.00%) chars / 47984 (0.00%) bytes / 1631 (0.04%) shows 87.81 est min
FRISIAN: 6823 (0.00%) words / 46524 (0.00%) chars / 49229 (0.00%) bytes / 2992 (0.07%) shows 86.38 est min
HAITIAN_CREOLE: 8965 (0.00%) words / 46324 (0.00%) chars / 47654 (0.00%) bytes / 4119 (0.10%) shows 86.01 est min
BASHKIR: 6182 (0.00%) words / 46180 (0.00%) chars / 73117 (0.00%) bytes / 3583 (0.08%) shows 85.74 est min
AYMARA: 6453 (0.00%) words / 39335 (0.00%) chars / 43159 (0.00%) bytes / 2066 (0.05%) shows 73.03 est min
PUNJABI: 8290 (0.00%) words / 38158 (0.00%) chars / 97679 (0.00%) bytes / 236 (0.01%) shows 70.85 est min
SISWANT: 5303 (0.00%) words / 37007 (0.00%) chars / 38575 (0.00%) bytes / 939 (0.02%) shows 68.71 est min
VENDA: 6329 (0.00%) words / 34679 (0.00%) chars / 35240 (0.00%) bytes / 2391 (0.06%) shows 64.39 est min
UIGHUR: 4243 (0.00%) words / 28092 (0.00%) chars / 45386 (0.00%) bytes / 2037 (0.05%) shows 52.16 est min
NAURU: 4373 (0.00%) words / 27147 (0.00%) chars / 27749 (0.00%) bytes / 1689 (0.04%) shows 50.40 est min
ABKHAZIAN: 3178 (0.00%) words / 25299 (0.00%) chars / 43191 (0.00%) bytes / 925 (0.02%) shows 46.97 est min
INUPIAK: 1373 (0.00%) words / 12027 (0.00%) chars / 12078 (0.00%) bytes / 635 (0.01%) shows 22.33 est min
KASHMIRI: 2317 (0.00%) words / 9898 (0.00%) chars / 15906 (0.00%) bytes / 763 (0.02%) shows 18.38 est min
ZHUANG: 1506 (0.00%) words / 9100 (0.00%) chars / 9429 (0.00%) bytes / 841 (0.02%) shows 16.90 est min
MALAYALAM: 1107 (0.00%) words / 8106 (0.00%) chars / 21810 (0.00%) bytes / 202 (0.00%) shows 15.05 est min
SANGO: 1115 (0.00%) words / 6531 (0.00%) chars / 6727 (0.00%) bytes / 296 (0.01%) shows 12.13 est min
ASSAMESE: 223 (0.00%) words / 679 (0.00%) chars / 1528 (0.00%) bytes / 146 (0.00%) shows 1.26 est min
YIDDISH: 96 (0.00%) words / 458 (0.00%) chars / 776 (0.00%) bytes / 51 (0.00%) shows 0.85 est min
NDEBELE: 72 (0.00%) words / 444 (0.00%) chars / 448 (0.00%) bytes / 14 (0.00%) shows 0.82 est min
TIBETAN: 1 (0.00%) words / 13 (0.00%) chars / 24 (0.00%) bytes / 1 (0.00%) shows 0.02 est min
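
For readers who want to work with the breakdown above programmatically, the following is a small Python sketch of one way to parse its records. The regular expression and the `parse_records` helper are our own illustration, not part of any released tooling. The pattern tolerates arbitrary whitespace, including line breaks inside a record. As a usage example, it computes bytes per character for two entries; the ratios are consistent with a multi-byte encoding such as UTF-8, though the encoding is an assumption here rather than something stated above.

```python
import re

# One record looks like:
# NAME: W (p%) words / C (p%) chars / B (p%) bytes / S (p%) shows M est min
RECORD = re.compile(
    r"(?P<lang>[A-Za-z_]+):\s+"
    r"(?P<words>\d+)\s+\([\d.]+%\)\s+words\s+/\s+"
    r"(?P<chars>\d+)\s+\([\d.]+%\)\s+chars\s+/\s+"
    r"(?P<bytes>\d+)\s+\([\d.]+%\)\s+bytes\s+/\s+"
    r"(?P<shows>\d+)\s+\([\d.]+%\)\s+shows\s+"
    r"(?P<est_min>[\d.]+)\s+est\s+min"
)


def parse_records(text):
    """Return one dict per language record found anywhere in `text`."""
    rows = []
    for m in RECORD.finditer(text):
        row = {"lang": m["lang"]}
        for key in ("words", "chars", "bytes", "shows"):
            row[key] = int(m[key])
        row["est_min"] = float(m["est_min"])
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Two records copied from the breakdown above.
    sample = (
        "ARABIC: 4632207900 (33.80%) words / 25234212170 (33.55%) chars / "
        "45701278667 (42.64%) bytes / 1867174 (43.67%) shows 46852078.25 est min "
        "ENGLISH: 3366543905 (24.56%) words / 19066809344 (25.35%) chars / "
        "19069447127 (17.79%) bytes / 1771467 (41.44%) shows 35401130.71 est min"
    )
    for row in parse_records(sample):
        ratio = row["bytes"] / row["chars"]
        print(f"{row['lang']}: {ratio:.2f} bytes per character")
```

The differing ratios explain why a language's share of bytes can diverge from its share of characters: Arabic accounts for 33.55% of characters but 42.64% of bytes, while English accounts for 25.35% of characters but only 17.79% of bytes.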