While GCP's new Chirp ASR model does not officially support multilingual audio content, it has proven highly adept at correctly transcribing brief excerpts of other languages interspersed in broadcasts. Today's example is even more interesting: a broadcast that begins in Arabic and ends with two minutes of English speech. Chirp flawlessly switches between the two languages, despite not being told to expect English in the broadcast. Older generations of ASR systems would typically render other languages as the closest-sounding words in the expected language, devolving the transcript into gibberish. Here, by contrast, the power of modern large speech models is on full display. For countries whose television channels often incorporate programming in multiple languages, the ability to seamlessly transcribe across languages is truly game-changing and opens their programming to search and analysis for the first time.
View Broadcast. (NOTE: video clips not yet playable)