Transcribing 2.5M Hours Of TV News: Correcting Mixed Transliterations In LSM Transcription Via LSM+LLM

Last week we explored how Large Speech Model (LSM) transcriptions can blend script and transliterated text, rendering the resulting transcripts resistant to traditional keyword search. In other words, with a single Hindi speaker, an LSM like Chirp can output a transcript that mixes script Hindi with transliterated Hindi, meaning a keyword search must include both script and transliterated forms of the search terms. Could we use LLM post processing to rewrite LSM transcripts to normalize their outputs to make it easier for keyword search and other downstream tasks?

Here we take our mixed script/transliterated Hindi excerpt produced by LSM Chirp and use LLM GPT-4 to rewrite into all-script Hindi. Comparing GPT-4's English translations of the original mixed and subsequent normalized text suggests the all-script Hindi yields a considerably more fluent and understandable translation compared with the mixed text, likely due to the normalization process altering the transcript in ways that make it more fluent. It is unclear which is more faithful to the actual contents of the broadcast, but this demonstrates that transcript normalization is both possible and improves downstream translations.

Let's start by normalizing the original mixed broadcast:

The text below is a mixture of script and transliterated Hindi. Convert all transliterated portions to script Hindi:

है, सबसे पहले गोली चलाई thiयरिंग किया था. to uske dar ki wajah se ragbar ek khet tha uske andar wo khali khet tha uske andar koi fasal nahi thi to wo uske andar bhag pada jo aslam tha wo peiche tha wo unko dikhai nahi diyais unhone fiyering ki to wahan ek kapas ka khet tha lagbhag usme itna bada 4 feet ka lagbhag kapas tha us time to wo kya huapas ke khet mein pad gaya to wahan se wo usko peitne me lag gaye rakbar ko to usne naam sune the wo apas me kar rahe the na ki bhai naval is ka pair tod de Suresh iska hat tod de to wo sun raha tha wo sari baate wahan se aur maine ye bhi suna hai ki gay tak seemit nahi raha hai ki koi bakre ke sath bhi ghum raha hai to unhe tang karte hai

This yields:

Here is the text with all transliterated Hindi converted to Devanagari script:

है, सबसे पहले गोली चलाई थी। यरिंग किया था। तो उसके डर की वजह से रगबर एक खेत था उसके अंदर वो खाली खेत था उसके अंदर कोई फसल नहीं थी तो वो उसके अंदर भाग पड़ा जो असलम था वो पीछे था वो उनको दिखाई नहीं दिया। इस उन्होंने फायरिंग की तो वहाँ एक कपास का खेत था लगभग उसमें इतना बड़ा 4 फीट का लगभग कपास था उस समय तो वो क्या हुआ कपास के खेत में पड़ गया तो वहाँ से वो उसको पीटने में लग गए रकबर को तो उसने नाम सुने थे वो आपस में कर रहे थे न कि भाई नवल इस का पैर तोड़ दे सुरेश इसका हाथ तोड़ दे तो वो सुन रहा था वो सारी बातें वहाँ से और मैंने ये भी सुना है कि गाय तक सीमित नहीं रहा है कि कोई बकरे के साथ भी घूम रहा है तो उन्हें तंग करते हैं।

What if we try to translate this into English?

Translate the following text into English:

है, सबसे पहले गोली चलाई थी। यरिंग किया था। तो उसके डर की वजह से रगबर एक खेत था उसके अंदर वो खाली खेत था उसके अंदर कोई फसल नहीं थी तो वो उसके अंदर भाग पड़ा जो असलम था वो पीछे था वो उनको दिखाई नहीं दिया। इस उन्होंने फायरिंग की तो वहाँ एक कपास का खेत था लगभग उसमें इतना बड़ा 4 फीट का लगभग कपास था उस समय तो वो क्या हुआ कपास के खेत में पड़ गया तो वहाँ से वो उसको पीटने में लग गए रकबर को तो उसने नाम सुने थे वो आपस में कर रहे थे न कि भाई नवल इस का पैर तोड़ दे सुरेश इसका हाथ तोड़ दे तो वो सुन रहा था वो सारी बातें वहाँ से और मैंने ये भी सुना है कि गाय तक सीमित नहीं रहा है कि कोई बकरे के साथ भी घूम रहा है तो उन्हें तंग करते हैं।

This yields:

"He had fired the first shot. He did it out of fear. Ragbar was in a field, it was an empty field without any crops, so he ran inside it. Aslam, who was behind, did not see them. When they fired, there was a cotton field nearby with cotton plants about 4 feet tall. So, what happened was he fell into the cotton field and then they started beating him up. Ragbar heard them saying to each other, 'Brother Naval, break his leg; Suresh, break his arm.' So, he was listening to all these things from there. And I've also heard that it's not just limited to cows, some are also seen with goats, so they harass them."

How does this compare to how GPT-4 translates the original mixed script/transliterated text?

Translate the following text into English:

है, सबसे पहले गोली चलाई thiयरिंग किया था. to uske dar ki wajah se ragbar ek khet tha uske andar wo khali khet tha uske andar koi fasal nahi thi to wo uske andar bhag pada jo aslam tha wo peiche tha wo unko dikhai nahi diyais unhone fiyering ki to wahan ek kapas ka khet tha lagbhag usme itna bada 4 feet ka lagbhag kapas tha us time to wo kya huapas ke khet mein pad gaya to wahan se wo usko peitne me lag gaye rakbar ko to usne naam sune the wo apas me kar rahe the na ki bhai naval is ka pair tod de Suresh iska hat tod de to wo sun raha tha wo sari baate wahan se aur maine ye bhi suna hai ki gay tak seemit nahi raha hai ki koi bakre ke sath bhi ghum raha hai to unhe tang karte hai

Which yields:

"He was the first to fire the shots. Due to fear of him, there was a field that belonged to Ragbar; it was an empty field without any crops, so he ran into it. Aslam, who was behind him, did not see them and they started firing. There was a cotton field nearby, about 4 feet high with cotton at that time. He fell into the cotton field, and from there, they began to beat Ragbar. He heard their names; they were saying among themselves, 'Brother Naval, break his leg, Suresh break his arm.' He was listening to all these talks from there. And I have also heard that it is not just limited to cows; someone is also wandering with goats, and they bother them."