The GDELT Project

AI Challenges: Speech Recognition In Multilingual Societies: From Multiple Languages To Code Switching

Earlier today we released an early experiment applying Google's Speech-to-Text API to the 18 sample broadcasts from our new EMEA TV News Archive collaboration with the Internet Archive's Television News Archive. One of the most fascinating findings was the degree to which television news channels in highly multilingual societies use multiple languages within their broadcasts. This marks a sharp departure from the simultaneous translation overdubbing favored by English-language American television news broadcasters and represents a fascinating, underexplored area of research in neural speech recognition. Indeed, such explorations are a hallmark of GDELT's work to globalize the field of analytics and AI by broadening awareness of the rich diversity of news content around the world.

We've identified four core challenges for which we'd love to see solutions: