The GDELT Project

Visual Explorer: Another 570,000 Broadcasts Totaling 1.29B Seconds & 2.7B Words ASR'd Through Chirp

As we complete the final preparations to bring the Visual Explorer back online, we are excited to announce that an additional 570,000 historical broadcasts spanning 2009 to present (with a heavy concentration in 2010-2016) have been ASR'd through GCP Chirp over the past few days and will be added to the Visual Explorer for the first time! Only the public model was used, and no data was used to train or tune any model. With this new tranche, we have now completed ASR of all uncaptioned broadcasts in the entire archive, meaning that, for the first time ever, the entire archive will be keyword searchable when the TV Explorer relaunches this fall. In all, 570,000 broadcasts totaling 1.29 billion seconds (21M minutes / 357K hours) were ASR'd, yielding 2.76 billion words spanning 15.5 billion characters.

In a remarkable testament to just how cost-effective ASR has become, ASR'ing more than half a million broadcasts totaling 1.29 billion seconds of airtime cost just $64,000.
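
To put these totals in per-unit terms, the short Python sketch below derives the approximate cost per hour of airtime, the average broadcast length, and the average speaking rate from the rounded figures quoted above. The variable names are illustrative, and because the inputs are rounded, the derived values are approximations that may differ slightly from the exact totals (for example, the hours figure).

```python
# Rounded figures as quoted in this post; the exact totals are not published here.
total_seconds = 1.29e9      # seconds of airtime ASR'd
total_words = 2.76e9        # words produced
total_chars = 15.5e9        # characters produced
broadcasts = 570_000        # number of broadcasts
total_cost_usd = 64_000     # total ASR cost in US dollars

# Unit conversions.
minutes = total_seconds / 60
hours = total_seconds / 3600

# Derived per-unit values (approximate, since the inputs are rounded).
cost_per_hour = total_cost_usd / hours
cost_per_minute = total_cost_usd / minutes
avg_broadcast_minutes = minutes / broadcasts
words_per_minute = total_words / minutes
chars_per_word = total_chars / total_words

print(f"Airtime: {minutes / 1e6:.1f}M minutes, {hours / 1e3:.0f}K hours")
print(f"Cost: ${cost_per_hour:.3f} per hour, ${cost_per_minute:.4f} per minute")
print(f"Average broadcast length: {avg_broadcast_minutes:.1f} minutes")
print(f"Average rate: {words_per_minute:.0f} words per minute")
print(f"Average length: {chars_per_word:.1f} characters per word")
```

On these rounded inputs, the cost works out to roughly 18 cents per hour of airtime.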