We are tremendously excited to announce today that in collaboration with the Internet Archive's Television News Archive, we are making available for researchers and journalists the complete transcripts of more than 117,500 Russia Today broadcasts spanning July 15, 2010 to present and totaling more than 4.3GB of text. Transcripts will now appear when viewing Russia Today broadcasts in the Visual Explorer and, most importantly, are downloadable as time-coded SRT and TXT blob files for content analysis, such as thematic, sentiment, argument structure and narrative analysis. What new insights are captured in this archive tracing Russia's expanding global ambitions over the past decade, its hardening stance towards the West, its march to war and the first year of its invasion? What do the major themes and trends teach us about how Russian foreign propaganda has evolved through the course of the invasion? What key insights into Kremlin narratives can be divined? We are immensely excited to see what kinds of new analyses this massive new dataset makes possible.
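For readers who want to start on the time-coded files, here is a minimal sketch of loading one SRT transcript into Python. It assumes the downloads follow the conventional SRT layout (a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then one or more text lines, with blank lines between cues), which we have not verified against every file. The names `parse_time` and `parse_srt` are illustrative only and not part of any released tooling.

```python
import re
from datetime import timedelta

# Matches a conventional SRT timestamp such as 00:01:02,500 (assumed format).
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def parse_time(stamp):
    """Convert one SRT timestamp string into a timedelta."""
    h, m, s, ms = map(int, TIME_RE.search(stamp).groups())
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)


def parse_srt(text):
    """Return a list of (start, end, caption_text) tuples from SRT content."""
    cues = []
    text = text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:
                start, end = (parse_time(part) for part in line.split("-->"))
                caption = " ".join(l.strip() for l in lines[i + 1:])
                cues.append((start, end, caption))
                break  # one timing line per cue
    return cues
```

Each tuple keeps the start and end offsets alongside the caption text, so downstream thematic or sentiment analysis can be tied back to a position within the broadcast.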
All transcripts were machine-generated by the Internet Archive over a number of years using a third-party speech recognition system and have a significantly higher error rate than the modern Google Speech-to-Text ASR being used for the Russian-language channels. These errors tend to be systematic ("boys johnson" instead of "boris johnson" or "don bass" instead of "donbass"), so they will require additional care and consideration when keyword searching and performing entity and narrative analysis. The early portion of the archive is uniformly divided into 30-minute chunks, while the later portion uses EPG data to divide by show, allowing show-level filtering. As with any realtime video archive collected over many years, there are unfortunately several extended outages in the dataset due to technical issues, including around the 2016 election.
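Because the misrecognitions are systematic, one way to handle them when keyword searching is to expand each search term into its known ASR variants before matching. The sketch below is only an illustration: the variant table holds just the two examples mentioned above, the names `ASR_VARIANTS`, `build_pattern` and `count_mentions` are hypothetical, and a real analysis would need to build a much larger list by inspecting the transcripts.

```python
import re

# Illustrative variant table: only these two pairs come from the examples above.
# Any further entries would have to be discovered by inspecting the transcripts.
ASR_VARIANTS = {
    "boris johnson": ["boys johnson"],
    "donbass": ["don bass"],
}


def build_pattern(term, variants=ASR_VARIANTS):
    """Compile a case-insensitive regex matching a term or its known ASR variants."""
    forms = [term] + variants.get(term, [])
    # Allow any run of whitespace between the words of a multi-word form.
    alternation = "|".join(
        r"\s+".join(re.escape(word) for word in form.split()) for form in forms
    )
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def count_mentions(text, term):
    """Count occurrences of a term, including its known misrecognized forms."""
    return len(build_pattern(term).findall(text))
```

Calling `count_mentions(transcript_text, "donbass")` would then count both "donbass" and "don bass". Expanding the query rather than rewriting the transcript leaves the original text untouched, which helps when a variant such as "don bass" could also occur legitimately.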
Note that transcripts currently run through earlier this month. They will shortly run through the present and update in realtime.