Understanding Television News Through Onscreen Text OCR

While much of the work on television news has focused on speech recognition and caption search, the Visual Global Entity Graph 2.0 and our work on Covid-19 television news narratives are showing that the onscreen text of television news opens up an entirely new world of rich content. It makes possible a range of novel analyses, from tracking the agenda-setting power of tweets to measuring the airtime of onscreen Covid-19 dashboards, inventorying interviewed medical doctors and examining how a particular public figure is presented, to name just a few. Chyrons in particular offer insight into the fascinating realtime editorialization process of television news, a process that is absent from web and print media.
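To make the airtime idea concrete, here is a minimal sketch of how such a measurement might look once the onscreen text has been transcribed. It assumes the OCR output has already been reduced to one (second offset, text) pair per second of airtime; that layout, the function name `airtime_seconds` and the sample values are all hypothetical illustrations, not the actual schema of the Visual Global Entity Graph or of any OCR service.

```python
from typing import Iterable, Tuple


def airtime_seconds(
    frames: Iterable[Tuple[int, str]],
    phrase: str,
    seconds_per_frame: int = 1,
) -> int:
    """Count the seconds of airtime in which `phrase` appears onscreen.

    `frames` is a hypothetical sequence of (second_offset, ocr_text) pairs,
    one per sampled frame. Matching is a case-insensitive substring test.
    """
    needle = phrase.casefold()
    return sum(
        seconds_per_frame
        for _, text in frames
        if needle in text.casefold()
    )


# Invented sample data: one OCR result per second of a broadcast.
sample = [
    (0, "BREAKING NEWS"),
    (1, "CORONAVIRUS PANDEMIC Total Cases 1,000,000"),
    (2, "Coronavirus Pandemic Total Cases 1,000,001"),
    (3, "WEATHER"),
]

print(airtime_seconds(sample, "total cases"))  # 2
```

A simple substring match like this would miss OCR misspellings and layout variations, so a real analysis would likely need fuzzier matching.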

While current open-source OCR tools are still evolving to handle the pathological conditions of television news programming, state-of-the-art OCR systems like Google's Cloud Video API can transcribe the majority of the onscreen text seen in each broadcast across more than 300 languages, and do so nearly flawlessly even under worst-case visual conditions. As we work with scholars to explore this new world of television news content more closely, we're tremendously excited by the kinds of fundamentally new insights it offers.