November Roundup Of Television News Tools & Datasets

In collaboration with the Internet Archive's Television News Archive, we've been rolling out a flood of new tools and datasets to help scholars and journalists understand the global television news landscape, continuing a partnership that now spans more than a decade. To help you dive into this archive, we've collected a quick list of some of the core dashboards, datasets and experiments below.

Interactive Dashboards

  • Visual Explorer. Visually skim the entire global holdings of the TV News Archive: 5.2 million broadcasts from 100+ channels in 50 countries on 5 continents, spanning portions of 20 years. Each broadcast is sampled at one frame every 4 seconds and displayed as a thumbnail grid, allowing rapid visual skimming of the entire broadcast. For some channels, you can click on a thumbnail to play a 30-second clip from that point. Understand the visual narratives of the world's television channels over nearly a quarter-century. Television coverage from Belarus, Russia and Ukraine is being transcribed and translated live.
  • Television Explorer. Keyword-search the closed captioning of a range of American channels and a handful of international channels, with some, such as CNN/MSNBC/FOX and the ABC/CBS/NBC evening news, stretching back more than a decade.
  • Television AI Explorer. Keyword-search OCR'd onscreen text and search for roughly 20,000-30,000 objects and activities across CNN/MSNBC/FOX/BBCNEWS since 2020 and the ABC/CBS/NBC evening news since 2010.

Datasets

  • Visual Explorer Visual NGrams. All 5.2 million worldwide television news broadcasts held by the TV News Archive, spanning 50 countries over portions of 20 years, are available in the Visual Explorer. The Visual Explorer makes each broadcast "skimmable" by extracting one frame every 4 seconds and arraying those frames into a thumbnail grid in its web interface. To enable non-consumptive visual analysis at scale, each broadcast also offers a ZIP file containing the full-resolution versions of the images that make up its thumbnail grid, creating "ngrams for television news." Today those ngrams cover 5.2 million broadcasts totaling 12.34 billion seconds of airtime (205.7 million minutes / 3.43 million hours), captured in 3 billion images totaling 1 quadrillion pixels, yielding what is perhaps the richest non-consumptive analyzable archive of television news ever created.
  • Visual Global Entity Graph (VGEGV2). A decade of ABC/CBS/NBC evening news, plus CNN/MSNBC/FOX/BBCNEWS since 2020 with selected broadcasts back to 2009, has been annotated through Google's Cloud Video AI API. The complete OCR'd onscreen text and all recognized objects and activities are available for each broadcast, both in summarized form and as the raw JSON API output.
  • Belarusian, Russian & Ukrainian Automated Transcripts. All broadcasts from Russian channels 1TV, NTV and Russia 1 (from March 26) and Russia 24 (from April 25), Ukrainian channel Espreso (from April 25) and Belarusian channel Belarus 24 (from May 16) are automatically transcribed by Google's Speech-to-Text API, with the full raw per-word JSON annotation for every broadcast available for download, along with SRT and TXT versions. When you view these broadcasts in the Visual Explorer, a running transcript appears alongside the thumbnail grid and alongside playable clips, and it can be automatically translated into English in-browser when using the Chrome web browser.
  • Television News Advertising Inventory. When analyzing television news, distinguishing news from paid advertising airtime is critical to avoid skewing findings. Historically, advertisements have been excluded using algorithmic filters that had high error rates and tended to miss edge cases. Instead, this dataset compiles the definitive channel-provided inventory of advertising versus news programming, releasing both timecode files and the underlying closed captioning of those advertisements that carry captions across a range of channels, as well as a video-aligned version for select channels.
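
The airtime figures quoted for the Visual NGrams follow directly from one another. This short sketch simply re-derives them from the 12.34 billion seconds and the 4-second sampling interval; the pixel total is rounded in the text, so the per-frame figure it produces is only a rough average, not a documented resolution.

```python
# Figures quoted in the Visual NGrams description.
TOTAL_SECONDS = 12.34e9   # seconds of airtime
SAMPLE_INTERVAL_S = 4     # one frame every 4 seconds
TOTAL_PIXELS = 1e15       # "1 quadrillion pixels" (a rounded figure)

minutes = TOTAL_SECONDS / 60
hours = TOTAL_SECONDS / 3600
frames = TOTAL_SECONDS / SAMPLE_INTERVAL_S
# Average pixels per frame; only approximate because TOTAL_PIXELS is rounded.
pixels_per_frame = TOTAL_PIXELS / frames

print(f"{minutes / 1e6:.1f} million minutes")   # 205.7 million minutes
print(f"{hours / 1e6:.2f} million hours")       # 3.43 million hours
print(f"{frames / 1e9:.1f} billion frames")     # 3.1 billion frames
print(f"~{pixels_per_frame:,.0f} pixels per frame on average")
```

The roughly 3.1 billion frames match the "3 billion images" figure, and the implied average of a few hundred thousand pixels per frame is consistent with standard-definition-sized stills.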
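
Because each Visual NGrams image stands for a fixed 4-second step, a frame's position in a broadcast's ZIP file tells you roughly where it falls in the broadcast. The following is a minimal sketch of that mapping, assuming that the image filenames are zero-padded so that they sort into broadcast order and that the first frame sits at offset 0; the function names are hypothetical, and the actual naming convention inside the ZIP files should be checked before relying on this.

```python
import zipfile

SAMPLE_INTERVAL_S = 4  # one frame every 4 seconds
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def offsets_from_names(names, interval_s=SAMPLE_INTERVAL_S):
    """Pair each image name with its offset in seconds from the broadcast start.

    ASSUMPTION: filenames are zero-padded so lexical order equals broadcast
    order, and the first frame is at offset 0. Non-image entries are skipped.
    """
    images = sorted(n for n in names if n.lower().endswith(IMAGE_SUFFIXES))
    return [(name, i * interval_s) for i, name in enumerate(images)]


def frame_offsets(zip_path, interval_s=SAMPLE_INTERVAL_S):
    """Return (image name, offset seconds) pairs for a downloaded frame ZIP."""
    with zipfile.ZipFile(zip_path) as zf:
        return offsets_from_names(zf.namelist(), interval_s)


def hms(seconds):
    """Format an offset in seconds as HH:MM:SS."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"
```

Calling `frame_offsets("broadcast.zip")` on a downloaded file (the path is a placeholder) yields pairs that can be joined against transcript or caption timestamps for the same broadcast.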
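
For the VGEGV2 raw JSON, a typical first step is tallying which objects and activities appear and pulling out the OCR'd onscreen text. The sketch below assumes that each raw file mirrors the Video Intelligence API's JSON response shape (`annotationResults` containing `shotLabelAnnotations` and `textAnnotations`). That is an assumption about the files rather than a documented schema for this dataset, so verify the field names against an actual VGEGV2 record first. The helper names are hypothetical.

```python
import json
from collections import Counter


def summarize_annotations(raw):
    """Count shot-level labels and collect OCR'd text from one broadcast's JSON.

    ASSUMPTION: `raw` follows the Video Intelligence API response layout
    (annotationResults -> shotLabelAnnotations / textAnnotations).
    Each label is weighted by the number of shot segments it appears in.
    """
    labels = Counter()
    ocr_text = []
    for result in raw.get("annotationResults", []):
        for ann in result.get("shotLabelAnnotations", []):
            desc = ann.get("entity", {}).get("description")
            if desc:
                # Fall back to a count of 1 if no segments are listed.
                labels[desc] += len(ann.get("segments", [])) or 1
        for ann in result.get("textAnnotations", []):
            if ann.get("text"):
                ocr_text.append(ann["text"])
    return labels, ocr_text


def summarize_file(path):
    """Load one raw JSON file from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_annotations(json.load(f))
```

Summing these per-broadcast counters across a channel or date range gives a simple view of how often particular objects or activities appear on screen.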
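
The per-word transcript JSON is what the SRT and TXT versions are derived from, and researchers may want their own segmentation (for example, shorter cues for alignment with the 4-second frames). The following is a minimal sketch that regroups words into SRT cues. It assumes that the JSON follows the Speech-to-Text API's response layout (`results` -> `alternatives` -> `words`, with `startTime`/`endTime` strings such as `"1.400s"`) and that only the top alternative matters. The cue size and function names are illustrative choices, not how the published SRT files were produced.

```python
def parse_duration(value):
    """Convert a duration string such as '1.400s' into float seconds."""
    return float(value.rstrip("s"))


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(raw, max_words=8):
    """Group per-word entries from the top alternative into SRT cues.

    ASSUMPTION: Speech-to-Text style JSON; a missing startTime is treated as 0.
    """
    words = [
        w
        for result in raw.get("results", [])
        for w in (result.get("alternatives") or [{}])[0].get("words", [])
    ]
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = parse_duration(chunk[0].get("startTime", "0s"))
        end = parse_duration(chunk[-1]["endTime"])
        text = " ".join(w["word"] for w in chunk)
        cues.append(
            f"{len(cues) + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(cues)
```

Setting `max_words` lower produces shorter, more tightly timed cues; a time-based split (for example, breaking on pauses between words) would be a natural alternative.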
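
One way to use the advertising inventory is to drop caption lines that fall inside advertising airtime before counting mentions. The sketch below assumes that the timecode files have already been parsed into `(start, end)` offsets in seconds and that captions are available as `(offset_seconds, text)` pairs. The actual file layout is not described here, so the parsing step and the function names are hypothetical. Overlapping ad intervals are merged first so that each caption needs only one binary search.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Sort and merge overlapping or touching (start, end) ad intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def drop_ads(captions, ad_intervals):
    """Keep caption lines whose offset lies outside every ad interval.

    captions: iterable of (offset_seconds, text).
    Intervals are treated as half-open: [start, end).
    """
    merged = merge_intervals(ad_intervals)
    starts = [s for s, _ in merged]
    kept = []
    for t, text in captions:
        # Find the last interval starting at or before t.
        i = bisect_right(starts, t) - 1
        in_ad = i >= 0 and t < merged[i][1]
        if not in_ad:
            kept.append((t, text))
    return kept
```

Because the intervals come from the channel-provided inventory rather than from a heuristic detector, this filter only has to perform the lookup, not decide what counts as an ad.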

Experiments