Building on more than a decade of collaboration with the Internet Archive's Television News Archive, we've been rolling out a flood of new tools and datasets to help scholars and journalists understand the global television news landscape. To help you dive into this archive, we've collected a quick list of some of the core dashboards, datasets and experiments below.
- Visual Explorer. Visually skim the entire global holdings of the TV News Archive, covering 5.2 million broadcasts from 100+ channels across 50 countries on 5 continents over portions of 20 years. Each broadcast is sampled at one frame every 4 seconds and displayed as a thumbnail grid, allowing rapid visual skimming of the entire broadcast. Some channels allow you to click on a thumbnail and play a 30-second clip from that point. Understand the visual narratives of the world's television channels over nearly a quarter-century. Television coverage from Belarus, Russia and Ukraine is being transcribed and translated live.
- Television Explorer. Keyword search the closed captioning of a range of American channels and a handful of international channels, with some channels like CNN/MSNBC/FOX and ABC/CBS/NBC evening news stretching back more than a decade.
- Television AI Explorer. Keyword search OCR'd onscreen text and search for ~20-30K objects and activities across CNN/MSNBC/FOX/BBCNEWS since 2020 and ABC/CBS/NBC evening news since 2010.
- Visual Explorer Visual NGrams. All 5.2 million worldwide television news broadcasts held by the TV News Archive are available in the Visual Explorer, spanning 50 countries over portions of 20 years. The Visual Explorer makes each broadcast "skimmable" by extracting one frame every 4 seconds to represent it. These images are arrayed into a thumbnail grid in the Visual Explorer web interface. To enable at-scale non-consumptive visual analysis, each broadcast also makes available a ZIP file containing the full-resolution versions of the images that make up the thumbnail grid, creating "ngrams for television news." Today those ngrams span 5.2 million broadcasts over 12.34 billion seconds of airtime (205.7 million minutes / 3.43 million hours) through 3 billion images totaling 1 quadrillion pixels, yielding what is perhaps the richest non-consumptive analyzable archive of television news ever created.
- Visual Global Entity Graph (VGEGV2). A decade of ABC/CBS/NBC evening news, CNN/MSNBC/FOX/BBCNEWS since 2020 and selections since 2009 have been annotated through Google's Cloud Video AI API. The complete OCR'd onscreen text and all recognized objects and activities are available for each broadcast as both summarized and raw JSON API output.
- Belarusian, Russian & Ukrainian Automated Transcripts. All broadcasts from Russian channels 1TV, NTV and Russia 1 (from March 26) and Russia 24 (from April 25), Ukrainian channel Espreso (from April 25) and Belarusian channel Belarus 24 (from May 16) are automatically transcribed by Google's Speech-to-Text API, with the full raw per-word JSON annotation for every broadcast available for download, along with SRT and TXT versions. When viewing these broadcasts in the Visual Explorer, a running transcript appears alongside the thumbnail grid and alongside playable clips, and can be automatically translated into English in-browser when using the Chrome web browser.
- Television News Advertising Inventory. When analyzing television news, distinguishing news from paid advertising airtime is critically important to avoid skewing findings. Historically, advertisements have been excluded using algorithmic filters that had high error rates and tended to miss edge cases. Instead, this dataset compiles the definitive channel-provided inventory of advertising versus news programming, releasing both timecode files and the actual underlying captioning of captioned advertisements across a range of channels, as well as a video-aligned version for select channels.
- Visual Channel Comparer. This simple command line script takes a start and stop time and list of channels and generates a side-by-side comparison of coverage across the selected channels during that time. Examples showcase its use comparing midterms, global channels, senate hearings and Trump's campaign announcement.
- Provenance Analysis & Scanning For An Excerpted Clip. This demo showcases scanning a TV news channel to locate an excerpted clip seen on Twitter.
- OpenAI's CLIP Visual Search. An exploration of OpenAI's CLIP visual search system applied to a Russian television news broadcast, with example searches. A Colab IPython notebook implementation is also available, making it possible to perform visual search without a local GPU.
- YOLO Object Detection. This experiment showcases applying YOLOv5 object detection with its pretrained COCO model to the Visual Explorer Visual NGrams dataset to identify 80 major object classes within any news broadcast. Scroll to the bottom of the tutorial to see a video of the results. A separate tutorial showcases scaling the analysis up to two weeks of a given channel.
- Face Detection & Similarity Scanning. This experiment applies an off-the-shelf face detection tool to identify all human faces in a broadcast, then extends it with similarity matching to track Tucker Carlson appearances in a single broadcast and then over an entire week and a half of coverage.
- Visual Annotation Of A Broadcast Through Google's Cloud Vision API. Run an entire broadcast through Google's Cloud Vision API, with all major features enabled, including geographic landmarks, labels (depicted objects and activities), major logos, counting the number of faces and 300+ language OCR, along with Web Entities (reverse image search across the open web), connecting television to online news coverage.
- Logo Detection Using Google's Cloud Vision API. Uses the Google Cloud Vision API's logo detection service to compile a list of recognized major logos in an entire day of Russian television news, showcasing how Visual Explorer visual ngrams can be used with off-the-shelf visual analysis tools.
- OpenAI's Whisper ASR. A series of experiments has explored the promises and limitations of OpenAI's Whisper ASR.
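
The airtime figures quoted for the Visual NGrams above can be cross-checked with a little arithmetic. The Python sketch below is only an illustration: it assumes the one-frame-every-4-seconds sampling described in this post, and the function names, the 1-based frame numbering and the choice to place the first frame at offset zero are hypothetical conventions rather than anything documented by the archive.

```python
# Rough arithmetic for the Visual NGrams figures quoted in this post.
# All names here are illustrative, not part of any GDELT or Internet Archive API.

SECONDS_PER_FRAME = 4  # sampling interval stated in the post


def airtime_summary(total_seconds: int) -> dict:
    """Convert total airtime in seconds into minutes, hours and an
    expected frame count at one frame per SECONDS_PER_FRAME seconds."""
    return {
        "minutes": total_seconds / 60,
        "hours": total_seconds / 3600,
        "expected_frames": total_seconds // SECONDS_PER_FRAME,
    }


def frame_offset(frame_index: int, first_index: int = 1) -> int:
    """Approximate offset in seconds of a frame within its broadcast.

    Assumes frames are numbered consecutively from `first_index` and that
    the first frame sits at offset 0 (both are assumptions for this sketch).
    """
    if frame_index < first_index:
        raise ValueError("frame_index must be >= first_index")
    return (frame_index - first_index) * SECONDS_PER_FRAME


def as_timecode(seconds: int) -> str:
    """Format a number of seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


summary = airtime_summary(12_340_000_000)  # 12.34 billion seconds of airtime
print(f"{summary['minutes'] / 1e6:.1f} million minutes")
print(f"{summary['hours'] / 1e6:.2f} million hours")
print(f"{summary['expected_frames'] / 1e9:.2f} billion expected frames")
print(as_timecode(frame_offset(901)))
```

Under these assumptions the expected frame count comes out slightly above 3 billion, in line with the rounded image total quoted above, and a helper like `frame_offset` gives a quick way to map a frame in a broadcast's ZIP file back to an approximate position in the broadcast.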