The GDELT Project

Announcing Per-Second Television News Research Chyrons Dataset

Last month we announced a new research-grade chyron / "lower third" dataset for BBC News, CNN, MSNBC, and Fox News covering August 25, 2017 through the present. It was created by reprocessing the daily "Third Eye" chyron dataset from the Internet Archive's Television News Archive with a new workflow that uses language models and sophisticated clustering to reanalyze the Archive's OCR output. The resulting research-grade dataset includes all of the onscreen chyron text, with each cluster filtered to its most legible version, rather than filtered down to excerpts that remove much of the text.

The end result is a minute-by-minute chronology of the onscreen chyron text of the four stations over the past three years.

Minute-level resolution is ideal for manual comparisons of agenda setting and framing, but certain high-precision applications like chyron search require second-level resolution that records the exact number of seconds each chyron was onscreen.

Today we are excited to announce a new per-second television news research chyrons dataset that does exactly this!

The new files follow a similar format to the per-minute TSV files, but instead of concatenating together all of the unique text blocks onscreen during a given minute, they break each minute into individual clusters, each representing a distinct grouping of text. Thus, if a given minute alternates every 10 seconds between "Congress voting on Trump impeachment" and "Breaking: Trump impeached," that minute is represented as two rows, one for each of the two clusters, with each listed as being onscreen for 30 seconds during that minute.
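To make that row structure concrete, here is a minimal Python sketch, assuming one OCR'd text sample per second of a minute. It simply counts distinct texts, whereas the actual workflow applies language models and sophisticated clustering to noisy OCR output; the function name and data layout below are illustrative, not part of the dataset.

```python
from collections import Counter

def cluster_rows_for_minute(samples):
    """Given one chyron text sample per second of a minute, return
    (text, seconds_onscreen) rows, one per distinct text cluster.

    Illustrative sketch only: exact-match counting stands in for the
    real language-model clustering over noisy OCR output.
    """
    counts = Counter(samples)   # seconds each distinct text was onscreen
    # One row per cluster, most-seen first, mirroring the per-second files.
    return counts.most_common()

# A minute alternating between two chyrons every 10 seconds:
minute = (["Congress voting on Trump impeachment"] * 10 +
          ["Breaking: Trump impeached"] * 10) * 3

rows = cluster_rows_for_minute(minute)
# Two rows, each listed as onscreen for 30 of the minute's 60 seconds.
```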

The final files are named "lowerthirdclusters.tsv" and have the following fields:

For example, the file for November 19, 2019 during Lt. Col. Vindman's testimony would be:

The dataset is also available as a table in BigQuery:
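As one illustration of how such a table might be queried, the sketch below builds a BigQuery Standard SQL query string that totals onscreen seconds per chyron cluster for a single day. The table path and column names (`station`, `text`, `seconds_onscreen`, `date`) are hypothetical placeholders, not the dataset's actual schema; consult the table documentation for the real field names.

```python
# Placeholder, NOT the real table path:
TABLE = "your-project.your_dataset.lowerthirdclusters"

def top_clusters_query(table, date, limit=10):
    """Build a Standard SQL query string (not executed here) that ranks
    chyron clusters by total seconds onscreen on a given day.
    Column names are assumed for illustration."""
    return f"""
        SELECT station, text, SUM(seconds_onscreen) AS total_seconds
        FROM `{table}`
        WHERE date = '{date}'
        GROUP BY station, text
        ORDER BY total_seconds DESC
        LIMIT {limit}
    """

query = top_clusters_query(TABLE, "2019-11-19")
```

The resulting string could then be passed to a BigQuery client or pasted into the BigQuery console.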

For more details on the underlying reprocessing workflow, see the original announcement last month. We're tremendously excited to see what you're able to accomplish with this new dataset!