Last week we released a massive new dataset of television news mentions of Covid-19, drawn from the Internet Archive's Television News Archive and covering January 1, 2020 through last week across 16 stations: Al Jazeera, BBC News London, Bloomberg, CNBC, CNN, CSPAN, CSPAN2, CSPAN3, DW-TV (Deutsche Welle), Fox Business, Fox News, MSNBC and Russia Today, along with the ABC, CBS and NBC evening news broadcasts.
Today we're releasing an expanded version of this dataset that will update each morning.
There are now nine thematic datasets, each covering all 16 stations and each defined by the keyword query shown:
- Cases: (case OR cases)
- Covid19: (coronavirus OR covid OR virus OR infection OR infected OR infect OR infects)
- Masks: (n95 OR mask OR masks OR respirator OR respirators)
- Panic: (panic OR panicking OR panicked)
- Quarantine: (quarantine OR shelter OR restrict OR restriction OR restricted OR isolation OR exclusion OR lockdown OR lockdowns)
- Shortages: (shortage OR shortages)
- SocialDistancing: "social distancing"
- Testing: (test OR tested OR tests OR testing)
- Ventilator: (ventilator OR ventilators)
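To make the Boolean OR semantics of these queries concrete, here is a minimal Python sketch that tags a snippet of caption text with every theme whose query it matches. It is an illustration only, not the Television News Archive's own search: it assumes case-insensitive, whole-word matching, and the names `THEMES`, `compile_theme` and `matching_themes` are hypothetical.

```python
import re

# Keyword lists copied from the theme definitions above. SocialDistancing
# is a single exact phrase rather than an OR of separate words.
THEMES = {
    "Cases": ["case", "cases"],
    "Covid19": ["coronavirus", "covid", "virus", "infection", "infected", "infect", "infects"],
    "Masks": ["n95", "mask", "masks", "respirator", "respirators"],
    "Panic": ["panic", "panicking", "panicked"],
    "Quarantine": ["quarantine", "shelter", "restrict", "restriction", "restricted",
                   "isolation", "exclusion", "lockdown", "lockdowns"],
    "Shortages": ["shortage", "shortages"],
    "SocialDistancing": ["social distancing"],
    "Testing": ["test", "tested", "tests", "testing"],
    "Ventilator": ["ventilator", "ventilators"],
}


def compile_theme(terms):
    # Assumption: whole-word, case-insensitive matching; the words of a
    # phrase may be separated by any run of whitespace.
    parts = [r"\s+".join(re.escape(word) for word in term.split()) for term in terms]
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


# One compiled OR-pattern per theme.
PATTERNS = {name: compile_theme(terms) for name, terms in THEMES.items()}


def matching_themes(text):
    """Return the sorted names of every theme whose query matches `text`."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


print(matching_themes("Officials say new cases are rising as ventilator shortages loom"))
# ['Cases', 'Shortages', 'Ventilator']
```

Because of the word boundaries, a term like "test" does not match inside "testcase"; whether the Archive tokenizes captions the same way is an assumption.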
Since the dataset updates daily, files are named YYYYMMDD.THEME.STATION.csv, where THEME is one of the nine themes above.
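Scripts that walk the file list will typically need to go from a filename to its date, theme and station, and back again. Below is a minimal Python sketch, assuming the date field is a four-digit year, two-digit month and two-digit day, and that theme names are spelled exactly as in the list above. Station codes such as `CNN` are hypothetical placeholders, so check the published file list for the actual identifiers; the helper names are likewise illustrative.

```python
from datetime import date, datetime
from typing import NamedTuple

# The nine theme names as they are assumed to appear in the THEME field.
THEMES = {"Cases", "Covid19", "Masks", "Panic", "Quarantine",
          "Shortages", "SocialDistancing", "Testing", "Ventilator"}


class DatasetFile(NamedTuple):
    day: date
    theme: str
    station: str


def build_filename(day: date, theme: str, station: str) -> str:
    """Compose a YYYYMMDD.THEME.STATION.csv filename."""
    if theme not in THEMES:
        raise ValueError(f"unknown theme: {theme!r}")
    return f"{day:%Y%m%d}.{theme}.{station}.csv"


def parse_filename(name: str) -> DatasetFile:
    """Split a YYYYMMDD.THEME.STATION.csv filename into its three fields."""
    if not name.endswith(".csv"):
        raise ValueError(f"not a .csv file: {name!r}")
    # maxsplit=2 keeps any dots inside the (hypothetical) station code intact;
    # fewer than three fields raises ValueError on unpacking.
    ymd, theme, station = name[: -len(".csv")].split(".", 2)
    if len(ymd) != 8 or not ymd.isdigit():
        raise ValueError(f"expected an 8-digit YYYYMMDD date, got {ymd!r}")
    if theme not in THEMES:
        raise ValueError(f"unknown theme: {theme!r}")
    return DatasetFile(datetime.strptime(ymd, "%Y%m%d").date(), theme, station)


print(build_filename(date(2020, 3, 15), "Masks", "CNN"))  # 20200315.Masks.CNN.csv
```

The explicit eight-digit check matters because `strptime` alone would also accept a malformed seven-digit date such as `2020315`.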
The complete list of 11,000 files can be found below:
The dataset is also available in BigQuery:
The dataset updates each morning, though the most recent three days of data may be incomplete while those shows finish processing.
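A pipeline that pulls files each morning will probably want to re-fetch the most recent days rather than treat them as final. Here is a minimal Python sketch, assuming "the last 3 days" means the three calendar days before the current date; the constant and function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List

# Assumption: the three calendar days before "today" may still be filling in.
# Adjust PROVISIONAL_DAYS if the actual processing lag differs.
PROVISIONAL_DAYS = 3


def is_provisional(day: date, today: date, window: int = PROVISIONAL_DAYS) -> bool:
    """True if data for `day` may still be incomplete as of `today`."""
    return day >= today - timedelta(days=window)


def refresh_dates(today: date, window: int = PROVISIONAL_DAYS) -> List[date]:
    """Oldest-first list of recent days worth re-downloading each morning."""
    return [today - timedelta(days=n) for n in range(window, 0, -1)]
```

For example, `refresh_dates(date(2020, 3, 20))` returns March 17, 18 and 19, 2020, so a daily job would re-fetch those files and overwrite any earlier copies, while anything dated March 16 or earlier can be treated as settled.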
We're tremendously excited to see what you're able to do with this new dataset!