A New Dataset For Exploring The Coronavirus Narrative On Television News


UPDATE 3/20/2020: This dataset has been vastly expanded and is now updating daily.


How are television news stations covering the Coronavirus outbreak, and how has their coverage changed as the virus has gone global? To help researchers explore these questions, we've repeated the methodology of our climate change television news dataset from January for Coronavirus coverage. The result is a dataset of all 119,083 mentions of "coronavirus", "covid" and "virus" on a set of major television news stations from January 1, 2020 through the morning of March 10, 2020, using data from the Internet Archive's Television News Archive. Each mention includes the URL of the matching 15-second clip on the Archive's website, the date and time of the match in UTC, the station and show it appeared on, the broadcast's unique Internet Archive identifier, a preview thumbnail image of the one-minute period containing the clip, and the 15-second excerpt of the spoken-word transcript containing the mention. Together these let you understand the context of each mention and the surrounding language.
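As a minimal sketch of working with one of the CSV files, the Python snippet below tallies mentions per distinct value of a column, such as per station or per show. It assumes the file has a header row with columns named `Station` and `Show`; those names, and the sample rows, are assumptions for illustration, so check them against the actual header of your download.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(csv_file: TextIO, column: str) -> Counter:
    """Return a Counter of mention rows per distinct value of `column`."""
    reader = csv.DictReader(csv_file)  # first row is treated as the header
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Hypothetical sample rows standing in for a real export; the column
    # names are assumptions, so check them against your file's header.
    sample = io.StringIO(
        "URL,MatchDateTime,Station,Show,Snippet\n"
        'https://example.org/a,2020-01-21 23:10:00,CNN,Show A,"virus, coronavirus"\n'
        "https://example.org/b,2020-01-22 01:00:00,CNN,Show B,covid\n"
        "https://example.org/c,2020-01-22 02:00:00,MSNBC,Show C,virus\n"
    )
    print(count_by_column(sample, "Station").most_common())
```

With a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle in; using `csv.DictReader` rather than splitting on commas matters because transcript snippets often contain commas themselves.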

All mentions from the following stations are included, listed by their Archive station identifiers: ALJAZ, BBCNEWS, BLOOMBERG, CNBC, CNN, CSPAN, CSPAN2, CSPAN3, DW, FBC, FOXNEWS, MSNBC and RT. Note that there may be some outages or errors in the Archive's monitoring, and not all shows may be monitored on all stations.

The dataset also includes the ABC, CBS and NBC national evening news broadcasts, drawn from their San Francisco-based affiliates KGO, KPIX and KNTV.

Coverage on each of these stations can be explored interactively using the TV Explorer.

To create the dataset we used the same workflow as our climate change television news narratives dataset. The following commands were run with the "makefetchcmds.pl" script to generate the shell scripts that perform the actual downloads. The only exception was Bloomberg's February coverage: it exceeded the 5,000-result maximum per query, so that month was manually divided into weeks and the results reassembled.

time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" ALJAZ
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" BBCNEWS
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" BLOOMBERG
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" CNBC
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" CNN
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" CSPAN
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" CSPAN2
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" CSPAN3
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" DW
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" FBC
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" FOXNEWS
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" MSNBC
time ./makefetchcmds.pl 202001 202003 "%28coronavirus+OR+covid+OR+virus%29" RT
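The Bloomberg workaround (splitting one month into smaller windows so each query stays under the 5,000-result cap) can be sketched in Python as below. This is not the actual makefetchcmds.pl logic, which isn't reproduced here; it is a hypothetical equivalent that uses 7-day windows counted from the first of the month rather than calendar weeks. It also assumes the API accepts an `enddatetime` parameter in the same YYYYMMDDHHMMSS form as the `startdatetime` used in the curl commands.

```python
from datetime import datetime, timedelta
from typing import Iterator, List, Tuple
from urllib.parse import urlencode

API = "https://api.gdeltproject.org/api/v2/tv/tv"
FMT = "%Y%m%d%H%M%S"  # same form as startdatetime=20200101000000


def weekly_windows(year: int, month: int) -> Iterator[Tuple[str, str]]:
    """Yield (start, end) strings covering one month in 7-day chunks.

    The final chunk is cut off at the end of the month. Each end time is
    one second before the next window starts, so windows do not overlap.
    """
    start = datetime(year, month, 1)
    next_month = datetime(year + month // 12, month % 12 + 1, 1)
    while start < next_month:
        end = min(start + timedelta(days=7), next_month)
        yield start.strftime(FMT), (end - timedelta(seconds=1)).strftime(FMT)
        start = end


def week_urls(station: str, query: str, year: int, month: int) -> List[str]:
    """Build one clip gallery CSV URL per window (enddatetime is assumed)."""
    urls = []
    for start, end in weekly_windows(year, month):
        params = {
            "query": f"{query} station:{station}",
            "mode": "clipgallery",
            "format": "csv",
            "maxrecords": 5000,
            "startdatetime": start,
            "enddatetime": end,  # assumption: not shown in the original commands
        }
        urls.append(f"{API}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in week_urls("BLOOMBERG", "(coronavirus OR covid OR virus)", 2020, 2):
        print(url)
```

`urlencode` percent-encodes the query the same way as the "%28coronavirus+OR+covid+OR+virus%29" argument above, so the per-window results can be downloaded separately and concatenated afterward.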

The ABC, CBS and NBC evening news broadcasts require adding the "show:" query operator to restrict results to their national evening news broadcasts, excluding the rest of their affiliates' local San Francisco news programming. The CBS query ORs together several variants of the show's title:

curl "https://api.gdeltproject.org/api/v2/tv/tv?startdatetime=20200101000000&datanorm=raw&dateres=DAY&query=(coronavirus%20OR%20covid%20OR%20virus)%20show:%22ABC%20World%20News%20Tonight%20With%20David%20Muir%22%20station:KGO&mode=clipgallery&format=csv&maxrecords=5000" -o ./ABC-EveningNews.csv
curl "https://api.gdeltproject.org/api/v2/tv/tv?startdatetime=20200101000000&datanorm=raw&dateres=DAY&query=(coronavirus%20OR%20covid%20OR%20virus)%20(show:%22CBS%20Evening%20News%22%20OR%20show:%22CBS%20Evening%20News%20with%20Jeff%20Glor%22%20OR%20show:%22CBS%20Evening%20News%20With%20Norah%20ODonnell%22)%20station:KPIX&mode=clipgallery&format=csv&maxrecords=5000" -o ./CBS-EveningNews.csv
curl "https://api.gdeltproject.org/api/v2/tv/tv?startdatetime=20200101000000&dateres=DAY&query=(coronavirus%20OR%20covid%20OR%20virus)%20show:%22NBC%20Nightly%20News%20With%20Lester%20Holt%22%20station:KNTV&mode=clipgallery&format=csv&maxrecords=5000" -o ./NBC-EveningNews.csv
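The same requests can be assembled programmatically. This hypothetical Python helper combines the keyword search with `show:` and `station:` filters, wrapping multiple show titles in parentheses as the CBS request does, and percent-encodes the result in the style of the curl commands (spaces as %20, quotes as %22, parentheses and colons left as-is). The parameter set simply mirrors those commands; the function names are illustrative only.

```python
from typing import Sequence
from urllib.parse import quote

API = "https://api.gdeltproject.org/api/v2/tv/tv"
TERMS = "(coronavirus OR covid OR virus)"


def evening_news_query(shows: Sequence[str], station: str) -> str:
    """Combine the keyword search with show: and station: filters."""
    show_filter = " OR ".join(f'show:"{name}"' for name in shows)
    if len(shows) > 1:
        # Group alternative show titles so the OR applies only to them.
        show_filter = f"({show_filter})"
    return f"{TERMS} {show_filter} station:{station}"


def clip_gallery_url(query: str, start: str = "20200101000000") -> str:
    """Build a clip gallery CSV URL, percent-encoding spaces and quotes."""
    encoded = quote(query, safe="():")  # keep ( ) : readable, as in the curl lines
    return (f"{API}?startdatetime={start}&query={encoded}"
            "&mode=clipgallery&format=csv&maxrecords=5000")


if __name__ == "__main__":
    broadcasts = {
        "KGO": ["ABC World News Tonight With David Muir"],
        "KPIX": ["CBS Evening News",
                 "CBS Evening News with Jeff Glor",
                 "CBS Evening News With Norah ODonnell"],
        "KNTV": ["NBC Nightly News With Lester Holt"],
    }
    for station, shows in broadcasts.items():
        print(clip_gallery_url(evening_news_query(shows, station)))
```

Generating the URLs this way keeps the show titles in one place, which helps when a program's listed title varies over time.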

Combining all of these files yields the final dataset.
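A minimal sketch of that final combining step in Python is shown below. It assumes every per-station export shares an identical header row, which is written once. Exact duplicate rows, such as a clip returned by two overlapping query windows, are dropped. The function name and file paths are hypothetical.

```python
import csv
from typing import List


def combine_csvs(paths: List[str], out_path: str) -> int:
    """Concatenate CSV exports that share a header; return data rows written."""
    header = None
    seen = set()
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            # utf-8-sig strips a leading byte-order mark if one is present.
            with open(path, newline="", encoding="utf-8-sig") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path}: header differs from the first file")
                for row in reader:
                    key = tuple(row)
                    if key in seen:
                        continue  # exact duplicate of a row already written
                    seen.add(key)
                    writer.writerow(row)
                    written += 1
    return written
```

For example, `combine_csvs(sorted(glob.glob("exports/*.csv")), "coronavirus-tv.csv")` would merge every export in a hypothetical `exports` directory; keep the output file outside that directory so a rerun doesn't read it back in.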