The GDELT Project

Now Live Updating & Expanded: A New Dataset For Exploring The Coronavirus Narrative In Global Online News

Two weeks ago we released a compilation of URLs and brief snippets of worldwide English-language news coverage mentioning Covid-19. Today we're releasing a considerably expanded dataset that will update daily and includes a number of related topics.

For each topic there is a historical backfile that covers the period November 1, 2019 through March 26, 2020. From that point forward, a new file will be created each morning for each topic, containing the previous day's URLs and snippets. Each file is in CSV format and consists of the date GDELT saw the article, its URL, its page title and a brief snippet of up to three sentences or 800 characters, whichever is shorter, containing the matching keywords in context. Only one snippet per article per topic is provided: if the relevant keywords appear multiple times in an article, only one instance will be selected. Due to the way these snippets are constructed, they may not represent the first mention of the keywords in the article.
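As a rough illustration of the file layout described above, the following Python sketch reads one GZIP-compressed CSV file into records. It assumes the four columns appear in the order listed (date, URL, title, snippet), that the file is UTF-8 encoded and that there is no header row; none of these details are confirmed here, so check them against an actual file. The function name `read_snippets` and the sample row are made up for the example.

```python
import csv
import gzip
import io


def read_snippets(gz_bytes):
    """Yield one dict per row from a gzip-compressed CSV held in memory.

    Assumed (not confirmed) layout: date, URL, title, snippet, with
    UTF-8 encoding and no header row.
    """
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8", newline="") as fh:
        for row in csv.reader(fh):
            if len(row) != 4:
                # Skip rows that do not match the assumed four-column layout.
                continue
            date, url, title, snippet = row
            yield {"date": date, "url": url, "title": title, "snippet": snippet}


# Made-up sample data standing in for a downloaded file.
sample = gzip.compress(
    '20200327,https://example.com/a,"Example title","Example snippet text."\n'.encode("utf-8")
)
for record in read_snippets(sample):
    print(record["date"], record["url"], record["title"])
```

For a real file, the bytes would come from reading the downloaded `.gz` file in binary mode and passing its contents to `read_snippets`.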

For each of the topics below, the matching sentence, or the sentence immediately before or after it, must also contain either "Coronavirus" or "Covid-19".
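The rule above can be sketched in Python as a three-sentence window check. This is only an illustration of the logic, not GDELT's actual matching code: the sentence splitter is deliberately naive, matching is assumed to be case-insensitive substring matching, and the names `split_sentences` and `topic_match_in_context` are hypothetical.

```python
import re

# The two context terms named in the text; lowercase because this sketch
# assumes case-insensitive matching.
CONTEXT_TERMS = ("coronavirus", "covid-19")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def topic_match_in_context(text, topic_keywords):
    """Return True if a sentence containing a topic keyword is accompanied,
    in the same sentence or the one directly before or after it, by one of
    the context terms."""
    sentences = split_sentences(text)
    for i, sentence in enumerate(sentences):
        lowered = sentence.lower()
        if not any(keyword.lower() in lowered for keyword in topic_keywords):
            continue
        # Window: previous sentence, matching sentence, next sentence.
        window = sentences[max(0, i - 1): i + 2]
        if any(term in s.lower() for s in window for term in CONTEXT_TERMS):
            return True
    return False
```

Under these assumptions, a text whose keyword sentence sits two or more sentences away from any mention of "Coronavirus" or "Covid-19" would not match.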

The complete list of GZIP'd files can be found below:

The dataset is also available in BigQuery:

We're tremendously excited to see what you're able to do with these incredible new datasets!