Two weeks ago we released a compilation of URLs and brief snippets of worldwide English language news coverage mentioning Covid-19. Today we're releasing a considerably expanded dataset that will update daily and includes a number of related topics.
For each topic there is a historical backfile that covers the period November 1, 2019 through March 26, 2020. From that point forward each morning a new file will be created for each topic with the previous day's URLs and snippets. Each file is in CSV format and consists of the date GDELT saw the article, its URL, page title and a brief snippet of up to three sentences or 800 characters, whichever is less, containing the matching keywords in context. Only one snippet per article per topic is provided – if the relevant keywords appear multiple times in an article, only one instance will be selected. Due to the way these snippets are constructed, they may not represent the first mention of the keywords in the article.
For each of the topics below, either the matching sentence or the sentence before or after it must also contain either "Coronavirus" or "Covid-19".
- Cases: (case OR cases)
- Covid19: (Coronavirus OR "Covid-19")
- Falsehoods: ("fake news" OR misinformation OR disinformation OR "false claim*" OR "conspiracy" OR "falsehood*" OR "rumor*"
- Masks: (mask OR masks OR respirator OR respirators)
- Panic: (panic OR panics OR OR panicked OR panicking)
- Prices: (price OR prices OR pricing OR priced)
- Quarantine: (quarantine* OR shelter* OR restrict* OR restriction* OR restricted* OR isolation* OR exclusion* OR lockdown*)
- Shortages: (shortage OR shortages)
- SocialDistancing: ("social distancing")
- Testing: (test*)
- Ventilators: (ventilat*)
The complete list of GZIP'd files can be found below:
The dataset is also available in BigQuery:
We're tremendously excited to see what you're able to do with these incredible new datasets!