We are tremendously excited to announce today the debut of the GDELT Global Geographic Graph, the underlying dataset powering the GDELT GEO 2.0 API. It covers more than 1.6 billion location mentions from worldwide English-language online news coverage back to April 4, 2017, with full details of each mention, including a 600-character snippet showing its context and usage.
Last week we unveiled the Covid-19 geographic news dataset, consisting of the entries from this enormous dataset matching certain Covid-19-related keywords. We received such an incredible response to that dataset that we've moved ahead with releasing the complete underlying Global Geographic Graph!
This initial release covers three years of the English-language online coverage GDELT has monitored (around 1.6 billion location mentions out of a total of more than 3.9 billion mentions across 65 languages in the complete dataset). We're actively working to extend the dataset further back and to incorporate all of the languages in the full dataset, so stay tuned!
The dataset will shortly begin updating daily, once each morning. The idea is that realtime-dependent applications will rely on the GEO 2.0 API's mapping capabilities for approximate, fast-updating maps (the GEO API limits the number of points returned per query), while longitudinal analyses and those requiring greater control over results will use the full Global Geographic Graph dataset. Over time we will begin updating this dataset in realtime as well.
You can access the full dataset in BigQuery:
Or you can download the JSON files directly (note that the historical monthly files are typically around 15-20GB compressed):
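Given the size of the monthly files, it is usually more practical to stream them than to load them into memory. Below is a minimal Python sketch of that idea. It assumes the files are gzip-compressed with one JSON object per line, which you should verify against the files you actually download. The function name `count_by_field` and the field name `CountryCode` are hypothetical placeholders, not a documented schema.

```python
import gzip
import json
from collections import Counter


def count_by_field(path, field="CountryCode"):
    """Stream a gzip-compressed, newline-delimited JSON file and tally one field.

    Assumptions (not confirmed by the announcement): the file is gzip-compressed,
    holds one JSON object per line, and `field` is a hypothetical key name.
    """
    counts = Counter()
    # Text mode decompresses lazily, so only one line is held in memory at a time.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than abort the whole pass
            if not isinstance(record, dict):
                continue  # ignore lines that are not JSON objects
            value = record.get(field)
            if value is not None:
                counts[value] += 1
    return counts
```

Calling `count_by_field("some-month.json.gz").most_common(10)` would then list the ten most frequent values of the chosen field, where the file name is again only a placeholder.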
We're enormously excited to see what you're able to do with this incredible new dataset!