We're excited to announce that GKG 2.0, which runs from 2015 to the present, has passed 1.1 billion articles from across the world!
Counting how many locations have been mentioned in GDELT over that period takes a single SQL query that processes 406GB in just 24 seconds!
SELECT
  count(1) NumLocs,
  count(distinct REGEXP_EXTRACT(Loc, r'^.*?#.*?#.*?#.*?#.*?#.*?#.*?#(.*?)#')) NumDistinctLocs
FROM `gdelt-bq.gdeltv2.gkg_partitioned`, UNNEST(SPLIT(V2Locations, ';')) Loc
WHERE DATE(_PARTITIONTIME) = "2020-07-10"
In terms of geography alone, the GKG 2.0 encodes nearly 9.1 billion mentions of 1.25 million distinct places on Earth over the past five years!
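For readers who want to see what the query's SPLIT and REGEXP_EXTRACT steps are doing, here is a minimal Python sketch of the same parsing logic. It assumes each V2Locations value is a semicolon-delimited list of location entries whose fields are separated by `#`, with the eighth field acting as the place identifier (our reading of the GKG codebook, so treat it as an assumption). The function name `count_locations` and the sample rows are invented purely for illustration and are not real GKG data.

```python
import re

# Same pattern as the query: skip seven '#'-terminated fields, capture the eighth.
EIGHTH_FIELD = re.compile(r'^.*?#.*?#.*?#.*?#.*?#.*?#.*?#(.*?)#')


def count_locations(v2locations_values):
    """Return (mentions, distinct_ids), mirroring count(1) and count(distinct ...)."""
    mentions = 0
    distinct_ids = set()
    for value in v2locations_values:
        if value is None:
            continue  # a NULL column contributes no rows after UNNEST
        for loc in value.split(';'):  # one entry per location mention
            mentions += 1
            match = EIGHTH_FIELD.match(loc)
            if match:  # entries that do not match are ignored, like NULL in SQL
                distinct_ids.add(match.group(1))
    return mentions, len(distinct_ids)


# Hypothetical rows (made-up values, not taken from the GKG).
sample_rows = [
    "4#Exampleville, Somewhere#XX#XX01#1234#10.5#20.25#-1000001#57;"
    "1#Somewhere#XX#XX##10#20#XX#112",
    "4#Exampleville, Somewhere#XX#XX01#1234#10.5#20.25#-1000001#9",
    None,
]

print(count_locations(sample_rows))
```

On these three sample rows the sketch reports three mentions but only two distinct identifiers, because the same place appears twice. That is the same gap between total mentions and distinct places that the query measures.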