Complex Queries: Combining Events, EventMentions, and GKG

A common request revolves around filtering events to just those mentioned in articles that focus on certain themes or groups. GDELT 2.0 uses a combination of three tables: the EVENTS table that stores distinct events, the EVENTMENTIONS table that stores every mention of every event, one row per mention of an event, and the GKG table stores the GKG. Searching for Material Conflict events in articles that discussion the GKG "CIVILIAN" theme, for example, requires compiling a list of all GKG records containing CIVILIAN, then searching EVENTMENTIONS for all events mentioned in those articles, then searching the EVENTS table for which of those events are Material Conflict events. This requires a three-table join which involves a massive number of records. Yet, as with anything else massive, Google BigQuery comes to the rescue.

The following BigQuery SQL query will perform precisely the query outlined above. Note that here we limit to just events from April, since larger time frames can process a massive amount of data.

WARNING: This query consumes 221GB of data when run, which will consume more than a quarter of your monthly free BigQuery quota, so use with caution.

SELECT STRING(a.SQLDATE) day, COUNT(1)/MIN(totc) cnt
FROM [gdelt-bq:gdeltv2.events] a
JOIN EACH (
 SELECT COUNT(*) totc, SQLDATE
 FROM [gdelt-bq:gdeltv2.events]
 WHERE SQLDATE>=20150401 and SQLDATE<=20150430 and ActionGeo_CountryCode='IZ'
 GROUP BY 2
) b
ON a.SQLDATE=b.SQLDATE
WHERE GLOBALEVENTID IN (
 SELECT GLOBALEVENTID 
 FROM [gdelt-bq:gdeltv2.eventmentions] 
 WHERE MentionIdentifier IN (
 SELECT DocumentIdentifier 
 FROM [gdelt-bq:gdeltv2.gkg] 
 WHERE V2Themes CONTAINS 'CIVILIAN' AND DATE>=20150401000000 and DATE<=20150430999999 AND V2Locations like '%Iraq%'
 )
 GROUP BY GLOBALEVENTID
) 
AND a.SQLDATE>20150401 
AND QuadClass=4
GROUP BY day
ORDER BY day

From here it is easy to expand the query above to search for events reported in articles mentioning certain ethnic groups, people, organizations, themes, etc. The sky's the limit!