A Compilation Of GDELT BigQuery Demos

With the debut of GDELT 2.0 earlier this year and the general availability of the GDELT Global Knowledge Graph (GKG) in Google BigQuery, we've seen an incredible boom in the diversity and complexity of analyses being performed on GDELT that leverage BigQuery's ability to perform massive and highly complex queries in near-realtime.  The examples below offer a small cross-section of some of the public demos from the GDELT Blog over the past year.


The best place to get started with using the GKG is with our aptly titled "Google BigQuery + GKG 2.0: Sample Queries" post that offers several brief tutorials walking through basic person/organization/theme histograms, language comparisons, geographic histograms, and network visualizations.
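The person/organization/theme histograms in those sample queries boil down to splitting the GKG's semicolon-delimited fields and counting the pieces. Here is a minimal Python sketch of that same histogram logic on toy records (the field name V2Themes mirrors the GKG 2.0 schema; the records and exact theme codes are illustrative, not real query output):

```python
from collections import Counter

# Toy GKG-like records: in the real GKG 2.0, V2Themes is a
# semicolon-delimited list of theme codes for each article.
records = [
    {"V2Themes": "TAX_FNCACT;ENV_CLIMATECHANGE;TAX_FNCACT"},
    {"V2Themes": "ENV_CLIMATECHANGE;PROTEST"},
    {"V2Themes": "PROTEST;ENV_CLIMATECHANGE"},
]

# Split each record's theme list and tally every mention,
# exactly the shape of a BigQuery SPLIT + GROUP BY histogram.
counts = Counter(
    theme
    for rec in records
    for theme in rec["V2Themes"].split(";")
    if theme
)

for theme, n in counts.most_common():
    print(theme, n)
```

The same pattern works unchanged for the persons, organizations, and locations fields, since they use the same delimited layout.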


More advanced users looking to create rich interactive thematic maps of the geographic footprint of specific topics in the GKG should explore this tutorial, which presents a terascale mapping solution built on BigQuery's User Defined Function (UDF) capability. Combining a JavaScript UDF with a corresponding SQL query, the tutorial examines the more than one billion location references in the GKG, associates each location mention with the themes mentioned in closest proximity, identifies the locations most closely associated with the theme of interest, and outputs a CSV file suitable for importing directly into CartoDB.com's interactive online mapping platform.
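The core of that UDF is the proximity step: each location mention is paired with whichever theme appears nearest to it in the article text. A minimal Python sketch of that pairing logic, using made-up character offsets (the function name, place names, and offsets are all hypothetical illustrations, not the tutorial's actual code):

```python
def nearest_theme(locations, themes):
    """Pair each (location, offset) with the theme whose character
    offset is closest, mirroring the proximity association the
    UDF tutorial describes."""
    pairs = []
    for loc, loc_off in locations:
        theme, _ = min(themes, key=lambda t: abs(t[1] - loc_off))
        pairs.append((loc, theme))
    return pairs

# Toy mentions: (name, character offset within the article).
locations = [("Paris", 120), ("Cairo", 480)]
themes = [("ENV_CLIMATECHANGE", 100), ("PROTEST", 500)]

print(nearest_theme(locations, themes))
```

Aggregating these (location, theme) pairs over all articles and filtering to the theme of interest yields the per-location counts that drive the thematic map.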

There are also countless other demonstrations of using the GKG for mapping that rely on external Perl scripts for post-processing, though all of these can now be accomplished using the UDF approach above.

These two demos use an external Perl script to create thematic maps of the density of particular topics or discourse by country.

And finally, this demo, based on the 200-year books collection, showcases how to make an animated map.


For those interested in creating network co-occurrence diagrams from the GKG examining the people, organizations, and locations mentioned most frequently with each other in the world's news, there are several tutorials available showcasing different filters and network constructions.
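At their heart, these co-occurrence networks count how often two names appear in the same article: every pair of entities in an article contributes one edge weight. A minimal Python sketch of that edge-building step on toy data (the article lists and names are made up for illustration; in the real queries the entity lists come from splitting the GKG's persons or organizations fields):

```python
from collections import Counter
from itertools import combinations

# Toy input: the list of person names mentioned in each article.
articles = [
    ["Angela Merkel", "Barack Obama"],
    ["Barack Obama", "Angela Merkel", "Vladimir Putin"],
    ["Vladimir Putin", "Barack Obama"],
]

# Each unordered pair of names co-mentioned in an article
# increments that edge's weight in the network.
edges = Counter()
for people in articles:
    for a, b in combinations(sorted(set(people)), 2):
        edges[(a, b)] += 1

for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

Thresholding the edge weights and exporting the result gives an edge list ready for a network visualization tool.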


This demo uses a toy sentiment dictionary to tone-code 122 years of books totaling 67 billion words at the incredible rate of 341 million words per second. The dictionary is purely illustrative, but the approach demonstrates content analysis at massive scale. The sentiment analysis performed here includes both density-based and value-based scoring, the two approaches underlying most traditional sentiment scoring in use today.
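The two scoring modes are simple to state: density scoring measures what fraction of a text's words are emotionally charged, while value scoring nets positive matches against negative ones. A minimal Python sketch with a toy dictionary in the same spirit as the demo (the word lists and the function name are illustrative assumptions, not the demo's actual dictionary):

```python
# Toy sentiment dictionary, purely for illustration.
POS = {"good", "great", "excellent"}
NEG = {"bad", "terrible", "awful"}

def tone(text):
    """Return (density, value) scores for a text:
    density = fraction of words matching the dictionary,
    value   = positive matches minus negative matches."""
    words = text.lower().split()
    pos = sum(w in POS for w in words)
    neg = sum(w in NEG for w in words)
    density = (pos + neg) / len(words)
    value = pos - neg
    return density, value

d, v = tone("the weather was good and the food was great "
            "but the service was terrible")
print(d, v)
```

In a BigQuery UDF the same per-document loop runs in parallel across every book, which is how the demo reaches hundreds of millions of words per second.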


This demo showcases creating ngrams across tens of billions of words in just minutes and is being used in production by GDELT to create a series of computational linguistic resources for underrepresented languages.
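Ngram extraction itself is a one-line sliding window over a token list; the demo's contribution is running it over tens of billions of words at once. A minimal Python sketch of the windowing step (toy sentence; in the demo the tokenized text comes from the books tables):

```python
def ngrams(tokens, n):
    """Slide a window of width n over the token list,
    yielding every consecutive n-token sequence."""
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "the quick brown fox".split()
print(ngrams(tokens, 2))
```

Counting the resulting tuples with a GROUP BY (or a `Counter` locally) then yields the ngram frequency tables used to build the linguistic resources.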


This demo showcases folding the entire GDELT 1.0 Event dataset on itself to perform 2.5 million correlations in just 2.5 minutes to explore the underlying cycles of world history.
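"Folding the dataset on itself" means correlating a time series against time-shifted copies of itself: a high correlation at lag k suggests a k-day cycle. A minimal Python sketch of one such correlation on a toy series (the series and the `autocorr` helper are illustrative assumptions; the demo computes millions of these pairings inside BigQuery):

```python
def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def autocorr(series, lag):
    """Correlate the series with itself shifted by `lag` steps."""
    return pearson(series[:-lag], series[lag:])

# Toy series with a perfect period-2 cycle.
series = [0, 1, 0, 1, 0, 1, 0, 1]
print(autocorr(series, 2))
```

Sweeping the lag across a range of offsets and ranking the resulting correlations is what surfaces the underlying cycles.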


Below are a handful of other examples that didn't fit into the categories above.