A Compilation Of GDELT BigQuery Demos

With the debut of GDELT 2.0 earlier this year and the general availability of the GDELT Global Knowledge Graph (GKG) in Google BigQuery, we've seen an incredible boom in the diversity and complexity of analyses being performed on GDELT that leverage BigQuery's ability to perform massive and highly complex queries in near-realtime. The examples below offer a small cross-section of some of the public demos from the GDELT Blog over the past year.

GETTING STARTED

The best place to get started with using the GKG is with our aptly titled "Google BigQuery + GKG 2.0: Sample Queries" post that offers several brief tutorials walking through basic person/organization/theme histograms, language comparisons, geographic histograms, and network visualizations.

Google BigQuery + GKG 2.0: Sample Queries
Google BigQuery + 3.5M Books: Sample Queries (This adapts the queries above to the 200-year books collection and also includes an example of plotting an emotional timeline using the GCAM data)
Getting Started with GDELT + Google Cloud Datalab: Simple Timelines (Illustrates how to rapidly create basic timelines of events, emotions, and topics)

ONE MINUTE MAPPING

More advanced users looking to create rich interactive thematic maps of the geographic footprint of specific topics using the GKG should explore this tutorial, which presents a terascale mapping solution using BigQuery's User Defined Function (UDF) capability. Combining a JavaScript UDF with a corresponding SQL query, this tutorial examines the more than one billion location references in the GKG, associates each location mention with the themes mentioned in closest proximity, identifies the locations most closely associated with the theme of interest, and outputs a CSV file suitable for importing directly into CartoDB.com's interactive online mapping platform.

New One Minute Maps: BigQuery UDF + CartoDB (Source code and step-by-step directions)
Mapping at Infinite Scale: Terascale and Petascale Cartography and Big Data in the BigQuery Era (High-level summary of the approach)
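
The core association step, pairing each location mention with the theme mentioned closest to it, can be sketched outside BigQuery. The Python below is only an illustration of the idea, not the tutorial's JavaScript UDF: the function name, the (name, character offset) tuple layout, and the sample theme labels and offsets are all assumptions made for this example.

```python
def nearest_theme_per_location(locations, themes):
    """Pair each location with the theme whose character offset is closest.

    locations: list of (location_name, char_offset) tuples (hypothetical layout)
    themes:    list of (theme_name, char_offset) tuples (hypothetical layout)
    """
    pairs = []
    for loc_name, loc_offset in locations:
        if not themes:
            continue  # nothing to associate this location with
        theme_name, _ = min(themes, key=lambda t: abs(t[1] - loc_offset))
        pairs.append((loc_name, theme_name))
    return pairs


# Made-up sample record, for illustration only.
locations = [("Nairobi", 120), ("London", 900)]
themes = [("POACHING", 150), ("DRONES", 870), ("CYBER_ATTACK", 400)]
pairs = nearest_theme_per_location(locations, themes)
```

Counting the resulting (location, theme) pairs across all records and keeping only the theme of interest would give the per-location totals that a mapping tool can plot.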

There are also countless other demonstrations of using the GKG for mapping that use external Perl scripts for the post-processing, though all of these can now be completed using the UDF approach above.

Making Of: Mapping Three Months of Poaching, Drones, and Cyber

These two demos use an external Perl script designed to create thematic country-level maps of the density of particular topics or discourse by country.

And finally this demo, based on the 200-year books collection, showcases how to make an animated map:

Mapping 212 Years of History Through Books

ONE CLICK NETWORK VISUALIZATIONS

For those interested in creating network co-occurrence diagrams from the GKG examining the people, organizations, and locations mentioned most frequently with each other in the world's news, there are several tutorials available showcasing different filters and network constructions.

One-Click Network Visualization With BigQuery+Gephi (Generates Gephi CSV "edges" file formatted for direct one-click importing into Gephi for visualization)
Visualizing The Global Influencer Network (Shows how to exclude certain coverage from a network and how to create networks about a particular geography or written in a particular language)
A Network Diagram of Greece July 1-15 (Shows how to use average "tone" as the edge weight, how to construct networks of coverage from particular languages, how to robustly fetch the language of each article using IFNULL, and how to use REGEXP_MATCH for more complex filtering like twice-mention)
Mapping the Geographic Networks of Global Refugee Flows (A more complicated demo that uses Gephi to render a geographically-centered network diagram, which is then imported into Photoshop and overlaid onto a statically rendered CartoDB base layer)
Getting Started with GDELT + Google Cloud Datalab: Simple Network Visualizations (A demo of using Google Cloud Datalab and GraphViz to rapidly and interactively construct and visualize networks from the GKG)
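
The general shape of a co-occurrence network can be illustrated with a small sketch. The Python below is not the BigQuery SQL used in the tutorials above: it assumes each article has already been reduced to a list of names, and it writes a Source,Target,Weight edge list of the kind Gephi's spreadsheet importer is generally able to read. The function names and sample data are hypothetical.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count how often each unordered pair of names appears in the same document."""
    counts = Counter()
    for names in documents:
        # Deduplicate and sort so (A, B) and (B, A) count as the same edge.
        for a, b in combinations(sorted(set(names)), 2):
            counts[(a, b)] += 1
    return counts


def edges_to_csv(counts):
    """Serialize edges as Source,Target,Weight rows, heaviest edge first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), w in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([a, b, w])
    return buf.getvalue()


# Made-up documents, each reduced to the names it mentions.
docs = [
    ["Alice", "Bob", "Carol"],
    ["Alice", "Bob"],
    ["Bob", "Carol", "Bob"],
]
edges = cooccurrence_edges(docs)
edge_csv = edges_to_csv(edges)
```

Swapping the count for another per-pair statistic, such as an average tone, changes only what is accumulated per edge.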

SENTIMENT ANALYSIS

This demo uses a toy sentiment dictionary to tone-code 122 years of books totaling 67 billion words at the incredible rate of 341 million words per second. The dictionary is purely illustrative, but the approach demonstrates content analysis at massive scale. The sentiment analysis performed here includes both density-based and value-based scoring, covering the full range of traditional sentiment scoring used today.

Terascale Sentiment Analysis: BigQuery + Tone Coding Books
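
As a rough illustration of the two scoring styles mentioned above, the sketch below scores a sentence against a tiny made-up dictionary. It assumes "density" means the share of words that appear in the dictionary and "value" means the average score of the matched words; those definitions, the dictionary entries, and the function name are assumptions for this example, not the demo's actual query.

```python
import re

# Toy dictionary, invented for illustration.
TOY_TONE = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def tone_scores(text, dictionary=TOY_TONE):
    """Return the word count, dictionary-hit density, and mean value of hits."""
    words = re.findall(r"[a-z']+", text.lower())
    matched = [dictionary[w] for w in words if w in dictionary]
    total = len(words)
    density = len(matched) / total if total else 0.0
    value = sum(matched) / len(matched) if matched else 0.0
    return {"words": total, "density": density, "value": value}


scores = tone_scores("A great day with a terrible ending and a good meal")
```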

NGRAMMING

This demo showcases creating ngrams across tens of billions of words in just minutes and is being used in production by GDELT to create a series of computational linguistic resources for underrepresented languages.
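
For readers unfamiliar with the term, an ngram table is simply a count of every contiguous run of n tokens. The minimal Python sketch below shows the idea on a single sentence; it assumes whitespace tokenization and is not the BigQuery approach used in production.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Whitespace tokenization is an assumption made for this example.
tokens = "the cat sat on the mat the cat".split()
bigrams = ngram_counts(tokens, 2)
```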

CYCLES OF HISTORY

This demo showcases folding the entire GDELT 1.0 Event dataset on itself to perform 2.5 million correlations in just 2.5 minutes to explore the underlying cycles of world history.

Towards Psychohistory: Uncovering the Patterns of World History with Google BigQuery
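
One way to picture "folding" a dataset on itself is to correlate the most recent window of a time series against every earlier window of the same length. The sketch below does this with a plain Pearson correlation on a made-up series; the windowing scheme, function names, and data are assumptions for illustration and may differ from what the demo's query actually computes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a flat window; treated as 0 here
    return cov / sqrt(vx * vy)


def window_correlations(series, window):
    """Correlate the final window against every earlier window that does not overlap it."""
    reference = series[-window:]
    results = []
    for start in range(0, len(series) - 2 * window + 1):
        results.append((start, pearson(series[start:start + window], reference)))
    return results


# Made-up periodic series, for illustration only.
series = [1, 2, 3, 1, 2, 3, 1, 2, 3]
results = window_correlations(series, 3)
```

Earlier windows with the highest correlation are the periods that most resemble the present one.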

OTHER ANALYSES

Below are a handful of other examples that didn't fit into the categories above.

Complex Queries: Combining Events, EventMentions, and GKG (Demonstrates a three-way join of the EVENTS, EVENTMENTIONS, and GKG tables)
Using BigQuery To Explore Large Log Files: Exploring the Wayback Machine (Very brief overview of how BigQuery was used to interactively analyze the Wayback Machine snapshot files for a study)

The GDELT Project