With the debut of GDELT 2.0 earlier this year and the general availability of the GDELT Global Knowledge Graph (GKG) in Google BigQuery, we've seen an incredible boom in the diversity and complexity of analyses being performed on GDELT that leverage BigQuery's ability to perform massive and highly complex queries in near-realtime. The examples below offer a small cross-section of some of the public demos from the GDELT Blog over the past year.
GETTING STARTED
The best place to get started with the GKG is our aptly titled "Google BigQuery + GKG 2.0: Sample Queries" post, which offers several brief tutorials walking through basic person/organization/theme histograms, language comparisons, geographic histograms, and network visualizations.
- Google BigQuery + GKG 2.0: Sample Queries
- Google BigQuery + 3.5M Books: Sample Queries (This adapts the queries above to the 200-year books collection and also includes an example of plotting an emotional timeline using the GCAM data)
- Getting Started with GDELT + Google Cloud Datalab: Simple Timelines (Illustrates how to rapidly create basic timelines of events, emotions, and topics)
ONE MINUTE MAPPING
More advanced users looking to create rich interactive thematic maps of the geographic footprint of specific topics using the GKG should explore this tutorial, which presents a terascale mapping solution using BigQuery's User Defined Function (UDF) capability. Combining a JavaScript UDF with a corresponding SQL query, this tutorial examines the more than one billion location references in the GKG, associates each location mention with the themes mentioned in closest proximity, identifies the locations most closely associated with the theme of interest, and outputs a CSV file suitable for importing directly into CartoDB.com's interactive online mapping platform.
- New One Minute Maps: BigQuery UDF + CartoDB (Source code and step-by-step directions)
- Mapping at Infinite Scale: Terascale and Petascale Cartography and Big Data in the BigQuery Era (High-level summary of the approach)
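The proximity step described above can be illustrated outside BigQuery. The following is a minimal Python sketch, not the tutorial's JavaScript UDF or SQL: it assumes each article has already been parsed into lists of (character offset, theme) and (character offset, name, latitude, longitude) tuples, and the function names and the CSV column names are hypothetical.

```python
import csv
import io
from bisect import bisect_left
from collections import Counter


def nearest_theme(location_offset, themes):
    """Return the theme whose character offset is closest to the location.

    `themes` is a list of (offset, theme) tuples sorted by offset.
    Returns None when the article has no themes.
    """
    if not themes:
        return None
    offsets = [offset for offset, _ in themes]
    i = bisect_left(offsets, location_offset)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = themes[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda t: abs(t[0] - location_offset))[1]


def locations_for_theme(articles, theme_of_interest):
    """Count location mentions whose nearest theme is the theme of interest."""
    counts = Counter()
    for article in articles:
        themes = sorted(article["themes"])
        for offset, name, lat, lon in article["locations"]:
            if nearest_theme(offset, themes) == theme_of_interest:
                counts[(name, lat, lon)] += 1
    return counts


def to_csv(counts):
    """Serialize the counts as CSV text, most-mentioned locations first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "latitude", "longitude", "mentions"])
    for (name, lat, lon), mentions in counts.most_common():
        writer.writerow([name, lat, lon, mentions])
    return buf.getvalue()
```

Calling `to_csv(locations_for_theme(articles, "REFUGEES"))` on a small list of parsed articles yields one row per location with its mention count. The actual tutorial performs this work inside BigQuery so that it scales to the full GKG.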
There are also many other demonstrations of using the GKG for mapping that rely on external Perl scripts for the post-processing, though all of these can now be completed using the UDF approach above.
These two demos use an external Perl script designed to create country-level thematic maps of the density of particular topics or discourse.
- Mapping Greece Through the Eyes of the World
- IRIN News: The Nepal Earthquake at Three Months: Media Fatigue and Bias
And finally this demo, based on the 200-year books collection, showcases how to make an animated map:
ONE CLICK NETWORK VISUALIZATIONS
For those interested in creating network co-occurrence diagrams from the GKG examining the people, organizations, and locations mentioned most frequently with each other in the world's news, there are several tutorials available showcasing different filters and network constructions.
- One-Click Network Visualization With BigQuery+Gephi (Generates Gephi CSV "edges" file formatted for direct one-click importing into Gephi for visualization)
- Visualizing The Global Influencer Network (Shows how to exclude certain coverage from a network and how to create networks about a particular geography or written in a particular language)
- A Network Diagram of Greece July 1-15 (Shows how to use average "tone" as the edge weight, how to construct networks of coverage from particular languages, how to robustly fetch the language of each article using IFNULL, and how to use REGEXP_MATCH for more complex filtering like twice-mention)
- Mapping the Geographic Networks of Global Refugee Flows (A more complicated demo that uses Gephi to render a geographically-centered network diagram, which is then imported into Photoshop and overlaid onto a statically rendered CartoDB base layer)
- Getting Started with GDELT + Google Cloud Datalab: Simple Network Visualizations (A demo of using Google Cloud Datalab and GraphViz to rapidly and interactively construct and visualize networks from the GKG)
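For readers who want to see the shape of a co-occurrence network before running the BigQuery tutorials, here is a minimal Python sketch. It is not the query from any of the posts above: it assumes each article has already been reduced to a list of names, the function names are hypothetical, and the `Source`, `Target`, `Weight` header is the column layout assumed here for Gephi's edge import.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_edges(articles, min_weight=1):
    """Build weighted edges from names that appear in the same article.

    `articles` is a list of name lists, one per article. Each unordered
    pair of distinct names in an article adds 1 to that pair's weight.
    """
    weights = Counter()
    for names in articles:
        # Deduplicate within the article and sort so (a, b) == (b, a).
        for a, b in combinations(sorted(set(names)), 2):
            weights[(a, b)] += 1
    return [
        (a, b, weight)
        for (a, b), weight in weights.most_common()
        if weight >= min_weight
    ]


def edges_csv(edges):
    """Serialize edges as CSV text with a Source,Target,Weight header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buf.getvalue()
```

Raising `min_weight` prunes rarely co-mentioned pairs, which tends to matter for readability once the input grows beyond a few thousand articles.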
SENTIMENT ANALYSIS
NGRAMMING
CYCLES OF HISTORY
This demo showcases folding the entire GDELT 1.0 Event dataset on itself to perform 2.5 million correlations in just 2.5 minutes to explore the underlying cycles of world history.
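One way to picture correlating a timeline against itself is to compare the most recent window of a series with every earlier window of the same length. The sketch below is a plain Python illustration of that general idea, not the demo's BigQuery query; the window length, the function names, and the use of Pearson correlation are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_matching_windows(series, window, top=3):
    """Rank earlier windows by correlation with the final `window` values.

    Returns up to `top` (correlation, start_index) pairs, best first.
    Candidate windows never overlap the final window.
    """
    target = series[-window:]
    scores = []
    for start in range(0, len(series) - 2 * window + 1):
        candidate = series[start:start + window]
        scores.append((pearson(target, candidate), start))
    return sorted(scores, reverse=True)[:top]
```

Each series of length n produces on the order of n correlations this way, so repeating it across many series and window positions quickly reaches millions of comparisons, which is where a massively parallel engine becomes useful.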
OTHER ANALYSES
- Complex Queries: Combining Events, EventMentions, and GKG (Demonstrates a three-way join of the EVENTS, EVENTMENTIONS, and GKG tables)
- Using BigQuery To Explore Large Log Files: Exploring the Wayback Machine (Very brief overview of how BigQuery was used to interactively analyze the Wayback Machine snapshot files for a study)