The GDELT Project

ChatGPT, Bard & A Large Language Model (LLM) Future: GDELT + LLMs = Realtime Planetary-Scale Risk Cataloging & Q&A

GDELT today monitors global news coverage in more than 150 languages, with GDELT 3.0 set to expand that number to nearly 400, spanning every corner of the globe in realtime and creating a planetary-scale digital mirror and observatory of global human society. GDELT's realtime firehoses capture the pulse of the planet, transforming the digital world's output into a single massive realtime global graph of daily life on Planet Earth. From mapping global conflict and modeling global narratives to providing the data behind one of the earliest alerts of the COVID-19 pandemic, from disaster response to countering wildlife crime, and from epidemic early warning and food security to estimating realtime global risk and mapping the global flow of ideas and narratives, GDELT powers an ever-growing portion of the global risk analysis and forecasting landscape.

Yet one of the greatest challenges in working with GDELT's planetary-scale data is its sheer scale and scope: its realtime feeds contain everything from 24/7 updates on events and narratives across the planet to the earliest glimmers of tomorrow's biggest stories. Even the largest teams of human analysts can examine only a fraction of GDELT's daily monitoring, meaning even the largest organizations today merely scratch the surface of what is possible with GDELT.

Over the years we have explored myriad approaches to cataloging GDELT's vast insights, from early grammar-based approaches to statistical, hybrid and neural models, from partitioning, parsing and distillation to more advanced architectures like Transformers. The emergence and maturation of Large Language Models (LLMs) over the past few years has fundamentally altered the state of the possible when it comes to using GDELT to understand global risk. We have been closely tracking the capabilities of these models and their unique strengths in the kinds of flexible and robust guided distillation and codification needed for news cataloging and Q&A, as well as their current limitations.

With the public availability of ChatGPT last year and the forthcoming availability of Bard and other models, LLMs have reached a developmental state in which they are widely accessible and increasingly standardized in capability. Their distillation and generative abilities are increasingly paired in ways that allow more robust codification, such as compiling tabular extractive summaries of freeform text across languages and writing styles.
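As a rough illustration of this kind of tabular codification, the sketch below builds a prompt asking an LLM to return a pipe-delimited table of events extracted from an article, then parses the model's reply into structured rows. The `call` to the model itself is left as a placeholder for whichever API is used (ChatGPT, Bard, etc.), and the three-column schema is an illustrative assumption, not a GDELT specification.

```python
# Sketch: tabular extractive summarization via an LLM (illustrative only).
# The model call itself is omitted -- substitute whatever LLM API you use;
# the column schema below is an assumption chosen for demonstration.

COLUMNS = ["location", "event", "date"]

def build_prompt(article_text: str) -> str:
    """Ask the model to return a pipe-delimited table with a fixed header."""
    header = " | ".join(COLUMNS)
    return (
        "Extract every event mentioned in the article below as a table.\n"
        f"Return one row per event in the format: {header}\n"
        "Use only facts stated in the article.\n\n"
        f"ARTICLE:\n{article_text}"
    )

def parse_table(reply: str) -> list[dict]:
    """Parse a pipe-delimited model reply into row dictionaries,
    skipping the header row and any malformed lines."""
    rows = []
    for line in reply.strip().splitlines():
        cells = [c.strip() for c in line.split("|")]
        if len(cells) != len(COLUMNS):
            continue  # malformed or chatty line from the model
        if [c.lower() for c in cells] == COLUMNS:
            continue  # header row echoed back by the model
        rows.append(dict(zip(COLUMNS, cells)))
    return rows
```

In practice a parser like this would also validate cell contents, since LLM output formats can drift between responses and across languages and writing styles.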

We see several key areas of growth for LLMs paired with GDELT:

A typical analytic pipeline might be:

From a technical standpoint, combining GDELT's NGrams 3.0 dataset with LLMs offers a unique opportunity to perform at-scale classification over global news in realtime. Our recommended workflow is as follows:
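One way such a workflow can keep realtime classification tractable is to prefilter the ngram stream cheaply and hand only matching context windows to the (costlier) LLM. The sketch below assumes newline-delimited JSON records shaped roughly like NGrams 3.0 output, with an `ngram` keyword, its surrounding `pre`/`post` context, and a source `url`; those field names and the watchlist are assumptions for illustration, not a documented schema.

```python
import json

# Sketch: prefiltering an NGrams-style feed before LLM classification.
# Record shape (ngram / pre / post / url fields) is an assumption made
# for illustration; adapt to the actual NGrams 3.0 record format.

RISK_TERMS = {"outbreak", "evacuation", "sanctions"}  # illustrative watchlist

def select_candidates(ndjson_lines, terms=RISK_TERMS):
    """Yield (url, snippet) pairs whose ngram matches the watchlist,
    reconstructing a short context window to send on to an LLM."""
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("ngram", "").lower() in terms:
            snippet = f'{rec.get("pre", "")} {rec["ngram"]} {rec.get("post", "")}'.strip()
            yield rec.get("url", ""), snippet
```

Only the small matched subset is then classified by the LLM, so the expensive model calls scale with the number of candidate mentions rather than with the full firehose.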

We are tremendously excited about the emerging opportunities of pairing LLMs with GDELT. Below is a selection of tabular codifications compiled by ChatGPT from GDELT's datasets, spanning both online news and television news transcripts monitored from the Internet Archive's TV News Archive.