The GDELT Project

Global Relationship Graph: Open IE 5.1 Sample Dataset

This past July we debuted the Global Relationship Graph (GRG), an experimental new initiative to codify the factual claims and relationships made in the global press each day. If one read the world's news each day, what understanding of the world would one be expected to come away with? Most importantly, if we could codify news media with sufficient accuracy and resolution, could we autonomously identify contested narratives and tie news coverage more closely to related fact checks?

In July we released two small experimental datasets based on Google's Natural Language API, one using verb-centered ngrams and the other walking the dependency graphs of each sentence. Today we are excited to announce a third pilot dataset, created using Open IE 5.1, developed by the University of Washington and the Indian Institute of Technology, Delhi.

Open IE attempts to codify the relations expressed in a given text, converting a sentence into a series of claims and relationships inferred or explicitly stated in it. To explore how it might be used to understand global news data, we processed a small random sample of around 3,000 English-language online news articles from October 4, 2020, selecting a Sunday to ensure a mixture of breaking stories and retrospective news coverage on a wide range of topics. All sentences in each article were processed, yielding a total of 201,663 relationships. The precompiled jar and model files were downloaded directly from the Open IE 5.1 GitHub page.
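Many of those relationships recur across articles, since outlets often repeat the same wire copy or quote the same statement. The sketch below shows one simple way to collapse identical claims: normalize each extraction and key a dictionary on it, recording which articles contained it. It is an illustration under stated assumptions, not the GRG's actual reprocessing pipeline. The `Claim` tuple with `arg1`/`relation`/`arg2` fields, the lowercase-and-whitespace normalization, and the example URLs and sentences are all hypothetical.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Claim(NamedTuple):
    """Hypothetical (subject; relation; object) claim; field names are illustrative only."""
    arg1: str
    relation: str
    arg2: str


def normalize(claim: Claim) -> Claim:
    """Lowercase and collapse runs of whitespace so trivially different surface forms match."""
    def clean(text: str) -> str:
        return " ".join(text.lower().split())
    return Claim(clean(claim.arg1), clean(claim.relation), clean(claim.arg2))


def collapse_claims(extractions: Iterable[tuple[str, Claim]]) -> dict[Claim, set[str]]:
    """Map each normalized claim to the set of article URLs in which it was extracted."""
    sources: dict[Claim, set[str]] = {}
    for url, claim in extractions:
        sources.setdefault(normalize(claim), set()).add(url)
    return sources


# Made-up extractions: the first two differ only in case and spacing, so they collapse.
extractions = [
    ("https://example.com/a", Claim("The mayor", "announced", "a new budget")),
    ("https://example.com/b", Claim("the  mayor", "announced", "a new budget")),
    ("https://example.com/b", Claim("Voters", "rejected", "the measure")),
]
collapsed = collapse_claims(extractions)
for claim, urls in sorted(collapsed.items(), key=lambda item: -len(item[1])):
    print(len(urls), tuple(claim))
```

Keying on exact normalized text is deliberately conservative: paraphrases such as "unveiled" versus "announced" stay separate, which avoids merging claims that may not actually mean the same thing.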

We've reprocessed the Open IE 5.1 output in the following ways to make it simpler to work with and to collapse claims shared across articles:

The final UTF-8 newline-delimited JSON file format is as follows, with each row being a unique claim found in that given minute of monitoring:

You can download the entire dataset below:

Remember that these results are 100% automated and represent machine codification of news coverage without any human intervention. As with all globally sourced news coverage, claims therein may be true, false, contested or unknown – the goal of this dataset is merely to codify the news as it stands to make such deeper analysis possible.
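Because the file is UTF-8 newline-delimited JSON, each row can be parsed independently and streamed without loading the whole file into memory. The following is a minimal reader sketch. The filename in the comment and the `claim`/`articles` field names in the two sample rows are hypothetical placeholders, not the dataset's actual schema; substitute the real field names from the format description.

```python
import io
import json
from typing import Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line of a newline-delimited JSON stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at end of file
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"malformed JSON on line {line_number}: {err}") from err


# For the real file, open it explicitly as UTF-8, e.g.:
#   with open("grg-openie-sample.json", encoding="utf-8") as f:  # hypothetical filename
#       for record in iter_ndjson(f):
#           ...
# Here two made-up rows with hypothetical field names stand in for the dataset.
sample = io.StringIO(
    '{"claim": "(the mayor; announced; a new budget)", "articles": 2}\n'
    "\n"
    '{"claim": "(voters; rejected; the measure)", "articles": 1}\n'
)
records = list(iter_ndjson(sample))
print(len(records))
print(records[0]["claim"])
```

Streaming line by line keeps memory use flat regardless of file size, and reporting the failing line number makes any malformed row easy to locate.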

We're tremendously excited to see what you're able to do with this new experimental dataset!