The GDELT Project

Announcing The GDELT Context 2.0 API

We're incredibly excited to announce today the release of the GDELT Context 2.0 API! This newest addition to the stable of GDELT APIs joins the DOC 2.0, GEO 2.0, TV 2.0 and TVAI 2.0 APIs!

The Context 2.0 API is functionally very similar to the DOC 2.0 API, with two exceptions. Instead of searching whole documents, it searches individual sentences. And instead of returning only the URL of each matching result, it returns a brief snippet of text showing the context of the match: typically the sentence that matched the keyword plus a portion of the sentence before and after it. This makes it possible to tell whether an article mentioning "Covid-19" and "pandemic" together is, for example, a casual reference to the outbreak or a clinical update on the pandemic's spread.

Because it searches sentences rather than entire documents, this API requires that all search terms appear together in the same sentence. Thus, while the DOC 2.0 API can return an article that mentions "Covid-19" in a sentence at the beginning and "pandemic" at the end, the Context 2.0 API matches only when all keywords are contained in a single sentence, to ensure maximal relevancy.
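To make the difference concrete, here is a minimal sketch of document-level versus sentence-level matching. It is purely illustrative and is not how the API is implemented: the naive regex sentence splitter, the case-insensitive substring matching, and the function names (`split_sentences`, `matches_all_terms`, `matching_sentences`) are all assumptions made for this example.

```python
import re


def split_sentences(text):
    # Naive splitter (an assumption for illustration): break on whitespace
    # that follows a period, exclamation mark or question mark.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def matches_all_terms(text, terms):
    # True when every term occurs somewhere in the text (case-insensitive).
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


def matching_sentences(text, terms):
    # Sentence-level search: every term must occur within one sentence.
    return [s for s in split_sentences(text) if matches_all_terms(s, terms)]


terms = ["Covid-19", "pandemic"]

# The terms appear in different sentences of this article.
scattered = (
    "Covid-19 cases rose again this week. "
    "Local shops reported slower sales. "
    "Officials said the pandemic response would be reviewed."
)

# The terms appear together in one sentence of this article.
together = "The Covid-19 pandemic slowed travel. Airlines cut routes."

doc_level_hit = matches_all_terms(scattered, terms)        # document-style match
sentence_level_hits = matching_sentences(scattered, terms)  # sentence-style match
together_hits = matching_sentences(together, terms)
```

In this sketch the first article matches at the document level but yields no sentence-level hits, while the second article yields its first sentence.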

The field of information science incorporates many areas of research related to information relevancy. We envision powerful new kinds of relevancy rankings that can use this additional contextual information together with machine learning approaches like neural or classical language understanding models to find the most relevant articles in response to a user's query. Using this additional contextual information, we expect relevancy filters that can actually model the response snippets and use them to semantically answer a user's question and route them to the most comprehensive and detailed articles supporting those answers, as well as identify contested narratives in which there are fundamentally opposing answers captured in the global news media. Eventually we hope to be able to incorporate these kinds of advanced relevancy models into the DOC 2.0 API in place of the current date and textual relevancy scores.

A maximum of one matching sentence per article will be returned. This means that if a given article contains multiple sentences that match the query, only a single representative sentence will be returned. That sentence is chosen by semantic relevance in textual ranking mode, or at random in date descending mode, and may change from query to query. This filtering is performed after the query has executed, so a request for 75 articles in which 15 of the matches are sentences from articles already in the results will return only 60 results. Thus, queries will typically receive fewer than the requested number of articles.
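The arithmetic above can be sketched as a post-query filter. This is an illustration under stated assumptions rather than the API's actual logic: the hit records, the `url` and `sentence` field names, and the `one_sentence_per_article` helper are hypothetical, and for simplicity the sketch keeps the first sentence seen per article instead of ranking by relevance or choosing at random.

```python
def one_sentence_per_article(hits):
    # Keep only the first hit seen for each article URL.
    # (Simplification: the real selection is by relevance or at random.)
    seen_urls = set()
    kept = []
    for hit in hits:
        if hit["url"] in seen_urls:
            continue
        seen_urls.add(hit["url"])
        kept.append(hit)
    return kept


# Hypothetical query result: 75 sentence hits drawn from 60 distinct
# articles, so 15 hits come from articles already in the results.
hits = [
    {"url": f"https://example.com/article-{i % 60}", "sentence": f"sentence {i}"}
    for i in range(75)
]

kept = one_sentence_per_article(hits)
returned = len(kept)               # results actually returned
dropped = len(hits) - returned     # duplicates removed after the query ran
```

Because the filter runs after the 75 hits have been fetched, the dropped duplicates are not backfilled, which is why the caller sees fewer results than requested.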

As we scale up the new Context 2.0 API, this inaugural release is limited to searching only the past 72 hours and may exclude some articles that are difficult to segment into sentences or that use particularly complex grammars. Thus, for now it covers only a subset of the articles searched by the DOC 2.0 API. As the API evolves, these limitations will ease.
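As a rough sketch of what a request might look like, the snippet below only builds a query URL and makes no network call. Nothing in it is taken from this announcement: the endpoint path and the parameter names (`query`, `mode`, `format`, `maxrecords`, `timespan`) are assumptions modeled on the conventions of the DOC 2.0 API, and `build_context_url` is a hypothetical helper. Check the full documentation for the actual values before relying on any of them.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint path modeled on other GDELT 2.0 APIs; not stated
# in this announcement.
CONTEXT_ENDPOINT = "https://api.gdeltproject.org/api/v2/context/context"


def build_context_url(terms, max_records=75, timespan="72h"):
    # ASSUMPTION: parameter names and values follow DOC 2.0 API conventions.
    params = {
        "query": " ".join(terms),   # all terms must share one sentence
        "mode": "artlist",
        "format": "json",
        "maxrecords": max_records,  # upper bound; fewer may be returned
        "timespan": timespan,       # stay within the 72-hour search window
    }
    return f"{CONTEXT_ENDPOINT}?{urlencode(params)}"


url = build_context_url(["Covid-19", "pandemic"])
```

The default `timespan` mirrors the 72-hour limit of this inaugural release, and `max_records` should be read as a ceiling rather than a guaranteed count.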

QUICK START EXAMPLES

 

FULL DOCUMENTATION