The GDELT Project

Campaign 2020: Getting Started With GDELT For Tracking The US Presidential Race

As the United States' 2020 presidential race begins in earnest, here are just a few of the ways GDELT can be used to track how the media is covering the candidates and the race as a whole.

Television Coverage

Television news coverage of the candidates and the broader race is most easily explored through GDELT Summary's Television Explorer, which provides a user-friendly interface to the underlying Television 2.0 API. The Television Explorer allows you to keyword search the raw closed captioning of the monitored stations, getting back a timeline of how much the keyword was mentioned, a comparison of how much each station covered it, a word cloud of common co-occurring words and a list of top clips matching the keyword.

The Television Explorer searches an archive of more than 5.7 billion words of closed captioning from 163 distinct stations spanning July 2009 to present, from major national stations like CNN, MSNBC, Fox News and Bloomberg to local affiliates of CBS, NBC and ABC to international stations like Al Jazeera, BBC News, Deutsche Welle and Russia Today (not all stations are monitored for the entire 2009-present period: see the stations list in the Television Explorer for the monitoring range of each station). Note that there is a rolling 24 hour embargo, so the most recent broadcasts the Television Explorer can search are those that aired 24 hours ago.

Example Analyses.

Data Access.

The easiest way to search television is to query the Television Explorer interactively, using the results as-is or downloading them as CSV files to plot in Excel, import into statistical software, etc., or to download them programmatically on a regular basis.

Comparing Queries.

Television Explorer makes it easy to compare up to four queries at once with a built-in Comparison Visualization mode. Specify up to four queries to view them combined on a single graph, or specify two queries to view one as a percent of the other (for example, to see what percent of coverage of Hillary Clinton mentioned her emails, use "clinton (email OR emails OR server)" as the first query and "clinton" as the second query).

The Comparison Visualization mode runs interactively inside your browser, running each query and combining the results dynamically. This makes it trivial to test out different ideas to look for interesting results. The final results can be downloaded as a CSV file for further analysis. Note that since the Comparison Visualization runs in the browser, there is no URL that can be scripted via Python/etc. Instead, to run a comparison programmatically on a regular basis, you would simply fetch the individual queries yourself and combine their results using your own code.
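As a rough illustration of combining two queries yourself, the Python sketch below takes the text of two timeline CSV files that have already been downloaded and expresses the first as a percent of the second. The column names "Date" and "Value" and the one-row-per-date layout are assumptions made for this example, not a documented format, so check the header of your own downloads and adjust accordingly.

```python
import csv
import io


def read_timeline(csv_text, date_col="Date", value_col="Value"):
    """Parse a timeline CSV into {date: value}.

    Assumes one row per date; the column names are placeholders and
    should be changed to match the header of the actual download.
    """
    timeline = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        timeline[row[date_col].strip()] = float(row[value_col])
    return timeline


def percent_of(numerator, denominator):
    """Return {date: numerator as a percent of denominator} for shared dates."""
    result = {}
    for date in sorted(numerator.keys() & denominator.keys()):
        if denominator[date] > 0:  # skip dates with no baseline coverage
            result[date] = 100.0 * numerator[date] / denominator[date]
    return result
```

Given the narrower query's CSV text as the first argument to read_timeline and the broader query's as the second, percent_of returns one value per date that appears in both timelines, which can then be written back out or plotted.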

Online News Coverage

GDELT Summary can also be used to report online media coverage of each candidate or issue very similarly to the Television Explorer (it is a wrapper around the DOC 2.0 API). At this time online coverage is searchable only back to January 1, 2017, but the full historical backfile will be available shortly, allowing you to compare online coverage over the same time period as television coverage.

Emotion Mining

Using GDELT Summary's fulltext search to identify online news coverage mentioning a given candidate, it is possible to explore the deeper emotional currents surrounding coverage of that candidate by taking the list of matching URLs and cross-referencing them against the GDELT Global Knowledge Graph (GKG) 2.0. Within the GKG record for each article, extract the Global Content Analysis Measures (GCAM) field, which records thousands of complex emotions and topics. GCAM records emotions at an article level, rather than entity level, but nevertheless allows some exceptionally powerful analyses of the emotional undercurrents of the 2020 race.
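Once the GCAM field has been pulled from each matching GKG record, the per-article scores can be aggregated across all coverage of a candidate. The Python sketch below assumes the field is a comma-separated list of key:value pairs (something like "wc:125,c2.21:4,v10.1:3.5"); confirm the exact layout and the meaning of each key against the GKG codebook before relying on it. The function names and the dimension key used in the usage note are illustrative only.

```python
def parse_gcam(gcam_field):
    """Parse a GCAM string into {dimension_key: numeric_value}.

    Assumes comma-separated 'key:value' pairs; malformed pairs are skipped.
    """
    scores = {}
    for pair in gcam_field.split(","):
        key, sep, value = pair.rpartition(":")
        if not sep:
            continue  # no colon found, so not a key:value pair
        try:
            scores[key] = float(value)
        except ValueError:
            continue  # non-numeric value
    return scores


def average_dimension(gcam_fields, dimension):
    """Average one dimension over the articles that report it.

    Returns None when no article contains the dimension.
    """
    values = [s[dimension] for s in map(parse_gcam, gcam_fields) if dimension in s]
    return sum(values) / len(values) if values else None
```

For example, average_dimension(fields, "v10.1") would give the mean of that dimension across a candidate's articles, where "v10.1" is just a placeholder key. Because GCAM is recorded per article, the result describes the articles mentioning the candidate rather than the candidate directly.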

Frontpage Coverage

Every hour on the hour since March 2018, GDELT has crawled the homepages of around 50,000 news outlets worldwide and recorded all of their links and link text in the order they appear on the page. This makes it possible to see which stories are making it to the world's homepages and how often a candidate is frontpage material.
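One simple measure of "frontpage material" is the share of crawled homepages that link to at least one story mentioning a candidate in a given hour. The Python sketch below assumes the homepage data has already been loaded into (hour, homepage URL, link text) tuples; that record layout is hypothetical and chosen for illustration, so map it onto the fields of the actual dataset.

```python
from collections import defaultdict


def homepage_share(records, term):
    """For each crawl hour, return the fraction of homepages with at least
    one link whose text mentions `term` (case-insensitive substring match).

    `records` is an iterable of (hour, homepage_url, link_text) tuples,
    a hypothetical layout used only for this example.
    """
    seen = defaultdict(set)     # hour -> every homepage crawled that hour
    matched = defaultdict(set)  # hour -> homepages with a matching link
    needle = term.lower()
    for hour, homepage, link_text in records:
        seen[hour].add(homepage)
        if needle in link_text.lower():
            matched[hour].add(homepage)
    return {hour: len(matched[hour]) / len(seen[hour]) for hour in sorted(seen)}
```

Counting homepages rather than links keeps a single outlet with many links to the same story from dominating the result. A plain substring match is crude and will over-match common surnames, so some manual review of the matching link text is advisable.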

Stealth Editing And Rewriting

GDELT recrawls each online news article it monitors after 24 hours and again after one week, comparing it against the article's contents when it was first seen and recording any changes. Article deletions, redirects, and title and body changes are all logged. This is a highly experimental dataset, and additional filtering and manual review are recommended to confirm results for maximal accuracy, but it can offer a powerful tool for identifying changing narratives around candidates.
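To make the idea of comparing snapshots concrete, the Python sketch below diffs two versions of an article using the standard library's difflib. It is a generic illustration of change detection, not GDELT's own method, and the snapshot structure (a dict with "title" and "body" keys, or None when the article could no longer be fetched) is an assumption made for the example.

```python
import difflib


def describe_changes(first, later):
    """Summarize differences between two snapshots of one article.

    Each snapshot is a dict with 'title' and 'body' strings (a hypothetical
    structure); `later` is None if the article was gone on recrawl.
    """
    if later is None:
        return {"deleted": True, "title_changed": False, "body_similarity": 0.0}
    # ratio() is 1.0 for identical text and falls toward 0.0 as texts diverge.
    similarity = difflib.SequenceMatcher(None, first["body"], later["body"]).ratio()
    return {
        "deleted": False,
        "title_changed": first["title"].strip() != later["title"].strip(),
        "body_similarity": similarity,
    }
```

A body similarity just below 1.0 usually indicates a small correction, while a much lower value suggests a substantial rewrite; where to draw that line is a judgment call that should be tuned by reviewing examples by hand.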

Visual Analysis

Finally, a highly experimental approach to understanding the candidates can come from analyzing their visual depictions in online news coverage. Each day up to 750,000 randomly selected images are drawn from all of the worldwide online news coverage monitored by GDELT and run through Google's Cloud Vision API deep learning algorithms. Among the available fields is Google's Web Entities field, which processes the caption of each image to identify mentions of a particular public figure (Google does not perform facial recognition, so it will only flag that a public figure was mentioned in the caption; it cannot identify whether they actually appeared in the image itself). This database, totaling nearly half a billion images and stretching back to December 2015, opens a huge number of possibilities for understanding the visual narratives of the race.