The GDELT Project

Web News NGrams 3.0 Dataset Historical Backfile: 2020 – Present

To allow researchers to trace the evolution of Covid-19 narratives across the world's news media, we are excited to announce today the release of the first stage of the Web News NGrams 3.0 Dataset historical backfile, which extends the dataset to cover January 1, 2020 through the present. This is an extremely large dataset, totaling more than 54TB and 244 billion records, and it will require substantial computing power to analyze. In return, it will allow public health researchers to trace how the core narratives and events of the pandemic were covered by the world's news media, especially how narratives and events were contextualized and internalized by the media, how guidance was conveyed to the public and digested by newsmakers, and how discoveries and events spread.
