The GDELT Project

Television News Ngram 2.0 Dataset: 16.9 Billion Records

 June 26, 2020

Just how large is the Television News Ngram 2.0 Dataset? It comprises 3.12 billion unigrams, 4.98 billion bigrams, 4.1 billion trigrams, 2.9 billion 4-grams and 1.8 billion 5-grams (remember that in our dataset word shingles do not span punctuation). Together, the complete dataset of 1-grams through 5-grams comes to just under 16.9 billion records!
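To illustrate what it means for word shingles not to span punctuation, here is a minimal Python sketch. It is not GDELT's actual tokenizer: the function name `shingles`, the lowercasing step, and the rule that any character other than a word character, apostrophe, or whitespace counts as a boundary are all assumptions made for illustration.

```python
import re

def shingles(text, max_n=5):
    """Return {n: [n-grams]} where no n-gram crosses a punctuation mark.

    Hypothetical helper: the boundary rule below is an assumption,
    not the dataset's documented tokenization.
    """
    # Split the text into segments at every run of punctuation.
    segments = re.split(r"[^\w'\s]+", text.lower())
    result = {n: [] for n in range(1, max_n + 1)}
    for segment in segments:
        words = segment.split()
        # Slide a window of each size over this segment only,
        # so a shingle never joins words from two segments.
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                result[n].append(" ".join(words[i:i + n]))
    return result

grams = shingles("Breaking news tonight, the storm moves north.")
for n, items in grams.items():
    print(n, items)
```

In this example the comma splits the sentence into a three-word and a four-word segment, so the bigram "tonight the" is never produced and no 5-gram exists at all. This is one reason the higher-order counts in the dataset fall off rather than tracking the unigram count.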

Learn More.
