Yesterday we announced the Television News Ngram 2.0 Dataset. How might we use this dataset to track how often countries are mentioned in the news over time?
Accurately measuring how often a country is mentioned on television news is a complex process, since a mention of any city within that country should also be counted. Mentions of Paris, Nice and the myriad other French cities that appear in the news must all count toward France, while a "nice hotel" must be distinguished from a "Nice hotel." Simply counting mentions of a country's name and a few closely related terms is therefore an imperfect proxy, but it offers a quick glimpse of geographic prioritization.
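To make the city roll-up concrete, here is a minimal Python sketch that sums 1gram counts into per-country totals through a small gazetteer. The GAZETTEER dictionary, the roll_up function and the sample counts are hypothetical illustrations, not part of the dataset, and the sketch assumes the 1grams arrive as (ngram, count) pairs. It also shows the "Nice" problem: once text is lowercased, a "nice hotel" and a "Nice hotel" produce the same 1gram, so every "nice" is credited to France.

```python
from collections import Counter

# Hypothetical gazetteer: lowercase 1gram -> country it is assumed to signal.
# A real list would need far more cities, demonyms and leaders.
GAZETTEER = {
    "france": "France", "french": "France", "paris": "France", "nice": "France",
    "china": "China", "chinese": "China", "beijing": "China",
}


def roll_up(ngram_counts):
    """Sum 1gram counts into per-country totals using GAZETTEER.

    ngram_counts: iterable of (ngram, count) pairs, e.g. for one day and station.
    Ngrams are lowercased before lookup; ngrams not in the gazetteer are ignored.
    """
    totals = Counter()
    for ngram, count in ngram_counts:
        country = GAZETTEER.get(ngram.lower())
        if country is not None:
            totals[country] += count
    return dict(totals)


# Made-up counts for illustration only.
rows = [("paris", 3), ("nice", 2), ("hotel", 2), ("beijing", 1), ("china", 4)]
print(roll_up(rows))
```

Here roll_up(rows) returns {'France': 5, 'China': 5}, but two of France's five mentions come from "nice" and may be nothing more than an adjective, which is exactly why raw term counts are only a proxy.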
With these caveats in mind, the SQL query below counts how many times China was mentioned compared with Russia over the past decade on CNN, using each country's name, adjective and capital, plus "kremlin" and "putin" for Russia:
SELECT DATE, SUM(China) AS China, SUM(Russia) AS Russia FROM (
  SELECT DATE, SUM(COUNT) AS China, 0 AS Russia
  FROM `gdelt-bq.gdeltv2.iatv_1gramsv2`
  WHERE STATION = 'CNN' AND NGRAM IN ('china', 'chinese', 'beijing')
  GROUP BY DATE
  UNION ALL
  SELECT DATE, 0 AS China, SUM(COUNT) AS Russia
  FROM `gdelt-bq.gdeltv2.iatv_1gramsv2`
  WHERE STATION = 'CNN' AND NGRAM IN ('russia', 'russian', 'moscow', 'kremlin', 'putin')
  GROUP BY DATE
)
GROUP BY DATE
ORDER BY DATE ASC
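The same pattern extends to any station and any set of countries. As a sketch, the hypothetical helper build_country_query below generates a query of this shape from a dictionary of term lists: one UNION ALL branch per country, each filling its own column and zeroing the others, then an outer SUM and GROUP BY so every DATE appears once. It only builds the SQL text and does not run it; the table name is taken from the query above, and the term lists are whatever you choose to supply. Because the station, column names and terms are pasted directly into SQL, the sketch rejects anything that is not a plain word.

```python
import re

# Table name taken from the query above.
TABLE = "`gdelt-bq.gdeltv2.iatv_1gramsv2`"
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")  # allowed station and column names
_TERM = re.compile(r"[a-z0-9-]+")             # allowed lowercase 1grams


def build_country_query(station, term_groups):
    """Return SQL giving one row per DATE with a count column per term group.

    term_groups: dict mapping a column name (e.g. "China") to a list of
    lowercase 1grams. Inputs are validated because they are pasted into SQL.
    """
    if not _NAME.fullmatch(station):
        raise ValueError(f"bad station: {station!r}")
    names = list(term_groups)
    branches = []
    for name in names:
        terms = term_groups[name]
        if not _NAME.fullmatch(name) or not terms:
            raise ValueError(f"bad group: {name!r}")
        if not all(_TERM.fullmatch(t) for t in terms):
            raise ValueError(f"bad term in group {name!r}")
        # This branch fills its own column and writes 0 into every other one.
        cols = ", ".join(
            f"SUM(COUNT) AS {n}" if n == name else f"0 AS {n}" for n in names
        )
        term_list = ", ".join(f"'{t}'" for t in terms)
        branches.append(
            f"SELECT DATE, {cols} FROM {TABLE}\n"
            f"WHERE STATION = '{station}' AND NGRAM IN ({term_list})\n"
            f"GROUP BY DATE"
        )
    # The outer query merges the branches so each DATE yields a single row.
    outer = ", ".join(f"SUM({n}) AS {n}" for n in names)
    return (
        f"SELECT DATE, {outer} FROM (\n"
        + "\nUNION ALL\n".join(branches)
        + "\n)\nGROUP BY DATE\nORDER BY DATE ASC"
    )


print(build_country_query("CNN", {
    "China": ["china", "chinese", "beijing"],
    "Russia": ["russia", "russian", "moscow", "kremlin", "putin"],
}))
```

Adding a third country is then a matter of adding another entry to the dictionary, at the cost of one more scan of the table per branch.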
The final result can be seen in the graph below. Russia surges in 2014 and 2015, then truly bursts onto the scene from 2016 onward with the election of Donald Trump, before giving way to China this year with Covid-19.