The GDELT Project

Comparing Word Usage Across Television News Stations

Are there words that appear more often on one television news station than another? To explore this question, the Television News Ngrams dataset was processed to identify the 63,784 words that appeared more than 100 times over the past decade on at least one of CNN, MSNBC or Fox News, and to construct a table showing how often each word was mentioned on each of the three stations. Such a dataset can be used to compare word affinity, that is, to learn which words are more common on one station than on the others.

The 15 most common words across the three stations, ranked by combined mentions, are shown below.

Word         CNN      MSNBC    FOXNEWS
the     31033314   31791602   30302778
to      18558092   19421445   17442140
and     15270559   15423652   14699850
a       13982126   14152248   13495076
of      13074610   13456480   11833186
that    10200926   10749417    9457552
in      10179932   10512601    9396239
is       9146218    9153095   10052914
you      9308473    9232848    9588556
i        7284656    8108468    7274558
it       6873654    6978404    7086300
this     6376440    6277189    5616418
for      5627181    5989439    5404281
on       5017853    5343884    5131636
we       4865441    4875525    5339444

Using this dataset, it is possible to see that the word "Schiff" (likely referring to US Representative Adam Schiff) appeared 9,203 times on CNN, 10,798 times on MSNBC and 14,823 times on Fox News. Similarly, "Ahmadinejad" (former Iranian president Mahmoud Ahmadinejad) was mentioned 3,126 times on CNN, 1,732 times on MSNBC and 4,293 times on Fox News. In contrast, "Beijing" was mentioned most on CNN, with 12,121 mentions compared with 4,069 on MSNBC and 5,246 on Fox News.

CNN favors the word "reporter" by a wide margin, mentioning it 735,003 times compared with 296,973 mentions on MSNBC and 253,208 on Fox News. CNN also mentions "bulletin" the most, at 4,712 times compared with 643 mentions on MSNBC and 1,256 on Fox News. The words "viewers" (52,117 CNN mentions, 19,160 MSNBC mentions and 31,311 Fox News mentions) and "correspondents" (7,086 CNN mentions, 2,827 MSNBC mentions and 2,618 Fox News mentions) are also clear CNN favorites.
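One way to make these comparisons easier to read is to convert the raw counts into each station's share of a word's combined mentions. The sketch below does that for a few of the figures quoted above; the helper name `station_shares` and the choice of "share of combined mentions" as an affinity measure are illustrative assumptions, not part of the original analysis.

```python
# Mention counts quoted in the text above, ordered as (CNN, MSNBC, FOXNEWS).
STATIONS = ("CNN", "MSNBC", "FOXNEWS")
MENTIONS = {
    "schiff": (9203, 10798, 14823),
    "ahmadinejad": (3126, 1732, 4293),
    "beijing": (12121, 4069, 5246),
    "reporter": (735003, 296973, 253208),
}


def station_shares(counts):
    """Return each station's fraction of a word's combined mentions.

    Hypothetical helper: one simple affinity measure among many.
    """
    total = sum(counts)
    return {station: count / total for station, count in zip(STATIONS, counts)}


for word, counts in MENTIONS.items():
    shares = station_shares(counts)
    leader = max(shares, key=shares.get)
    print(f"{word}: {leader} leads with {shares[leader]:.1%} of combined mentions")
```

Shares of raw counts do not adjust for differences in how many words each station aired overall, so a more careful comparison might first normalize by each station's total word volume.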

The final CSV dataset can be downloaded for further analysis:
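Once downloaded, the file could be loaded with a few lines of Python. This is a minimal sketch that assumes the CSV has a header row whose columns match the query output described under Technical Details (WORD, CNN, MSNBC, FOXNEWS, TOT); the function name `load_word_counts` is hypothetical, and an in-memory sample built from three rows of the table above stands in for the real file.

```python
import csv
import io

STATIONS = ("CNN", "MSNBC", "FOXNEWS")


def load_word_counts(handle):
    """Parse CSV rows into {word: {station: count}}.

    Assumes a header row with WORD, CNN, MSNBC and FOXNEWS columns.
    """
    table = {}
    for row in csv.DictReader(handle):
        table[row["WORD"]] = {station: int(row[station]) for station in STATIONS}
    return table


# Stand-in for the downloaded file: rows copied from the table above,
# with TOT computed as the sum of the three station columns.
SAMPLE = io.StringIO(
    "WORD,CNN,MSNBC,FOXNEWS,TOT\n"
    "the,31033314,31791602,30302778,93127694\n"
    "is,9146218,9153095,10052914,28352227\n"
    "we,4865441,4875525,5339444,15080410\n"
)

counts = load_word_counts(SAMPLE)
for word, by_station in counts.items():
    leader = max(by_station, key=by_station.get)
    print(f"{word}: most frequent on {leader}")
```

To read the real file, the same function can be passed a handle from `open(path, newline="")`, where `path` is wherever the CSV was saved.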

TECHNICAL DETAILS

Constructing this table required just a single SQL query.

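-- Outer query: add a combined total and keep words with more than 100 mentions on at least one station.
SELECT WORD, CNN, MSNBC, FOXNEWS, CNN + MSNBC + FOXNEWS AS TOT FROM (
  -- Collapse the per-station rows into a single row per word.
  SELECT WORD, SUM(CNN) AS CNN, SUM(MSNBC) AS MSNBC, SUM(FOXNEWS) AS FOXNEWS FROM (
    -- Each branch puts one station's counts in its own column and zero-fills the other two.
    (SELECT WORD, COUNT AS CNN, 0 AS MSNBC, 0 AS FOXNEWS FROM `gdelt-bq.gdeltv2.iatv_1grams` WHERE STATION = 'CNN')
    UNION ALL
    (SELECT WORD, 0 AS CNN, COUNT AS MSNBC, 0 AS FOXNEWS FROM `gdelt-bq.gdeltv2.iatv_1grams` WHERE STATION = 'MSNBC')
    UNION ALL
    (SELECT WORD, 0 AS CNN, 0 AS MSNBC, COUNT AS FOXNEWS FROM `gdelt-bq.gdeltv2.iatv_1grams` WHERE STATION = 'FOXNEWS')
  ) GROUP BY WORD
) WHERE (CNN > 100 OR MSNBC > 100 OR FOXNEWS > 100)
ORDER BY TOT DESC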
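
For readers less familiar with SQL, the logic of the query can be sketched in Python: put each station's counts in its own column, sum them per word, keep words that exceed the threshold on at least one station, and sort by the combined total. This illustrates the same steps on in-memory rows and is not how the table was actually produced; the function name `pivot_counts`, the toy words and counts, and the "OTHER" station label are all invented for the example.

```python
from collections import defaultdict

STATIONS = ("CNN", "MSNBC", "FOXNEWS")


def pivot_counts(rows, threshold=100):
    """Pivot (station, word, count) records into one row per word.

    Returns (word, cnn, msnbc, foxnews, total) tuples, sorted by total descending.
    """
    totals = defaultdict(lambda: dict.fromkeys(STATIONS, 0))
    for station, word, count in rows:
        if station in STATIONS:             # the three WHERE STATION = ... branches
            totals[word][station] += count  # SUM(...) with GROUP BY WORD
    table = [
        (word, *(per[s] for s in STATIONS), sum(per.values()))
        for word, per in totals.items()
        if any(per[s] > threshold for s in STATIONS)  # outer WHERE filter
    ]
    return sorted(table, key=lambda row: row[-1], reverse=True)  # ORDER BY TOT DESC


# Toy records (invented values): several rows per word, as in the ngram table.
rows = [
    ("CNN", "alpha", 80),
    ("CNN", "alpha", 40),
    ("MSNBC", "alpha", 10),
    ("FOXNEWS", "beta", 150),
    ("CNN", "gamma", 60),
    ("MSNBC", "gamma", 70),
    ("OTHER", "alpha", 999),  # stations outside the three are ignored
]

result = pivot_counts(rows)
for row in result:
    print(row)
```

Note that "gamma" is dropped even though its combined total is 130, because the filter applies to each station's count separately rather than to the total.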