Television News GEG: How Similar Is The Coverage On The Major Television News Stations?

How similar is the coverage across the major television news stations? When you turn on CNN, MSNBC, Fox News, BBC News, or the ABC, CBS or NBC evening news, do you see fundamentally different coverage, as if from parallel universes, or do the stations largely cover the same *topics* even if their takes on those topics differ?

Using the Television News Entity Graph of all well-known entities mentioned on the seven stations from 2009 to 2019 (BBC News covers only 2017 to 2019), compiled by the Cloud Natural Language API reading through the broadcast transcripts from the Internet Archive's Television News Archive, answering this question is as simple as computing the Pearson correlation between each pair of stations' entity mention counts. While this is a coarse measurement, limited by the entities the API was able to find in the all-capitalized transcripts, the results are nonetheless instructive, both as a template for assessing how similar television news outlets are and for the interesting findings they yield.
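
For reference, the Pearson correlation between two stations can be written out as follows. The notation is introduced here only for illustration: x_e and y_e denote the number of times entity e was mentioned on each of the two stations, and n is the number of distinct entities in the combined list.

```latex
r_{xy} = \frac{\sum_{e=1}^{n} (x_e - \bar{x})(y_e - \bar{y})}
              {\sqrt{\sum_{e=1}^{n} (x_e - \bar{x})^2}\,
               \sqrt{\sum_{e=1}^{n} (y_e - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{e=1}^{n} x_e,
\quad
\bar{y} = \frac{1}{n}\sum_{e=1}^{n} y_e
```

A value of r near 1 means the two stations mention the same entities in similar proportions, while a value near 0 means one station's mention counts say little about the other's.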

The final results can be seen in the table below. The topics and things mentioned on CNN, MSNBC and Fox News over the past decade are correlated at around r=0.94 to r=0.96, meaning they are about as similar as you can get under these circumstances. In short, the stations largely talk about the same things, even if they may have different takes on how they present those topics. In contrast, the ABC, CBS and NBC evening news broadcasts of the past decade differ substantially from one another, with r values of around 0.62 to 0.64: still quite similar, but far less so than their 24/7 counterparts. One possible explanation is that their brief 30-minute timespan makes their editorial decisions about what to cover far more explicit, whereas the 24/7 nature of cable news affords stations the opportunity to cover a much broader range of topics. BBC News is unsurprisingly a major outlier, with r=0.47 to r=0.58, offering a reminder of just how much news is localized to the nationality of its viewership. Domestic US politics are of less interest to British viewers, while domestic British issues are of less interest to US viewers.

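| Station | CNN | MSNBC | FOXNEWS | BBCNEWS | ABC | CBS | NBC |
|---|---|---|---|---|---|---|---|
| CNN | | 0.939947 | 0.944733 | 0.583303 | 0.737366 | 0.733381 | 0.745449 |
| MSNBC | | | 0.963659 | 0.523607 | 0.702871 | 0.712269 | 0.748477 |
| FOXNEWS | | | | 0.545957 | 0.726129 | 0.728376 | 0.73526 |
| BBCNEWS | | | | | 0.474261 | 0.467392 | 0.483805 |
| ABC | | | | | | 0.618929 | 0.636899 |
| CBS | | | | | | | 0.616032 |
| NBC | | | | | | | |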

Limiting the results to January 1, 2017 to the present, so that BBC News is compared against its US counterparts over the same three-year period, yields the results below. The stark divide between the topics and things discussed on BBC News and those discussed on the US stations appears to be very real.

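| Station | CNN | MSNBC | FOXNEWS | BBCNEWS | ABC | CBS | NBC |
|---|---|---|---|---|---|---|---|
| CNN | | 0.966343 | 0.942596 | 0.553148 | 0.79835 | 0.704016 | 0.694035 |
| MSNBC | | | 0.94531 | 0.518697 | 0.775983 | 0.691765 | 0.712379 |
| FOXNEWS | | | | 0.544424 | 0.782886 | 0.698453 | 0.686777 |
| BBCNEWS | | | | | 0.519824 | 0.446502 | 0.465515 |
| ABC | | | | | | 0.660534 | 0.6668 |
| CBS | | | | | | | 0.569109 |
| NBC | | | | | | | |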

The ability of machines to read through a pile of transcripts, identify the topics and things they discuss, then use a single SQL query to compare those entity lists reminds us of the immense power that lies in combining a quantitative mindset with deep learning and advanced analytics.

TECHNICAL DETAILS

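Creating the tables above required just a single SQL query. While at first glance it might appear complex, in reality it is composed of three simple parts. The "WITH" clause flattens the table so that each entity mention becomes its own row, filters to only those entities with MID codes and counts the mentions per station. The innermost "select" uses a set of "union all" statements to stack the per-station counts, keeping only entities mentioned more than 10 times on a station, and the results are summed into a single table with one row per entity and one column per station. The final list of "corr" statements performs the station-by-station correlations. The ABC, CBS and NBC evening news broadcasts are stored under the call signs of the affiliates that aired them (KGO, KPIX and KNTV respectively), which is why those names appear in the query.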

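WITH data AS (
-- One row per (station, entity): flatten the entities array, keep only entities with a MID code and count their mentions.
SELECT Station, entities.mid entity, count(1) count
FROM `gdelt-bq.gdeltv2.geg_iatv`, unnest(entities) entities
where entities.mid is not null
group by Station, entities.mid
)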

select
corr(CNN,MSNBC) CNN_MSNBC,
corr(CNN,FOXNEWS) CNN_FOXNEWS,
corr(CNN,BBCNEWS) CNN_BBCNEWS,
corr(CNN,ABC) CNN_ABC,
corr(CNN,CBS) CNN_CBS,
corr(CNN,NBC) CNN_NBC,
corr(MSNBC,FOXNEWS) MSNBC_FOXNEWS,
corr(MSNBC,BBCNEWS) MSNBC_BBCNEWS,
corr(MSNBC,ABC) MSNBC_ABC,
corr(MSNBC,CBS) MSNBC_CBS,
corr(MSNBC,NBC) MSNBC_NBC,
corr(FOXNEWS,BBCNEWS) FOXNEWS_BBCNEWS,
corr(FOXNEWS,ABC) FOXNEWS_ABC,
corr(FOXNEWS,CBS) FOXNEWS_CBS,
corr(FOXNEWS,NBC) FOXNEWS_NBC,
corr(BBCNEWS,ABC) BBCNEWS_ABC,
corr(BBCNEWS,CBS) BBCNEWS_CBS,
corr(BBCNEWS,NBC) BBCNEWS_NBC,
corr(ABC,CBS) ABC_CBS,
corr(ABC,NBC) ABC_NBC,
corr(CBS,NBC) CBS_NBC
from (

select entity, SUM(CNN) CNN, SUM(MSNBC) MSNBC, SUM(FOXNEWS) FOXNEWS, SUM(BBCNEWS) BBCNEWS, SUM(ABC) ABC, SUM(CBS) CBS, SUM(NBC) NBC from (
(SELECT entity, count CNN, 0 MSNBC, 0 FOXNEWS, 0 BBCNEWS, 0 ABC, 0 CBS, 0 NBC FROM data where count > 10 and Station='CNN')
UNION ALL
(SELECT entity, 0 CNN, count MSNBC, 0 FOXNEWS, 0 BBCNEWS, 0 ABC, 0 CBS, 0 NBC FROM data where count > 10 and Station='MSNBC')
UNION ALL
(SELECT entity, 0 CNN, 0 MSNBC, count FOXNEWS, 0 BBCNEWS, 0 ABC, 0 CBS, 0 NBC FROM data where count > 10 and Station='FOXNEWS')
UNION ALL
(SELECT entity, 0 CNN, 0 MSNBC, 0 FOXNEWS, count BBCNEWS, 0 ABC, 0 CBS, 0 NBC FROM data where count > 10 and Station='BBCNEWS')
UNION ALL
(SELECT entity, 0 CNN, 0 MSNBC, 0 FOXNEWS, 0 BBCNEWS, count ABC, 0 CBS, 0 NBC FROM data where count > 10 and Station='KGO')
UNION ALL
(SELECT entity, 0 CNN, 0 MSNBC, 0 FOXNEWS, 0 BBCNEWS, 0 ABC, count CBS, 0 NBC FROM data where count > 10 and Station='KPIX')
UNION ALL
(SELECT entity, 0 CNN, 0 MSNBC, 0 FOXNEWS, 0 BBCNEWS, 0 ABC, 0 CBS, count NBC FROM data where count > 10 and Station='KNTV')
) GROUP BY entity

)
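
As a rough illustration of what the query's three parts do, the sketch below mirrors them in plain Python on made-up data. It is not the code used to produce the tables above: the mention list, the MID strings, and the names `mentions`, `MIN_COUNT`, `table` and `pearson` are all hypothetical, and only three stations are included to keep it short.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy data: one (station, entity MID) pair per mention.
# The MIDs and counts are invented purely for illustration.
mentions = (
    [("CNN", "/m/a")] * 40 + [("CNN", "/m/b")] * 20 + [("CNN", "/m/c")] * 12
    + [("MSNBC", "/m/a")] * 38 + [("MSNBC", "/m/b")] * 22 + [("MSNBC", "/m/c")] * 5
    + [("BBCNEWS", "/m/a")] * 11 + [("BBCNEWS", "/m/d")] * 30
)
stations = ["CNN", "MSNBC", "BBCNEWS"]
MIN_COUNT = 10  # mirrors the "count > 10" filter in the query

# Part 1: count mentions per (station, entity), like the GROUP BY in the WITH clause.
counts = Counter(mentions)

# Part 2: pivot to one row per entity and one column per station.
# An entity at or below the threshold on a station gets 0 there,
# which is what the UNION ALL of zero-padded rows plus SUM produces.
entities = sorted({e for (_, e), c in counts.items() if c > MIN_COUNT})
table = {
    s: [counts[(s, e)] if counts[(s, e)] > MIN_COUNT else 0 for e in entities]
    for s in stations
}


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Part 3: one correlation per station pair, like the list of corr() calls.
correlations = {
    (a, b): pearson(table[a], table[b]) for a, b in combinations(stations, 2)
}

for (a, b), r in correlations.items():
    print(f"{a}_{b}: {r:.4f}")
```

With all seven stations in `stations`, `combinations` would yield the same 21 pairs that the query spells out by hand. Note that the zero-filling means an entity mentioned 10 or fewer times on a station counts as entirely absent from that station, as happens to "/m/c" on MSNBC in the toy data.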

For those interested in seeing the actual table used for the correlations, the following query generates a human-friendly version of it, listing each entity by name alongside its mention count on each station.

WITH data AS (
SELECT Station, APPROX_TOP_COUNT(entities.name, 1)[OFFSET(0)].value entity,
entities.mid mid,
APPROX_TOP_COUNT(entities.type, 1)[OFFSET(0)].value type,
APPROX_TOP_COUNT(entities.wikipediaUrl, 1)[OFFSET(0)].value wikipediaurl,
avg(avgSalience) avgsalience,
count(1) count 
 FROM `gdelt-bq.gdeltv2.geg_iatv`, unnest(entities) entities
where entities.mid is not null
group by Station, entities.mid order by Count desc
)
select entity, SUM(CNN) CNN, SUM(MSNBC) MSNBC, SUM(FOXNEWS) FOXNEWS, SUM(BBCNEWS) BBCNEWS, SUM(ABC) ABC, SUM(CBS) CBS, SUM(NBC) NBC from (
(SELECT entity, count CNN, 0 MSNBC, 0 FOXNEWS, 0 BBCNEWS, 0 ABC, 0 CBS, 0 NBC FROM data where count > 10 and Station='CNN')
UNION ALL
(SELECT entity, 0 CNN, count MSNBC, 0 FOXNEWS, 0 BBCNEWS, 0 ABC, 0 CBS, 0 NBC FROM data where count > 10 and Station='MSNBC')
UNION ALL
(SELECT entity, 0 CNN, 0 MSNBC, count FOXNEWS, 0 BBCNEWS, 0 ABC, 0 CBS, 0 NBC FROM data where count > 10 and Station='FOXNEWS')
UNION ALL
(SELECT entity, 0 CNN, 0 MSNBC, 0 FOXNEWS, count BBCNEWS, 0 ABC, 0 CBS, 0 NBC FROM data where count > 10 and Station='BBCNEWS')
UNION ALL
(SELECT entity, 0 CNN, 0 MSNBC, 0 FOXNEWS, 0 BBCNEWS, count ABC, 0 CBS, 0 NBC FROM data where count > 10 and Station='KGO')
UNION ALL
(SELECT entity, 0 CNN, 0 MSNBC, 0 FOXNEWS, 0 BBCNEWS, 0 ABC, count CBS, 0 NBC FROM data where count > 10 and Station='KPIX')
UNION ALL
(SELECT entity, 0 CNN, 0 MSNBC, 0 FOXNEWS, 0 BBCNEWS, 0 ABC, 0 CBS, count NBC FROM data where count > 10 and Station='KNTV')
) GROUP BY entity

Happy querying!