Visualizing The Most Important Relationships of 2009-2019 Through The Eyes of TV News

Having had Google's Cloud Natural Language API non-consumptively read through the transcripts of a decade of television news from the Internet Archive's Television News Archive and catalog all of the people, organizations, locations, events, publications, dates and other entities it saw, how might we use this data to visualize the most important connections of the past decade, as seen through television news? It turns out that with a single SQL query in BigQuery and Gephi we can visualize the relationships that defined the past decade.

All Stations

Looking across CNN, MSNBC, Fox News and the evening news broadcasts of ABC, CBS and NBC 2009-2019 and BBC News 2017-2019, what are the top 1,500 most commonly co-occurring pairs of entities? Here we define co-occurrence as the pair of names appearing in the same one-minute window and limit entities to those Google has assigned a unique identifier (MID code) to.

The final graph can be seen below.

Constructing this graph required just a single query, modified from the original by Felipe Hoffa:

SELECT Source, Target, Count RawCount, "Undirected" Type, ( Count/SUM(Count) OVER () ) Weight FROM (
SELECT a.entity Source, b.entity Target, COUNT(*) as Count
FROM (
 (SELECT date, entities.name entity FROM `gdelt-bq.gdeltv2.geg_iatv`, unnest(entities) entities where entities.mid is not null)
) a
JOIN (
 (SELECT date, entities.name entity FROM `gdelt-bq.gdeltv2.geg_iatv`, unnest(entities) entities where entities.mid is not null)
) b
ON a.date=b.date
WHERE a.entity<b.entity
GROUP BY 1,2
ORDER BY 3 DESC
LIMIT 1500
)
order by Count Desc

What if we expanded the co-occurrence window to one hour? The result is below:

 

Using BigQuery's built-in timestamp truncation operator TIMESTAMP_TRUNC(), this is a cinch:

SELECT Source, Target, Count RawCount, "Undirected" Type, ( Count/SUM(Count) OVER () ) Weight FROM (
SELECT a.entity Source, b.entity Target, COUNT(*) as Count
FROM (
 (SELECT TIMESTAMP_TRUNC(date, HOUR) date, entities.name entity FROM `gdelt-bq.gdeltv2.geg_iatv`, unnest(entities) entities where entities.mid is not null)
) a
JOIN (
 (SELECT TIMESTAMP_TRUNC(date, HOUR) date, entities.name entity FROM `gdelt-bq.gdeltv2.geg_iatv`, unnest(entities) entities where entities.mid is not null)
) b
ON a.date=b.date
WHERE a.entity<b.entity
GROUP BY 1,2
ORDER BY 3 DESC
LIMIT 1500
)
order by Count Desc

CNN

 

What if we narrow our gaze just to those relationships captured through CNN's reporting of the past decade? Here once again we use a co-occurrence time interval of one minute to ensure maximum relatedness.

As before, the only change to our query was adding a station filter:

SELECT Source, Target, Count RawCount, "Undirected" Type, ( Count/SUM(Count) OVER () ) Weight FROM (
SELECT a.entity Source, b.entity Target, COUNT(*) as Count
FROM (
 (SELECT date, entities.name entity FROM `gdelt-bq.gdeltv2.geg_iatv`, unnest(entities) entities where entities.mid is not null and Station='CNN')
) a
JOIN (
 (SELECT date, entities.name entity FROM `gdelt-bq.gdeltv2.geg_iatv`, unnest(entities) entities where entities.mid is not null and Station='CNN')
) b
ON a.date=b.date
WHERE a.entity<b.entity
GROUP BY 1,2
ORDER BY 3 DESC
LIMIT 1500
)
order by Count Desc

MSNBC

 

Here is the graph for MSNBC, using the same query:

Fox News

And the same for Fox News:

KGO (ABC)

What about the evening news broadcasts of ABC over the past decade? Here there is a much greater emphasis on the network's personalities.

KPIX (CBS)

And similarly for CBS evening news:

KNTV (NBC)

And for NBC:

BBC News

Finally, BBC News offers a very different take on the most important relationships of the past 3 years, offering a reminder of just how geographically focused the news is.

Once again, the power of Cloud Natural Language API and BigQuery, with open source Gephi creating the visualizations, offers an incredibly powerful lens through which to explore the patterns of global society.