The GDELT Project

Visualizing The Global Influencer Network

Kalev's latest Forbes piece explores the connections that defined 2015 by constructing a network diagram over the people mentioned in the more than 150 million articles in 65 languages in the GKG thus far this year. The underlying network structure for all of the visualizations was computed using BigQuery and rendered using Gephi.

The query for constructing the overall global network (includes filtering to try and remove most US presidential campaign coverage):

SELECT c.Source Source, c.Target Target, Count RawCount, "Undirected" Type, RATIO_TO_REPORT(Count) OVER () Weight FROM (
SELECT a.name Source, b.name Target, COUNT(*) as Count
FROM (FLATTEN(
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name
FROM [gdelt-bq:gdeltv2.gkg] 
WHERE V2Locations is not null and V2Persons not like '%Clinton%' and V2Persons not like '%Trump%' and V2Persons not like '%Bush%' 
,name)) a
JOIN EACH (
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name
FROM [gdelt-bq:gdeltv2.gkg] 
WHERE V2Locations is not null and V2Persons not like '%Clinton%' and V2Persons not like '%Trump%' and V2Persons not like '%Bush%' 
) b
ON a.GKGRECORDID=b.GKGRECORDID
WHERE a.name<b.name
GROUP EACH BY 1,2
ORDER BY 3 DESC
LIMIT 100000
) c
order by Count Desc

The query for constructing a network of coverage in all languages about Russia:

SELECT c.Source Source, c.Target Target, Count RawCount, "Undirected" Type, RATIO_TO_REPORT(Count) OVER () Weight FROM (
SELECT a.name Source, b.name Target, COUNT(*) as Count
FROM (FLATTEN(
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name
FROM [gdelt-bq:gdeltv2.gkg] 
WHERE V2Locations like '%Russia%Russia%' and V2Persons not like '%Clinton%' and V2Persons not like '%Trump%' and V2Persons not like '%Bush%' and TranslationInfo like '%rus%' 
,name)) a
JOIN EACH (
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name
FROM [gdelt-bq:gdeltv2.gkg] 
WHERE V2Locations like '%Russia%Russia%' and V2Persons not like '%Clinton%' and V2Persons not like '%Trump%' and V2Persons not like '%Bush%' and TranslationInfo like '%rus%'
) b
ON a.GKGRECORDID=b.GKGRECORDID
WHERE a.name<b.name
GROUP EACH BY 1,2
ORDER BY 3 DESC
LIMIT 500
) c
order by Count Desc

The query for constructing a network of all Russian-language coverage:

SELECT c.Source Source, c.Target Target, Count RawCount, "Undirected" Type, RATIO_TO_REPORT(Count) OVER () Weight FROM (
SELECT a.name Source, b.name Target, COUNT(*) as Count
FROM (FLATTEN(
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name
FROM [gdelt-bq:gdeltv2.gkg] 
WHERE TranslationInfo like '%rus%' 
,name)) a
JOIN EACH (
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name
FROM [gdelt-bq:gdeltv2.gkg] 
WHERE TranslationInfo like '%rus%'
) b
ON a.GKGRECORDID=b.GKGRECORDID
WHERE a.name<b.name
GROUP EACH BY 1,2
ORDER BY 3 DESC
LIMIT 500
) c
order by Count Desc

We hope these serve as templates for your own network explorations!