Kalev's latest Forbes piece explores the connections that defined 2015 by constructing a network diagram over the people mentioned in the more than 150 million articles in 65 languages in the GKG thus far this year. The underlying network structure for all of the visualizations was computed using BigQuery and rendered using Gephi.
The query for constructing the overall global network (includes filtering to try and remove most US presidential campaign coverage):
SELECT c.Source Source, c.Target Target, Count RawCount, "Undirected" Type, RATIO_TO_REPORT(Count) OVER () Weight FROM ( SELECT a.name Source, b.name Target, COUNT(*) as Count FROM (FLATTEN( SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name FROM [gdelt-bq:gdeltv2.gkg] WHERE V2Locations is not null and V2Persons not like '%Clinton%' and V2Persons not like '%Trump%' and V2Persons not like '%Bush%' ,name)) a JOIN EACH ( SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name FROM [gdelt-bq:gdeltv2.gkg] WHERE V2Locations is not null and V2Persons not like '%Clinton%' and V2Persons not like '%Trump%' and V2Persons not like '%Bush%' ) b ON a.GKGRECORDID=b.GKGRECORDID WHERE a.name<b.name GROUP EACH BY 1,2 ORDER BY 3 DESC LIMIT 100000 ) c order by Count Desc
The query for constructing a network of coverage in all languages about Russia:
SELECT c.Source Source, c.Target Target, Count RawCount, "Undirected" Type, RATIO_TO_REPORT(Count) OVER () Weight FROM ( SELECT a.name Source, b.name Target, COUNT(*) as Count FROM (FLATTEN( SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name FROM [gdelt-bq:gdeltv2.gkg] WHERE V2Locations like '%Russia%Russia%' and V2Persons not like '%Clinton%' and V2Persons not like '%Trump%' and V2Persons not like '%Bush%' and TranslationInfo like '%rus%' ,name)) a JOIN EACH ( SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name FROM [gdelt-bq:gdeltv2.gkg] WHERE V2Locations like '%Russia%Russia%' and V2Persons not like '%Clinton%' and V2Persons not like '%Trump%' and V2Persons not like '%Bush%' and TranslationInfo like '%rus%' ) b ON a.GKGRECORDID=b.GKGRECORDID WHERE a.name<b.name GROUP EACH BY 1,2 ORDER BY 3 DESC LIMIT 500 ) c order by Count Desc
The query for constructing a network of all Russian-language coverage:
SELECT c.Source Source, c.Target Target, Count RawCount, "Undirected" Type, RATIO_TO_REPORT(Count) OVER () Weight FROM ( SELECT a.name Source, b.name Target, COUNT(*) as Count FROM (FLATTEN( SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name FROM [gdelt-bq:gdeltv2.gkg] WHERE TranslationInfo like '%rus%' ,name)) a JOIN EACH ( SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', '')) name FROM [gdelt-bq:gdeltv2.gkg] WHERE TranslationInfo like '%rus%' ) b ON a.GKGRECORDID=b.GKGRECORDID WHERE a.name<b.name GROUP EACH BY 1,2 ORDER BY 3 DESC LIMIT 500 ) c order by Count Desc
We hope these serve as templates for your own network explorations!