One-Click Network Visualization With BigQuery+Gephi

For his latest Forbes piece, Kalev took a network visualization developed by the BBVA Research Emerging Markets Unit of how often countries are mentioned together in global news coverage of the Russian economic sanctions, and revisualized it in Gephi using modularity to group it into communities, PageRank to size each node by its importance in the network, and Force Atlas 2 layout to position countries based on how often they appear together.

We have had a tutorial for creating network visualizations of GDELT using BigQuery since this past March, which was ported to Google Cloud Datalab last month, but we've now created a more sophisticated BigQuery SQL query that outputs a CSV file that can now be imported directly into Gephi without having to rewrite the headers and add additional weight columns. Moreover, this new query uses the geographic fields to create a geographic network of how countries are mentioned together in coverage of interest.

The query below searches for all articles mentioning Russia at least twice and containing either the Theme "SANCTIONS" or the word "sanctions" in the URL, compiles a list of the country codes of locations mentioned in each article, joins those together into a final network, selects the top 1000 strongest connections, and cross-walks the country codes into their human-readable names (since the raw V2Locations field might contain "Russia" or "Russian" or "Russians" and so on).

SELECT c.Source SourceCountryCode, d.country Source, c.Target TargetCountryCode, e.country Target, Count RawCount, "Undirected" Type, RATIO_TO_REPORT(Count) OVER () Weight FROM (
SELECT a.name Source, b.name Target, COUNT(*) as Count
FROM (FLATTEN(
SELECT GKGRECORDID, UNIQUE(REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'^[1-5]#.*?#(.*?)#')) name
FROM [gdelt-bq:gdeltv2.gkg] 
WHERE V2Locations like '%Russia%Russia%' and (V2Themes like '%SANCTIONS%' or DocumentIdentifier like '%sanctions%') ignore case
,name)) a
JOIN EACH (
SELECT GKGRECORDID, UNIQUE(REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'^[1-5]#.*?#(.*?)#')) name
FROM [gdelt-bq:gdeltv2.gkg] 
WHERE V2Locations like '%Russia%Russia%' and (V2Themes like '%SANCTIONS%' or DocumentIdentifier like '%sanctions%') ignore case
) b
ON a.GKGRECORDID=b.GKGRECORDID
WHERE a.name<b.name
GROUP EACH BY 1,2
ORDER BY 3 DESC
LIMIT 1000
) c
JOIN EACH (
select fips, country from [gdelt-bq:extra.countryinfo]) d
ON c.Source = d.fips
JOIN EACH (
select fips, country from [gdelt-bq:extra.countryinfo]) e
ON c.Target = e.fips
order by Count Desc

Once the query is complete, just click on the "Download as CSV" button and follow the remaining instructions (starting with importing the CSV file into Gephi) in the Sample Queries post.