Mapping The 2015 Media Geography In One Line of SQL

Kalev's latest Forbes piece includes an interactive clickable map of the major news outlets covering each location on earth, as seen through the more than 1.48 billion geographic mentions across nearly 200 million news articles monitored by GDELT in 2015. Generating this map with BigQuery took just a single line of SQL and less than 2 minutes of compute time, yielding a CSV file that could be imported directly into CartoDB to create an interactive clickable map.

2016-mapping-the-2015-media-geography

TECHNICAL DETAILS

The final query to generate the map is:

select min(location) Location, min(latitude) Latitude, min(longitude) Longitude, GROUP_CONCAT_UNQUOTED(UNIQUE(domain),'<BR>') DomainList, max(tot) TotalArts from (

select REGEXP_REPLACE(location, r'^-', '') location, latitude, longitude, featureid, domain, rank, cnt, tot from (

select location, latitude, longitude, featureid, CONCAT('<a href="http://', domain, '" target="blank">', domain , '</a>') domain, cnt, DENSE_RANK() OVER (PARTITION BY featureid ORDER BY cnt DESC) rank, sum(cnt) over (PARTITION BY featureid) tot from (

select min(location) location, max(latitude) latitude, max(longitude) longitude, featureid, domain, count(1) cnt from (

SELECT REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'^[2-5]#(.*?)#') as location,
REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'^[2-5]#.*?#.*?#.*?#.*?#(.*?)#') as latitude,
REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'^[2-5]#.*?#.*?#.*?#.*?#.*?#(.*?)#') as longitude,
REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'^[2-5]#.*?#.*?#.*?#.*?#.*?#.*?#(.*?)#') as featureid,
IFNULL(DOMAIN(DocumentIdentifier), 'broadcast_source') as domain
FROM [gdeltv2.gkg] 

) where featureid is not null group by featureid, domain 

)

) where tot > 50 and rank <=5 order by featureid, rank asc

) group by featureid