The GDELT Project

Mapping Global Happiness 2015-2018 Through 850 Million News Articles

In 2016 and 2017 we explored what it looked like to try to literally map "global happiness" through the eyes of the world's news media. As with television, however, the most interesting stories are often told through how emotion changes day by day across hundreds of millions of news articles from around the globe. Those stories tend to be lost when looking only at annual summaries.

Towards this end we are excited to present our first-ever Animated Global Happiness Map, showing for each location how "happy" or "sad" the average tone was of all worldwide news coverage monitored by GDELT that mentioned it each day. The final animation covers nearly four years of world history through the eyes of more than 7.1 billion location mentions across 850 million news articles in 65 languages. Just a single line of SQL and 2 minutes 14 seconds was all it took for BigQuery to process all 7.1 billion location references (more than 400GB of geographic data in total) and generate the final output file of more than 70 million distinct location-day pairs that were rendered into the final maps.

The final animation can be seen below. It is available in the following formats:

TECHNICAL DETAILS

To create the map above, we simply ran the SQL statement below in BigQuery and downloaded the results as a CSV file (see above to download the raw CSV file to make your own map). The CSV was then imported into GDELT's powerful new mapping infrastructure, which projected the points into Web Mercator (EPSG:3857), styled them, reprojected them into GraphViz space adjusted for the selected basemap, rendered them using GraphViz's rasterization pipeline, overlaid them on the selected basemap (in this case the Carto Positron basemap) and appended the date as a text overlay. Finally, the PNG sequence was converted into an MPEG movie using ffmpeg. GDELT's new mapping system automates this entire workflow: it accepts the raw results of the BigQuery query below as input, automatically distributes the workload (in this case across 32 cores and 200GB of RAM backed by Local SSD) and outputs to GCS. This allows one-click mapping from BigQuery GDELT queries directly into a final polished animation, uniquely tailored for GDELT's geographic data.
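For readers building their own map from the CSV, the first step of that workflow, projecting latitude/longitude into Web Mercator (EPSG:3857), can be sketched as follows. This is only an illustration using the standard spherical Mercator formula, not GDELT's actual mapping code; the function name `to_web_mercator` and the latitude clamp are assumptions made for this example.

```python
import math

# Radius of the sphere used by EPSG:3857 (WGS84 semi-major axis, in meters).
EARTH_RADIUS_M = 6378137.0
# Web Mercator is undefined at the poles; clamp to the usual ~85.05 degree limit.
MAX_LAT_DEG = 85.051129


def to_web_mercator(lat_deg, lon_deg):
    """Project a WGS84 latitude/longitude pair (degrees) to EPSG:3857 meters."""
    lat_deg = max(-MAX_LAT_DEG, min(MAX_LAT_DEG, lat_deg))
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Example: project one point (coordinates chosen only for illustration).
x, y = to_web_mercator(48.8667, 2.33333)
print(f"x={x:.1f} m, y={y:.1f} m")
```

The projected coordinates can then be scaled to pixel space for whichever renderer and basemap you choose.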

For those interested in mapping only portions of the world, you can use the countrycode and adm1code fields in the CSV file to filter to geographic subsets.
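A minimal sketch of that filtering is shown below. It assumes the CSV has a header row whose column names match the query's output columns; the helper name `filter_rows` and the sample rows (including their codes, IDs and tone values) are invented purely for illustration and are not real GDELT output.

```python
import csv
import io


def filter_rows(rows, countrycode=None, adm1code=None):
    """Yield only the rows matching the requested country and/or ADM1 code."""
    for row in rows:
        if countrycode is not None and row["countrycode"] != countrycode:
            continue
        if adm1code is not None and row["adm1code"] != adm1code:
            continue
        yield row


# Made-up sample standing in for the downloaded CSV (assumed header row).
SAMPLE_CSV = """DATE,latitude,longitude,countrycode,adm1code,featureid,featuretype,cnt,tone
20180101,38.8951,-77.0364,US,USDC,id-1,3,120,-1.25
20180101,48.8667,2.33333,FR,FR00,id-2,4,80,0.5
20180102,34.0522,-118.244,US,USCA,id-3,3,45,-2.0
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
us_rows = list(filter_rows(reader, countrycode="US"))
for row in us_rows:
    print(row["DATE"], row["adm1code"], row["tone"])
```

To read the real file, replace the `io.StringIO(...)` wrapper with `open(path, newline="")`, and pass `adm1code=` instead of (or in addition to) `countrycode=` to narrow the subset further.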

-- Outer query: one row per (day, featureid) with the mention count and average tone.
SELECT DATE, MAX(latitude) latitude, MAX(longitude) longitude,
  MAX(countrycode) countrycode, MAX(adm1code) adm1code, featureid,
  MAX(featuretype) featuretype, COUNT(1) cnt, AVG(tone) tone
FROM (
  -- Inner query: split V2Locations on ';' and pull the '#'-delimited fields out of each mention.
  SELECT
    REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'(^[0-5])#') AS featuretype,
    REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'^[0-5]#.*?#(.*?)#') AS countrycode,
    REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'^[0-5]#.*?#.*?#(.*?)#') AS adm1code,
    REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'^[0-5]#.*?#.*?#.*?#.*?#(.*?)#') AS latitude,
    REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'^[0-5]#.*?#.*?#.*?#.*?#.*?#(.*?)#') AS longitude,
    REGEXP_EXTRACT(SPLIT(V2Locations,';'),r'^[0-5]#.*?#.*?#.*?#.*?#.*?#.*?#(.*?)#') AS featureid,
    -- The first comma-separated value of V2Tone is the article's average tone.
    FLOAT(REGEXP_REPLACE(V2Tone, r',.*', "")) tone,
    -- Keep only the YYYYMMDD prefix of the timestamp.
    SUBSTR(STRING(DATE), 1, 8) DATE
  FROM [gdeltv2.gkg]
)
WHERE featureid IS NOT NULL
GROUP BY DATE, featureid
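To make the logic of the query above easier to follow, here is a rough Python equivalent of what it computes, run on a tiny in-memory sample. It is an explanatory sketch rather than a replacement for BigQuery. It assumes each V2Locations mention has nine '#'-delimited fields (type, name, country code, ADM1 code, ADM2 code, latitude, longitude, feature ID, character offset), which is how the regular expressions above index them. The helper names and all sample values are hypothetical.

```python
from collections import defaultdict

FIELDS = ("featuretype", "fullname", "countrycode", "adm1code", "adm2code",
          "latitude", "longitude", "featureid", "charoffset")
VALID_TYPES = {"0", "1", "2", "3", "4", "5"}


def parse_locations(v2locations):
    """Split a V2Locations string into one dict per well-formed location mention."""
    mentions = []
    for block in v2locations.split(";"):
        parts = block.split("#")
        # Mirror the regexes: the type must be 0-5 and the feature ID field must exist.
        if len(parts) < len(FIELDS) or parts[0] not in VALID_TYPES:
            continue
        mentions.append(dict(zip(FIELDS, parts)))
    return mentions


def daily_tone(records):
    """Aggregate (date, v2locations, v2tone) records into per-day, per-feature stats."""
    sums = defaultdict(lambda: [0, 0.0])  # (day, featureid) -> [count, tone total]
    for date, v2locations, v2tone in records:
        day = str(date)[:8]                  # YYYYMMDD prefix of the timestamp
        tone = float(v2tone.split(",")[0])   # first V2Tone value = average tone
        for mention in parse_locations(v2locations):
            acc = sums[(day, mention["featureid"])]
            acc[0] += 1
            acc[1] += tone
    return {key: (n, total / n) for key, (n, total) in sums.items()}


# Invented sample records; IDs, offsets and tone values are placeholders.
records = [
    (20180101120000,
     "4#Paris, France#FR#FR00##48.8667#2.33333#LOC_A#100;1#France#FR#FR##46#2#LOC_B#200",
     "-2.0,1.0,3.0,4.0,20.0,0.0,500"),
    (20180101180000,
     "4#Paris, France#FR#FR00##48.8667#2.33333#LOC_A#50",
     "4.0,5.0,1.0,6.0,18.0,0.0,300"),
]
result = daily_tone(records)
print(result)
```

As in the SQL, every mention counts separately, so an article that references the same location twice contributes twice to both the count and the average.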