Kalev's latest piece for Forbes explores how GDELT can be used to assess Turkey's physical stability using the GDELT Event database and various societal-scale emotional dimensions like "tone" and "anxiety" using GDELT GKG's GCAM system.
For those interested in running their own analyses for other countries or expanding the analysis using other emotional dimensions or event types, the following BigQuery SQL queries were used to create the graphs in the paper.
To create the physical stability timeline, this query was used to construct a normalized by-day timeline for all Turkish events having a QuadClass of 4 (Material Conflict):
SELECT SQLDATE, INTEGER(norm*100000)/1000 Percent
SELECT ActionGeo_CountryCode, QuadClass, SQLDATE, COUNT(1) AS c, RATIO_TO_REPORT(c) OVER(PARTITION BY SQLDATE ORDER BY c DESC) norm FROM [gdelt-bq:gdeltv2.events]
GROUP BY ActionGeo_CountryCode, QuadClass, SQLDATE
WHERE ActionGeo_CountryCode='TU' and QuadClass = 4
ORDER BY ActionGeo_CountryCode, QuadClass, SQLDATE;
The query automatically normalizes its results and presents the final counts as a percentage of all events, taking into account the variance in the day-to-day raw event counts and yielding a fully normalized "conflict intensity" score for Turkey by day over the last seven months.
To generate the two emotional graphs, the following GKG query was used:
SELECT substr(string(DATE),0,8) as daydate, count(1), avg(float(REGEXP_REPLACE(V2Tone, r',.*', ""))) tone, avg(float(REGEXP_EXTRACT(GCAM, r'v19.1:([-d.]+)'))) anew,sum(integer(REGEXP_EXTRACT(GCAM, r'c8.3:([-d.]+)'))) ridanxietycnt, sum(integer(REGEXP_EXTRACT(GCAM, r'wc:(d+)'))) wordcount FROM [gdeltv2.gkg] where V2Locations like '%Turkey%Turkey%' group by daydate
This query uses regular expressions to parse out the ANEW valiance/tone score (which is already in a scaled form so can be averaged and used as-is) and the RID "anxiety" dimension, which, unlike ANEW, is reported as the number of words in the document that are found in the RID Anxiety dictionary. Once you download the results of the above query in CSV format into Microsoft Excel or other spreadsheet package, you can use the ANEW score as-is, but you should divide the "ridanxietycnt" (in this case the sum total number of words of all Turkey-related articles published in a given day that are anxiety-related by "wordcount" (the total number of all words in those articles), yielding the percentage of all words in Turkey-related articles each day that were reflective of the emotion "anxiety." So, in other words, in Excel, you would create a new column called "%Anxiety" and set its formula to "=ridanxietycnt/wordcount" (replacing those two variables with their corresponding column names in your spreadsheet).