The GDELT Project

A Network Diagram of Greece July 1-15

Using the sample GKG 2.0 BigQuery network query, the following visualization shows the network of the top 1,500 connections among names mentioned in global coverage of the Greek debt crisis in the first half of July.  The actual query examined all coverage mentioning Greece from July 1-15, 2015, and constructed a histogram of all of the names that appear together in coverage, selecting the top 1,500 strongest connections.  Looking at the network below, the diagram yields a pretty robust approximation of the reality of the debt crisis, with all of the major players front and center and connected to each other in the ways you would expect.  So, while this visualization does not yield any groundbreaking new findings that a well-read reader would not have already gathered from following the myriad coverage of the debt crisis, it does reinforce that GDELT captures a strong approximation of the actual leadership dynamics underlying global politics.

 

 

Technical Details

For the technically-minded, the diagram above was made following the Google BigQuery + GKG 2.0 Sample Queries network examples (towards the bottom of that page), using the following query:

SELECT a.name, b.name, COUNT(*) as count
FROM (FLATTEN(
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', ")) name
FROM [gdelt-bq:gdeltv2.gkg]
WHERE DATE>20150700000000 and DATE < 20150715000000 and V2Locations like '%Greece%'
,name)) a
JOIN EACH (
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', ")) name
FROM [gdelt-bq:gdeltv2.gkg]
WHERE DATE>20150700000000 and DATE < 20150715000000 and V2Locations like '%Greece%'
) b
ON a.GKGRECORDID=b.GKGRECORDID
WHERE a.name<b.name
GROUP EACH BY 1,2
ORDER BY 3 DESC
LIMIT 1500

Several other modifications of the above were tested, such as adding a language filter to each of the two where clauses above – testing " TranslationInfo LIKE '%srclc:ell%' " for Greek, " TranslationInfo LIKE '%srclc:deu%' " for German, and " TranslationInfo LIKE '%srclc:fra%' " for French.

A topical histogram was also tested, to return the top World Bank topics discussed in Greek-related coverage during those days:

SELECT theme, COUNT(*) as count
FROM (
select REGEXP_REPLACE(SPLIT(V2Themes,';'), r',.*', ") theme
from [gdelt-bq:gdeltv2.gkg]
where DATE>20150700000000 and DATE < 20150715000000 and V2Locations like '%Greece%' and TranslationInfo LIKE '%srclc:ell%'
)
group by theme
having theme like 'WB_%'
ORDER BY 2 DESC
LIMIT 15

The results are unsurprising, but reassuring that the approach generates reasonable results:

Category Number Mentions
WB_1104_MACROECONOMIC_VULNERABILITY_AND_DEBT 48156
WB_450_DEBT 47717
WB_696_PUBLIC_SECTOR_MANAGEMENT 40189
WB_2432_FRAGILITY_CONFLICT_AND_VIOLENCE 38134
WB_840_JUSTICE 32349
WB_2470_PEACE_OPERATIONS_AND_CONFLICT_MANAGEMENT 28405
WB_2471_PEACEKEEPING 27019
WB_2473_DIPLOMACY_AND_NEGOTIATIONS 26922
WB_936_ALTERNATIVE_DISPUTE_RESOLUTION 26865
WB_843_DISPUTE_RESOLUTION 26865
WB_939_NEGOTIATION 26734
WB_471_ECONOMIC_GROWTH 13874
WB_135_TRANSPORT 13592
WB_698_TRADE 12943
WB_621_HEALTH_NUTRITION_AND_POPULATION 12862

Finally, a modified version of the network code was tested that returns an approximation of the average tone of each link (in reality it is really an average of the two averages of their independent occurrences):

SELECT a.name, b.name, COUNT(*) as count, (avg(a.tone)+avg(b.tone))/2 as linktone
FROM (FLATTEN(
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', ")) name, FLOAT(REGEXP_EXTRACT(V2Tone, '^(.*?),')) tone
FROM [gdelt-bq:gdeltv2.gkg]
WHERE DATE>20150700000000 and DATE < 20150715000000 and V2Locations like '%Greece%'
,name)) a
JOIN EACH (
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*', ")) name, FLOAT(REGEXP_EXTRACT(V2Tone, '^(.*?),')) tone
FROM [gdelt-bq:gdeltv2.gkg]
WHERE DATE>20150700000000 and DATE < 20150715000000 and V2Locations like '%Greece%'
) b
ON a.GKGRECORDID=b.GKGRECORDID
WHERE a.name<b.name
GROUP EACH BY 1,2
ORDER BY 3 DESC
LIMIT 1500

Though, in the case of Greece this does not yield sufficient stratification to produce a distinct visual. You could also add additional filters, such as narrowing to just a subset of media outlets by adding a filter like " SourceCommonName in (select domain from [mytable])) " to the two where clauses above (replacing "mytable" with the name of your BigQuery table).  For example, you might filter against the Alexa One Million list to get a more Westernized perspective on a topic.

Instead of a network query, if you just want to fetch a list of URLs of coverage of Greece, you could add "IFNULL(REGEXP_EXTRACT(TranslationInfo, 'srclc:(.*?);'), 'eng') lang" to your select query to return the three-letter language code of the article (automatically recasts nulls to "eng" for English).

If you want to add topical filters to your query, traditionally you might use a "LIKE '%%' " query, but if you want to ensure that there are at least two matches in the document of any of your selected themes, this is not possible with a traditional LIKE (you can easily set up a LIKE to require that a given theme appear twice, but not that any of a set of themes appear twice).  Instead, you can use the REGEXP_MATCH operator "REGEXP_MATCH(V2Themes, '(WB_1104_MACROECONOMIC_VULNERABILITY_AND_DEBT|WB_696_PUBLIC_SECTOR_MANAGEMENT|WB_2473_DIPLOMACY_AND_NEGOTIATIONS|WB_936_ALTERNATIVE_DISPUTE_RESOLUTION).*(WB_1104_MACROECONOMIC_VULNERABILITY_AND_DEBT|WB_696_PUBLIC_SECTOR_MANAGEMENT|WB_2473_DIPLOMACY_AND_NEGOTIATIONS|WB_936_ALTERNATIVE_DISPUTE_RESOLUTION)')".

Happy visualizing!