The GDELT Project

Using BigQuery To Surface The Dyadic Country Interactions Defining The Last 40 Years

What if, with a single SQL query, we could surface the most common dyadic relationships defining each year of the past 40 years? In short, for each year, find the two actors that appear most commonly together. This may not necessarily be the most intense or meaningful event of the year, but rather reflects what the media chose to emphasize and thus the interactions on the global public consciousness. The final result is the table below:

Year Actor1Name Actor2Name
1979 CHINA VIETNAM
1980 AFGHANISTAN RUSSIA
1981 RUSSIA UNITED STATES
1982 ISRAEL LEBANON
1983 ISRAEL LEBANON
1984 RUSSIA UNITED STATES
1985 RUSSIA UNITED STATES
1986 RUSSIA UNITED STATES
1987 IRAN IRAQ
1988 IRAN IRAQ
1989 ISRAELI PALESTINIAN
1990 IRAQ KUWAIT
1991 IRAQ KUWAIT
1992 ISRAEL PALESTINIAN
1993 ISRAEL PALESTINIAN
1994 ISRAEL JORDAN
1995 ISRAEL SYRIA
1996 ISRAEL PALESTINIAN
1997 ISRAEL PALESTINIAN
1998 ISRAEL PALESTINIAN
1999 ISRAEL PALESTINIAN
2000 ISRAEL PALESTINIAN
2001 ISRAELI PALESTINIAN
2002 ISRAELI PALESTINIAN
2003 IRAQ UNITED STATES
2004 IRAQ UNITED STATES
2005 ISRAEL PALESTINIAN
2006 IRAQ UNITED STATES
2007 IRAQ UNITED STATES
2008 IRAQ UNITED STATES
2009 AFGHANISTAN UNITED STATES
2010 CHINA UNITED STATES
2011 PAKISTAN UNITED STATES
2012 CHINA UNITED STATES
2013 RUSSIA UNITED STATES
2014 RUSSIA UKRAINE
2015 IRAN UNITED STATES
2016 RUSSIA UNITED STATES
2017 RUSSIA UNITED STATES
2018 RUSSIA UNITED STATES
2019 RUSSIA UNITED STATES
2020 IRAN UNITED STATES

TECHNICAL DETAILS

Constructing this table took just a single SQL query and 3 seconds of query time in BigQuery. The "UNION ALL" is used to combine two versions of the same base query, one that selects Actor1 and Actor2's names in correct order and one that reverses their order so that an event that encodes A doing something to B is counted the same as B doing something to A, to ensure that both directions of dyadic relationships are captured. The outer query wrapper converts these results to a ranking by count and teh final result filters these down to just the top by year.

Simply by changing "where rank=1" to where rank<=5" will yield the top 5 relationships per year and so on. You can also adjust the query terms to return just the top relationships for a particular country for each year and so on.

SELECT Year, Actor1Name, Actor2Name, Count FROM (
  SELECT Actor1Name, Actor2Name, Year, COUNT(1) Count, RANK() OVER(PARTITION BY YEAR ORDER BY Count(1) DESC) rank FROM (
    (SELECT Actor1Name, Actor2Name,  Year FROM `gdelt-bq.full.events` WHERE Actor1Name < Actor2Name and Actor1CountryCode != '' and Actor2CountryCode != '' and Actor1CountryCode!=Actor2CountryCode)
    UNION ALL
    (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year FROM `gdelt-bq.full.events` WHERE Actor1Name > Actor2Name  and Actor1CountryCode != '' and Actor2CountryCode != '' and Actor1CountryCode!=Actor2CountryCode)
)  WHERE Actor1Name IS NOT null AND Actor2Name IS NOT null GROUP BY 1, 2, 3 HAVING Count > 100
) WHERE rank=1 ORDER BY Year