Global Similarity Graph: Visualizing Language Overlap

With the new Global Similarity Graph (GSG), a single SQL query in BigQuery, paired with Gephi, can visualize language overlap. Because this analysis covers a single day and keeps only language pairs with more than five overlapping stories, the results here are exploratory, and a longer time horizon would be needed for firmer conclusions. Linguistic similarity and translation error may also affect these results, but we hope they offer a template for the kinds of more advanced analyses that can be performed using the new GSG.

-- Average similarity per unordered language pair, as a Gephi edge list
SELECT Source, Target, "Undirected" Type, AVG(simScore) Weight FROM (
    -- Order each pair alphabetically so A-B and B-A collapse into one edge
    SELECT IF(fromLang < toLang, fromLang, toLang) Source,
           IF(fromLang < toLang, toLang, fromLang) Target,
           simScore
    FROM `gdelt-bq.gdeltv2.gsg`
    WHERE fromLang != toLang AND DATE(fromDate) = "2021-07-02"
)
GROUP BY Source, Target
HAVING COUNT(1) > 5
ORDER BY Weight DESC
LIMIT 10000
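
To make the query's logic easier to follow, here is a minimal Python sketch of the same steps: put each language pair in alphabetical order, average the similarity scores, keep pairs with more than five rows, and write the result as a Source/Target/Type/Weight CSV of the kind Gephi's spreadsheet importer accepts. The function names `build_edges` and `to_gephi_csv` and the sample rows are hypothetical and exist only for illustration. This is not real GSG data and not necessarily how the graph here was produced.

```python
import csv
import io
from collections import defaultdict


def build_edges(rows, min_count=5):
    """Collapse directed (fromLang, toLang, simScore) rows into undirected edges.

    Mirrors the query above: order each pair alphabetically, average simScore,
    and keep only pairs with more than `min_count` rows.
    """
    scores = defaultdict(list)
    for from_lang, to_lang, sim in rows:
        if from_lang == to_lang:
            continue  # same-language rows are excluded, as in the WHERE clause
        source, target = sorted((from_lang, to_lang))
        scores[(source, target)].append(sim)

    edges = [
        (source, target, "Undirected", sum(vals) / len(vals))
        for (source, target), vals in scores.items()
        if len(vals) > min_count
    ]
    # Strongest average similarity first, like ORDER BY Weight DESC
    edges.sort(key=lambda edge: edge[3], reverse=True)
    return edges


def to_gephi_csv(edges):
    """Serialize edges with the column names used in the query's output."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    writer.writerows(edges)
    return buf.getvalue()


# Made-up sample rows (not real GSG data): six es/en rows and two en/fr rows.
sample_rows = (
    [("es", "en", 0.5)] * 3
    + [("en", "es", 0.75)] * 3
    + [("fr", "en", 0.9)] * 2
)
edges = build_edges(sample_rows)
print(to_gephi_csv(edges))
```

With these sample rows only the en/es pair survives the threshold, since the en/fr pair has just two rows. In practice the rows would come from the BigQuery result rather than a hand-written list.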

You can see the resulting graph below: