Multilingual Source-Country Crossreferencing Dataset

In November of last year we released a crossreference dataset estimating the country of origin for all major English-language news outlets monitored by GDELT.  This past February we announced that this algorithm was now running live on all global coverage monitored by GDELT in all 65 languages that it live-translates.

Over the past several months we've received numerous requests to make this lookup file available to allow filtering by source origin, such as examining how French media is covering EU hegemony compared with the Greek press, or comparing Egypt and Saudi Arabia's differing takes on events in Yemen.  Today we are releasing a first version of this lookup, estimating the likely country of origin for all online news outlets monitored by GDELT.  You will note many outlets are missing from this list.  Only outlets that meet three criteria are included: sufficient coverage volume (outlets such as small community publications with a handful of largely non-geographic posts are excluded), sufficient geographic emphasis (a topical outlet that focuses on cyberwar might not be included if it does not reference enough precise geographic locations), and sufficient separation among the top several countries discussed (an outlet might focus on "Eastern Europe" instead of a single country, in which case its focus on any one country would not stand out enough from the others for it to be included).  The data for this lookup is based on coverage from March 1 through May 4, 2015 and thus may exhibit some skew for smaller or topically-centered outlets that may have emphasized specific countries more heavily during the last two months.
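To make the three inclusion criteria concrete, here is a minimal sketch of how such a filter could work.  It is an illustration only and not GDELT's actual algorithm: the function name, the input shape (an article count plus a tally of location mentions per country) and all three threshold values are assumptions invented for this example.

```python
from collections import Counter

# Hypothetical thresholds, chosen only for illustration.
MIN_ARTICLES = 50         # "sufficient coverage volume"
MIN_GEO_MENTIONS = 100    # "sufficient geographic emphasis"
MIN_MARGIN = 0.15         # "sufficient separation" between the top two countries


def assign_country(article_count, country_mentions):
    """Return the most-mentioned country for an outlet, or None if it fails a filter.

    country_mentions maps a country code to the number of location
    mentions resolved to that country over the monitoring window.
    """
    # Criterion 1: the outlet must publish enough articles.
    if article_count < MIN_ARTICLES:
        return None

    # Criterion 2: the outlet must reference enough geographic locations.
    total = sum(country_mentions.values())
    if total < MIN_GEO_MENTIONS:
        return None

    # Criterion 3: the top country must clearly lead the runner-up,
    # measured as the gap between their shares of all mentions.
    ranked = Counter(country_mentions).most_common(2)
    top_country, top_count = ranked[0]
    runner_up_count = ranked[1][1] if len(ranked) > 1 else 0
    if (top_count - runner_up_count) / total < MIN_MARGIN:
        return None

    return top_country
```

Under these assumed thresholds, an outlet with 200 articles and mentions of `{"FR": 300, "BE": 40, "DE": 20}` would be assigned `FR`, while one whose mentions are spread almost evenly across `PL`, `CZ` and `HU` would be left out of the lookup, much like the "Eastern Europe" case described above.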

It is also important to note that country of origin is assigned based on the primary geographic focus of the outlet over the time frame monitored (in this case the last two months).  In some regions, outlets tend to emphasize specific highly populated or internationally-connected nations.  Thus, in Africa some outlets are assigned to the larger nations they emphasize – this is why is assigned a location of Nigeria.  In other cases, regional wire services with bureaus in particular countries may have specific wires focused on those countries offering hyperlocal coverage, but will be assigned by GDELT to the largest and highest-volume country the wire offers coverage of.  Regions with a higher prevalence of media partnerships and interchange agreements will see the highest density of such cross-assignments.

Overall, however, we believe this lookup offers a reasonable approximation of the geographic country of origin for the world's major online news outlets monitored by GDELT, and it can be used to perform basic kinds of media localization.  We'd love to hear your feedback on this list, including corrections and ideas for enhancements, and we can't wait to see what you all are able to do with it!
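As a starting point, here is a minimal sketch of filtering articles by source country with a lookup like this one.  The file layout (one tab-delimited row per outlet, with the outlet's domain followed by a country name), the function names and the `.example` domains are all assumptions made for illustration, so adjust them to match the file you download.

```python
import csv
import io


def load_lookup(handle):
    """Read tab-delimited 'domain<TAB>country' rows into a dict keyed by domain."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[0].lower(): row[1] for row in reader if len(row) >= 2}


def filter_by_country(articles, lookup, country):
    """Keep (domain, url) pairs whose outlet maps to the given country.

    Outlets missing from the lookup are dropped rather than guessed at.
    """
    return [a for a in articles if lookup.get(a[0].lower()) == country]


# Made-up rows standing in for the real lookup file.
sample = io.StringIO(
    "paris-daily.example\tFrance\n"
    "athens-news.example\tGreece\n"
)
lookup = load_lookup(sample)

articles = [
    ("paris-daily.example", "https://paris-daily.example/story-1"),
    ("athens-news.example", "https://athens-news.example/story-2"),
    ("unlisted-blog.example", "https://unlisted-blog.example/story-3"),
]

french = filter_by_country(articles, lookup, "France")
greek = filter_by_country(articles, lookup, "Greece")
```

Dropping unlisted outlets is a deliberate choice in this sketch: since many outlets are intentionally absent from the lookup, treating a missing entry as "unknown" is safer than assigning a default country.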