Automatic Source Location Contextualization for Geocoding Local News

As we gear up for the imminent debut of GDELT 2.0, we are tremendously excited to announce a transformative new addition to GDELT's geographic processing.  GDELT's geocoder now estimates the geographic location of each news outlet in the world by monitoring its primary locative focus.  It then uses that estimate when processing local and regional press across the globe, supplying the "shared contextual background" required to understand each article.

When moving beyond the international Western news outlets that form the basis of much of the current work on watchboarding and political risk assessment, one finds that local news outlets throughout the world, as one might expect, make assumptions about the geographic knowledge of their readers.  A small rural radio station in Guinea, for example, would likely refer only to "Kankan", while a Western news outlet would likely assume its readership is unfamiliar with the geography of Guinea and list the city as "Kankan, Guinea", sometimes going as far as to write "Kankan, Guinea, in Western Africa".  Similarly, the News Gazette in Champaign, Illinois, frequently refers to "Urbana", assuming its readership will understand that every reference to "Urbana" means the neighboring city in Illinois, not the Urbana in Ohio, which also happens to sit in a Champaign County.  In this case, the title of the newspaper, the "News Gazette", offers no hint as to whether it is based in Illinois or Ohio.  Disambiguating geographic mentions in the News Gazette requires knowing that it is physically based in Champaign, Illinois, in the United States.

Historically, the geocoding infrastructure used by GDELT has not taken into account the physical location in which source material was published, since this is frequently not known: GDELT has only a URL or the name of a radio or television station to work with.  Indeed, estimating the location of a news outlet is extremely complex.

With the debut of GDELT 2.0, GDELT now continually examines the geographic footprint of each outlet it monitors across the world, progressively refining its estimate of that outlet's physical location or location of focus.  This estimate is propagated to the geocoding infrastructure, which uses it to help disambiguate locative mentions in that outlet's coverage.
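To make the disambiguation step concrete, here is a minimal Python sketch of one way an outlet's estimated location could be used to choose among same-named places.  It is not GDELT's actual implementation: the tiny gazetteer, the approximate coordinates, the function names and the "nearest candidate wins" rule are all assumptions made purely for illustration.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical mini-gazetteer (illustration only): place name -> list of
# (label, latitude, longitude) candidates. Coordinates are approximate.
GAZETTEER = {
    "Urbana": [
        ("Urbana, Illinois, United States", 40.1106, -88.2073),
        ("Urbana, Ohio, United States", 40.1084, -83.7524),
    ],
    "Kankan": [
        ("Kankan, Guinea", 10.3854, -9.3057),
    ],
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def disambiguate(name, outlet_latlon):
    """Pick a gazetteer candidate for `name`.

    Assumed rule: with a known outlet location, the nearest candidate wins.
    Without one, only unambiguous names (a single candidate) are resolved.
    """
    candidates = GAZETTEER.get(name, [])
    if not candidates:
        return None
    if outlet_latlon is None:
        return candidates[0][0] if len(candidates) == 1 else None
    lat, lon = outlet_latlon
    best = min(candidates, key=lambda c: haversine_km(lat, lon, c[1], c[2]))
    return best[0]


# An outlet estimated to be in Champaign, Illinois (approximate coordinates).
champaign_il = (40.1164, -88.2434)
resolved = disambiguate("Urbana", champaign_il)
```

With no outlet location, `disambiguate("Urbana", None)` declines to guess, which mirrors the situation described above where the outlet's name alone gives no hint.  A production system would presumably weigh other signals as well, such as population or co-occurring place names, rather than distance alone.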

To our knowledge, this is one of the first deployments of a geocoding infrastructure that monitors global source material in realtime, estimates the ambiguity and confidence of locative references from each source, uses that information to continually refine an estimated location for that outlet, and then feeds that estimate back into the geocoding process to assist with disambiguation and contextual background knowledge.  Already we are seeing an incredible increase in geographic recovery, especially from local broadcast, print and web outlets across Africa and Latin America.  There, 30-60% or more of the coverage of certain topics and types of events (such as small local protests) makes assumptions about geographic context that prevent it from being geocoded unless the shared background context of the outlet's geographic location is incorporated.
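The feedback loop described above can be sketched in a few lines of Python.  This is only an illustrative model under stated assumptions, not GDELT's algorithm: the `OutletLocationEstimator` class, the use of coarse region labels instead of coordinates, the even split of an ambiguous mention's vote across its candidates, and the 0.6 confidence threshold are all hypothetical choices.

```python
from collections import defaultdict


class OutletLocationEstimator:
    """Running estimate of one outlet's region, built from its own mentions."""

    def __init__(self, min_confidence=0.6):
        self.min_confidence = min_confidence  # assumed threshold
        self.weights = defaultdict(float)     # region -> accumulated vote

    def observe(self, candidate_regions):
        """Record one place mention.

        An unambiguous mention (one candidate) casts a full vote; an
        ambiguous mention splits its vote evenly across its candidates.
        """
        if not candidate_regions:
            return
        share = 1.0 / len(candidate_regions)
        for region in candidate_regions:
            self.weights[region] += share

    def estimate(self):
        """Return (region, confidence), or None if confidence is too low."""
        total = sum(self.weights.values())
        if total == 0:
            return None
        region, weight = max(self.weights.items(), key=lambda kv: kv[1])
        confidence = weight / total
        if confidence < self.min_confidence:
            return None
        return region, confidence

    def resolve(self, candidate_regions):
        """Feed the estimate back: pick the candidate matching the outlet."""
        if len(candidate_regions) == 1:
            return candidate_regions[0]
        current = self.estimate()
        if current is not None and current[0] in candidate_regions:
            return current[0]
        return None  # still too uncertain to disambiguate


outlet = OutletLocationEstimator()
for mention in (["Illinois"], ["Illinois"], ["Illinois"], ["Illinois", "Ohio"]):
    outlet.observe(mention)
choice = outlet.resolve(["Illinois", "Ohio"])
```

In this toy model the three unambiguous Illinois mentions outweigh the split vote from the ambiguous one, so later ambiguous mentions from the same outlet resolve to Illinois.  Every new mention updates the weights, so the estimate keeps refining as more of the outlet's coverage is seen.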

We are incredibly excited about the vastly expanded geographic recovery that this new system brings to GDELT 2.0, and we can't wait to see what you are able to do with it when GDELT 2.0 launches shortly!