Over the past year we've been hearing from more and more of you that you need to be able to geographically cluster GDELT by Second Order Administrative Division (ADM2). GDELT today relies on the GNS and GNIS gazetteers to recognize and geocode geographic locations mentioned in the world's news each day. GNIS, which focuses exclusively on the United States, offers ADM2 (county) information for each feature in the gazetteer. GNS, however, does not provide ADM2 information, meaning that up to this point GDELT has only been able to offer ADM1 information for geographic matches outside the United States. (Learn more about GNS and GNIS and their features and coverage).
Few other geographic gazetteers provide ADM2 assignments, and those that do tend to provide them for only a limited subset of features in a small number of countries. One popular gazetteer that aggregates and enriches GNS provides ADM2 assignments for fewer than 40 countries at only around 50% feature coverage, with 75% or greater coverage for just 10 countries. Thus, one of the historic challenges in providing ADM2 assignments in GDELT's geographic fields has been the lack of gazetteers offering high-coverage global ADM2 information. Users needing to cross-walk GDELT geographic locations into ADM2 codes have been forced to search for ADM2 Shapefiles for their countries of interest and perform computationally expensive point-in-polygon analysis to compute ADM2 assignments themselves.
Fortunately, it turns out that the Food and Agriculture Organization of the United Nations (FAO) has created an incredible dataset called the "Global Administrative Unit Layers" (GAUL), which "compiles and disseminates the best available information on administrative units for all the countries in the world." The GAUL dataset includes a massive Shapefile layer providing polygonal boundaries for every ADM2 in the world in a single normalized and consistent format.
Adding ADM2 information to GNS required compiling all 5+ million unique geographic coordinates from GNS and performing a massive, fully automated point-in-polygon merge with GAUL to compute their individual ADM2 memberships. The lookup files at the bottom of this page contain the final output of this process. Note that the final results were not manually corrected in any way and that there are a number of possible sources of error in this process. GNS records each feature as a single point indicating its geographic centroid. In some cases a large, irregularly shaped feature may have a centroid that is offset far enough to be counted by this process as belonging to a neighboring ADM2. Numeric precision errors may cause features with centroids right on the boundary between two ADM2s to be incorrectly assigned. A feature that spans multiple ADM2s is assigned only to the ADM2 containing its GNS-provided geographic centroid. Thus, it is possible for a certain degree of error to be present in the final crosswalk and users should take this into consideration. A manual review of 1,000 randomly selected points yielded ADM2 assignments matching their official government assignments in those countries, so it is believed that the majority of the results of this crosswalk should be accurate. Users should refer to the GAUL documentation for its handling of disputed boundaries.
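To make the point-in-polygon step concrete, here is a minimal sketch of a ray-casting test in pure Python. It is an illustration only and not necessarily the method used to build the crosswalk: the two square "ADM2" units, their codes, and the function names are all made up, coordinates are treated as planar, and holes and multi-part polygons are ignored. A production merge over the GAUL Shapefile would more likely use a GIS library with a spatial index.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar


def point_in_polygon(point: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: flip `inside` each time a rightward ray crosses an edge."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_adm2(point: Point, units: Dict[int, Sequence[Point]]) -> Optional[int]:
    """Return the code of the first unit containing the point, else None."""
    for code, ring in units.items():
        if point_in_polygon(point, ring):
            return code
    return None


# Hypothetical data: two adjacent unit squares with invented ADM2 codes.
UNITS = {
    1001: [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    1002: [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

if __name__ == "__main__":
    print(assign_adm2((0.5, 0.5), UNITS))  # centroid well inside unit 1001
    print(assign_adm2((1.0, 0.5), UNITS))  # centroid exactly on the shared edge
```

In this sketch a centroid lying exactly on the shared edge is counted in only one of the two units, purely as a result of how the comparison is written. That is the kind of boundary sensitivity described above.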
GAUL assigns a unique numeric identifier to each ADM2 and provides the division's romanized name. Entries for which an ADM2 assignment is not available will still be issued a numeric code by GAUL, but the textual name will be “Administrative unit not available”; this may also occur in the ADM1 field. Other textual values for the ADM2 field can include “Name Unknown” and “Area without administration at 2nd level.” To assist in cross-walking with other datasets that use the GAUL codes, the crosswalk file provided below also contains the numeric GAUL Country Code and First Order Administrative Division code for each feature, along with their romanized names as provided by GAUL. The GAUL Shapefile makes it easy to construct ADM2-level choropleth maps from GDELT simply by merging on the numeric GAUL ADM2 code. To download a copy of the GAUL Shapefile, please see the contact information available on the GAUL website.
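As a rough sketch of how the crosswalk might feed a choropleth, the snippet below tallies mentions per GAUL ADM2 code and skips entries whose ADM2 name is one of the placeholder values listed above. The in-memory crosswalk layout, the feature IDs, the codes, and the function name are all hypothetical; the actual column layout of the download is not described here, and the exact placeholder strings should be checked against the file itself.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Placeholder ADM2 names mentioned in the text; verify the exact spelling
# and punctuation against the crosswalk file before relying on them.
PLACEHOLDER_NAMES = {
    "Administrative unit not available",
    "Name Unknown",
    "Area without administration at 2nd level",
}


def counts_by_adm2(
    feature_ids: Iterable[str],
    crosswalk: Dict[str, Tuple[int, str]],
) -> Counter:
    """Count mentions per ADM2 code, ignoring unknown or placeholder units."""
    counts: Counter = Counter()
    for feature_id in feature_ids:
        entry = crosswalk.get(feature_id)
        if entry is None:
            continue  # feature not present in the crosswalk
        adm2_code, adm2_name = entry
        if adm2_name in PLACEHOLDER_NAMES:
            continue  # GAUL issued a code, but no real ADM2 exists
        counts[adm2_code] += 1
    return counts


# Hypothetical crosswalk: feature ID -> (GAUL ADM2 code, ADM2 name).
CROSSWALK = {
    "f1": (5001, "Example District A"),
    "f2": (5001, "Example District A"),
    "f3": (5002, "Administrative unit not available"),
}

if __name__ == "__main__":
    mentions = ["f1", "f2", "f3", "f1", "missing"]
    print(counts_by_adm2(mentions, CROSSWALK))
```

The resulting per-code counts can then be joined to the GAUL Shapefile's attribute table on the numeric ADM2 code. Whether placeholder entries should be dropped or mapped separately is a choice that depends on the analysis.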
While GAUL also covers the United States, GDELT uses GNIS as its gazetteer for US features, so we chose to use the GNIS ADM2 codes for US features to ensure consistency with other GNIS-based datasets. Thus, the ADM2 codes in the GNIS crosswalk below are the 3-digit numeric county codes found in GNIS (INCITS 31:200x standard), indicating the county in which each feature resides, prefaced by the two-character state abbreviation and accompanied by the county's formal name as provided by GNIS. Users combining the GAUL-derived GNS ADM2 crosswalk with the GNIS ADM2 crosswalk should take into consideration the divergent namespaces of the ADM2 codes (GAUL-assigned numeric codes for GNS and 3-digit numeric INCITS 31:200x codes for GNIS) and prepend the country code to the ADM2 code when merging the two datasets within a geodatabase.
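One way to keep the two namespaces apart when merging is to build a combined key from the country code and the ADM2 code. The helper below is only a sketch: the function name, the `:` separator, and the sample values are assumptions for illustration, and any country-code scheme works as long as it is applied consistently to both crosswalks.

```python
def qualified_adm2(country_code: str, adm2_code) -> str:
    """Build a namespace-safe key by prepending the country code."""
    return f"{country_code.strip().upper()}:{str(adm2_code).strip()}"


if __name__ == "__main__":
    # Illustrative values only: a GNIS-style code (state abbreviation plus
    # 3-digit county code) and an invented GAUL-style numeric code.
    print(qualified_adm2("US", "CA037"))
    print(qualified_adm2("fr", 12345))
```

Using an explicit separator keeps the key unambiguous even if a numeric GAUL code happens to begin with digits that resemble another code's prefix.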
- GNS ADM2 Crosswalk Compiled from GAUL. (139.25MB) Source of Administrative boundaries: The Global Administrative Unit Layers (GAUL) dataset, implemented by FAO within the CountrySTAT and Agricultural Market Information System (AMIS) projects. Available online: http://www.fao.org/geonetwork/srv/en/metadata.show?id=12691
- GNIS ADM2 Crosswalk Compiled from GNIS. (29.19MB) Source of Administrative boundaries: The Geographic Names Information System (GNIS) dataset. Available online: http://geonames.usgs.gov/domestic/index.html