Since the debut of the GDELT GEO 2.0 API this past April, we've received a lot of interest in more precisely and accurately mapping global news imagery processed by the GDELT Visual Global Knowledge Graph (VGKG) via image georeferencing.
The first version of the GEO 2.0 API georeferenced images by simply associating each image in an article with all of the geographic locations mentioned in that article's text. While extremely primitive and prone to a fairly high false positive rate, this at least captured macro-level geographic visual trends, such as a surge in flood-related imagery in articles about a particular area. However, the mismatch between the parallel visual and textual geographies of news coverage meant the approach was fairly limited, and its high false positive rate made it less useful for realtime visual ground truthing.
Thus, we are tremendously excited to announce that over the next 24 hours we are rolling out a completely new generation of image georeferencing infrastructure for the GEO 2.0 API. The new system no longer relies on the textual geography of the article and instead makes use of several fields provided by Google's Cloud Vision API via the VGKG.
Any image for which the Cloud Vision API is able to recognize a precise geographic location is now pinpointed on the map, and we automatically compute its country and first-order administrative division (ADM1) membership. Note that we use a fast approximate spatial algorithm to compute country and ADM1 membership, so locations right on the border of two countries or ADM1s may occasionally be misassigned.
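To make that tradeoff concrete, here is a minimal Python sketch of one way such a fast approximate assignment can work: a cheap bounding-box prefilter followed by a standard ray-casting point-in-polygon test against heavily simplified borders. The toy polygon table and the assign_region helper are hypothetical illustrations, not our actual code, and the simplified borders are precisely why points near a boundary can occasionally be misassigned.

```python
# Hypothetical sketch of a fast approximate country/ADM1 assignment.
# The simplified polygon below is an illustrative stand-in, not a real border.

SIMPLIFIED_REGIONS = {
    # name: (bounding box (min_lon, min_lat, max_lon, max_lat),
    #        polygon as [(lon, lat), ...])
    "Texas, United States": (
        (-106.6, 25.8, -93.5, 36.5),
        [(-106.6, 31.8), (-103.0, 36.5), (-94.0, 36.5),
         (-93.5, 29.7), (-97.1, 25.8), (-106.6, 25.8)],
    ),
}

def point_in_polygon(lon, lat, polygon):
    """Standard ray-casting test: count polygon edges crossed by a rightward ray."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def assign_region(lon, lat):
    """Cheap bounding-box prefilter, then a polygon test on simplified borders."""
    for name, (bbox, polygon) in SIMPLIFIED_REGIONS.items():
        min_lon, min_lat, max_lon, max_lat = bbox
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue  # the bbox rejects most candidates without a polygon test
        if point_in_polygon(lon, lat, polygon):
            return name
    return None  # near borders, simplified polygons can miss or misassign

print(assign_region(-95.37, 29.76))  # Houston -> "Texas, United States"
```

In production the toy table would of course be replaced by a spatial index over simplified administrative boundaries, but the speed-versus-border-accuracy tradeoff is the same.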
The Cloud Vision API is currently able to precisely visually geocode only a fairly small percentage of global news images each day, so this by itself is insufficient. We therefore also turn to the Web Detection entities returned for each image by the Cloud Vision API, which is where the API performs the equivalent of a reverse Google Images search and identifies every location on the open web where Google Images has seen the image before. The API then looks across all of those instances to examine the textual captions used to describe the image across those pages, in all of the languages supported by GDELT, and compiles a final list of the topics most frequently used to caption the image. Historically we only made those topics available for searching via the "imagewebtags" field, but with our new image georeferencing infrastructure we now cross-reference all of the topics assigned to each image against the Google Knowledge Graph to determine their centroid geographic location and then enrich them with country and ADM1 membership, just as we do for exact location matches.

This means that if a photograph appears in a Chinese-language news outlet captioned "Flooding in Houston" in Chinese, the Cloud Vision API assigns the Web Entity tags "flooding" and "Houston" to the image. We then cross-reference those tags against the Google Knowledge Graph to determine that "Houston" is a city-level geographic reference, look up its centroid latitude/longitude, estimate its country to be the United States and its first-order administrative division to be Texas, and add the now-georeferenced image to our search store. Because the Cloud Vision API looks across all of the captions of all of the instances of a given image it has found on the web, it can mitigate the impact of individual wrong or information-poor captions by focusing on the topics most commonly associated with the image across all of those occurrences. The Web Entities are also based on advanced neural natural language processing of the caption text, meaning they reflect deep learning-quality topical tagging rather than simple keyword matching.
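To make the caption-driven path concrete, here is a minimal Python sketch of the cross-referencing step described above. The tiny KNOWLEDGE_GRAPH table, the invented entity scores, and the georeference_image helper are all hypothetical stand-ins for the Google Knowledge Graph lookup and our internal pipeline; the point is only the shape of the logic: keep non-geographic topics as tags, resolve geographic topics to a centroid, and prefer the most specific geographic match.

```python
# Hypothetical sketch: resolve an image's Web Entities to a geographic centroid.
# KNOWLEDGE_GRAPH stands in for the real Google Knowledge Graph lookup.

KNOWLEDGE_GRAPH = {
    # entity -> (type, (lat, lon), country, ADM1) for geographic entries
    "Houston":  ("City",    (29.76, -95.37), "United States", "Texas"),
    "Texas":    ("ADM1",    (31.0, -99.0),   "United States", "Texas"),
    "Flooding": ("Concept", None, None, None),
}

# Web Entities as the Cloud Vision API might return them for one image,
# aggregated across every caption of that image it has seen on the web.
# These scores are invented for illustration.
web_entities = [("Flooding", 0.92), ("Houston", 0.87), ("Texas", 0.41)]

# Prefer the most specific geographic entity (City over ADM1 over Country).
SPECIFICITY = {"City": 3, "ADM1": 2, "Country": 1}

def georeference_image(entities):
    best = None
    for name, score in entities:
        record = KNOWLEDGE_GRAPH.get(name)
        if record is None or record[0] not in SPECIFICITY:
            continue  # non-geographic topics like "Flooding" stay as tags only
        rank = (SPECIFICITY[record[0]], score)
        if best is None or rank > best[0]:
            best = (rank, name, record)
    if best is None:
        return None  # no geographic topic found; image stays unmapped
    _, name, (etype, centroid, country, adm1) = best
    return {"match": name, "type": etype, "latlon": centroid,
            "country": country, "adm1": adm1}

print(georeference_image(web_entities))
# -> {'match': 'Houston', 'type': 'City', 'latlon': (29.76, -95.37),
#     'country': 'United States', 'adm1': 'Texas'}
```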
Of course, this approach is not perfect. An image might be captioned with locations not directly depicted in the image (for example, an image of FEMA headquarters in DC captioned "FEMA is working in Houston to assist those affected by Hurricane Harvey" mistakenly resulting in the headquarters building being mapped to Houston). Captions may also simply be wrong, and the Cloud Vision API may make mistakes here and there when processing images and textual captions spanning the daily events of the entire globe. Thus, you will still find errors when mapping images in the GEO 2.0 API, but the overall accuracy should now be very good and allow you to do things like create realtime maps of flooding, fires, and other natural disasters and see precisely which locations are being featured the most in flood-related imagery.
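For example, a realtime flood map might start from a single GEO 2.0 API call like the sketch below. This is a hypothetical illustration: the imagewebtags operator and the format=GeoJSON parameter follow our reading of the GEO 2.0 API conventions, and the property names on each returned feature are illustrative, so consult the API documentation for the authoritative syntax.

```python
# Hypothetical example: pull georeferenced flood imagery as GeoJSON and list
# the first few points. Operator and parameter names follow our reading of
# the GEO 2.0 API docs; treat them as illustrative, not authoritative.
import json
import urllib.request

URL = ("https://api.gdeltproject.org/api/v2/geo/geo"
       "?query=imagewebtags:%22flooding%22&format=GeoJSON")

with urllib.request.urlopen(URL) as response:
    collection = json.load(response)

# Each GeoJSON feature should carry a [lon, lat] point for a matching image.
for feature in collection.get("features", [])[:5]:
    print(feature["geometry"]["coordinates"], feature.get("properties", {}))
```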
We're tremendously excited to see what you're able to do with these new capabilities! There is no need to make any changes to your existing applications – your existing searches should simply and transparently become vastly more accurate over the coming 24 hours as the new infrastructure comes fully online!