We are tremendously excited to announce today that over the past 24 hours we have completed the integration of Google Cloud Vision API's new Web Detect capability into the GDELT Visual Knowledge Graph (VGKG)! Web Detect is a truly game-changing new capability for the VGKG: combined with our file metadata extraction, it lets us estimate whether a news image is truly novel or whether it has appeared heavily in the past (such as a stock photo). This is immensely important for disaster response (knowing whether images purporting to show flooding in Thailand are from today's flooding or from a past flood), for understanding conflict (knowing whether an image of the aftermath of a bombing run is really from this afternoon or from a past conflict in another country), and so on.
The Web Detect capability of Google's Cloud Vision API performs what amounts to a reverse image search over Google Images. It returns a list of other pages on the open web where it has found the image, as well as partial matches, where only a portion of the image appears or where the image appears in slightly modified form. This allows you to instantly triage whether a given news image is novel (that is, Google has not previously seen it on the open web) or whether it is a heavily used and popular stock image.
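To make the triage idea concrete, here is a minimal sketch in Python that classifies one parsed `webDetection` object. The field names `pagesWithMatchingImages`, `fullMatchingImages` and `partialMatchingImages` come from the public Cloud Vision API response format; the function name and the `HEAVY_REUSE_MIN_PAGES` cut-off are hypothetical choices for illustration, not GDELT's actual method. Because the API caps how many matches it returns per request, the page count should be read as a lower bound rather than a true tally.

```python
# Illustrative threshold only: this cut-off is an assumption for the sketch,
# not a value used by GDELT or documented by Google.
HEAVY_REUSE_MIN_PAGES = 10


def triage_web_detection(web_detection: dict) -> str:
    """Roughly classify one parsed webDetection object by how widely the image appears."""
    pages = web_detection.get("pagesWithMatchingImages", [])
    full = web_detection.get("fullMatchingImages", [])
    partial = web_detection.get("partialMatchingImages", [])

    # No hosting pages, exact copies, or cropped/modified copies found: likely novel.
    if not (pages or full or partial):
        return "novel"
    # Many hosting pages suggests a widely circulated (e.g. stock) image.
    if len(pages) >= HEAVY_REUSE_MIN_PAGES:
        return "heavily reused"
    return "previously seen"


example = {"fullMatchingImages": [{"url": "https://example.com/a.jpg"}]}
print(triage_web_detection(example))  # previously seen
```

In practice you would tune the threshold against images whose provenance you already know, and combine it with the file metadata mentioned above.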
Web Detect also performs topical analysis of the text surrounding the image on every page where it has found it, and uses this text to assign contextual topical labels to the image. These tags are very different from the labels assigned by the Vision API's machine vision algorithms. Those labels describe what its deep learning models found when they visually analyzed the image content, whereas the Web Detect tags refer to topics that appear frequently in the text immediately surrounding the image. The Web Detect tags therefore capture the context in which the image has been used on the web, offering deeper insight into what it has been used to illustrate and into its broader information environment.
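The difference between the two kinds of tags can be seen by putting them side by side. The sketch below, assuming a single parsed Vision API image response containing both `labelAnnotations` and `webDetection.webEntities` (field names from the public API response format), splits the descriptions into tags seen only visually, only in the surrounding web text, or in both. The function name and `min_label_score` parameter are hypothetical. Google's reference documentation describes web entity scores as unnormalized, so the sketch applies a score threshold only to the visual labels.

```python
def split_tags(response: dict, min_label_score: float = 0.5) -> dict:
    """Compare visual labels with Web Detect entities for one Vision API image response."""
    # Visual labels come from the machine vision models; their scores lie in [0, 1].
    visual = {
        label["description"].lower()
        for label in response.get("labelAnnotations", [])
        if "description" in label and label.get("score", 0.0) >= min_label_score
    }
    # Web entities come from text around the image on matching pages. Their scores
    # are not normalized, so no threshold is applied; entities without a
    # description (ID only) are skipped.
    web = {
        entity["description"].lower()
        for entity in response.get("webDetection", {}).get("webEntities", [])
        if "description" in entity
    }
    return {
        "visual_only": visual - web,
        "web_only": web - visual,
        "both": visual & web,
    }
```

A tag such as a place name will typically show up only on the web side, since it comes from the captions and articles around the image rather than from its pixels.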
We have also added the Cloud Vision API's Crop Hints feature, which visually analyzes the image to identify its most visually important region and returns a bounding box around that region. This can be used to better understand an image's framing and where its visual emphasis lies.
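As a rough illustration of working with the Crop Hints output, the following sketch picks the highest-confidence hint from one parsed Vision API image response and turns its polygon into an axis-aligned box. The keys `cropHintsAnnotation`, `cropHints`, `boundingPoly`, `vertices`, `confidence` and `importanceFraction` follow the public API response format; the function name and the shape of the returned dictionary are hypothetical. One practical detail: the API's JSON output omits coordinates whose value is 0, so missing `x` or `y` values are treated as 0.

```python
from typing import Optional


def best_crop_hint(response: dict) -> Optional[dict]:
    """Return the highest-confidence crop hint as an axis-aligned box, or None."""
    hints = response.get("cropHintsAnnotation", {}).get("cropHints", [])
    if not hints:
        return None
    best = max(hints, key=lambda hint: hint.get("confidence", 0.0))
    vertices = best.get("boundingPoly", {}).get("vertices", [])
    if not vertices:
        return None
    # JSON output omits coordinates equal to 0, so missing x/y default to 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return {
        "left": min(xs),
        "top": min(ys),
        "right": max(xs),
        "bottom": max(ys),
        "importance_fraction": best.get("importanceFraction"),
    }
```

Comparing this box with the full image dimensions gives a simple measure of how tightly the subject is framed.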
We are tremendously excited to see what you are able to do with this new information!
To explore the new Web Detection data, you can use the following BigQuery query, written in BigQuery's legacy SQL dialect, to extract the most recent 10,000 images and their associated Web Detection results from the past hour. It uses Table Decorators, which legacy SQL supports, to reduce the amount of data the query has to touch:
SELECT ImageURL, JSON_EXTRACT(RawJSON, "$.responses.webDetection"), DocumentIdentifier
FROM [gdelt-bq:gdeltv2.cloudvision@-3600000-]
WHERE RawJSON LIKE '%webDetection%'
ORDER BY DATE DESC
LIMIT 10000
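Once the results are exported, a short script can surface the most widely reused images. This is a minimal sketch assuming each row is an (ImageURL, JSON text from the JSON_EXTRACT column, DocumentIdentifier) tuple; the `rank_by_reuse` function and `Row` type are hypothetical. Since we have not confirmed exactly how the JSON path resolves against the stored RawJSON, the sketch also accepts an object wrapped in a single-element list and skips rows with no usable value.

```python
import json
from typing import Iterable, List, Optional, Tuple

# (ImageURL, JSON text from the JSON_EXTRACT column, DocumentIdentifier)
Row = Tuple[str, Optional[str], str]


def rank_by_reuse(rows: Iterable[Row]) -> List[Tuple[str, int, str]]:
    """Sort images by how many web pages Web Detect found them on, most reused first."""
    ranked = []
    for image_url, raw_json, doc_id in rows:
        if not raw_json:
            continue  # no Web Detection result extracted for this row
        detection = json.loads(raw_json)
        # Depending on how the JSON path resolves, the object may arrive wrapped in a list.
        if isinstance(detection, list):
            detection = detection[0] if detection else None
        if not isinstance(detection, dict):
            continue
        page_count = len(detection.get("pagesWithMatchingImages", []))
        ranked.append((image_url, page_count, doc_id))
    # Most-reused first; Python's sort is stable, so ties keep their input order.
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked
```

Images at the top of this list are candidates for stock or recycled photos, while those with zero matching pages are candidates for genuinely new imagery.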