We’re enormously excited to announce a transformative moment for GDELT: the alpha release of a highly experimental, but extraordinarily powerful, new data stream that uses the Google Cloud Vision API to categorize all of the news imagery GDELT monitors from around the world. The new GDELT Visual Global Knowledge Graph (VGKG) Version 1.0 Alpha, powered by the Google Cloud Vision API, extends GDELT’s ability to understand global news media, allowing it for the first time to make sense of the vast stream of visual narrative that accompanies the world’s news.
Each day GDELT monitors between half a million and a million images from across the planet, capturing nearly every event and topic imaginable from almost every corner of the earth. These images offer a vivid and rich visual tapestry of global events and daily life, reaching far beyond what textual narrative alone can offer. Today, powered by the Google Cloud Vision API, GDELT is able to apply some of the most sophisticated deep learning neural network algorithms in the world to make sense of the imagery of the world’s news much as a human would.
Each image undergoes a range of highly sophisticated analyses that catalog its topical focus (tagging the kinds of objects, activities, and backgrounds it contains), mine its emotional content (whether the people in the image appear happy, sad, or angry), recognize street signs, labels, and other text, identify famous locations, and even flag violent imagery. We think the topical algorithms in particular will transform how we understand the imagery of the world’s news media, reaching deeply into how world events are portrayed and understood. We are especially excited about the potential of such algorithms to rapidly triage the earliest imagery emerging from disaster zones or conflict areas, helping human first responders assess rapidly changing environments.
CAVEATS & DISCLAIMER: ALPHA RELEASE
Please note that, unlike GDELT’s other text-based feeds, this feed is extremely experimental and is being released as an alpha. This means that the behavior and supported features of the feed may change at any moment, or the feed may go away entirely. If you ingest these feeds into an automated workflow, your scripts should perform extensive error checking to ensure they can cope with any changes to the underlying data format (see the sketch below). Please check back here on the GDELT Blog regularly for any updates or documentation changes to the format and its behavior.
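As one illustration of the kind of defensive error checking described above, the sketch below reads a gzip-compressed, tab-delimited VGKG file and skips any row whose column count does not match what the script expects. The file name, the expected column count, and the helper names here are purely hypothetical placeholders; consult the VGKG documentation for the actual file layout before relying on any of them.

```python
import csv
import gzip
import sys

# Hypothetical values -- check the VGKG documentation for the real
# file naming convention and column layout before relying on these.
EXPECTED_COLUMNS = 8   # assumption, NOT the documented field count
VGKG_FILE = "example.cloudvision.csv.gz"  # placeholder file name

def read_vgkg_rows(path, expected_columns=EXPECTED_COLUMNS):
    """Yield rows from a VGKG CSV file, skipping malformed records.

    Because the VGKG is an alpha release, the field layout may change
    at any time, so every row is validated before it is used.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_number, row in enumerate(reader, start=1):
            if len(row) != expected_columns:
                # Log and skip rather than crash the whole ingest job.
                print(f"Skipping line {line_number}: expected "
                      f"{expected_columns} fields, got {len(row)}",
                      file=sys.stderr)
                continue
            yield row

if __name__ == "__main__":
    for row in read_vgkg_rows(VGKG_FILE):
        pass  # replace with real processing
```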
The use of deep learning algorithms for image recognition is still a highly experimental area of active research and the application use case presented by GDELT (attempting to recognize and catalog arbitrary images from almost every corner of the planet on almost every topic imaginable at realtime speed) represents one of the most difficult and wide-ranging applications of such technology today.
What this means is that you will almost certainly encounter a certain level of error in the categorizations and other information computed about each image. Remember that all tags are applied 100% automatically with NO HUMAN INTERVENTION and mistaken categorizations or other tags represent computer algorithm errors, NOT editorial statements. Remember that computer image recognition at these scales is still in its relative infancy and the underlying algorithms are encountering large amounts of imagery utterly unlike anything they’ve ever seen before, so this data stream is really pushing the boundaries of current deep learning recognition and will make mistakes. Please email cloudvision-feedback@google.com with the image URL and mistaken metadata fields if you find any particularly significant errors so that the system can be constantly improved and refined.
ACCESSING THE DATA
There are two primary mechanisms for accessing the VGKG data stream, Google BigQuery and raw CSV files, both of which are updated every 15 minutes:
- Google BigQuery. Similar to the main GKG table, we also populate a publicly accessible table housed in Google BigQuery, making it possible to interactively query and analyze the computed metadata and join it against the main GKG and EVENT tables. The table is gdelt-bq:gdeltv2.cloudvision (see the query sketch after this list).
- Raw CSV Files. Similar to the GKG files, you can access the VGKG data stream via simple CSV files updated every 15 minutes. These files are tab-delimited and gzip compressed. The latest CSV files are released somewhere between 0-5 minutes, 15-20 minutes, 30-35 minutes, and 45-50 minutes after the hour (about 5 minutes after the GKG file is released for each 15 minute increment). To determine when the latest VGKG files are available, check the contents of http://data.gdeltproject.org/gdeltv2_cloudvision/lastupdate.txt, which is updated when the latest update files become available. You can poll this file every 5 minutes to ensure you always download the latest files (see the polling sketch below).
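For the BigQuery route, the minimal sketch below simply counts the rows currently in the table using the google-cloud-bigquery Python client. Note that the legacy identifier gdelt-bq:gdeltv2.cloudvision is written as gdelt-bq.gdeltv2.cloudvision in standard SQL; any column names you query beyond this simple count should come from the VGKG documentation, not from this sketch.

```python
from google.cloud import bigquery

# Requires Google Cloud credentials configured in your environment
# (e.g. GOOGLE_APPLICATION_CREDENTIALS) and billing enabled on your project.
client = bigquery.Client()

# The legacy identifier "gdelt-bq:gdeltv2.cloudvision" becomes
# "gdelt-bq.gdeltv2.cloudvision" when referenced from standard SQL.
query = """
    SELECT COUNT(*) AS total_images
    FROM `gdelt-bq.gdeltv2.cloudvision`
"""

for row in client.query(query).result():
    print(f"Rows currently in the VGKG table: {row.total_images}")
```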
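For the raw CSV route, the sketch below shows one possible polling loop: it fetches lastupdate.txt every 5 minutes, extracts any URLs it contains, and downloads files it has not seen before. The assumption that each file URL appears as a whitespace-separated token in lastupdate.txt, along with the local directory name, is ours; see the VGKG documentation for the authoritative file format.

```python
import os
import time
import urllib.request

LASTUPDATE_URL = "http://data.gdeltproject.org/gdeltv2_cloudvision/lastupdate.txt"
DOWNLOAD_DIR = "vgkg_downloads"   # local directory name, our choice
POLL_SECONDS = 300                # poll every 5 minutes

def fetch_lastupdate():
    """Return the current contents of lastupdate.txt as text."""
    with urllib.request.urlopen(LASTUPDATE_URL, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")

def extract_urls(text):
    """Pull out any http(s) URLs listed in lastupdate.txt.

    We assume each file URL appears as a whitespace-separated token;
    check the VGKG documentation for the exact line format.
    """
    return [token for token in text.split() if token.startswith("http")]

def download(url, directory=DOWNLOAD_DIR):
    """Download a file into the local directory if not already present."""
    os.makedirs(directory, exist_ok=True)
    destination = os.path.join(directory, os.path.basename(url))
    if not os.path.exists(destination):
        urllib.request.urlretrieve(url, destination)
        print(f"Downloaded {destination}")

if __name__ == "__main__":
    seen = set()
    while True:
        try:
            for url in extract_urls(fetch_lastupdate()):
                if url not in seen:
                    download(url)
                    seen.add(url)
        except Exception as error:  # alpha feed: tolerate transient failures
            print(f"Poll failed, will retry: {error}")
        time.sleep(POLL_SECONDS)
```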
READ THE VGKG DOCUMENTATION