The GDELT Project

VGKG 2.0 Metadata Stats At 2 Weeks

In the two weeks since the new Visual Knowledge Graph 2.0 (VGKG 2.0) debuted, it has been incredibly exciting to watch the statistics of what it is able to compute about the world's images. In particular, extracting all of the embedded metadata fields hidden in each image file is offering a fascinating enhanced look at the visual narratives of the world's news.

Over a 48-hour period last week (October 13-14, 2016), more than 5,000 unique metadata fields were found in the 1.35 million unique images processed by the VGKG. (Note that the top 17 fields, with 90-100% penetration, are automatically generated by the underlying Image::ExifTool library; all other fields are extracted from the image metadata itself.) Certain other fields with very high penetration come from widely used web libraries, such as PHP's imagejpeg adding a "Comment" field containing "CREATOR: gd-jpeg" (found in 333,000 of the images, or around 25% of the two-day total).
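As a rough illustration of the "penetration" figures above, the sketch below computes the share of images carrying a field and flags the library-generated gd-jpeg comment. The function names are hypothetical, and it assumes each image's metadata has already been loaded as a flat dictionary of field name to value, which may not match the exact VGKG layout.

```python
def penetration(images_with_field, total_images):
    """Percentage of images that carry a given metadata field."""
    if total_images <= 0:
        raise ValueError("total_images must be positive")
    return 100.0 * images_with_field / total_images


def is_gd_jpeg(metadata):
    """True if the Comment field contains the marker written by PHP's imagejpeg.

    Assumption: `metadata` is a flat dict of field name -> value.
    """
    comment = metadata.get("Comment", "")
    return isinstance(comment, str) and "CREATOR: gd-jpeg" in comment


# Figures quoted in the post: 333,000 of 1.35 million images.
gd_share = penetration(333_000, 1_350_000)
print(f"{gd_share:.1f}%")  # prints 24.7%
```

Filtering on a marker like this is one way to separate library-generated fields from the human-authored metadata discussed next.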

In all, around 10% of news images found online have some form of organic embedded metadata, with 4.6% having a "Caption" field, 3% having a "Location" field and 2.2% having a "Keywords" field. While this is a relatively small portion of all news images, it still amounts to thousands of images per day with rich human-generated metadata. There are also countless rarer but fascinating fields like "CameraTemperature", which records the physical temperature of the camera at the moment it took the photograph. Such a field could offer context on just how hot or cold it was when the image was taken and act as a sort of climatic sensor network over time (albeit at very small scale).

The complete histogram of all 5,092 distinct metadata fields extracted from the 1.35 million images processed by GDELT on October 13th and 14th, 2016 is available as a tab-delimited file to allow you to identify fields of interest for your own analyses! We've also extracted just the metadata JSON for each image and made it available in a tab-delimited file below, with one image per row in the format IMAGEURL, ARTICLEURL, JSON. This allows you to see the kinds of contents each field holds and offers a glimpse of the metadata hidden in the world's news imagery circa 2016.
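For readers who want to rebuild a field histogram from the per-image file, here is a minimal sketch. It assumes each row is IMAGEURL, ARTICLEURL and JSON separated by tabs, and that the JSON column is a single object whose top-level keys are the metadata field names. The sample rows and the function name `field_histogram` are made up for illustration and are not taken from the actual dataset.

```python
import json
from collections import Counter


def field_histogram(lines):
    """Count how many images carry each top-level metadata field.

    Returns (counts, total), where total is the number of rows parsed.
    Rows that are malformed or whose JSON is not an object are skipped.
    """
    counts = Counter()
    total = 0
    for line in lines:
        # Split on the first two tabs only, in case the JSON itself contains tabs.
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue
        _image_url, _article_url, raw_json = parts
        try:
            metadata = json.loads(raw_json)
        except json.JSONDecodeError:
            continue
        if not isinstance(metadata, dict):
            continue
        total += 1
        counts.update(metadata.keys())
    return counts, total


# Hypothetical sample rows standing in for the real file.
sample = [
    'http://example.com/a.jpg\thttp://example.com/1\t{"ImageWidth": 640, "Comment": "CREATOR: gd-jpeg"}\n',
    'http://example.com/b.jpg\thttp://example.com/2\t{"ImageWidth": 800, "Caption": "A street scene"}\n',
    'http://example.com/c.jpg\thttp://example.com/3\t{"ImageWidth": 1024}\n',
    'this row is malformed and gets skipped\n',
]

counts, total = field_histogram(sample)
for field, n in counts.most_common():
    print(f"{field}\t{n}\t{100.0 * n / total:.1f}%")
```

To run it against a downloaded copy, pass an open file handle instead of `sample`, for example `field_histogram(open(path, encoding="utf-8"))`, where `path` is wherever you saved the file.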