We're excited to announce that with the release of the new version of the Visual Global Knowledge Graph (VGKG) in the next few weeks, we will be adding three perceptual hashing algorithms, enabling image similarity search using the VGKG for the first time! Each of the three algorithms generates a 64-bit hash using a different set of features, resulting in three very different characterizations of the image's visual content. A Hamming distance search can be applied to each of the dimensions (or combinations thereof) to perform visual similarity scans.
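Since all three hashes are fixed-width 64-bit values, the similarity of two images along a dimension reduces to counting the bits at which their hashes differ. The following is a minimal Perl sketch of such a Hamming distance comparison; the two hash values are invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hamming distance between two hashes supplied as hex strings:
# XOR the raw bytes, then count the set bits.
sub hamming_distance {
    my ($hex1, $hex2) = @_;
    my $xor = pack('H*', $hex1) ^ pack('H*', $hex2);
    return unpack('%32b*', $xor);    # '%32b*' sums the 1-bits
}

# Two hypothetical 64-bit hashes (16 hex characters each).
my $hash1 = 'ffd8b0a0c0e0f0f8';
my $hash2 = 'ffd8b0a1c0e0f0f0';

# A small distance suggests visual similarity along that dimension.
printf "Hamming distance: %d bits\n", hamming_distance($hash1, $hash2);
```

In practice, a cutoff of just a few bits tends to find near-duplicates, while larger cutoffs find looser visual similarity.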
The three algorithms are described in technical detail here and here. We use the implementation in Perl's Image::Hash library, with GD as the underlying image library (a minimal usage sketch follows the list below). In brief, the three are:
- aHash (Average Hash). Collapses the image to an 8×8 grayscale image and sets each bit of the hash based on whether the corresponding pixel is brighter than the mean.
- pHash (Perceptual Hash). Similar to aHash, but applies a discrete cosine transform (DCT) and compares in the frequency domain rather than on raw pixel values.
- dHash (Difference Hash). Similar to aHash, but records the gradient between adjacent pixels instead of comparing each pixel to the mean.
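To make this concrete, here is a minimal sketch of computing all three hashes for a single image with Image::Hash and the GD backend, following the module's documented synopsis; the filename is just an example placeholder:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Image::Hash;

# Read the raw image bytes (the filename is an example placeholder).
open(my $fh, '<:raw', 'example.jpg') or die "Cannot open image: $!";
my $image = do { local $/; <$fh> };
close($fh);

# Construct the hasher, explicitly requesting the GD backend.
my $ihash = Image::Hash->new($image, 'GD');

# Compute the three 64-bit perceptual hashes as hex strings.
my $ahash = $ihash->ahash();   # average hash
my $phash = $ihash->phash();   # perceptual (DCT-based) hash
my $dhash = $ihash->dhash();   # difference (gradient) hash

print "aHash: $ahash\npHash: $phash\ndHash: $dhash\n";
```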
The number of grayscale levels in the resized and converted image is also reported. Finally, the three hashes are concatenated into a single hexadecimal string and all 0's and F's are removed; the remaining characters are counted and returned as the number of "significant values." A highly complex image with vibrant, rich colors will typically have few 0 or F hex digits (nibbles of all-zero or all-one bits), yielding a value at or near the maximum of 48 in this field. The closer the value is to 0, the simpler the image is, or the more uniformly dark or light it is, making fine detail difficult to discern.
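As a rough sketch of how such a significant-values count could be derived (reusing the $ahash, $phash, and $dhash variables from the previous example, and assuming each is a 16-character hex string), though not necessarily the exact VGKG implementation:

```perl
# Concatenate the three 16-character hex hashes into one 48-character string.
my $combined = uc($ahash . $phash . $dhash);

# Count the hex digits that are neither 0 nor F. tr///c with an empty
# replacement list counts the characters NOT in the search list.
my $significant = ($combined =~ tr/0F//c);

# 48 = maximally "busy" image; values near 0 = simple, very dark, or very light.
print "Significant values: $significant of 48\n";
```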
Internally, the VGKG system will begin discarding images with fewer than 15 significant values and a total filesize under 100KB. Extensive testing over the past month shows that around 99% of these images are simple line graphs, solitary logos, solid-color placeholders, or other images about which the Cloud Vision API cannot yield useful information and thus are not worth sending to the service, allowing us to devote all available resources to images containing recognizable objects and actions.
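The filtering rule itself reduces to a simple predicate. This is an illustrative sketch only, with the thresholds taken from the description above (treating 100K as 100,000 bytes) and the filesize obtained with Perl's built-in -s operator:

```perl
# Discard rule sketch: skip images that are both visually trivial
# (fewer than 15 significant values) and small on disk (under 100K).
my $filesize = -s 'example.jpg';   # size in bytes

if ($significant < 15 && $filesize < 100_000) {
    print "Discarding: likely a line graph, logo, or placeholder\n";
} else {
    print "Keeping: worth sending to the Cloud Vision API\n";
}
```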
This new data will be available in a new JSON block appended to the Cloud Vision JSON and EXIF JSON blocks. It will also include general image attributes previously not included in the JSON output, including the image's height and width in pixels, its final resolved URL (if different from the original URL, such as when the original was URL-shortened), language hints, and filesize in bytes.
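While the final schema will be defined by the release itself, the new block might look something like the following; every field name and value here is invented purely for illustration:

```json
{
  "ahash": "ffd8b0a0c0e0f0f8",
  "phash": "a9c4d2e1f0b38756",
  "dhash": "1e3c78f0e1c38705",
  "graylevels": 214,
  "significantvalues": 31,
  "width": 1024,
  "height": 768,
  "resolvedurl": "http://example.com/final/image.jpg",
  "languagehints": ["en"],
  "filesize": 184320
}
```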
We are immensely excited to see what you are able to do with this new capability! We are actively exploring adding other perceptual hashes to the VGKG, so reach out if there are other key hashes you'd like to see us add!