Now that the new Visual Global Knowledge Graph, powered by Google Cloud Vision API, is available in Google's BigQuery platform, we wanted to put out a quick guide to some basic queries to help you get started using it! To expeirment with the queries below, use the "gdelt-bq:gdeltv2.cloudvision" table in BigQuery.
BASIC QUERIES
The simplest of queries just searches for a particular keyword in the Labels field. This lets you search for images depicting a particular topic, object, or activity. The query below searches for all images of crowds, returning the most recent images first:
SELECT DocumentIdentifier, ImageURL, Labels FROM [gdelt-bq:gdeltv2.cloudvision] where Labels like '%crowd%' order by DATE desc
Or to search for images with any of the labels "litter," "trash," or "garbage":
SELECT DocumentIdentifier, ImageURL, Labels FROM [gdeltv2.cloudvision] where (Labels like '%litter%' or Labels like '%trash%' or Labels like '%garbage%')
Here's a simple query that returns the most recent images where the API believes it recognized the location where the image was taken:
SELECT DocumentIdentifier, ImageURL, GeoLandmarks FROM [gdelt-bq:gdeltv2.cloudvision] where GeoLandmarks is not null order by DATE desc
Here's a simple query that returns the most recent images containing the Starbucks logo:
SELECT DocumentIdentifier, ImageURL FROM [gdelt-bq:gdeltv2.cloudvision] where Logos like '%Starbucks%' order by DATE desc
Here's a query that returns a histogram of the most common Labels found across all of the images processed to date:
select Label, count(1) as cnt from ( SELECT REGEXP_REPLACE(SPLIT(Labels, '<RECORD>'), r'<FIELD>.*', '') Label FROM [gdelt-bq:gdeltv2.cloudvision] where Labels is not null ) group by Label order by cnt desc
This query uses the SafeSearch flags to return imagery deemed "Very Likely" to contain violence or destruction (WARNING: some of these images may be extremely graphic):
SELECT DocumentIdentifier, ImageURL, SafeSearch FROM [gdelt-bq:gdeltv2.cloudvision] where SafeSearch like '1<FIELD>%' or SafeSearch like '2<FIELD>%' order by DATE desc
This query generates a co-occurrence graph of which labels have appeared most commonly together over the past 24 hours using table decorators.
SELECT c.Source Source, c.Target Target, Count RawCount, "Undirected" Type, RATIO_TO_REPORT(Count) OVER () Weight FROM ( SELECT a.name Source, b.name Target, COUNT(*) as Count FROM (FLATTEN( SELECT ImageURL, REGEXP_EXTRACT(SPLIT(Labels,'<RECORD>'),r'^(.*?)<FIELD>') name FROM [gdelt-bq:gdeltv2.cloudvision@-86400000-] WHERE Labels is not null ,name)) a JOIN EACH ( SELECT ImageURL, REGEXP_EXTRACT(SPLIT(Labels,'<RECORD>'),r'^(.*?)<FIELD>') name FROM [gdelt-bq:gdeltv2.cloudvision@-86400000-] where Labels is not null ) b ON a.ImageURL=b.ImageURL WHERE a.name<b.name GROUP EACH BY 1,2 ORDER BY 3 DESC LIMIT 1000000 ) c having Count > 100 order by Count Desc
This query uses regular expressions to output a list of the four recognized emotional states for each face found in a given image. If an image contains multiple faces, it will output one row per face found. Only faces expressing one of the identified emotions will be output:
Select DocumentIdentifier, ImageURL, sorrow, anger, joy, surprise from ( SELECT DocumentIdentifier, ImageURL, INTEGER(REGEXP_EXTRACT(SPLIT(Faces,'<RECORD>'), r'^.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>(.*?)<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?$')) sorrow, INTEGER(REGEXP_EXTRACT(SPLIT(Faces,'<RECORD>'), r'^.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>(.*?)<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?$')) anger, INTEGER(REGEXP_EXTRACT(SPLIT(Faces,'<RECORD>'), r'^.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>(.*?)<FIELD>.*?<FIELD>.*?<FIELD>.*?$')) joy, INTEGER(REGEXP_EXTRACT(SPLIT(Faces,'<RECORD>'), r'^.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>(.*?)<FIELD>.*?<FIELD>.*?$')) surprise, FROM [gdelt-bq:gdeltv2.cloudvision] where Faces is not null order by DATE desc ) where sorrow > 0 or anger > 0 or joy > 0 or surprise > 0
COMBINING THE VGKG AND GKG
One common task is to combine the VGKG and GKG in a single query, to search, for example, for images of fish or fishing that appear in articles about poaching, rather than just any image about fishing. Since the VGKG and GKG are both housed in BigQuery, accomplishing such a query is trivial:
SELECT max(DocumentIdentifier), ImageURL, max(Labels) FROM [gdeltv2.cloudvision] where Labels like '%fish%' and DocumentIdentifier in (select DocumentIdentifier from [gdeltv2.gkg] where DATE > 20151220000000 and V2Themes like '%POACH%POACH%') group by ImageURL
Hopefully these queries have given you a quick sense of how to get started using the VGKG in BigQuery!