The GDELT Project

Google BigQuery + Visual GKG: Sample Queries

Now that the new Visual Global Knowledge Graph, powered by Google Cloud Vision API, is available in Google's BigQuery platform, we wanted to put out a quick guide to some basic queries to help you get started using it! To expeirment with the queries below, use the "gdelt-bq:gdeltv2.cloudvision" table in BigQuery.

BASIC QUERIES

The simplest of queries just searches for a particular keyword in the Labels field. This lets you search for images depicting a particular topic, object, or activity. The query below searches for all images of crowds, returning the most recent images first:

SELECT DocumentIdentifier, ImageURL, Labels FROM [gdelt-bq:gdeltv2.cloudvision] where Labels like '%crowd%' order by DATE desc

Or to search for images with any of the labels "litter," "trash," or "garbage":

SELECT DocumentIdentifier, ImageURL, Labels FROM [gdeltv2.cloudvision] where (Labels like '%litter%' or Labels like '%trash%' or Labels like '%garbage%')

Here's a simple query that returns the most recent images where the API believes it recognized the location where the image was taken:

SELECT DocumentIdentifier, ImageURL, GeoLandmarks FROM [gdelt-bq:gdeltv2.cloudvision] where GeoLandmarks is not null order by DATE desc

Here's a simple query that returns the most recent images containing the Starbucks logo:

SELECT DocumentIdentifier, ImageURL FROM [gdelt-bq:gdeltv2.cloudvision] where Logos like '%Starbucks%' order by DATE desc

Here's a query that returns a histogram of the most common Labels found across all of the images processed to date:

select Label, count(1) as cnt from (
 SELECT REGEXP_REPLACE(SPLIT(Labels, '<RECORD>'), r'<FIELD>.*', '') Label FROM [gdelt-bq:gdeltv2.cloudvision] where Labels is not null
 ) group by Label order by cnt desc

This query uses the SafeSearch flags to return imagery deemed "Very Likely" to contain violence or destruction (WARNING: some of these images may be extremely graphic):

SELECT DocumentIdentifier, ImageURL, SafeSearch FROM [gdelt-bq:gdeltv2.cloudvision] where SafeSearch like '1<FIELD>%' or SafeSearch like '2<FIELD>%' order by DATE desc

This query generates a co-occurrence graph of which labels have appeared most commonly together over the past 24 hours using table decorators.

SELECT c.Source Source, c.Target Target, Count RawCount, "Undirected" Type, RATIO_TO_REPORT(Count) OVER () Weight FROM (
SELECT a.name Source, b.name Target, COUNT(*) as Count
FROM (FLATTEN(
SELECT ImageURL, REGEXP_EXTRACT(SPLIT(Labels,'<RECORD>'),r'^(.*?)<FIELD>') name
FROM [gdelt-bq:gdeltv2.cloudvision@-86400000-] 
WHERE Labels is not null
,name)) a
JOIN EACH (
SELECT ImageURL, REGEXP_EXTRACT(SPLIT(Labels,'<RECORD>'),r'^(.*?)<FIELD>') name
FROM [gdelt-bq:gdeltv2.cloudvision@-86400000-] 
where Labels is not null
) b
ON a.ImageURL=b.ImageURL
WHERE a.name<b.name
GROUP EACH BY 1,2
ORDER BY 3 DESC
LIMIT 1000000
) c having Count > 100
order by Count Desc

This query uses regular expressions to output a list of the four recognized emotional states for each face found in a given image. If an image contains multiple faces, it will output one row per face found. Only faces expressing one of the identified emotions will be output:

Select DocumentIdentifier, ImageURL, sorrow, anger, joy, surprise from (
SELECT DocumentIdentifier, ImageURL,
INTEGER(REGEXP_EXTRACT(SPLIT(Faces,'<RECORD>'), r'^.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>(.*?)<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?$')) sorrow,
INTEGER(REGEXP_EXTRACT(SPLIT(Faces,'<RECORD>'), r'^.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>(.*?)<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?$')) anger,
INTEGER(REGEXP_EXTRACT(SPLIT(Faces,'<RECORD>'), r'^.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>(.*?)<FIELD>.*?<FIELD>.*?<FIELD>.*?$')) joy,
INTEGER(REGEXP_EXTRACT(SPLIT(Faces,'<RECORD>'), r'^.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>.*?<FIELD>(.*?)<FIELD>.*?<FIELD>.*?$')) surprise,
FROM [gdelt-bq:gdeltv2.cloudvision] where Faces is not null order by DATE desc
) where sorrow > 0 or anger > 0 or joy > 0 or surprise > 0

 

COMBINING THE VGKG AND GKG

One common task is to combine the VGKG and GKG in a single query, to search, for example, for images of fish or fishing that appear in articles about poaching, rather than just any image about fishing. Since the VGKG and GKG are both housed in BigQuery, accomplishing such a query is trivial:

SELECT max(DocumentIdentifier), ImageURL, max(Labels) FROM [gdeltv2.cloudvision] where Labels like '%fish%' and DocumentIdentifier in  (select DocumentIdentifier from [gdeltv2.gkg] where DATE > 20151220000000 and V2Themes like '%POACH%POACH%') group by ImageURL

 

Hopefully these queries have given you a quick sense of how to get started using the VGKG in BigQuery!