The GDELT Project

Using Google's Deep Learning To Model Visual Portrayal In The News

With the debut of the GDELT Visual Global Knowledge Graph (VGKG), which uses Google's Cloud Vision API deep learning algorithms to catalog global news imagery, we've been immensely excited about the ways this technology can be used to catalog and understand global visual narratives. This time, Felipe Hoffa came up with a way of combining the GKG and VGKG in a single query: you specify two GKG queries and get back a list of the visual topics that appear more commonly in images from articles matching the first query than in images from articles matching the second.

Applying this approach to Donald Trump and Hillary Clinton yields the following query. It outputs a list of labels, with a sample image and article for each, showing the topics that appear more commonly in images from articles mentioning Donald Trump than in images from articles mentioning Hillary Clinton. Concretely, it counts the labels attached to images in Trump coverage over the last three days and drops any label that ranks among the 20 most common in Clinton coverage.

SELECT label, COUNT(*) c, 
  FIRST(ImageURL) ImageURL, FIRST(DocumentIdentifier) DocumentIdentifier,
FROM (
FLATTEN((
SELECT (REGEXP_EXTRACT(SPLIT(Labels,'<RECORD>'), r'([^<]*)')) label, ImageURL, DocumentIdentifier
FROM (
  SELECT FIRST(DocumentIdentifier) DocumentIdentifier, ImageURL, FIRST(Labels) Labels
  FROM (
    SELECT * 
    FROM [gdelt-bq:gdeltv2.cloudvision@-259200000-] # last 3 days
    WHERE Labels IS NOT null
  )
  WHERE DocumentIdentifier IN (
    SELECT DocumentIdentifier 
    FROM [gdelt-bq:gdeltv2.gkg@-259200000-] # last 3 days
    WHERE LOWER(AllNames) LIKE '%donald trump%trump%trump%'
  )
  GROUP BY ImageURL
)), label))
WHERE label NOT IN (SELECT label FROM (
  SELECT label, COUNT(*) c
  FROM (
  SELECT (REGEXP_EXTRACT(SPLIT(Labels,'<RECORD>'), r'([^<]*)')) label, ImageURL, DocumentIdentifier 
  FROM (
    SELECT FIRST(DocumentIdentifier) DocumentIdentifier, ImageURL, FIRST(Labels) Labels
    FROM (
      SELECT * 
      FROM [gdelt-bq:gdeltv2.cloudvision@-259200000-] # last 3 days
      WHERE Labels IS NOT null
    )
    WHERE DocumentIdentifier IN (
      SELECT DocumentIdentifier 
      FROM [gdelt-bq:gdeltv2.gkg@-259200000-] # last 3 days
      WHERE LOWER(AllNames) LIKE '%hillary clinton%clinton%clinton%'
    )
    GROUP BY ImageURL
  ))
  GROUP BY 1
  ORDER BY 2 DESC
  LIMIT 20
))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100
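
To make the logic of the query easier to follow, here is a minimal in-memory sketch of the same steps in Python. It is an illustration only, not a replacement for the BigQuery query: the function names, the sample rows and the image file names are all hypothetical, and it assumes each Labels value is a list of entries separated by <RECORD>, with the label name appearing before the first "<" in each entry.

```python
from collections import Counter

RECORD = "<RECORD>"  # assumed separator between entries in a Labels value


def parse_labels(labels_field):
    """Return the label names in one Labels string (text before the first '<')."""
    return [entry.split("<", 1)[0] for entry in labels_field.split(RECORD) if entry]


def label_counts(rows):
    """Count labels over (image_url, labels_field) rows, once per distinct image."""
    first_seen = {}
    for image_url, labels_field in rows:
        # Mirrors GROUP BY ImageURL with FIRST(Labels): keep the first row per image.
        first_seen.setdefault(image_url, labels_field)
    counts = Counter()
    for labels_field in first_seen.values():
        counts.update(parse_labels(labels_field))
    return counts


def distinctive_labels(rows_a, rows_b, exclude_top=20, limit=100):
    """Labels from rows_a, most frequent first, minus the top labels of rows_b."""
    common_b = {label for label, _ in label_counts(rows_b).most_common(exclude_top)}
    ranked_a = label_counts(rows_a).most_common()
    return [(label, n) for label, n in ranked_a if label not in common_b][:limit]


# Hypothetical sample rows: (ImageURL, Labels)
rows_a = [
    ("a1.jpg", "flag<FIELD>0.9<FIELD>/m/x<RECORD>crowd<FIELD>0.8<FIELD>/m/y"),
    ("a2.jpg", "flag<FIELD>0.7<FIELD>/m/x<RECORD>person<FIELD>0.9<FIELD>/m/z"),
    ("a1.jpg", "stage<FIELD>0.6<FIELD>/m/w"),  # repeat image, ignored
]
rows_b = [
    ("b1.jpg", "person<FIELD>0.9<FIELD>/m/z<RECORD>suit<FIELD>0.8<FIELD>/m/v"),
    ("b2.jpg", "person<FIELD>0.8<FIELD>/m/z"),
]

result = distinctive_labels(rows_a, rows_b, exclude_top=1)
print(result)  # [('flag', 2), ('crowd', 1)]
```

Note that, like the query, this is a set difference rather than a ratio: a label is dropped only if it falls in the second group's top list, so raising or lowering exclude_top (20 in the query) changes how aggressively shared topics are filtered out.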

Subjects like the American flag, the military, crowds and stages, his trademark jet, and powerful presidential imagery tend to dominate photographs appearing in Trump coverage.

Perhaps most fascinatingly, running this query in the middle of this past May to compare Bernie Sanders with Hillary Clinton showed that the imagery in Sanders coverage tended to depict large arenas filled to capacity with cheering crowds, while Clinton imagery tended to cover an enormous range of topics dealing with current events. In short, the visual narrative surrounding Sanders focused on the size of the energetic crowds he was drawing, while the narrative surrounding Clinton centered on her reaction and connection to global events. One might conclude from this that Clinton was being covered as a potential head of state, while Sanders was being covered as a cultural phenomenon worthy of attention solely because of the size of the crowds he was drawing.