Visualizing The Entity Graph Of The New 5.5-Million Protest Images Annotations Dataset

Yesterday we released a massive new dataset of 5.5 million global protest images, extracted from the Visual Global Knowledge Graph (VGKG). Among the myriad fields in that dataset are the Web Entity annotations for each image. These record the entities, activities and topics most closely associated with the image in the textual captions used to describe it across the web.

These annotations are especially powerful because they go beyond the visual contents of an image to capture how it is described in the captions assigned to it, offering a rich understanding of the details behind the image.

Building on our 3.8-billion-edge visualizations of the full VGKG, what would it look like to visualize the co-occurrences of these entities, especially given their narrow focus on protest?

In all, across the entire 5.5-million-image protest dataset, there were 316,262,171 total entity co-occurrences, yielding 35,955,741 distinct pairings.

The complete graph in Gephi's CSV edge format is available for download:

What might it look like to visualize this dataset?

The image below shows the 1,500 most commonly co-occurring entity pairs in captions of protest imagery in the dataset.

You can download the visualization files for this graph below:

Unsurprisingly, at the center of the graph are two of the terms used to construct the dataset in the first place: "protest" and "demonstration." What if both terms, along with the third filter term, "rebellion," were removed from the graph, eliminating that central cluster? The graph below shows the same co-occurrence graph with all edges to or from these three nodes removed.

Perhaps most obvious in the graph below is that the Catalan independence protests appear as an isolated cluster at the top right, largely separate from the rest of the protest topics. The exception is Brexit, which appears just below Catalonia in the same cluster. Both appear to be largely self-contained protest stories, though they are connected to each other in how they are contextualized.

Latin American protests appear in the blue cluster at the far right of the core, while French protests appear in green towards the top of the core. Other European protests appear between them. Nationalist protests appear towards the center, near immigration-related protests. At far left is a cluster that includes Extinction Rebellion and climate protests, grouped together with education, UAW and labor protests. To their right are LGBTQ-related protests. A tan cluster beneath them focuses largely on the characteristics of protests as a whole.

You can download the visualization files for this graph below:

We're tremendously excited to see what kinds of new insights you can gather from these graphs and the new protest image dataset!

TECHNICAL DETAILS

To construct the graphs above, the JSON edition of the protest image annotations dataset was loaded into a temporary BigQuery table and queried with the SQL below, adapted from Felipe Hoffa's BigQuery graph code and his json2array function.

-- Convert a JSON array string into an array of per-element JSON strings
CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(json).map(x=>JSON.stringify(x));
""";

-- One row per (image URL, Web Entity description)
WITH nested AS (
  SELECT DocumentIdentifier url, json2array(JSON_EXTRACT(RawJSON, "$.responses[0].webDetection.webEntities")) entities
  FROM `TEMPTABLE`
  WHERE JSON_EXTRACT(RawJSON, "$.responses[0].webDetection.webEntities") IS NOT NULL
), flat AS (
  SELECT url, JSON_EXTRACT_SCALAR(entity, '$.description') entity
  FROM nested, UNNEST(entities) entity
)

-- Self-join on URL: a.entity < b.entity keeps each unordered pair once
-- and drops self-pairs and NULL descriptions; output uses Gephi edge columns
SELECT Source, Target, Count RawCount, "Undirected" Type FROM (
  SELECT a.entity Source, b.entity Target, COUNT(*) AS Count
  FROM flat a
  JOIN flat b
  ON a.url = b.url
  WHERE a.entity < b.entity
  GROUP BY 1, 2
)
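For readers who want to check this logic on a small sample without BigQuery, here is a minimal Python sketch that mirrors the query and then applies the two post-processing steps described earlier: removing the three filter-term nodes and keeping the 1,500 most frequent pairs, followed by a CSV export. It is an illustration under stated assumptions, not the code used for these graphs. The post does not say where the node removal or the 1,500-edge cut was performed, and every function name here is hypothetical. The filter terms are matched case-insensitively because their exact capitalization as Web Entity descriptions is assumed. The output columns simply mirror the query's (Source, Target, RawCount, Type).

```python
import csv
import heapq
import io
import json
from collections import Counter, defaultdict
from itertools import combinations

def web_entity_descriptions(raw_json):
    """Descriptions under $.responses[0].webDetection.webEntities of one RawJSON record."""
    doc = json.loads(raw_json)
    responses = doc.get("responses") or [{}]
    detection = responses[0].get("webDetection") or {}
    entities = detection.get("webEntities") or []
    # Entities without a description are NULL in SQL and never satisfy a.entity < b.entity.
    return [e["description"] for e in entities if e.get("description") is not None]

def cooccurrence_edges(rows):
    """rows: (DocumentIdentifier, RawJSON) tuples. Mirrors the self-join ON a.url = b.url."""
    by_url = defaultdict(list)
    for url, raw in rows:
        by_url[url].extend(web_entity_descriptions(raw))
    edges = Counter()
    for entities in by_url.values():
        # Each pair of positions with different descriptions matches a.entity < b.entity
        # exactly once, so count it once with Source < Target.
        for a, b in combinations(entities, 2):
            if a != b:
                edges[(min(a, b), max(a, b))] += 1
    return edges

def drop_nodes(edges, nodes):
    """Remove every edge to or from any of `nodes` (whole-label, case-insensitive match)."""
    lowered = {n.lower() for n in nodes}
    return Counter({(s, t): w for (s, t), w in edges.items()
                    if s.lower() not in lowered and t.lower() not in lowered})

def top_edges(edges, n=1500):
    """The n heaviest edges; heapq.nlargest avoids fully sorting ~36M pairs."""
    return heapq.nlargest(n, edges.items(), key=lambda kv: kv[1])

def write_gephi_edges(ranked, out):
    """Write an edge CSV using the same column names as the query output."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "RawCount", "Type"])
    for (source, target), count in ranked:
        writer.writerow([source, target, count, "Undirected"])

def record(*descriptions):
    # Hypothetical helper: build a minimal RawJSON record for the demo below.
    ents = [{"description": d} for d in descriptions]
    return json.dumps({"responses": [{"webDetection": {"webEntities": ents}}]})

rows = [
    ("img1", record("Protest", "Brexit")),
    ("img2", record("Protest", "Brexit", "Catalonia")),
    ("img3", record("Demonstration", "Extinction Rebellion", "Climate")),
]
edges = cooccurrence_edges(rows)
print(sum(edges.values()), len(edges))  # total co-occurrences, distinct pairings → 7 6

# Assumed labels for the three filter terms; real descriptions may differ in case.
filtered = drop_nodes(edges, {"protest", "demonstration", "rebellion"})
buf = io.StringIO()
write_gephi_edges(top_edges(filtered, n=1500), buf)
print(buf.getvalue())
```

Summing the counter gives the equivalent of the "total entity co-occurrences" figure, and its length gives the "distinct pairings" figure reported for the full dataset. Matching whole node labels rather than substrings matters: "Extinction Rebellion" survives the filter even though it contains the word "rebellion." If Gephi should use the count as the edge weight, renaming RawCount to Weight may be necessary.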