Image Sharing Across The News: Social Media Site Overlap

Following on from our graph of global news coverage image overlap, what would it look like to limit that graph to just social media sites? What if we looked at each news website monitored by GDELT and counted how often its images were found over the past year on Facebook, Reddit, Twitter and YouTube? For example, images from DailyMail and Yahoo News were frequently found on Twitter, with their Twitter appearances just barely edging out their Facebook appearances, while the India Times' images were found on Twitter only slightly more often than on YouTube. It is important to remember that Google Cloud Vision's ability to search the images of social media sites depends largely on how accessible those sites are to crawlers and on whether most of their content is public or restricted to authenticated users. Thus, these results are incomplete, but they still offer a powerful picture of the state of social media overlap.

TECHNICAL DETAILS

Constructing this graph was nearly identical to constructing our broader image overlap graph. The only changes were removing the 100-image overlap requirement and adding a filter that restricts edges to those connecting to the four social sites examined here:

CREATE TEMP FUNCTION json2array_url(json STRING)
RETURNS ARRAY<STRUCT<name STRING,url STRING>>
LANGUAGE js AS """
var obj = JSON.parse(json);
var result = [];
for (var i in obj) result.push({name: i, url: obj[i].url});
return result;
""";
select PageDomain Source, SimilarImagePageDomain Target, Count, "Undirected" Type, ( Count/SUM(Count) OVER () ) Weight from (
select PageDomain, NET.REG_DOMAIN(rec.url) SimilarImagePageDomain, Count(1) Count from (
SELECT NET.REG_DOMAIN( DocumentIdentifier) PageDomain, json2array_url(JSON_EXTRACT(RawJSON, "$.responses[0].webDetection.pagesWithMatchingImages")) recs FROM `gdelt-bq.gdeltv2.cloudvision_partitioned` WHERE DATE(_PARTITIONTIME) >= "2020-01-01"
), unnest(recs) rec group by PageDomain, SimilarImagePageDomain having (SimilarImagePageDomain='facebook.com' OR SimilarImagePageDomain='twitter.com' OR SimilarImagePageDomain='reddit.com' OR SimilarImagePageDomain='youtube.com') and PageDomain != SimilarImagePageDomain 
) order by Count desc
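
To make the query's logic easier to follow, here is a minimal JavaScript sketch that walks through the same steps on a few made-up rows: parse the matching-pages JSON, reduce each URL to a domain, keep only edges that point to the four social sites, and compute each edge's share of the total. Everything here other than the UDF body is an assumption for illustration: the sample URLs are invented, `socialEdges` is a hypothetical helper, and `regDomain` is only a rough stand-in for NET.REG_DOMAIN (it keeps the last two host labels, which would be wrong for domains like "co.uk").

```javascript
// Same logic as the body of the json2array_url UDF in the query above.
function json2arrayUrl(json) {
  const obj = JSON.parse(json);
  const result = [];
  for (const i in obj) result.push({ name: i, url: obj[i].url });
  return result;
}

// Rough stand-in for NET.REG_DOMAIN (assumption: last two host labels only).
function regDomain(url) {
  return new URL(url).hostname.split(".").slice(-2).join(".");
}

const SOCIAL = new Set(["facebook.com", "twitter.com", "reddit.com", "youtube.com"]);

// Count (source, target) pairs, keep only social targets, drop self-loops,
// then express each count as a share of all remaining edges.
function socialEdges(rows) {
  const counts = new Map();
  for (const { documentIdentifier, pagesJson } of rows) {
    const source = regDomain(documentIdentifier);
    for (const rec of json2arrayUrl(pagesJson)) {
      const target = regDomain(rec.url);
      if (!SOCIAL.has(target) || target === source) continue;
      const key = source + "\t" + target;
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  const total = [...counts.values()].reduce((a, b) => a + b, 0);
  return [...counts.entries()]
    .map(([key, count]) => {
      const [Source, Target] = key.split("\t");
      return { Source, Target, Count: count, Type: "Undirected", Weight: count / total };
    })
    .sort((a, b) => b.Count - a.Count);
}

// Hypothetical sample rows: one row per annotated image, with an invented
// article URL and an invented pagesWithMatchingImages array.
const rows = [
  {
    documentIdentifier: "https://www.example-news.com/story-1",
    pagesJson: JSON.stringify([
      { url: "https://twitter.com/example/status/1" },
      { url: "https://www.facebook.com/example/posts/2" },
      { url: "https://blog.example.org/repost" },
    ]),
  },
  {
    documentIdentifier: "https://www.example-news.com/story-2",
    pagesJson: JSON.stringify([{ url: "https://twitter.com/example/status/3" }]),
  },
  {
    documentIdentifier: "https://another-outlet.org/report",
    pagesJson: JSON.stringify([{ url: "https://www.youtube.com/watch?v=example" }]),
  },
];

const edges = socialEdges(rows);
console.log(edges);
```

Because the weights are computed after the social-site filter, they sum to 1 across the edges that remain, just as SUM(Count) OVER () in the query sums over the filtered rows only.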

We're hopeful this analysis inspires you to think of new ways of understanding the global news image landscape as an interconnected graph!