Cataloging A Week And A Half Of Tucker Carlson On Russian TV News Using Face Detection & Similarity Matching

Yesterday we demonstrated how off-the-shelf face detection and similarity matching could be used to catalog all of the excerpted Tucker Carlson clips in a single Russian television news broadcast. How might we scale this up to scanning an entire week and a half for excerpts from his show? Importantly, as an extremely high-profile American public figure whose broadcasts are being used by Russian state media to advance its narratives on its invasion of Ukraine, there is compelling public interest in better understanding how external narratives are being woven into the official Russian narration of the war.

The Visual Explorer works by extracting one frame every four seconds from each broadcast, converting it into a fixed-interval thumbnail grid that captures the narrative visual arc of each broadcast. The full-resolution versions of the images from the thumbnail grid are available as a downloadable ZIP file for each broadcast, creating "ngrams for video" that enable at-scale non-consumptive computational visual analysis of television news.
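Because the sampling interval is fixed, a frame's sequence number maps directly to an offset within its broadcast – a quick sketch of the arithmetic (the frame number here is a made-up example):

```shell
# Frames are numbered from 1 and sampled every 4 seconds, so frame N
# covers the 4-second window starting (N - 1) * 4 seconds into the broadcast.
frame=211                       # hypothetical frame number for illustration
offset=$(( (frame - 1) * 4 ))   # seconds into the broadcast
echo "frame $frame starts at second $offset"
```

This same mapping is what lets a list of matched frame filenames later be converted back into playback timestamps.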

Using the off-the-shelf open "facedetect" tool, we can rapidly annotate a television news broadcast by cataloging all of the places where there are visible human faces. That same tool has an additional feature that allows a known face to be compared against all of the detected faces, with faces that are above a certain threshold of similarity flagged. In this case, rather than true facial recognition, which utilizes sophisticated facial landmark analysis, this is a more rudimentary image similarity assessment on the cropped facial regions. It also does not associate a face with an identity – it is simply an example of "find more like this" similarity matching like a reverse image search. Yet, despite its more basic approach, it offers the ability to scan a broadcast for a specific face and catalog where it appears with remarkable accuracy.

How might we apply the single-broadcast workflow from yesterday to an entire week and a half of Russian television news?

First, we'll need to install three tools: facedetect for face detection, GNU parallel to run our processing in parallel across a single machine or cluster of machines, and jq to extract selected fields from JSON:

apt-get -y install facedetect
apt-get -y install parallel
apt-get -y install jq

Now we'll compile the list of dates we wish to analyze. In this case we are going to examine from October 1st through the early morning of October 11th:

#make date list...
start=20221001; end=20221011; while [[ ! $start > $end ]]; do echo $start; start=$(date -d "$start + 1 day" "+%Y%m%d"); done > DATES

Now we'll use that list of dates to fetch all of the JSON inventory files from the Visual Explorer for those days, which will allow us to compile the list of IDs for all of the broadcasts from the four monitored channels on those days:

#get the inventory files – one wget invocation per channel
mkdir JSON
time cat DATES | parallel --eta 'wget -q{}.inventory.json -P ./JSON/'
time cat DATES | parallel --eta 'wget -q{}.inventory.json -P ./JSON/'
time cat DATES | parallel --eta 'wget -q{}.inventory.json -P ./JSON/'
time cat DATES | parallel --eta 'wget -q{}.inventory.json -P ./JSON/'
rm -f IDS; find ./JSON/ -depth -name '*.json' | parallel --eta 'cat {} | jq -r .shows[].id >> IDS'
wc -l IDS
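The jq filter above assumes each inventory file contains a "shows" array whose elements each carry an "id" field in CHANNEL_YYYYMMDD_HHMMSS form (the same format the clip-collapsing script later in this post parses). A minimal illustration with made-up entries:

```shell
# Illustrative (made-up) inventory file matching the .shows[].id filter above
cat > sample.inventory.json <<'EOF'
{"shows": [{"id": "RUSSIA1_20221001_083000"}, {"id": "RUSSIA1_20221001_113000"}]}
EOF
jq -r .shows[].id sample.inventory.json   # emits one broadcast ID per line
```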

In all, there are 1,121 broadcasts across those 4 channels over those 11 days. Now we'll take that list of IDs, download the ZIP file for each broadcast, and unpack them to their underlying JPG files. Note that you may wish to use a RAM disk or SSD for this step, since it will yield a vast number of small files:

#download and unpack them all...
mkdir IMAGES
time cat IDS | parallel --eta 'wget -q{}.zip -P ./IMAGES/'
time find ./IMAGES/ -depth -name '*.zip' | parallel --eta 'unzip -n -q -d ./IMAGES/ {} && rm {}'
time find ./IMAGES/ -depth -name "*.jpg" | wc -l

In the end, this yields 927,588 frames that we will need to scan. We then download a sample Tucker Carlson screen grab from the web and use facedetect to compare it against each of the 927,588 frames individually, using GNU parallel to spread the work across all processors on our VM:

#and scan them all for carlson faces
time find ./IMAGES/ -depth -name "*.jpg" | parallel --resume --joblog ./PAR.LOG --eta 'facedetect -q --search-threshold 40 -s ./2022-tve-facedetect-scan-tuckerface.png {} && echo {} >> MATCHING.TXT.tmp'

On our 60-core VM this took 10 hours and 40 minutes, using a RAM disk to remove I/O as a limiting factor.
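As a rough sanity check on that figure, 927,588 frames in 10 hours and 40 minutes across 60 cores works out to roughly 2.5 core-seconds per facedetect invocation:

```shell
# (10h40m in seconds) * 60 cores / 927,588 frames = core-seconds per frame
awk 'BEGIN { printf "%.2f\n", (10*3600 + 40*60) * 60 / 927588 }'
```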

You can see the final results:

In all, this yielded 319 frames (remember that each represents 4 seconds of airtime) containing a match for Tucker Carlson across the 927,588 examined frames, which works out to as much as 21 minutes of airtime – around 0.03% of the total airtime across these four Russian channels over a week and a half! In reality, since broadcasts are sampled every 4 seconds, each of those 319 frames does not necessarily represent 4 full seconds of Tucker Carlson (he might have been featured for only 1–3 seconds of a given 4-second interval), so this offers only a rough upper-bound estimate of his screen time. False positives may reduce this total further. In all, there were 75 distinct clips across 55 broadcasts.
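The arithmetic behind those figures, for anyone who wants to reproduce it:

```shell
# 319 matched frames, each standing in for (at most) 4 seconds of airtime
frames=319
total=927588
secs=$(( frames * 4 ))
echo "$secs seconds (~$(( secs / 60 )) minutes)"
# share of the total airtime represented by all examined frames
awk -v s="$secs" -v t="$total" 'BEGIN { printf "%.3f%% of total airtime\n", 100 * s / (t * 4) }'
```

This prints 1276 seconds (~21 minutes) and 0.034% of total airtime, matching the rounded figures above.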

To collapse this per-frame list into the per-clip list seen at the bottom of this post, we used the following Perl script:


use POSIX qw(mktime);

#get our conversion to UTC (in case the local system has a non-UTC timezone set)...
$tz = (localtime time)[8] * 60 - mktime(gmtime 0) / 60; $tzhour = $tz / 60; $tzmin = abs($tz) % 60;

open(FILE, $ARGV[0]);
while(<FILE>) {
    #filenames look like .../CHANNEL_YYYYMMDD_HHMMSS-FRAMENUM.jpg
    ($ID, $FRAME) = $_=~/.*\/(.*?)\-(\d+)\.jpg/;
    next if !defined($ID);
    #start a new clip on a new broadcast or a gap of more than 4 frames
    if ($ID ne $LASTID || ($FRAME - $LASTFRAME) > 4) {
        ($CHAN) = $ID=~/^(.*?)_\d\d\d\d/;
        ($year, $mon, $day, $hour, $min, $sec) = $ID=~/(\d\d\d\d)(\d\d)(\d\d)_(\d\d)(\d\d)(\d\d)/;
        $timestamp = mktime( $sec, $min, $hour, $day, $mon-1, $year-1900 ) + ( ($FRAME - 1) * 4);
        $timestamp += ($tzhour * 60 * 60) - ($tzmin * 60);
        ($sec,$min,$hour,$day,$mon,$year) = localtime($timestamp); $year+=1900; $mon++;
        print "<li><a href=\"$ID&play=$timestamp\">$CHAN: $mon/$day/$year $hour:$min:$sec UTC (Frame: $FRAME)</a></li>\n";
        $UNIQ_IDS{$ID} = 1;
    }
    $LASTID = $ID; $LASTFRAME = $FRAME;
}
close(FILE);

$cnt = scalar(keys %UNIQ_IDS); print "Found $cnt IDs...\n";
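One caveat before running the script: GNU parallel appends matches to MATCHING.TXT.tmp in job-completion order rather than broadcast order, so the per-frame list should be sorted first (the filenames sort naturally by broadcast ID and then frame number). The paths and channel names below are illustrative, not actual matches:

```shell
# Illustrative unsorted matches, as parallel might emit them
printf '%s\n' \
  './IMAGES/NTV_20221003_083000/NTV_20221003_083000-000211.jpg' \
  './IMAGES/1TV_20221002_210000/1TV_20221002_210000-000045.jpg' \
  './IMAGES/1TV_20221002_210000/1TV_20221002_210000-000044.jpg' > MATCHING.TXT.tmp
LC_ALL=C sort MATCHING.TXT.tmp > MATCHING.TXT   # broadcast ID, then frame number
head -1 MATCHING.TXT
```

The sorted MATCHING.TXT is then the input the Perl script expects as its first argument.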

Most importantly, the facedetect tool only performs basic image similarity comparison, not the full-fledged facial landmark comparison that a true facial recognition system would perform, so it will only detect cases where Tucker Carlson is facing primarily directly towards the camera in a studio environment, as he does for the majority of the airtime of his show. A more robust analysis would likely use more sophisticated tools to more precisely inventory his appearances and capture instances where his head is tilted or occluded or where he appears outside of a studio environment, such as leading a rally or being interviewed on other shows. There may also be false positive matches, given the simplicity of the matching algorithm used here.

In the end, we've demonstrated that using the Visual Explorer preview images with the off-the-shelf "facedetect" tool, we can scan a week and a half of Russian television across 4 channels totaling 1,030 hours of airtime in all with a single 60-core CPU-only VM in just under 11 hours! Simply by scheduling this workflow as a cronjob and using the last 72 hours as our DATES list, we could repeat this pipeline every 30 minutes to scan Russian television in near-realtime!
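For that near-realtime variant, the DATES file can be regenerated at the top of each run to cover the trailing 72 hours. A minimal sketch using GNU date (UTC dates, in the same YYYYMMDD format used above):

```shell
# Regenerate DATES to cover today plus the three previous UTC days (~72 hours)
for d in 0 1 2 3; do date -u -d "-$d day" "+%Y%m%d"; done > DATES
cat DATES
```

The rest of the pipeline then runs unchanged against the refreshed DATES list.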

You can see the complete list of clips below:

Of those 75 clips, 17 ended up being false positives: