Visual Explorer: Heads Of State Embedding Database For Searching Russian Television News

Last week we demonstrated searching an entire year of Belarusian, Russian and Ukrainian television news for appearances of current and former heads of state Donald Trump, Joe Biden, Vladimir Putin and Volodymyr Zelenskyy, as well as Tucker Carlson, one of the most-excerpted American journalists on Russian television. Our example workflow for searching this massive database requires installing the facial embedding server on a VM and running it locally to generate the embeddings ("signatures") of the known faces to search for. Because this is the most complex part of the workflow, today we're releasing the first few entries in what will eventually become a global database of precomputed embeddings for current and former heads of state and other major public figures who appear frequently on Russian television. Instead of installing and running the embedding server to compute these embeddings yourself, you can simply download the precomputed embedding JSON files directly into the "KNOWNFACES" directory and run your searches.

To start, we've included the precomputed embeddings we used in last week's experiments: Biden, Carlson, Putin, Trump and Zelenskyy. The complete modified workflow to search Russia 24 for all appearances of these figures, now incorporating our new channel inventory, is:

#download prerequisites
apt-get -y install parallel
apt-get -y install jq
pip3 install argparse
pip3 install numpy

#create the directories
mkdir /dev/shm/EMBED
mkdir /dev/shm/EMBED/SEARCH
cd /dev/shm/EMBED/SEARCH
mkdir KNOWNFACES

#download the precomputed embeddings:
cd /dev/shm/EMBED/SEARCH/KNOWNFACES
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/faceembeddb/DonaldTrump.json
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/faceembeddb/JoeBiden.json
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/faceembeddb/TuckerCarlson.json
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/faceembeddb/VladimirPutin.json
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/faceembeddb/VolodymyrZelenskyy.json

#return to the search directory (the previous step left us inside KNOWNFACES)
cd /dev/shm/EMBED/SEARCH

#download the complete channel inventory for Russia 24 and all of its per-broadcast embedding files:
start=$(curl -s https://api.gdeltproject.org/api/v2/tvv/tvv?mode=chaninv | jq -r '.channels[] | select(.id=="RUSSIA24") | .startDate'); end=20230521; while [[ ! $start > $end ]]; do echo $start; start=$(date -d "$start + 1 day" "+%Y%m%d"); done > DATES
rm -rf JSON; mkdir JSON
time cat DATES | parallel --eta 'wget -q https://storage.googleapis.com/data.gdeltproject.org/gdeltv3/iatv/visualexplorer/RUSSIA24.{}.inventory.json -P ./JSON/'
rm -f IDS; find ./JSON/ -name '*.json' | parallel --eta 'jq -r ".shows[].id" {}' > IDS
rm -rf EMBEDDINGS; mkdir EMBEDDINGS
time cat IDS | parallel --eta 'curl -s https://storage.googleapis.com/data.gdeltproject.org/gdeltv3/iatv/visualexplorer_lenses/{}.faceembed.json -o ./EMBEDDINGS/{}.faceembed.json'
wc -l EMBEDDINGS/* | tail -1 > RUSSIA24.CNT

#and finally perform the actual search:
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/search_faceembeddings.py
chmod 755 ./search_faceembeddings.py
time ./search_faceembeddings.py --knownfaces ./KNOWNFACES/ --searchfaces ./EMBEDDINGS/ --threshold 0.52 --outfile ./RUSSIA24-MATCHES.json
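The date loop and the final line count in the workflow above can also be factored into small reusable helpers, which makes it easier to rerun the same steps for other channels or date ranges. The following is a minimal sketch, assuming bash and GNU date (which the workflow already relies on); the function names date_range and count_face_lines are our own hypothetical helpers, not part of GDELT's tooling. The counting helper sums lines the same way as "wc -l EMBEDDINGS/* | tail -1", but uses find so it won't fail with "Argument list too long" when the directory holds a very large number of files, and it prints only the number rather than "N total".

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not part of GDELT's tools.

# Print every date from $1 to $2 inclusive as YYYYMMDD, one per line.
# Assumes GNU date, which accepts "YYYYMMDD + 1 day" relative expressions.
date_range() {
  local day="$1" end="$2"
  # YYYYMMDD strings sort lexically in date order, so a string
  # comparison is enough to know when we've passed the end date.
  while [[ ! "$day" > "$end" ]]; do
    echo "$day"
    day=$(date -d "$day + 1 day" "+%Y%m%d")
  done
}

# Print the total number of lines across all *.faceembed.json files
# under directory $1. find -exec ... + batches file names safely, so this
# avoids the argument-length limit a bare "wc -l DIR/*" glob can hit.
count_face_lines() {
  find "$1" -type f -name '*.faceembed.json' -exec cat {} + | wc -l
}
```

For example, date_range 20230530 20230602 should print 20230530, 20230531, 20230601 and 20230602, rolling over the month boundary, and count_face_lines ./EMBEDDINGS > RUSSIA24.CNT records the total line count for the downloaded embedding files.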

We hope this helps jumpstart new forms of investigative reporting into the narratives and storytelling structures of Russian television news.