Last week we demonstrated searching an entire year of Belarusian, Russian and Ukrainian television news for appearances of current and former heads of state Donald Trump, Joe Biden, Vladimir Putin and Volodymyr Zelenskyy, along with one of the most-excerpted American journalists on Russian television, Tucker Carlson. That example workflow requires installing the facial embedding server on a VM and running it locally to generate the signatures of the known faces to search for. Since this is the most complex part of the search workflow, today we're releasing the first few entries in what will eventually become a global database of precomputed embeddings for current and former heads of state and other major notables, including both leaders and major public figures that commonly appear on Russian television. Rather than installing and running the embedding server to compute the embeddings yourself, you can simply download the precomputed embedding JSON files for these figures directly into the "KNOWNFACES" directory and run your searches.
To start, we've included the precomputed embeddings we used for last week's experiments: Biden, Carlson, Putin, Trump and Zelenskyy. The complete modified workflow for searching Russia 24 for all appearances of these figures, now incorporating our new channel inventory, is:
#download prerequisites
apt-get -y install parallel
apt-get -y install jq
pip3 install argparse
pip3 install numpy

#create the directories
mkdir /dev/shm/EMBED
mkdir /dev/shm/EMBED/SEARCH
cd /dev/shm/EMBED/SEARCH
mkdir KNOWNFACES

#download the precomputed embeddings:
cd /dev/shm/EMBED/SEARCH/KNOWNFACES
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/faceembeddb/DonaldTrump.json
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/faceembeddb/JoeBiden.json
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/faceembeddb/TuckerCarlson.json
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/faceembeddb/VladimirPutin.json
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/faceembeddb/VolodymyrZelenskyy.json

#return to the search directory so the remaining files are created alongside KNOWNFACES
cd /dev/shm/EMBED/SEARCH

#download the complete channel inventory for Russia 24 and download all of its per-broadcast embedding files:
start=$(curl -s 'https://api.gdeltproject.org/api/v2/tvv/tvv?mode=chaninv' | jq -r '.channels[] | select(.id=="RUSSIA24") | .startDate'); end=20230521; while [[ ! $start > $end ]]; do echo $start; start=$(date -d "$start + 1 day" "+%Y%m%d"); done > DATES
rm -rf JSON; mkdir JSON
time cat DATES | parallel --eta 'wget -q https://storage.googleapis.com/data.gdeltproject.org/gdeltv3/iatv/visualexplorer/RUSSIA24.{}.inventory.json -P ./JSON/'
rm -f IDS; find ./JSON/ -depth -name '*.json' | parallel --eta 'cat {} | jq -r .shows[].id >> IDS'
rm -rf EMBEDDINGS; mkdir EMBEDDINGS
time cat IDS | parallel --eta 'curl -s https://storage.googleapis.com/data.gdeltproject.org/gdeltv3/iatv/visualexplorer_lenses/{}.faceembed.json -o ./EMBEDDINGS/{}.faceembed.json'
wc -l EMBEDDINGS/* | tail -1 > RUSSIA24.CNT

#and finally perform the actual search:
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/search_faceembeddings.py
chmod 755 ./search_faceembeddings.py
time ./search_faceembeddings.py --knownfaces ./KNOWNFACES/ --searchfaces ./EMBEDDINGS/ --threshold 0.52 --outfile ./RUSSIA24-MATCHES.json
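For readers curious what a threshold like 0.52 might mean in practice, here is a minimal Python sketch of threshold-based embedding matching. It is not the code of search_faceembeddings.py: we assume here that faces are compared by Euclidean distance and that a smaller distance means a closer match, and the function name `find_matches`, the names "PersonA" and "PersonB" and the tiny 4-dimensional vectors are all hypothetical, chosen purely for illustration.

```python
import math

def find_matches(known, candidates, threshold=0.52):
    """Return (name, candidate_index, distance) for each candidate within threshold.

    ASSUMPTION: embeddings are compared by Euclidean distance and a smaller
    distance means a closer match. The actual search script may use a
    different metric or interpret its threshold differently.
    """
    matches = []
    for name, known_vec in known.items():
        for idx, cand_vec in enumerate(candidates):
            dist = math.dist(known_vec, cand_vec)
            if dist <= threshold:
                matches.append((name, idx, dist))
    return matches

# Synthetic 4-dimensional vectors, purely illustrative (hypothetical data).
known = {
    "PersonA": [0.0, 0.0, 0.0, 0.0],
    "PersonB": [1.0, 1.0, 1.0, 1.0],
}
candidates = [
    [0.1, 0.0, 0.0, 0.0],  # about 0.1 from PersonA
    [1.0, 1.0, 1.0, 0.7],  # about 0.3 from PersonB
    [0.5, 0.5, 0.5, 0.5],  # 1.0 from both, so outside the threshold
]

matches = find_matches(known, candidates)
for name, idx, dist in matches:
    print(f"{name} matched candidate {idx} at distance {dist:.2f}")
```

Under these assumptions, lowering the threshold makes matching stricter (fewer, more confident matches), while raising it admits more distant faces at the cost of more false positives.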
We hope this helps jumpstart new forms of investigative reporting into the narratives and storytelling structures of Russian television news.