Last week we showcased how to create a full-fledged massively parallel and scalable face detection and embedding API with just a few dozen lines of Python that accepts an image, identifies all of the human faces within, and returns a list of each face and a 128-dimension vector representation that allows for fast facial search at scale. To demonstrate it in action, we applied it to two weeks of Russian television channel Russia 1, processing 1 million seconds of airtime across 245 broadcasts in just over 1 hour on a 64-core VM. Today we'll show how to use those embeddings to perform fast scalable facial search across Russian television.
In a production application, you would pipe the embeddings from the API server directly into an ANN (Approximate Nearest Neighbor) indexing service like Vertex AI Matching Engine or Elasticsearch ANN. Using ANN indexing, you could create an interactive interface that allows true subsecond realtime facial search: just upload an image and within a fraction of a second it will return all of the video clips containing any of the faces from the uploaded image. What if you just want to run a search periodically or you want to perform continual scanning, such as monitoring Russian television for appearances of Zelenskyy 24/7? Here we'll show how to use a trivial Python script to perform at-scale search using the embeddings produced by the API by searching two weeks of Russia 1 coverage totaling 214,232 human faces detected across 1 million seconds of airtime for Biden, Putin and Zelenskyy in just 23 seconds.
First, complete the API demo, both installing the server and using it to compute the embeddings for two weeks of Russia 1. We'll assume that you have saved all of the embeddings to "./EMBEDDINGS/".
Next, we'll download a library of faces that we want to search for into a subdirectory called "KNOWNFACES":
mkdir KNOWNFACES
wget https://upload.wikimedia.org/wikipedia/commons/9/9c/Volodymyr_Zelensky_Official_portrait.jpg
mv Volodymyr_Zelensky_Official_portrait.jpg KNOWNFACES/VolodymyrZelenskyy.jpg
wget https://www.whitehouse.gov/wp-content/uploads/2021/04/P20210303AS-1901-cropped.jpg
mv P20210303AS-1901-cropped.jpg KNOWNFACES/JoeBiden.jpg
wget http://static.kremlin.ru/media/events/press-photos/orig/41d3e9385e34ebc0e3ba.jpeg
mv 41d3e9385e34ebc0e3ba.jpeg KNOWNFACES/VladimirPutin.jpg
You can search for as many faces as you want – just copy them all into this directory as ".jpg" images and name each with the name that you want to appear in the search results. Note, do not include spaces in the names (you can use underscores or dashes as needed). Make sure, however, that there is only one face in each image – if there are multiple faces you may get an error and/or the recognition process will yield unpredictable results. If the only photograph you have has multiple faces, simply crop it until only the desired face is visible and save that image to the KNOWNFACES directory.
Now, compute the embeddings for all of the known faces (this assumes the API service is running on your computer on port 8088):
time find ./KNOWNFACES/ -depth -name "*.jpg" | parallel --eta "curl -s -f -X POST http://localhost:8088/faceembed -H 'Content-Type: application/json' -d '{\"id\": \"{/.}\", \"file\":\"{}\"}' >> {.}.json"
That will create a JSON file for each known face that contains its embedding. If you add or update any images, just rerun the command above. Because the command appends to each output file with ">>", delete the existing ".json" files before rerunning so that stale or duplicate results do not accumulate. If you delete an image, make sure to delete its corresponding ".json" file in the KNOWNFACES directory as well.
Now it's time to search our embedding database for those known faces! The script here uses brute-force searching, where it takes each face in the EMBEDDINGS directory and compares it against each of the known faces by comparing their embedding vectors. In this case, it will compare all 214,232 faces extracted from those two weeks of Russia 1 against all three known faces, performing 642,696 total face comparisons in just 23 seconds on a single processor, showing how efficient even brute-force search is!
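To make the brute-force idea concrete, here is a minimal sketch of that comparison loop. It is not the downloadable script itself: the function name "brute_force_search", the input layout and the use of Euclidean distance between embedding vectors are all assumptions made for illustration, and the toy vectors have 3 dimensions rather than 128.

```python
import numpy as np

def brute_force_search(known, search, threshold=0.52):
    """Compare every search embedding against every known embedding.

    known:  dict mapping a face name to its embedding vector (hypothetical layout)
    search: list of (frame_id, embedding_vector) pairs (hypothetical layout)
    Returns one record per (known face, search face) pair at or below threshold.
    """
    names = list(known)
    known_matrix = np.array([known[n] for n in names], dtype=float)  # shape (K, D)
    matches = []
    for frame_id, vec in search:
        # Assumption: similarity is measured as Euclidean distance.
        # Broadcasting computes the distance to all K known faces at once.
        dists = np.linalg.norm(known_matrix - np.asarray(vec, dtype=float), axis=1)
        for name, dist in zip(names, dists):
            if dist <= threshold:
                matches.append({"face": name, "id": frame_id, "dist": float(dist)})
    return matches

# Toy data: two made-up known faces and two made-up frames.
known = {"PersonA": [0.0, 0.0, 0.0], "PersonB": [1.0, 0.0, 0.0]}
search = [("frame-000001", [0.3, 0.4, 0.0]), ("frame-000002", [5.0, 5.0, 5.0])]
found = brute_force_search(known, search, threshold=0.52)
print(found)
```

The cost grows with the number of search faces times the number of known faces, which is why a small library of known faces stays fast even without an index.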
Install these two dependencies:
pip3 install argparse
pip3 install numpy
And download the search script:
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/search_faceembeddings.py
chmod 755 ./search_faceembeddings.py
Then just run it via:
time ./search_faceembeddings.py --knownfaces ./KNOWNFACES/ --searchfaces ./EMBEDDINGS/ --threshold 0.52 --outfile ./matches.json
That's literally all there is to it! The "knownfaces" parameter tells it the directory to find the JSON files containing the embeddings of the known faces to search for. The "searchfaces" parameter tells it the directory to find the embeddings you want to search. The "threshold" tells it how similar the face must be to match. The value ranges from 0.0 to 1.0, with lower values requiring a closer match. In practice we've found that 0.52 is a good value to use, but to reduce false negatives, you can set this higher and then just post-filter the results (we'll show this in a moment). Finally, "outfile" tells it where to write the matches.
The search above will take just 23 seconds on a single processor, showing how fast and efficient this search is. You could trivially scale this up to search years of coverage in just tens of seconds on a larger multi-CPU machine, even without the benefit of ANN indexed search.
The "matches.json" file contains all of the matches whose distance was at or below "threshold". Note that frames will not be in order because our API demo computes all of the embeddings in parallel, so to look for contiguous clips you'll need to sort the results. Also note that even in a contiguous appearance of a person, they might turn to the side or away from the camera for a few seconds, so for a production application where you look for sequences of frames, you'll want to allow for gaps of a few frames in a given sequence.
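The sorting and gap handling described above could be sketched as follows. This assumes that each "id" ends in a hyphen followed by a numeric frame index (as in "RUSSIA1_20230413_143000_60_minut-000254") and that everything before the final hyphen identifies the broadcast. The function name "group_into_clips" and the "max_gap" parameter are hypothetical and not part of the search script.

```python
from collections import defaultdict

def group_into_clips(matches, max_gap=3):
    """Group match records into runs of nearby frames per (face, broadcast).

    Frames whose indexes differ by at most max_gap stay in the same clip.
    Returns a list of (face, broadcast, first_frame, last_frame) tuples.
    """
    frames = defaultdict(set)
    for m in matches:
        # Assumption: the text after the last "-" is the frame index.
        broadcast, _, frame = m["id"].rpartition("-")
        frames[(m["face"], broadcast)].add(int(frame))

    clips = []
    for (face, broadcast), nums in sorted(frames.items()):
        ordered = sorted(nums)  # matches arrive out of order, so sort first
        start = prev = ordered[0]
        for n in ordered[1:]:
            if n - prev > max_gap:  # gap too large: close the current clip
                clips.append((face, broadcast, start, prev))
                start = n
            prev = n
        clips.append((face, broadcast, start, prev))
    return clips

show = "RUSSIA1_20230413_143000_60_minut"
demo = [{"face": "JoeBiden", "id": f"{show}-{n:06d}"} for n in (254, 253, 261, 330, 331)]
clips = group_into_clips(demo, max_gap=10)
print(clips)
```

A suitable "max_gap" depends on how many frames per second were extracted, so treat the value here as a placeholder to tune.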
The sample lines below show what the output looks like. Each match appears on its own line. If multiple known faces are seen in a single frame, they will each appear on their own line. The "face" contains the filename of the known face, while "id" is the ID from the original embedding file and "dist" is the distance (lower numbers mean more similar) of this face from that known face. The "threshold" value in the search script sets the upper bound (distance scores above that will not be output), but to reduce false negatives, you can set the threshold higher and then post-process the matches.json file to use a smaller threshold value.
{"face": "VolodymyrZelenskyy", "id": "RUSSIA1_20230413_143000_60_minut-000064", "dist": 0.47301553113218947}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000261", "dist": 0.5183728678106501}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000254", "dist": 0.418819051953452}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000253", "dist": 0.496182468426782}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000330", "dist": 0.4855730285995884}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000279", "dist": 0.5142073593669766}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000353", "dist": 0.48817007520391825}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000280", "dist": 0.5023710334959888}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000331", "dist": 0.43101854885473184}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000329", "dist": 0.4842155185114112}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000337", "dist": 0.48676776215012996}
{"face": "VolodymyrZelenskyy", "id": "RUSSIA1_20230413_143000_60_minut-000990", "dist": 0.4731232430441676}
{"face": "VolodymyrZelenskyy", "id": "RUSSIA1_20230413_143000_60_minut-000997", "dist": 0.5144504609005843}
{"face": "VolodymyrZelenskyy", "id": "RUSSIA1_20230413_143000_60_minut-000998", "dist": 0.5175637021561289}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-001079", "dist": 0.49576076736872315}
{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-001077", "dist": 0.4326693897508657}
{"face": "VolodymyrZelenskyy", "id": "RUSSIA1_20230413_143000_60_minut-001598", "dist": 0.5021093566913623}
{"face": "VolodymyrZelenskyy", "id": "RUSSIA1_20230413_143000_60_minut-001597", "dist": 0.3826476540439994}
{"face": "VolodymyrZelenskyy", "id": "RUSSIA1_20230413_143000_60_minut-001596", "dist": 0.50220851831088}
{"face": "VolodymyrZelenskyy", "id": "RUSSIA1_20230413_143000_60_minut-001668", "dist": 0.4540704015138464}
{"face": "VladimirPutin", "id": "RUSSIA1_20230407_130000_Vesti-000092", "dist": 0.48348717602394026}
{"face": "VladimirPutin", "id": "RUSSIA1_20230407_130000_Vesti-000077", "dist": 0.46442282337637186}
{"face": "VladimirPutin", "id": "RUSSIA1_20230407_130000_Vesti-000096", "dist": 0.13760758971638978}
{"face": "VladimirPutin", "id": "RUSSIA1_20230407_130000_Vesti-000090", "dist": 0.48544144422662633}
{"face": "VladimirPutin", "id": "RUSSIA1_20230407_083000_60_minut-001512", "dist": 0.4870403480000896}
{"face": "VladimirPutin", "id": "RUSSIA1_20230405_115500_Kto_protiv-000752", "dist": 0.5094790478068193}
{"face": "VladimirPutin", "id": "RUSSIA1_20230405_083000_60_minut-000283", "dist": 0.46563555236676013}
{"face": "VladimirPutin", "id": "RUSSIA1_20230405_083000_60_minut-000288", "dist": 0.4723670707861306}
{"face": "VladimirPutin", "id": "RUSSIA1_20230405_083000_60_minut-000285", "dist": 0.5183854914753822}
{"face": "VladimirPutin", "id": "RUSSIA1_20230405_083000_60_minut-000367", "dist": 0.4998221341274246}
{"face": "VladimirPutin", "id": "RUSSIA1_20230405_083000_60_minut-000287", "dist": 0.4869906394235959}
In all, the search finds 1,231 total matches, with 605 frames containing Putin, 335 containing Biden and 291 containing Zelenskyy.
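The post-filtering mentioned earlier, along with per-face counts like those above, can be sketched in a few lines. This assumes matches.json holds one JSON object per line, as in the sample output. The function name "filter_matches" and the 0.45 cutoff are illustrative choices only, not recommended values.

```python
import json
from collections import Counter

def filter_matches(lines, max_dist):
    """Keep only match records whose "dist" is at or below max_dist."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record["dist"] <= max_dist:
            kept.append(record)
    return kept

# Three records copied from the sample output.
sample = [
    '{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000261", "dist": 0.5183728678106501}',
    '{"face": "JoeBiden", "id": "RUSSIA1_20230413_143000_60_minut-000254", "dist": 0.418819051953452}',
    '{"face": "VladimirPutin", "id": "RUSSIA1_20230407_130000_Vesti-000096", "dist": 0.13760758971638978}',
]
strict = filter_matches(sample, max_dist=0.45)
per_face = Counter(record["face"] for record in strict)
print(per_face)

# To run against the real file instead:
#   with open("matches.json") as f:
#       strict = filter_matches(f, max_dist=0.45)
```

Running the search once with a generous threshold and filtering afterwards lets you try several cutoffs without recomputing any comparisons.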
That's all there is to it! Again, a production application would use an ANN indexing service to allow for true realtime search, but if you only need to search periodically, you can use this brute-force script. You could also use the matching code from this script and pair it with the API to build a realtime 24/7 scanning system that uses "yt-dlp" or a similar tool to ingest a streaming video in realtime, pipe it into ffmpeg to generate a sequence of frames, analyze each frame through the API service and then feed the resulting embeddings through this matching code to search for known faces, yielding a true realtime search service!