Visual Explorer: Building An At-Scale Face Detection & Embedding RESTful API

Over the past few weeks we've unveiled a series of experiments exploring face detection and analysis over Russian television news, from tracking a year of Tucker Carlson to building a facial co-occurrence graph of a single broadcast and of an entire year of broadcasts to streaming video analysis. To scale these explorations up to much larger volumes of content, we need to keep the models memory-resident and share them across multiple processing pipelines, including realtime interactive use for querying. In other words, we need to wrap them up in a fully parallelized and distributed RESTful service that provides generalized face detection and embedding generation. Here we will create precisely such a service in just 90 lines of Python that wraps the face_recognition library as a RESTful API. It accepts a local disk image, a BASE64-encoded image or a remote image URL and returns a JSON-NL list of all human faces found within, one per line, with the bounding box and 128-dimension embedding vector for each. The underlying embeddings can be used directly to scan for known faces or ingested into any vector search system with ANN (Approximate Nearest Neighbor) support for realtime face search.

For the underlying face detection and embedding service we'll use the face_recognition Python library that wraps dlib's facial analysis functions. While there are existing server wrappers for face_recognition, we wanted to avoid external frameworks and strip the server down to its most rudimentary form, minimizing overhead and maximizing long-term stability and adaptability in a production workflow. Note that this service is designed to be deployed behind the application firewall with only trusted applications able to call it – it is designed for higher-efficiency higher-throughput workflows serving trusted applications running on the same machine or VPC network.

First, install the dependencies:

apt-get update
apt-get -y upgrade
apt-get -y install build-essential
apt-get -y install python3-pip
apt-get -y install cmake
apt-get -y install jq
pip3 install face_recognition
pip3 install requests

Then download the API code:

wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/server_faceembeddings.py
chmod 755 server_faceembeddings.py

To launch the API on port 8088:

./server_faceembeddings.py 8088&

This will take a few seconds and then will print "FaceEmbedding Server Started On Port 8088". Note that this is a live server, so you will want to ensure that the firewall settings for the machine running the server restrict access to that port exclusively to localhost or other trusted machines on the VPC.

Now let's analyze the image below, from a Russia 24 broadcast last month. The image contains four recognizable faces (the fifth, in the background, is too small and blurry to be analyzable), so we would expect the server to output four records for this image.

The server allows for three different methods of providing an image for analysis. The simplest is just to give it a URL of an image from the web:

time curl -s -f -X POST http://localhost:8088/faceembed -H 'Content-Type: application/json' -d '{"id": "demo-fourfaces", "url":"https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/RUSSIA24_20230306_123000_RIK_Rossiya_24-000382.jpg"}'

The API's input is a simple JSON object consisting of two fields: a user-defined "id" and the image data ("url" or "file" or "data"). The "id" field is simply passed as-is by the server into the output and can be anything you want. It is provided so that you can simply pipe the API's results directly to a file or database and know which image the output is from (since the filename might not be meaningful for automatically-named temporary images, etc).
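For callers that prefer Python to curl, the request described above can be sketched roughly as follows. The endpoint and the "id", "url", "file" and "data" field names come from the examples in this post; the helper names, the timeout and the rule that exactly one image field is supplied are our own assumptions for illustration rather than documented server behavior.

```python
import json
import urllib.request

# Endpoint taken from the curl examples; adjust the port to match your server.
API_URL = "http://localhost:8088/faceembed"

def build_request(image_id, url=None, file=None, data=None):
    """Return the JSON body for one image, using exactly one of url/file/data."""
    sources = {"url": url, "file": file, "data": data}
    given = {key: value for key, value in sources.items() if value is not None}
    if len(given) != 1:
        # Assumption: we treat ambiguous requests as a caller error.
        raise ValueError("provide exactly one of url, file or data")
    payload = {"id": image_id}
    payload.update(given)
    return json.dumps(payload).encode("utf-8")

def post_image(body, api_url=API_URL, timeout=60):
    """POST a prepared body to the server and return the raw response text."""
    request = urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a server running, something like post_image(build_request("demo-fourfaces", file="/tmp/image.jpg")) should return the same text that curl prints.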

The API returns the following output. There are four records, one per identified face. The ID is passed as-is from our call. The "bbox" field is the bounding box of the face in top, right, bottom, left format. The "embed" field is the 128-dimension embedding for the face.

{"id": "demo-fourfaces", "bbox": [160, 324, 345, 139], "embed": [-0.17864948511123657, 0.039305590093135834, 0.12228338420391083, 0.03522704914212227, -0.14781200885772705, -0.02983931079506874, 0.06539969146251678, -0.12651926279067993, 0.030300049111247063, -0.05232788622379303, 0.16666513681411743, 0.011832849122583866, -0.26839837431907654, -0.022123921662569046, 0.03421848267316818, 0.12717273831367493, -0.040247753262519836, -0.17930294573307037, -0.13857263326644897, -0.09828142076730728, -0.011200197041034698, 0.05801715329289436, -0.08471457660198212, 0.03712156414985657, -0.04542381316423416, -0.3166368901729584, -0.09851729869842529, -0.02199438214302063, -0.015605790540575981, -0.05341707170009613, 0.028149370104074478, 0.035263556987047195, -0.14548949897289276, -0.0796029269695282, 0.04023762419819832, 0.07273219525814056, -0.04574459046125412, -0.07554158568382263, 0.16594184935092926, -0.04789800941944122, -0.12926824390888214, -0.03398921340703964, 0.1269501894712448, 0.345369815826416, 0.08139482140541077, 0.0392020009458065, 0.05524780601263046, -0.0044038426131010056, 0.08742167800664902, -0.2748998701572418, 0.07904928922653198, 0.0850796103477478, 0.1143764853477478, 0.08914776891469955, 0.06080998107790947, -0.165929913520813, 0.06349259614944458, 0.09026609361171722, -0.20712654292583466, 0.15365798771381378, 0.10713662207126617, -0.22628237307071686, -0.038184069097042084, -0.10768427699804306, 0.22942973673343658, 0.0385909266769886, -0.08706078678369522, -0.08520762622356415, 0.10195647180080414, -0.13876187801361084, -0.06280186027288437, -0.023359447717666626, -0.13174735009670258, -0.1418939232826233, -0.3522339463233948, 0.09327895939350128, 0.4682317078113556, 0.14738281071186066, -0.1443435549736023, 0.03294626995921135, -0.059879302978515625, 0.022247279062867165, 0.07491916418075562, 0.04242248833179474, -0.032564688473939896, -0.04539923369884491, -0.14336425065994263, 0.10870344191789627, 0.2153690904378891, 
0.0027542738243937492, -0.025816356763243675, 0.17369721829891205, 0.028856221586465836, -0.04113640636205673, 0.034996408969163895, 0.09046683460474014, -0.05310817435383797, -0.030664021149277687, -0.052754923701286316, 0.07853519171476364, 0.040413569658994675, -0.13384798169136047, 0.09165585786104202, 0.05322222411632538, -0.18211212754249573, 0.1743214726448059, -0.0001928168348968029, 0.04410813748836517, -0.013994031585752964, -0.03338450565934181, -0.15693722665309906, 0.0008632733952254057, 0.14043861627578735, -0.2556890845298767, 0.19726061820983887, 0.1818806678056717, -0.014775135554373264, 0.1612945944070816, -0.013766753487288952, 0.07613223791122437, -0.051591236144304276, -0.03611420840024948, -0.14214201271533966, -0.09632895141839981, 0.0748121440410614, -0.009095346555113792, 0.08380663394927979, 0.1420854926109314]}
{"id": "demo-fourfaces", "bbox": [229, 917, 304, 842], "embed": [-0.12101098150014877, 0.0425533652305603, 0.023132074624300003, -0.005107216536998749, -0.1328006237745285, -0.027090147137641907, 0.048460133373737335, -0.14943422377109528, 0.13038715720176697, -0.062426045536994934, 0.12775298953056335, -0.06707772612571716, -0.2594861090183258, -0.01782461255788803, 0.03191949427127838, 0.08358856290578842, -0.022448979318141937, -0.18641680479049683, -0.20234745740890503, -0.14988984167575836, -0.10426526516675949, 0.015236261300742626, 0.023684458807110786, 0.03525495529174805, -0.13969913125038147, -0.26837772130966187, -0.07511623203754425, -0.09430965781211853, 0.03896374627947807, -0.06868370622396469, -0.01689750701189041, 0.08724984526634216, -0.17822228372097015, -0.16026270389556885, 0.016122382134199142, 0.07099311053752899, -0.07691973447799683, -0.12123029679059982, 0.229859858751297, 0.011663392186164856, -0.10474741458892822, 0.006521326024085283, 0.09991361945867538, 0.287364661693573, 0.16480302810668945, 0.033667780458927155, 0.08849038928747177, 0.001895921304821968, 0.1757497936487198, -0.29366520047187805, 0.04714662954211235, 0.06492559611797333, 0.21603326499462128, 0.03831672668457031, 0.13656897842884064, -0.16642433404922485, 0.05851947143673897, 0.1065797507762909, -0.19085761904716492, 0.1545921117067337, 0.11251700669527054, -0.031821563839912415, -0.07987836003303528, -0.06844618916511536, 0.2013626992702484, 0.1093655377626419, -0.06591767072677612, -0.0751778781414032, 0.11210036277770996, -0.17642372846603394, -0.044973939657211304, 0.06651598960161209, -0.07219310104846954, -0.17446961998939514, -0.2431880384683609, 0.06840577721595764, 0.35370951890945435, 0.16108918190002441, -0.16895876824855804, -0.004466937854886055, 0.026844877749681473, 0.009236724115908146, 0.04633918032050133, 0.021008862182497978, -0.12101863324642181, -0.0494183674454689, -0.1121143028140068, -0.01846059039235115, 0.23893600702285767, 
0.01528197806328535, -0.048091962933540344, 0.19949769973754883, 0.049821190536022186, -0.0355987511575222, 0.04003162682056427, 0.0551452599465847, -0.12606985867023468, -0.013402381911873817, -0.052962567657232285, 0.07473694533109665, 0.11355161666870117, -0.09953983873128891, 0.020192435011267662, 0.0060191163793206215, -0.1447656750679016, 0.17542147636413574, -0.11431761831045151, 0.02212953008711338, -0.05381240323185921, -0.07818562537431717, -0.09847626090049744, 0.05359233170747757, 0.15847641229629517, -0.264535129070282, 0.09073781967163086, 0.1556558609008789, -0.11625225841999054, 0.11912797391414642, 0.04295163229107857, 0.03202695772051811, 0.04605582356452942, 0.11220956593751907, -0.14312376081943512, -0.19383801519870758, 0.06815055012702942, -0.07877927273511887, 0.030703337863087654, 0.08992695063352585]}
{"id": "demo-fourfaces", "bbox": [266, 564, 310, 520], "embed": [-0.20809172093868256, 0.052269838750362396, 0.11469502002000809, -0.035609979182481766, -0.20176103711128235, -0.012985514476895332, 0.02504940703511238, -0.09055523574352264, 0.11045192927122116, -0.12939046323299408, 0.19537799060344696, -0.01997452788054943, -0.2285156548023224, -0.01994839310646057, -0.002241453155875206, 0.17861686646938324, -0.10324321687221527, -0.1739717274904251, -0.07429373264312744, -0.05580921098589897, -0.03197174146771431, 0.03197320178151131, -0.020700519904494286, 0.030275648459792137, -0.11058095097541809, -0.2818998694419861, -0.023798011243343353, -0.03694014996290207, 0.012616117484867573, -0.0346299484372139, -0.0012244908139109612, 0.150570347905159, -0.1364002525806427, 0.0031426306813955307, 0.022205233573913574, 0.16708089411258698, -0.11921662092208862, -0.09727229923009872, 0.18533068895339966, -0.035548772662878036, -0.2543056905269623, -0.08079206943511963, 0.09654794633388519, 0.2002182900905609, 0.2127065509557724, -0.009975263848900795, 0.059952814131975174, -0.06741592288017273, 0.14966602623462677, -0.32402971386909485, 0.048830002546310425, 0.11126343160867691, 0.07572256773710251, 0.048054039478302, 0.09651260077953339, -0.19907522201538086, -0.010621125809848309, 0.08110444247722626, -0.15248313546180725, -0.02870679460465908, 0.08051858097314835, -0.07162759453058243, -0.058132022619247437, -0.1365705281496048, 0.13688798248767853, 0.15084221959114075, -0.04770538955926895, -0.19559246301651, 0.2559064030647278, -0.2306903749704361, -0.09723961353302002, 0.15481199324131012, -0.07362542301416397, -0.17488059401512146, -0.22921033203601837, 0.02642517164349556, 0.25943082571029663, 0.1618480533361435, -0.05095580965280533, -0.01207863911986351, 0.057172276079654694, -0.028026916086673737, 0.007025041151791811, 0.08596308529376984, -0.027844263240695, -0.052847545593976974, -0.09330470860004425, -0.04507746547460556, 0.2374761402606964, 
-0.03623111546039581, 0.027252614498138428, 0.18791645765304565, 0.08300478011369705, -0.07675457000732422, 0.02879537269473076, 0.05016470327973366, -0.058246511965990067, 0.04282402992248535, -0.1545834094285965, 0.008609519340097904, 0.00862160138785839, -0.0697515606880188, -0.0011912425979971886, 0.09001054614782333, -0.14734968543052673, 0.16690923273563385, -0.12278223782777786, 0.005514219403266907, -0.06699670106172562, -0.12564659118652344, -0.07195210456848145, 0.035306308418512344, 0.13217896223068237, -0.17925094068050385, 0.11923614889383316, 0.13065052032470703, -0.018053555861115456, 0.13238896429538727, 0.08691930025815964, 0.08329896628856659, 0.024142444133758545, -0.057083889842033386, -0.19194866716861725, -0.10846623033285141, 0.07122141122817993, -0.10142354667186737, -0.05437728390097618, 0.0630180686712265]}
{"id": "demo-fourfaces", "bbox": [222, 751, 274, 700], "embed": [-0.08353783190250397, 0.1764019876718521, 0.09170341491699219, -0.0328945517539978, -0.11893700063228607, 0.06088455766439438, -0.009969118982553482, -0.12365452200174332, 0.0670238807797432, -0.09508395195007324, 0.15384680032730103, -0.07348308712244034, -0.31817111372947693, -0.00592381227761507, 0.08348063379526138, 0.145731583237648, -0.15612459182739258, -0.0929897129535675, -0.1913108229637146, -0.09453311562538147, -0.04978782683610916, 0.0866231620311737, 0.03570444881916046, -0.0712832659482956, -0.08592157065868378, -0.3093200623989105, -0.016368849202990532, -0.09970956295728683, 0.024614553898572922, -0.04319508373737335, 0.09076881408691406, 0.049859046936035156, -0.18224146962165833, -0.05505646765232086, 0.012900271452963352, 0.029360752552747726, -0.01948626898229122, -0.024154772982001305, 0.11838910728693008, -0.08596626669168472, -0.16502001881599426, 0.013479238376021385, 0.06159155070781708, 0.19966255128383636, 0.20940464735031128, -0.006025460083037615, 0.03605932742357254, -0.12667210400104523, 0.11432887613773346, -0.2284473031759262, -0.015081238001585007, 0.07833901792764664, 0.07307567447423935, 0.04851754009723663, 0.0586063414812088, -0.11021469533443451, 0.03595983237028122, 0.15978573262691498, -0.20442333817481995, 0.10933297127485275, 0.13632141053676605, -0.06837885081768036, -0.04926469549536705, -0.0045211766846477985, 0.12168696522712708, 0.03792048990726471, -0.052968669682741165, -0.1929343044757843, 0.19224362075328827, -0.13668964803218842, -0.07943649590015411, 0.0488387867808342, -0.10507603734731674, -0.15172986686229706, -0.2893330454826355, -0.0028890185058116913, 0.3378438353538513, 0.07016987353563309, -0.19710145890712738, -0.05218148231506348, -0.03359372541308403, -0.013944575563073158, 0.06041373312473297, 0.04744061455130577, -0.04660700261592865, -0.18223203718662262, -0.0736737847328186, 0.0520886667072773, 0.2590909004211426, 
-0.08627015352249146, 0.0004609824391081929, 0.2323981374502182, 0.0345662422478199, -0.10276246815919876, -0.003525245003402233, 0.08618244528770447, -0.11870571225881577, 0.012613384053111076, -0.18445169925689697, -0.038660723716020584, 0.05273377150297165, -0.1316019743680954, -0.0103328051045537, 0.15619082748889923, -0.18389849364757538, 0.19717319309711456, -0.050354018807411194, -0.0026806280948221684, 0.03995943069458008, -0.03013342246413231, -0.041689276695251465, -0.0126032130792737, 0.24378114938735962, -0.12752659618854523, 0.17438633739948273, 0.23628224432468414, -0.03154789283871651, 0.07062218338251114, 0.037162408232688904, 0.10944858193397522, 0.03268656134605408, 0.04701278358697891, -0.22686129808425903, -0.15018485486507416, 0.037161607295274734, -0.012386120855808258, -0.0718928799033165, 0.052153363823890686]}

If there are no faces present in the image, the API will return a code 200 response, but the contents will be blank, allowing you to safely pipe the API output directly downstream. Only in the case of an unrecoverable error will a code 500 and a JSON error message be returned.
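A downstream consumer of this output might look something like the sketch below. It assumes one complete JSON record per line, treats a blank body as "no faces" and converts the top, right, bottom, left bounding box into a left/top/width/height form. The function name and output keys are our own, and the sample record uses a shortened two-value "embed" purely to keep the example small.

```python
import json

def parse_faces(response_text):
    """Parse one-record-per-line output; a blank body yields an empty list."""
    faces = []
    for line in response_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines (or an entirely blank body) carry no faces
        record = json.loads(line)
        top, right, bottom, left = record["bbox"]
        faces.append({
            "id": record["id"],
            "left": left,
            "top": top,
            "width": right - left,
            "height": bottom - top,
            "embed": record["embed"],
        })
    return faces

# Shortened sample record: the bbox is from the first face above, the embed is not.
sample = '{"id": "demo-fourfaces", "bbox": [160, 324, 345, 139], "embed": [0.1, -0.2]}\n'
print(parse_faces(sample)[0]["width"])   # 185
print(parse_faces(""))                   # []
```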

Of course, fetching arbitrary images from the web is highly error-prone and introduces high latency, so production applications will want to process local images. The highest-throughput and safest way of doing so is to pass the API the filepath of an image from local disk by replacing the "url" field with "file":

cd /tmp/
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/RUSSIA24_20230306_123000_RIK_Rossiya_24-000382.jpg
time curl -s -f -X POST http://localhost:8088/faceembed -H 'Content-Type: application/json' -d '{"id": "demo-fourfaces", "file":"/tmp/RUSSIA24_20230306_123000_RIK_Rossiya_24-000382.jpg"}'

The API will then read the local image and return the same results as above. This is the fastest and most efficient way to use the API and is the method we recommend for real-world applications.

Finally, if the server is running on a separate VM from the one containing the image data, you can upload the image in a BASE64 encoding. We don't recommend this, as it adds substantial overhead and is highly inefficient, but it does allow image data to be provided directly to the API when the two live on different VMs. While there are more efficient mechanisms for packing JSON + raw binary data into contiguous HTTP streams, we chose BASE64 encoding for its simplicity of implementation for this example API.

cd /tmp/
wget https://storage.googleapis.com/data.gdeltproject.org/blog/2022-tv-news-visual-explorer/RUSSIA24_20230306_123000_RIK_Rossiya_24-000382.jpg 
echo -e "{\"id\": \"demo-fourfaces\", \"data\":\"$(base64 -w0 ./RUSSIA24_20230306_123000_RIK_Rossiya_24-000382.jpg)\"}" > request.json
time curl -s -f -X POST http://localhost:8088/faceembed -H 'Content-Type: application/json' -d @./request.json

Here there are two steps. The first is to generate the JSON to be POSTed that contains the BASE64-encoded image and the second then uploads that to the API.
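The first of those two steps can also be done in Python. The sketch below builds the same kind of request body as the echo/base64 command, using standard unwrapped BASE64 to match "base64 -w0"; the helper names are hypothetical. It also makes the overhead visible: BASE64 emits 4 characters for every 3 bytes, so the payload grows by roughly a third before any JSON or HTTP framing.

```python
import base64
import json

def build_data_request(image_id, image_bytes):
    """Return the JSON text for a "data" request carrying the image inline."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"id": image_id, "data": encoded})

def build_data_request_from_file(image_id, path):
    """Read an image from disk and wrap it in a "data" request."""
    with open(path, "rb") as handle:
        return build_data_request(image_id, handle.read())

# 3000 placeholder bytes stand in for a real JPEG here.
body = build_data_request("demo-fourfaces", b"\x00" * 3000)
print(len(json.loads(body)["data"]))   # 4000
```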

That's literally all there is to it! You simply download the server above and launch it, then hand it images via URL, filepath or direct upload and it returns a list of the faces it identified in each image, with their bounding boxes and embeddings!

Note that the code in our server doesn't handle edge cases like packet loss, partial reads, buffer overruns, corrupt data and the myriad other challenges that confront real-world servers running on heavily loaded systems. Thus, for production applications we recommend limiting it to localhost use.

Due to the underlying face recognition library being single-threaded, our API uses a single-threaded server architecture, meaning on a multi-core system it will run on only a single core. If multiple processes submit images to the server in parallel they will be queued and run one at a time. Look closely in the code and you'll notice we override server_bind and set SO_REUSEADDR and SO_REUSEPORT on the server socket. That makes scaling our API up to as many cores as are available on the local machine trivial: multiple copies of our server can bind to the same port and the kernel takes care of queuing requests and distributing them across the servers.

In other words, if our VM has 8 cores, we simply launch the server 8 times like this:

./server_faceembeddings.py 8088&
./server_faceembeddings.py 8088&
./server_faceembeddings.py 8088&
./server_faceembeddings.py 8088&
./server_faceembeddings.py 8088&
./server_faceembeddings.py 8088&
./server_faceembeddings.py 8088&
./server_faceembeddings.py 8088&

Now there are eight copies of the server running, all of them sharing port 8088. As images are submitted to port 8088, the kernel evenly hands them off to all of the servers in parallel. If 8 requests come in at once, one request is given to each of the 8 servers, allowing them to be processed entirely in parallel. If 16 requests come in at the same time, the first 8 are handed to the servers and the remaining 8 are queued and doled out to the servers as they each finish. The kernel handles all of this for us, so all we have to do is run as many servers as we have cores and just submit images to the shared port as fast as we want and let the kernel handle everything from there. If we run one server per core, we have complete parallelism, with the kernel itself handling all of the workload sharding for us.
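For readers who have not opened the server code, the socket trick described above looks roughly like the sketch below. This is an illustration of the general technique rather than a copy of our server: the class and function names are made up, the handler is a stand-in that does no face analysis, and it assumes a platform such as Linux where socket.SO_REUSEPORT is available.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

class ReusePortHTTPServer(HTTPServer):
    """HTTPServer whose listening socket can share its port with sibling processes."""

    def server_bind(self):
        # Both options must be set before bind(); SO_REUSEPORT is what lets
        # several independent processes listen on the same address and port.
        self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        super().server_bind()

class PlaceholderHandler(BaseHTTPRequestHandler):
    """Stand-in handler; a real server would run face detection here."""

    def do_POST(self):
        self.send_response(200)
        self.end_headers()

def make_server(port, host="127.0.0.1"):
    """Create (but do not start) a server bound to the shared port."""
    return ReusePortHTTPServer((host, port), PlaceholderHandler)
```

Each process would then call something like make_server(8088).serve_forever(), and the kernel spreads incoming connections across all of the listeners.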

Of course, for larger systems with large numbers of cores, it would be cumbersome to copy-paste "./server_faceembeddings.py 8088&" to launch dozens of copies of the server. Instead, using xargs and nproc, we can use a single command to launch as many copies of our server as there are cores on the system. On a 64-core system, this will launch 64 copies of the server:

time seq 1 $(nproc) | xargs -n1 -P0 sh -c "./server_faceembeddings.py 8088&"

In real life, networking and other overhead mean that around half of our servers are outside their peak-CPU routines at any given moment, so even under full load with all servers processing images we only achieve 50-60% CPU utilization. In practice it is therefore better to run 1.5x as many servers as there are cores. The command below does this, launching 96 servers on a 64-core system, which can achieve 99% core utilization:

time seq 1 $(( $(nproc) + ($(nproc) / 2) )) | xargs -n1 -P0 sh -c "./server_faceembeddings.py 8088&"
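The shell arithmetic in that command uses integer division, which matters on machines with an odd number of cores. A small Python sketch of the same calculation (the function name is our own):

```python
import os

def server_count(cores):
    """Mirror the shell arithmetic: cores + cores / 2, with integer division."""
    return cores + cores // 2

print(server_count(64))   # 96
print(server_count(8))    # 12
# On the current machine (os.cpu_count() can return None, hence the fallback):
print(server_count(os.cpu_count() or 1))
```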

Alternatively, we can wrap the invocation in a bash one-liner that will automatically restart the server if it crashes, unless it is killed via a SIGHUP (the signal it uses internally to kill itself upon an "exitserver" command):

time seq 1 $(( $(nproc) + ($(nproc) / 2) )) | xargs -n1 -P0 sh -c 'while true; do ./server_faceembeddings.py 8088 || [ $? -eq 129 ] && break; sleep 1; done &'
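The restart logic in that one-liner is easy to misread, so here is a rough Python equivalent. The loop stops after a clean exit or a SIGHUP-style exit and restarts after anything else. Note that the shell reports a SIGHUP death as 129 (128 + 1), whereas Python's subprocess module reports it as -1, so the sketch accepts both. The function name and the max_restarts parameter are our own additions (the shell loop has no restart limit), and the code assumes a Unix-like system.

```python
import signal
import subprocess
import sys
import time

# Exit statuses after which the loop stops: a clean exit, the shell-style
# 128 + SIGHUP code, and subprocess's negative-signal form of the same event.
STOP_CODES = {0, 128 + signal.SIGHUP, -signal.SIGHUP}

def supervise(command, max_restarts=None, delay=1.0):
    """Run command, restarting it after a crash; return the final exit status."""
    restarts = 0
    while True:
        status = subprocess.call(command)
        if status in STOP_CODES:
            return status
        if max_restarts is not None and restarts >= max_restarts:
            return status  # hypothetical safety valve, not in the shell version
        restarts += 1
        time.sleep(delay)
```

A single supervised server would then be started with something like supervise(["./server_faceembeddings.py", "8088"]).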

Now with those 96 servers running, how can we leverage them in a parallel-processing pipeline? Let's try analyzing two weeks of Russia 1 coverage, totaling 1 million seconds of airtime across 245 broadcasts.

First, we'll download the inventory files for Russia 1 for the last two weeks and compile a list of all broadcasts from those days and then download all of their Visual Explorer frames (each frame represents 4 seconds of airtime). These commands rely on GNU parallel, so run "apt-get -y install parallel" first if it is not already installed:

#GET THE SHOWS
start=20230401; end=20230413; while [[ ! $start > $end ]]; do echo $start; start=$(date -d "$start + 1 day" "+%Y%m%d"); done > DATES
mkdir JSON
time cat DATES | parallel --eta 'wget -q https://storage.googleapis.com/data.gdeltproject.org/gdeltv3/iatv/visualexplorer/RUSSIA1.{}.inventory.json -P ./JSON/'
rm IDS; find ./JSON/ -depth -name '*.json' | parallel --eta 'cat {} | jq -r .shows[].id >> IDS'

#DOWNLOAD AND UNPACK THE IMAGES
mkdir IMAGES
time cat IDS | parallel --eta 'wget -q https://storage.googleapis.com/data.gdeltproject.org/gdeltv3/iatv/visualexplorer/{}.zip -P ./IMAGES/'
time find ./IMAGES/ -depth -name '*.zip' | parallel --eta 'unzip -n -q -d ./IMAGES/ {} && rm {}'
time find ./IMAGES/ -depth -name "*.jpg" | wc -l
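The date loop and the jq extraction in the first step can be sketched in Python as follows. The inventory URL pattern and the "shows" / "id" fields are taken from the commands above; the function names are hypothetical and the show IDs in the last line are placeholders, not real broadcast identifiers.

```python
from datetime import date, timedelta

# URL pattern copied from the wget command above.
INVENTORY_URL = (
    "https://storage.googleapis.com/data.gdeltproject.org/"
    "gdeltv3/iatv/visualexplorer/{station}.{day}.inventory.json"
)

def date_range(start, end):
    """Return every day from start to end inclusive, formatted as YYYYMMDD."""
    days = []
    current = start
    while current <= end:
        days.append(current.strftime("%Y%m%d"))
        current += timedelta(days=1)
    return days

def inventory_urls(station, start, end):
    """Build one inventory URL per day for the given station."""
    return [INVENTORY_URL.format(station=station, day=day)
            for day in date_range(start, end)]

def show_ids(inventory):
    """Equivalent of jq -r .shows[].id on one parsed inventory file."""
    return [show["id"] for show in inventory["shows"]]

urls = inventory_urls("RUSSIA1", date(2023, 4, 1), date(2023, 4, 13))
print(len(urls))   # 13
print(urls[0])
print(show_ids({"shows": [{"id": "SHOW_A"}, {"id": "SHOW_B"}]}))   # ['SHOW_A', 'SHOW_B']
```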

In all, this yields 269,969 images to be processed. Using GNU parallel, we can process these quarter-million images in full parallelism across all 96 API servers running on our 64-core VM with just a single line:

apt-get -y install parallel
mkdir EMBEDDINGS
time find /your/directory/IMAGES/ -depth -name "*.jpg" | parallel --eta -j 96 --line-buffer "curl -s -f -X POST http://localhost:8088/faceembed -H 'Content-Type: application/json' -d '{\"id\": \"{/.}\", \"file\":\"{}\"}' | flock -x -w 20 ./EMBEDDINGS/{= s/.*\/(.*?)\-\d+\.jpg\$/\$1/ =}.json -c \"cat >> ./EMBEDDINGS/{= s/.*\/(.*?)\-\d+\.jpg\$/\$1/ =}.json\""

The nested double and single quote marks necessitate the cumbersome escaping backslashes that make the command more difficult to read (note that if you swap the outer double quote marks for single quote marks, the regex at the end will not function properly). The regular expression at the end of the command is optional: it takes each image filename like "/your/directory/RUSSIA24_20230306_123000_RIK_Rossiya_24-000382.jpg", extracts just the broadcast's name like "RUSSIA24_20230306_123000_RIK_Rossiya_24" and uses that as the name of the file to write the embeddings to. Thus, the embeddings for each image will be written to a JSON file for that specific broadcast, and the ID field for each face will be set via the "{/.}" to the image filename minus the path and .jpg extension. To prevent partial writes due to overlapping appends, we pipe the output of curl to flock to perform the actual append – this allows the curl calls to proceed in parallel and only locks the appending of the output to the destination file.
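Stripped of the shell escaping, the three pieces of that command (the regex, the "{/.}" ID and the locked append) can be sketched in Python as below. The function names are our own, the code is Unix-only because it relies on fcntl, and unlike "flock -w 20" it waits for the lock indefinitely rather than giving up after 20 seconds.

```python
import fcntl
import os
import re

def broadcast_name(image_path):
    """Same substitution as the regex in the command: drop the path and frame suffix."""
    return re.sub(r".*/(.*?)-\d+\.jpg$", r"\1", image_path)

def image_id(image_path):
    """Equivalent of GNU parallel's {/.}: basename without the final extension."""
    return os.path.splitext(os.path.basename(image_path))[0]

def append_locked(path, text):
    """Append text to path while holding an exclusive lock on the file."""
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            handle.write(text)
            handle.flush()  # make sure the data is written before unlocking
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)

path = "/your/directory/RUSSIA24_20230306_123000_RIK_Rossiya_24-000382.jpg"
print(broadcast_name(path))   # RUSSIA24_20230306_123000_RIK_Rossiya_24
print(image_id(path))         # RUSSIA24_20230306_123000_RIK_Rossiya_24-000382
```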

On our 64-core system running 96 API servers, the full two weeks of Russia 1 airtime, totaling more than 1 million seconds, took just over 60 minutes to analyze, showing just how efficient this workflow is.
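To put those numbers in perspective, here is the back-of-the-envelope arithmetic, using only figures stated in this post. The elapsed time is rounded down to exactly 60 minutes, so the resulting rates are slight overestimates.

```python
frames = 269_969             # images processed
airtime_per_frame = 4        # seconds of airtime each frame represents
servers = 96                 # API servers running in parallel
elapsed_seconds = 60 * 60    # "just over 60 minutes", rounded down

airtime_seconds = frames * airtime_per_frame
frames_per_second = frames / elapsed_seconds
realtime_factor = airtime_seconds / elapsed_seconds
latency_per_server = servers / frames_per_second

print(airtime_seconds)                 # 1079876
print(round(frames_per_second))        # 75
print(round(realtime_factor))          # 300
print(round(latency_per_server, 2))    # 1.28
```

In other words, the pipeline analyzed airtime roughly 300 times faster than realtime, with each server spending on the order of 1.3 seconds per image including request overhead.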

To exit all of these servers, just run the command below, which calls the "exitserver" command on each server. Note that due to the single-threaded nature of Python's HTTPServer, this endpoint internally calls the interpreter's signal handler to exit the application, which means that on occasion you might have to run this command multiple times if some servers don't properly exit the first time.

time seq 1 $(ps -ef | grep server_faceembeddings.py | wc -l) | parallel -j 1 --eta "curl -s -f -X POST http://localhost:8088/exitserver -H 'Content-Type: application/json' -d '{}'"
ps -ef| grep server_faceembeddings | wc -l

If you encounter problems killing the servers, you can use the following to hard-kill them all:

ps -ef | grep server_faceembeddings | grep -v grep | awk '{print $2}' | parallel 'kill {}'

Congratulations, you now have a production-scale fully parallelized face detection and embedding RESTful API service that can be dropped into any workflow to perform at-scale facial analysis, with the resulting JSON being piped directly to downstream pipelines or databases for everything from face clustering to realtime interactive facial search.
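As a parting illustration of the "scan for known faces" use case, the sketch below compares one embedding against a small dictionary of labeled embeddings using Euclidean distance. The 0.6 cutoff mirrors what we understand to be the default tolerance of the face_recognition library's comparison functions and should be tuned for your own data; the function names are hypothetical and the three-value vectors are toy stand-ins for real 128-dimension embeddings.

```python
import math

def face_distance(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_matches(query, known, tolerance=0.6):
    """Return the names in known within tolerance of query, nearest first."""
    scored = [(face_distance(query, embedding), name)
              for name, embedding in known.items()]
    return [name for distance, name in sorted(scored) if distance <= tolerance]

# Toy vectors only; real embeddings come from the "embed" field of the API output.
known = {"person_a": [0.0, 0.0, 0.0], "person_b": [1.0, 0.0, 0.0]}
print(find_matches([0.1, 0.0, 0.0], known))   # ['person_a']
```

A linear scan like this is fine for a handful of known faces; for large collections the same embeddings would go into an ANN-capable vector search system instead.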