More Experiments Visually Describing TV News Using Google's Imagen Generative AI Image Captioning

Earlier this month we examined Imagen on Vertex AI, Google's new generative AI foundation model for images, applying it to two complete television news broadcasts. Today we'll extend those experiments to three new broadcasts to test its international capabilities: one each from the Republic of Congo's Tele Congo, South Sudan's SSBC and Taiwan's CTV.

We've made a few updates to the scripts used last time, including automatic adjustment for the resolution of the original broadcast.

Creating the captioned thumbnail displays for each broadcast was as simple as:

#install required utilities...
apt-get install -y parallel

#download and unpack the thumbnails...

#unpack the ZIP files, deleting each archive only if it extracted cleanly...
find . -maxdepth 1 -name "*.zip" | parallel --eta 'unzip {} && rm {}'

#download our API wrapper (NOTE: change the [YOURPROJECTID] in the script to your GCP Project ID):
chmod 755

#now run the API over all of the images
#NOTE: Imagen quotas have increased dramatically this month and you can now safely run many more images in parallel...
time find ./CTV_20230829_090000/ -maxdepth 1 -name "*.jpg" | parallel --eta -j 40 './ {}'
time find ./SOUTHERNSUDAN_20230825_120000/ -maxdepth 1 -name "*.jpg" | parallel --eta -j 40 './ {}'
time find ./TELECONGO_20230825_143000/ -maxdepth 1 -name "*.jpg" | parallel --eta -j 40 './ {}'

#compile the captioning output into the thumbnail displays
chmod 755
time ./ CTV_20230829_090000
time ./ SOUTHERNSUDAN_20230825_120000
time ./ TELECONGO_20230825_143000
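
Before sending 40 parallel requests per broadcast, it can be worth confirming that each directory unpacked correctly and how many thumbnails it holds. The following is only a minimal sketch of such a check and is not part of the original scripts: the `count_thumbnails` helper is a hypothetical name introduced here for illustration, and it assumes the thumbnails sit at the top level of each broadcast directory, as the `find ... -maxdepth 1` commands above imply.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original workflow):
# prints the number of top-level .jpg files in the given directory,
# or returns non-zero if the directory does not exist.
count_thumbnails() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  # Same selection as the captioning commands: top-level *.jpg only.
  find "$dir" -maxdepth 1 -type f -name "*.jpg" | wc -l | tr -d ' '
}

# Report the thumbnail count for each of the three broadcasts.
for show in CTV_20230829_090000 SOUTHERNSUDAN_20230825_120000 TELECONGO_20230825_143000; do
  if n=$(count_thumbnails "./$show" 2>/dev/null); then
    echo "$show: $n thumbnails"
  else
    echo "$show: not unpacked yet"
  fi
done
```

Each count is also the number of API requests that the corresponding `parallel` run will issue, which gives a rough sense of how long a run at `-j 40` will take.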