Yesterday we explored how to use OpenAI's CLIP to perform natural language visual search of television news, using a large-CPU GCE VM and an off-the-shelf CLI interface to CLIP. Today we're excited to announce an interactive Colab notebook, adapted from this notebook and customized for the Visual Explorer, that lets you select any broadcast from the Visual Explorer and search it using CLIP! You can even experiment with different models and extract the underlying image embeddings for use in other analyses, such as clustering.
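Under the hood, this kind of search works by embedding both the text query and each video frame into a shared vector space and ranking frames by cosine similarity. A minimal sketch of that ranking step, using seeded random vectors as stand-ins for the real CLIP embeddings:

```python
import numpy as np

# Stand-in embeddings: in the real notebook these come from a CLIP model;
# here we use seeded random vectors purely to illustrate the ranking step.
rng = np.random.default_rng(0)
frame_embeddings = rng.normal(size=(100, 512))  # one 512-d vector per frame
query_embedding = rng.normal(size=512)          # embedding of the text query

# Normalize so that a dot product equals cosine similarity.
frame_embeddings /= np.linalg.norm(frame_embeddings, axis=1, keepdims=True)
query_embedding /= np.linalg.norm(query_embedding)

# Score every frame against the query and take the best matches first.
scores = frame_embeddings @ query_embedding
top_frames = np.argsort(-scores)[:5]
print(top_frames)
```

The same frame embeddings can be reused for clustering or other downstream analyses, since they are just fixed-length vectors.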
To get started, open the Colab notebook we've created and select "Save a Copy in Drive" from the "File" menu to make your own copy in your own account that you can edit and use.
Follow the instructions in the third code cell to analyze a different broadcast: set "showid" to the broadcast's ID, then click on the first thumbnail in the Visual Explorer display for that broadcast (top right of the thumbnail grid) and copy the number from the "play=" URL parameter into "starttime" in the notebook.
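If you'd rather not pick the number out of the URL by hand, a small helper can extract the "play=" parameter for you. This function and the example URL are hypothetical illustrations, not part of the notebook:

```python
from urllib.parse import urlparse, parse_qs

def extract_starttime(url):
    """Return the integer value of the 'play=' parameter from a copied URL."""
    params = parse_qs(urlparse(url).query)
    return int(params["play"][0])

# Placeholder URL purely for illustration; copy the real one from your browser.
url = "https://example.com/explorer?play=1530"
print(extract_starttime(url))  # 1530
```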
This notebook requires GPU acceleration: under the "Runtime" menu, choose "Change runtime type", select "GPU" in the popup that appears, and click "Save". Then click the run button at the top left of the first code block and proceed downward in sequence. Note that even with GPU acceleration, the embedding generation stage may take a minute or longer, depending on the length of the broadcast you analyze. You can also test different models by searching the Sentence Transformers list for "clip-ViT" options.
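Swapping models amounts to changing a single string. As a sketch, the CLIP checkpoints published on the Sentence Transformers pretrained-models list include the following (the actual load call is shown commented out, since it downloads model weights):

```python
# CLIP models available through Sentence Transformers; switching models in
# the notebook amounts to changing this one string. Larger models are
# generally more accurate but slower to run.
CLIP_MODELS = ["clip-ViT-B-32", "clip-ViT-B-16", "clip-ViT-L-14"]
MODEL_NAME = CLIP_MODELS[0]

# In the notebook this would then be loaded along the lines of:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer(MODEL_NAME)
print(MODEL_NAME)
```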
Share your interesting results on Twitter and/or drop us a line, and we'd love to feature them on the blog!