Visual Explorer: Using Chrome's Google Lens Integration To OCR & Translate Onscreen Text

While the Visual Explorer performs speech recognition and machine translation into English on the Belarusian, Persian, Russian and Ukrainian television news broadcasts, there is often considerable onscreen text that can be important for understanding the context of a given story. You can run your own OCR and translation at scale using the downloadable images ZIP file for each broadcast, but what if you just want to quickly know what a given block of text says, and even have that text seamlessly translated into English? It turns out that the Google Chrome browser's integration with Google Lens makes this trivial!
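For the at-scale route mentioned above, a minimal Python sketch of the batch loop might look like the following. It is only an illustration under stated assumptions: the function name `ocr_frames_in_zip` is hypothetical, the ZIP is assumed to contain one image file per frame with a `.jpg`, `.jpeg` or `.png` extension, and the OCR engine is deliberately left as a callable you supply, since the choice of OCR and translation service is up to you.

```python
import zipfile
from typing import Callable, Dict

# Assumption: frames in the ZIP are stored as ordinary image files.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def ocr_frames_in_zip(zip_source, ocr: Callable[[bytes], str]) -> Dict[str, str]:
    """Return {frame filename: recognized text} for each image in the ZIP.

    zip_source: a path or a binary file-like object holding the images ZIP.
    ocr: any function that takes raw image bytes and returns the text found
         (hypothetical hook; plug in whichever OCR engine you prefer).
    """
    results = {}
    with zipfile.ZipFile(zip_source) as archive:
        # Sort names so frames are processed in filename order.
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(IMAGE_EXTENSIONS):
                continue  # skip anything that is not an image
            text = ocr(archive.read(name)).strip()
            if text:  # keep only frames where some text was recognized
                results[name] = text
    return results
```

Keeping the OCR step as a parameter means the same loop works whether the callable wraps a local OCR library or a cloud API, and a translation step could be chained onto the returned text in the same way.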

Let's take a look at this clip on Russia 1's 60 Minutes from this past Friday, in which Bloomberg News is cited, but the text is in Russian:

From the translated spoken word transcript, we can tell the context is about a speech Putin will be giving soon. But what does the onscreen text say?

To answer that question, just right-click anywhere on the page outside of the video (right-clicking on the video itself is disabled) to bring up the popup menu. Here we right-click on the white background just below the video:

Note how one of the options is "Search images with Google Lens" (that is its name on Windows; on Macs it is called "Search Images with Google"). If you click on that option, the page darkens and the mouse cursor turns into a click-drag box, letting you highlight any rectangular zone on the page. Here we highlight the portion of the video with the text of interest.

Instantly, Chrome opens a new right-hand information bar that shows the selected portion of the page and performs a reverse Google Images search across the open web to find similar images. If the video frame is a strong match for a known image on the web, this will frequently show the original source of the image or at least other major uses of it. This can be a very useful tool for exploring the provenance of an image and verifying more details about it (for example, if the video claims the image shows a protest 10 minutes ago in Tehran, but the image is actually an exact match for a protest in Saudi Arabia five years ago). You'll also notice that all textual regions are highlighted: you can click-drag across any of them to select text and press CTRL-C to copy it.

Note that just below the image in the right-hand bar are three buttons: Search, Text and Translate, with Search highlighted. Click on the Text button and the display will refocus on the highlighted textual regions of the image. You can click-drag on any text to highlight it and Lens will instantly perform a Google search using that text to help you identify provenance and other mentions of that onscreen text passage, just as it did for the image itself.

But of course, if you can't read Russian, simply having the Cyrillic text OCR'd isn't that helpful. Instead, click on the "Translate" button and instantly all of the text is not only translated into English, but the image itself is modified to replace the original onscreen text with its English translation in the original font family, size and color to seamlessly match the original text's appearance! Beneath the image, Lens displays the complete translated text in copyable format, and you can also click-drag on the translated text in the image to copy specific passages. Here we can see the Bloomberg text is not about Putin's speech, but rather about an EU banking proposal and a mention of Ukrainian President Zelensky's visit to Brussels.

With the Chrome browser, translating the onscreen text of any video frame or even searching the open web for similar images is as trivial as right-clicking!