What Google's Video AI API Sees In A Russian Television News Episode Of "Antifake"

In a glimpse of how states may harness the concept of "disinformation" to recast opposing information as falsehood, the new "Antifake" show on Russian television channel 1TV combines commentary, media clips, analysis and high production values to cast negative information as Western "disinformation." What does one of these broadcasts look like through the eyes of Google's Video AI API, which extracts the text, objects and activities depicted onscreen?

To explore this further, we took yesterday's 9:20AM MSK broadcast of Antifake and ran it through the API using the following command:

# Replace [BUCKET] and [VIDEO_FILE] with your own Cloud Storage paths; the feature
# list shown assumes label, shot change and text detection.
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  --data "{
    'inputUri': 'gs://[BUCKET]/[VIDEO_FILE]',
    'features': [
        'LABEL_DETECTION', 'SHOT_CHANGE_DETECTION', 'TEXT_DETECTION'
    ],
    'location_id': 'us-east1',
    'videoContext': {
        'labelDetectionConfig': {
            'labelDetectionMode': 'SHOT_AND_FRAME_MODE',
            'stationaryCamera': false,
            'model': 'builtin/latest'
        },
        'shotChangeDetectionConfig': {
            'model': 'builtin/latest'
        }
    },
    'outputUri': 'gs://[BUCKET]/1TV_20220602_062000_AntiFeik.full.json'
  }" "https://videointelligence.googleapis.com/v1/videos:annotate"
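If you would rather assemble the request body programmatically than hand-quote JSON inside a shell string, the structure of the command above can be sketched in Python. This is a minimal sketch, not the exact request we submitted: the function name build_request is hypothetical, the default feature list is an assumption inferred from the configuration blocks, and the input path is something you must supply yourself.

```python
import json


def build_request(input_uri, output_uri,
                  features=("LABEL_DETECTION", "SHOT_CHANGE_DETECTION", "TEXT_DETECTION")):
    """Return a videos:annotate request body as a plain dict.

    The default feature list is an assumption; adjust it to the
    annotations you actually want back.
    """
    return {
        "inputUri": input_uri,
        "features": list(features),
        "locationId": "us-east1",
        "videoContext": {
            "labelDetectionConfig": {
                "labelDetectionMode": "SHOT_AND_FRAME_MODE",
                "stationaryCamera": False,
                "model": "builtin/latest",
            },
            "shotChangeDetectionConfig": {"model": "builtin/latest"},
        },
        "outputUri": output_uri,
    }


# Example: serialize the body so it can be passed to curl via --data @request.json
# body = build_request("gs://my-bucket/video.mp4", "gs://my-bucket/video.full.json")
# print(json.dumps(body, indent=2))
```

Writing the serialized body to a file and passing it to curl with --data @request.json avoids the nested quoting in the shell version.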

You can download the complete raw JSON annotation file produced by the API with frame-level annotations:

Once you've downloaded the JSON file above, drag and drop it onto the "Your JSON" box at the top right of the Video Intelligence API Visualizer page. Because the video of this broadcast is not downloadable, the Visualizer will report that the JSON file doesn't match its stock demo video, and you won't be able to click on any of the entities to see them in place in the video. You can still view the broadcast in the TV Visual Explorer and estimate roughly where in the broadcast each annotation falls by looking at the timelines and timecode offsets reported by the Visualizer:
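To make that eyeballing a little easier, an offset in seconds can be converted into a position within the broadcast and an approximate wall-clock time. The sketch below assumes the broadcast starts exactly at 9:20:00 MSK, as described above; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def to_timecode(seconds):
    """Format an offset in seconds as HH:MM:SS into the broadcast."""
    total = int(seconds)  # drop the fractional part
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def wall_clock(seconds, start="09:20:00"):
    """Approximate local (MSK) air time for an offset, assuming the given start time."""
    started = datetime.strptime(start, "%H:%M:%S")
    return (started + timedelta(seconds=seconds)).strftime("%H:%M:%S")


# Example: an annotation reported 754.2 seconds into the video
# to_timecode(754.2)  -> position within the broadcast
# wall_clock(754.2)   -> approximate MSK air time
```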

The OCR data is particularly useful for scanning chyrons for the names and affiliations of commentators, to better understand who is telling each story in a given broadcast. It also allows the text of reports shown onscreen to be copy-pasted into Google Translate to understand what they say.
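For those who prefer to pull the onscreen text out of the JSON directly, the following is a minimal sketch of how the OCR annotations might be listed in time order. It makes several assumptions: that the file follows the API's AnnotateVideoResponse layout, that key names may appear in either camelCase or snake_case, and that time offsets may be either strings like "12.5s" or objects with seconds and nanos fields. The helper names are hypothetical.

```python
import json


def offset_seconds(offset):
    """Convert a time offset ("12.5s" or {"seconds": 12, "nanos": 500000000}) to seconds."""
    if offset is None:
        return 0.0
    if isinstance(offset, str):
        return float(offset.rstrip("s"))
    return float(offset.get("seconds", 0)) + float(offset.get("nanos", 0)) / 1e9


def pick(obj, camel, snake, default=None):
    """Read a field that may be stored under a camelCase or snake_case key."""
    if camel in obj:
        return obj[camel]
    return obj.get(snake, default)


def ocr_lines(response):
    """Return (start_seconds, end_seconds, text) tuples sorted by start time."""
    rows = []
    for result in pick(response, "annotationResults", "annotation_results", []):
        for annotation in pick(result, "textAnnotations", "text_annotations", []):
            for seg in annotation.get("segments", []):
                span = seg.get("segment", {})
                start = offset_seconds(pick(span, "startTimeOffset", "start_time_offset"))
                end = offset_seconds(pick(span, "endTimeOffset", "end_time_offset"))
                rows.append((start, end, annotation.get("text", "")))
    return sorted(rows)


# Example usage with the downloaded annotation file:
# with open("1TV_20220602_062000_AntiFeik.full.json", encoding="utf-8") as f:
#     for start, end, text in ocr_lines(json.load(f)):
#         print(f"{start:8.1f}s  {text}")
```

Each printed line can then be pasted into Google Translate, and its offset used to find the matching moment in the TV Visual Explorer.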