The GDELT Project

LMMs & Google's Gemini 1.5 Pro Watching Television News: Overriding Gemini's Sampling To Extend Its Context Window To 2.5 Hours

Google's new Gemini 1.5 Pro LMM accepts videos up to one hour in length and internally samples them into thousands of still frames that it uses as visual surrogates for the video content. The one hour ceiling follows from the model's one million token limit (a 30 minute video typically yields just under half a million tokens). How might we push past this limit to process larger chunks of video? Perhaps the simplest method is to override Gemini's sampling process: we run our own custom sampling algorithm on the raw video file to extract a series of still frames, then reassemble them into a video flipbook that we provide to Gemini as its final input. Let's apply this workflow to load a 2.5 hour video into Gemini, increasing the model's video processing capacity by two and a half times.

The end result is that we are able to scale Gemini beyond its default 1 hour video limit to 2.5 hours of video, with the potential to process just over 7 hours of video in a single prompt using this specific sampling rate and workflow. At this scale, Gemini's whole-broadcast summarization capabilities decline sharply, though it is still able to identify dominant visual themes. Directed Q&A performance is strong: the model can interact with a human analyst to answer various questions about the broadcast, including identifying major figures who appear in it and the context of their appearances.

Here we'll use the same Russia 1 broadcast we've been using in our experiments to date, totaling 2.5 hours of airtime. We'll sample it at 1/4fps (one frame every 4 seconds) and then reassemble the frames into a 2fps flipbook using:

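# Install ffmpeg (Debian/Ubuntu).
apt-get -y install ffmpeg
# List the sampled frames, sorted by path, into FRAMES.TXT.
time find ./Extract/ -depth -name "*.jpg" | sort > FRAMES.TXT
# Pipe the JPEGs into ffmpeg and encode a 2fps H.264 video, padding to even dimensions.
cat $(cat FRAMES.TXT) | ffmpeg -framerate 2 -i - -vcodec libx264 -vf "pad=ceil(iw/2)*2:ceil(ih/2)*2" -y ./gemini.mp4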
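
The commands above assume the sampled frames already exist under ./Extract/. The following is a minimal sketch of how that 1/4fps sampling step might be performed with ffmpeg's fps filter. The input filename (broadcast.mp4) and the frame naming pattern are assumptions for illustration and are not taken from the original workflow.

```shell
#!/usr/bin/env bash
# Sketch only: INPUT and the frame naming pattern are hypothetical.
INPUT="${INPUT:-./broadcast.mp4}"
OUTDIR="./Extract"
SAMPLE_INTERVAL=4   # seconds between sampled frames (1/4fps)

# Approximate number of frames produced for a video of the given
# duration in seconds (ffmpeg may emit one more or one fewer).
expected_frames() {
  echo $(( $1 / SAMPLE_INTERVAL ))
}

# Extract one JPEG every SAMPLE_INTERVAL seconds. Zero-padded names keep
# the frames in chronological order when sorted lexically.
extract_frames() {
  mkdir -p "$OUTDIR"
  ffmpeg -i "$INPUT" -vf "fps=1/${SAMPLE_INTERVAL}" -q:v 2 "$OUTDIR/frame_%06d.jpg"
}

# Run the extraction only when ffmpeg and the input file are available.
if command -v ffmpeg >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  extract_frames
fi

# 2.5 hours of airtime = 9000 seconds
expected_frames 9000
```

For a 2.5 hour broadcast this comes to roughly 2,250 frames, which play back in just under 19 minutes at 2fps.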

We then provide the final video to Gemini. Despite capturing the overall visual narrative arc of 2.5 hours of airtime, the final surrogate video consumes just 334,536 tokens, suggesting we could use this approach to scale up to just over 7 hours of airtime in a single prompt.
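
The "just over 7 hours" estimate can be reproduced with simple proportional arithmetic. The sketch below assumes token usage scales linearly with airtime at this sampling rate and ignores the tokens consumed by the prompt text and the response, so the practical ceiling would be somewhat lower.

```shell
#!/usr/bin/env bash
TOKENS_USED=334536     # tokens consumed by the 2.5 hour flipbook
TOKEN_LIMIT=1000000    # Gemini 1.5 Pro context window
HOURS_COVERED=2.5      # airtime represented by the flipbook

# Hours of airtime that would fit in the full context window,
# assuming linear scaling of tokens with airtime.
max_hours() {
  awk -v used="$TOKENS_USED" -v limit="$TOKEN_LIMIT" -v hours="$HOURS_COVERED" \
    'BEGIN { printf "%.2f\n", hours * limit / used }'
}

max_hours   # prints 7.47
```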

Let's attempt to summarize the video:

Describe this video in detail. It is a sequence of frames sampled at 1/4fps from a longer video and played back at 2fps. Describe the major stories in it. VIDEO:

This yields the relatively unimpressive results:

Testing a range of summarization prompts yields extremely poor macro-scale summaries: most attempts produce results similar to those above and miss the majority of the broadcast's content, even though this approach works well at smaller scales. This suggests that at longer time horizons the more diverse content mix may be confusing the model.

If we can't summarize the broadcast, how about asking for its dominant themes?

What is the dominate theme of the broadcast?

This yields the reasonable macro-level themes:

The dominant theme of the broadcast is International Women's Day. The broadcast features several segments about women, including Putin's address to the nation, a segment about women serving in the Russian Armed Forces, and a segment about the final of the vocal show "Nu-ka vse vmeste!", which featured several female performers.

The broadcast also touches on other themes, such as the ongoing conflict in Ukraine and the release of the romantic film "Led 3". However, these segments are shorter and less prominent than the segments about International Women's Day.

Overall, the broadcast is a celebration of women and their achievements, and it is clear that this is the dominant theme.

Let's shift our focus instead to directed Q&A:

Does President Biden appear in the video?

This works much better:

Yes, President Biden appears in the video. He is shown giving his State of the Union address to Congress.

Scaling up our prompt:

Give me a list of all of the public figures that appear in the video and the timecodes where they appear. For each appearance describe it briefly. Output it in a table.

Yielding:

Public Figure | Timecode | Description
Vladimir Putin | 00:05-00:07, 00:21-00:35, 10:10-10:11 | President of Russia, giving an address to the nation on International Women's Day.
Joe Biden | 00:09-00:11, 04:40-05:07 | President of the United States, giving his State of the Union address to Congress.
Donald Trump | 05:01-05:02 | Former President of the United States, commenting on Biden's State of the Union address on his social media platform Truth Social.
Bernie Sanders | 04:58 | U.S. Senator from Vermont, sitting in the audience during Biden's State of the Union address.
Kipr Machaika | 01:39-01:43, 01:50-01:57, 06:36-06:37, 10:01, 10:50-10:53, 12:28-12:29, 14:34-14:35, 16:06-16:07, 16:15, 17:17-17:23, 17:38-17:39, 18:26-18:27, 18:48, 19:42-19:44 | Honored journalist of the Russian Federation and political observer, being interviewed about his career.
Evgeny Shirokov | 02:02-02:03 | Journalist and Honored Worker of Culture of the Russian Federation, appearing in archival footage.
Mikhail Antonov | 02:06-02:07 | Director of the European Bureau of VGTRK, being interviewed about his career.
Oleg Posovin | 05:44-05:45, 06:07 | Captain of the transport refrigerator "Novy Svet", being interviewed about the capelin fishing season.
Rodion Andreev | 05:47-05:48 | Captain of the transport refrigerator "Novy Svet", being interviewed about the capelin fishing season.
Aleksandr Lizogub | 05:57-05:58 | Head of the Department of Fisheries Organization and State Control in the Maritime Regions of the Severomorsk TU of the Federal Agency for Fisheries, being interviewed about the capelin fishing season.
Oksana Garlinich | 06:05-06:06 | Salesperson-cashier at a fish store, being interviewed about the capelin fishing season.
Vasiliy Korolev | 03:00-03:01, 03:04-03:05 | Hunter, being interviewed about the restoration of the spotted deer population in the Nizhny Novgorod region.
Roman Vorobyov | 02:56-02:57 | Hunting inspector, being interviewed about the restoration of the spotted deer population in the Nizhny Novgorod region.
Sergey Balmaev | 02:49-02:50 | Hunting inspector, counting wild animals in the Nizhny Novgorod region.
Elena Klimenko | 02:40-02:42 | Chairperson of the Sverdlovsk Oblast Election Commission, explaining the use of video cameras and video recorders at polling stations.
Tatyana Merzlyakova | 02:32 | Commissioner for Human Rights in the Sverdlovsk Oblast, explaining the role of public observers in the election.
Marina Makarova | 04:18-04:21 | Honored Military Medic, being interviewed about her work in the Donbass.
Olga Arzamaztseva | 04:27-04:29 | Military doctor, being interviewed about her work in the Donbass.
Olesya Gorbachyova | 04:34-04:35 | Senior nurse of the anesthesiology and intensive care unit, being interviewed about her work in the Donbass.
Viktor Korolyov | 07:36, 08:07, 08:32, 09:00, 09:20-09:21, 09:35, 10:20-10:21, 10:38-10:40, 10:46-10:47, 11:31-11:32, 12:38-12:41, 12:44-12:47, 13:34-13:35, 14:38-14:41, 15:43-15:44, 15:48-15:49, 16:06-16:07, 16:15, 16:26, 16:31, 17:13-17:14, 18:08, 18:23-18:24, 19:42-19:44, 20:12-20:13, 20:20-20:21, 20:36-20:37, 20:44-20:45 | Singer, performing on the show "Privet, Andrey!".
Nikolay Baskov | 06:18, 07:55, 08:35, 09:01, 09:19, 10:20-10:21, 10:38-10:40, 11:31-11:32, 12:38-12:41, 13:34-13:35, 14:38-14:41, 15:43-15:44, 15:48-15:49, 16:06-16:07, 16:15, 16:26, 16:31, 17:13-17:14, 18:08, 18:23-18:24, 19:42-19:44, 20:12-20:13, 20:20-20:21, 20:36-20:37, 20:44-20:45 | Singer, performing on the show "Privet, Andrey!".
Aleksandr Serov | 07:53, 08:32, 09:01, 09:19, 10:20-10:21, 10:38-10:40, 11:31-11:32, 12:38-12:41, 13:34-13:35, 14:38-14:41, 15:43-15:44, 15:48-15:49, 16:06-16:07, 16:15, 16:26, 16:31, 17:13-17:14, 18:08, 18:23-18:24, 19:42-19:44, 20:12-20:13, 20:20-20:21, 20:36-20:37, 20:44-20:45 | Singer, performing on the show "Privet, Andrey!".
Nadezhda Kadysheva | 12:38-12:41, 13:34-13:35, 14:38-14:41, 15:43-15:44, 15:48-15:49, 16:06-16:07, 16:15, 16:26, 16:31, 17:13-17:14, 18:08, 18:23-18:24, 19:42-19:44, 20:12-20:13, 20:20-20:21, 20:36-20:37, 20:44-20:45 | Singer, performing on the show "Privet, Andrey!".
Grigory Kadyshev | 12:38-12:41, 13:34-13:35, 14:38-14:41, 15:43-15:44, 15:48-15:49, 16:06-16:07, 16:15, 16:26, 16:31, 17:13-17:14, 18:08, 18:23-18:24, 19:42-19:44, 20:12-20:13, 20:20-20:21, 20:36-20:37, 20:44-20:45 | Singer, performing on the show "Privet, Andrey!".
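
Note that the timecodes in the table refer to positions in the flipbook, not in the original broadcast. Because each second of flipbook playback shows 2 frames sampled 4 seconds apart, one flipbook second covers roughly 8 seconds of airtime. The helper below is a minimal sketch of that conversion; the function name is hypothetical, and it assumes the model's reported timecodes are accurate, which this experiment does not verify.

```shell
#!/usr/bin/env bash
PLAYBACK_FPS=2      # flipbook playback rate
SAMPLE_INTERVAL=4   # seconds of airtime between sampled frames

# Convert a flipbook MM:SS timecode to an approximate HH:MM:SS offset
# in the original broadcast.
flipbook_to_source() {
  local mm="${1%%:*}" ss="${1##*:}"
  # 10# forces base 10 so values like "08" are not parsed as octal.
  local flip=$(( 10#$mm * 60 + 10#$ss ))
  local src=$(( flip * PLAYBACK_FPS * SAMPLE_INTERVAL ))
  printf '%02d:%02d:%02d\n' $(( src / 3600 )) $(( src % 3600 / 60 )) $(( src % 60 ))
}

flipbook_to_source 04:40   # prints 00:37:20
```

A mapped offset is only accurate to within about 8 seconds, and a value that falls beyond the end of the broadcast would indicate an unreliable timecode.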

And Putin?

What does Putin say in the video?

Yielding the reasonable:

In the video, Putin gives an address to the nation on International Women's Day. He begins by praising the strength, beauty, and wisdom of Russian women. He then goes on to talk about the importance of family and the role that women play in raising children. He also mentions the challenges that women face in the workplace and pledges to do everything possible to support families with children. Finally, he expresses his gratitude to the women serving in the Russian Armed Forces and to the mothers and wives of those who are fighting in Ukraine.

And a different way of asking:

What is the context of Putin's appearance in the video?

Yielding:

Putin's appearance in the video is in the context of International Women's Day, which is celebrated on March 8th in Russia and many other countries. International Women's Day is a day to celebrate the social, economic, cultural, and political achievements of women. It is also a day to raise awareness of the challenges that women face and to call for action to promote gender equality.

Putin's address to the nation on International Women's Day is a way for him to show his support for women and to acknowledge their contributions to Russian society. It is also an opportunity for him to address some of the challenges that women face in Russia, such as gender inequality in the workplace and the burden of childcare.