The GDELT Project

Gemini 1.5 Pro's 1 Million Token Model: Summarizing In A Single Prompt A Full Week Of A Russian Television News Channel

Earlier today we leveraged the million-token context window of Google's new Gemini 1.5 Pro foundation model to summarize an entire day of one channel of Russian television news in less than a minute. Let's push that boundary even further and process an entire week of a Russian television news channel: five full days of one channel, transcribed by Google's Speech-to-Text API V2 Chirp model and translated into English by the Google Cloud Translation API, then processed by Gemini as a single combined transcript using a single prompt! This pushes the upper bounds of Gemini, clocking in at 976,973 tokens, but Gemini still manages to summarize the entire week of broadcasts in just 1m33s using a single prompt!

While we have not evaluated the summaries below for coverage or accuracy, they demonstrate that it is now possible with a production foundation model to load a complete 5-day week of 24-hour transcripts and analyze them with a single prompt as a single block of text. Moreover, the entire pipeline consists of just three GCP APIs. We use GCP's Chirp ASR to transcribe 5 full days of Russia 1 from February 11 to February 16 (inclusive) into Russian, then use Google Translate to translate the transcripts into English, then hand the complete block of translated transcripts to Gemini 1.5 Pro as a single attachment and summarize it using a single prompt:

Summarize the attached television news transcript.
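As a rough illustration of the final step, the sketch below joins per-day English transcripts into one block and hands it to a model together with the prompt above. It is a minimal sketch under stated assumptions, not the code used for this experiment: the function names, the date-keyed dictionary, and the `=== date ===` separator lines are all hypothetical, and the model call is passed in as a plain function so that nothing here depends on a particular SDK.

```python
from typing import Callable, Mapping

# The prompt quoted in the article.
PROMPT = "Summarize the attached television news transcript."


def combine_transcripts(daily: Mapping[str, str]) -> str:
    """Join per-day transcripts (keyed by ISO date) into one block, in date order.

    The '=== date ===' separator is an assumption made for readability;
    the article does not say how the days were delimited.
    """
    parts = []
    for day in sorted(daily):
        parts.append(f"=== {day} ===\n{daily[day].strip()}")
    return "\n\n".join(parts)


def summarize_week(
    daily: Mapping[str, str],
    call_model: Callable[[str, str], str],
    prompt: str = PROMPT,
) -> str:
    """Send the combined transcript and a single prompt to the model.

    `call_model(prompt, transcript)` is a hypothetical wrapper around
    whatever client is used to reach the model.
    """
    combined = combine_transcripts(daily)
    return call_model(prompt, combined)
```

Passing the model call in as a parameter keeps the concatenation logic easy to test with a stub before spending time on a million-token request.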

The Russian transcript was 1,083,741 tokens. Unfortunately, while it loaded successfully, every attempt to generate any kind of output pushed past the model's maximum limit of 1,048,576 tokens, so we were unable to test it. The 7-day and 6-day English transcripts were also beyond the million-token limit, so we limited ourselves to a 5-day period from February 11 to February 16 (inclusive) of Russia 1.
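The token arithmetic behind this choice can be sketched as follows. This is an illustrative helper, not part of the original experiment: the function names are hypothetical, the per-day token counts passed to `days_that_fit` would have to come from a real token counter, and the optional output reserve reflects the assumption (suggested by the description above) that generated output counts against the same limit as the input.

```python
CONTEXT_LIMIT = 1_048_576  # maximum token limit quoted in the article


def headroom(input_tokens: int, limit: int = CONTEXT_LIMIT) -> int:
    """Tokens left under the limit; a negative value means the input is too large."""
    return limit - input_tokens


def days_that_fit(tokens_per_day, limit: int = CONTEXT_LIMIT, reserve_for_output: int = 0) -> int:
    """Count how many leading days fit under the limit, keeping a reserve for output."""
    total = 0
    fitted = 0
    for count in tokens_per_day:
        if total + count + reserve_for_output > limit:
            break
        total += count
        fitted += 1
    return fitted


print(headroom(1_083_741))  # Russian transcript: -35165 (over the limit)
print(headroom(976_973))    # English 5-day transcript: 71603 tokens to spare
```

With the figures reported here, the Russian transcript overshoots the limit by 35,165 tokens, while the 5-day English transcript leaves 71,603 tokens available.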

The English transcript weighed in at 976,973 tokens and took 93 seconds to produce the following summary:

The provided text appears to be a transcript of several Russian television news programs, covering a wide range of topics including:

International:

Domestic:

Human interest stories:

Overall, the transcript provides a glimpse into the current state of affairs in Russia and the world, as presented by Russian state television. It emphasizes Russia's strength and resilience, while criticizing the West and its policies.

Let's try using our story division prompt:

Divide the attached transcript into a set of discrete stories, make them into a bulleted list with a title and short description and summary of each. Include even the smallest discrete stories that are just barely mentioned.

This yields the following extremely long story summary. Of interest, it lumps the majority of the stories into the "Other" category where it gives them each just a single bullet. It is unclear whether these are all small stories or whether the model essentially "gave up" at a certain point. This took around 280 seconds (4.7 minutes):

Discrete Stories from the Transcript:

    1. War in Ukraine:
    1. Putin's Interview with Tucker Carlson:
    1. Personnel Changes in the Ukrainian Military:
    1. Farmers' Protests in Europe:
    1. US Political Landscape and the 2024 Elections:
    1. Other Notable Events:

What if we ask for better organization of the summary?

Divide the attached transcript into a set of discrete stories, make them into a bulleted list with a title and short description and summary of each. Include even the smallest discrete stories that are just barely mentioned. Organize the list into a hierarchical breakdown of major themes and for each story estimate the total word count of the transcript devoted to it, as well as the word or character offset where it begins in the transcript.

This yields the following high-level breakdown, which is far less useful:

Hierarchical Breakdown of Discrete Stories in the Transcript:

I. Global Conflict and Geopolitics:

    1. War in Ukraine:
    1. Putin's Interview with Tucker Carlson:
    1. Personnel Changes in the Ukrainian Military:
    1. The West's Response to the War:

II. Domestic Issues in Russia:

    1. Economic Development and Social Programs:
    1. Bad Weather and Infrastructure Challenges:
    1. Social Issues and Public Opinion:

III. Human Stories and Cultural Events:

    1. Stories from the Frontlines:
    1. Cultural Events and Celebrations:
    1. Personal Stories and Reflections:

Please note: These word counts and offsets are approximate and may vary slightly depending on the specific interpretation of where each story begins and ends.
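Since the model describes its word counts and offsets as approximate, one way to check them is to locate a distinctive phrase from each story in the transcript and compare its true position with the model's estimate. The sketch below is a hypothetical helper along those lines, assuming the transcript is available as a single string and that words are simply whitespace-delimited; it was not part of the original experiment.

```python
from typing import Optional


def actual_word_offset(transcript: str, phrase: str) -> Optional[int]:
    """Return the number of whitespace-delimited words before the first
    occurrence of `phrase`, or None if the phrase does not appear."""
    char_index = transcript.find(phrase)
    if char_index == -1:
        return None
    return len(transcript[:char_index].split())


def offset_error(transcript: str, phrase: str, claimed_word_offset: int) -> Optional[int]:
    """Return claimed minus actual word offset (positive means the model's
    estimate is too late), or None if the phrase cannot be found."""
    actual = actual_word_offset(transcript, phrase)
    if actual is None:
        return None
    return claimed_word_offset - actual
```

A phrase that cannot be found at all is itself a useful signal, since it may indicate a story title or quote that does not appear verbatim in the transcript.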

Let's add some more instruction:

Divide the attached transcript into a set of discrete stories, make them into a bulleted list with a title and short description and summary of each. Include even the smallest discrete stories that are just barely mentioned. Organize the list into a hierarchical breakdown of major themes and list each story under the theme it best belongs to. For each story estimate the total word count of the transcript devoted to it, as well as the word or character offset where it begins in the transcript. In other words, I want you to read the entire transcript, determine the major themes represented, then make a hierarchical bulleted list of themes and a list of all of the stories that fall under each, with a brief summary of each story.

This isn't much better, suggesting there is an issue with this specific prompt or output structure:

Hierarchical Breakdown of Discrete Stories in the Transcript by Major Themes:

I. Global Conflict and Geopolitics:

II. Domestic Issues in Russia:

III. Human Stories and Cultural Events:

Please note: These word counts and offsets are approximate and may vary slightly depending on the specific interpretation of where each story begins and ends.

Multiple iterations on the above prompt failed to yield better results than our original bulleted list of stories, so let's revert to our original prompt with a few tweaks:

Divide the attached transcript into a set of discrete stories, make them into a bulleted list with a title and short description and list of keywords and few-sentence summary of each. Include even the smallest discrete stories that are just barely mentioned.

This time we get a much shorter set of stories:

Discrete Stories from the Transcript:

    1. War in Ukraine:
    1. Putin's Interview with Tucker Carlson:
    1. Personnel Changes in the Ukrainian Military:
    1. Farmers' Protests in Europe:
    1. US Political Landscape and the 2024 Elections:
    1. Other Notable Events:

Please note: These summaries are brief overviews of the main points of each story. The transcript contains additional details and nuances that are not included in these summaries.

And let's try the same prompt again:

Divide the attached transcript into a set of discrete stories, make them into a bulleted list with a title and short description and list of keywords and few-sentence summary of each. Include even the smallest discrete stories that are just barely mentioned.

This time we get a richer, more detailed story-by-story breakdown that includes summaries for every single story and doesn't lump the majority into the long-tail grouping we originally saw:

Discrete Stories from the Transcript:

    1. The Search for Mom:
    1. New Love Melts the Ice:
    1. Oops and the Brigade:
    1. Dana Borisova's Emergency Hospitalization:
    1. The Housing Issue:
    1. Azerchay – A Taste of Health:
    1. The Power of Music:
    1. Questioning Asago's Effectiveness:
    1. A School for Bell Ringers:
    1. Artificial Intelligence Takes Over Dirty Work:
    1. Farmers' Protests Across Europe:
    1. The Mystery of the Revived Portrait:
    1. A New Generation of Compact Hatchbacks:
    1. The Question of Euthanasia:
    1. The High Cost of Loving a Younger Man:
    1. The Search for Missing Children:
    1. A New Life for Deported Pensioner:
    1. The Rise of Financial Scams:
    1. Celebrating the Chinese New Year in Moscow:
    1. The Power of Love and Family:
    1. A New Generation of Mid-Size Crossovers:
    1. The Importance of Exercise for Health:
    1. The Mystery of the Leaky Head:
    1. The High Cost of Medical Treatment:
    1. The Life and Fate of Svetlana Bezrodnaya:
    1. The Dangers of Self-Prescribing Antibiotics:
    1. The Search for Identity in Ukraine:
    1. The Rise of Nationalism in Europe:
    1. The Question of Trust in International Relations:
    1. The Power of the People:
    1. The Future of the Global Order:
    1. The Human Cost of War:
    1. The Importance of Family:
    1. The Future of Russian Industry:
    1. The Beauty of the Moscow Metro:
    1. The Dangers of Hooking:
    1. The Question of Trust in the West:
    1. The Ukrainian Military Leadership Shake-Up:
    1. The Question of Compromise in Ukraine:
    1. The Future of NATO:
    1. The Power of the Russian Military-Industrial Complex:
    1. The Human Cost of the War in Ukraine:
    1. The Importance of Family Values:
    1. The Question of Trust in the West:
    1. The Rise of Nationalism in Europe:
    1. The Question of Compromise in Ukraine:
    1. The Future of NATO:
    1. The Power of the Russian Military-Industrial Complex:
    1. The Human Cost of the War in Ukraine:
    1. The Importance of Family Values:
    1. The Question of Trust in the West:
    1. The Ukrainian Military Leadership Shake-Up:
    1. The Question of Compromise in Ukraine:
    1. The Future of NATO:
    1. The Power of the Russian Military-Industrial Complex:
    1. The Human Cost of the War in Ukraine:
    1. The Importance of Family Values:
    1. The Question of Trust in the West:
    1. The Ukrainian Military Leadership Shake-Up:
    1. The Question of Compromise in Ukraine:
    1. The Future of NATO:
    1. The Power of the Russian Military-Industrial Complex:
    1. The Human Cost of the War in Ukraine:
    1. The Importance of Family Values:
    1. The Question of Trust in the West:
    1. The Ukrainian Military Leadership Shake-Up:
    1. The Question of Compromise in Ukraine:
    1. The Future of NATO:
    1. The Power of the Russian Military-Industrial Complex:
    1. The Human Cost of the War in Ukraine:
    1. The Importance of Family Values:
    1. The Question of Trust in the West:
    1. The Ukrainian Military Leadership Shake-Up: