The GDELT Project

Gemini 1.5 Pro's 1 Million Token Model: Summarizing Half A Day Of A Russian Television News Channel

With the release of Google's new Gemini 1.5 Pro model, for the first time we have a publicly accessible production-grade LLM with a one million token limit, whereas 100K tokens has typically been the upper bound until now, with most models maxing out at around 4K, 8K or 32K. Gemini's 1M token limit opens an entirely new frontier of capabilities, completely eliminating the need for prompt chaining, pyramidal distillation and other complex and fragile workarounds that can dramatically increase hallucination, reduce coherence and otherwise degrade the quality of results. Instead, with Gemini 1.5 Pro, we can hand the model the machine-generated transcripts of a complete day of Russian television news as-is, as a single TXT file, and interact with it directly, without any additional work.

To explore this in more detail, we'll start with half a day of Russian television news from Russia 1, machine transcribed using the GCP Speech-to-Text API (Chirp), along with an English machine translation via the Google Cloud Translation API. This half-day transcript totals 70,000 tokens, placing it above the capacity of all but a handful of models today. To analyze it, we'll use the AI Studio interface to Gemini 1.5 Pro, which makes it easy to experiment with the model interactively.

Remarkably, Gemini is able to summarize the entire half-day transcript in under a minute and a half, sometimes in as little as 45 seconds. Unlike all of our previous LLM experiments to date across the major commercial models, Gemini actually produces superior results from the original native Russian machine transcription compared with the Google Translate English translation. This is in marked contrast to previous generation models, which typically exhibited dramatically higher quality results from English machine translations of the source content. Intriguingly, unlike our tests earlier today applying Gemini 1.5 Pro to much smaller content samples, where we saw strong differences across prompt instructions, this time we see far more limited changes between basic summarization and asking for more detailed story breakdowns, suggesting that adherence to prompt instructions may weaken at scale – a question we will explore in more detail in our forthcoming experiments. Overall, Gemini was able to load half a day of machine-generated transcripts of a single Russian television news channel and summarize it in under a minute across the majority of our tests, marking a remarkable milestone in publicly accessible large-window LLM capability.
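For those working with their own transcripts, the token count can be checked programmatically before handing anything to the model. The sketch below uses the google-generativeai Python SDK; we ran our tests through the AI Studio web interface, so the file name and model identifier here are illustrative placeholders rather than our exact setup:

# Minimal sketch: verify that a half-day transcript fits within Gemini 1.5 Pro's window.
# The file name and model identifier are placeholders, not the exact ones used in this post.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

with open("russia1_halfday_ru.txt", encoding="utf-8") as f:
    transcript = f.read()

# Returns the total token count; the half-day transcript comes in around 70,000 tokens,
# well below the 1M token limit.
print(model.count_tokens(transcript).total_tokens)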

Let's start with a simple summarization prompt:

Summarize the attached television news transcript
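We ran this prompt interactively in AI Studio, but the same request can be issued against the Gemini API. The sketch below is a rough programmatic equivalent using the google-generativeai Python SDK, with placeholder file and model names; it simply passes the prompt and the full transcript together as a single request:

# Rough API equivalent of the AI Studio prompt above (a sketch, not our exact workflow).
# The file name and model identifier are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

with open("russia1_halfday_en.txt", encoding="utf-8") as f:
    transcript = f.read()

prompt = "Summarize the attached television news transcript"

# The instruction and the full transcript are sent as two parts of one request.
response = model.generate_content([prompt, transcript])
print(response.text)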

Let's try the translated English transcript first:

Summary of Television News Transcript

This transcript covers a variety of news stories from Russia and around the world, with a focus on the following key topics:

1. Russia-Ukraine Conflict:

2. International Relations:

3. Domestic Issues in Russia:

4. Other Notable Events:

Overall, the transcript provides a snapshot of various news stories with a focus on the ongoing conflict in Ukraine and its international implications. It also highlights domestic issues and events within Russia.

Running the English transcript a second time yields nearly identical results:

Summary of Television News Transcript

This transcript covers a variety of news stories from Russia and around the world, with a focus on the following key topics:

1. Russia-Ukraine Conflict:

2. International Relations:

3. Domestic Issues in Russia:

4. Other Notable Events:

Overall, the transcript provides a snapshot of various news stories with a focus on the ongoing conflict in Ukraine and its international implications. It also highlights domestic issues and events within Russia.

Let's try the original Russian transcript. The resulting summary covers a few of the same stories, but emphasizes a very different selection:

Summary of Television News Transcript

This transcript covers a variety of news stories and segments from the "Vesti" and "60 Minutes" programs on the Russia channel. Here are the key topics covered:

Military:

Politics:

Other news:

Additional segments:

Overall, this transcript provides a snapshot of the current news and political landscape in Russia and internationally, with a particular focus on the ongoing conflict in Ukraine.

And a second time:

Summary of Television News Transcript

This transcript covers a variety of news stories and segments from the "Vesti" and "60 Minutes" programs on the Russia channel. Here are the key topics covered:

Military:

Politics:

Other news:

Additional segments:

Overall, this transcript provides a snapshot of the current news and political landscape in Russia and internationally, with a particular focus on the ongoing conflict in Ukraine.

Let's try a different prompt that asks for a more detailed summary to see if we can better align the summaries of the two transcripts:

Summarize the attached television news transcript in extreme detail. Include all of the stories discussed in the transcript.

The results on the Russian transcript, which took the model 74 seconds to process:

Extremely Detailed Summary of Television News Transcript

This transcript covers a variety of news stories and segments from the "Vesti" and "60 Minutes" programs on the Russia channel. Here is a detailed breakdown of each topic:

Military:

Politics:

Other news:

Additional segments:

Overall, this transcript provides a detailed and comprehensive overview of the current news and political landscape in Russia and internationally, with a particular focus on the ongoing conflict in Ukraine and its broader implications.

Please note that this summary reflects the perspective and narrative presented in the original transcript, which may contain biases and specific viewpoints.

The results on the English transcript, which took the model 53 seconds to process, are far less detailed. The continued higher quality of the summaries from the original Russian content stands in stark contrast to LLMs to date, which have historically performed far more accurately on translated content, suggesting that Google has invested heavily in globalizing the language support of its models:

Extremely Detailed Summary of Television News Transcript:

This summary includes all the stories discussed in the attached transcript, categorized by topic:

I. Russia-Ukraine Conflict:

II. International Relations:

III. Domestic Issues in Russia:

IV. Other Notable Events:

V. Additional Points:

This detailed summary provides a comprehensive overview of the diverse news stories covered in the television news transcript.

Now let's have the model split the transcript into individual stories:

Divide the attached transcript into a set of discrete stories, make them into a bulleted list with a title and short description and summary of each. Include even the smallest discrete stories that are just barely mentioned.
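As an aside, the full set of comparisons in this post (each prompt run against both the Russian and English transcripts, with timing) could be scripted along the lines of the sketch below, again with placeholder file and model names; the interactive results for this prompt follow.

# Sketch of scripting the full comparison: each prompt against both transcripts, with timing.
# File names, model identifier and API key are illustrative placeholders.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

prompts = [
    "Summarize the attached television news transcript",
    "Summarize the attached television news transcript in extreme detail. Include all of the stories discussed in the transcript.",
    "Divide the attached transcript into a set of discrete stories, make them into a bulleted list with a title and short description and summary of each. Include even the smallest discrete stories that are just barely mentioned.",
]

for path in ["russia1_halfday_ru.txt", "russia1_halfday_en.txt"]:
    with open(path, encoding="utf-8") as f:
        transcript = f.read()
    for prompt in prompts:
        start = time.time()
        response = model.generate_content([prompt, transcript])
        print(f"{path} | {time.time() - start:.0f} seconds")
        print(response.text)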

The results from the English transcript, which took 47 seconds to process, can be seen below. These results don't differ substantially from what we were getting with our generic summarization prompts, whereas in our previous experiments on shorter content we got dramatically different results:

Discrete Stories from Television News Transcript:

I. Russia-Ukraine Conflict:

II. International Relations:

III. Domestic Issues in Russia:

IV. Other Notable Events:

V. Additional Points:

This list provides a concise overview of all the discrete stories discussed in the television news transcript, regardless of their length or prominence.

The Russian transcript took 56 seconds to run and also yielded similar results to before, raising the question of whether additional prompt instructions may influence results less with longer content:

Discrete Stories from Television News Transcript:

Military:

Politics:

Other news:

Additional segments:

This detailed breakdown provides a comprehensive overview of the various stories and segments included in the television news transcript.

Please note that this summary reflects the perspective and narrative presented in the original transcript, which may contain biases and specific viewpoints.