The GDELT Project

Gemini 1.5 Pro's 1 Million Token Model: Summarizing One Full Day Of A Russian Television News Channel

Google's new Gemini 1.5 Pro is the first production foundation LLM to offer a context window of one million tokens. A context window this large opens an incredible new door for media analysis: the ability to process a complete 24 hours of spoken word transcripts from a single television news channel in a single pass, with no prompt chaining, pyramidal distillation or any of the other traditional workarounds that tend to severely degrade the output. In other words, for the first time in a production foundation model, we can simply hand it a text file containing a complete 24 hours of a single television news channel and ask about it in a single prompt! What does it look like to take a complete 24 hours of a Russian television news channel, transcribe it using GCP's Speech-to-Text API V2 Chirp model, translate it into English using the Google Cloud Translation API, and then hand each of the two complete 24-hour transcripts (the original Russian transcript from Chirp and the translated English version from Google Translate) to Gemini 1.5 Pro and ask it to summarize one full day?

The end result is that we get entirely plausible day-length summaries. No attempt has been made to evaluate the summaries below for coverage (how well they capture all of the stories) or hallucination (how many of the details were made up), but overall the summaries appear quite reasonable. While it's not perfectly clear-cut, anecdotally the summaries based on the English translations do appear to be slightly better organized and better summarized, though the Russian and English-derived summaries are extremely similar. Notably, we ran each transcript twice, each time in a new session. The first time, both the Russian and English transcripts were broken down into a nice hierarchical summary, whereas the second time both were merely reduced to a bulleted list of stories. This captures the high variability of the outputs, though it could likely be tuned through the prompt. Overall, for the first time, we can fit a complete 24-hour day of a single television news channel into a single context window and examine it with a single prompt whose response can consider the full 24 hours.
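For readers who want to script the single-pass workflow described above, the following is a minimal sketch and not the exact method used for this post, which does not say how the transcripts were submitted. It assumes the Vertex AI Python SDK (`vertexai`) is installed and authenticated. The model name `gemini-1.5-pro`, the region default and the helper names `build_contents` and `summarize` are all assumptions chosen for illustration. The default prompt is the basic summarization prompt used in this post.

```python
from pathlib import Path

# The basic summarization prompt used in this post.
SUMMARY_PROMPT = "Summarize the attached television news transcript."


def build_contents(transcript_text: str, prompt: str = SUMMARY_PROMPT) -> list[str]:
    """Return the parts of one request: the instruction, then the full transcript.

    The whole day goes into a single request, with no chunking or chaining.
    """
    if not transcript_text.strip():
        raise ValueError("transcript is empty")
    return [prompt, transcript_text]


def summarize(transcript_path: str, project: str,
              location: str = "us-central1",        # assumed region
              model_name: str = "gemini-1.5-pro",   # assumed model identifier
              prompt: str = SUMMARY_PROMPT) -> str:
    """Send one full-day transcript to the model in a single prompt (hypothetical helper)."""
    # Imported lazily so build_contents() can be used without the SDK installed.
    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project=project, location=location)
    model = GenerativeModel(model_name)
    text = Path(transcript_path).read_text(encoding="utf-8")
    response = model.generate_content(build_contents(text, prompt))
    return response.text
```

Calling `summarize("russian_day.txt", project="my-project")` (both values are placeholders) in a fresh process each time approximates the "new session" runs described in this post. Because no sampling parameters are fixed here, repeated calls can return differently structured outputs.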

Let's start with the English machine translation. The full transcript is 143,740 tokens and the basic summarization prompt below took 44 seconds to complete.

Summarize the attached television news transcript.

This yields the following:

Summary of Television News Transcript

This transcript covers a variety of news stories from Russia and around the world, with a focus on politics, the war in Ukraine, and human interest stories.

Major themes:

Specific stories covered:

This summary provides a glimpse into the diverse range of topics covered in the news transcript, offering insights into current events and highlighting the human stories behind the headlines.

The Russian transcript is 159,131 tokens, but surprisingly took just 34 seconds to complete. Its summary, however, is much briefer and less detailed than the English version:

The transcript covers various news segments from the "Vesti" program on the Russia channel, focusing on topics like:

Politics:

International:

Domestic:

Human Interest:

The transcript also includes brief mentions of other news items, cultural events, and program promotions.
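To put the token counts and timings reported above in perspective, here is a short worked calculation. It uses only the figures stated in this post (143,740 tokens in 44 seconds for English, 159,131 tokens in 34 seconds for Russian, and a one million token window). The function names are illustrative.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, as stated for Gemini 1.5 Pro in this post

# (input tokens, seconds to complete) for the basic summarization prompt.
runs = {
    "english": (143_740, 44),
    "russian": (159_131, 34),
}


def tokens_per_second(tokens: int, seconds: float) -> float:
    """Input tokens divided by wall-clock time for the whole request."""
    return tokens / seconds


def window_share(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the context window occupied by the transcript."""
    return tokens / window


for name, (tokens, seconds) in runs.items():
    print(f"{name}: {tokens_per_second(tokens, seconds):,.0f} tokens/s, "
          f"{window_share(tokens):.1%} of the window")
```

This works out to roughly 3,270 input tokens per second for English and 4,680 for Russian, with each full-day transcript using only about 14% and 16% of the window respectively. The wall-clock time also includes generating the output, so the faster Russian run may partly reflect its shorter summary rather than faster reading of the input.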

Let's try our alternative prompt that explicitly asks for a story-by-story breakdown of the transcript:

Divide the attached transcript into a set of discrete stories, make them into a bulleted list with a title and short description and summary of each. Include even the smallest discrete stories that are just barely mentioned.

The English transcript took 38 seconds to complete:

Discrete Stories from Television News Transcript:

Politics & International Relations:

War in Ukraine:

Human Interest Stories:

Other:

This list provides a more detailed breakdown of the individual stories covered in the transcript, allowing for easier navigation and understanding of the various topics presented.

Running the English transcript a second time took 56 seconds:

Discrete Stories from the Transcript:

    1. Belgorod Attack:
    2. Putin Visits Ural Factories:
    3. New High-Speed Train:
    4. Russian Military Action in Ukraine:
    5. Ukrainian Military Situation:
    6. Western Response to the Conflict:
    7. European Protests:
    8. New Pulmonology Department in Moscow:
    9. Scientific Tourism in Chelyabinsk:
    10. Space Park in Slavyansk:
    11. Sleigh Festival in Moriel:
    12. Tamarin Family at Kaliningrad Zoo:
    13. Naming the Amur Tiger:
    14. Dr. Myasnikov's Health Program:
    15. Dr. Bubnovsky's Exercise Therapy:
    16. Nana Group's 35th Anniversary:
    17. Unequal Marriage:
    18. Truck Driver's Struggles:
    19. Investigation of Drug Trafficking:
    20. Happy Ending:

The Russian transcript took 41 seconds:

Discrete Stories from the "Vesti" Transcript:

Politics:

International:

Domestic:

Human Interest:

Other:

This list provides a brief overview of the various stories covered in the transcript, highlighting the diversity of topics and perspectives presented in the "Vesti" program.

Running the Russian transcript a second time took 54 seconds:

Discrete Stories from the Transcript:

    1. The Business of Supplements:
    2. The Red Lada Niva:
    3. The Saab 9000:
    4. The Mysterious Girl in the Sundress:
    5. The Belgorod Attack:
    6. Putin's Visit to Uralvagonzavod:
    7. The Vampire MLRS Attack:
    8. The New Pulmonology Center in Moscow:
    9. The Search for Mom:
    10. The Love Story of an Older Woman and Younger Man:
    11. The Queen of the Road:
    12. The Mysterious Illness:
    13. The Belgorod Attack Aftermath:
    14. The New Train "Finist":
    15. The Belgorod Attack Investigation:
    16. The Ukrainian Soldier's Story:
    17. The Polish-Ukrainian Grain Dispute:
    18. The Price of War:
    19. The New Pulmonology Center in Moscow:
    20. The Dream of Becoming a Pilot:
    21. The Queen of the Road – Continued:
    22. The Mysterious Woman:
    23. The Family Reunion:
    24. The Future of Russia: