The GDELT Project

Gemini 1.5 Pro's 1 Million Token Model: Summarizing An Evening News Broadcast

Last July we explored the ability of Anthropic's Claude 2 model and its then-novel 100,000 token limit to summarize an entire news broadcast in a single pass, without chaining, pyramidal distillation or other then-standard workarounds for fitting a 30-minute transcript of 3,828 words through LLMs that at the time were typically limited to 4,000 tokens. With the public availability of Google's Gemini 1.5 Pro and its groundbreaking 1 million token public limit, let's see how it performs on the same evening news broadcast. Does its 10x context window improvement yield far better results than what we obtained seven months ago with Claude 2? While the entire broadcast already fit into Claude 2's context window half a year ago, Gemini 1.5 Pro also brings numerous improvements in attention and model quality compared to the SOTA models of Summer 2023. We'll also compare against the older GPT-4 model, which is likewise able to fit the entire broadcast into its context window.
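To make the context window arithmetic concrete, here is a minimal sketch of the fit check. It assumes roughly 1.3 tokens per English word, which is only a rule of thumb and not the output of any model's actual tokenizer. The `reserve_for_output` margin and the function names are likewise illustrative choices, not part of any vendor API:

```python
# Rough check of whether a transcript fits in a model's context window.
# ASSUMPTION: ~1.3 tokens per English word is a rule of thumb, not a real tokenizer.
TOKENS_PER_WORD = 1.3


def estimate_tokens(word_count: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Approximate the token count of a text from its word count."""
    return round(word_count * tokens_per_word)


def fits_in_context(word_count: int, context_limit: int, reserve_for_output: int = 1000) -> bool:
    """True if the estimated prompt plus a reserved output budget fits the limit."""
    return estimate_tokens(word_count) + reserve_for_output <= context_limit


TRANSCRIPT_WORDS = 3828  # word count of the broadcast transcript, from the text above

# Context limits as stated in the text above.
CONTEXT_LIMITS = {
    "typical 2023 LLM": 4_000,
    "Claude 2": 100_000,
    "Gemini 1.5 Pro": 1_000_000,
}

for name, limit in CONTEXT_LIMITS.items():
    verdict = "fits" if fits_in_context(TRANSCRIPT_WORDS, limit) else "does not fit"
    print(f"{name} ({limit:,} tokens): transcript {verdict}")
```

Under these assumptions the transcript overflows a 4,000 token window but uses only a small fraction of the two larger ones, which is why any quality difference here has to come from the models themselves and not from the extra capacity.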

The end result is that both Gemini 1.5 Pro and GPT-4 provide roughly similar summaries of the primary stories of the broadcast and both seamlessly skip over advertisements without being explicitly instructed to. However, Gemini suffers from GCP's historically inaccurate safety filters, which yield extremely high false positive rates: it repeatedly truncates its summaries by falsely claiming that the broadcast violates its guardrails, only to report in a different run that there are no safety concerns of any kind with the content. In keeping with our earlier tests of Bison, Unicorn, Gemini Pro and Gemini Ultra, Gemini 1.5 Pro suffers from significantly greater hallucination than GPT 3.5 or 4.0, repeatedly hallucinating stories, whereas in all of our tests GPT missed stories but did not hallucinate stories that did not exist. At the same time, Gemini did produce more detailed story titles, though this could simply be due to better alignment of our prompts with Gemini's prompt expectations. Both models are able to process a complete 30-minute television news transcript in a single context window and distill it down into component stories with individualized titles and summaries, but both tend to miss smaller stories and Gemini hallucinates when asked to expand beyond only the largest stories in the transcript.

Let's examine the same ABC World News Tonight broadcast that we've used for our previous experiments. For ground truth, we can compare the outputs here against Vanderbilt University's hand-generated story annotation for the broadcast.
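One way to make this comparison systematic is to reduce each summary to a set of story labels and score it against the annotated list. The sketch below is illustrative only: the labels are hand-assigned stand-ins based on stories discussed in this post, not Vanderbilt's actual annotation, and `score_summary` is a hypothetical helper, not an existing tool:

```python
from typing import Dict, Set


def score_summary(ground_truth: Set[str], predicted: Set[str]) -> Dict[str, object]:
    """Compare a model's story labels with a reference list of story labels."""
    found = ground_truth & predicted
    return {
        # Share of reference stories that the model recovered.
        "recall": len(found) / len(ground_truth) if ground_truth else 0.0,
        # Reference stories that the model skipped.
        "missed": sorted(ground_truth - predicted),
        # Model stories with no reference counterpart (candidate hallucinations).
        "unsupported": sorted(predicted - ground_truth),
    }


# ASSUMPTION: illustrative stand-in labels, not the real hand-generated annotation.
GROUND_TRUTH = {
    "spy balloon", "northeast cold", "er doctor", "newark collision",
    "jobs report", "murdaugh trial", "bungee jump",
    "tyre nichols", "dallas zoo", "central park owl",
}

# ASSUMPTION: a hypothetical model output that finds the seven largest stories
# and adds one story that is not in the reference list.
MODEL_OUTPUT = {
    "spy balloon", "northeast cold", "er doctor", "newark collision",
    "jobs report", "murdaugh trial", "bungee jump", "state of the union",
}

result = score_summary(GROUND_TRUTH, MODEL_OUTPUT)
print(result)
```

Mapping free-text story titles onto shared labels is the hard part and is done by hand here. The scoring itself only separates the two failure modes that matter below: stories that were missed and stories that were never in the broadcast.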

Let's start with a simple prompt:

Summarize the attached television news transcript

Gemini offers the following very cursory summary and flags the transcript as having potential Unsafe Content, including Low Probability Hate Speech and Medium Probability Harassment:

This news segment covers several topics:

1. Chinese Spy Balloon:

2. Deadly Cold in the Northeast:

3. Other News:

GPT-4 also provides a very short summary, and in keeping with OpenAI's better tuning of its safety filters, does not flag it as having unsafe content:

The television news transcript from ABC World News Tonight with David Muir covers several stories:

1. A suspected Chinese spy balloon hovering over the US, raising security concerns. Secretary of State Antony Blinken postpones his trip to China as a result.
2. A life-threatening cold wave hitting the Northeast with extremely low wind-chill readings, posing serious risks.
3. The murder of an ER doctor who was hit by a car while biking, then attacked by the driver.
4. A collision at Newark Airport involving two passenger jets.
5. A strong jobs report indicating economic growth, with unemployment rates at their lowest since 1969.
6. Developments in the Alex Murdaugh double murder trial, including discussions about a ballistics expert's testimony and revelations about Murdaugh's housekeeper's death.
7. Robin Roberts' bungee jump in New Zealand as a lesson in conquering fear.

The news also briefly mentions updates on the Tyre Nichols case and incidents at the Dallas Zoo and Central Park Zoo.

Let's ask for more detail:

Summarize the attached television news transcript in detail.

Interestingly, this time Gemini fixates on two of the stories and drops the rest. It maintains the same two safety warnings:

Detailed Summary of Television News Transcript:

1. Chinese Spy Balloon:

2. Deadly Cold in the Northeast:

GPT-4 adheres to the prompt far better than Gemini and expands its level of detail and story extraction:

The detailed summary of the ABC World News Tonight transcript includes:

1. **Chinese Spy Balloon**: A suspected Chinese spy balloon was seen over the US, leading to Secretary of State Antony Blinken postponing his China visit. The balloon raised concerns about espionage and national security.

2. **Severe Cold in Northeast**: The Northeast faced a life-threatening cold wave, with extreme wind chills. Officials warned of health risks and urged precautions.

3. **ER Doctor's Murder**: A story covered the murder of an ER doctor who was hit by a car while biking and then attacked by the driver.

4. **Airport Collision**: There was a collision between two passenger jets at Newark Airport, highlighting aviation safety issues.

5. **Jobs Report**: A strong jobs report showed economic growth, with unemployment rates at their lowest since 1969, indicating positive economic trends.

6. **Alex Murdaugh Trial**: Developments in the trial included discussions on a ballistics expert's testimony and revelations regarding the death of Murdaugh's housekeeper.

7. **Robin Roberts' Bungee Jump**: Robin Roberts' bungee jump in New Zealand was featured as a segment on overcoming fear.

Other brief stories included updates on the Tyre Nichols case and incidents at the Dallas and Central Park Zoos.

Let's try a different prompt formulation that we've found to help nudge LLMs towards better coherence:

Divide the attached transcript into a set of discrete stories, make them into a bulleted list with a title and short description and summary of each.

This yields vastly better results with Gemini, resulting in a summary far closer to GPT-4's gold standard. Strangely, this time Gemini states there are no safety concerns at all with the passage:

Discrete Stories from Television News Transcript:

1. Chinese Spy Balloon Sparks International Tension

2. Deadly Arctic Blast Grips Northeast

3. ER Doctor Killed in Hit-and-Run and Stabbing Attack

4. Passenger Jets Clip Wings at Newark Airport

5. US Economy Adds 517,000 Jobs in January

6. Alex Murdaugh Murder Trial: Defense Challenges Evidence

7. Robin Roberts Bungee Jumps in New Zealand

GPT-4 identifies the same seven stories, though interestingly its story titles are less descriptive than Gemini's:

The ABC World News Tonight transcript from February 4, 2023, includes the following stories:

1. **Suspected Chinese Spy Balloon Over U.S.**
– A Chinese balloon, believed to be a spy device, hovered over the U.S. This led to Secretary of State Antony Blinken postponing his China trip. The balloon's presence raised espionage and national security concerns.

2. **Severe Cold in Northeast U.S.**
– A historic Arctic blast hit the Northeast U.S., bringing life-threatening cold and wind chill readings as low as 60 below zero in some areas.

3. **Murder of ER Doctor**
– An ER doctor was killed in a road rage incident where he was hit by a car while biking and then attacked by the driver.

4. **Airport Collision at Newark**
– Two passenger jets collided at Newark Airport, emphasizing aviation safety issues.

5. **Strong U.S. Jobs Report**
– The U.S. economy showed growth with a robust jobs report, marking the lowest unemployment rate since 1969.

6. **Alex Murdaugh Double Murder Trial**
– Developments in the trial included discussions on ballistics evidence and revelations about Murdaugh's housekeeper's death and insurance payout.

7. **Robin Roberts' Bungee Jump in New Zealand**
– A segment featuring Robin Roberts taking a bungee jump in New Zealand, highlighting overcoming fear.

Additional mentions include updates on the Tyre Nichols case and incidents at Dallas and Central Park Zoos.

Let's nudge the models even further to see if we can get them to pick up on the smaller stories:

Divide the attached transcript into a set of discrete stories, make them into a bulleted list with a title and short description and summary of each. Include even the smallest discrete stories that are just barely mentioned.

Unfortunately, this actually results in worse performance from Gemini, dropping not only the final major story, but all of the smaller ones that were alluded to in the earlier versions. Interestingly, the reason for this may be that this time Gemini truncated our results with a "Content Not Permitted" error, despite our setting the Safety settings to Block Few for all categories.

Discrete Stories from Television News Transcript (Including Smaller Stories):

1. Chinese Spy Balloon Sparks International Tension

2. Deadly Arctic Blast Grips Northeast

3. ER Doctor Killed in Hit-and-Run and Stabbing Attack

4. Passenger Jets Clip Wings at Newark Airport

5. US Economy Adds 517,000 Jobs in January

6. Alex Murdaugh Murder Trial: Defense Challenges Evidence

Using this extended prompt also yields less detailed summaries from GPT-4, but it still outputs the same major stories:

The ABC World News Tonight transcript from February 4, 2023, includes the following stories:

1. **Chinese Spy Balloon Over U.S.**
– The U.S. monitors a suspected Chinese spy balloon, leading to diplomatic tensions and the postponement of Secretary of State Antony Blinken's trip to China.

2. **Life-Threatening Cold in Northeast U.S.**
– A severe cold wave hits the Northeast, with wind chills expected to reach extremely low temperatures.

3. **Murder of ER Doctor**
– An ER doctor is killed in a road rage incident in California.

4. **Collision at Newark Airport**
– Two passenger jets clip wings at Newark Airport.

5. **U.S. Jobs Report**
– A strong jobs report shows significant economic growth, with low unemployment rates.

6. **Alex Murdaugh Trial**
– Developments in the double murder trial of Alex Murdaugh, including discussions about ballistics evidence.

7. **Robin Roberts' Bungee Jump**
– Robin Roberts undertakes a bungee jump in New Zealand.

Other brief stories include updates on the Tyre Nichols case, incidents at Dallas and Central Park Zoos, and various small local news items.

Let's try the same prompt again.

Gemini again truncates the results suddenly with a Content Not Permitted fatal error:

Discrete Stories from Television News Transcript (Including Smaller Stories):

1. Chinese Spy Balloon Sparks International Tension

2. Deadly Arctic Blast Grips Northeast

3. ER Doctor Killed in Hit-and-Run and Stabbing Attack

4. Passenger Jets Clip Wings at Newark Airport

5. US Economy Adds 517,000 Jobs in January

6. Alex Murdaugh Murder Trial: Defense Challenges Evidence

In a reminder of the non-deterministic nature of LLMs, this time GPT-4 outputs a much more detailed response that includes several of the smaller stories:

Based on the transcript of ABC World News Tonight with David Muir from February 4, 2023, here is a detailed breakdown of the stories covered:

1. **Chinese Spy Balloon Over U.S.**
– The U.S. tracks a suspected Chinese spy balloon, causing diplomatic tensions and leading to Secretary of State Antony Blinken's trip to China being postponed.

2. **Deadly Cold in Northeast U.S.**
– A historic Arctic blast brings life-threatening cold to the Northeast, with wind chills dropping to dangerously low temperatures.

3. **Murder of ER Doctor**
– In California, an ER doctor is killed in a road rage incident after being hit by a car while biking and then attacked by the driver.

4. **Airport Collision at Newark**
– Two passenger jets clip wings at Newark Airport, underscoring concerns about aviation safety.

5. **Strong U.S. Jobs Report**
– The U.S. economy demonstrates robust growth with a significant jobs report, marking the lowest unemployment rate since 1969.

6. **Alex Murdaugh Murder Trial**
– Updates on the trial of Alex Murdaugh, including discussions about ballistics evidence and revelations about his housekeeper's death.

7. **Robin Roberts' Bungee Jump**
– A segment on Robin Roberts performing a bungee jump in New Zealand, emphasizing the theme of overcoming fear.

8. **Updates on the Tyre Nichols Case**
– Brief mention of new developments in the Tyre Nichols case.

9. **Incidents at Dallas and Central Park Zoos**
– Reports on incidents involving the Dallas Zoo and an escaped owl from the Central Park Zoo in New York.

These are the main stories and brief mentions in the transcript. If there are specific details or stories you'd like more information about, please let me know!
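Given this run-to-run variation, a single run is a poor basis for judging either model. A minimal sketch of tallying story titles across repeated runs of the same prompt follows. The `run_model` callable is a placeholder for whatever client actually sends the prompt, and `fake_model` below merely replays canned, abbreviated title lists so the example is self-contained; neither is a real API:

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List


def tally_titles(run_model: Callable[[str], List[str]], prompt: str, runs: int) -> Counter:
    """Run the same prompt several times and count how many runs list each title."""
    counts: Counter = Counter()
    for _ in range(runs):
        # A set ensures a title is counted at most once per run.
        counts.update(set(run_model(prompt)))
    return counts


def unstable_titles(counts: Counter, runs: int) -> List[str]:
    """Titles that appeared in some runs but not in all of them."""
    return sorted(title for title, n in counts.items() if n < runs)


# ASSUMPTION: canned, abbreviated outputs standing in for three real model runs.
_CANNED_RUNS = cycle([
    ["Chinese Spy Balloon Sparks International Tension",
     "Deadly Arctic Blast Grips Northeast"],
    ["Chinese Spy Balloon Sparks International Tension",
     "Deadly Arctic Blast Grips Northeast",
     "Robin Roberts Bungee Jumps in New Zealand",
     "Owl Escapes from Central Park Zoo"],
    ["Chinese Spy Balloon Sparks International Tension",
     "Deadly Arctic Blast Grips Northeast",
     "Robin Roberts Bungee Jumps in New Zealand"],
])


def fake_model(prompt: str) -> List[str]:
    """Hypothetical stand-in for a real model call; ignores the prompt."""
    return next(_CANNED_RUNS)


PROMPT = "Divide the attached transcript into a set of discrete stories..."
RUNS = 3
counts = tally_titles(fake_model, PROMPT, RUNS)
print(counts.most_common())
print("Unstable across runs:", unstable_titles(counts, RUNS))
```

Titles that appear in every run are the stable core of the summary. Titles that appear only occasionally are the ones worth checking by hand, since they may be either genuinely small stories or hallucinations.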

Running again, this time Gemini states there are no safety concerns and outputs a much more detailed summary:

Discrete Stories from Television News Transcript (Including Smaller Stories):

1. Chinese Spy Balloon Sparks International Tension

2. Deadly Arctic Blast Grips Northeast

3. ER Doctor Killed in Hit-and-Run and Stabbing Attack

4. Passenger Jets Clip Wings at Newark Airport

5. US Economy Adds 517,000 Jobs in January

6. Alex Murdaugh Murder Trial: Defense Challenges Evidence

7. Robin Roberts Bungee Jumps in New Zealand

8. Possible Sex Work Outside Catholic School Prompts Action

9. Dallas Zoo: Man Charged in Connection with Missing Monkeys and Other Crimes

10. Owl Escapes from Central Park Zoo

Running again, this time GPT adds in some additional stories:

The ABC World News Tonight transcript from February 4, 2023, includes the following discrete stories:

1. **Chinese Spy Balloon Over U.S.**
– A Chinese spy balloon is being tracked over the U.S., escalating tensions and leading to the postponement of Secretary of State Antony Blinken's trip to China.

2. **Severe Cold in Northeast U.S.**
– A historic Arctic blast hits the Northeast U.S., causing life-threatening cold with extremely low wind chill readings.

3. **Murder of ER Doctor**
– An ER doctor in California is killed in a road rage incident, where he was hit by a car while biking and then attacked by the driver.

4. **Airport Collision at Newark**
– Two passenger jets clip wings at Newark Airport, highlighting issues of aviation safety.

5. **U.S. Jobs Report**
– A robust jobs report shows significant economic growth in the U.S., with the lowest unemployment rate since 1969.

6. **Alex Murdaugh Trial**
– Developments in the double murder trial of Alex Murdaugh include discussions about ballistics evidence and the death of Murdaugh's housekeeper.

7. **Robin Roberts' Bungee Jump**
– Robin Roberts performs a bungee jump in New Zealand, emphasizing overcoming fear.

8. **Updates on the Tyre Nichols Case**
– Brief mention of developments in the Tyre Nichols case.

9. **Incidents at Dallas and Central Park Zoos**
– Reports of incidents involving the Dallas Zoo and an escaped owl from the Central Park Zoo in New York.

10. **Local News and Weather Updates**
– Brief segments on local news and weather forecasts, including updates on a Catholic school and the Oakland Zoo.

These are the main and smaller stories covered in the transcript. Let me know if you need more detailed summaries for any specific story!

Running again, Gemini again truncates its output with a fatal safety error:

Discrete Stories from Television News Transcript (Including All Stories):

1. Chinese Spy Balloon Sparks International Tension

2. Deadly Arctic Blast Grips Northeast

3. ER Doctor Killed in Hit-and-Run and Stabbing Attack

4. Passenger Jets Clip Wings at Newark Airport

5. US Economy Adds 517,000 Jobs in January

6. Alex Murdaugh Murder Trial: Defense Challenges Evidence

GPT outputs a summary highly similar to its last one:

Based on the detailed examination of the ABC World News Tonight transcript from February 4, 2023, here is a breakdown of the discrete stories covered, including even the smallest mentions:

1. **Chinese Spy Balloon Over U.S.**
– A Chinese spy balloon is being monitored by the U.S., creating diplomatic tensions and resulting in the postponement of Secretary of State Antony Blinken's trip to China.

2. **Deadly Cold in Northeast U.S.**
– A severe Arctic blast hits the Northeast, bringing life-threatening cold with extremely low wind chill readings, potentially setting record lows.

3. **Murder of ER Doctor in California**
– An ER doctor is killed in a road rage incident in California after being hit by a car while biking and then attacked by the driver.

4. **Collision at Newark Airport**
– Two passenger jets clip wings at Newark Airport, raising aviation safety concerns.

5. **Strong U.S. Jobs Report**
– A positive jobs report shows the U.S. economy is growing, with the lowest unemployment rate since 1969.

6. **Alex Murdaugh Murder Trial**
– Updates on the trial of Alex Murdaugh, focusing on ballistics evidence and revelations about his housekeeper's death.

7. **Robin Roberts' Bungee Jump**
– A feature on Robin Roberts performing a bungee jump in New Zealand, highlighting the theme of overcoming fear.

8. **Updates on the Tyre Nichols Case**
– Brief mention of developments in the Tyre Nichols case.

9. **Incidents at Dallas and Central Park Zoos**
– Reports on incidents at the Dallas Zoo involving tamarin monkeys and an escaped owl from the Central Park Zoo in New York.

10. **Local News and Weather Updates**
– Brief segments on local news and weather forecasts, including updates on concerns about sex work near a Catholic school, the reopening of the Oakland Zoo, and an approaching storm.

These are the main and smaller stories covered in the transcript, providing a comprehensive overview of the broadcast.

Running again, Gemini truncates, but this time asserts there were no safety errors and that the model completed successfully:

Discrete Stories from Television News Transcript (Including All Stories):

1. Chinese Spy Balloon Sparks International Tension

2. Deadly Arctic Blast Grips Northeast

3. ER Doctor Killed in Hit-and-Run and Stabbing Attack

4. Passenger Jets Clip Wings at Newark Airport

5. US Economy Adds 517,000 Jobs in January

6. Alex Murdaugh Murder Trial: Defense Challenges Evidence

7. Robin Roberts Bungee Jumps in New Zealand

8. Possible Sex Work Outside Catholic School Prompts Action

Running again, GPT offers its richest summary yet:

Based on the ABC World News Tonight transcript from February 4, 2023, the program covered the following discrete stories:

1. **Chinese Spy Balloon Over U.S.**
– A Chinese spy balloon, suspected of surveillance, is tracked over the U.S. This leads to diplomatic tensions and the postponement of Secretary of State Antony Blinken's trip to China.

2. **Severe Cold in Northeast U.S.**
– A historic Arctic blast hits the Northeast U.S., causing life-threatening cold with wind chills dropping to extremely low temperatures.

3. **Murder of ER Doctor in California**
– An ER doctor in California is killed in a road rage incident where he was hit by a car while biking and then attacked by the driver.

4. **Newark Airport Collision**
– Two passenger jets clip wings at Newark Airport, highlighting aviation safety concerns.

5. **Strong U.S. Jobs Report**
– The U.S. jobs report shows significant economic growth, with the lowest unemployment rate since 1969.

6. **Alex Murdaugh Murder Trial**
– Developments in the trial of Alex Murdaugh focus on ballistics evidence and the death of his housekeeper.

7. **Robin Roberts' Bungee Jump**
– Robin Roberts performs a bungee jump in New Zealand, emphasizing overcoming fear.

8. **Tyre Nichols Case Update**
– A brief update on the developments in the Tyre Nichols case.

9. **Dallas Zoo and Central Park Zoo Incidents**
– Reports on incidents at the Dallas Zoo involving tamarin monkeys and an escaped owl from the Central Park Zoo in New York.

10. **Concerns Near a Catholic School in Oakland**
– Follow-up on concerns about possible sex work near a Catholic school in Oakland.

11. **Reopening of Oakland Zoo**
– News about the reopening of the Oakland Zoo following a month-long shutdown due to storms.

12. **Upcoming Storm Forecast**
– An hour-by-hour weekend forecast for an approaching storm.

These are the main stories and smaller mentions included in the transcript, providing a comprehensive overview of the broadcast.

Running again, Gemini this time states Hate Speech is High Likelihood with Medium Likelihood of Harassment. While it does not return any error and asserts it successfully completed its output, it still truncates its results and the quality is far lower:

Running Gemini again, this time we get no safety concerns and one of the best outputs yet, though still missing stories:

Running again we get our truly best results yet – or at least it seems that way until we compare the results with the transcript. This is the first summary to pick up on Biden's upcoming address and the Grammys that all of the others missed. The only problem is that those additional results were hallucinated. There is no mention of the upcoming State of the Union Address or Grammys in the transcript:

Let's try again. This time Gemini adds to its SOTU and Grammys hallucination with a missing Tennessee woman and a Floyd case update:

Running again it adds a Viking sword and coffee-heart disease story to its hallucinated mix:

And again – this time it replaced its SOTU, Grammy, Floyd, Viking and coffee hallucinations with Super Bowl commercials, Lunar New Year and Groundhog Day predictions:
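The comparison against the transcript that exposes these hallucinations can be partly automated with a crude lexical check: flag any story title whose keywords barely appear in the transcript. The sketch below is a rough first-pass filter under stated assumptions, not the method used for this post. The transcript snippet and story titles are invented placeholders, and the stopword list and 0.5 threshold are arbitrary illustrative choices:

```python
import re
from typing import Set

# ASSUMPTION: a tiny illustrative stopword list; words under 3 letters are dropped anyway.
STOPWORDS = {"the", "and", "from", "for", "with"}


def keywords(text: str) -> Set[str]:
    """Lowercased alphabetic words of 3+ letters, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if len(w) >= 3 and w not in STOPWORDS}


def is_grounded(title: str, transcript: str, min_fraction: float = 0.5) -> bool:
    """True if at least min_fraction of the title's keywords occur in the transcript."""
    title_words = keywords(title)
    if not title_words:
        return False
    transcript_words = keywords(transcript)
    hits = len(title_words & transcript_words)
    return hits / len(title_words) >= min_fraction


# ASSUMPTION: an invented placeholder snippet, not the actual broadcast transcript.
TRANSCRIPT = (
    "Officials are tracking a suspected Chinese spy balloon over the United States, "
    "raising international tension. An owl has escaped from the Central Park Zoo."
)

# ASSUMPTION: hypothetical story titles; the last two mimic the hallucinated topics.
TITLES = [
    "Chinese Spy Balloon Sparks International Tension",
    "Owl Escapes from Central Park Zoo",
    "State of the Union Address",
    "Grammys",
]

for title in TITLES:
    status = "grounded" if is_grounded(title, TRANSCRIPT) else "NOT FOUND in transcript"
    print(f"{title}: {status}")
```

Word overlap is a blunt instrument. An abstractive but accurate title can fail it, and a hallucinated story that reuses common words can pass, so flagged titles still need to be checked by hand against the transcript.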