Hallucination In Summarization: When ChatGPT Hallucinated New Stories In An Evening News Broadcast

Yesterday we compared how OpenAI's ChatGPT and Google's Bard summarize a television evening news broadcast. Look closely and you'll spot an extremely important downside of Large Language Models (LLMs): ChatGPT hallucinated an entire story that was not present in the broadcast, building it on top of a real story whose key details it reversed.

The original broadcast contains an extended segment about the Chinese spy balloon floating over the United States. Initially, ChatGPT summarized this story correctly, sticking to the details that were in the broadcast. Yet when asked to refine its summary by adding the next chunk of text from the broadcast, in typical chaining fashion, ChatGPT rewrote the summary to include a completely hallucinated story about a hypersonic missile:

The US has accused China of using a hypersonic missile to test its space capabilities, after China launched a rocket that orbited the Earth and then released an object that returned to Earth at hypersonic speed, evading US detection. The weapon is capable of carrying a nuclear warhead and could penetrate missile defences, giving China a significant strategic advantage. The US has called the test destabilising and provocative, saying it threatens peace and security. The incident comes amid heightened tensions between the two countries after China shot down a US weather balloon last week.

No hypersonic missile incident appears or is referenced anywhere in the broadcast. In short, ChatGPT did not merely mistake a detail or two: it hallucinated an entirely new story out of whole cloth. Note in particular the British English spellings "defences" and "destabilising," even though this is an American broadcast with an American presenter. In fact, the details of the story presented by ChatGPT more closely resemble 2021 coverage in The Guardian and FT (which appeared before ChatGPT's 2021 cutoff).

Worse, the last sentence states that it was China that shot down a US balloon and that the shootdown occurred the prior week, whereas the broadcast reported that the balloon was Chinese and that no action had yet been taken.
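The refine-style chaining described above can be sketched roughly as follows. This is a minimal illustration and not the exact workflow used in these experiments: the prompt wording, the word-based chunking, and the `llm` callable (a stand-in for whatever function sends a prompt to a model and returns its reply) are all assumptions made for the sketch.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split a transcript into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def refine_chain(chunks: List[str], llm: Callable[[str], str]) -> List[str]:
    """Run a refine-style chain and return the summary produced after each prompt.

    `llm` is a hypothetical callable: it takes a prompt string and returns the
    model's reply. The prompt wording below is an assumption, not the original.
    """
    history: List[str] = []
    summary = ""
    for chunk in chunks:
        if not history:
            # First prompt: summarize the opening chunk on its own.
            prompt = "Summarize this news broadcast transcript:\n\n" + chunk
        else:
            # Later prompts: hand back the previous summary plus the next chunk.
            prompt = (
                "Existing summary:\n\n" + summary
                + "\n\nRefine the summary using this additional transcript text:\n\n"
                + chunk
            )
        summary = llm(prompt)
        history.append(summary)
    return history
```

Keeping every intermediate summary, rather than only the final one, makes it possible to see at which step invented material enters the chain.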

What if we repeat the first three prompts again to see if this was only a one-off hallucination?

The first prompt yields:

The news report covers several developing stories. The first story is about a suspected Chinese spy balloon hovering over the US, believed to be a surveillance balloon being maneuvered by China, and the Pentagon is considering shooting it down. The second story is about a dangerous cold front moving into the Northeast, with wind chill readings in Maine expected to reach 60 below zero, and where the wind chill is already 106 degrees below zero. Other stories include an assault on an E.R. doctor, a scare at Newark airport, a strong jobs report, the Alex Murdaugh double murder trial, and Robin Roberts' bungee jump in New Zealand.

Then:

A huge surveillance balloon, which US officials say belongs to China, has been floating over the skies of the United States for several days, and is expected to continue for a few more. The balloon is the size of three buses, and carries high-resolution cameras as well as a technology bay equipped with what appear to be solar panels on the side. While China has claimed the balloon is not a spy vessel, the Pentagon says it is violating US airspace and international law. US officials have rejected calls to shoot the balloon down due to the risks of debris causing civilian injuries, deaths or significant property damage, but the option of shooting it down over US territorial waters is still being considered. The visit of US Secretary of State Antony Blinken to Beijing has been canceled as a result of the situation.

And finally:

The US is considering shooting down a Chinese satellite balloon that is currently hovering over the Pacific Ocean after it was launched in July 2022. The balloon, which is part of China's Beidou satellite navigation system, is causing concern in the US as it may be a new form of surveillance technology. The US is concerned that the balloon could be used to gather intelligence and track the movements of US ships and submarines, especially as it is floating near sensitive military bases. Shooting the balloon down is considered a risky option as it is close to international waters and could lead to an international incident. The US is said to be keeping communications with China open over the issue, but tensions are escalating between the two countries. Meanwhile, an Arctic blast has hit the US and is causing temperatures to drop below zero in ten states, with wind chills at 60 below zero in some parts of New England. Doctors have warned that the weather could be life-threatening, and people are being urged to take the situation seriously.

This time the hypersonic missile story is gone, but there is a new hallucination: the claim that the balloon is part of "China's Beidou satellite navigation system," which is not mentioned in any way in the broadcast. Further, ChatGPT claims there are concerns that it could be used to "track the movements of US ships and submarines," a concern never expressed in the broadcast. Additionally, it claims that "Shooting the balloon down is considered a risky option as it is close to international waters and could lead to an international incident." In fact, the concern was that the balloon was currently over the US homeland, with the debris field potentially falling on inhabited areas.

Thus, twice in a row we have hallucinated text. Strangely, in our original tests last month we did not experience hallucination around this story.
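Invented names such as "Beidou" can sometimes be caught mechanically, because they appear in the summary but nowhere in the source text. The following is a crude sketch of that idea, not a method used in these experiments: the function name `unsupported_terms` is hypothetical, and the heuristic (capitalized words and numbers missing from the transcript) will miss paraphrased fabrications and can flag legitimate rewordings.

```python
import re
from typing import List


def unsupported_terms(summary: str, source: str) -> List[str]:
    """List capitalized words and numbers in the summary that never occur in the source.

    Comparison is case-insensitive and token-based. This is a rough screening
    heuristic, not a hallucination detector.
    """
    source_vocab = set(re.findall(r"[a-z0-9]+", source.lower()))
    # Candidate "facts": capitalized words (likely names) and standalone numbers.
    candidates = re.findall(r"\b(?:[A-Z][a-z]+|\d+)\b", summary)
    flagged: List[str] = []
    seen = set()
    for term in candidates:
        key = term.lower()
        if key not in source_vocab and key not in seen:
            seen.add(key)
            flagged.append(term)
    return flagged
```

Run against the full transcript and each refined summary, a check like this would only surface terms for a human to review; it cannot confirm that the remaining text is faithful.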

Let's try a third time:

This news report covers several stories, including a suspected Chinese spy balloon hovering over the U.S. that has the Pentagon considering shooting it down. Additionally, a life-threatening cold front is moving into the Northeast, with wind-chill readings in Maine expected to reach 60 below zero. There is also news of a deadly assault on an E.R. doctor and a scare at the Newark airport where two passenger jets clipped wings. President Biden celebrated the lowest unemployment rate since 1969, but it may complicate steps by the Fed moving forward. The Alex Murdaugh double murder trial and his housekeeper's fall also make an appearance. Finally, Robin Roberts from New Zealand jumps from the tallest building in the Southern Hemisphere to conquer fear.

Then:

A large balloon, which the Pentagon says is being navigated by Chinese satellites, has been floating over the United States for several days. The balloon, equipped with high-resolution cameras and solar panels, was first spotted in Alaskan airspace before flying into Canada and then dipping down over Billings, Montana. China claims the balloon is being used for civilian research purposes, but the Pentagon rejects this claim and has stated that the balloon is a surveillance vessel that violates US airspace and international law. Montana Republican Congressman Ryan Zinke tweeted "shoot it down," but the Pentagon has rejected this option, saying it would pose too great a risk to people on the ground. Secretary of State Antony Blinken's planned trip to China has been canceled due to the incident. The Pentagon says the balloon is expected to linger over the US for a few more days.

And finally:

The US is considering military action to shoot down the Chinese satellite that has been orbiting in the Earth's atmosphere since last November. The satellite is feared to be a potential threat to US national security as it is capable of launching weapons that could destroy communication systems, satellites, and other technology used by the military. The military action is a tricky decision as there is a risk of starting an international incident, and waiting for the satellite to go out to sea would give only 12 nautical miles to take action. Tensions between the US and China are escalating. Meanwhile, an Arctic blast has affected 25 million Americans across ten states with wind chills below zero in cities such as New York, Boston, and Philadelphia. Doctors have warned that hypothermia can set in within minutes, and the weather is life-threatening. The situation is expected to persist through the weekend.

This time we have a new hallucination: instead of a balloon, the object is a "Chinese satellite that has been orbiting in the Earth's atmosphere since last November" that is "feared to be a potential threat to US national security as it is capable of launching weapons that could destroy communication systems, satellites, and other technology used by the military." ChatGPT picks up on the plan to wait until the object reaches the sea, but continues with its satellite narrative, stating that "waiting for the satellite to go out to sea would give only 12 nautical miles to take action."

Thus, in three attempts we have three very different hallucinations. Let's try a fourth time:

Multiple developing stories are covered in this news report. Firstly, a suspected Chinese spy balloon, believed to be a surveillance balloon, is hovering over the United States at an altitude of 60,000 feet. The Pentagon is tracking its movement, and there are discussions underway on whether to shoot it down. Secondly, a life-threatening cold is moving into the Northeast region of the US. The National Weather Service has called it a once in a generation Arctic blast that is expected to affect over 25 million Americans. Other stories include a deadly assault on an ER doctor, a jobs report showing the lowest unemployment rate since 1969, and the trial of Alex Murdaugh, a lawyer accused of double murder.

Then:

A massive Chinese surveillance balloon with high-resolution cameras, the size of three buses, has been spotted hovering over the United States at an altitude of 50,000 feet. The balloon first entered US airspace over Alaska before flying over Canada and Montana. Montana Republican Congressman Ryan Zinke called for the balloon to be shot down, but the Pentagon rejected the idea, citing potential physical or military risks to people on the ground. China insisted the balloon was a civilian airship for research purposes, but the Pentagon disputed the claim, and US officials stated that it was likely being navigated by spy satellites. The balloon is expected to remain above the US for a few more days, and the US is considering shooting it down once it drifts over the Atlantic Ocean. The incident has led to the postponement of Secretary of State Antony Blinken's trip to Beijing.

And finally:

An Arctic blast has hit 10 states in the US, bringing deadly, life-threatening cold and setting records. As a result of the weather, 25 million Americans have experienced wind chills below zero, with the worst areas including Boston, New York, and Philadelphia. In some parts of New England, wind chills are expected to reach or exceed 60 degrees below zero. Doctors warn that hypothermia can set in within minutes in these extreme temperatures. In addition, the blast has brought snow squalls and caused power outages in many areas. A 6-month-old baby was killed when a tree fell on a car due to high winds near Springfield, Massachusetts.

This time the third summary contained absolutely no mention of the balloon, with ChatGPT eliminating it entirely from its refined summary. This is highly problematic: if this were the final paragraph of the broadcast, the summary would make no mention of the dominant story of the evening.

Let's try a fifth time:

Several developing stories are reported, including a suspected Chinese spy balloon hovering over the US that the Pentagon is tracking. The intelligence bay hanging below the balloon is said to be the size of three buses, and it is believed to be heading eastward. The National Weather Service warns of a life-threatening cold moving into the Northeast, with wind-chill readings in Maine expected to reach 60 below zero. In addition, there is news of an assault on an E.R. doctor, two passenger jets clipping wings at Newark airport, and the unexpected job report showing that the unemployment rate has not been this low since 1969. Lastly, Robin Roberts bungee jumps from the tallest building in the Southern Hemisphere, discussing the lesson on conquering fear.

Then:

A huge Chinese surveillance balloon, with a technology bay attached below, has been drifting across the United States. The balloon, which is equipped with high-resolution cameras and what appear to be solar panels, entered American airspace over Alaska and then flew into southwest Canada before dipping down over Billings, Montana. Although China has claimed that the balloon is a civilian airship used for research, the Pentagon has rejected this and insists that it is a surveillance balloon violating US airspace and international law. The balloon is expected to linger above America for a few days, and the Pentagon is considering shooting it down once it reaches US territorial waters over the Atlantic Ocean.

And finally:

A 6-month-old girl was killed when a tree fell onto her family's car during a blizzard caused by an "once in a generation" Arctic blast in Massachusetts. In addition to the fatal incident, the extreme cold weather has caused wind chills to drop below zero, leading to a risk of hypothermia and other life-threatening conditions. More than 25 million Americans are affected by the weather in ten states, and the National Weather Service has warned of dangerous conditions throughout the region. Furthermore, tensions between the US and China have escalated following the Chinese military's testing of a hypersonic missile. Although China expressed regret, it was not enough to save US Secretary of State Antony Blinken's trip to the country.

Once again, we are back to the hallucinated hypersonic missile story: "Chinese military's testing of a hypersonic missile."

What about a sixth time?

On a Friday night, there are several developing stories, including the dangerous life-threatening cold moving into the Northeast affecting 25 million Americans across multiple states, and a Chinese spy balloon suspected of hovering over the US. The balloon is believed to be carrying an intelligence bay the size of three buses and is being tracked by the Pentagon. Secretary of State Antony Blinken has postponed his high-stakes trip to China due to China's actions being deemed unacceptable, and there are discussions about whether or not to shoot down the balloon. Other stories include the deadly assault on an E.R. doctor, a scare at Newark airport, the Alex Murdaugh double murder trial, and Robin Roberts' epic jump.

Then:

A massive Chinese surveillance balloon, equipped with high-resolution cameras and solar panels, has been floating over the United States for several days. It first entered American airspace over Alaska, flew into southwest Canada, and dipped down over Billings, Montana. While China insists the balloon is a civilian airship used for research, the Pentagon has flatly rejected that claim, saying that it violates US airspace and international law. Congressman Ryan Zinke called for it to be shot down, but the Pentagon rejected that option due to the risks involved. Instead, they are waiting for the balloon to drift over the Atlantic Ocean and then shoot it down over US territorial waters. The White House says that President Biden has been regularly briefed on the situation.

And finally:

Millions of Americans are experiencing a dangerous and deadly Arctic blast that has set records in many states. Temperatures have dropped below zero in many areas, with wind chills in some places reaching as low as 106 degrees below zero, making it the coldest wind chill ever recorded in the United States, including Alaska. Some people have died due to the harsh weather, and hypothermia can set in within minutes. While some parts of the U.S. are experiencing record-low temperatures, the country is also dealing with international tensions with China. The U.S. has accused China of creating a security threat by launching a missile to destroy one of its own satellites, and a Chinese space balloon is currently floating above the U.S. It is reported that the U.S. is considering military action to shoot down the balloon if necessary, although this is complicated due to the risk of civilian casualties and starting an international incident. The U.S. is keeping communications open with China, but tensions between the two countries are escalating.

This contains a new hallucinated story, about China "launching a missile to destroy one of its own satellites."

Thus, of six attempts, five contained hallucinated stories, and the only one that did not simply dropped the spy balloon story entirely.
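To put the six runs in perspective, the tally can be turned into a rate with a rough uncertainty range. The sketch below uses a Wilson score interval, which is a choice made here for illustration and not part of the original analysis; the outcome labels are shorthand for the runs described above.

```python
import math
from typing import List, Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for k successes in n trials."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin


# One label per attempt, in the order the runs appear above.
outcomes: List[str] = [
    "hallucinated",   # hypersonic missile test
    "hallucinated",   # Beidou navigation system
    "hallucinated",   # orbiting weaponized satellite
    "dropped story",  # balloon missing from the final summary
    "hallucinated",   # hypersonic missile test again
    "hallucinated",   # missile destroying its own satellite
]

hallucinated = outcomes.count("hallucinated")
rate = hallucinated / len(outcomes)
low, high = wilson_interval(hallucinated, len(outcomes))
print(f"{hallucinated}/{len(outcomes)} runs hallucinated ({rate:.0%}); "
      f"approximate 95% interval {low:.0%} to {high:.0%}")
```

With only six runs the interval is very wide, so these results show that hallucination is common in this setup without pinning down how common.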

This chain of examples offers a stark reminder that the strengths of LLMs come with potentially fatal weaknesses for tasks like summarization. In each case, the hallucination occurred in the third prompt, after the initial summary had been refined for the second time. Given that we did not have this issue in our original summarization experiment last month, it could reflect changes in how ChatGPT's model handles prompt chaining, its attention model, or other changes OpenAI may be making, which would be a reminder of the danger of cloud-hosted fluid models whose behaviors change continuously. Alternatively, it could simply reflect that the original experiment was a statistical anomaly and that the majority of attempts will include hallucinated details.

No matter the underlying cause, these results suggest significant caution is warranted in automated summarization and that further research is needed into the general prevalence and underlying causes of summarization hallucination.

 

This analysis is part of an ongoing collaboration between the Internet Archive and its TV News Archive, the multi-party Media-Data Research Consortium and GDELT.