Large-Token LLMs: Using GCP's New 32K PaLM Models To Summarize An Entire ABC Evening News TV News Broadcast In A Single Prompt

This past July we demonstrated the use of Anthropic's Claude 2's 100K token LLM to summarize an entire evening news broadcast in a single prompt. With the public availability of GCP's 32K PaLM model (text-bison-32k), let's see how it compares.

Unfortunately, as with almost every single use case we've examined with PaLM to date, the model refuses to provide output due to a guardrail false positive. Despite testing a wide array of different prompts, PaLM consistently refuses to summarize the transcript, claiming that the ABC Evening News Broadcast is a violation of GCP's content standards. Only after immensely time-consuming and laborious manual experimentation involving manually identifying and removing by hand individual stories and groups of stories in every possible permutation throughout the transcript, we eventually discovered that it was a single story about police action involving prostitution that caused PaLM to refuse to summarize the transcript. Removing that story yielded output highly competitive with Claude 2. Unfortunately, it is simply not realistic to demand that human annotators spend hours and hours manually editing and reformatting each document before submitting to PaLM – the entire point of using an LLM for summarization is to remove the human in the loop. Across our tests to date, preparing a text for PaLM to summarize actually takes vastly longer than simply summarizing it by human in the first place.

Unlike all other production LLMs, GCP's PaLM models use a consumer-focused guardrail system designed for Bard-like public chat experiences that refuse to produce output for anything that even peripherally touches on a sensitive topic. Unfortunately, these guardrails render it entirely unsuitable for real world use in enterprise applications, since actual real world applications will inevitably encounter complicated and sensitive topics that the model will be asked to summarize, classify, extract from, answer questions about, or otherwise process. Unlike its enterprise-focused peers whose guardrail systems understand the difference between summarizing a mainstream news article that touches on sensitive topics and being asked to write harmful content from scratch by an end user, PaLM's safety classifiers lack enterprise tuning and controls and simply refuse to provide output. While in a toy demo it is possible to spend the time manually removing various chunks of text until PaLM produces output, in a real-world production automated pipeline, an LLM must be able to consistently produce output. A television broadcaster wishing to generate automated summaries of their broadcasts cannot afford for their LLM vendor to randomly refuse to summarize broadcasts with no feedback, explanation or means to correct the false positive.

If and when GCP adopts the enterprise-focused guardrail systems of its peers, the actual performance of the 32K PaLM model is highly competitive in this case with Claude 2 and demonstrates the substantial potential of these models for summarization.

Let's examine an ABC Evening News broadcast from February 4, 2023. You can see the original broadcast transcript and Vanderbilt Television News Archive's human-analyzed story segmentation of the broadcast. The complete transcript, with all advertising removed, constitutes 3,828 words totaling 22,459 characters (18,631 characters without spaces).

We'll start with the following prompt:

time curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/[YOURPROJECTID]/locations/us-central1/publishers/google/models/text-bison-32k:predict -d \
$'{
  "instances": [
    { "prompt": "Summarize the following television news transcript. TRANSCRIPT: [TRANSCRIPT GOES HERE]"}
  ],
  "parameters": {
    "temperature": 0.2,
    "maxOutputTokens": 2048,
    "topK": 40,
    "topP": 0.95
  }
}' > O; cat O | jq -r .predictions[0].content


Unfortunately, as usual, PaLM's guardrail false positives prevent us from being able to process the transcript. To test whether this is an intermittent issue, we ran 5 times in a row, all with the following results:

{
  "predictions": [
    {
      "safetyAttributes": {
        "blocked": true
      },
      "content": ""
    }
  ],
  "metadata": {
    "tokenMetadata": {
      "outputTokenCount": {
        "totalTokens": 2,
        "totalBillableCharacters": 0
      },
      "inputTokenCount": {
        "totalBillableCharacters": 24187,
        "totalTokens": 6565
      }
    }
  }
}

Let's try an alternative prompt that we used with Claude:

Divide the attached transcript into a set of discrete stories, make them into a bulleted list with a title and short description and summary of each.

Unfortunately, once again:

"blocked": true

How about this formulation we used with Claude?

Divide the attached text into a set of discrete stories, make them into a bulleted list with a title and short description and summary of each. Output the results in JSON format, with each story having fields Title, Description and Summary.

Once again:

"blocked": true

To see if we could remove the block, we manually tested removing each story and combination of stories in various permutations until we discovered that removing the prostitution story finally caused PaLM to generate output. Reverting to our "Divide the attached text into a set of discrete stories, make them into a bulleted list with a title and short description and summary of each" prompt for easier readability, this time we get output (compare with the human segmentation):

1. Chinese Spy Balloon: The suspected Chinese spy balloon continues to hover over the U.S., with the Pentagon tracking its movements. The balloon, which is the size of three busses, has been spotted over Missouri and is believed to be heading east. U.S. authorities consider shooting it down, but there is concern about potential debris causing civilian injuries or damage. Secretary of State Antony Blinken has postponed his high-stakes trip to China in response to the incident.

2. Deadly Cold Weather: A dangerous and life-threatening cold front is moving into the northeastern United States. Millions of Americans in multiple states are expected to experience wind chill readings below zero. The National Weather Service has issued a "once in a generation" arctic blast warning, with wind chill temperatures in Maine expected to reach 60 below zero.

3. Doctor Attacked and Killed: In California, an emergency room doctor was struck by a car and then attacked with a knife by the driver. The doctor, identified as Michael Mammone, later died at a hospital. The suspect, Vanroy Smith, has been arrested and charged with murder and the use of a deadly weapon. The motive for the crime is unknown.

4. Plane Wing Clipping Incident: At Newark Airport, two passenger jets clipped wings while one plane was preparing for takeoff. The incident involved two United Airlines planes, with the wing of the stationary plane being struck by the wing of the moving plane. No passengers were injured, and all passengers on the affected plane were able to disembark safely. The Federal Aviation Administration (FAA) is investigating the incident.

5. Strong Jobs Report: The U.S. economy added 517,000 new jobs in January, exceeding expectations. The unemployment rate dropped to 3.4%, the lowest level since 1969. President Biden celebrated the news, attributing it to his economic plan. However, the robust jobs report may complicate efforts by the Federal Reserve to combat inflation, as it suggests the economy is not slowing down as much as anticipated.

6. Alex Murdaugh Trial: In the ongoing trial of Alex Murdaugh, the defense team pressed investigators on the murder weapon. A ballistics expert acknowledged that the state does not have the murder weapons. The defense is also seeking to introduce evidence of Murdaugh's financial crimes and greed as motives for the murders. Additionally, new information emerged about the death of Murdaugh's housekeeper, Gloria Satterfield, who died in a fall at the Murdaugh home in 2018. Prosecutors allege that Murdaugh convinced Satterfield's sons to sue him for insurance money and received a $4 million payout without their knowledge.

7. Tyre Nichols Investigation: In the Tyre Nichols investigation, the white officer seen using his taser in the initial traffic stop has been fired. The officer, identified as Hemphill, had been on leave. Five other former officers have been fired and are facing second-degree murder charges. A seventh officer has been suspended pending investigation.

8. Dallas Zoo Incidents: In Dallas, a man has been charged with the disappearance of two tamarin monkeys and other recent crimes at the Dallas Zoo. The man, seen on surveillance footage near the monkey enclosure, is also responsible for cutting other enclosures. The monkeys were later found at an abandoned church.

9. Owl Escape: In New York City, an owl escaped from its enclosure at the Central Park Zoo. The zoo reported that the enclosure had been vandalized. The owl has been spotted in the city and efforts are underway to recapture it.

10. Robin Roberts' Bungee Jump: Robin Roberts took a leap of faith and bungee jumped from the Sky Tower in Auckland, New Zealand, the tallest building in the southern hemisphere. The jump was part of a segment on living more and fearing less. Robin conquered her fear and completed the jump successfully.

Running a second time:

1. Chinese Spy Balloon: The suspected Chinese spy balloon continues to hover over the U.S., with the Pentagon tracking its movements. The balloon, which is the size of three busses, has been spotted over Missouri and is believed to be heading east. U.S. authorities are considering shooting down the balloon, but there is concern about potential debris causing civilian injuries or damage. Secretary of State Antony Blinken has postponed his high-stakes trip to China in response to the incident.

2. Deadly Cold Weather: A dangerous and life-threatening cold front is moving into the northeastern United States, bringing wind chills below zero in several states. The National Weather Service has issued a "once in a generation" arctic blast warning, with wind chills expected to reach 60 below zero in some areas. Snow squalls and power outages are also expected.

3. Doctor Attacked and Killed: An emergency room doctor was struck by a car and then attacked with a knife by the driver in California. The doctor, identified as Michael Mammone, later died at a hospital. The suspect, Vanroy Smith, has been arrested and charged with murder and the use of a deadly weapon. Authorities have not yet disclosed a motive for the crime.

4. Plane Wing Clipping Incident: Two passenger jets clipped wings at Newark Airport, causing damage to one of the planes. The incident occurred when an empty plane being towed to a nearby gate struck the wing of a United Airlines plane that was preparing to take off for Orlando. No passengers were injured, and all passengers on the Orlando plane were safely evacuated. The Federal Aviation Administration (FAA) is investigating the incident.

5. Strong Jobs Report: The U.S. economy added 517,000 new jobs in January, beating expectations and bringing the unemployment rate down to 3.4% – the lowest level since 1969. President Biden celebrated the news, attributing it to his economic plan. The strong jobs report may complicate efforts by the Federal Reserve to combat inflation by raising interest rates.

6. Alex Murdaugh Trial: In the ongoing trial of Alex Murdaugh, the defense team pressed investigators on the murder weapon, arguing that the state does not have the murder weapons. The ballistics expert testified that old shell casings found on the family's property matched the markings of those found near the body of Maggie Murdaugh, suggesting that one of the missing murder weapons had been fired on the property before. The prosecution also presented evidence that Murdaugh received a $5 million payout from an insurance policy after his housekeeper died in a fall at the family home.

7. Tyre Nichols Investigation: In the Tyre Nichols investigation, the white officer seen using his taser in the initial traffic stop has been fired. Hemphill had been on leave, along with five other former officers who were fired and are facing second-degree murder charges. A seventh officer has been suspended pending investigation.

8. Dallas Zoo: Police in Dallas have charged a man with the disappearance of two tamarin monkeys and other recent crimes at the Dallas Zoo. The man, who was seen on surveillance near the enclosure, is also responsible for cutting other enclosures. The monkeys were later found at an abandoned church.

9. Owl Escape: In New York City, police are investigating after an owl escaped from its enclosure at the Central Park Zoo. The zoo said the enclosure had been vandalized, and the bird has been spotted in the city on Fifth Avenue and 60th Street and in Central Park.

10. Robin Roberts' Bungee Jump: Robin Roberts took a leap of faith and jumped from the tallest building in the southern hemisphere in New Zealand, the birthplace of bungee. The jump was part of a segment on living more and fearing less. After conquering her fear, Robin expressed exhilaration and a sense of accomplishment.