The GDELT Project

Experiments With Anthropic's Claude 2 For Summarization, Event & Relation Extraction, NER & Q&A

Continuing our explorations of how various commercial LLMs perform on the chaotic conflicting cacophony that is global news, what does it look like to summarize and explore a television evening news broadcast using Anthropic's Claude 2?

We'll start with a CBS Evening News broadcast from July 5th that includes this synopsis of the story:

jericka: good evening to our viewers in the west, and thank you for joining us this wednesday night. i'm jericka duncan in for norah. we begin with the secret service leading an investigation into who brought cocaine into the white house. the surprising discovery of a bag of suspicious white powder initially caused a panic leading to the temporary closure of the building on sunday. the illegal drug was found near a visitors entrance for those taking a tour of the west wing. today lab tests came back positive for cocaine. president biden and the first family were at camp david at the time. cbs's weijia jiang starts us off from the white house. good evening. >> reporter: good evening, jericka. the white house says it's very important to president biden to get to the bottom of who brough% that illegal drug inside. but officials cautioned this could be a challenging investigation. tonight the leading theory is that someone on a white house tour may have left a small bag of cocaine in the west wing. according to senior law enforcement officials. the discovery sunday nights put the white house on temporary lockdown until the substance was deemed 'not a threat.' the bag was discovered on the ground floor in a common area near the entry to the west wing. not far from the situation room. it was close to a set of storage cubbies where visitors leave their cell p phones duringng a . the secret service is leading the investigation looking at visitor logs and camera surveillance. they are camera systems they will go back as a part of the investigation and research. not every camera is trained on every body at every minute. so there will always be a possibility that this cocould go unanswered. reporteter: white house visitorsrs undergo background checks and go through magnetometers on site. but the secret service does not have k-9 units that target illegal drugs. instead, the dog search for explosives, biohazardous materials, and weapons. 
>> this episode shines a light on the fact that you can bring in illegal substances to the white house, so what is preventing a visitor from bringing an anthrax or something that is not magnetic into the white house? i totally understand the question, but it is under investigation. we will let the secret service do their job. i'm not going to get ahead of if or whens or changes. reporter: senator tom cotton, a republican from arkansas sent a letter to the director of the secret service to say that congress needs to know if the white house complex is not secure. and he asked for a plan to correct any security flaws. jericka: weijia jiang at the white house, thank you.

Let's try summarization using the prompt "Summarize this television news transcript":

Here is a summary of the key points from the news transcript:

– Cocaine was discovered in a bag near the visitors entrance in the West Wing of the White House on Sunday, leading to a temporary lockdown.

– The White House was empty at the time as President Biden was away at Camp David.

– Lab tests confirmed it was cocaine. The Secret Service is leading an investigation into who brought the illegal drugs inside.

– The main theory is that a visitor on a tour left the bag. Visitors undergo background checks but the Secret Service does not screen for drugs.

– The discovery raises concerns about security flaws as visitors could potentially bring other dangerous substances inside.

– Republican Senator Tom Cotton criticized the lack of White House security and asked the Secret Service for a plan to correct any issues.

How about a single paragraph ("Summarize this television news transcript in a single paragraph")?

And using our abstracting prompt ("write an abstract for this television news transcript"):

And headlines ("Write a title for this television news transcript"):

Here are some potential title ideas for the television news transcript:

– Cocaine Found in White House Sparks Security Concerns
– White House Drug Discovery Leads to Lockdown and Investigation
– Secret Service Probes Security Lapse After Cocaine Found in White House
– Senator Calls for Security Review After Cocaine Discovered in White House
– White House Cocaine Discovery Raises Red Flags About Security Flaws
– Drugs and Security Breach at the White House

And a second time:

Here are some possible title ideas for the television news transcript:

– Cocaine Found in White House Sparks Security Concerns
– White House Lockdown After Cocaine Discovery During Biden Visit
– Secret Service Investigating Cocaine Found in White House West Wing
– Security Flaws Exposed After Cocaine Found in White House
– Senator Calls for Security Review After Drugs Found in White House
– White House Cocaine Discovery Raises Security Questions

And hashtags ("What are potential hashtags for this television news transcript"):

Here are some potential hashtags that could be used when sharing this television news transcript on social media:

#WhiteHouseCocaine
#CocaineInWhiteHouse
#WHLockdown
#SecretServiceInvestigation
#WhiteHouseSecurityFlaw
#WhiteHouseTours
#TomCotton
#BidenCampDavid
#SecurityBreach
#DrugsInWH
#WestWingCocaine
#WHDrugs
#WhiteHouseDrugs
#SecurityReview
#WhiteHouseVulnerability
#SSInvestigation
#BidenAway
#WHSecurity
#SecurityLapses
#WHCocaineDiscovery
#CocaineDiscovery
#DrugsInWhiteHouse

And again:

Here are some potential hashtags that could be used for this television news transcript:

#WhiteHouseCocaine
#WHLockdown
#SecretServiceInvestigation
#WhiteHouseSecurityFlaw
#CocaineInTheWhiteHouse
#WhiteHouseDrugs
#BidenOutOfTown
#SecurityBreach
#SenatorCotton
#DrugsInWH
#WhiteHouseTours
#SecurityReview
#WHSecurity
#DrugsFoundInWH
#WHCocaineDiscovery
#SecurityVulnerabilities
#WHVisitorCocaine
#WestWingCocaine
#WhiteHouseVisitorSecurity

And let's try sentiment ("What is the sentiment of this television news transcript"):

Based on the language and tone of the news transcript, the overall sentiment seems to be concern and alarm. Key phrases that indicate this include:

– "surprising discovery"
– "panic leading to the temporary closure"
– "illegal drug"
– "challenging investigation"
– "concerning security flaw"
– "shine a light on the fact that you can bring in illegal substances"
– "not secure"
– "security flaws"

The discovery of cocaine and the security breach are treated seriously, with words like "illegal", "challenging", "concerning", and "not secure" conveying worry. The Republican senator's criticism adds to the general sentiment of alarm. While the tone is serious, it is mostly factual rather than sensational. But the implications for White House security are presented as troubling and urgent. Overall, the transcript expresses apprehension about the breach and urgency to address the security weaknesses it revealed.

And again:

Based on the language and content of the news transcript, the overall sentiment seems to be:

Concerned and alarmed

The discovery of cocaine and the security breach at the White House is treated seriously, with words like "surprising", "panic", and "illegal" indicating the severity. The fact that lab tests were done and the Secret Service launched an investigation shows the level of concern.

There are also tones of criticism and urgency, with Senator Cotton calling out the "security flaws" and pushing for corrections. Phrases like "security lapse" and "vulnerabilities" reinforce the belief that the situation exposes problems that need to be addressed urgently.

While the news is delivered in an objective, factual manner, the choice of details highlights the alarming and problematic nature of drugs making their way into the White House. So the predominant sentiment is alarm and urgency around securing the White House.

And constrain the tone response to a categorical statement ("Classify as either negative or positive the tone of this television news transcript"):

I would classify the tone of this television news transcript as negative. Here are the reasons why:

– Words like "suspicious", "panic", and "illegal" used to describe the cocaine discovery convey a negative tone.

– The incident is referred to as a "security breach" and a "security lapse" which implies a failure in security.

– The transcript raises concerns about flaws and vulnerabilities in White House security, which highlights weaknesses.

– Senator Cotton's criticism of the lack of security adds a disapproving perspective.

– The discovery is treated as a serious matter requiring investigation, rather than a neutral or positive event.

– While the news delivery is objective, the choice of alarming details creates an overall negative tone regarding the implications of the drug discovery.

So in summary, the tone comes across as negative, conveying concern, alarm and urgency rather than neutrality or praise around the cocaine found in the White House. The language and details focus on the problematic aspects of the incident.

And again:

I would classify the tone of this television news transcript as negative.

Some reasons why it has a negative tone:

– Words like "surprising", "panic", "illegal", and "security breach" used to describe the incident imply it was a troubling and concerning event.

– The incident is referred to as a "security lapse" and said to expose "vulnerabilities", framing it as a failure.

– The discovery of drugs is described as "shining a light" on flaws and gaps in White House security. This suggests shortcomings were revealed.

– Senator Cotton's letter sharply criticizes the lack of security, using phrases like "not secure" which paint the situation negatively.

– The story raises questions about risks of other dangerous substances being brought in, worrying implications.

– The urgent call for the Secret Service to correct issues gives a sense things are not as they should be and need fixing.

Overall, the tone focuses on problematic aspects of the incident, security failures, and need for change rather than any positives, making it decidedly negative.

Let's try a machine-readable structured response ("For this television news transcript return a JSON object that has fields sentiment and explanation. Acceptable values for sentiment are Very Positive, Positive, Neutral, Negative or Very Negative. The explanation field contains text that explains the sentiment.") (results were reformatted as minified JSON):
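Before a reply to a prompt like this is passed downstream, it can be checked against the schema the prompt requested. The sketch below is one minimal way to do that in Python and to produce the minified form. The function names are hypothetical, the sample reply is illustrative rather than an actual Claude 2 output, and the lenient handling of field-name case is an assumption, not something the prompt guarantees.

```python
import json

# Allowed values, copied from the prompt above.
ALLOWED_SENTIMENTS = {"Very Positive", "Positive", "Neutral", "Negative", "Very Negative"}


def parse_sentiment_reply(raw: str) -> dict:
    """Parse one reply and check it has the two fields the prompt asked for."""
    obj = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    # Assumption: tolerate "Sentiment" vs "sentiment" by lower-casing keys.
    fields = {str(key).lower(): value for key, value in obj.items()}
    sentiment = fields.get("sentiment")
    explanation = fields.get("explanation")
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment value: {sentiment!r}")
    if not isinstance(explanation, str) or not explanation.strip():
        raise ValueError("missing or empty explanation")
    return {"sentiment": sentiment, "explanation": explanation}


def minify(obj: dict) -> str:
    """Serialize without insignificant whitespace."""
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


if __name__ == "__main__":
    # Illustrative reply, not real model output.
    reply = '{\n  "sentiment": "Negative",\n  "explanation": "Illustrative text."\n}'
    print(minify(parse_sentiment_reply(reply)))
```

Rejecting any value outside the five listed labels keeps a free-form answer such as "Mixed" from slipping silently into a categorical column.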

What about a machine-readable categorical example ("Return a JSON object with Category and Explanation. For Category classify the text as one of the following categories: Political, Crime, War, Fashion, Tourism. The Explanation field contains text that explains the category selection."):

Note how it consistently retains its "Crime" categorization.
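Consistency of this kind can be quantified by tallying the Category field over repeated runs. A minimal sketch, assuming each run's reply has been collected as a raw string; `category_agreement` is a hypothetical helper and the replies in the usage example are placeholders, not the outputs from these experiments.

```python
import json
from collections import Counter


def category_agreement(raw_replies):
    """Return (most common category, share of runs that chose it, full tally).

    Replies that are not valid JSON objects are tallied under None, so
    formatting failures count against agreement instead of being dropped.
    """
    if not raw_replies:
        raise ValueError("need at least one reply")
    labels = []
    for raw in raw_replies:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            labels.append(None)
            continue
        label = obj.get("Category") if isinstance(obj, dict) else None
        labels.append(label if isinstance(label, str) else None)
    tally = Counter(labels)
    top_label, top_count = tally.most_common(1)[0]
    return top_label, top_count / len(labels), tally


if __name__ == "__main__":
    # Placeholder replies for illustration only.
    runs = [
        '{"Category": "Crime", "Explanation": "..."}',
        '{"Category": "Crime", "Explanation": "..."}',
        '{"Category": "Crime", "Explanation": "..."}',
    ]
    label, share, _ = category_agreement(runs)
    print(label, share)
```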

What about another categorical example, this time with three custom categories where we provide a specific definition that partially overlaps and rely on the LLM to decide the topical boundaries ("Return a JSON object with Category and Explanation. For Category classify the text as one of the following categories: Topic1 or Topic2 or Topic3. The Topic1 category is defined as relating to US domestic politics. The Topic2 category is defined as relating to the White House or the current presidential administration. The Topic3 category relates to any criminal activity in the US. The Explanation field contains text that explains the category selection."):

How about a NER task ("What are the locations mentioned in the text?"):

The locations mentioned in the text are:

– White House (mentioned multiple times)
– West Wing of the White House (mentioned multiple times)
– Visitors entrance of the West Wing
– Near the situation room in the West Wing
– Storage cubbies where visitors store phones during tours
– Camp David (presidential retreat where Biden was visiting)

So the main locations discussed are the White House itself, specifically the West Wing area, as well as Camp David. Other minor details provide specificity about where in the West Wing the drugs were found. But the White House and its West Wing are the clearly dominant locations referenced.

And again:

The locations mentioned in the text are:

– White House (mentioned multiple times)
– West Wing of the White House (mentioned multiple times)
– Visitors entrance of the West Wing
– Near the situation room in the West Wing
– Camp David

The key locations emphasized are the White House complex, specifically the West Wing, where the cocaine was discovered near the visitors entrance. Camp David is also mentioned as the location where President Biden was during the incident.

Third time:

The locations mentioned in the text are:

– White House (mentioned several times)
– West Wing of the White House (mentioned several times)
– Visitors entrance for the West Wing
– Storage cubbies where visitors leave phones during tours
– Situation Room in the White House
– Camp David (presidential retreat where Biden was visiting)

So the main locations discussed are the White House itself, specifically the West Wing area, as well as Camp David where President Biden was visiting at the time of the incident. Other minor details provide locations within the White House like the visitors entrance, storage cubbies, and Situation Room. But the focus is on the White House in general.

Fourth time:

The locations mentioned in the text are:

– White House
– West Wing (of the White House)
– Situation Room (in the White House)
– Camp David

Fifth time:

The locations mentioned in the text are:

– White House (mentioned multiple times)
– West Wing of the White House (mentioned multiple times)
– Visitors entrance of the West Wing
– Near the situation room in the West Wing
– Camp David

As with other LLMs, entity extraction is unstable across runs: the five lists above range from four to six locations and differ in which details they include.
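One way to put a number on this is to normalize each run's list and compare the resulting sets. The sketch below does so for the five location lists above, using Jaccard similarity (intersection size divided by union size). The normalization rule (drop parenthetical notes, lower-case) and the helper names are assumptions chosen for illustration; a real pipeline would likely also map variants such as "West Wing" and "West Wing of the White House" to a single entity.

```python
import re
from itertools import combinations

# The five location lists returned above, one list per run.
RUNS = [
    ["White House (mentioned multiple times)",
     "West Wing of the White House (mentioned multiple times)",
     "Visitors entrance of the West Wing",
     "Near the situation room in the West Wing",
     "Storage cubbies where visitors store phones during tours",
     "Camp David (presidential retreat where Biden was visiting)"],
    ["White House (mentioned multiple times)",
     "West Wing of the White House (mentioned multiple times)",
     "Visitors entrance of the West Wing",
     "Near the situation room in the West Wing",
     "Camp David"],
    ["White House (mentioned several times)",
     "West Wing of the White House (mentioned several times)",
     "Visitors entrance for the West Wing",
     "Storage cubbies where visitors leave phones during tours",
     "Situation Room in the White House",
     "Camp David (presidential retreat where Biden was visiting)"],
    ["White House",
     "West Wing (of the White House)",
     "Situation Room (in the White House)",
     "Camp David"],
    ["White House (mentioned multiple times)",
     "West Wing of the White House (mentioned multiple times)",
     "Visitors entrance of the West Wing",
     "Near the situation room in the West Wing",
     "Camp David"],
]


def normalize(item: str) -> str:
    """Drop parenthetical notes and fold case (an assumed, deliberately simple rule)."""
    return re.sub(r"\s*\([^)]*\)", "", item).strip().lower()


def jaccard(a: set, b: set) -> float:
    """Intersection over union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability(runs):
    """Return (mean pairwise Jaccard, items in every run, items in any run)."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    sets = [{normalize(item) for item in run} for run in runs]
    pairs = list(combinations(sets, 2))
    mean = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    return mean, set.intersection(*sets), set.union(*sets)


if __name__ == "__main__":
    mean, always, ever = stability(RUNS)
    print(f"mean pairwise Jaccard: {mean:.2f}")
    print("in every run:", sorted(always))
    print("distinct items overall:", len(ever))
```

Under this normalization only "white house" and "camp david" appear in all five runs, while the union holds eleven distinct strings, which is the kind of spread a downstream consumer would need to reconcile.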

What about structured entity extraction to machine readable format, which replicates a real-world entity extraction use case ("Extract all of the entities, their types and descriptions in the following text into a JSON object.")?

The results here show the same instability seen in other LLMs, with output varying considerably between runs. Two outputs include the triple-backtick formatting marker we've seen produced by other LLMs, though this could simply reflect a failure of the UI to render formatting codes on display. While the majority of names were converted to upper case, one output is entirely lower case and several mix upper- and lower-case names, which would require downstream reprocessing.
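The downstream reprocessing mentioned here might look like the following sketch, which strips a surrounding triple-backtick fence if one is present and folds name casing so that differently cased copies of a name collapse to one entry. The `name` field and the list-of-objects shape are assumptions for illustration, since the prompt does not fix a schema, and the sample reply is invented rather than taken from these runs.

```python
import json
import re

FENCE = "`" * 3  # the triple-backtick marker, built indirectly to keep this listing readable
FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"[A-Za-z]*[ \t]*\n(.*?)\n?[ \t]*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_fence(raw: str) -> str:
    """Return the text inside a surrounding code fence, or raw unchanged."""
    match = FENCE_RE.match(raw)
    return match.group(1) if match else raw


def name_key(name: str) -> str:
    """Case- and whitespace-insensitive key for comparing names."""
    return " ".join(name.split()).casefold()


def dedupe_entities(entities):
    """Keep the first entity seen for each normalized name (assumed 'name' field)."""
    seen = {}
    for entity in entities:
        if not isinstance(entity, dict):
            continue  # skip anything that is not an object
        key = name_key(str(entity.get("name", "")))
        if key and key not in seen:
            seen[key] = entity
    return list(seen.values())


if __name__ == "__main__":
    # Invented reply for illustration only.
    reply = (
        FENCE + "json\n"
        '[{"name": "JERICKA DUNCAN", "type": "PERSON"},'
        ' {"name": "jericka duncan", "type": "PERSON"}]\n' + FENCE
    )
    print(dedupe_entities(json.loads(strip_fence(reply))))
```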

Recall that PaLM 2 identified the names of Jericka Duncan and Weijia Jiang only a single time out of eight runs and actually respelled Duncan's name as "Jerrika". Here Claude identifies Duncan's name twice out of ten runs and Jiang's name three out of ten times.
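Counts like these can be automated by checking each run's extracted names against the names expected from the transcript, and a fuzzy comparison also catches respellings such as "Jerrika". The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the 0.85 threshold and the helper names are assumptions, and the run data in the usage example is illustrative rather than the actual outputs.

```python
from difflib import SequenceMatcher


def _key(name: str) -> str:
    """Case- and whitespace-insensitive form of a name."""
    return " ".join(name.split()).casefold()


def match_kind(expected: str, extracted_names, threshold: float = 0.85) -> str:
    """Classify one run as 'exact', 'near' (likely respelling) or 'missing'."""
    target = _key(expected)
    keys = [_key(name) for name in extracted_names]
    if target in keys:
        return "exact"
    if any(SequenceMatcher(None, target, key).ratio() >= threshold for key in keys):
        return "near"
    return "missing"


def tally_runs(expected: str, runs) -> dict:
    """Count how many runs fall into each match class for one expected name."""
    counts = {"exact": 0, "near": 0, "missing": 0}
    for names in runs:
        counts[match_kind(expected, names)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative runs, not the real outputs.
    runs = [
        ["JERICKA DUNCAN", "White House"],
        ["Jerrika Duncan"],
        ["White House", "Camp David"],
    ]
    print(tally_runs("Jericka Duncan", runs))
```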

It is unclear what happened with runs 8 and 9 – the LLM appears to have lost attention and hallucinated entirely different stories from whole cloth. Yet in the tenth run it returned to the actual story.
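Runs like these can be flagged automatically with a simple grounding check: if few of the extracted entity names occur anywhere in the source transcript, the output is probably about a different story. This is a heuristic sketch rather than a hallucination detector. The 0.5 threshold, the substring test, and the function names are all assumptions, and the second entity list in the usage example is invented.

```python
def grounded_fraction(entity_names, source_text: str) -> float:
    """Share of extracted names that occur (case-insensitively) in the source text."""
    names = [" ".join(name.split()).casefold() for name in entity_names if name.strip()]
    if not names:
        return 0.0
    haystack = " ".join(source_text.split()).casefold()
    hits = sum(1 for name in names if name in haystack)
    return hits / len(names)


def looks_off_topic(entity_names, source_text: str, min_fraction: float = 0.5) -> bool:
    """Flag a run whose entities are mostly absent from the source (assumed threshold)."""
    return grounded_fraction(entity_names, source_text) < min_fraction


if __name__ == "__main__":
    # Two sentences excerpted from the transcript above.
    excerpt = (
        "president biden and the first family were at camp david at the time. "
        "the secret service is leading the investigation looking at visitor logs "
        "and camera surveillance."
    )
    print(looks_off_topic(["Secret Service", "Camp David", "President Biden"], excerpt))
    print(looks_off_topic(["Federal Reserve", "Wall Street"], excerpt))  # invented entities
```

Because the check relies on exact substrings, captioning errors in the transcript can hide genuine matches, so a flagged run is a prompt for review rather than proof of hallucination.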

What about a Q&A ("Where was the cocaine discovered?"):

Based on the details in the news transcript, the cocaine was discovered in the West Wing of the White House, specifically near the visitors entrance and close to a set of storage cubbies where visitors leave their cell phones during tours.

The relevant lines from the transcript are:

– "i'm jericka duncan in for norah. we begin with the secret service leading an investigation into who brought cocaine into the white house."

– "the illegal drug was found near a visitors entrance for those taking a tour of the west wing."

– "according to senior law enforcement officials. the discovery sunday nights put the white house on temporary lockdown until the substance was deemed 'not a threat.' the bag was discovered on the ground floor in a common area near the entry to the west wing."

– "not far from the situation room. it was close to a set of storage cubbies where visitors leave their cell p phones duringng a tour."

So in summary, the cocaine was discovered in the West Wing of the White House near where visitors enter and store belongings during tours.

Or another question ("Who brought the cocaine into the White House according to the text?")?

The text does not definitively state who brought the cocaine into the White House. However, it provides some clues and details:

– The transcript mentions that the Secret Service is leading an investigation into "who brought cocaine into the white house."

– It states the "leading theory" is that "someone on a white house tour may have left a small bag of cocaine in the west wing."

– The cocaine was discovered near the visitors entrance and tour area of the West Wing.

– Visitors undergo background checks but are not screened for drugs.

So while the person responsible is not named, the main theory based on the location and details provided is that a visitor on a public White House tour likely left the bag of cocaine found in the West Wing. The investigation seems to be focused on identifying which visitor may have brought and left the drugs.

What about event/relation extraction ("Return a JSON object containing all of the events described in the text.")?

Here we can see reasonable consistency across the results, with a similar event breakdown, though different wording.

What about changing the wording from "events" to "relationships" ("Return a JSON object containing all of the relationships described in the text."):

As with our NER task, the output varies considerably and is inconsistent from run to run.

What if we give it a bit more instruction ("Return a JSON object containing all of the relationships described in the text. Each relationship should have three fields: Actor 1, Relationship, Actor 2. Actor 1 is the first entity, Actor 2 is the second entity and Relationship is the relationship between them."):

Again, quite a bit of variation between runs.
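Some of this variation can at least be separated into schema problems versus content differences by validating each reply against the three fields the prompt specifies. A minimal sketch under assumptions: the reply is either a JSON array of relationships or an object wrapping one such array (the prompt does not say which), `split_relationships` is a hypothetical helper, and the sample reply is illustrative.

```python
import json

# Field names exactly as given in the prompt above.
REQUIRED = ("Actor 1", "Relationship", "Actor 2")


def split_relationships(raw: str):
    """Return (valid, rejected) relationship records from one reply."""
    data = json.loads(raw)
    if isinstance(data, dict):
        # Assumption: an object wrapper holds the list under its first list-valued key;
        # otherwise the object itself is treated as a single relationship.
        lists = [value for value in data.values() if isinstance(value, list)]
        data = lists[0] if lists else [data]
    if not isinstance(data, list):
        raise ValueError("expected a JSON array or object")
    valid, rejected = [], []
    for record in data:
        ok = isinstance(record, dict) and all(
            isinstance(record.get(field), str) and record[field].strip()
            for field in REQUIRED
        )
        (valid if ok else rejected).append(record)
    return valid, rejected


if __name__ == "__main__":
    # Illustrative reply, not real model output.
    reply = json.dumps({"relationships": [
        {"Actor 1": "Secret Service", "Relationship": "is investigating", "Actor 2": "White House"},
        {"Actor 1": "Tom Cotton", "Relationship": "sent a letter to"},
    ]})
    valid, rejected = split_relationships(reply)
    print(len(valid), "valid,", len(rejected), "rejected")
```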

What if we change the formulation back to events ("Return a JSON object containing all of the events described in the text. Each event should have three fields: Actor 1, Relationship, Actor 2. Actor 1 is the first entity, Actor 2 is the second entity and Relationship is the relationship between them.")?

Interestingly, this task posed the most challenges to the LLM, causing it to hallucinate a story out of whole cloth for one response and generating no results twice. However, in its three related responses, it was remarkably consistent.

Overall, Claude 2's responses are fairly on par with those we've seen from other major LLMs. It generates reasonable results across all tasks, but does hallucinate and generate inconsistent results across runs, as all contemporary LLMs are prone to do. It did generate better results than several other major LLMs in a few cases.