Yesterday we demonstrated that, when using LLMs for entity extraction, adding a single apostrophe to a source text or changing the order of names in a sentence can lead to major changes in the output, including triggering hallucination. Today we'll explore an even more challenging aspect of LLM entity extraction: infinite loops and failure modes. In short, two texts that differ by just a few words can mean the difference between a reasonable output and an LLM either entering an infinite loop that emits a potentially unbounded string of billable tokens, or entering a failure mode in which the requested output format becomes unparseable and the prompt instructions are violated.
Interestingly, there has been a dearth of work on LLMs entering output loops in which they repeat the same sequence until they reach the maximum output token cap. As LLM output caps continue to increase across the board, it is unclear whether such failure states might one day produce effectively unbounded output that could yield a million-dollar bill from a single query. The passage below also causes problems with other LLMs, including hallucination in summarization tasks with ChatGPT, suggesting that certain texts may unpredictably trigger latent vulnerabilities across multiple LLMs. Regardless, the results below suggest that current-generation LLMs are not yet at the point where they can be reliably and robustly used for highly structured, codified extraction tasks like entity extraction, and are instead better suited for more free-form tasks like ideation and distillation.
Let's take the first five minutes of this ABC Evening News broadcast and analyze it with PaLM 2 Text Bison, using the recommended parameters for extraction tasks:
time curl \ -X POST \ -H "Authorization: Bearer $(gcloud auth print-access-token)" \ -H "Content-Type: application/json" \ https://us-central1-aiplatform.googleapis.com/v1/projects/[YOURPROJECTID]/locations/us-central1/publishers/google/models/text-bison:predict -d \ $'{ "instances": [ { "prompt": "Extract a list of the people, organizations, locations and other entities in the following text, disambiguating and normalizing each name to the title of its corresponding Wikipedia article. Classify the Type of each as one of the following: \\"Organization,Person,Location,Event,Other\\". Output the results in JSON in the format \\"{name, wikipedia url, type}\\" : tonight, several developing stories as we come on the air. news coming in, the suspected chinese spy balloon hovering over the u.s. where it is now, and will there be a window to shoot it down? also tonight, the dangerous life-threatening cold moving into the northeast. the live readings already. first, the pentagon tracking that chinese balloon, the intelligence bay hanging beneath it, the size of three busses. where it was spotted today over the u.s., where it\'s believed to be headed now, 60,000 feet in the air. will the u.s. shoot this down? secretary of state antony blinken postponing his high stakes trip to china. mola lenghi, mary bruce standing by at the white house. the deadly cold already tonight, the national weather service is calling it a once in a generation arctic blast. 25 million americans, multiple states. new york city, philadelphia, boston, wind-chill readings in maine expected to reach 60 below zero. and where the wind chill is already 106 degrees below zero tonight. victor oquendo in massachusetts. rob marciano timing this out. the deadly assault on an e.r. doctor riding his mountain bike. police say the suspect in his car hitting the victim, then allegedly getting out and attacking him. matt gutman reporting. another scare at the airport. this time, newark. 
two passenger jets clipping wings. one plane full of passengers. the stunning and unexpected jobs report tonight. president biden celebrating the numbers. the unemployment rate not this low since 1969. so, how does this complicate steps by the fed moving forward? rebecca jarvis, as always, breaking it down. the alex murdaugh double murder trial. the defense team, and what a ballistics expert acknowledged today. and tonight, the story of alex murdaugh\'s housekeeper dying in a fall at the home. what the family says they never knew. prosecutors say murdaugh got a $5 million payout from a policy that covered the housekeeper\'s fall. but will the jury hear about it? eva pilgrim reporting. robin roberts tonight and her epic jump. robin from new zealand, the birthplace of the bungee jump. you\'ll see her jump from the tallest building in the southern hemisphere. tonight, the lesson on conquering fear. good evening. it is great to have you with us on a friday night. almost to the weekend, but it will be brutally cold, dangerous, life-threatening cold. the readings coming in right now, and where the wind chill is already more than 100 degrees below zero. that\'s in a moment here. but we begin with the chinese spy balloon hovering over the u.s. tonight. china claiming it is a weather balloon that blew off course. u.s. authorities believe this is a surveillance balloon being maneuvered by china, likely using spy satellites. today the balloon was spotted over missouri. so where is it tonight, and where u.s. authorities believe it\'s headed. among the first sightings over billings, montana. that intelligence bay hanging beneath the balloon, that\'s the size of three busses. the pentagon tonight on discussions over whether or not to shoot it down, and will there be a small window of opportunity to do just that? what we\'re learning tonight. secretary of state antony blinken postponing his high-stakes trip to beijing, calling china\'s actions unacceptable. 
so, will this be shot down? if so, where? and what china is saying tonight. mary bruce at the white house, and abc\'s mola lenghi leading us off tonight in montana. reporter: tonight, the pentagon confirming the massive chinese spy balloon is on the move 60,000 feet above the ground and heading east. the balloon continues t move eastward and is currently over the center of the continental united states. what the heck is that? reporter: a senior u.s. official tells abc news the balloon now appears headed towards north carolina. across the country today, americans with their eyes on the skies, posting images like this one, the balloon floating over missouri. commercial pilots radioing in. we got that balloon in sight also. looks like it\'s way up there, maybe 50,000 feet or so. reporter: the balloon is huge with a technology bay attached below that is itself the size of three busses, loaded with high resolution cameras, according to a senior u.s. official, equipped with what appear to be solar panels on the side that could power its technology. it first entered american airspace over alaska, then flying into southwest canada before dipping down over billings, montana. this thing is up in the sky. what the heck is that? that thing is not the moon. any help would be appreciated. reporter: montana republican congressman ryan zinke was blunt, tweeting \\"shoot. it. down.\\" the pentagon today sayig that option was considered and rejected, for now. we assessed that currently it does not pose a physical or military risk to people on the ground. for now we are continuing to monitor and review options. reporter: the white house today saying president biden agreed with the pentagon\'s strong recommendation. the risks involved with shooting down the balloon just too great. any potential debris field would be significant and potentially cause civilian injuries or deaths or significant property damage. 
reporter: for its part, china insisting today the balloon is not a spy vessel at all but a civilian airship used for research, mainly meteorological purposes. the pentagon flatly rejecting that claim. we are aware of the prc\'s statement. however, the fact is, we know that it\'s a surveillance pballo. we have clearly communicated that this balloon is violating u.s. air space and international law and that this is unacceptable. reporter: china says the balloon deviated far from its planned course, but the pentagon questioning that as well, saying the balloon can be maneuvered. a senior u.s. official telling abc news, it\'s likely being navigated with spy satellites. this all happening as secretary of state antony blinken prepared for a weekend trip to beijing. "} ], "parameters": { "temperature": 0.0, "maxOutputTokens": 1024, "topK": 1, "topP": 0.0 } }'
The output is truncated, leaving invalid JSON that cannot be parsed. It appears that PaLM 2 entered an infinite loop, cycling through the same few objects (starting with Chinese => Chinese_people) until it ran out of output tokens. It is unclear why hitting the upper token bound would cause this infinite loop, as in our experiments yesterday, hitting the output cap merely resulted in truncated output that was still reasonable.
[{"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_people", "type": "Ethnic group"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/China", "type": "Country"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_language", "type": "Language"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_culture", "type": "Culture"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_people", "type": "Ethnic group"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_language", "type": "Language"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_culture", "type": "Culture"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_people", "type": "Ethnic group"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_language", "type": "Language"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_culture", "type": "Culture"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_people", "type": "Ethnic group"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_language", "type": "Language"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_culture", "type": "Culture"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_people", "type": "Ethnic group"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_language", "type": "Language"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_culture", "type": "Culture"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_people", "type": "Ethnic group"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_language", "type": "Language"}, {"name": "Chinese", "wikipedia_url": 
"https://en.wikipedia.org/wiki/Chinese_culture", "type": "Culture"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_people", "type": "Ethnic group"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_language", "type": "Language"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_culture", "type": "Culture"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_people", "type": "Ethnic group"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_language", "type": "Language"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_culture", "type": "Culture"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_people", "type": "Ethnic group"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_language", "type": "Language"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_culture", "type": "Culture"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_people", "type": "Ethnic group"}, {"name": "Chinese", "wikipedia_url": "https://en.wikipedia.org/wiki/Chinese_language", "type": "Language"}, {"name": "Chinese
Indeed, when we check the statistics we find that the 5,281 input characters yielded 1,406 input tokens, maxing out our 1,024 output tokens. Over time it is likely that these output limitations will increase, but for the time being this means that we have to chunk the text into smaller blocks to analyze:
"metadata": { "tokenMetadata": { "inputTokenCount": { "totalTokens": 1406, "totalBillableCharacters": 5281 }, "outputTokenCount": { "totalBillableCharacters": 3057, "totalTokens": 1024 } } }
What if we shrink the input text by 2.5x to just 2111 characters totaling 564 input tokens? Despite cutting our input by more than half, to just 362 total words, we still get infinite looping, though this time the output does include some additional entities at the top.
[{"name": "Chinese spy balloon", "wikipedia_url": "Chinese spy balloon", "type": "Other"}, {"name": "United States", "wikipedia_url": "United States", "type": "Country"}, {"name": "Newark", "wikipedia_url": "Newark, New Jersey", "type": "Location"}, {"name": "President Biden", "wikipedia_url": "Joe Biden", "type": "Person"}, {"name": "Alex Murdaugh", "wikipedia_url": "Alex Murdaugh", "type": "Person"}, {"name": "Robin Roberts", "wikipedia_url": "Robin Roberts", "type": "Person"}, {"name": "New Zealand", "wikipedia_url": "New Zealand", "type": "Country"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese government", "wikipedia_url": "Government of the People's Republic of China", "type": "Government"}, {"name": "Chinese military", "wikipedia_url": "People's Liberation Army", "type": "Military"}, {"name": "Chinese Communist Party", "wikipedia_url": "Chinese Communist Party", "type": "Political party"}, {"name": "Chinese people", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": 
"Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "E
We can see below that 564 input tokens is yielding 1024 output tokens. While the actual number of entities in the source text is not extraordinary, the JSON wrapper that makes the results machine readable adds significantly to the output text.
"metadata": { "tokenMetadata": { "outputTokenCount": { "totalTokens": 1024, "totalBillableCharacters": 3017 }, "inputTokenCount": { "totalBillableCharacters": 2111, "totalTokens": 564 } } }
Recall that this is the input text we're using right now:
tonight, several developing stories as we come on the air. news coming in, the suspected chinese spy balloon hovering over the u.s. where it is now, and will there be a window to shoot it down? also tonight, the dangerous life-threatening cold moving into the northeast. the live readings already. first, the pentagon tracking that chinese balloon, the intelligence bay hanging beneath it, the size of three busses. where it was spotted today over the u.s., where it's believed to be headed now, 60,000 feet in the air. will the u.s. shoot this down? secretary of state antony blinken postponing his high stakes trip to china. mola lenghi, mary bruce standing by at the white house. another scare at the airport. this time, newark. two passenger jets clipping wings. one plane full of passengers. the stunning and unexpected jobs report tonight. president biden celebrating the numbers. the unemployment rate not this low since 1969. so, how does this complicate steps by the fed moving forward? rebecca jarvis, as always, breaking it down. the alex murdaugh double murder trial. the defense team, and what a ballistics expert acknowledged today. and tonight, the story of alex murdaugh's housekeeper dying in a fall at the home. what the family says they never knew. prosecutors say murdaugh got a $5 million payout from a policy that covered the housekeeper's fall. but will the jury hear about it? eva pilgrim reporting. robin roberts tonight and her epic jump. robin from new zealand, the birthplace of the bungee jump. you'll see her jump from the tallest building in the southern hemisphere. tonight, the lesson on conquering fear. good evening. it is great to have you with us on a friday night. almost to the weekend, but it will be brutally cold, dangerous, life-threatening cold. the readings coming in right now, and where the wind chill is already more than 100 degrees below zero. that's in a moment here. but we begin with the chinese spy balloon hovering over the u.s. tonight. 
china claiming it is a weather balloon that blew off course. u.s. authorities believe this is a surveillance balloon being maneuvered by china, likely using spy satellites.
That this tiny amount of text exceeds the total output capabilities of PaLM 2 suggests that end user applications will have to break input text into extremely small chunks for processing and lose the contextual disambiguation that becomes possible when examining the entire text. At the same time, this also doesn't make sense, since when we hit the output cap yesterday with our sample texts, we did not observe such infinite looping.
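For illustration, breaking input into small blocks could look something like the following sketch. It is a deliberately naive example under our own assumptions: `chunk_sentences` and its `max_chars` budget are hypothetical, and splitting on sentence-ending punctuation will mis-split abbreviations such as "u.s." in the transcript above.

```python
import re


def chunk_sentences(text: str, max_chars: int) -> list:
    """Pack whole sentences into chunks of at most max_chars where possible."""
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget, so start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "another scare at the airport. this time, newark. two passenger jets clipping wings."
print(chunk_sentences(sample, 50))
```

A single sentence longer than `max_chars` is kept whole rather than cut, so the budget is a target and not a hard limit. Each chunk would be submitted as a separate request, which is exactly where the cross-sentence context is lost.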
What if we remove a single short "noise" sentence from the above text? We'll remove this 5-word sentence that appears midway through the text:
another scare at the airport.
That's it: we removed a single 5-word sentence from the text. Note that there are no proper names in that sentence, just the common noun "airport". Our input drops by just 25 characters and just 6 tokens. Yet look at these output numbers. Removing just 5 words drops our output from 3017 characters to just 512 characters and from 1024 tokens to just 183 tokens.
"metadata": { "tokenMetadata": { "outputTokenCount": { "totalTokens": 183, "totalBillableCharacters": 512 }, "inputTokenCount": { "totalBillableCharacters": 2086, "totalTokens": 558 } }
The results can be seen below. (The original results had all carriage returns and spaces removed so we've prettified here to make it easier to read). We can see that while the results aren't truncated, they are missing numerous entities including Northeast, Pentagon, US, Antony Blinken, Mola Lenghi, Mary Bruce, White House, The Fed, Rebecca Jarvis, Eva Pilgrim, China and Southern Hemisphere to name a few. Yet, somehow the removal of a single noise sentence changes the output state from an infinite loop to actual JSON results in the format we requested.
[{ "name": "Chinese spy balloon", "wikipedia_url": "Chinese_spy_balloon", "type": "Other" }, { "name": "United States", "wikipedia_url": "United_States", "type": "Country" }, { "name": "Newark", "wikipedia_url": "Newark,_New_Jersey", "type": "Location" }, { "name": "President Biden", "wikipedia_url": "Joe_Biden", "type": "Person" }, { "name": "Alex Murdaugh", "wikipedia_url": "Alex_Murdaugh", "type": "Person" }, { "name": "Robin Roberts", "wikipedia_url": "Robin_Roberts", "type": "Person" }, { "name": "New Zealand", "wikipedia_url": "New_Zealand", "type": "Country" }]
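Even this parseable output does not fully follow the prompt: the "Country" type is not one of the five categories we asked for. A small validation step can surface such violations automatically. The sketch below is a minimal example under our own assumptions (the `invalid_types` helper is hypothetical); it compares each returned `type` against the list of types from the prompt.

```python
import json

# The five types requested in the prompt.
ALLOWED_TYPES = {"Organization", "Person", "Location", "Event", "Other"}


def invalid_types(raw_json: str) -> list:
    """Return (name, type) pairs whose type is outside the requested set."""
    entities = json.loads(raw_json)
    return [
        (entity.get("name"), entity.get("type"))
        for entity in entities
        if entity.get("type") not in ALLOWED_TYPES
    ]


# The first three entities from the output above.
sample = (
    '[{"name": "Chinese spy balloon", "wikipedia_url": "Chinese_spy_balloon", "type": "Other"},'
    ' {"name": "United States", "wikipedia_url": "United_States", "type": "Country"},'
    ' {"name": "Newark", "wikipedia_url": "Newark,_New_Jersey", "type": "Location"}]'
)

print(invalid_types(sample))
```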
The discontinuous drop from 1024 to 183 output tokens (a 5.6x decrease) caused by the removal of a single 5-word sentence suggests the underlying issue is not that we've exceeded the model's genuine limitations, but rather that we've triggered the LLM equivalent of an infinite loop. In the five-word sentence we removed, only one word could plausibly be treated as an entity: the common noun "airport". This leaves only two possibilities: either PaLM 2 is extracting "airport" and generating an additional 841 tokens of JSON that hits the cap and causes some kind of failure state, or else we've triggered an infinite loop. To test this, instead of removing the entire sentence "another scare at the airport." we'll remove only "at the airport" to leave just "another scare.":
time curl \ -X POST \ -H "Authorization: Bearer $(gcloud auth print-access-token)" \ -H "Content-Type: application/json" \ https://us-central1-aiplatform.googleapis.com/v1/projects/[YOURPROJECTID]/locations/us-central1/publishers/google/models/text-bison:predict -d \ $'{ "instances": [ { "prompt": "Extract a list of the people, organizations, locations and other entities in the following text, disambiguating and normalizing each name to the title of its corresponding Wikipedia article. Classify the Type of each as one of the following: \\"Organization,Person,Location,Event,Other\\". Output the results in JSON in the format \\"{name, wikipedia url, type}\\" : tonight, several developing stories as we come on the air. news coming in, the suspected chinese spy balloon hovering over the u.s. where it is now, and will there be a window to shoot it down? also tonight, the dangerous life-threatening cold moving into the northeast. the live readings already. first, the pentagon tracking that chinese balloon, the intelligence bay hanging beneath it, the size of three busses. where it was spotted today over the u.s., where it\'s believed to be headed now, 60,000 feet in the air. will the u.s. shoot this down? secretary of state antony blinken postponing his high stakes trip to china. mola lenghi, mary bruce standing by at the white house. another scare. this time, newark. two passenger jets clipping wings. one plane full of passengers. the stunning and unexpected jobs report tonight. president biden celebrating the numbers. the unemployment rate not this low since 1969. so, how does this complicate steps by the fed moving forward? rebecca jarvis, as always, breaking it down. the alex murdaugh double murder trial. the defense team, and what a ballistics expert acknowledged today. and tonight, the story of alex murdaugh\'s housekeeper dying in a fall at the home. what the family says they never knew. 
prosecutors say murdaugh got a $5 million payout from a policy that covered the housekeeper\'s fall. but will the jury hear about it? eva pilgrim reporting. robin roberts tonight and her epic jump. robin from new zealand, the birthplace of the bungee jump. you\'ll see her jump from the tallest building in the southern hemisphere. tonight, the lesson on conquering fear. good evening. it is great to have you with us on a friday night. almost to the weekend, but it will be brutally cold, dangerous, life-threatening cold. the readings coming in right now, and where the wind chill is already more than 100 degrees below zero. that\'s in a moment here. but we begin with the chinese spy balloon hovering over the u.s. tonight. china claiming it is a weather balloon that blew off course. u.s. authorities believe this is a surveillance balloon being maneuvered by china, likely using spy satellites."} ], "parameters": { "temperature": 0.0, "maxOutputTokens": 1024, "topK": 1, "topP": 0.0 } }' | jq -r .predictions[0].content
Thus, the only difference between this text and the version with the sentence removed entirely is the addition of two words: "another scare", neither of which is a name. Our input grows by just 13 characters (3 tokens), none of which contain an extractable name. Yet our output jumps back up to the 1024 token limit.
"metadata": { "tokenMetadata": { "inputTokenCount": { "totalBillableCharacters": 2099, "totalTokens": 561 }, "outputTokenCount": { "totalBillableCharacters": 3017, "totalTokens": 1024 } } }
And sure enough, we're back to the infinite repeating text. We've found an LLM infinite loop!
[{"name": "Chinese spy balloon", "wikipedia_url": "Chinese spy balloon", "type": "Other"}, {"name": "United States", "wikipedia_url": "United States", "type": "Country"}, {"name": "Newark", "wikipedia_url": "Newark, New Jersey", "type": "Location"}, {"name": "President Biden", "wikipedia_url": "Joe Biden", "type": "Person"}, {"name": "Alex Murdaugh", "wikipedia_url": "Alex Murdaugh", "type": "Person"}, {"name": "Robin Roberts", "wikipedia_url": "Robin Roberts", "type": "Person"}, {"name": "New Zealand", "wikipedia_url": "New Zealand", "type": "Country"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese government", "wikipedia_url": "Government of the People's Republic of China", "type": "Government"}, {"name": "Chinese military", "wikipedia_url": "People's Liberation Army", "type": "Military"}, {"name": "Chinese Communist Party", "wikipedia_url": "Chinese Communist Party", "type": "Political party"}, {"name": "Chinese people", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": 
"Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "E
What if we take the text above and remove another sentence?
the unemployment rate not this low since 1969.
Suddenly our output drops back again:
"metadata": { "tokenMetadata": { "inputTokenCount": { "totalTokens": 548, "totalBillableCharacters": 2060 }, "outputTokenCount": { "totalTokens": 183, "totalBillableCharacters": 512 } } }
But we get the same entity list as earlier that misses most of the names in the text (the original output has been prettified for display):
[{ "name": "Chinese spy balloon", "wikipedia_url": "Chinese_spy_balloon", "type": "Other" }, { "name": "United States", "wikipedia_url": "United_States", "type": "Country" }, { "name": "Newark", "wikipedia_url": "Newark,_New_Jersey", "type": "Location" }, { "name": "President Biden", "wikipedia_url": "Joe_Biden", "type": "Person" }, { "name": "Alex Murdaugh", "wikipedia_url": "Alex_Murdaugh", "type": "Person" }, { "name": "Robin Roberts", "wikipedia_url": "Robin_Roberts", "type": "Person" }, { "name": "New Zealand", "wikipedia_url": "New_Zealand", "type": "Country" }]
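The manual experiments so far (remove one sentence, rerun, and check whether the output still hits the cap) can be automated as a leave-one-out search. The sketch below is a hypothetical harness under our own assumptions: `count_output_tokens` stands in for whatever function submits the text to the model and returns the reported output token count, and `fake_model` is a stub that merely imitates the behavior observed above so that the example runs without an API call.

```python
import re
from typing import Callable, List


def sentences_that_break_loop(
    text: str, count_output_tokens: Callable[[str], int], cap: int
) -> List[str]:
    """Return each sentence whose removal brings the output below the cap."""
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    culprits = []
    for i, removed in enumerate(sentences):
        # Rebuild the text with sentence i left out and rerun the model.
        variant = " ".join(sentences[:i] + sentences[i + 1:])
        if count_output_tokens(variant) < cap:
            culprits.append(removed)
    return culprits


def fake_model(text: str) -> int:
    """Stub, not a real model: loops whenever 'another scare' is present."""
    return 1024 if "another scare" in text else 183


sample = (
    "mary bruce standing by at the white house. "
    "another scare at the airport. this time, newark."
)
print(sentences_that_break_loop(sample, fake_model, 1024))
```

With a real model behind `count_output_tokens`, each sentence costs one request, so a search like this over a long transcript can itself become expensive if many variants hit the output cap.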
What if we drop a few more sentences to yield:
news coming in, the suspected chinese spy balloon hovering over the u.s. where it is now, and will there be a window to shoot it down? also tonight, the dangerous life-threatening cold moving into the northeast. the live readings already. first, the pentagon tracking that chinese balloon, the intelligence bay hanging beneath it, the size of three busses. where it was spotted today over the u.s., where it's believed to be headed now, 60,000 feet in the air. will the u.s. shoot this down? secretary of state antony blinken postponing his high stakes trip to china. mola lenghi, mary bruce standing by at the white house. another scare. this time, newark. two passenger jets clipping wings. one plane full of passengers. the stunning and unexpected jobs report tonight. president biden celebrating the numbers. so, how does this complicate steps by the fed moving forward? rebecca jarvis, as always, breaking it down. the alex murdaugh double murder trial. the defense team, and what a ballistics expert acknowledged today. and tonight, the story of alex murdaugh's housekeeper dying in a fall at the home. what the family says they never knew. eva pilgrim reporting. robin roberts tonight and her epic jump. robin from new zealand, the birthplace of the bungee jump. tonight, the lesson on conquering fear. good evening. it is great to have you with us on a friday night. almost to the weekend, but it will be brutally cold, dangerous, life-threatening cold. the readings coming in right now, and where the wind chill is already more than 100 degrees below zero. that's in a moment here. but we begin with the chinese spy balloon hovering over the u.s. tonight. china claiming it is a weather balloon that blew off course. u.s. authorities believe this is a surveillance balloon being maneuvered by china, likely using spy satellites.
The output remains largely the same, but this time notice how Biden is missing even though he is still mentioned in the text.
[{ "name": "Chinese spy balloon", "wikipedia_url": "Chinese spy balloon", "type": "Other" }, { "name": "United States", "wikipedia_url": "United States", "type": "Country" }, { "name": "Newark", "wikipedia_url": "Newark, New Jersey", "type": "Location" }, { "name": "Alex Murdaugh", "wikipedia_url": "Alex Murdaugh", "type": "Person" }, { "name": "Robin Roberts", "wikipedia_url": "Robin Roberts", "type": "Person" }, { "name": "New Zealand", "wikipedia_url": "New Zealand", "type": "Country" }]
What if we take the text above and remove just the three words "news coming in," from the beginning to yield:
the suspected chinese spy balloon hovering over the u.s. where it is now, and will there be a window to shoot it down? also tonight, the dangerous life-threatening cold moving into the northeast. the live readings already. first, the pentagon tracking that chinese balloon, the intelligence bay hanging beneath it, the size of three busses. where it was spotted today over the u.s., where it's believed to be headed now, 60,000 feet in the air. will the u.s. shoot this down? secretary of state antony blinken postponing his high stakes trip to china. mola lenghi, mary bruce standing by at the white house. another scare. this time, newark. two passenger jets clipping wings. one plane full of passengers. the stunning and unexpected jobs report tonight. president biden celebrating the numbers. so, how does this complicate steps by the fed moving forward? rebecca jarvis, as always, breaking it down. the alex murdaugh double murder trial. the defense team, and what a ballistics expert acknowledged today. and tonight, the story of alex murdaugh's housekeeper dying in a fall at the home. what the family says they never knew. eva pilgrim reporting. robin roberts tonight and her epic jump. robin from new zealand, the birthplace of the bungee jump. tonight, the lesson on conquering fear. good evening. it is great to have you with us on a friday night. almost to the weekend, but it will be brutally cold, dangerous, life-threatening cold. the readings coming in right now, and where the wind chill is already more than 100 degrees below zero. that's in a moment here. but we begin with the chinese spy balloon hovering over the u.s. tonight. china claiming it is a weather balloon that blew off course. u.s. authorities believe this is a surveillance balloon being maneuvered by china, likely using spy satellites.
Now suddenly we're back to our infinite loop output:
[{"name": "Chinese balloon", "wikipedia_url": "Chinese balloon", "type": "Other"}, {"name": "United States", "wikipedia_url": "United States", "type": "Country"}, {"name": "Newark", "wikipedia_url": "Newark, New Jersey", "type": "Location"}, {"name": "Alex Murdaugh", "wikipedia_url": "Alex Murdaugh", "type": "Person"}, {"name": "Robin Roberts", "wikipedia_url": "Robin Roberts", "type": "Person"}, {"name": "New Zealand", "wikipedia_url": "New Zealand", "type": "Country"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese government", "wikipedia_url": "Government of the People's Republic of China", "type": "Government"}, {"name": "Chinese military", "wikipedia_url": "People's Liberation Army", "type": "Military"}, {"name": "Chinese intelligence", "wikipedia_url": "Chinese intelligence", "type": "Organization"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": 
"Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"}, {"name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group"},
What if we drop the next sentence from the beginning of the text that says "the suspected chinese spy balloon hovering over the u.s. where it is now, and will there be a window to shoot it down?" to yield this input:
also tonight, the dangerous life-threatening cold moving into the northeast. the live readings already. first, the pentagon tracking that chinese balloon, the intelligence bay hanging beneath it, the size of three busses. where it was spotted today over the u.s., where it's believed to be headed now, 60,000 feet in the air. will the u.s. shoot this down? secretary of state antony blinken postponing his high stakes trip to china. mola lenghi, mary bruce standing by at the white house. another scare. this time, newark. two passenger jets clipping wings. one plane full of passengers. the stunning and unexpected jobs report tonight. president biden celebrating the numbers. so, how does this complicate steps by the fed moving forward? rebecca jarvis, as always, breaking it down. the alex murdaugh double murder trial. the defense team, and what a ballistics expert acknowledged today. and tonight, the story of alex murdaugh's housekeeper dying in a fall at the home. what the family says they never knew. eva pilgrim reporting. robin roberts tonight and her epic jump. robin from new zealand, the birthplace of the bungee jump. tonight, the lesson on conquering fear. good evening. it is great to have you with us on a friday night. almost to the weekend, but it will be brutally cold, dangerous, life-threatening cold. the readings coming in right now, and where the wind chill is already more than 100 degrees below zero. that's in a moment here. but we begin with the chinese spy balloon hovering over the u.s. tonight. china claiming it is a weather balloon that blew off course. u.s. authorities believe this is a surveillance balloon being maneuvered by china, likely using spy satellites.
This time we get something bizarre: a comma-separated list that is not valid JSON, yet opens with a JSON starting brace and still follows the same general structure of Name, Wikipedia URL and Type. But the Wikipedia URLs are no longer Wikipedia entries, the Types range far beyond the requested list (Day, Number, etc.) and the model appears to have extracted every noun it found. The results hit the 1024-token output cap again and are truncated. Removing a single sentence from the input suddenly causes a complete change in the output behavior and the loss of JSON output.
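Failures like this one can be caught mechanically before the output reaches downstream code. The following is a minimal sketch, not part of the original experiment: it assumes the response should be a JSON array of objects with the keys "name", "wikipedia_url" and "type" (the key names the model used in its well-formed outputs) and that the only acceptable types are the five listed in the prompt. The function name `validate_extraction` is hypothetical.

```python
import json
from typing import List

# The five types requested in the prompt.
ALLOWED_TYPES = {"Organization", "Person", "Location", "Event", "Other"}
# Key names as they appear in the model's well-formed outputs (an assumption).
REQUIRED_KEYS = {"name", "wikipedia_url", "type"}


def validate_extraction(raw: str) -> List[str]:
    """Return a list of problems found in a raw model response (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Truncated or malformed output ends up here.
        return [f"not parseable as JSON: {exc.msg}"]

    if not isinstance(data, list):
        return ["top-level value is not a JSON array"]

    problems = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            problems.append(f"item {i}: not a JSON object")
            continue
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            problems.append(f"item {i}: missing keys {sorted(missing)}")
        t = item.get("type")
        if not isinstance(t, str) or t not in ALLOWED_TYPES:
            problems.append(f"item {i}: type {t!r} is not in the requested list")
    return problems
```

A response that opens with a brace and continues as a bare comma-separated list fails at the parsing step, while an otherwise well-formed response that uses a type such as "Country" is reported item by item.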
{ "Chinese", "Chinese people", "Organization", "Newark", "Newark, New Jersey", "Location", "Alex Murdaugh", "Alex Murdaugh", "Person", "Robin Roberts", "Robin Roberts", "Person", "New Zealand", "New Zealand", "Location", "Bungee jump", "Bungee jumping", "Event", "Friday", "Friday", "Day", "100 degrees", "100", "Number", "Chinese", "Chinese people", "Organization", "U.S.", "United States", "Country", "Pentagon", "United States Department of Defense", "Organization", "Chinese", "Chinese people", "Organization", "Intelligence", "Intelligence", "Other", "60,000 feet", "60,000", "Number", "Secretary of State", "United States Secretary of State", "Organization", "Antony Blinken", "Antony Blinken", "Person", "White House", "White House", "Building", "Mola Lenghi", "Mola Lenghi", "Person", "Mary Bruce", "Mary Bruce", "Person", "Two", "2", "Number", "Passenger jets", "Passenger jet", "Vehicle", "Clipping wings", "Clipping wings", "Other", "One", "1", "Number", "Plane", "Airplane", "Vehicle", "Passengers", "Passenger", "Person", "Stunning", "Stunning", "Other", "Unexpected", "Unexpected", "Other", "Jobs report", "Employment in the United States", "Event", "President", "President of the United States", "Organization", "Joe Biden", "Joe Biden", "Person", "Fed", "Federal Reserve System", "Organization", "Rebecca Jarvis", "Rebecca Jarvis", "Person", "Alex Murdaugh", "Alex Murdaugh", "Person", "Double murder trial", "Double murder trial", "Event", "Defense team", "Defense team", "Organization", "Ballistics expert", "Ballistics expert", "Person", "Alex Murdaugh", "Alex Murdaugh", "Person", "Housekeeper", "Housekeeper", "Occupation", "Fall", "Fall", "Event", "Family", "Family", "Group", "Eva Pilgrim", "Eva Pilgrim", "Person", "Robin Roberts", "Robin Roberts", "Person", "New Zealand", "New Zealand", "Location", "Birthplace", "Birthplace", "Other", "Bungee jump", "Bungee jumping", "Event", "Lesson", "Lesson", "Other", "Conquering fear", "Conquering fear", "Other", "Good evening", 
"Good evening", "Other", "Friday night", "Friday night", "Day", "Brutally cold", "Brutally cold", "Other", "Dangerous", "Dangerous", "Other", "Life-threatening cold", "Life-threatening cold", "Other", "Wind chill", "Wind chill", "Other", "100 degrees below zero", "100 degrees below zero", "Temperature", "Begin", "Begin", "Other", "Chinese", "Chinese people", "Organization", "Spy balloon", "Spy balloon", "Vehicle", "Hovering", "Hovering", "Other", "U.S.", "United States", "Country", "Believe", "Believe", "Other", "Surveillance balloon", "Surveillance balloon", "Vehicle", "Maneuvered", "Maneuvered", "Other", "China", "China", "Country", "Spy satellites", "Spy satellite", "Vehicle", "Tonight", "Tonight", "Day", "China", "China", "Country", "Claiming", "Claiming", "Other", "Weather balloon", "Weather balloon", "Vehicle", "Blew off course", "Blew off course", "Other", "U.S. authorities", "United States authorities", "Organization", "Believe", "Believe", "Other", "Chinese", "Chinese people", "Organization", "Likely", "Likely", "Other", "Spy satellites", "Spy satellite", "Vehicle", "Manoeuvered", "Manoeuvered", "Other", "China", "China", "Country", "Tonight", "Tonight", "Day", "Chinese", "Chinese people", "Organization", "Spy balloon
What if we remove the next starting sentence that says "also tonight, the dangerous life-threatening cold moving into the northeast." to yield this as our input text:
the live readings already. first, the pentagon tracking that chinese balloon, the intelligence bay hanging beneath it, the size of three busses. where it was spotted today over the u.s., where it's believed to be headed now, 60,000 feet in the air. will the u.s. shoot this down? secretary of state antony blinken postponing his high stakes trip to china. mola lenghi, mary bruce standing by at the white house. another scare. this time, newark. two passenger jets clipping wings. one plane full of passengers. the stunning and unexpected jobs report tonight. president biden celebrating the numbers. so, how does this complicate steps by the fed moving forward? rebecca jarvis, as always, breaking it down. the alex murdaugh double murder trial. the defense team, and what a ballistics expert acknowledged today. and tonight, the story of alex murdaugh's housekeeper dying in a fall at the home. what the family says they never knew. eva pilgrim reporting. robin roberts tonight and her epic jump. robin from new zealand, the birthplace of the bungee jump. tonight, the lesson on conquering fear. good evening. it is great to have you with us on a friday night. almost to the weekend, but it will be brutally cold, dangerous, life-threatening cold. the readings coming in right now, and where the wind chill is already more than 100 degrees below zero. that's in a moment here. but we begin with the chinese spy balloon hovering over the u.s. tonight. china claiming it is a weather balloon that blew off course. u.s. authorities believe this is a surveillance balloon being maneuvered by china, likely using spy satellites.
Suddenly we're back to JSON output, and for the first time many of the additional names that were excluded from the earlier outputs are present! However, the Pentagon, the White House and the Fed are still missing, and the U.S. appears only in its normalized form, "United States". The "wikipedia_url" values are article titles rather than URLs and thus would require further normalization before they could be compared against outputs from other texts. Given that the intent of such entity extraction is often to connect extracted entities back to the source text for visual labeling, proximity search, etc., it is also problematic that "President Biden" in the source text becomes "President Joe Biden" in the extracted entity list, a string that does not actually appear in the source text. Thus, while this is not an infinite loop, critical entities are still missing from the list below, and those that were extracted would be difficult to use in downstream applications.
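The mismatch between extracted names and the source text can be checked with plain string matching. The following is a minimal sketch, assuming the model output has already been parsed into a list of dicts with a "name" key, as in the outputs shown here; the function name `ungrounded_names` is hypothetical. It reports every extracted name that does not occur verbatim in the source text, ignoring differences in case and whitespace.

```python
from typing import Dict, List


def _normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


def ungrounded_names(entities: List[Dict[str, str]], source_text: str) -> List[str]:
    """Return extracted names that cannot be found verbatim in the source text."""
    haystack = _normalize(source_text)
    return [e["name"] for e in entities if _normalize(e["name"]) not in haystack]


if __name__ == "__main__":
    source = (
        "secretary of state antony blinken postponing his high stakes trip to china. "
        "president biden celebrating the numbers."
    )
    entities = [
        {"name": "Secretary of State Antony Blinken"},
        {"name": "President Joe Biden"},
    ]
    # Only the second name is flagged: "joe" never appears in the source.
    print(ungrounded_names(entities, source))
```

A flagged name is not necessarily wrong, since "President Joe Biden" is a reasonable normalization, but it cannot be tied back to a position in the text without additional matching logic.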
Most problematic of all is that the removal of a single sentence once again pivots us from runaway output that fills the token cap back to structured output.
[{ "name": "Chinese balloon", "wikipedia_url": "Chinese balloon", "type": "Other" }, { "name": "Chinese", "wikipedia_url": "Chinese people", "type": "Ethnic Group" }, { "name": "United States", "wikipedia_url": "United States", "type": "Country" }, { "name": "Secretary of State Antony Blinken", "wikipedia_url": "Antony Blinken", "type": "Person" }, { "name": "Mola Lenghi", "wikipedia_url": "Mola Lenghi", "type": "Person" }, { "name": "Mary Bruce", "wikipedia_url": "Mary Bruce", "type": "Person" }, { "name": "Newark", "wikipedia_url": "Newark, New Jersey", "type": "Location" }, { "name": "President Joe Biden", "wikipedia_url": "Joe Biden", "type": "Person" }, { "name": "Rebecca Jarvis", "wikipedia_url": "Rebecca Jarvis", "type": "Person" }, { "name": "Alex Murdaugh", "wikipedia_url": "Alex Murdaugh", "type": "Person" }, { "name": "Eva Pilgrim", "wikipedia_url": "Eva Pilgrim", "type": "Person" }, { "name": "Robin Roberts", "wikipedia_url": "Robin Roberts", "type": "Person" }, { "name": "New Zealand", "wikipedia_url": "New Zealand", "type": "Country" }, { "name": "Bungee jump", "wikipedia_url": "Bungee jumping", "type": "Event" }]
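Since a one-sentence change is all that separates a well-formed list like this one from the runaway output seen earlier, it may be worth screening every response for repetition before using it. The following is a minimal sketch, assuming the flat, one-level objects that appear in the outputs above; the function names and the threshold of 5 are illustrative assumptions rather than anything provided by the model API.

```python
import re
from itertools import groupby
from typing import Tuple


def longest_repeat_run(raw: str) -> Tuple[str, int]:
    """Return the object with the longest run of consecutive repeats and the run length."""
    # Match each complete, non-nested {...} object in the raw response.
    objects = re.findall(r"\{[^{}]*\}", raw)
    best, best_len = "", 0
    # groupby groups identical adjacent strings, so each group is one run.
    for obj, group in groupby(objects):
        run = len(list(group))
        if run > best_len:
            best, best_len = obj, run
    return best, best_len


def looks_like_loop(raw: str, threshold: int = 5) -> bool:
    """Flag a response in which one object repeats `threshold` or more times in a row."""
    return longest_repeat_run(raw)[1] >= threshold
```

Because the pattern only matches complete objects, the cut-off fragment at the end of a truncated response is ignored, so the check works on raw text that would not parse as JSON.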
What if we remove the next sentence that says "the live readings already." to yield the following input:
first, the pentagon tracking that chinese balloon, the intelligence bay hanging beneath it, the size of three busses. where it was spotted today over the u.s., where it's believed to be headed now, 60,000 feet in the air. will the u.s. shoot this down? secretary of state antony blinken postponing his high stakes trip to china. mola lenghi, mary bruce standing by at the white house. another scare. this time, newark. two passenger jets clipping wings. one plane full of passengers. the stunning and unexpected jobs report tonight. president biden celebrating the numbers. so, how does this complicate steps by the fed moving forward? rebecca jarvis, as always, breaking it down. the alex murdaugh double murder trial. the defense team, and what a ballistics expert acknowledged today. and tonight, the story of alex murdaugh's housekeeper dying in a fall at the home. what the family says they never knew. eva pilgrim reporting. robin roberts tonight and her epic jump. robin from new zealand, the birthplace of the bungee jump. tonight, the lesson on conquering fear. good evening. it is great to have you with us on a friday night. almost to the weekend, but it will be brutally cold, dangerous, life-threatening cold. the readings coming in right now, and where the wind chill is already more than 100 degrees below zero. that's in a moment here. but we begin with the chinese spy balloon hovering over the u.s. tonight. china claiming it is a weather balloon that blew off course. u.s. authorities believe this is a surveillance balloon being maneuvered by china, likely using spy satellites.
Suddenly most of the additional names from above vanish, with only Blinken and Biden surviving. The removal of a single noise sentence that contains no names causes our output to collapse and returns us to something close to our earlier output that lacks many of the names in the text:
[{ "name": "Chinese balloon", "wikipedia_url": "Chinese balloon", "type": "Other" }, { "name": "United States", "wikipedia_url": "United States", "type": "Location" }, { "name": "Secretary of State Antony Blinken", "wikipedia_url": "Antony Blinken", "type": "Person" }, { "name": "Newark", "wikipedia_url": "Newark, New Jersey", "type": "Location" }, { "name": "President Joe Biden", "wikipedia_url": "Joe Biden", "type": "Person" }, { "name": "Alex Murdaugh", "wikipedia_url": "Alex Murdaugh", "type": "Person" }, { "name": "Robin Roberts", "wikipedia_url": "Robin Roberts", "type": "Person" }, { "name": "New Zealand", "wikipedia_url": "New Zealand", "type": "Location" }]
In fact, proceeding further and removing sentences at random from the text causes massive shifts in the output, with the removal of just a few noise words enough to fundamentally change the results.
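This kind of sensitivity can be measured systematically rather than by hand. The sketch below drops one sentence at a time and scores how far the extracted name set moves from the baseline using Jaccard similarity. It assumes a caller-supplied `extract_names` function (hypothetical; it would wrap the model call and return a set of names), and the sentence splitter is deliberately naive: it splits after every period, so abbreviations like "u.s." are cut mid-sentence.

```python
import re
from typing import Callable, List, Set, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break on whitespace that follows '.', '?' or '!'."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ablation_report(
    text: str, extract_names: Callable[[str], Set[str]]
) -> List[Tuple[str, float]]:
    """For each sentence, drop it and compare the extracted names to the baseline."""
    sentences = split_sentences(text)
    baseline = extract_names(text)
    report = []
    for i, dropped in enumerate(sentences):
        variant = " ".join(sentences[:i] + sentences[i + 1:])
        report.append((dropped, jaccard(baseline, extract_names(variant))))
    return report
```

A low score for a dropped sentence that contains no names of its own would indicate the kind of instability described above.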