
Unlike written news articles, which often focus on a single story, television news transcripts are continuous, stream-of-consciousness flows of words covering many different stories in a single 30-minute to multi-hour transcript. They often interleave stories and weave them within each other. They return to earlier stories, jump ahead to preview a later story, inject a single detail midway through the broadcast about a story that appeared in the first few seconds, and so on. Journalists and scholars seeking to understand how stories are being covered by the news must construct an index over a given broadcast to understand what stories it covers and where they are mentioned throughout the transcript. Historically, this has been done by hand by institutions like the Vanderbilt Television News Archive, and automated story segmentation approaches have yielded relatively poor results.
Increasingly, the most powerful reasoning LLMs are yielding ever-better results segmenting television news transcripts into their component stories, often achieving human-level results. However, they have proven consistently poor at connecting their indexes back to the underlying transcript timecodes. The result is accurate story lists that cannot be tied back to the portions of the broadcast where those stories are mentioned. Instead of asking an LLM to construct an index of the stories and list their start/end timecodes, what if we ask it to chunk the transcript into a JSON-NL file and assign each chunk a story ID? In other words, we invert our previous process. Here we ask Gemini 2.5 Pro to take an evening news broadcast transcript, compile a list of the stories described within, and assign each story an ID number. It then breaks the transcript into contiguous chunks and labels each with the story it pertains to. The results are extremely strong. Running the workflow below three times, we got slightly different lists of stories each time, but the major stories were consistent. Some transcript lines were missing from the chunked output across the three runs. Overall, though, this workflow yields massively better results than all of our previous approaches. It suggests that a chunking/annotation workflow may be more amenable to LLM processing than our previous timecode/indexing approach.
For our workflow, first we took the JSON transcript of this ABC Evening News broadcast from yesterday and concatenated it into a single massive blob of text:
jq -r '[.transcript[].txt] | join(" ")' transcript.json > transcript.txt
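For readers who prefer Python to jq, here is a minimal sketch of the same concatenation. It assumes only the structure the jq filter implies: a top-level `transcript` array whose entries each carry a `txt` string. The function names are our own, chosen for illustration.

```python
import json
from pathlib import Path
from typing import Any


def concat_transcript_data(data: dict[str, Any]) -> str:
    """Join every transcript entry's "txt" field with single spaces.

    Mirrors the jq filter [.transcript[].txt] | join(" "), assuming a
    top-level "transcript" array of objects that each carry a "txt" string.
    """
    return " ".join(entry["txt"] for entry in data["transcript"])


def write_transcript_txt(json_path: str, txt_path: str) -> None:
    """Read the transcript JSON and write the joined text to txt_path."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # jq -r terminates its output with a newline, so we do the same.
    Path(txt_path).write_text(concat_transcript_data(data) + "\n", encoding="utf-8")
```

Calling `write_transcript_txt("transcript.json", "transcript.txt")` should produce the same file as the jq command, provided every entry has a string `txt` field.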
We then used this prompt with Gemini 2.5 Pro Thinking:
Below is a television news transcript. You are a news cataloger whose job it is to read television news transcripts and identify each of the stories described within. Read the transcript below and make a list of all of the stories it describes. Each story may be mentioned multiple times in multiple places in the transcript. Number the stories in sequential order from 1. Output the story list as JSON records, one per row, with fields STORYID, TITLE and DESCRIPTION and a field TYPE that is "STORY". Then go through the transcript below and break it into chunks relating to each story and output each chunk as a JSON record, one per row, that has fields STORYID and TXT and TYPE="CHUNK" to create an annotated version of the transcript and fill in the STORYID with the story number for that story.
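The prompt effectively defines a small output schema. Story records carry STORYID, TITLE, DESCRIPTION and TYPE "STORY" and are numbered from 1. Chunk records carry STORYID, TXT and TYPE "CHUNK". Because output varied from run to run, it may help to check each response against that schema before using it. The sketch below is our own illustration, not part of the original workflow. The function name and messages are hypothetical, and it takes records that have already been parsed into dictionaries.

```python
from typing import Any

# Fields the prompt asks for on each record type.
STORY_FIELDS = {"STORYID", "TITLE", "DESCRIPTION", "TYPE"}
CHUNK_FIELDS = {"STORYID", "TXT", "TYPE"}


def validate_records(records: list[dict[str, Any]]) -> list[str]:
    """Return a list of schema problems; an empty list means none were found."""
    problems: list[str] = []
    for i, r in enumerate(records):
        if r.get("TYPE") not in ("STORY", "CHUNK"):
            problems.append(f"record {i}: unexpected TYPE {r.get('TYPE')!r}")

    stories = [r for r in records if r.get("TYPE") == "STORY"]
    for r in stories:
        missing = STORY_FIELDS - set(r)
        if missing:
            problems.append(f"story {r.get('STORYID')}: missing {sorted(missing)}")
    # The prompt asks for stories numbered sequentially from 1.
    ids = [r.get("STORYID") for r in stories]
    if ids != list(range(1, len(ids) + 1)):
        problems.append(f"story IDs not sequential from 1: {ids}")

    # Every chunk should point at a story that was actually listed.
    known = set(ids)
    chunks = [r for r in records if r.get("TYPE") == "CHUNK"]
    for i, r in enumerate(chunks):
        missing = CHUNK_FIELDS - set(r)
        if missing:
            problems.append(f"chunk {i}: missing {sorted(missing)}")
        elif r["STORYID"] not in known:
            problems.append(f"chunk {i}: unknown STORYID {r['STORYID']}")
    return problems
```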
This yields the following story JSON entries:
{"STORYID": 1, "TITLE": "Taylor Swift Regains Control of Her Entire Music Catalog", "DESCRIPTION": "Taylor Swift announced she has successfully acquired the rights to her first six albums, meaning she now owns her entire music catalog, including masters, music videos, concert films, album art, and unreleased songs. This follows a 2019 sale of her original masters to Scooter Braun and her subsequent efforts to re-record her work.", "TYPE": "STORY"}
{"STORYID": 2, "TITLE": "Severe Weather Threatens Eastern U.S. with Tornadoes and Deadly Storms", "DESCRIPTION": "A major storm system brings a tornado watch to the Eastern U.S., from Washington D.C. to Philadelphia and New York City, affecting 45 million Americans. The system has already produced a deadly EF-2 tornado in Kentucky, killing one person, and caused damage in Florida and Georgia, including a tornado crossing I-75 and injuring an actor's son.", "TYPE": "STORY"}
{"STORYID": 3, "TITLE": "FAA Investigates Near-Collision of Two Passenger Jets in San Francisco", "DESCRIPTION": "The FAA is investigating a close call at San Francisco International Airport where a United Airlines plane and a SkyWest plane, taking off on parallel runways, nearly collided. The United plane allegedly turned into the path of the SkyWest jet without authorization, prompting evasive action from the SkyWest plane.", "TYPE": "STORY"}
{"STORYID": 4, "TITLE": "Elon Musk Departs Role as Special Government Employee, Trump Praises Service", "DESCRIPTION": "President Trump held a farewell for Elon Musk, who is stepping down from his role as a special government employee focused on slashing federal agencies. Musk acknowledged his job was challenging, commented on the President's budget bill, and explained a black eye as resulting from playing with his son.", "TYPE": "STORY"}
{"STORYID": 5, "TITLE": "Supreme Court Rules in Favor of Trump Administration on Immigration Policy", "DESCRIPTION": "The Supreme Court handed the Trump administration a victory, ruling it can end protections for over 500,000 immigrants from Cuba, Haiti, Nicaragua, and Venezuela, at least for now. Justice Ketanji Brown Jackson issued a scathing dissent.", "TYPE": "STORY"}
{"STORYID": 6, "TITLE": "President Trump Comments on Potential Pardon for Sean Diddy Combs", "DESCRIPTION": "When asked, President Trump stated he would consider the facts if a pardon request for Sean Diddy Combs, currently on trial for sex trafficking and racketeering, were to come before him, irrespective of their past relationship or Combs' political views.", "TYPE": "STORY"}
{"STORYID": 7, "TITLE": "Sean Diddy Combs Trial: Assistant's Cross-Examination Highlights Contradictions", "DESCRIPTION": "During Sean Diddy Combs' trial for sex trafficking and racketeering, his former assistant, who accused him of repeated sexual assault, faced cross-examination. The defense highlighted her past social media posts praising Combs, which she attributed to manipulation and a desire to survive in a toxic environment. Combs was in court when Trump's potential pardon comments were made.", "TYPE": "STORY"}
{"STORYID": 8, "TITLE": "Todd and Julie Chrisley Pardoned by President Trump After Fraud Conviction", "DESCRIPTION": "Reality TV personalities Todd and Julie Chrisley were issued full pardons by President Trump after serving 2.5 years in federal prison for tax evasion and bank fraud. Todd Chrisley spoke out, maintaining his innocence and expressing plans to advocate for prison reform and potentially return to reality TV.", "TYPE": "STORY"}
{"STORYID": 9, "TITLE": "Large Sinkhole Swallows Cars and Threatens Buildings in California", "DESCRIPTION": "A major sinkhole opened up next to a construction site in Ventura, California, swallowing multiple cars and threatening nearby buildings. City officials cited a shoring failure at the construction project as the likely cause, and crews are working to prevent further damage.", "TYPE": "STORY"}
{"STORYID": 10, "TITLE": "Remembering Loretta Swit, \"M.A.S.H.\" Star", "DESCRIPTION": "Actress Loretta Swit, best known for her Emmy-winning role as Major Margaret \"Hot Lips\" Houlihan on the iconic TV series \"M.A.S.H.,\" has passed away at the age of 87. Tributes are pouring in for the actress, whose work, including the M.A.S.H. finale watched by nearly 106 million viewers, had a significant impact.", "TYPE": "STORY"}
{"STORYID": 11, "TITLE": "Faizan Zaki Wins Scripps National Spelling Bee", "DESCRIPTION": "13-year-old Faizan Zaki from Allen, Texas, won the Scripps National Spelling Bee by correctly spelling \"eclaircissement.\" Zaki, who was the runner-up last year, became the fifth contestant to win the year after finishing second and mentioned a possible future in math competitions.", "TYPE": "STORY"}
{"STORYID": 12, "TITLE": "CDC Guidance on COVID-19 Vaccines for Children Differs from RFK Jr. Stance", "DESCRIPTION": "The CDC released its latest immunization schedule, recommending the COVID-19 vaccine for healthy children, utilizing a shared decision-making model for parents and doctors. This guidance appears to counter Health Secretary Robert F. Kennedy Jr.'s recent announcement about removing the shot recommendation for healthy children.", "TYPE": "STORY"}
{"STORYID": 13, "TITLE": "NBA Star Zion Williamson Accused of Rape in Lawsuit", "DESCRIPTION": "New Orleans Pelicans forward Zion Williamson is facing a lawsuit filed in Los Angeles by a woman alleging rape and abuse between 2018 and 2023. Williamson's legal team has denied all allegations, calling the suit an exploitation attempt.", "TYPE": "STORY"}
{"STORYID": 14, "TITLE": "President Trump Announces Doubling of Steel Tariffs", "DESCRIPTION": "During a visit to Pittsburgh, Pennsylvania, President Trump announced he is doubling tariffs on steel from 25% to 50%. The announcement came as he was touting a new partnership between U.S. Steel and Japanese company Nippon Steel.", "TYPE": "STORY"}
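The prompt asks for one JSON record per row, but a model will not always honor that layout. Records may also arrive run together on a single line. A tolerant way to read either form is to decode one JSON object at a time with Python's `json.JSONDecoder.raw_decode`. This is a sketch under our own assumptions, not part of the original workflow. `parse_records` and `split_output` are hypothetical helper names. The code expects the raw model output. Abridged listings like the ones in this post, with `...` placed between records, will not parse.

```python
import json
from typing import Any


def parse_records(text: str) -> list[dict[str, Any]]:
    """Decode consecutive JSON objects from model output.

    Works whether records are one per line (JSON-NL) or run together on a
    single line separated only by whitespace.
    """
    decoder = json.JSONDecoder()
    records: list[dict[str, Any]] = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)
    return records


def split_output(
    records: list[dict[str, Any]],
) -> tuple[dict[int, dict[str, Any]], list[dict[str, Any]]]:
    """Separate STORY records (keyed by STORYID) from CHUNK records (in order)."""
    stories = {r["STORYID"]: r for r in records if r.get("TYPE") == "STORY"}
    chunks = [r for r in records if r.get("TYPE") == "CHUNK"]
    return stories, chunks
```

Keying stories by STORYID makes it cheap to look up a chunk's story title later, while chunks keep their original order. That order is what carries the interleaving of stories through the broadcast.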
This was followed by a complete chunking of the transcript. Here we show selected records from the first few minutes of the broadcast. The actual output covered the entire transcript. Records omitted here are marked with "...", and longer records are truncated here but appeared complete in the LLM output:
{"STORYID": 1, "TXT": "TO HER MUSIC CATALOG AND NOW OWNS ALL OF HER ALBUMS. THIS MORNING, SHE POSTED A LETTER TO SWIFTIES ALL AROUND THE WORLD ANNOUNCING THE BIG NEWS...", "TYPE": "CHUNK"}
{"STORYID": 2, "TXT": ">>> TONIGHT, WE HAVE BREAKING NEWS. THE TORNADO WATCH RIGHT NOW IN THE EAST. WASHINGTON, D.C. RIGHT UP THROUGH PHILADELPHIA. THE STORMS THEN MOVING INTO NEW YORK CITY.", "TYPE": "CHUNK"}
{"STORYID": 2, "TXT": ">>> TONIGHT, 45 MILLION AMERICANS ON ALERT RIGHT NOW. THE SYSTEM DEADLY ALREADY. NOW THE DANGEROUS STORMS FROM FLORIDA TO THE CAROLINAS RIGHT INTO THE NORTHEAST THIS EVENING. ...", "TYPE": "CHUNK"}
{"STORYID": 2, "TXT": ">> THAT IS A TORNADO? >> YEP.", "TYPE": "CHUNK"}
...
{"STORYID": 3, "TXT": ">>> TONIGHT, ABC NEWS NEWS LEARNING OF A VERY CLOSE CALL FOR TWO PASSENGER JETS, A UNITED PLANE AND A SKY WEST PLANE TAKING OFF ON PARALLEL RUNWAYS....", "TYPE": "CHUNK"}
{"STORYID": 4, "TXT": ">>> THE SEAT IN THE OVAL OFFICE TODAY. PRESIDENT TRUMP'S FAREWELL TO ELON MUSK. WHAT ELON MUSK ACKNOWLEDGED TODAY. ALSO, MUSK EXPLAINING HIS BLACK EYE.", "TYPE": "CHUNK"}
{"STORYID": 5, "TXT": ">>> ALSO TONIGHT, THE SUPREME COURT HANDING THE PRESIDENT A MAJOR VICTORY ON IMMIGRATION.", "TYPE": "CHUNK"}
{"STORYID": 6, "TXT": "AND WHAT PRESIDENT TRUMP SAID TODAY WHEN ASKED IF HE WOULD CONSIDER A PARDON FOR SEAN DIDDY COMBS. IT MADE NEWS.", "TYPE": "CHUNK"}
{"STORYID": 7, "TXT": ">>> AT THE COMBS TRIAL IN NEW YORK, THE SEARING NEW TESTIMONY...", "TYPE": "CHUNK"}
{"STORYID": 8, "TXT": ">>> FORMER REALITY SHOW DAD TODD CHRISLEY SPEAKING OUT AFTER PRESIDENT TRUMP ISSUED PARDONS TO HIM AND HIS WIFE...", "TYPE": "CHUNK"}
{"STORYID": 1, "TXT": ">>> TONIGHT, THE MAJOR NEWS HERE INVOLVING TAYLOR SWIFT. THE SUPERSTAR EMOTIONAL AFTER SHE REVEALED WHAT SHE NOW CONTROLS.", "TYPE": "CHUNK"}
{"STORYID": 9, "TXT": ">>> TONIGHT, THE ALARMING SCENE BESIDE A CONSTRUCTION SITE. A MAJOR SINKHOLE SWALLOWING MULTIPLE CARS AND THREATENING NEARBY BUILDINGS AT THIS HOUR.", "TYPE": "CHUNK"}
{"STORYID": 10, "TXT": ">>> REMEMBERING LORETTA SWIT. FROM \"M.A.S.H.,\" SHE WON MULTIPLE EMMYS FOR THE ROLE. THE TRIBUTES POURING IN TONIGHT.", "TYPE": "CHUNK"}
{"STORYID": 11, "TXT": ">>> AND \"AMERICA STRONG\" THIS EVENING. COULD YOU SPELL THIS? HE WAS THE RUNNER-UP LAST YEAR. HE CAME BACK TO WIN IT ALL THIS YEAR. YOU HAVE TO SEE THE MOMENT.", "TYPE": "CHUNK"}
{"STORYID": 2, "TXT": ">> David: GOOD EVENING, AND WE FINALLY MADE IT TO 45. 45 MINUTE AMERICANS ON ALERT RIGHT NOW. IN THE EAST A TORNADO WATCH AT THIS HOUR FROM WASHINGTON, D.C. RIGHT UP THROUGH PHILADELPHIA. ....", "TYPE": "CHUNK"}
...
{"STORYID": 2, "TXT": ">> David: LEE GOLDBERG OF WABC, YOU'RE TRACKING THESE TORNADO WATCHES UP RIGHT NOW?", "TYPE": "CHUNK"}
...
{"STORYID": 3, "TXT": ">>> NEXT, ABC NEWS LEARNING OF A VERY CLOSE CALL FOR TWO PASSENGER JETS, A UNITED PLANE AND SKY WEST PLANE...", "TYPE": "CHUNK"}
{"STORYID": 3, "TXT": ">> Reporter: TONIGHT, FRIGHTENING NEW DETAILS EMERGING FROM A RECENT CLOSE CALL AT SAN FRANCISCO INTERNATIONAL AIRPORT. A PAIR OF COMMERCIAL JETS, ...", "TYPE": "CHUNK"}
...
{"STORYID": 13, "TXT": ">>> WHEN WE COME BACK HERE TONIGHT, NBA SUPERSTAR ZION WILLIAMSON HIT WITH MAJOR ACCUSATIONS TONIGHT...", "TYPE": "CHUNK"}
{"STORYID": 1, "TXT": ">>> TAYLOR SWIFT CRYING TEARS OF JOY AFTER HER BIG REVEAL TONIGHT.", "TYPE": "CHUNK"}
{"STORYID": 13, "TXT": ">> David: TO THE INDEX OF OTHER NEWS ON A FRIDAY NIGHT. A BOMBSHELL ACCUSING ZION WILLIAMSON OF RAPE. A WOMAN FILING THE SUIT IN LOS ANGELES AGAINST THE PELICANS FORWARD...", "TYPE": "CHUNK"}
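The payoff of chunking is that each chunk's text can, in principle, be located back in the transcript. This is the step the timecode/indexing approach struggled with. Below is a minimal sketch of that alignment under our own assumptions:

- Chunks appear in transcript order.
- Chunks quote the transcript verbatim apart from whitespace.
- Transcript lines are non-empty.
- We still have the original `transcript[].txt` lines that were concatenated in the jq step.

For each STORYID, it returns the (first, last) transcript line indices of every mention. It also returns any chunks it could not find and any transcript lines no chunk covers, which is one way to quantify the missing lines noted earlier. `align_chunks` is a hypothetical name. The code must run on the complete model output, because truncated records would not match.

```python
import bisect
import re
from collections import defaultdict
from typing import Any


def _norm(s: str) -> str:
    # Collapse whitespace runs so spacing differences don't block a match.
    return re.sub(r"\s+", " ", s).strip()


def align_chunks(
    lines: list[str], chunks: list[dict[str, Any]]
) -> tuple[dict[int, list[tuple[int, int]]], list[int], list[int]]:
    """Map CHUNK records back to the transcript lines they span.

    Assumes non-empty transcript lines, chunks in transcript order, and
    chunk text copied verbatim (up to whitespace) from the transcript.
    Returns (spans per STORYID as (first_line, last_line) index pairs,
    indices of chunks not found, indices of lines no chunk covers).
    """
    norm_lines = [_norm(line) for line in lines]
    joined = " ".join(norm_lines)
    # starts[i] is the character offset of line i inside `joined`.
    starts: list[int] = []
    offset = 0
    for line in norm_lines:
        starts.append(offset)
        offset += len(line) + 1  # +1 for the joining space

    spans: dict[int, list[tuple[int, int]]] = defaultdict(list)
    unmatched: list[int] = []
    covered = [False] * len(lines)
    cursor = 0  # search only forward, since chunks follow transcript order
    for ci, chunk in enumerate(chunks):
        text = _norm(chunk["TXT"])
        pos = joined.find(text, cursor) if text else -1
        if pos < 0:
            unmatched.append(ci)
            continue
        end = pos + len(text) - 1  # offset of the chunk's last character
        first = bisect.bisect_right(starts, pos) - 1
        last = bisect.bisect_right(starts, end) - 1
        spans[chunk["STORYID"]].append((first, last))
        for i in range(first, last + 1):
            covered[i] = True
        cursor = pos + len(text)

    uncovered = [i for i, done in enumerate(covered) if not done]
    return dict(spans), unmatched, uncovered
```

Each line in the JSON transcript presumably carries its own timecode; we do not assume that field's name here. If so, these line-index spans can be converted into start/end timecodes for every mention of a story. Exact matching is deliberately strict: a chunk the model paraphrased or reordered is reported as unmatched rather than guessed at.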