Yesterday we examined how GCP's new Chirp 2 ASR model hallucinates speech during non-verbal musical interludes in news broadcasts, producing entire paragraphs in its transcripts that never appear in the input audio. Today we'll examine how it also drops entire passages of spoken text, yielding transcripts that mix hallucinated passages that were never spoken with omitted passages that were spoken, behaviors nearly identical to those of OpenAI's Whisper ASR model. In contrast, its predecessor, Chirp, exhibits neither issue. These results offer a stark reminder that in the AI race, newer models aren't necessarily better than the ones they replace, and organizations need to carefully benchmark each new model release to see how it performs in their use cases and what tradeoffs it presents.
Let's look at a typical 30-minute American evening news broadcast in English to compare its transcriptions in Chirp and Chirp 2. American English speech is typically well represented in the training datasets of most ASR systems and thus offers a best-case scenario for comparing the two models. Here we'll focus on just 4 of the 964 differences between the two transcripts of this broadcast (many of which involve only punctuation and capitalization).
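The post doesn't say how the 964 differences were counted. One plausible approach, sketched here purely as an assumption, is a token-level diff using Python's standard-library `difflib`, counting each non-matching run as one change. Punctuation marks and case are kept as tokens so that punctuation and capitalization edits register as changes too. The helper names `tokenize` and `count_changes` are hypothetical.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    # Words (keeping internal apostrophes, e.g. "rent's") and individual
    # punctuation marks. Case is preserved, so "Ask" vs "ask" is a difference.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def count_changes(a: str, b: str) -> int:
    """Count non-matching runs (replace/insert/delete) in a token-level diff of a and b."""
    matcher = difflib.SequenceMatcher(a=tokenize(a), b=tokenize(b), autojunk=False)
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


print(count_changes("ask your doctor about breasttry,",
                    "Ask your doctor about breath tree."))  # 2
```

Counting runs rather than individual tokens means a dropped multi-word passage counts as a single change; a per-token count would produce much larger totals for the same pair of transcripts.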
We'll start with this brief one-minute clip:
Here is the broadcaster-provided official transcript of this clip:
00:20:45,645|00:20:47,446|POP| Ask your doctor about BREZTRI.
00:21:39,297|00:21:40,231|RU2|♪ ♪
00:21:40,298|00:21:41,433|RU2|>> Margaret: A SURGE IN
00:21:41,499|00:21:43,034|RU2|HAITIAN MIGRANTS TO SPRINGFIELD
00:21:43,101|00:21:46,638|RU2|OHIO, IS STRAINING THE LOCAL
00:21:46,705|00:21:47,305|RU2|COMMUNITY, AND THE GOVERNOR
00:21:47,372|00:21:47,906|RU2|THERE IS NOW ASKING FOR FEDERAL
00:21:47,973|00:21:49,507|RU2|HELP.
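To compare broadcaster captions against ASR output, the pipe-delimited records have to be turned back into running text. Each record appears to hold a start time, an end time, a short tag (POP, RU2) whose meaning the post doesn't explain, and a fragment of caption text. The following is a hypothetical parser built only on those observed fields; it strips the ">>" speaker labels and "♪" markers, which are captioning conventions rather than spoken words. The record regex lets records be separated by newlines or by plain spaces.

```python
import re
from dataclasses import dataclass

# Timestamp in the captions' HH:MM:SS,mmm form.
TS = r"\d{2}:\d{2}:\d{2},\d{3}"
# One record: start|end|tag|text, where the text runs until the next record's
# start timestamp (or the end of the input).
RECORD = re.compile(rf"({TS})\|({TS})\|([^|]*)\|(.*?)(?=\s*{TS}\||\Z)", re.S)
# Speaker labels such as ">> Margaret:" or ">> Reporter:", or a bare ">>".
SPEAKER = re.compile(r">>\s*(?:[A-Z][a-z]+:)?\s*")


@dataclass
class Caption:
    start: str
    end: str
    tag: str  # e.g. "POP" or "RU2"; meaning not documented in the source
    text: str


def parse_captions(raw: str) -> list[Caption]:
    return [Caption(s, e, t, body.strip()) for s, e, t, body in RECORD.findall(raw)]


def captions_to_text(captions: list[Caption]) -> str:
    """Join caption text into one string, dropping speaker labels and music notes."""
    parts = []
    for c in captions:
        text = SPEAKER.sub("", c.text).replace("♪", "")
        text = " ".join(text.split())  # collapse leftover whitespace
        if text:
            parts.append(text)
    return " ".join(parts)


raw = """00:21:39,297|00:21:40,231|RU2|♪ ♪
00:21:40,298|00:21:41,433|RU2|>> Margaret: A SURGE IN
00:21:41,499|00:21:43,034|RU2|HAITIAN MIGRANTS TO SPRINGFIELD"""
print(captions_to_text(parse_captions(raw)))
# A SURGE IN HAITIAN MIGRANTS TO SPRINGFIELD
```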
Only around one third of all advertisements are captioned on most American television news channels, and this clip is no exception: the ad that airs between the Breztri ad and the resumption of the news programming is missing from the official transcript. This is the actual transcript of what was said:
Ask your doctor about BREZTRI. All the homes are gone. And the rent's too high. I work too damn hard. I work too damn hard. Still can't afford to stay. Can't afford to stay. California's leaving. California's leaving. The dreams drifting away. Support rent control. I feel things when people talk about their struggles, about their problems, especially families. I have a kid. When I see parents who are struggling or kids who are struggling. I really feel all of those stories. The late news with Sarah Donchi, weeknights at 11:00. A surge in Haitian migrants to Springfield, Ohio is straining the local community and the governor there is now asking for federal help.
Here is Chirp's transcription:
ask your doctor about breasttry, all the homes are gone. support rent control, i feel things when people talk. about their struggles, about their problems, especially families. I have a kid, when I see parents who are struggling or kids who are struggling, I really feel all of those stories. The Late News with Sarah Donci, week nights at 11. A surge in Haitian migrants to Springfield, Ohio is straining the local community, and the governor there is now asking for federal help.
And Chirp 2's:
Ask your doctor about breath tree. and the rent too high. I work too damn high. I work too damn high. Still can't afford to stay. Can't afford to stay. California's leaving. California's leaving. The dreams drifting away. Support rent control. I feel things when people talk about their struggles, about their problems, especially families. I have a kid when I see parents who are struggling or kids who are struggling. I really feel all of those stories. The late news with Sarah Donchi, weeknights at 11:00. A surge in Haitian migrants to Springfield, Ohio is straining the local community and the governor there is now asking for federal help.
In this case, Chirp 2 actually provides a more accurate transcription than Chirp in that it transcribes most of the musical narration overlay that Chirp skips, albeit with errors such as "I work too damn high". At the same time, however, it drops "all the homes are gone", which Chirp captures. In all there are 9 changes between the two transcripts, many of them related to punctuation and capitalization.
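Judging one transcript "more accurate" than another is qualitative here. A standard way to quantify it, assuming a hand-checked reference transcript like the one above, is word error rate (WER): the word-level edit distance between reference and hypothesis divided by the number of reference words. A dropped passage inflates the deletion count and a hallucinated one the insertion count. This minimal sketch lowercases and strips punctuation first, so it deliberately ignores the punctuation and capitalization changes that dominate the raw diff count; the function names are illustrative.

```python
import re


def normalize(text: str) -> list[str]:
    # Lowercase and keep only word characters (plus apostrophes), so
    # punctuation and capitalization differences are not counted as errors.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev[j] = edit distance between the reference words processed so far
    # (previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution (cost 0 if the words match)
            )
        prev = curr
    # max(..., 1) avoids dividing by zero for an empty reference.
    return prev[-1] / max(len(ref), 1)


print(word_error_rate("I work too damn hard.", "I work too damn high."))  # 0.2
```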
How about this clip?
The broadcaster-provided transcript:
00:16:12,304|00:16:13,372|RU2|>> Margaret: ONE OF THE HERO
00:16:13,439|00:16:15,240|RU2|PILOTS ABOARD THAT
00:16:15,307|00:16:16,508|RU2|ALASKA AIRLINES PLANE THAT HAD A
00:16:16,575|00:16:17,710|RU2|DOOR BLOW OUT IN JANUARY IS NOW
00:16:17,776|00:16:20,979|RU2|SPEAKING OUT FOR THE FIRST TIME
00:16:21,046|00:16:21,647|RU2|TONIGHT ABOUT THOSE TERRIFYING
00:16:21,714|00:16:25,117|RU2|MOMENTS.
00:16:25,184|00:16:25,718|RU2|CBS'S KRIS VAN CLEAVE HAS THE
00:16:25,784|00:16:26,585|RU2|EXCLUSIVE INTERVIEW.
00:16:26,652|00:16:28,120|RU2|>> ALASKA 1282, DECLARE AN
00:16:28,186|00:16:28,787|RU2|EMERGENCY.
00:16:28,854|00:16:29,855|RU2|>> Reporter: ALASKA AIRLINES
00:16:29,922|00:16:30,856|RU2|PILOT EMILY WIPRUD SOUNDED THE
00:16:30,923|00:16:33,792|RU2|ALARM.
00:16:33,859|00:16:34,793|RU2|SOMETHING HAD GONE TERRIBLY
00:16:34,860|00:16:39,598|RU2|WRONG.
00:16:39,665|00:16:41,133|RU2|>> THE FIRST INDICATION WAS IT
00:16:41,199|00:16:42,067|RU2|WAS AN EXPLOSION IN MY EARS.
00:16:42,134|00:16:44,669|RU2|AND THEN A WHOOSH OF AIR.
00:16:44,736|00:16:47,406|RU2|MY BODY WAS FORCED FORWARD.
00:16:47,473|00:16:49,274|RU2|AND THERE WAS A LOUD BANG.
00:16:49,341|00:16:50,342|RU2|>> Reporter: DID YOU KNOW AT
00:16:50,409|00:16:51,476|RU2|THIS POINT THERE WAS A HOLE IN
00:16:51,543|00:16:54,079|RU2|THE AIRPLANE?
00:16:54,146|00:16:54,279|RU2|>> NO.
00:16:54,346|00:16:55,280|RU2|I DIDN'T KNOW THAT THERE WAS A
00:16:55,347|00:16:56,548|RU2|HOLE IN THE AIRPLANE UNTIL WE
00:16:56,615|00:16:57,149|RU2|LANDED.
00:16:57,215|00:16:59,484|RU2|>> WE JUST NEED TO DEPRESSURIZE
00:16:59,551|00:17:00,485|RU2|TRY TO MAINTAIN 10,000 AND WE
00:17:00,552|00:17:02,488|RU2|NEED TO RETURN BACK TO PORTLAND.
00:17:02,554|00:17:03,956|RU2|>> IT WAS SO INCREDIBLY LOUD
00:17:04,023|00:17:05,157|RU2|AND I REMEMBER PUTTING THE
00:17:05,223|00:17:07,693|RU2|OXYGEN MASK ON AND TRYING TO
00:17:07,760|00:17:09,161|RU2|TRANSMIT TO ATC AND WONDERING
00:17:09,227|00:17:10,162|RU2|WHY CAN'T I HEAR ANYTHING?
00:17:10,229|00:17:11,897|RU2|>> Reporter: A DOOR PANEL ON
00:17:11,964|00:17:14,366|RU2|THE BOEING 737 MAX HAD BLOWN OUT
00:17:14,433|00:17:16,035|RU2|WITH ENOUGH FORCE TO RIP OFF HER
00:17:18,070|00:17:18,504|RU2|HEADSET.
00:17:18,570|00:17:21,173|RU2|BUT WIPRUD SAYS THEIR TRAINING
00:17:21,239|00:17:22,307|RU2|TOOK OVER AS THE TWO PILOTS
00:17:22,374|00:17:27,112|RU2|WORKED TO SAFELY LAND.
00:17:27,179|00:17:29,114|RU2|>> AND I OPENED THE FLIGHT DECK
00:17:29,181|00:17:30,315|RU2|DOOR.
00:17:30,382|00:17:31,316|RU2|AND I SAW CALM, QUIET, HUNDREDS
00:17:31,383|00:17:36,521|RU2|OF EYES STARING RIGHT BACK AT
00:17:36,588|00:17:37,055|RU2|ME.
00:17:37,122|00:17:38,190|RU2|AND I LOOKED AT MY FLIGHT
00:17:38,256|00:17:42,394|RU2|ATTENDANTS, AND I SAID, ARE YOU
00:17:42,995|00:17:43,128|RU2|OKAY?
00:17:43,195|00:17:44,663|RU2|AND IN THAT RESPONSE, I HEARD
00:17:44,729|00:17:45,730|RU2|HOLE, FOUR, FIVE EMPTY SEATS
00:17:45,797|00:17:49,801|RU2|AND INJURIES.
00:17:49,868|00:17:50,735|RU2|>> Reporter: AND ARE YOU
00:17:50,802|00:17:51,403|RU2|THINKING, WHEN THEY SAID EMPTY
00:17:51,470|00:17:55,407|RU2|SEATS, THAT YOU LOST PEOPLE?
00:17:55,474|00:17:57,676|RU2|>> YES.
00:17:57,743|00:17:58,543|RU2|AND I REMEMBER IT NOT TAKING
00:17:58,610|00:17:59,945|RU2|VERY LONG FOR US TO CONFIRM WE
00:18:00,011|00:18:01,280|RU2|HAD 177 SOULS ON BOARD.
00:18:01,346|00:18:02,748|RU2|>> Reporter: THAT HAD TO BE AN
00:18:02,815|00:18:03,816|RU2|EMOTIONAL ROLLER COASTER.
00:18:03,882|00:18:05,084|RU2|>> YEAH.
00:18:05,150|00:18:06,685|RU2|I WAS SO THANKFUL.
00:18:06,751|00:18:10,689|RU2|I WAS IN SHOCK.
00:18:10,756|00:18:14,159|RU2|I -- DISBELIEF.
00:18:15,293|00:18:15,627|RU2|EVERYBDY WAS THERE.
00:18:15,694|00:18:18,296|RU2|>> Reporter: WIPRUD IS BACK ON
00:18:18,363|00:18:20,098|RU2|THE FLIGHT DECK, TELLING US IT
00:18:20,165|00:18:21,233|RU2|WAS VERY IMPORTANT FOR HER TO
00:18:21,300|00:18:22,234|RU2|GET BACK IN THE SADDLE.
00:18:22,300|00:18:23,635|RU2|SHE AND HER CAPTAIN WILL BE
00:18:23,702|00:18:26,438|RU2|HONORED TOMORROW BY THE AIRLINE
00:18:26,505|00:18:27,239|RU2|PILOTS ASSOCIATION FOR SAFELY
00:18:27,306|00:18:28,240|RU2|LANDING THAT DAMAGED PLANE.
00:18:28,306|00:18:29,441|RU2|MARGARET?
00:18:29,508|00:18:31,176|RU2|>> Margaret: AN AMAZING STORY.
00:18:31,243|00:18:32,911|RU2|KRIS, THANK YOU FOR BRINGING IT
00:18:32,978|00:18:33,645|RU2|TO US.
00:18:33,712|00:18:35,981|RU2|WE'LL BE RIGHT BACK.
Chirp:
One of the hero pilots aboard that Alaska Airlines plane that had a door blow-out mid-flight in January is now speaking out for the first time tonight about those terrifying moments. CBS says Chris Van Ceve has the exclusive interview. Alaska Airlines pilot Emily Whiproot sounded the alarm something had gone terribly wrong. The first indication was it was an explosion in my ears. and then a woosh of air, my body was forced forward and there was a loud bang. did you know at this point there was a hole in the airplane? no, i didn't know that there was a hole in the airplane until we landed. we just depressurized and suddenly maintained 10,000 and we need to return back to Portland. it was so incredibly loud and i remember putting the oxygen mask on and trying to transmit to ATC and wondering, why can't I hear anything? A door panel on the Boeing 737 Max had blown out with enough force to rip off her headset, but Wiproot says their training took over as the two pilots worked to safely land, and I opened the flight deck door, and I saw calm, quiet, hundreds of eyes staring right back at me and I looked at my flight attendants and I said, are you okay? and in that response I heard whole four, five empty seats and injuries. And are you thinking when they said empty seats, that you'd lost people? Yes, and I remember it not taking very long for us to confirm we had 177 souls on board. That had to be an emotional roller coaster. Yeah, I was so thankful. I was in… shock, i disbelieve everybody was there. Whip rude is back on the flight deck, telling us it was very important for her to get back in the saddle, she and her captain will be honored tomorrow by the airline pilots association for safely landing that damaged plane. Margaret, an amazing story. Chris, thank you for bringing it to us. We'll be right back.
Chirp 2:
One of the hero pilots aboard that Alaska Airlines plane that had a door blowout mid-flight in January is now speaking out for the first time tonight about those terrifying moments. CBS's Chris Van Cleave has the exclusive interview. Alaska Airlines pilot Emily Whiproot sounded the alarm. Something had gone terribly wrong. The first indication was it was an explosion in my ears.And then a wish of air. My body was forced forward and there was a loud bang. Did you know at this point there was a hole in the airplane? No. I didn't know that there was a hole in the airplane until we landed. It was so incredibly loud and I remember putting the oxygen mask on and trying to transmit to ATC and Wondering, why can't I hear anything? A door panel on the Boeing 737 Max had blown out with enough force to rip off her headset, but Whiprut says their training took over as the two pilots worked to safely land. And I opened the flight deck door. And I saw calm, quiet, hundreds of eyes staring right back at me. And I looked at my flight attendants and I said, "Are you okay?" And in that response, I heard whole, four, five empty seats and injuries. And are you thinking when they said empty seats that you'd lost people? Yes. And I remember it not taking very long for us to confirm we had 177 souls on board. That had to be an emotional roller coaster. I was so thankful. I was in shock. I disbelief. Everybody was there. Whiprood is back on the flight deck telling us it was very important for her to get back in the saddle. She and her captain will be honored tomorrow by the Airline Pilots Association for safely landing that damaged plane. Margaret. An amazing story. Chris, thank you for bringing it to us. We'll be right back.
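Comparing the two, Chirp 2 drops an entire passage that Chirp captures as "we just depressurized and suddenly maintained 10,000 and we need to return back to Portland" (the broadcaster transcript has "we just need to depressurize try to maintain 10,000 and we need to return back to Portland"). This time there are 27 changes.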
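Spotting a dropped passage by eye across a 30-minute transcript is tedious. One way to flag candidates automatically, offered here as an assumption rather than the method behind this analysis, is to diff the normalized words of two transcripts with `difflib` and report runs of deleted words above a minimum length. The `min_words` threshold of 4 is an arbitrary illustrative choice, and the two input strings are excerpts of the Chirp and Chirp 2 transcripts above; the same function works with a reference transcript as the first argument.

```python
import difflib
import re


def words(text: str) -> list[str]:
    # Lowercased words only; punctuation and case are ignored.
    return re.findall(r"[a-z0-9']+", text.lower())


def dropped_passages(reference: str, hypothesis: str, min_words: int = 4) -> list[str]:
    """Return runs of at least min_words reference words that the hypothesis simply omits."""
    ref, hyp = words(reference), words(hypothesis)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        " ".join(ref[i1:i2])
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag == "delete" and i2 - i1 >= min_words
    ]


chirp = ("no, i didn't know that there was a hole in the airplane until we landed. "
         "we just depressurized and suddenly maintained 10,000 and we need to return "
         "back to Portland. it was so incredibly loud")
chirp2 = ("No. I didn't know that there was a hole in the airplane until we landed. "
          "It was so incredibly loud")
print(dropped_passages(chirp, chirp2))
# ['we just depressurized and suddenly maintained 10 000 and we need to return back to portland']
```

Only pure deletions are reported. A passage that is dropped and replaced by different (possibly hallucinated) words shows up as a `replace` opcode instead and would need a separate rule, such as flagging replacements whose hypothesis side is much shorter than the reference side.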
Another clip:
Broadcaster-provided transcripts are themselves imperfect: this one reads "the federal government" where the speaker actually said "the government", and it omits the faint "it's all about" that can be heard during a clip transition:
00:09:09,014|00:09:10,549|RU2|>> THE FEDERAL GOVERNMENT AND
00:09:10,616|00:09:11,550|RU2|DONALD TRUMP CERTAINLY SHOULD
00:09:11,617|00:09:13,085|RU2|NOT BE TELLING A WOMAN WHAT TO
00:09:13,152|00:09:14,553|RU2|DO WITH HER BODY.
00:09:14,620|00:09:15,688|RU2|>> Reporter: THE TWO
00:09:15,755|00:09:17,290|RU2|CANDIDATES SPARRED OVER THE
00:09:17,356|00:09:19,224|RU2|ECONOMY AND INFLATION.
00:09:19,291|00:09:20,960|RU2|>> DONALD TRUMP LEFT US THE
00:09:21,026|00:09:21,961|RU2|WORST UNEMPLOYMENT SINCE THE
00:09:22,094|00:09:23,429|RU2|GREAT DEPRESSION.
Chirp:
The government, and Donald Trump certainly should not be telling a woman what to do with her. body, the two candidates sparred over the economy and inflation. Donald Trump left us the worst unemployment since the great depression.
Chirp 2:
The government and Donald Trump certainly should not be telling a woman what to do with her body. It's all about Donald Trump left us the worst unemployment since the Great Depression.
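Chirp 2 drops the reporter's line "the two candidates sparred over the economy and inflation" entirely, although it does pick up the faint "It's all about" that Chirp and the broadcaster transcript both miss.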
And a fourth clip:
Broadcaster-provided transcript:
00:08:29,375|00:08:30,643|RU2|THE HARRIS TEAM IS PUSHING FOR
00:08:30,709|00:08:32,044|RU2|THE TWO CONTENDERS TO MEET AGAIN
00:08:32,111|00:08:32,845|RU2|IN OCTOBER, BUT TRUMP, WHO
00:08:32,912|00:08:34,046|RU2|REPEATEDLY FELL INTO RHETORICAL
00:08:34,113|00:08:35,314|RU2|TRAPS SET BY HARRIS, TODAY SAID
00:08:35,381|00:08:36,516|RU2|HE'S NOT SO SURE.
00:08:36,582|00:08:39,051|RU2|>> I THINK THAT, ARE WE GOING TO
00:08:39,118|00:08:40,653|RU2|DO A REMATCH?
00:08:40,719|00:08:41,721|RU2|I JUST DON'T KNOW.
00:08:41,787|00:08:43,589|RU2|BUT WE'LL THINK ABOUT IT.
00:08:43,656|00:08:45,591|RU2|>> Reporter: THE FORMER
00:08:45,658|00:08:46,725|RU2|PRESIDENT TRIED TO DISMISS THE
00:08:46,792|00:08:48,995|RU2|SURPRISE ENDORSEMENT OF HARRIS
00:08:49,061|00:08:49,729|RU2|LATE TUESDAY BY POP STAR
00:08:49,795|00:08:50,596|RU2|TAYLOR SWIFT, WHO HAS
00:08:50,663|00:08:51,597|RU2|283 MILLION FOLLOWERS ON
00:08:51,664|00:08:52,798|RU2|INSTAGRAM.
00:08:52,865|00:08:54,133|RU2|>> SHE'LL PROBABLY PAY PRICE FOR
00:08:54,199|00:08:55,334|RU2|IT IN THE MARKETPLACE.
Chirp:
The Harris team is pushing for the two contenders to meet again in October, but Trump, who repeatedly fell into rhetorical traps set by Harris, today said he's not so sure. I think that uh, are we going to do a rematch? I just don't know, but we'll think about it. The former president tried. dismissed the surprise endorsement of Harris late Tuesday by popstar Taylor Swift, who has 283 million followers on Instagram. She probably pay a price for it at the uh in the marketplace.
Chirp 2:
The Harris team is pushing for the two contenders to meet again in October. But Trump, who repeatedly fell into rhetorical traps set by Harris, today said he's not so sure. The former president tried dismissed a surprise endorsement of Harris late Tuesday by pop star Taylor Swift, who has 283 million followers on Instagram. She'll pop it probably pay a price for it at the in the marketplace.
Chirp 2 drops the entire passage "I think that uh, are we going to do a rematch? I just don't know, but we'll think about it." Surprisingly, it also "corrects" the phrase "the surprise" to "a surprise", which is statistically 8x more common but is not what was spoken. This is one of the great dangers of models like Chirp 2 and Whisper: they replace what was actually said with wording that is statistically more common. While in this case there is likely little difference between "a" and "the", we have found myriad examples where the meaning changes substantially, such as "the answer to the NATO threat to Putin" vs "the answer of NATO to Putin's threats." Yet even in this simple example, imagine an exact-match phrase search for all mentions of a given statement by a political leader: changing a single "the" to "a" would prevent a match, a reminder that even small changes can have big impacts.
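To make the exact-match scenario concrete, here is a minimal sketch that searches Chirp's and Chirp 2's renderings of this clip for the phrase as actually spoken. It assumes simple lowercase, punctuation-stripped word matching, and the function names are hypothetical. `near_phrase_match` is an illustrative workaround, not something proposed in this analysis: it tolerates a fixed number of differing words.

```python
import re


def normalize(text: str) -> list[str]:
    # Lowercase and drop punctuation so only the words themselves matter.
    return re.findall(r"[a-z0-9']+", text.lower())


def exact_phrase_match(phrase: str, transcript: str) -> bool:
    """True if the phrase's words appear consecutively, unchanged, in the transcript."""
    p, t = normalize(phrase), normalize(transcript)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


def near_phrase_match(phrase: str, transcript: str, max_substitutions: int = 1) -> bool:
    """Like exact_phrase_match, but tolerates up to max_substitutions differing words."""
    p, t = normalize(phrase), normalize(transcript)
    for i in range(len(t) - len(p) + 1):
        window = t[i:i + len(p)]
        if sum(a != b for a, b in zip(p, window)) <= max_substitutions:
            return True
    return False


spoken = "the surprise endorsement of Harris"
chirp = "The former president tried. dismissed the surprise endorsement of Harris late Tuesday"
chirp2 = "The former president tried dismissed a surprise endorsement of Harris late Tuesday"
print(exact_phrase_match(spoken, chirp))    # True
print(exact_phrase_match(spoken, chirp2))   # False
print(near_phrase_match(spoken, chirp2))    # True
```

The tolerance is a blunt instrument: any setting loose enough to recover a "the"/"a" swap will also match statements that genuinely differ by one word, so it trades missed matches for false ones rather than removing the underlying problem.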