Using GCP's Chirp + Gemini 1.5 Pro + Text-To-Speech API To Summarize A Day Of Russian TV News Into A 3 Minute "Top Stories" Podcast

What might it look like to use the Chirp USM model in GCP's Speech-to-Text API to machine-transcribe a full day of a Russian television news channel, feed that transcript into Gemini 1.5 Pro to generate a short summary of the day's top stories, and then feed that summary into GCP's Text-to-Speech API to generate a human-sounding spoken version? In essence, this is a pipeline that begins with a day of television news coverage and ends with a brief 3-minute podcast digest of the top headlines that tells viewers what to tune into. With relatively little code, news organizations could use pipelines like this to readily deploy their own custom podcast services within their existing media workflows, and could combine them with user preferences to offer bespoke per-user podcasts. Here we unveil a first glimpse of these results, summarizing 24 hours of a single Russian television news channel on October 1st, 2024 and comparing a variety of GCP's humanlike TTS voice models, including Casual, Journey, Neural2, News, Polyglot, Studio, and Wavenet voices, each with its own distinctive characteristics, to examine what this might look like for a newsroom.

For those interested in jumping ahead, you can listen to the final results below, with each recording representing a different TTS voice model. All of them narrate the same ~500-word summary, but they have very different "personalities" and characteristics. We focused primarily on "male" voices for this sample based on feedback that they sounded more natural and less robotic and mechanized than the current set of "female" TTS voices. You can learn more about each of these models, and listen to samples of all available models and supported languages, in the TTS Supported Voices and Languages documentation, while the Types Of Voices documentation gives more detail about the model classes.

TTS Model: en-US-Casual-K

TTS Model: en-US-Journey-D

TTS Model: en-US-Journey-F

TTS Model: en-US-Journey-O ("Female")

TTS Model: en-US-Neural2-J

TTS Model: en-US-News-N

TTS Model: en-US-Polyglot-1

TTS Model: en-US-Studio-Q

TTS Model: en-US-Wavenet-D

TTS Model: en-US-Wavenet-H-Female ("Female")

TTS Model: en-US-Wavenet-J
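The catalog of voices these recordings draw from can also be queried programmatically. The sketch below is illustrative rather than the method used for this post: it assumes the google-cloud-texttospeech Python package and configured Application Default Credentials, calls the client's list_voices method for en-US, and groups the returned names by their family segment (Casual, Journey, Neural2 and so on) using the hypothetical helpers voice_family and group_by_family.

```python
"""Sketch: list Cloud TTS voices and group them by model family.

Assumes google-cloud-texttospeech is installed and credentials are configured.
"""
from collections import defaultdict


def voice_family(voice_name: str) -> str:
    """Return the family segment of a voice name, e.g. "en-US-Journey-D" -> "Journey"."""
    parts = voice_name.split("-")
    # Names follow <lang>-<region>-<family>-<variant>; otherwise keep the whole name.
    return parts[2] if len(parts) >= 4 else voice_name


def group_by_family(voice_names):
    """Group voice names into a dict keyed by family, preserving input order."""
    groups = defaultdict(list)
    for name in voice_names:
        groups[voice_family(name)].append(name)
    return dict(groups)


def list_voice_names(language_code: str = "en-US"):
    """Ask the Text-to-Speech API which voices exist for one language."""
    # Imported lazily so the pure helpers above work without the package.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.list_voices(language_code=language_code)
    return sorted(voice.name for voice in response.voices)


if __name__ == "__main__":
    for family, names in sorted(group_by_family(list_voice_names()).items()):
        print(f"{family}: {', '.join(names)}")
```

Grouping by the third hyphen-separated segment relies on the naming pattern visible in the list above; names that don't follow it are kept whole.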

 

To generate the machine transcripts of the full day of coverage, we used the Chirp USM model in GCP's Cloud Speech-to-Text API and then translated the resulting transcripts into English using the GCP Cloud Translation API. We then analyzed the full-day English transcript with Gemini 1.5 Pro using the following prompt:

Attached is an English transcript of a day of Russian television news. Summarize all of the news stories into around 500 words of a digestible summary suitable for a podcast summary of the day.
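The transcription, translation and summarization steps described above can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the exact code behind this post: it assumes the day's audio has been uploaded to Cloud Storage (a full day would likely need to be split across several files), that Chirp is reached through the Speech-to-Text v2 batch_recognize method in the us-central1 region, that translation uses the Cloud Translation Basic (translate_v2) client on 5,000-character chunks, and that Gemini is called through the Vertex AI SDK with the model version string gemini-1.5-pro-002. The project ID, bucket path and helper names (chunk_text, transcribe_chirp, translate_to_english, summarize) are hypothetical.

```python
"""Sketch of the transcribe -> translate -> summarize steps (assumptions noted inline)."""

# The prompt is copied from the post; everything else is illustrative.
PROMPT = (
    "Attached is an English transcript of a day of Russian television news. "
    "Summarize all of the news stories into around 500 words of a digestible "
    "summary suitable for a podcast summary of the day."
)


def chunk_text(text: str, max_chars: int = 5000):
    """Split text into chunks of at most max_chars, breaking on whitespace."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single word longer than max_chars is hard-split.
        while len(word) > max_chars:
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        current = word
    if current:
        chunks.append(current)
    return chunks


def transcribe_chirp(project_id: str, gcs_uri: str, language: str = "ru-RU") -> str:
    """Batch-transcribe one Cloud Storage audio file with Chirp (Speech-to-Text v2)."""
    from google.api_core.client_options import ClientOptions
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    region = "us-central1"  # assumption: a region where Chirp is offered
    client = SpeechClient(
        client_options=ClientOptions(api_endpoint=f"{region}-speech.googleapis.com")
    )
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=[language],
        model="chirp",
    )
    request = cloud_speech.BatchRecognizeRequest(
        recognizer=f"projects/{project_id}/locations/{region}/recognizers/_",
        config=config,
        files=[cloud_speech.BatchRecognizeFileMetadata(uri=gcs_uri)],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig()
        ),
    )
    # Generous timeout (assumption): long recordings take a while to process.
    response = client.batch_recognize(request=request).result(timeout=3 * 60 * 60)
    results = response.results[gcs_uri].transcript.results
    return " ".join(r.alternatives[0].transcript for r in results if r.alternatives)


def translate_to_english(text: str) -> str:
    """Translate Russian text to English chunk by chunk (Cloud Translation Basic)."""
    from google.cloud import translate_v2 as translate

    client = translate.Client()
    pieces = [
        client.translate(chunk, source_language="ru", target_language="en",
                         format_="text")["translatedText"]
        for chunk in chunk_text(text)
    ]
    return " ".join(pieces)


def summarize(transcript_en: str, project_id: str) -> str:
    """Send the prompt plus the English transcript to Gemini 1.5 Pro on Vertex AI."""
    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project=project_id, location="us-central1")
    model = GenerativeModel("gemini-1.5-pro-002")  # assumed version string
    return model.generate_content([PROMPT, transcript_en]).text


if __name__ == "__main__":
    project = "my-project"  # hypothetical project ID
    uri = "gs://my-bucket/russian-news-2024-10-01.flac"  # hypothetical audio path
    english = translate_to_english(transcribe_chirp(project, uri))
    print(summarize(english, project))
```

Only the prompt text comes from the post. Because the GCP imports happen inside each function, the chunking helper can be tried without any cloud credentials.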

This yielded the following summary:

Headlines from Russia, October 1st, 2024:

International Tensions Flare:

Israel launched a limited ground operation in southern Lebanon, striking dozens of villages and prompting evacuations in Beirut. Rumors of Israeli tanks crossing the border circulated, though unconfirmed. International calls for a ceasefire were largely ignored, with Israel citing its right to self-defense against Hezbollah attacks.

Iran responded to the escalating conflict by launching hundreds of ballistic missiles at Israel, claiming to have destroyed military and infrastructure targets. While Israel claimed its Iron Dome defense system intercepted most missiles, the scale of the attack raised questions about the system's effectiveness against a sophisticated barrage. This sparked concerns about a wider regional conflict, with Iran threatening further retaliation against any Israeli response.

The US pledged support to Israel, while condemning Iran's actions and calling for de-escalation. Russia denounced the strikes on Lebanon, warning of further destabilization in the region.

The SVR (Foreign Intelligence Service) accused the US and UK of planning a chemical weapons provocation in Syria to discredit the Syrian and Russian militaries.

Domestic Affairs:

Russia marked Ground Forces Day with ceremonies and the awarding of medals to soldiers in the special military operation zone. A captured German Leopard tank was brought to Uralvagonzavod for analysis.

The autumn conscription began, with 133,000 recruits expected to join the armed forces. Officials emphasized conscripts would not be deployed to the special military operation zone.

Several new laws came into effect, including increased utilization fees for imported cars, new regulations for the taxi industry (requiring additional insurance), and a ban on adoptions by citizens of countries that permit gender reassignment.

Prime Minister Mishustin attended the Eurasian Economic Forum in Yerevan, emphasizing the importance of digitalization, technological sovereignty, and food security within the Eurasian Economic Union.

The FSB launched a large-scale operation against radical internet communities, detaining dozens accused of recruiting teenagers for terrorist acts, often using the Discord messenger platform.

Multiple high-profile corruption investigations were underway, including the arrest of the mayor of Izberbash for bribery and the detention of a former Russian Guard colonel for inflating prices in defense contracts.

Wildfires continued to rage in the Luhansk People's Republic and the Rostov region, with some officials blaming arson. Strong winds and drought hampered firefighting efforts.

Other News:

MGIMO announced plans to open a branch in Kazakhstan.

Russian Railways CEO Oleg Belozerov was elected president of the newly unified Russian Gymnastics Federation.

Musician Vyacheslav Dobrynin passed away at age 78.

The news from Russia this day was dominated by escalating international tensions in the Middle East and domestic concerns about security and political stability.

We then rendered the summary into speech using GCP's Text-to-Speech API. To make it easier to experiment with various options, we used the Cloud Console TTS sandbox, which lets you paste in up to 5000 bytes of text, select the desired voice, and adjust various options and parameters.
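For a scripted alternative to the console sandbox, the sketch below renders the same text with several voices through the google-cloud-texttospeech Python client. It assumes configured credentials, MP3 output, and a hypothetical summary.txt input file; the split_for_tts and language_code helpers are hypothetical, and sentence-level splitting is simply one way to stay under the 5000-byte request limit mentioned above.

```python
"""Sketch: synthesize a text summary with several Cloud Text-to-Speech voices."""
import re

MAX_BYTES = 5000  # per-request text limit noted for the TTS sandbox


def split_for_tts(text: str, max_bytes: int = MAX_BYTES):
    """Group sentences into chunks whose UTF-8 encoding stays within max_bytes."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if len(sentence.encode("utf-8")) > max_bytes:
            raise ValueError("a single sentence exceeds the per-request byte limit")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def language_code(voice_name: str) -> str:
    """Derive the language code from a voice name, e.g. "en-US-Studio-Q" -> "en-US"."""
    return "-".join(voice_name.split("-")[:2])


def synthesize(text: str, voice_name: str) -> bytes:
    """Render text with one voice, one request per chunk, returning MP3 bytes."""
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code(voice_name), name=voice_name
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    audio = b""
    for chunk in split_for_tts(text):
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=chunk),
            voice=voice,
            audio_config=audio_config,
        )
        # Simplification: MP3 segments are joined by plain byte concatenation.
        audio += response.audio_content
    return audio


if __name__ == "__main__":
    with open("summary.txt", encoding="utf-8") as f:  # hypothetical input file
        summary = f.read()
    for name in ["en-US-Casual-K", "en-US-Journey-D", "en-US-News-N", "en-US-Studio-Q"]:
        with open(f"{name}.mp3", "wb") as out:
            out.write(synthesize(summary, name))
```

A ~500-word summary fits in a single request, so the chunking loop matters only for longer digests; joining MP3 segments by byte concatenation is a shortcut that a dedicated audio library would handle more robustly.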