Snopes: Fact Check: Trump WH Falsely Claimed USAID Funded 'Transgender Comic Book' In Peru

Snopes uses the TV News Archive in its fact check of claims about USAID funding.

Behind The Scenes: A First Glimpse At ASR Statistics From 2.5 Million Hours Of Global TV News Spanning 50 Countries & A Quarter Century

Last year we announced the successful completion of Large Speech Model (LSM)-powered ASR over the totality of the uncaptioned Television…

Behind The Scenes: A Look At 16 Years Of Advertising Density On Television News

We are tremendously excited to announce today the completion of our analysis of captioning mode information across the totality of…

Behind The Scenes: 1.9 Million Hours & 13.8 Billion Words Of Closed Captioning Spanning 17 Years Of Television News

Yesterday we previewed some initial statistics from our work identifying and removing advertisements from closed captioning transcripts across the TV…

Behind The Scenes: Some Initial Archive-Scale Closed Captioning Statistics

Only a portion of the TV News Archive's broadcasts contain broadcaster-provided closed captioning, but by virtue of being largely human-transcribed…

At-Scale OCR Of Television News: 18.8 Billion Seconds Of Global Television News OCR'd For $71K Vs $47M

We are tremendously excited to announce today that in collaboration with the Internet Archive's Television News Archive, we have completed…

Behind The Scenes: Identifying Mismatches Between Expected And Real Video File Durations & Single Version Of The Truth (SVOT)

One of the most complex and time-consuming aspects of working with vast historical archives is diagnosing and addressing the myriad…

At-Scale OCR Of Television News Experiments: OCR Of Interlaced Video Using GCP's Cloud Vision

Amongst the TV News Archive's quarter-century of global broadcasts are interlaced broadcasts, which produce the tell-tale jagged ghosting seen below…

At-Scale OCR Of Television News Experiments: Optimizing The Still Frame File Storage Format

Analyzing petascale video archives poses unique computational challenges, from the underlying processor and accelerator requirements to simply moving that much…

From LSMs To LMMs For ASR: Evaluating Gemini's Performance At Transcribing An Evening News Broadcast

As we continue to evaluate the rapid progress of large model ASR systems, from lightly to heavily generative LSMs to…

Comparing GCP's Chirp & Chirp 2 ASR Models: Dropping Entire Passages

Yesterday we examined how GCP's new Chirp 2 ASR model hallucinates speech during non-verbal musical interludes in news broadcasts, resulting…

Comparing GCP's Chirp & Chirp 2 ASR Models: Hallucinating Speech During Music

Over the past six months we have continued to compare GCP's Chirp and Chirp 2 ASR models, each time finding…

Audience-Specific Podcasts: Customizing Our Daily "Top Stories" Biosurveillance Podcast Concept For Experts, Policymakers & The American Public

Yesterday we demonstrated feeding a daily roundup of global disease outbreak news headlines from around the world into a "thinking"…

A Daily "Top Stories" Global Disease Outbreak Podcast Concept Using GCP's Gemini 2.0 Thinking + Text-to-Speech API

What might it look like to feed a daily roundup of global disease outbreak news headlines in all the world's…

Using GCP's Chirp + Gemini 1.5 Pro + Speech-To-Text API To Summarize A Day Of Russian TV News Into A 3 Minute "Top Stories" Podcast

What might it look like to use GCP's Speech-to-Text API's Chirp LSM model to machine transcribe a full day of…

A Daily "Top Stories" Global Investment News Podcast Concept Using GCP's Gemini 2.0 Thinking + Text-to-Speech API

What might it look like to feed a daily roundup of global investment news headlines in all the world's languages…

A Daily "Top Stories About NVIDIA" News Podcast Concept Using GCP's Gemini 2.0 Thinking + Text-to-Speech API

What might it look like to feed a daily roundup of news headlines about NVIDIA from across the world in…

At-Scale OCR Of Television News Experiments: OCR'ing 10 Billion Seconds Of Global TV News For Just $47.5K Vs $26.9M

In collaboration with the Internet Archive's Television News Archive, we have successfully OCR'd 4.2 million television news broadcasts from around…

Behind The Scenes: Identifying Failed Recordings: Using Large Multimodal Models Like ChatGPT & Gemini: Part 3

Continuing our series examining whether Large Multimodal Models (LMMs) like ChatGPT and Gemini might be able to help us identify…

Behind The Scenes: Identifying Failed Recordings: Using Large Multimodal Models Like ChatGPT & Gemini: Part 2

Earlier this week we demonstrated the limitations of using Large Multimodal Models (LMMs) like ChatGPT and Gemini to detect corrupted…

Behind The Scenes: Identifying Failed Recordings: Using Large Multimodal Models Like ChatGPT & Gemini

As we continue our efforts to scan the TV News Archive for failed recordings, how might Large Multimodal Models (LMMs)…

The Influence of Media Propaganda on Green Housing Consumption in China Based on GDELT Big Data

As part of China's two-carbon strategy, green buildings are a vital component in addressing climate change. Formulating a media propaganda strategy to…

Behind The Scenes: Identifying Failed Recordings: Examining Curation Metadata

Any large longitudinal audiovisual archive will have some number of recordings that suffer from technical errors, ranging from minor audio…

Behind The Scenes: API Quotas & The Impact Of A Fraction Of A QPS

All hosted APIs have rate-limited quotas of some form to protect them from abuse and to ensure equal sharing of…