As we prepare to bring the TV News Explorer back online, we needed a way to perform high-resolution categorization of television news broadcasts spanning over 150 languages from more than 50 countries across the past quarter-century. The goal is to allow journalists and scholars to perform rich analyses of the world's biggest stories and their underlying narrative structures, and so to better understand how news stories are told across the world. No human could ever watch millions of hours of global television news and manually categorize it second by second, but our work to date has demonstrated that modern reasoning models are exceptionally accurate at precisely this kind of categorization task and that, with proper structuring and task scaffolding, we can effectively eliminate measurable hallucination for the narrow task of categorization. In all, we used Gemini 2.5 Flash Thinking to "watch," by reading their ASR transcripts, more than 7 billion seconds (1.94M hours) of global television news from 3.7M broadcasts spanning the past 25 years, totaling 129B characters across 11.5B spoken words. Only the public Gemini 2.5 Flash Thinking model was used and no data was used to train or tune any model.
Those 11.5 billion words, when including prompt and JSON structuring overhead, consumed 50.1B input tokens, but given batch processing discounts, the input tokens for the entire 7B seconds of airtime cost only about $7,500. Extensive benchmarking demonstrated that analyzing broadcasts at 2-second resolution yielded the highest quality results. This required considerably higher token overhead than coarser batching, with the final token counts being 11.4B candidate (output) tokens and 21.2B thinking tokens. Note that there were nearly twice as many thinking tokens as candidate tokens: this is in line with these kinds of complicated reasoning tasks and requires careful budgeting given that output and thinking tokens are priced at 8.16x the cost of input tokens, yielding a cost of $14,296 for the candidate tokens and $26,552 for the thinking tokens. In all, more than 82.7 billion tokens were processed.
In the end, having Gemini 2.5 Flash Thinking, one of the world's most advanced reasoning models, "watch" almost 2M hours of global television news by reading its ASR transcripts and categorize all 7B seconds at 2-second resolution cost just $48,367. That kind of economy of scale suddenly makes it possible for libraries of all sizes to begin cataloging and categorizing their vast video archives for the first time, helping journalists and scholars use them to explore the world's most important public interest questions.
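For readers who want to check the arithmetic, the figures above can be tied together in a short Python sketch. It uses only the rounded numbers quoted in this post, so the results are approximate. One value is an inference rather than a reported figure: the input cost is taken as the stated total minus the two stated output-side costs, which comes to $7,519 and is consistent with the $7,500 quoted above. The variable and function names are illustrative only.

```python
# Rounded figures quoted in this post: tokens in billions, costs in US dollars.
TOKENS_B = {"input": 50.1, "candidate": 11.4, "thinking": 21.2}
OUTPUT_COST_USD = {"candidate": 14_296, "thinking": 26_552}
TOTAL_COST_USD = 48_367
AIRTIME_SECONDS = 7e9   # "more than 7 billion seconds", so a lower bound
BROADCASTS = 3.7e6


def summarize() -> dict:
    """Derive headline ratios from the quoted (rounded) figures."""
    airtime_hours = AIRTIME_SECONDS / 3600
    total_tokens_b = sum(TOKENS_B.values())
    # Assumption: input cost = stated total minus the stated candidate
    # and thinking costs. The post itself only gives a rounded input cost.
    implied_input_cost = TOTAL_COST_USD - sum(OUTPUT_COST_USD.values())
    return {
        "airtime_hours_millions": airtime_hours / 1e6,
        "total_tokens_billions": total_tokens_b,
        "implied_input_cost_usd": implied_input_cost,
        "thinking_per_candidate_token": TOKENS_B["thinking"] / TOKENS_B["candidate"],
        "tokens_per_airtime_second": total_tokens_b * 1e9 / AIRTIME_SECONDS,
        "cost_per_airtime_hour_usd": TOTAL_COST_USD / airtime_hours,
        "cost_per_broadcast_usd": TOTAL_COST_USD / BROADCASTS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.3f}")
```

With these rounded inputs, the sketch works out to roughly 2.5 cents per hour of airtime and a little over one cent per broadcast, with about 1.86 thinking tokens for every candidate token.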