Compiling A List Of All Non-News Broadcasts In The TV News Archive From Business Channels Over The Past Decade: Part 2

Last month we compiled a list of distinct show names marked as non-news from each of the three business news…

WashPost: Mark Robinson Offers Up The 2024 Version Of The I-Was-Hacked Defense

The Washington Post's Philip Bump examines media coverage of Mark Robinson. Read The Full Article.

Using Our BigQuery + Bigtable + GCS Digital Twin To Track Historical Backfilling Progress

With our new BigQuery + Bigtable digital twin over our GCS archive, we can trivially compile ongoing inventories of our…

Experiments With CCExtractor Using Our BigQuery + Bigtable + GCS Digital Twin

In December 2020 we unveiled a massive new initiative in collaboration with the Internet Archive's TV News Archive to catalog…

Using Our BigQuery + Bigtable + GCS Digital Twin To Make Date-Based Random Samples For Content Analysis & Testing

A key concept in "content analysis" methodologies over large temporally diverse archives is the notion of time-based random samples: creating…

Using Our BigQuery + Bigtable + GCS Digital Twin To Identify Missing Channels

One of the most powerful aspects of our BigQuery-analyzable Bigtable-based GCS digital twin is the capability it makes possible to…

CJR: Can Kamala Harris Use The Debate To Keep Her Media Momentum?

Meghnad Bose and Dhrumil Mehta examine media coverage of Kamala Harris in a piece for Columbia Journalism Review (CJR). Read…

How Much Attention Have Presidential Year Debates Gotten On TV News Over The Years? Hint: 2012 Was The Recent Peak

How much attention do presidential year debates get on television news? The timeline below shows total mentions of "debate" across…

How Are Business Television News Channels Covering Bitcoin & Crypto?

The timeline below shows the percentage of daily airtime (in 15 sec blocks) across Bloomberg, CNBC and Fox Business over…

Leveraging Bigtable For Highly Scalable Digital Twin Architectures

As we continue to load our entire historical GCS archive into our Bigtable digital twin, Bigtable's remarkable scalability has allowed…

The Covid-Era Focus On "Experts" Has Faded

During the pandemic, mentions of "experts" were everywhere as news media emphasized the credentials and expertise of those they interviewed….

Scaling In The Cloud: Storing Billions Of Files Totaling Petabytes In GCS

One of the most remarkable aspects of working at "cloud scale" is the sheer scalability of the modern public cloud….

OCR'ing Television News: Comparing GCP Cloud Vision API, Paligemma, Tesseract, Gemini 1.5 Pro, Gemini 1.5 Flash & GPT 4o

Television news in a number of countries contains copious onscreen text scattered across multiple locations on the screen, in multiple…

Leveraging Bigtable's Versioning To Visualize An Evolving Video Archive Over Time & Prioritize Reprocessing

Yesterday we examined how we use BigQuery to perform archive-scale operational summaries of our Bigtable-based digital twin to visualize the…

Using Our BigQuery + Bigtable + GCS Digital Twin To Map The Status & Error Codes Of Analyzing A Quarter-Century Of The TV News Archive

Behind our ability to perform archive-scale analyses over the massive Internet Archive TV News Archive lies a powerful…

A Timeline Of The TV News Archive: 214 Channels Over A Quarter Century

Earlier this week we made a table tallying the total number of broadcasts and seconds of airtime preserved by the…

The Challenges Of Multilingualism In The Large Model Era: Using LSMs & LMMs To Transcribe An Amharic Broadcast

The Internet Archive's TV News Archive spans more than 2.5 million hours of global television news in 150 languages spanning…

Cataloging The TV News Archive By Channel Over The Past Quarter-Century: 8 Million Broadcasts & 5.65M Hours From 214 Channels

As we continue to analyze, explore and examine the Internet Archive's TV News Archive's insights into our global world, we…

What OpenAI's Whisper Teaches Us About The Dependence Of The Large Models Revolution On YouTube

The Internet Archive's TV News Archive spans millions of hours of television news programming from across the world in more…

Charting The Internet Archive TV News Archive's Collection By Location Over The Past Quarter-Century

The Internet Archive's TV News Archive has preserved television news coverage from more than 50 countries and territories over the…

More Experiments In LLM Filtering Of TV News Shows: Adding More Detail Doesn't Improve Consistency Or Accuracy

Continuing our experiments in using SOTA foundation model LLMs to categorize television news shows into "news" and "not news", earlier…

Charting The TV News Archive's Transition From SD To HD Video Resolution

Earlier this week we examined the storage growth of the Internet Archive's TV News Archive over the past quarter-century, charting…

Charting The TV News Archive's Belarusian, Russian & Ukrainian Archive

Continuing our series using BigQuery to analyze our Bigtable-based GCS digital twin, how can we use this same approach to…

Plotting Cumulative Archival Growth Using Our BigQuery + Bigtable + GCS Digital Twin

On Monday, we explored how BigQuery can be combined with Bigtable to create a digital twin over a vast GCS…