Compiling A List Of All Non-News Broadcasts In The TV News Archive From Business Channels Over The Past Decade: Part 2
Last month we compiled a list of distinct show names marked as non-news from each of the three business news…
WashPost: Mark Robinson Offers Up The 2024 Version Of The I-Was-Hacked Defense
The Washington Post's Philip Bump examines media coverage of Mark Robinson. Read The Full Article.
Using Our BigQuery + Bigtable + GCS Digital Twin To Track Historical Backfilling Progress
With our new BigQuery + Bigtable digital twin over our GCS archive, we can trivially compile ongoing inventories of our…
Experiments With CCExtractor Using Our BigQuery + Bigtable + GCS Digital Twin
In December 2020 we unveiled a massive new initiative in collaboration with the Internet Archive's TV News Archive to catalog…
Using Our BigQuery + Bigtable + GCS Digital Twin To Make Date-Based Random Samples For Content Analysis & Testing
A key concept in "content analysis" methodologies over large temporally diverse archives is the notion of time-based random samples: creating…
Using Our BigQuery + Bigtable + GCS Digital Twin To Identify Missing Channels
One of the most powerful aspects of our BigQuery-analyzable Bigtable-based GCS digital twin is the capability it makes possible to…
CJR: Can Kamala Harris Use The Debate To Keep Her Media Momentum?
Meghnad Bose and Dhrumil Mehta examine media coverage of Kamala Harris in a piece for Columbia Journalism Review (CJR). Read…
How Much Attention Have Presidential Year Debates Gotten On TV News Over The Years? Hint: 2012 Was The Recent Peak
How much attention do presidential year debates get on television news? The timeline below shows total mentions of "debate" across…
How Are Business Television News Channels Covering Bitcoin & Crypto?
The timeline below shows the percentage of daily airtime (in 15-second blocks) across Bloomberg, CNBC and Fox Business over…
Leveraging Bigtable For Highly Scalable Digital Twin Architectures
As we continue to load our entire historical GCS archive into our Bigtable digital twin, Bigtable's remarkable scalability has allowed…
The Covid-Era Focus On "Experts" Has Faded
During the pandemic, mentions of "experts" were everywhere as news media emphasized the credentials and expertise of those they interviewed….
Scaling In The Cloud: Storing Billions Of Files Totaling Petabytes In GCS
One of the most remarkable aspects of working at "cloud scale" is the sheer scalability of the modern public cloud….
OCR'ing Television News: Comparing GCP Cloud Vision API, Paligemma, Tesseract, Gemini 1.5 Pro, Gemini 1.5 Flash & GPT 4o
Television news in a number of countries contains copious onscreen text scattered across multiple locations on the screen, in multiple…
Leveraging Bigtable's Versioning To Visualize An Evolving Video Archive Over Time & Prioritize Reprocessing
Yesterday we examined how we use BigQuery to perform archive-scale operational summaries of our Bigtable-based digital twin to visualize the…
Using Our BigQuery + Bigtable + GCS Digital Twin To Map The Status & Error Codes Of Analyzing A Quarter-Century Of The TV News Archive
Behind our ability to perform archive-scale analyses over the massive Internet Archive TV News Archive lies a powerful…
A Timeline Of The TV News Archive: 214 Channels Over A Quarter Century
Earlier this week we made a table tallying the total number of broadcasts and seconds of airtime preserved by the…
The Challenges Of Multilingualism In The Large Model Era: Using LSMs & LMMs To Transcribe An Amharic Broadcast
The Internet Archive's TV News Archive spans more than 2.5 million hours of global television news in 150 languages spanning…
Cataloging The TV News Archive By Channel Over The Past Quarter-Century: 8 Million Broadcasts & 5.65M Hours From 214 Channels
As we continue to analyze, explore and examine the Internet Archive's TV News Archive's insights into our global world, we…
What OpenAI's Whisper Teaches Us About The Dependence Of The Large Models Revolution On YouTube
The Internet Archive's TV News Archive spans millions of hours of television news programming from across the world in more…
Charting The Internet Archive TV News Archive's Collection By Location Over The Past Quarter-Century
The Internet Archive's TV News Archive has preserved television news coverage from more than 50 countries and territories over the…
More Experiments In LLM Filtering Of TV News Shows: Adding More Detail Doesn't Improve Consistency Or Accuracy
Continuing our experiments in using SOTA foundation model LLMs to categorize television news shows into "news" and "not news", earlier…
Charting The TV News Archive's Transition From SD To HD Video Resolution
Earlier this week we examined the storage growth of the Internet Archive's TV News Archive over the past quarter-century, charting…
Charting The TV News Archive's Belarusian, Russian & Ukrainian Archive
Continuing our series using BigQuery to analyze our Bigtable-based GCS digital twin, how can we use this same approach to…
Plotting Cumulative Archival Growth Using Our BigQuery + Bigtable + GCS Digital Twin
On Monday, we explored how BigQuery can be combined with Bigtable to create a digital twin over a vast GCS…