The GDELT Project

A Digital Twin Glimpse At The Internet Archive's TV News Archive: 27 Billion Seconds Over 10.9 Million Broadcasts From 327 Channels In 50+ Countries & 150+ Languages Over A Quarter-Century

Using our new BigQuery + Bigtable GCS digital twin, we can look across the Internet Archive's entire TV News Archive to see just how large it is. In all, the Archive holds 27.8 billion seconds (463.5M minutes / 7.7M hours) of airtime from 10.9 million broadcasts (131,624 distinct show names) across 327 channels from more than 50 countries and territories, in over 150 languages and dialects, spanning nearly a quarter-century. That makes it one of the largest global archives available to journalists and scholars for understanding how the world's television news outlets have covered the planet's biggest stories of the millennium's first quarter-century.

The following query computes the airtime, broadcast, channel and show name totals, both overall and for broadcasts with a status of SUCCESS:

-- Archive-wide totals, each paired with a count restricted to status='SUCCESS'.
SELECT
  SUM(durSec) totSec,
  SUM(CASE WHEN status='SUCCESS' THEN durSec ELSE 0 END) totSecSuccess,
  COUNT(1) totBroadcasts,
  COUNTIF(status='SUCCESS') totBroadcastsSuccess,
  COUNT(DISTINCT chan) totChans,
  COUNT(DISTINCT IF(status='SUCCESS' AND chan IS NOT NULL, chan, NULL)) totChansSuccess,
  COUNT(DISTINCT showName) totShowNames,
  COUNT(DISTINCT IF(status='SUCCESS' AND showName IS NOT NULL, showName, NULL)) totShowNamesSuccess
FROM (
  -- Parse the fields of interest out of each row's JSON record.
  SELECT
    SAFE_CAST(JSON_EXTRACT_SCALAR(DOWN, '$.durSec') AS FLOAT64) durSec,
    JSON_EXTRACT_SCALAR(DOWN, '$.status') status,
    JSON_EXTRACT_SCALAR(DOWN, '$.chan') chan,
    JSON_EXTRACT_SCALAR(DOWN, '$.metaProgram') showName
  FROM (
    -- Take the first cell value of the 'DOWN' column for each row.
    SELECT
      rowkey,
      (SELECT ARRAY(SELECT value FROM UNNEST(cell))[OFFSET(0)] FROM UNNEST(cf.column) WHERE name IN ('DOWN')) DOWN
    FROM `[PROJECTID].bigtableconnections.digtwin`
    -- Keep rows whose first 8 rowkey characters parse as a number above 20000000 (SUBSTR is 1-based).
    WHERE SAFE_CAST(SUBSTR(rowkey, 1, 8) AS NUMERIC) > 20000000
  )
)
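
For readers who want to see the aggregation logic without a BigQuery connection, here is a minimal Python sketch of what the query above computes. It uses a handful of made-up records. The sample rows, channel names, show names and the `summarize` and `safe_float` helpers are all illustrative assumptions, not part of the digital twin itself. Only the JSON field names (`durSec`, `status`, `chan`, `metaProgram`) come from the query.

```python
import json

# Hypothetical sample records standing in for the JSON held in the DOWN column.
SAMPLE_ROWS = [
    '{"durSec": "3600", "status": "SUCCESS", "chan": "CHAN_A", "metaProgram": "Show One"}',
    '{"durSec": "1800", "status": "SUCCESS", "chan": "CHAN_B", "metaProgram": "Show Two"}',
    '{"durSec": "1800", "status": "FAILED", "chan": "CHAN_A", "metaProgram": "Show Three"}',
    '{"status": "FAILED", "chan": "CHAN_C"}',
]


def safe_float(value):
    """Roughly mimic SAFE_CAST(... AS FLOAT64): return None instead of raising."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def summarize(rows):
    """Compute overall and SUCCESS-only totals, skipping missing values as SQL skips NULLs."""
    tot_sec = tot_sec_success = 0.0
    tot_broadcasts = tot_broadcasts_success = 0
    chans, chans_success = set(), set()
    shows, shows_success = set(), set()

    for raw in rows:
        rec = json.loads(raw)
        dur = safe_float(rec.get("durSec"))
        ok = rec.get("status") == "SUCCESS"
        chan = rec.get("chan")
        show = rec.get("metaProgram")

        tot_broadcasts += 1              # COUNT(1) counts every row
        if ok:
            tot_broadcasts_success += 1  # COUNTIF(status='SUCCESS')
        if dur is not None:              # SUM ignores NULL durations
            tot_sec += dur
            if ok:
                tot_sec_success += dur
        if chan is not None:             # COUNT(DISTINCT ...) ignores NULLs
            chans.add(chan)
            if ok:
                chans_success.add(chan)
        if show is not None:
            shows.add(show)
            if ok:
                shows_success.add(show)

    return {
        "totSec": tot_sec,
        "totSecSuccess": tot_sec_success,
        "totBroadcasts": tot_broadcasts,
        "totBroadcastsSuccess": tot_broadcasts_success,
        "totChans": len(chans),
        "totChansSuccess": len(chans_success),
        "totShowNames": len(shows),
        "totShowNamesSuccess": len(shows_success),
    }


if __name__ == "__main__":
    for name, value in summarize(SAMPLE_ROWS).items():
        print(name, value)
```

Each overall figure is paired with a SUCCESS-only counterpart, which shows how much of the total comes from rows whose status is anything other than SUCCESS.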