The GDELT Project

Compiling A List Of All Non-News Broadcasts In The TV News Archive From Business Channels Over The Past Decade

The Internet Archive's TV News Archive uses a set of human-curated keyword filters, applied to the show titles in each broadcast's EPG (electronic program guide) metadata, to classify every broadcast it monitors as "news" or "not news". Over time these filters can become outdated or miss novel show titles, and business channels are especially problematic. To seed work on automated title-based classification and filtering, we've compiled a master list of every unique EPG show title that the Archive's filters classified as non-news on each of the three business channels in the TV News Archive (BLOOMBERG, CNBC and FBC).
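A minimal Python sketch of title-based keyword filtering can make the failure mode concrete. The keyword patterns and show titles below are hypothetical illustrations, not the Archive's actual filter list. Any title that matches no pattern falls through to "news", which is how a newly introduced non-news program can be misclassified until someone adds a keyword for it.

```python
import re

# Hypothetical non-news keyword patterns; NOT the TV News Archive's real filter list.
NOT_NEWS_PATTERNS = [r"\binfomercial\b", r"\bpaid program", r"\bmovie\b", r"\bdocumentary\b"]
NOT_NEWS_RE = re.compile("|".join(NOT_NEWS_PATTERNS), re.IGNORECASE)


def classify_title(title: str) -> str:
    """Return "not news" if any keyword pattern matches the EPG title, else "news"."""
    return "not news" if NOT_NEWS_RE.search(title) else "news"


if __name__ == "__main__":
    # Hypothetical titles; the last stands in for a new non-news show that no
    # existing keyword covers, so this filter labels it "news".
    for title in ["Paid Programming", "Evening Market Wrap", "Weekend Wealth Seminar"]:
        print(f"{title}: {classify_title(title)}")
    # Paid Programming: not news
    # Evening Market Wrap: news
    # Weekend Wealth Seminar: news
```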

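For the technically-minded, here is the underlying BigQuery SQL query. It is shown for BLOOMBERG; substituting CNBC or FBC in the final channel filter produces the lists for the other two channels: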

-- Count non-news broadcasts per EPG show title for a single channel
SELECT showName, COUNT(1) AS cnt FROM (
  SELECT
    JSON_EXTRACT_SCALAR(DOWN, '$.metaProgram') AS showName, DOWN
  FROM (
    SELECT
      rowkey,
      -- First cell value of the DOWN column in column family cf (a JSON string)
      (SELECT ARRAY(SELECT value FROM UNNEST(cell))[OFFSET(0)] FROM UNNEST(cf.column) WHERE name IN ('DOWN')) AS DOWN
    FROM `[PROJECTID].bigtableconnections.digtwin`
    -- The first 8 characters of the rowkey are treated as a YYYYMMDD date
    WHERE CAST(SUBSTR(rowkey, 1, 8) AS NUMERIC) > 20000000
  )
)
WHERE JSON_EXTRACT_SCALAR(DOWN, '$.status') LIKE '%FAILPERM_NOTNEWS%'
  AND JSON_EXTRACT_SCALAR(DOWN, '$.chan') IN ('BLOOMBERG')
GROUP BY showName
ORDER BY cnt DESC
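For readers without access to BigQuery or the underlying table, the query's filtering and counting logic can be sketched in plain Python. This sketch assumes, as the query's JSON paths imply, that each row's DOWN value is a JSON object with string fields chan, status and metaProgram, and that the first eight characters of each rowkey form a YYYYMMDD date. The function name count_notnews_titles and the sample rows are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def count_notnews_titles(
    rows: Iterable[Tuple[str, str]],
    channels: Tuple[str, ...] = ("BLOOMBERG",),
) -> List[Tuple[str, int]]:
    """Hypothetical Python mirror of the SQL query.

    rows: (rowkey, DOWN) pairs, where DOWN is assumed to be a JSON object string
    with "chan", "status" and "metaProgram" fields.
    """
    counts: Counter = Counter()
    for rowkey, down in rows:
        # WHERE CAST(SUBSTR(rowkey, 1, 8) AS NUMERIC) > 20000000
        if int(rowkey[:8]) <= 20000000:
            continue
        record = json.loads(down)
        # status LIKE '%FAILPERM_NOTNEWS%' (a missing or null status never matches)
        if "FAILPERM_NOTNEWS" not in (record.get("status") or ""):
            continue
        # chan IN (...)
        if record.get("chan") not in channels:
            continue
        # GROUP BY showName, with COUNT(1) per group
        counts[record.get("metaProgram")] += 1
    # ORDER BY cnt DESC (ties keep first-seen order)
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample rows, not real Archive records.
    sample = [
        ("19991231_a", json.dumps({"chan": "BLOOMBERG", "status": "FAILPERM_NOTNEWS", "metaProgram": "Paid Program"})),
        ("20190301_b", json.dumps({"chan": "BLOOMBERG", "status": "FAILPERM_NOTNEWS", "metaProgram": "Paid Program"})),
        ("20190302_c", json.dumps({"chan": "BLOOMBERG", "status": "FAILPERM_NOTNEWS", "metaProgram": "Paid Program"})),
        ("20190303_d", json.dumps({"chan": "BLOOMBERG", "status": "FAILPERM_NOTNEWS", "metaProgram": "Weekend Special"})),
        ("20190304_e", json.dumps({"chan": "CNBC", "status": "FAILPERM_NOTNEWS", "metaProgram": "Paid Program"})),
        ("20190305_f", json.dumps({"chan": "BLOOMBERG", "status": "OK", "metaProgram": "Morning Markets"})),
    ]
    print(count_notnews_titles(sample))
    # → [('Paid Program', 2), ('Weekend Special', 1)]
    print(count_notnews_titles(sample, channels=("BLOOMBERG", "CNBC", "FBC")))
    # → [('Paid Program', 3), ('Weekend Special', 1)]
```

Passing all three channels corresponds to widening the query's IN list to ('BLOOMBERG', 'CNBC', 'FBC'). Because both versions group only by show title, identical titles from different channels are then merged into a single count.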