With our new BigQuery + Bigtable digital twin over our GCS archive, we can trivially compile ongoing inventories of our archive over time and perform whole-of-archive sweeps to identify and retry failed ingests. For example, we can use the following SQL to compile a breakdown by status code:
select status, count(1) cnt from (
select
JSON_EXTRACT_SCALAR(DOWN, '$.status') status, DOWN
FROM (
SELECT
rowkey,
( select array(select value from unnest(cell))[OFFSET(0)] from unnest(cf.column) where name in ('DOWN') ) DOWN
FROM `[PROJECTID].bigtableconnections.digtwin` where SAFE_CAST(substr(rowkey, 0, 8) as NUMERIC) > 20000000
)
) group by status order by cnt desc
Yielding:
| status | cnt |
| SUCCESS | 7511136 |
| FAILPERM_CORRUPT | 38507 |
| FAILTMP_SRTDOWNFAIL | 5998 |
| FAILPERM_AVCHECKCORRUPT | 3442 |
| FAILPERM_NOEXISTS | 1029 |
| FAILTMP_SRTNOTREADY | 21 |
| FAILTMP_VIDDOWN | 14 |
| FAILTMP_NOVALIDVIDEOFORMAT | 13 |
| READY | 11 |
| FAILTMP_VIDMD5MISMATCH | 11 |
| FAILTMP_VIDNOEXIST | 4 |
| FAILTMP_GCSRENAME | 2 |
Similarly, we can compile a by-channel breakdown:
select chan, count(1) cnt from (
select
JSON_EXTRACT_SCALAR(DOWN, '$.chan') chan, DOWN
FROM (
SELECT
rowkey,
( select array(select value from unnest(cell))[OFFSET(0)] from unnest(cf.column) where name in ('DOWN') ) DOWN
FROM `[PROJECTID].bigtableconnections.digtwin` where SAFE_CAST(substr(rowkey, 0, 8) as NUMERIC) > 20000000
)
) group by chan order by cnt desc
This returns 327 total channels. However, at least a portion of these channels were brief monitoring experiments consisting of only a handful of shows that encountered technical issues. Thus, to arrive at the total number of channels containing at least one successfully monitored broadcast we can simply add a status filter:
select chan, count(1) cnt from (
select
JSON_EXTRACT_SCALAR(DOWN, '$.chan') chan, DOWN
FROM (
SELECT
rowkey,
( select array(select value from unnest(cell))[OFFSET(0)] from unnest(cf.column) where name in ('DOWN') ) DOWN
FROM `[PROJECTID].bigtableconnections.digtwin` where SAFE_CAST(substr(rowkey, 0, 8) as NUMERIC) > 20000000
) where JSON_EXTRACT_SCALAR(DOWN, '$.status') = 'SUCCESS'
) group by chan order by cnt desc
This yields the ultimate total: 260 channels!
