What explains the large percentage of uncaptioned television news airtime in 2009 that skews our commercial airtime analyses? Using the Advertising Airtime Dataset, it is straightforward to dig into the causes of the underlying trends we find.
Recall that this query gives us, for each day since July 2, 2009, the percentage of CNN's airtime that is uncaptioned or is advertising content:
SELECT DAY, SUM(UNCAPSEC) UNCAPSEC, SUM(ADSEC) ADSEC, SUM(TOTSEC) TOTSEC, SUM(UNCAPSEC)/SUM(TOTSEC)*100 PERC_UNCAPTIME, SUM(ADSEC)/SUM(TOTSEC)*100 PERC_ADTIME FROM (
  SELECT DATE(date) DAY, count(1) UNCAPSEC, 0 ADSEC, 0 TOTSEC FROM `[TEMPTABLE]` where station='CNN' and (type='UNCAPTIONED') AND DATE(date) >= "2009-07-02" group by DAY
  UNION ALL
  SELECT DATE(date) DAY, 0 UNCAPSEC, count(1) ADSEC, 0 TOTSEC FROM `[TEMPTABLE]` where station='CNN' and (type='ADVERTISEMENT') AND DATE(date) >= "2009-07-02" group by DAY
  UNION ALL
  SELECT DATE(date) DAY, 0 UNCAPSEC, 0 ADSEC, count(1) TOTSEC FROM `[TEMPTABLE]` where station='CNN' AND DATE(date) >= "2009-07-02" group by DAY
) group by DAY having TOTSEC>=64800 order by DAY ASC
Using the results, we can see that on July 4, 2009, a total of 24,950 out of 83,263 seconds of CNN's airtime were uncaptioned, representing roughly 30% of the day's broadcast time. Where is all of this uncaptioned airtime coming from? Recall that if a show has no captioning at all, it is excluded from these results.
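As a quick sanity check on those figures, the arithmetic behind the PERC_UNCAPTIME column can be reproduced by hand. This is only a minimal sketch: the variable names are illustrative, and the two inputs are the July 4, 2009 values quoted above. It also shows why this day passes the query's `having TOTSEC>=64800` filter, since 64,800 seconds is 18 hours of airtime.

```python
# Values quoted in the text for CNN on July 4, 2009.
uncaptioned_seconds = 24_950
total_seconds = 83_263

# Same formula as PERC_UNCAPTIME in the query: uncaptioned / total * 100.
perc_uncaptioned = uncaptioned_seconds / total_seconds * 100
print(f"{perc_uncaptioned:.2f}% of the day's airtime was uncaptioned")

# The HAVING clause keeps only days with at least 64,800 seconds of airtime.
min_seconds = 64_800
min_hours = min_seconds / 3600  # 18.0 hours
day_is_kept = total_seconds >= min_seconds
print(f"threshold = {min_hours:.0f} hours, day kept: {day_is_kept}")
```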
We can use the following query to rank CNN's broadcasts that day by the amount of uncaptioned airtime in each:
SELECT iaShowId, count(1) cnt FROM `[TEMPTABLE]` where station='CNN' and DATE(Date) = "2009-07-04" and type='UNCAPTIONED' group by iaShowId order by cnt desc
The top few results from this query include the following:
iaShowId | cnt |
CNN_20090704_070000_Larry_King_Live | 3106 |
CNN_20090704_060000_Anderson_Cooper_360 | 2809 |
CNN_20090704_090000_Larry_King_Live | 2478 |
CNN_20090704_080000_Lou_Dobbs_Tonight | 2307 |
Look at each of the videos above and you will see that they are almost entirely devoid of captioning, meaning there was either a broadcast error or a capture error. A broadcast missing all of its captioning would have been discarded from this dataset, but each of these contains sporadic snippets of captioning, so it was flagged as having captioning and retained.
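To get a sense of how much of the day's uncaptioned airtime these four broadcasts explain, their counts can be added up and compared against the 24,950 uncaptioned seconds reported for the day. The sketch below uses only the numbers shown above; the variable names are illustrative.

```python
# Uncaptioned seconds per broadcast, copied from the top results above.
top_shows = {
    "CNN_20090704_070000_Larry_King_Live": 3106,
    "CNN_20090704_060000_Anderson_Cooper_360": 2809,
    "CNN_20090704_090000_Larry_King_Live": 2478,
    "CNN_20090704_080000_Lou_Dobbs_Tonight": 2307,
}

# Total uncaptioned seconds on CNN for July 4, 2009, from the daily query.
day_uncaptioned_seconds = 24_950

top_total = sum(top_shows.values())
share = top_total / day_uncaptioned_seconds * 100
print(f"top four broadcasts: {top_total} seconds ({share:.1f}% of the day's uncaptioned airtime)")
```

The rest of the day's uncaptioned seconds come from broadcasts further down the result list.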
We can drill into the top result to confirm that, for the captioning that is there, the "type" field correctly differentiates news from advertising content:
SELECT * FROM `[TEMPTABLE]` where DATE(date) = '2009-07-04' and station='CNN' and iaShowId='CNN_20090704_070000_Larry_King_Live' and type!='UNCAPTIONED' order by date asc
So it appears that the driving force here is an elevated number of broadcasts in this early period that lack almost all of their captioning. We will be refining the AIF dataset to exclude broadcasts like these that include only a few snippets of captioning, which should fix this problem.
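One way such a refinement could work is sketched below. This is not the actual method that will be applied to the dataset. It assumes one row per second of airtime (as the `count(1)` tallies above suggest), uses a hypothetical cutoff of 10% captioned airtime, and runs on made-up sample rows with a placeholder `NEWS` type value.

```python
from collections import Counter

# Hypothetical cutoff: drop a broadcast when under 10% of its seconds carry captions.
MIN_CAPTIONED_FRACTION = 0.10


def sparsely_captioned_shows(rows, min_fraction=MIN_CAPTIONED_FRACTION):
    """Return the IDs of shows whose captioned share of airtime is below min_fraction.

    rows is an iterable of (show_id, type) pairs, assumed to be one pair per second.
    """
    total = Counter()
    captioned = Counter()
    for show_id, row_type in rows:
        total[show_id] += 1
        if row_type != "UNCAPTIONED":
            captioned[show_id] += 1
    return {show for show, n in total.items() if captioned[show] / n < min_fraction}


def exclude_shows(rows, excluded):
    """Keep only the rows that belong to shows not in the excluded set."""
    return [row for row in rows if row[0] not in excluded]


# Made-up sample: SHOW_A is 95% uncaptioned, SHOW_B is mostly captioned.
rows = (
    [("SHOW_A", "UNCAPTIONED")] * 95
    + [("SHOW_A", "NEWS")] * 5
    + [("SHOW_B", "NEWS")] * 70
    + [("SHOW_B", "ADVERTISEMENT")] * 25
    + [("SHOW_B", "UNCAPTIONED")] * 5
)

sparse = sparsely_captioned_shows(rows)
kept = exclude_shows(rows, sparse)
print(sorted(sparse), len(kept))
```

The right cutoff would need to be chosen by inspecting real broadcasts, since a threshold that is too aggressive could also discard legitimately captioned shows that had brief outages.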