The GDELT Project

Using The Advertising Airtime Dataset To Explain 2009's Uncaptioned Airtime Trends

What explains the large percentage of uncaptioned television news airtime in 2009 that skews our commercial airtime analyses? Using the Advertising Airtime Dataset, it is straightforward to dig into the causes of the underlying trends we find.

Recall that this query gives us the percentage of the airtime on CNN each day that is uncaptioned or is advertising content since 2009:

SELECT DAY, SUM(UNCAPSEC) UNCAPSEC, SUM(ADSEC) ADSEC, SUM(TOTSEC) TOTSEC, SUM(UNCAPSEC)/SUM(TOTSEC)*100 PERC_UNCAPTIME, SUM(ADSEC)/SUM(TOTSEC)*100 PERC_ADTIME FROM (
SELECT DATE(date) DAY, count(1) UNCAPSEC, 0 ADSEC, 0 TOTSEC FROM `[TEMPTABLE]` where station='CNN' and (type='UNCAPTIONED') AND DATE(date) >= "2009-07-02" group by DAY
UNION ALL
SELECT DATE(date) DAY, 0 UNCAPSEC, count(1) ADSEC, 0 TOTSEC FROM `[TEMPTABLE]` where station='CNN' and (type='ADVERTISEMENT') AND DATE(date) >= "2009-07-02" group by DAY
UNION ALL
SELECT DATE(date) DAY, 0 UNCAPSEC, 0 ADSEC, count(1) TOTSEC FROM `[TEMPTABLE]` where station='CNN' AND DATE(date) >= "2009-07-02" group by DAY
) group by DAY having TOTSEC>=64800 order by DAY ASC

Using the results, we can see that on July 4, 2009, a total of 24,950 of the 83,263 seconds of CNN's airtime were uncaptioned, or roughly 30% of the day's broadcast time. Where is all of this uncaptioned airtime coming from? Recall that a show with no captioning at all is excluded from these results.
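As a quick check of the arithmetic, the short Python sketch below recomputes the uncaptioned percentage from the two totals quoted above and converts the 64,800-second cutoff in the HAVING clause into hours. The helper name `percent` is ours and purely illustrative; it is not part of the dataset or of any GDELT tooling.

```python
def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return part / whole * 100


# Totals quoted in the text for CNN on July 4, 2009.
uncaptioned_seconds = 24_950
total_seconds = 83_263

perc_uncaptioned = percent(uncaptioned_seconds, total_seconds)
print(f"uncaptioned share of the day: {perc_uncaptioned:.2f}%")

# "having TOTSEC>=64800" keeps only days with at least this many
# hours of monitored airtime (3,600 seconds per hour).
min_hours = 64_800 / 3600
print(f"minimum airtime for a day to be included: {min_hours:.0f} hours")
```

In other words, the query only reports days on which at least 18 of the 24 hours were captured, which keeps days with large monitoring gaps from distorting the percentages.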

We can use the following query to rank CNN's broadcasts that day by their uncaptioned airtime:

SELECT iaShowId, count(1) cnt FROM `[TEMPTABLE]` where station='CNN' and DATE(Date) = "2009-07-04" and type='UNCAPTIONED' group by iaShowId order by cnt desc

The top few results from this query include the following:

iaShowId cnt
CNN_20090704_070000_Larry_King_Live 3106
CNN_20090704_060000_Anderson_Cooper_360 2809
CNN_20090704_090000_Larry_King_Live 2478
CNN_20090704_080000_Lou_Dobbs_Tonight 2307

Watch each of the broadcasts above and you will see that they are almost entirely devoid of captioning, pointing to either a broadcast error or a capture error. A broadcast missing all of its captioning would have been discarded from this dataset, but these contain sporadic snippets of captioning throughout, so they were flagged as having captioning.
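To put the four counts listed above in perspective, the following Python sketch adds them up and compares them with the day's 24,950 uncaptioned seconds. It also splits each iaShowId into its parts, assuming the identifiers follow a STATION_YYYYMMDD_HHMMSS_Title pattern. That pattern is inferred from the four IDs shown here rather than taken from a specification, and `parse_show_id` is a hypothetical helper of our own.

```python
from datetime import datetime

# Uncaptioned seconds per broadcast, copied from the query results above.
top_shows = {
    "CNN_20090704_070000_Larry_King_Live": 3106,
    "CNN_20090704_060000_Anderson_Cooper_360": 2809,
    "CNN_20090704_090000_Larry_King_Live": 2478,
    "CNN_20090704_080000_Lou_Dobbs_Tonight": 2307,
}
day_uncaptioned = 24_950  # total uncaptioned seconds on CNN that day

top_total = sum(top_shows.values())
share = top_total / day_uncaptioned * 100
print(f"top four broadcasts: {top_total} seconds ({share:.1f}% of the day's uncaptioned airtime)")


def parse_show_id(show_id: str):
    """Split an ID into (station, start, title).

    Assumes the STATION_YYYYMMDD_HHMMSS_Title layout seen in the IDs above.
    """
    station, day, clock, title = show_id.split("_", 3)
    start = datetime.strptime(day + clock, "%Y%m%d%H%M%S")
    return station, start, title.replace("_", " ")


for show_id, seconds in top_shows.items():
    station, start, title = parse_show_id(show_id)
    print(f"{start:%H:%M} {title}: {seconds} uncaptioned seconds")
```

These four broadcasts alone account for a large part of the day's uncaptioned total, which is consistent with a handful of nearly caption-free shows driving the figure.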

We can drill into the top result to confirm that, for the captioning that is present, the "type" field correctly differentiates news from advertising content:

SELECT * FROM `[TEMPTABLE]` where DATE(date) = '2009-07-04' and station='CNN' and iaShowId='CNN_20090704_070000_Larry_King_Live' and type!='UNCAPTIONED' order by date asc

It appears, then, that the driving force here is an elevated number of broadcasts in this early period that lack nearly all of their captioning. We will be refining the AIF dataset to exclude broadcasts like these that include only a few snippets of captioning, which should fix this problem.
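One way such a filter could work is sketched below in Python. This is only an illustration under our own assumptions, not the refinement that will actually be applied to the dataset: the function name, the 50% cutoff, the toy show IDs, and the 'NEWS' type label are all hypothetical (only 'UNCAPTIONED' and 'ADVERTISEMENT' appear in the queries above). Each input row stands for one second of airtime, matching how the queries count rows as seconds.

```python
from collections import Counter


def mostly_uncaptioned_shows(rows, min_captioned_fraction=0.5):
    """Return the show IDs whose captioned share of airtime is below the cutoff.

    rows: iterable of (iaShowId, type) pairs, one pair per second of airtime.
    min_captioned_fraction: hypothetical cutoff, chosen only for illustration.
    """
    total = Counter()
    captioned = Counter()
    for show_id, row_type in rows:
        total[show_id] += 1
        if row_type != "UNCAPTIONED":
            captioned[show_id] += 1
    return {
        show_id
        for show_id, seconds in total.items()
        if captioned[show_id] / seconds < min_captioned_fraction
    }


# Toy data: SHOW_A has only a few captioned snippets, SHOW_B is mostly captioned.
rows = (
    [("SHOW_A", "UNCAPTIONED")] * 90
    + [("SHOW_A", "NEWS")] * 10  # 'NEWS' is an assumed label
    + [("SHOW_B", "UNCAPTIONED")] * 5
    + [("SHOW_B", "NEWS")] * 80
    + [("SHOW_B", "ADVERTISEMENT")] * 15
)

flagged = mostly_uncaptioned_shows(rows)
print(sorted(flagged))
```

Broadcasts flagged this way could then be dropped before computing the daily percentages, in the same way that broadcasts with no captioning at all are already excluded.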