Using FFMPEG's "BlackDetect" Filter To Identify Commercial Blocks

A common methodology for detecting commercials in television news is to scan for the tell-tale "fade to black" used to fade into and out of commercial breaks on most channels. It turns out that the venerable open source "ffmpeg" utility has a built-in filter for detecting such intervals called "blackdetect" (note that Debian users might need to compile a modern ffmpeg from source or use a precompiled binary). Applied to television news, it scans the entire broadcast and identifies all runs of consecutive frames in which every frame is darker than the specified black threshold and whose total duration is at least the specified length of time.

Scanning a video for such transitions is as simple as:

ffmpeg -i ./VIDEO.mp4 -vf "blackdetect=d=0.05:pix_th=0.10" -an -f null - 2>&1 | grep blackdetect

In this example we scan for runs of frames whose pixels fall at or below the default 0.10 luminance threshold that defines "black" and that last at least 0.05 seconds. On a 64-core system it takes around 31 seconds to process an hour-long CNN broadcast.
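
The same scan can also be scripted. What follows is a minimal Python sketch, assuming ffmpeg is available on the PATH. The helper names parse_blackdetect and run_blackdetect are hypothetical and are not part of ffmpeg. The sketch runs the command shown above and turns each reported interval into a tuple of floats.

```python
import re
import subprocess

# Matches the three fields that blackdetect reports for each interval.
BLACK_RE = re.compile(
    r"black_start:(?P<start>[\d.]+)\s+"
    r"black_end:(?P<end>[\d.]+)\s+"
    r"black_duration:(?P<duration>[\d.]+)"
)


def parse_blackdetect(log_text):
    """Return a list of (start, end, duration) tuples, in seconds."""
    return [
        (float(m["start"]), float(m["end"]), float(m["duration"]))
        for m in BLACK_RE.finditer(log_text)
    ]


def run_blackdetect(video_path, min_duration=0.05, pixel_threshold=0.10):
    """Run ffmpeg's blackdetect filter over a video and parse its log."""
    cmd = [
        "ffmpeg", "-i", video_path,
        "-vf", f"blackdetect=d={min_duration}:pix_th={pixel_threshold}",
        "-an", "-f", "null", "-",
    ]
    # ffmpeg writes its log, including the blackdetect lines, to stderr.
    result = subprocess.run(cmd, stderr=subprocess.PIPE, text=True, check=True)
    return parse_blackdetect(result.stderr)


if __name__ == "__main__":
    # Two lines in the same format as the output shown in this post.
    sample = (
        "black_start:660.927 black_end:661.194 black_duration:0.266933\n"
        "black_start:663.663 black_end:665.398 black_duration:1.73507\n"
    )
    for interval in parse_blackdetect(sample):
        print(interval)
```

Calling run_blackdetect("./VIDEO.mp4") would return the same kind of list for a real file. The parsing is kept separate from the ffmpeg call so that saved log output can be analysed without rerunning the scan.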

For example, here are the results for the June 11, 2013 1-2PM PDT broadcast of The Lead With Jake Tapper that we previously scanned for commercials using the captioning mode field of its closed captioning:

black_start:660.927 black_end:661.194 black_duration:0.266933
black_start:663.663 black_end:665.398 black_duration:1.73507
black_start:691.024 black_end:691.09 black_duration:0.0667333
black_start:721.02 black_end:721.12 black_duration:0.1001
black_start:781.014 black_end:781.08 black_duration:0.0667333
black_start:796.029 black_end:796.095 black_duration:0.0667333
black_start:811.044 black_end:811.144 black_duration:0.1001
black_start:826.059 black_end:826.125 black_duration:0.0667333
black_start:841.04 black_end:842.942 black_duration:1.9019
black_start:1372.34 black_end:1373.34 black_duration:1.001
black_start:1382.05 black_end:1382.18 black_duration:0.133467
black_start:1403.2 black_end:1403.3 black_duration:0.1001
black_start:1418.22 black_end:1418.28 black_duration:0.0667333
black_start:1433.23 black_end:1433.37 black_duration:0.133467
black_start:1493.26 black_end:1493.36 black_duration:0.1001
black_start:1508.07 black_end:1508.54 black_duration:0.467133
black_start:1568.47 black_end:1569.8 black_duration:1.33467
black_start:1574.31 black_end:1575.27 black_duration:0.967633
black_start:2116.58 black_end:2116.75 black_duration:0.166833
black_start:2131.7 black_end:2131.83 black_duration:0.133467
black_start:2176.71 black_end:2176.84 black_duration:0.133467
black_start:2191.72 black_end:2191.82 black_duration:0.1001
black_start:2236.73 black_end:2236.87 black_duration:0.133467
black_start:2296.76 black_end:2298.56 black_duration:1.8018
black_start:2656.05 black_end:2656.42 black_duration:0.367033
black_start:2659.09 black_end:2659.22 black_duration:0.133467
black_start:2662.39 black_end:2662.46 black_duration:0.0667333
black_start:2664.9 black_end:2667.4 black_duration:2.5025
black_start:2669.1 black_end:2669.53 black_duration:0.433767
black_start:2674.34 black_end:2676.34 black_duration:2.002
black_start:2716.15 black_end:2716.25 black_duration:0.1001
black_start:2746.14 black_end:2746.24 black_duration:0.1001
black_start:2776.14 black_end:2776.27 black_duration:0.133467
black_start:2791.05 black_end:2791.52 black_duration:0.467133
black_start:2810.64 black_end:2810.84 black_duration:0.2002
black_start:2821.45 black_end:2821.59 black_duration:0.133467
black_start:2832.1 black_end:2832.23 black_duration:0.133467
black_start:2835.67 black_end:2836 black_duration:0.333667
black_start:2839.84 black_end:2839.9 black_duration:0.0667333
black_start:2845.31 black_end:2845.64 black_duration:0.333667
black_start:2851.52 black_end:2852.65 black_duration:1.13447
black_start:2857.19 black_end:2858.69 black_duration:1.5015
black_start:3061.59 black_end:3061.76 black_duration:0.166833
black_start:3091.66 black_end:3091.82 black_duration:0.166833
black_start:3121.65 black_end:3121.79 black_duration:0.133467
black_start:3151.72 black_end:3151.85 black_duration:0.133467
black_start:3196.66 black_end:3197.06 black_duration:0.4004
black_start:3226.96 black_end:3227.09 black_duration:0.133467
black_start:3256.99 black_end:3258.12 black_duration:1.13447
black_start:3262.69 black_end:3264.33 black_duration:1.63497
black_start:3449.08 black_end:3450.88 black_duration:1.8018
black_start:3480.78 black_end:3481.04 black_duration:0.266933
black_start:3510.77 black_end:3510.84 black_duration:0.0667333
black_start:3525.79 black_end:3525.92 black_duration:0.133467
black_start:3540.8 black_end:3540.87 black_duration:0.0667333
black_start:3600.83 black_end:3601 black_duration:0.166833
black_start:3605.8 black_end:3605.9 black_duration:0.1001
black_start:3615.85 black_end:3616.68 black_duration:0.834167

At first glance there are a lot of transitions identified in this list, but look specifically at the "black_duration" field, which gives the length of each transition. Fades into and out of commercial breaks are likely to be longer, typically close to a second or more. Selecting those longer intervals yields the following candidate transition points:

black_start:663.663 black_end:665.398 black_duration:1.73507
black_start:841.04 black_end:842.942 black_duration:1.9019
black_start:1372.34 black_end:1373.34 black_duration:1.001
black_start:1568.47 black_end:1569.8 black_duration:1.33467
black_start:1574.31 black_end:1575.27 black_duration:0.967633
black_start:2296.76 black_end:2298.56 black_duration:1.8018
black_start:2664.9 black_end:2667.4 black_duration:2.5025
black_start:2851.52 black_end:2852.65 black_duration:1.13447
black_start:2857.19 black_end:2858.69 black_duration:1.5015
black_start:3256.99 black_end:3258.12 black_duration:1.13447
black_start:3262.69 black_end:3264.33 black_duration:1.63497
black_start:3449.08 black_end:3450.88 black_duration:1.8018
black_start:3615.85 black_end:3616.68 black_duration:0.834167
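
A duration cutoff like this is straightforward to apply mechanically. The sketch below is illustrative only: the 0.8 second cutoff is an assumption, chosen so that the shortest interval in the list above (0.834167 seconds) would still be kept, and the input is just an excerpt of the full output covering the stretch from 2656 to 2859 seconds.

```python
# Excerpt of the blackdetect output above:
# (black_start, black_end, black_duration), all in seconds.
INTERVALS = [
    (2656.05, 2656.42, 0.367033),
    (2659.09, 2659.22, 0.133467),
    (2662.39, 2662.46, 0.0667333),
    (2664.9, 2667.4, 2.5025),
    (2669.1, 2669.53, 0.433767),
    (2674.34, 2676.34, 2.002),
    (2839.84, 2839.9, 0.0667333),
    (2845.31, 2845.64, 0.333667),
    (2851.52, 2852.65, 1.13447),
    (2857.19, 2858.69, 1.5015),
]

# Assumed cutoff, not a value stated in the text.
MIN_FADE_SECONDS = 0.8


def long_fades(intervals, min_duration=MIN_FADE_SECONDS):
    """Keep only the intervals at least min_duration seconds long."""
    return [iv for iv in intervals if iv[2] >= min_duration]


for start, end, duration in long_fades(INTERVALS):
    print(f"black_start:{start} black_end:{end} black_duration:{duration}")
```

Note that a purely mechanical filter also keeps the 2.002 second interval starting at 2674.34, which does not appear in the list above. A duration threshold alone therefore does not reproduce that list exactly.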

Recall that using the closed captioning mode field, we previously derived the following commercial/news chronology for that broadcast:

Commercials Start: 667
News Resumes: 851
Commercials Start: 1375
News Resumes: 1582
Commercials Start: 2119
News Resumes: 2306
Commercials Start: 2659
News Resumes: 2864
Commercials Start: 3064
News Resumes: 3273
Commercials Start: 3455
News Resumes: 3629

Remember that closed captioning can lag the actual video by a few seconds, since it is typed by human captioners transcribing the broadcast as they watch it, so the two sets of timecodes will be slightly off. Interweaving the two yields:

Commercials Start: 667
black_start:663.663 black_end:665.398 black_duration:1.73507

News Resumes: 851
black_start:841.04 black_end:842.942 black_duration:1.9019

Commercials Start: 1375
black_start:1372.34 black_end:1373.34 black_duration:1.001

News Resumes: 1582
black_start:1568.47 black_end:1569.8 black_duration:1.33467
black_start:1574.31 black_end:1575.27 black_duration:0.967633

Commercials Start: 2119

News Resumes: 2306
black_start:2296.76 black_end:2298.56 black_duration:1.8018

Commercials Start: 2659
black_start:2664.9 black_end:2667.4 black_duration:2.5025

News Resumes: 2864
black_start:2851.52 black_end:2852.65 black_duration:1.13447
black_start:2857.19 black_end:2858.69 black_duration:1.5015

Commercials Start: 3064

News Resumes: 3273
black_start:3256.99 black_end:3258.12 black_duration:1.13447
black_start:3262.69 black_end:3264.33 black_duration:1.63497

Commercials Start: 3455
black_start:3449.08 black_end:3450.88 black_duration:1.8018

News Resumes: 3629
black_start:3615.85 black_end:3616.68 black_duration:0.834167
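
The interweaving above can be approximated in code by attaching to each caption transition every long black interval that lies close to it. This is a minimal sketch rather than the procedure actually used here: the 15 second window and the function names are assumptions, and the inputs are the two lists shown above.

```python
# Caption-derived chronology from above: (label, timecode in seconds).
CAPTION_EVENTS = [
    ("Commercials Start", 667), ("News Resumes", 851),
    ("Commercials Start", 1375), ("News Resumes", 1582),
    ("Commercials Start", 2119), ("News Resumes", 2306),
    ("Commercials Start", 2659), ("News Resumes", 2864),
    ("Commercials Start", 3064), ("News Resumes", 3273),
    ("Commercials Start", 3455), ("News Resumes", 3629),
]

# Long black intervals from above: (black_start, black_end).
LONG_FADES = [
    (663.663, 665.398), (841.04, 842.942), (1372.34, 1373.34),
    (1568.47, 1569.8), (1574.31, 1575.27), (2296.76, 2298.56),
    (2664.9, 2667.4), (2851.52, 2852.65), (2857.19, 2858.69),
    (3256.99, 3258.12), (3262.69, 3264.33), (3449.08, 3450.88),
    (3615.85, 3616.68),
]

# Assumed matching window, not a value taken from the text.
WINDOW_SECONDS = 15


def fades_near(caption_time, fades, window=WINDOW_SECONDS):
    """Return fades whose nearest edge is within `window` seconds."""
    return [
        (start, end) for start, end in fades
        if min(abs(caption_time - start), abs(caption_time - end)) <= window
    ]


def align(events, fades, window=WINDOW_SECONDS):
    """Pair every caption event with its nearby long fades."""
    return [(label, t, fades_near(t, fades, window)) for label, t in events]


aligned = align(CAPTION_EVENTS, LONG_FADES)
unmatched = [t for _, t, near in aligned if not near]
doubled = [t for _, t, near in aligned if len(near) > 1]
print("no long fade nearby:", unmatched)
print("two long fades nearby:", doubled)
```

With these inputs the sketch reports 2119 and 3064 as having no long fade nearby, and 1582, 2864 and 3273 as having two.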

It is immediately clear that the two datasets are not perfectly aligned. In some places they line up closely (after adjusting for the captioning's delay of up to ~10s relative to the video), but in others they diverge entirely. Here are some of the key differences:

  • At timecode 1582 captioning shows a transition back to news, which the video analysis shows as well, but then video shows a second transition around 5 seconds later. It turns out this is a brief stock clip of Wolf Blitzer saying "This is CNN".
  • At timecode 2119, captioning shows a transition to commercial break, which the video analysis does not show. The video analysis shows a very brief transition of 0.166 seconds at this time ("black_start:2116.58 black_end:2116.75 black_duration:0.166833"). A look at the clip itself confirms that this is a commercial break, but that instead of a long pause, the video transitions to the commercial almost instantly. Most importantly, the sound from the news programming continues to play during the black fade, making it even more difficult to distinguish as a commercial break.
  • At timecode 2864, there are once again two long transition periods in the video analysis. This time it is a stock clip of Erin Burnett saying "This is CNN."
  • At timecode 3064, again there is a near-instant transition from programming to commercial break, which captioning catches but which is too brief to distinguish in the video analysis from an intra-clip fast cut.
  • At timecode 3273, there are once again two long transition periods in the video analysis. This time it is a stock clip of Anderson Cooper saying "This is CNN."

Of the 12 transitions above, 3 feature the "This is CNN" double-break and 2 commercial breaks are missed entirely, meaning there are issues with 5 of the 12 divisions, or roughly 42%. Of the 6 commercial breaks, this approach misses a full third. Relying exclusively on blackframe detection without secondary signals is therefore challenging. In particular, the two missed commercial breaks show that ALL blackframe divisions, including extremely brief ones with audio from the previous block continuing to play through the divider, must be considered.

The consistency of the "This is CNN" breaks makes them readily identifiable in this particular broadcast, but it is unclear whether this remains consistent over the years or whether a varying set of such special case filters would be needed to cover the entire 2009-2020 period using blackframe detection.

Most commercial breaks are bookended by second-long black screens, but as the broadcast above shows, not all commercial breaks are marked by such lengthy divisions and in at least one case the audio from the news programming extended over part of the transition period, making it even harder to distinguish as a commercial break.

Perhaps most problematic, however, is that there is no systematic difference between the blackframe period that begins a commercial break and the one that ends it. In essence, blackframes are a universal "divider" without any indication of whether the divider marks the beginning or the end of the commercial break. Most commercial breaks appear to have rapid-fire scene changes and further blackframe separation, but overall it is clear that relying on visual cues alone will yield a much reduced accuracy rate compared with the captioning.

In the end, blackframe analysis by itself appears to struggle with reliably identifying commercial breaks, whereas caption mode information offers a flawless subdivision of the broadcast. Combined with the captioning mode data, the blackframe timecodes could be used to time-shift the captioning to reduce the captioning delay and more precisely align it with the video, but it is clear that caption mode is a far more reliable tool for commercial detection.
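
As a rough illustration of the time-shifting idea, the sketch below estimates the captioning delay from the transitions in the interleaved listing that have exactly one long black interval next to them. It rests on assumptions: the end of the black interval is treated as the true transition point, and the median is used as the summary. Neither choice comes from the analysis above.

```python
from statistics import median

# (caption timecode, black_end of its single matching long fade), taken from
# the interleaved listing above. Transitions with zero or two long fades
# are left out.
PAIRS = [
    (667, 665.398),
    (851, 842.942),
    (1375, 1373.34),
    (2306, 2298.56),
    (2659, 2667.4),
    (3455, 3450.88),
    (3629, 3616.68),
]


def caption_offsets(pairs):
    """Seconds by which each caption timecode trails its black_end."""
    return [caption - black_end for caption, black_end in pairs]


offsets = caption_offsets(PAIRS)
typical_delay = median(offsets)
print("per-transition offsets:", [round(o, 2) for o in offsets])
print("median offset (seconds):", round(typical_delay, 2))
```

For this broadcast the per-transition offsets range from about -8.4 to about 12.3 seconds, with a median of about 4.1 seconds. A single global shift would therefore correct only part of the misalignment. Snapping each caption transition to its nearest black interval would be the alternative.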