Visual Explorer: A Layered Workflow For Detecting Corrupt & Empty Broadcasts Using FFMPEG & Chirp ASR

Kalev Leetaru

3 months ago

Any archive of television news coverage spanning hundreds of channels from more than 50 countries over more than a quarter century will inevitably contain some percentage of blank and corrupted broadcasts, from signal loss to recording glitches. As we prepare to relaunch the Visual Explorer, we have been working to detect these at scale across the archive. Complicating matters is the sheer variety of ways in which such broadcasts can be presented: detecting off-the-air broadcasts involves identifying blank, colorbars and tone, logo and music, ads, preview clips, subaudible tones and nearly every possible permutation of video and audio signal that can be imagined, multiplied by the varied broadcast practices of more than 50 nations that have changed repeatedly over the more than 25 years contained in the archive. Detecting corrupt broadcasts spans a visual and audible landscape as large as the broadcasting medium itself. Yet, with some clever observations we've been able to eliminate a vast swath of these problematic broadcasts.

The first stage of filtration occurs at ingest. We first scan each MPG or MP4 container using ffprobe/ffmpeg to assess the existence, duration and specifications of its audio and video channels. In all, we discarded 155,184 broadcasts through this filtering:

96,247 (62%) of discarded broadcasts contained audio and video channels that had substantially different durations. This means that audio and video either are not synced (if one of the tracks begins later than the other) or are truncated. Differences of tens of seconds are accepted, but differences of tens of minutes typically indicate systematic corruption.
42,624 (27.5%) of discarded broadcasts had unrecoverable corruption such that ffmpeg was unable to successfully process one or both of the media streams regardless of format relaxation.
9,255 (5.9%) of discarded broadcasts contained a valid audio channel but no video channel (not blank video: there is no video channel at all in the container). The inverse, video without audio, is checked in the next stage of the pipeline during audio channel verification.
3,250 (2%) of discarded broadcasts contained a valid video channel, but reported a resolution of less than 100 x 100 pixels after SAR correction, which is below the known resolution of any broadcast channel in the collection over the 25-year period.
1,832 (1.2%) of discarded broadcasts had no available video container file or it was zero length.
1,732 (1.1%) of discarded broadcasts contained valid audio and video streams but were shorter than 60 seconds. We now remove these because, while potentially valid, they typically contain a variety of format corruption issues.
121 (0.07%) of discarded broadcasts reported video dimensions outside the known range of that broadcast channel (such as a 16K broadcast appearing on an SD channel in 2010) or outside the known scope of current video technology (such as a video reporting 32K or 64K resolution).
108 (0.07%) of discarded broadcasts had a server-unrecoverable video container file.
15 (0.009%) of discarded broadcasts had a valid container file that contained an unsupported video format.

The second stage involves more detailed assessment of the audio channel to determine if it should be passed onward to ASR. In all, we discarded 127,294 broadcasts at this stage. Note that only videos that have successfully passed the first stage filtering are examined here:

95,599 (75%) of discarded broadcasts are entirely or almost entirely devoid of any signal in their audio channel (they are silent). Here we use ffmpeg's "silencedetect" filter.
31,289 (24.5%) of discarded broadcasts contain an audio and video stream that are the same duration, but where the extractable audio stream is just a fraction of that length. For example, ffprobe reports that the container file contains a 30 minute audio stream, but upon extraction it encounters too many unrecoverable errors and is unable under any format relaxation to export more than a small portion of the stream.
406 (0.32%) of discarded broadcasts contain no audio channel at all.
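
As a rough illustration of the silence check, the sketch below parses the log lines that ffmpeg's silencedetect filter writes to stderr and computes what fraction of the broadcast was silent. The noise floor (-50dB), the minimum silence length (2 seconds), the 95% cutoff for "almost entirely" silent and all function names are assumptions made for this example, since the actual settings are not stated here.

```python
import re
import subprocess

# silencedetect logs lines such as:
#   [silencedetect @ 0x...] silence_start: 12.5
#   [silencedetect @ 0x...] silence_end: 40.25 | silence_duration: 27.75
_NUM = r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
_START = re.compile(r"silence_start:\s*" + _NUM)
_END = re.compile(r"silence_end:\s*" + _NUM)


def run_silencedetect(path: str, noise: str = "-50dB", min_len_s: float = 2.0) -> str:
    """Run ffmpeg's silencedetect filter on the audio only and return its stderr log."""
    cmd = ["ffmpeg", "-hide_banner", "-nostats", "-i", path, "-vn",
           "-af", f"silencedetect=noise={noise}:d={min_len_s}",
           "-f", "null", "-"]
    return subprocess.run(cmd, capture_output=True, text=True).stderr


def silent_seconds(log: str, total_s: float) -> float:
    """Sum the silent intervals reported in a silencedetect log."""
    silent = 0.0
    open_start = None
    for line in log.splitlines():
        m = _START.search(line)
        if m:
            # Start times can be reported slightly below zero; clamp them.
            open_start = max(0.0, float(m.group(1)))
            continue
        m = _END.search(line)
        if m and open_start is not None:
            silent += max(0.0, float(m.group(1)) - open_start)
            open_start = None
    if open_start is not None:
        # A silence still open at end of file is counted through to the end.
        silent += max(0.0, total_s - open_start)
    return min(silent, total_s)


def is_mostly_silent(log: str, total_s: float, min_fraction: float = 0.95) -> bool:
    """True if at least min_fraction (an assumed cutoff) of the audio is silent."""
    if total_s <= 0:
        return False
    return silent_seconds(log, total_s) / total_s >= min_fraction
```

Keeping the parsing separate from the ffmpeg call means the thresholds can be tuned against saved logs without re-decoding each broadcast.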

Finally, broadcasts that contain audio are passed through ASR. If ASR does not recover even a single word of speech anywhere in the broadcast, we mark the broadcast for removal, which discarded a final 99,662 broadcasts. These are typically instrumental music, massive audio channel corruption, excessive audio artifacts (clipping, dropout, or improper expander, gate, compressor, equalization, level and similar settings) or blank video or colorbars with tones outside the range of human hearing. A much larger set of broadcasts contain just a few recognizable words, but at present we retain these in the Visual Explorer without ASR transcripts.
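
The post-ASR decision can be sketched as a simple word count over the transcript. This is only an illustration: it assumes the ASR output is already available as a list of transcript strings, the function names and result labels are hypothetical, and the 10-word ceiling for "just a few recognizable words" is an assumption, since no figure is given.

```python
import re

# \w+ is Unicode-aware for str patterns in Python 3, so it counts tokens in
# non-Latin scripts too (unsegmented scripts yield fewer, longer tokens).
_WORD = re.compile(r"\w+")


def count_words(segments: list) -> int:
    """Count word-like tokens across all transcript segments."""
    return sum(len(_WORD.findall(s)) for s in segments)


def asr_disposition(segments: list, sparse_max_words: int = 10) -> str:
    """Classify a broadcast by how much speech ASR recovered.

    sparse_max_words is an assumed cutoff for 'just a few words'.
    """
    n = count_words(segments)
    if n == 0:
        return "discard_no_speech"
    if n <= sparse_max_words:
        return "keep_without_transcript"
    return "keep_with_transcript"
```

Counting tokens rather than testing for an empty string means transcripts made up only of whitespace or punctuation are still treated as containing no speech.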

Note that at this time we retain broadcasts that have valid recoverable audio but corrupt or blank video. This means that broadcasts where the audio is transcribable but the video is lost will remain in the Visual Explorer, since it is still possible to understand what the speakers are saying.

With just this relatively simple workflow we have been able to remove the majority of the blank and corrupt broadcasts from the archive.