How might replacing our current single-frame television news broadcast thumbnails with montage thumbnails of algorithmically-selected representative frames drawn from across a broadcast vastly improve our current Visual Explorer interface? In short, for some channels the current approach of using a single frame roughly one minute into the broadcast as the default thumbnail fails miserably, yielding an endless sea of the same uniform image for every broadcast. Replacing this with a single representative frame drawn algorithmically from the first few minutes of each broadcast yields vastly superior results. Yet, while visually understandable, it is nearly impossible to distill the wide-ranging narratives of a television news broadcast that may cover many different stories into a single image that captures the gist of the entire broadcast. Instead, what if we divided the broadcast into 9, 12 or 16 equal-sized chunks of time, selected the most representative frame from each of those chunks and then formed those frames into a composite thumbnail montage? While the individual subframes are too small to discern in detail, such montages offer an at-a-glance assessment of the overall visual narrative of a broadcast, such as whether it is mostly set in a studio or filmed in the field, the kinds of imagery it features and some understanding of its topical emphasis.
Below is the current Visual Explorer thumbnail sequence for a single day of a Chinese television news channel. Unfortunately, the current approach of selecting a frame at roughly the one-minute mark yields a monotonous sequence of nearly identical thumbnails that offer absolutely no insight into the stories covered in each broadcast.
What if we instead replaced these thumbnails with a single algorithmically-selected frame drawn from the first 100 of the broadcast's 1/4fps preview frames (the first 6.6 minutes of the broadcast)? This yields vastly superior results, with distinct and understandable thumbnails for each broadcast in place of the uniform sea of the current interface:
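The selection criterion we use for "most representative" is not spelled out above, but one simple way to approximate the idea is a color-histogram medoid: the frame whose overall color distribution is closest, on average, to every other frame in the window. The Python sketch below assumes the 1/4fps preview frames have already been extracted as numbered JPEG files; the directory layout, histogram features and medoid criterion are all illustrative assumptions, not the Visual Explorer's actual implementation.

```python
# A minimal sketch of one way to pick a "representative" frame: compute a
# coarse color histogram for each extracted frame and return the medoid --
# the frame whose histogram is closest, on average, to all the others.
import glob
import numpy as np
from PIL import Image

def frame_histogram(path, bins=8):
    """Coarse RGB color histogram, normalized to sum to 1."""
    img = np.asarray(Image.open(path).convert("RGB").resize((160, 90)))
    hist, _ = np.histogramdd(img.reshape(-1, 3),
                             bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    hist = hist.flatten()
    return hist / hist.sum()

def most_representative(paths):
    """Return the path with the smallest mean L1 distance to all others."""
    hists = np.stack([frame_histogram(p) for p in paths])
    # Pairwise L1 distances between every pair of frame histograms.
    dists = np.abs(hists[:, None, :] - hists[None, :, :]).sum(axis=2)
    return paths[int(dists.mean(axis=1).argmin())]

# Hypothetical layout: one JPEG per 1/4fps frame, numbered in broadcast order.
frames = sorted(glob.glob("broadcast_frames/*.jpg"))[:100]  # first 6.6 minutes
thumbnail = most_representative(frames)
```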
At the same time, it is often impossible to distill the complex narrative arc of an entire television news broadcast into a single image, which might reflect just one story out of the many covered in that broadcast. What if we instead represented each broadcast as a montage of frames drawn from across the broadcast? Let's start with our 9-frame algorithmic montages at 200-pixel resolution:
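To make the chunk-and-tile process concrete, here is a hedged sketch that splits the broadcast's full frame sequence into equal-sized chunks of time, reuses the most_representative() helper from the sketch above to pick one frame per chunk, and pastes the picks into a grid scaled to a 200-pixel-wide thumbnail. The grid geometry and 16:9 aspect-ratio assumption are for illustration only.

```python
# A minimal sketch of the montage idea: one representative frame per
# equal-sized chunk of time, tiled into a square-ish grid.
import glob
import math
from PIL import Image

def montage(paths, n_cells=9, out_width=200):
    """Tile one representative frame per chunk into a grid thumbnail."""
    cols = math.ceil(math.sqrt(n_cells))        # 9 -> 3, 12 -> 4, 16 -> 4
    rows = math.ceil(n_cells / cols)            # 9 -> 3, 12 -> 3, 16 -> 4
    # Split the frame sequence into n_cells roughly equal chunks of time.
    bounds = [round(i * len(paths) / n_cells) for i in range(n_cells + 1)]
    # most_representative() is the color-histogram medoid sketched earlier.
    picks = [most_representative(paths[bounds[i]:bounds[i + 1]])
             for i in range(n_cells)]
    cell_w = out_width // cols
    cell_h = round(cell_w * 9 / 16)             # assume 16:9 source frames
    sheet = Image.new("RGB", (cols * cell_w, rows * cell_h))
    for i, pick in enumerate(picks):
        tile = Image.open(pick).convert("RGB").resize((cell_w, cell_h))
        sheet.paste(tile, ((i % cols) * cell_w, (i // cols) * cell_h))
    return sheet

frames = sorted(glob.glob("broadcast_frames/*.jpg"))  # the full broadcast
montage(frames, n_cells=9).save("montage_9.jpg")
```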
How about our 12-frame montages?
And 16-frame?
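Under the same assumptions, the sketch above extends directly to these denser grids by changing a single parameter:

```python
montage(frames, n_cells=12).save("montage_12.jpg")  # tiled 4 wide x 3 tall
montage(frames, n_cells=16).save("montage_16.jpg")  # tiled 4 wide x 4 tall
```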