Analyzing petascale video archives poses unique computational challenges, from the underlying processor and accelerator requirements to simply moving that much data across the network from storage to the various compute infrastructure components. Many of the most robust and feature-rich computer vision systems are designed for still imagery, including most foundational LMMs, which decompose uploaded video files into sequences of still frames. Even medium-agnostic systems like OCR can achieve orders-of-magnitude cost savings when applied to still image surrogates. There are therefore considerable benefits to creating a 1fps still frame surrogate for each video file: we sample the video at one frame per second and save the frame sequence alongside the video, to be used for the vast majority of analytic tasks that don't require actual moving video input.

Which file format yields the best tradeoff of image quality and file compression, minimizing the storage requirements of these 1fps surrogates while maintaining sufficient image quality to yield results comparable to the original source material with most vision systems? After extensive testing with commonly recommended formats like JPEG2000, WebP and AVIF through a variety of toolkits, we find that JPEG images yield the best overall results when used with a quality setting of 6 with ffmpeg. While other formats can yield smaller files, they dramatically increase the compute time, with a ratio of compute to compression that is far less favorable than that of the JPEG images. For example, for an hour-long HD broadcast, the original source JPEGs total 223MB, while converting them to AVIF reduces this to 187MB (a 16.1% reduction) but takes 275 minutes (about 4.6 hours), an unacceptable tradeoff in this case.
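Below you can see the results of each experiment, including the filesize of the source MP4, the size of the source JPGs output by ffmpeg with "-q:v 6", and the results of converting those JPGs to WebP with GraphicsMagick and ImageMagick and to AVIF with ImageMagick. Each conversion is followed by a comment giving its elapsed time and the total size of its output.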
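For reference, the JPEG surrogate described above could be produced along the following lines. This is a minimal sketch rather than the exact command used for these experiments: the helper name `make_surrogate`, the `DRY_RUN` switch, the `%06d.jpg` naming pattern and the file names are all assumptions for illustration. Only the sampling rate and the "-q:v 6" quality setting come from the text.

```shell
# Hypothetical helper (not from the original workflow): extract a still-frame
# surrogate from one video. Set DRY_RUN=1 to print the ffmpeg command instead
# of running it.
make_surrogate() {
  local video="$1" outdir="$2" rate="${3:-1}"   # rate in frames per second, e.g. 1 or 1/4
  mkdir -p "$outdir"
  # fps=RATE samples the video at RATE frames per second;
  # -q:v 6 is the JPEG quality setting used in the text.
  ${DRY_RUN:+echo} ffmpeg -nostdin -loglevel error -i "$video" \
    -vf "fps=$rate" -q:v 6 "$outdir/%06d.jpg"
}

# Example (file names are placeholders):
#   make_surrogate VIDEO.mp4 ./1FPS 1
#   make_surrogate VIDEO.mp4 ./QUARTERFPS 1/4
```

Passing the rate as a parameter lets the same sketch cover both the 1fps and the 1/4fps sampling used in the experiments.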
First, we installed the various libraries and tools on a fresh Debian installation via the following. We ran all experiments on a 64-core VM with 400GB RAM, but artificially limited the processing to a single thread to capture the actual resource requirements of each workflow.
apt search libavif-bin
apt-get install libavif-bin
apt-get install webp
apt-get install imagemagick
apt-get install graphicsmagick
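Before running the conversions it may be worth confirming that the expected binaries are actually on the PATH. The helper below is a small sketch, and `missing_tools` is a hypothetical name. The list in the usage comment assumes the usual binary names shipped by these packages (`avifenc`, `cwebp`, `convert`, `gm`), plus `ffmpeg` and GNU `parallel`, which the commands in this post rely on but which are not part of the install list above.

```shell
# Print each named command that is NOT on PATH; prints nothing if all are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example:
#   missing_tools ffmpeg parallel convert gm cwebp avifenc
```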
The results of an hour-long SD resolution broadcast at 1fps sampling:
#MP4 => 370MB
#1FPS JPG => 97MB
mkdir GM-WEBP
time find 1FPS/*.jpg | parallel --eta 'gm convert {} ./GM-WEBP/{/}.webp'
#3m39.459s => 90MB
mkdir IM-WEBP
time find 1FPS/*.jpg | parallel --eta 'convert {} ./IM-WEBP/{/}.webp'
#3m15.161s => 92MB
mkdir IM-AVIF
time find 1FPS/*.jpg | parallel --eta 'convert {} ./IM-AVIF/{/}.avif'
#41m57.668s => 64MB
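The same kind of measurement can be wrapped in a small helper that converts the frames strictly one at a time and then reports elapsed time and output size. This is a sketch under stated assumptions, not the harness used for the numbers above: `bench_convert` is a hypothetical name, it only fits converters invoked as "command input output" (such as `convert`, `gm convert` or `avifenc`), and unlike the commands above it drops the `.jpg` suffix from output names.

```shell
# Hypothetical helper: convert every JPEG in SRC one at a time (a single job,
# no parallelism) with a converter invoked as "CMD... input output", then
# print the elapsed wall-clock seconds and the output directory size in MB.
bench_convert() {
  local src="$1" dst="$2" ext="$3"
  shift 3                      # the remaining arguments are the converter command
  mkdir -p "$dst"
  local start end f
  start=$(date +%s)
  for f in "$src"/*.jpg; do
    # frame.jpg -> frame.EXT, dropping the original .jpg suffix
    "$@" "$f" "$dst/$(basename "$f" .jpg).$ext" || return 1
  done
  end=$(date +%s)
  echo "$ext: $((end - start))s, $(du -sm "$dst" | cut -f1)MB"
}

# Examples (directory names are placeholders):
#   bench_convert ./1FPS ./IM-AVIF avif convert
#   bench_convert ./1FPS ./GM-WEBP webp gm convert
```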
The results of an hour-long HD resolution broadcast at 1fps sampling:
#MP4 => 939MB
#JPG => 223MB
time find *.jpg | parallel --eta 'gm convert {} {}.webp'
#8m41.052s => 326MB
time find *.jpg | parallel --eta 'convert {} {}.webp'
#8m20.221s => 326MB
time find *.jpg | parallel --eta 'convert {} {}.avif'
#275m20.471s => 187MB
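To make the HD figures easier to compare, the size change relative to the source JPEGs can be computed directly from the totals above. The helper name `pct_change` is hypothetical, and the inputs are simply the MB totals reported in this experiment.

```shell
# Percent change in total size from the source JPEGs to a converted format
# (negative = smaller, positive = larger).
pct_change() {
  awk -v src="$1" -v out="$2" 'BEGIN { printf "%.1f\n", (out - src) / src * 100 }'
}

pct_change 223 187   # AVIF via ImageMagick: prints -16.1
pct_change 223 326   # WebP via ImageMagick or GraphicsMagick: prints 46.2
```

On these numbers, the WebP output is roughly 46% larger than the JPEGs it was converted from, while AVIF saves about 16% at the cost of more than four and a half hours of compute.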
How about an hour-long HD resolution broadcast using a 1/4fps sampling rate (one frame every four seconds) for the JPEGs and an extended set of libraries and formats?
#MP4 => 948MB
#JPG => 89MB
time find *.jpg | parallel --eta 'convert {} {}.jp2'
#3m26.547s => 449MB
time find *.jpg | parallel --eta 'convert {} {}.avif'
#18m52.082s => 32MB
time find *.jpg | parallel --eta 'gm convert {} {}.jp2'
#0m28.714s => 82MB
time find *.jpg | parallel --eta 'avifenc {} {}.avif'
#17m15.144s => 44MB
time find *.jpg | parallel --eta 'avifenc --codec rav1e {} {}.avif'
#53m12.041s => 52MB
time find *.jpg | parallel --eta 'avifenc --codec svt {} {}.avif'
#52m19.977s => 91MB
time find *.jpg | parallel --eta 'convert {} {}.webp'
#2m10.821s => 77MB
time find *.jpg | parallel --eta 'gm convert {} {}.webp'
#2m6.959s => 49MB
time find *.jpg | parallel --eta 'cwebp {} -o {}.webp'
#1m48.976s => 49MB
time find *.jpg | parallel --eta 'cwebp {} -m 6 -pass 10 -o {}.webp'
#30m32.014s => 48MB
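One way to put a number on the compute-to-compression tradeoff is seconds of compute spent per MB saved relative to the 89MB of source JPEGs. This metric and the helper name `sec_per_mb_saved` are assumptions added for illustration, not a measure used in the original experiments. The inputs are the times and sizes listed above, with the times converted to seconds.

```shell
# Seconds of compute spent per MB saved relative to the source JPEGs.
# Prints "n/a" when the converted files are no smaller than the source.
sec_per_mb_saved() {
  awk -v src="$1" -v out="$2" -v secs="$3" 'BEGIN {
    if (out >= src) { print "n/a"; exit }
    printf "%.1f\n", secs / (src - out)
  }'
}

sec_per_mb_saved 89 49 108.976    # cwebp defaults (1m48.976s, 49MB): prints 2.7
sec_per_mb_saved 89 32 1132.082   # ImageMagick AVIF (18m52.082s, 32MB): prints 19.9
sec_per_mb_saved 89 449 206.547   # ImageMagick JPEG2000 (larger than source): prints n/a
```

By this measure, the smallest output in this run (ImageMagick AVIF at 32MB) costs roughly seven times more compute per MB saved than default cwebp.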