Generative AI Experiments: GenAI Coding Copilots: Asking GPT-4 & Gemini Ultra To Help Brainstorm A Trivial Video Quality Filter

The wonderful world of digital video is a vast, complex, nuanced and often arcane landscape of containers, formats, codecs, bitrates, resolutions, aspect ratios, framerates and myriad other considerations. Each specific application can involve tradeoffs amongst all of these moving parts that are complex for a professional and utterly bewildering for someone new to digital video. Take one specific use case: a directory of broadcast videos, each with multiple transcoded versions targeted to different devices, from which you want to select a version based on different criteria. How might LLMs like GPT-4 and Gemini Ultra help a novice developer new to digital video understand all of the underlying considerations and provide sample code that showcases different approaches?

The end result is that both GPT-4 and Gemini Ultra fail spectacularly at this trivial task. Told that the directory contains both the original broadcast video file and transcoded derivatives, and asked to select the one that is closest to the original video, neither LLM suggests simply selecting the original broadcast file. Instead, both offer complex, convoluted workflows that do everything from filtering by bitrate to performing pairwise perceptual similarity comparisons. Neither explains that transcoding can inflate resolution, bitrate and other attributes through interpolation, or that every lossy process yields video that is degraded compared with the original – core concepts at the very heart of digital video, but ones that may be entirely unfamiliar to someone new to the field. Asked to come up with ideas for how to trade off quality against filesize, both GPT-4 and Gemini Ultra fixate on singular attributes (bitrate or filesize) and, even in their step-by-step reasoning narratives, fail to provide any real usable workflows of the kind a developer would actually use in a real application.

Given that both products are heavily positioned as coding copilots, it is surprising that neither yielded strong results even in a field as strongly represented in their training data as digital video, and that both missed what should have been a trivial noop: merely selecting the original file – a failure that demonstrates critical limitations in their reasoning abilities.

Let's start with a trivial example, where we have a collection of broadcasts and for each we have the original source video and multiple transcoded variants targeted to different devices. We ask the LLMs to select the "highest quality" video. Since the original source video is included, this is a trivial noop: the original source video will always be the highest quality content.

You have a collection of broadcast videos that originated in MPEG2 / MP2 format. For each video there is the original MPEG2 video and multiple alternative transcoded versions aimed at various devices. Write a simple Perl script that runs on a Linux machine to select the highest quality version of each video.

Gemini Ultra completely misses that the source video is available and doesn't even bother to provide sample code, leaving it to the user as an exercise to figure out how to apply ffprobe to assess the resolution and bitrate. Yet, in the presence of the source material, assessing bitrate, resolution, codec, etc. is the wrong approach, since the original content will always offer the highest quality. Imagine a video that originates at a 512Kbps bitrate and 200×200 pixels and is transcoded to 10Mbps and resized to 4000×4000 pixels – the latter is much larger along the dimensions Gemini proposes below, but is far poorer quality:

sub get_video_quality {
    my $file = shift;

    # Use tools like 'mediainfo' or 'ffprobe' to extract resolution, bitrate, etc.
    # Calculate a quality score based on your preferred criteria
    # Example placeholder:
    return rand(100);  # Replace with your actual quality calculation
}
  • get_video_quality Function: This function is crucial. You'll need to implement logic to determine what is meant by "quality". Here are common ways using command-line tools:

    • mediainfo or ffprobe: Use these tools to extract data like video resolution (higher is better), bitrate (higher is better), video codec (some are more modern/efficient), etc.

    • Your Logic: Design a calculation of "quality" based on factors that are important to you.
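For readers who actually want to fill in the exercise Gemini leaves open, extracting these fields with ffprobe is straightforward. A minimal sketch, with the ffprobe invocation shown in a comment and a testable parser for its CSV output (the sample values are illustrative assumptions):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ffprobe can emit the fields Gemini alludes to in machine-readable CSV form:
#   ffprobe -v error -select_streams v:0 \
#     -show_entries stream=codec_name,width,height,bit_rate \
#     -of csv=p=0 video.mp4
# which prints a single line such as "h264,1920,1080,5000000".

# Parse one such CSV line into a hash of named fields.
sub parse_ffprobe_csv {
    my ($line) = @_;
    chomp $line;
    my ($codec, $width, $height, $bitrate) = split /,/, $line;
    return { codec => $codec, width => $width,
             height => $height, bitrate => $bitrate };
}

my $info = parse_ffprobe_csv("h264,1920,1080,5000000");
print "$info->{width}x$info->{height}, $info->{bitrate} bps, $info->{codec}\n";
```

Of course, as noted above, none of this metadata is needed when the original file is sitting in the same directory.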

GPT-4 similarly misses that the original video is available. Worse, it fixates on bitrate exclusively, completely ignoring the impact of resolution and codec.

my $ffprobe_cmd = "ffprobe -v error -select_streams v:0 -show_entries stream=bit_rate -of default=noprint_wrappers=1:nokey=1 '$video_dir/$file'";
my $bitrate = `$ffprobe_cmd`;
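For contrast, the correct answer both models missed requires no probing at all: group the files by broadcast and keep the original MPEG-2 file. A minimal sketch, assuming the originals retain a .mpg extension and the transcodes share the original's basename (both assumptions, since the prompt does not fix a naming scheme):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Group filenames by broadcast basename and select the original .mpg for each.
sub pick_originals {
    my (@files) = @_;
    my %original;
    foreach my $file (@files) {
        my ($base, undef, $ext) = fileparse($file, qr/\.[^.]+$/);
        $original{$base} = $file if lc($ext) eq '.mpg';
    }
    return %original;
}

my %best = pick_originals(qw(news1.mpg news1.mp4 news1.webm show2.mpg show2.mp4));
print "$_ -> $best{$_}\n" for sort keys %best;
```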

What if we cheat and explicitly tell the model to find the video closest to the source material?

You have a collection of broadcast videos that originated in MPEG2 / MP2 format. For each video there is the original MPEG2 video and multiple alternative transcoded versions aimed at various devices, all saved in the same directory. Write a simple Perl script that runs on a Linux machine to select for each broadcast the video file from the directory that is the most similar to the original broadcast.

Gemini Ultra fails again. Despite being explicitly told both that the original video file is available and to select the video closest to the original, it fails to recognize that this is a noop (it should just select the original video) and instead outputs an elaborate sequence of pseudocode that requires the user to fill in code to sample each video into frames and then use ImageMagick to perform perceptual similarity comparisons on each frame:

Determining the most similar video to a source broadcast gets quite complex as there's no single metric perfectly replicating visual similarity. Here's a foundational Perl script, along with key factors and areas where advanced techniques would be necessary:

sub calculate_similarity {
    my ($video_url, $source_frame_dir) = @_;

    # 1. Sample frames from the candidate video (needs logic - interval, etc.)
    #    ... use ffmpeg to extract frames

    # 2. Compare each sampled frame against corresponding source frames
    my $total_similarity = 0; 
    my $frame_count = 0;

    foreach my $sample_frame (@sampled_frames) {
        my $best_match_score = 0;
        # Compare against multiple source frames to account for slight timing offsets
        foreach my $source_frame (get_source_frames_around($sample_frame)) { 
            my $score = compare_images($sample_frame, $source_frame);  
            $best_match_score = $score if $score > $best_match_score;
        }
        $total_similarity += $best_match_score;
        $frame_count++;
    }

    return $total_similarity / $frame_count;  # Average similarity
}

sub compare_images {
    my ($image1, $image2) = @_;
    my $image = Image::Magick->new;
    my $result = $image->Compare($image1, $image2);
    my $metric = $result->Get('metric'); 

    # Translate the metric ... lower might mean more similar? 
    # This depends on the metric used by Image::Magick->Compare
    return ...; 
}

Limitations and Refinements Needed

    • Image Comparison is NOT Foolproof: Even perceptually similar images can have differing pixel values. Advanced image similarity techniques (SSIM, feature-based analysis) might be needed.
    • Frame Sampling: Selecting representative frames from the video is crucial.
    • Source Frame Matching: How do you find corresponding frames between the source and the candidate video? Timecodes (if available) may help.
    • 'Good' Similarity: You'll need to experiment with and calibrate the compare_images function to understand what metric values constitute a "good" similarity.
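It is worth noting that the frame-sampling machinery Gemini sketches is unnecessary even for genuine similarity scoring: ffmpeg ships an ssim filter that performs the frame-by-frame comparison against a reference in a single invocation, printing a summary line ending in an overall score such as All:0.985. A minimal sketch that shells out to ffmpeg and parses that score (the filenames are hypothetical; the original file compared against itself would score 1.0):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the overall SSIM score from ffmpeg's summary line, which looks like:
#   [Parsed_ssim_0 @ 0x...] SSIM Y:0.98 (...) U:0.99 (...) V:0.99 (...) All:0.985 (18.2)
sub parse_ssim {
    my ($output) = @_;
    return $output =~ /All:([0-9.]+)/ ? $1 : undef;
}

# Compare a candidate against the reference; ffmpeg writes the SSIM summary
# to stderr, hence the 2>&1 redirect.
sub ssim_score {
    my ($candidate, $reference) = @_;
    my $out = `ffmpeg -i '$candidate' -i '$reference' -lavfi ssim -f null - 2>&1`;
    return parse_ssim($out);
}

my $score = parse_ssim('[Parsed_ssim_0 @ 0x5] SSIM Y:0.98 (17.0) All:0.985 (18.2)');
print "SSIM: $score\n";
```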

GPT-4 refuses to back away from bitrate as its primary metric and, like Gemini Ultra, fails to recognize that the original file is the trivial and best answer:

Selecting the video file that is most similar to the original source broadcast based on file analysis is a complex task that usually requires sophisticated video comparison algorithms. These algorithms often involve frame-by-frame analysis to check for differences in quality, color grading, compression artifacts, and other factors. However, such detailed analysis is beyond the scope of a simple Perl script.

As a more straightforward alternative, we can still use Perl to select the video file with the highest bitrate or resolution, under the assumption that these metrics are rough indicators of fidelity to the original broadcast. This approach isn't perfect, as it doesn't account for all aspects of video quality, but it's a feasible method for a script.

In real-world applications, we typically must balance video quality with file size. How might the two models help us think through the various tradeoffs that must be considered here?

You have a directory of videos. For each video, there are multiple versions of the video saved as .mpg and .mp4 files, transcoded for different devices. Explain step-by-step how to decide which version of which video to use for AI that balances video quality and file size. You want to maximize the video quality while minimizing file size. Find a balance between those two. Explain your reasoning step-by-step and provide a sample Perl script that implements it.

Gemini Ultra fixates on bitrate as its primary quality metric, reducing codec to a single fixed regex boost and resolution to nothing more than a fixed threshold bonus, so its score never normalizes bitrate by codec efficiency or weighs resolution meaningfully. A high-bitrate lower-resolution video will score higher than a high-resolution video with a slightly lower bitrate, and a high-framerate low-resolution video and a low-framerate high-resolution video can have similar bitrates – distinctions its explanation entirely ignores:

Here's a comprehensive approach combining the best aspects of expert analysis and balancing video quality with file size, along with a Perl script to get you started.

Step-by-Step Guide

    1. Identify Key Metrics:

      • Video Resolution: Higher resolution typically means better quality. Common resolutions include 720p (HD), 1080p (Full HD), and 4K (Ultra HD).
      • Bitrate: Higher bitrate leads to better image quality at the expense of larger file size. Bitrate is measured in kilobits per second (kbps) or megabits per second (Mbps).
      • Codec: The compression technology used (H.264/AVC, H.265/HEVC, VP9, etc.). Modern codecs like H.265 tend to offer better compression efficiency (smaller file size at the same quality).
    2. Collect Video Information:

      • Design a method to extract metadata (resolution, bitrate, codec, file size) from videos. You can use tools like ffprobe or integrate libraries like Perl's Media::Info.
    3. Compare Versions:

      • Prioritize Codec: Prioritize video files encoded with a more efficient codec (ideally H.265). If videos exist in both H.264 and H.265 versions, it's likely the H.265 version offers comparable quality at a smaller file size.
      • Resolution vs. Bitrate: Carefully assess resolution trade-offs against bitrate. If two videos have the same codec, slightly lower resolution with a significantly higher bitrate might provide a better overall quality/size balance. The difference might be unnoticeable, except with decreased file size.
    4. Establish Quality Threshold:

      • Determine a minimum acceptable quality for video for your AI task. Is 720p sufficient, or is 1080p necessary? Defining this helps avoid discarding videos exceeding a set resolution without considering other factors.
    5. Heuristic-Based Scoring:

      • Design a scoring system based on metrics, giving importance to:
        • Modern Codecs (H.265, VP9)
        • Bitrate (favoring larger if the same codec and acceptable resolution)
        • Resolution (secondary if acceptable threshold already met)

Sample Perl Script (Illustrative)

sub process_video {
    my $file = $_;
    return unless -f $file and ($file =~ /\.mp4|\.mpg$/i);

    my $info = Media::Info->new($file);
    my $width = $info->get('Video', 'Width') || 0;
    my $height = $info->get('Video', 'Height') || 0;
    my $bitrate = $info->get('General', 'Overall bit rate') || 0;
    my $codec = $info->get('Video', 'Format') || "";

    my $score = calculate_score($width, $height, $bitrate, $codec);

    # Store $file, $score, and other metadata to compare and choose later 
}

sub calculate_score {
    my ($width, $height, $bitrate, $codec) = @_;

    # Assign higher weights to codec and bitrate
    my $score = 0;
    $score += 5 if $codec =~ /H\.265|HEVC|VP9/;
    $score += $bitrate / 1000; # Normalize bitrate

    # Check against quality threshold
    if (($width >= 1280) && ($height >= 720)) {
        $score += 2; 
    }

    return $score;
}
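A more defensible heuristic than raw bitrate would normalize bitrate by resolution and framerate (bits per pixel per frame) and let quality saturate once the bitrate is "enough", so that a high-bitrate thumbnail no longer outranks a well-encoded 1080p stream. A minimal sketch of that idea (the codec-efficiency factors and the 0.1 bpp saturation target are rough illustrative assumptions, not calibrated values):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough codec-efficiency multipliers (keys match ffprobe codec_name values).
my %codec_factor = ( h264 => 1.0, hevc => 1.5, vp9 => 1.4, mpeg2video => 0.6 );
my $TARGET_BPP = 0.1;   # assumed "sufficient" bits per pixel per frame

sub quality_score {
    my ($bitrate, $width, $height, $fps, $codec) = @_;
    return 0 unless $bitrate && $width && $height && $fps;
    my $pixel_rate = $width * $height * $fps;
    # Effective bits per pixel per frame, scaled by codec efficiency.
    my $bpp = ($bitrate / $pixel_rate) * ($codec_factor{lc $codec} // 1.0);
    # Quality is driven by resolution*framerate, capped by bitrate sufficiency.
    my $sufficiency = $bpp >= $TARGET_BPP ? 1 : $bpp / $TARGET_BPP;
    return $pixel_rate * $sufficiency;
}

# A 200x200 clip at 10 Mbps no longer beats a 1080p HEVC clip at 5 Mbps:
printf "thumbnail: %.0f\n", quality_score(10_000_000, 200, 200, 30, 'h264');
printf "1080p:     %.0f\n", quality_score(5_000_000, 1920, 1080, 30, 'hevc');
```

This is still only a heuristic – and, as with the earlier prompts, it is moot whenever the original source file is available.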

GPT-4 fixates on filesize as its primary quality metric, along with a 720p minimum resolution filter, but fails to take anything else into account:

To decide which version of a video to use for an AI application that balances video quality and file size, you need to consider a few key factors. Here's a step-by-step guide to help you make the decision:

1. Understand File Formats

    • .MPG: This is a common format for videos encoded with the MPEG-1 or MPEG-2 standard. It's typically used for videos with lower resolution and has a larger file size compared to modern formats.
    • .MP4: This is a more modern format that uses the MPEG-4 standard. It's better for higher resolution videos and generally offers good quality with smaller file sizes.

2. Analyze Video Quality Parameters

    • Resolution: Higher resolution (e.g., 1080p, 4K) means better quality but also larger file size.
    • Bitrate: This refers to the amount of data processed per unit of time. A higher bitrate usually results in better quality but also increases file size.
    • Frame Rate: Higher frame rates (e.g., 60fps) improve the smoothness of video but can increase file size.

3. Determine AI Application Requirements

    • Understand the specific needs of your AI application:
      • Does it require high resolution for accurate analysis?
      • Can it work effectively with lower frame rates or bitrates?

4. Establish Quality vs. File Size Balance

    • You need to find a compromise between video quality and file size:
      • Choose a resolution that is sufficient for the AI but not excessively high.
      • Opt for a bitrate that provides good quality without unnecessary data.
      • Select a frame rate that is adequate for the video content and AI needs.

5. Implement a Script for Selection

    • Use a script to automate the selection process. A Perl script can be written to:
      • List all the video files in the directory.
      • Analyze each video's parameters (resolution, bitrate, frame rate).
      • Compare these parameters against your pre-determined criteria.
      • Select the video version that best fits the balance between quality and file size.

Sample Perl Script

Here is a basic example of a Perl script that might be used to implement this. Note that you would need to have Perl installed on your system and possibly some additional modules for handling video metadata.

# Define your criteria
my $max_file_size = 500; # Maximum file size in MB
my $min_resolution = '720p'; # Minimum resolution
# ... other criteria ...

foreach my $file (@files) {
    my $info = Video::Info->new(-file=>$file);
    
    # Check file size
    my $size = -s $file; # File size in bytes
    next if $size > $max_file_size * 1024 * 1024;

    # Check resolution
    my ($width, $height) = ($info->width, $info->height);
    next if "$width x $height" lt $min_resolution;

    # ... additional checks ...

    print "Selected file: $file\n";
    # Implement the logic to use the selected file
}
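Note that the quoted script's resolution check cannot even work as written: "$width x $height" lt $min_resolution performs a string comparison between something like "1280 x 720" and "720p", which is meaningless. A working numeric version of the same two filters might look like this (the thresholds are the script's own illustrative values):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric replacements for the quoted script's string-based checks.
my $max_file_size_mb = 500;   # maximum file size in MB
my $min_height       = 720;   # minimum vertical resolution (i.e. 720p)

sub passes_filters {
    my ($size_bytes, $width, $height) = @_;
    return 0 if $size_bytes > $max_file_size_mb * 1024 * 1024;
    return 0 if $height < $min_height;   # compare numbers, not strings
    return 1;
}

print passes_filters(200 * 1024 * 1024, 1920, 1080) ? "keep\n" : "skip\n";
print passes_filters(600 * 1024 * 1024, 1920, 1080) ? "keep\n" : "skip\n";
```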