As we head towards the launch of GDELT 5, we've been hard at work upgrading our GCE VMs to the latest GCP capabilities. One of the first and most important decisions when creating a GCE VM is choosing its CPU platform. Basic CPU characteristics like supported features, clock speed, memory architecture, and cost are always part of the selection equation, but characteristics alone simply cannot predict how a given CPU will perform in real life for a given workload, so benchmarking remains the gold standard. At the same time, it simply isn't cost-effective to benchmark every single one of the hundreds of possible CPU platform + shape + storage + context + workload configurations for each workload. To explore how Gemini might be able to assist in CPU selection, we've been asking Gemini to recommend CPU platforms and shapes for each workload we've been testing and then comparing its recommendations against our actual benchmarks, with pretty incredible results. Amazingly, despite basing its recommendations solely on theoretical performance (we didn't provide it agentic access to real VMs to run benchmarks itself), Gemini with Search Grounding was remarkably adept at tracking down CPU characteristics and benchmarks from across the web relating to the actual software used by the described workload, and combining them to generate recommendations that have thus far perfectly matched our actual benchmarks. Let's look at one such example: using FFMPEG to process MP4 video files, extracting 1 frame per second into RAM, where the frames are then tar'd up and streamed to GCS, and see how Gemini's recommendations match against our actual benchmarking.
Here is the prompt we used with Gemini 3.1 Pro with Search Grounding. We first asked Gemini to compare all of GCE's various CPU families and it recommended the N4 family, but it didn't clarify whether N4 or N4D was the better choice, so we asked it:
I have a GCE VM that will be running at 100% CPU saturation, using FFMPEG to process MP4 videos into 1fps frames written into /dev/shm RAM. Compare the N4 vs N4D CPU families: which is the ideal for this scenario?
For your highly specific workload—saturating a CPU with software-based FFmpeg decoding and writing uncompressed frames directly to a RAM disk (/dev/shm)—the AMD-based N4D CPU family is the clear, ideal choice over the Intel-based N4 family[1][2][3][4].
Because you are outputting to a RAM disk, storage I/O bottlenecks are entirely eliminated[2]. Your throughput will be dictated 100% by pure CPU compute power (clock speeds, IPC), SIMD instruction execution (AVX-512), and DDR5 memory bandwidth[1][5].
Here is how the N4 (Intel) and N4D (AMD) compare across the metrics that govern your workload:
1. Raw CPU Compute and Clock Speeds (Winner: N4D)
Because server-grade compute instances do not include hardware media decoders (like Intel QuickSync), your FFmpeg workflow relies entirely on brute-force software CPU decoding[2].
- N4D (AMD EPYC "Turin" / Zen 5): Reaches maximum boost clocks of up to 4.1 GHz[5]. The Zen 5 architecture boasts massive Instructions-Per-Clock (IPC) improvements, meaning it does significantly more work per cycle[2].
- N4 (Intel "Emerald Rapids" / 5th Gen Xeon): Features a sustained all-core turbo frequency of 2.9 GHz[5].
The raw frequency advantage alone gives the N4D a substantial edge in pushing maximum frames per second (FPS) through FFmpeg[5][6][7].
2. SIMD & AVX-512 Instruction Sets (Winner: N4D)
FFmpeg decoding relies heavily on vector math, specifically utilizing AVX2 and AVX-512 instructions[6][8].
- Historically, Intel maintained a wide lead in FFmpeg processing because older AMD chips either lacked AVX-512 or had to double-pump 256-bit instructions to process them.
- However, the AMD Turin CPUs in the N4D family feature native, full-width AVX-512 data paths[2][7][8]. Combined with the massive clock speed advantage, recent benchmarks demonstrate that AMD Turin instances can achieve up to a 1.4x+ performance uplift in FFmpeg encoding/decoding throughput over competing equivalent Intel Xeon instances[6][8].
3. Memory Bandwidth for /dev/shm (Tie / Slight N4D Edge)
Writing uncompressed bitmaps or raw video frames to /dev/shm (RAM) requires immense memory bandwidth to prevent the CPU from stalling while waiting for RAM write operations[2][7].
- Both N4 and N4D instances are built on state-of-the-art DDR5 memory architecture[1][5].
- Both families also run on Google Cloud's Titanium offload system, meaning network and hypervisor overhead is moved to a custom SmartNIC, dedicating 100% of the CPU's memory lanes to your FFmpeg container/application[2].
- However, AMD's Turin multi-chiplet design generally boasts slightly superior aggregate memory bandwidth per socket[7], which scales exceptionally well when all vCPUs are pinned at 100% saturation.
4. Price-to-Performance Unit Economics (Winner: N4D)
If you are running instances at 100% saturation, you want the absolute highest "extracted frames per dollar"[2].
On Google Cloud, the N4D instance pricing is typically about 5% to 10% cheaper per vCPU/hour than the Intel N4 equivalent[9]. Because the N4D processes more frames per second due to the higher clock speed and Zen 5 architecture, while simultaneously costing slightly less per hour, your unit economics (Cost per Video Processed) will be drastically lower[2][7][9].
Summary Recommendation
Spin up N4D instances (e.g., n4d-highcpu-* if you don't need much RAM capacity, as /dev/shm will consume your available memory)[1][3]. The combination of a 4.1 GHz max boost, native AVX-512, and lower hourly pricing makes the AMD EPYC Turin architecture the undisputed champion for highly parallelized, pure-CPU FFmpeg workloads[5][7][9].
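Gemini's AVX-512 claim is easy to spot-check on a live VM. A quick probe, assuming a standard Linux guest (output will vary by machine; empty output means no AVX-512):

```shell
# List any AVX-512 extensions the guest CPU exposes to the VM.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep '^avx512' | sort -u
```

On an N4D (AMD Turin) instance this should print entries such as avx512f and avx512vl, while their absence would indicate the instructions are not exposed to the guest.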
Gemini selected N4D as the clear winner for this scenario. To compare them in real life, we spun up two VMs (test-n4 and test-n4d) in us-east4-c connected to a GCS bucket in the US multiregion. Both were the highmem-16 shape (16 vCPUs + 128GB RAM) with 100GB Hyperdisks. The N4 was Intel Emerald Rapids and the N4D was AMD Turin. The N4 instance cost $712/month and the N4D cost $666/month. You can see their precise configurations below:
#N4D-TEST >lscpu
Vendor ID:            AuthenticAMD
Model name:           AMD EPYC 9B45
CPU family:           26
Model:                2
Thread(s) per core:   2
Core(s) per socket:   8
Socket(s):            1
Stepping:             1
BogoMIPS:             5391.89
Caches (sum of all):
  L1d:                384 KiB (8 instances)
  L1i:                256 KiB (8 instances)
  L2:                 8 MiB (8 instances)
  L3:                 32 MiB (1 instance)

#N4-TEST >lscpu
Vendor ID:            GenuineIntel
Model name:           INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz
CPU family:           6
Model:                207
Thread(s) per core:   2
Core(s) per socket:   8
Socket(s):            1
Stepping:             2
BogoMIPS:             4200.00
Caches (sum of all):
  L1d:                384 KiB (8 instances)
  L1i:                256 KiB (8 instances)
  L2:                 16 MiB (8 instances)
  L3:                 260 MiB (1 instance)
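For reference, spinning up a comparable pair is a one-liner each with gcloud. The machine types and zone below match the test VMs above; the boot disk settings are illustrative defaults (N4/N4D machine series require Hyperdisk):

```shell
# Illustrative sketch: create the two test VMs used above.
gcloud compute instances create test-n4d \
    --zone=us-east4-c --machine-type=n4d-highmem-16 \
    --boot-disk-type=hyperdisk-balanced --boot-disk-size=100GB

gcloud compute instances create test-n4 \
    --zone=us-east4-c --machine-type=n4-highmem-16 \
    --boot-disk-type=hyperdisk-balanced --boot-disk-size=100GB
```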
Let's benchmark running at half capacity: four FFMPEG processes running with 2 threads each, streaming from the same GCS MP4:
rm -rf /dev/shm/FRAMES1; mkdir /dev/shm/FRAMES1/; time /GDELT/GEN5/Core/GCSStreamUtil/gcs_streamutil --dir FROMGCS --chunk-size-mb=32 --gcsfile=gs://mybucket/myvideo.mp4 | ffmpeg -nostdin -threads 2 -hide_banner -loglevel panic -i - -vf "fps=1,scale=iw*sar:ih" -q:v 6 -start_number 0 -t 3660 "/dev/shm/FRAMES1/%06d.jpg" &
rm -rf /dev/shm/FRAMES2; mkdir /dev/shm/FRAMES2/; time /GDELT/GEN5/Core/GCSStreamUtil/gcs_streamutil --dir FROMGCS --chunk-size-mb=32 --gcsfile=gs://mybucket/myvideo.mp4 | ffmpeg -nostdin -threads 2 -hide_banner -loglevel panic -i - -vf "fps=1,scale=iw*sar:ih" -q:v 6 -start_number 0 -t 3660 "/dev/shm/FRAMES2/%06d.jpg" &
rm -rf /dev/shm/FRAMES3; mkdir /dev/shm/FRAMES3/; time /GDELT/GEN5/Core/GCSStreamUtil/gcs_streamutil --dir FROMGCS --chunk-size-mb=32 --gcsfile=gs://mybucket/myvideo.mp4 | ffmpeg -nostdin -threads 2 -hide_banner -loglevel panic -i - -vf "fps=1,scale=iw*sar:ih" -q:v 6 -start_number 0 -t 3660 "/dev/shm/FRAMES3/%06d.jpg" &
rm -rf /dev/shm/FRAMES4; mkdir /dev/shm/FRAMES4/; time /GDELT/GEN5/Core/GCSStreamUtil/gcs_streamutil --dir FROMGCS --chunk-size-mb=32 --gcsfile=gs://mybucket/myvideo.mp4 | ffmpeg -nostdin -threads 2 -hide_banner -loglevel panic -i - -vf "fps=1,scale=iw*sar:ih" -q:v 6 -start_number 0 -t 3660 "/dev/shm/FRAMES4/%06d.jpg" &
You can see the final results below, with N4D nearly a full minute faster than N4 (about 27% less wall time, or equivalently roughly 1.37x the throughput):
#N4D
real 2m33.442s  user 4m40.387s  sys 0m6.675s
real 2m34.110s  user 4m40.984s  sys 0m6.520s
real 2m34.127s  user 4m41.055s  sys 0m6.967s
real 2m34.179s  user 4m40.988s  sys 0m6.640s

#N4
real 3m30.373s  user 6m26.522s  sys 0m15.104s
real 3m30.784s  user 6m27.173s  sys 0m15.024s
real 3m30.863s  user 6m26.092s  sys 0m16.125s
real 3m30.887s  user 6m27.538s  sys 0m15.079s
We get the same results if we copy the video to /dev/shm RAM first and run from there, demonstrating that the streaming ingest does not impact the results. Running eight processes in parallel, each with 2 threads, to fully saturate the VM yielded:
#N4D
real 3m27.185s  user 5m48.836s  sys 0m12.163s

#N4
real 4m24.609s  user 7m20.361s  sys 0m21.795s
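For the RAM-first variant, the streaming ingest is simply replaced by a local copy. A rough sketch (bucket and filename are placeholders, as above):

```shell
# Stage the source MP4 in RAM first, then decode it locally (no streaming ingest).
gcloud storage cp gs://mybucket/myvideo.mp4 /dev/shm/myvideo.mp4
rm -rf /dev/shm/FRAMES1; mkdir /dev/shm/FRAMES1/
time ffmpeg -nostdin -threads 2 -hide_banner -loglevel panic \
    -i /dev/shm/myvideo.mp4 -vf "fps=1,scale=iw*sar:ih" -q:v 6 \
    -start_number 0 -t 3660 "/dev/shm/FRAMES1/%06d.jpg" &
```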
And what if we run 16 processes, each with two threads, designed to doubly oversubscribe the VM?
#N4D
real 6m30.141s  user 6m17.178s  sys 0m13.095s

#N4
real 7m50.113s  user 7m31.656s  sys 0m18.466s
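Across all three runs the gap is consistent, though it narrows as the VM becomes oversubscribed. A quick sanity check of the speedups from the measured wall-clock times above (seconds, rounded to one decimal):

```shell
# Wall-clock seconds from the runs above, as (N4D, N4) pairs per configuration.
awk 'BEGIN {
  split("153.4 210.4 207.2 264.6 390.1 470.1", t)
  labels[1] = "4 procs"; labels[2] = "8 procs"; labels[3] = "16 procs"
  for (i = 1; i <= 3; i++) {
    n4d = t[2*i-1]; n4 = t[2*i]
    printf "%s: N4D %.2fx faster (%.0f%% less wall time)\n", \
           labels[i], n4/n4d, (1 - n4d/n4) * 100
  }
}'
```

This prints a 1.37x, 1.28x, and 1.21x advantage for N4D at 4, 8, and 16 processes respectively.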