GCP Tips & Tricks: The Networking That Supports GCS As A Global Storage Fabric For GCE

The modern public cloud makes it possible to build astonishingly scalable, high-performance, globally distributed computing architectures. Rather than relying on a traditional cluster interconnect architecture, GDELT leverages GCP's networking infrastructure to place GCS at the center of its global architecture. A single central set of multiregion buckets serves as its global storage backing, and all of its GCE VMs are arrayed around this central fabric. Even our global crawler fleets stream data directly to GCS, in parallel with their bespoke interconnects.

Because it is so trivially simple to set up a GCS bucket and read from and write to it from GCE VMs, we often don't think about the incredible bandwidth that underpins modern HPC and distributed architectures. Within GDELT, we typically see between half a petabyte and multiple petabytes of data exchanged between VMs and a single GCS bucket (internal traffic only, not counting external traffic to the outside world). That volume captures just how transparent and seamless, yet extraordinarily powerful, the GCS-GCE connection is, and the kinds of architectures it makes possible.