One of the most critical questions in supporting IO-centric workloads is determining the right mix of storage devices. For some of our CPU- and GPU-intensive workloads like video analysis and transcoding, the compute side is slow enough that we simply live-stream the MPEG2 and MP4 video files directly from GCS to the processing engine and live-stream the results directly back to GCS, without using any local disk. For the most IOPS-demanding workloads we use RAM disk, and for high-IOPS workloads we use Local SSD.
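As a concrete illustration of this diskless streaming pattern, here is a minimal sketch using the google-cloud-storage Python client. The bucket names, object names, and ffmpeg flags are hypothetical placeholders, not details from our actual pipeline:

```python
import shutil
import subprocess
import threading
from google.cloud import storage

client = storage.Client()
src = client.bucket("my-input-bucket").blob("video.mpg")   # hypothetical names
dst = client.bucket("my-output-bucket").blob("video.mp4")  # hypothetical names

# Transcode entirely through pipes: GCS -> ffmpeg stdin, ffmpeg stdout -> GCS.
proc = subprocess.Popen(
    ["ffmpeg", "-i", "pipe:0",                 # read the source from stdin
     "-movflags", "frag_keyframe+empty_moov",  # fragmented MP4 so output can stream
     "-f", "mp4", "pipe:1"],                   # write the result to stdout
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

# Feed GCS -> ffmpeg on a helper thread so reading ffmpeg's output on the
# main thread cannot deadlock against a full stdin pipe.
def feed():
    with src.open("rb") as reader:
        shutil.copyfileobj(reader, proc.stdin)
    proc.stdin.close()

threading.Thread(target=feed, daemon=True).start()

# Stream ffmpeg -> GCS in chunks; no local file is ever written.
with dst.open("wb") as writer:
    shutil.copyfileobj(proc.stdout, writer)
proc.wait()
```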
Yet what about workloads that require random seeking and persistence across instance reboots and crashes? For these use cases, GCP offers two very high performance block storage options: Extreme Persistent Disks and Hyperdisks. However, both disk types achieve their extreme performance levels (120,000 IOPS with 4GB/s read / 3GB/s write throughput for Extreme PD and 350,000 IOPS with 5GB/s read/write throughput for Hyperdisk) only on specific machine types at very high core counts (64+ vCPUs on N2 for Extreme PD and 176 vCPUs on C3 for Hyperdisk), with smaller systems either being unsupported or falling back to ordinary pd-ssd performance levels. For distributed applications like ElasticSearch, this means that an equivalent cluster of 2-vCPU N2 instances will yield vastly higher performance, topping out at 480,000 IOPS and 7.68GB/s for the Extreme PD equivalent and 1.32 million IOPS and 21.12GB/s for the Hyperdisk equivalent. While these numbers are completely unsurprising and lie at the heart of the cluster-vs-big-iron debate in application design, it is important to remember that even the highest-performance large disks cannot come close to the combined performance of many small disks. Rather than rush directly to the highest-performance disk type, developers of distributed applications that can shard across many small disks should examine whether a more distributed architecture might actually achieve much higher disk performance.
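To make the break-even point concrete before walking through the details below, here is a back-of-the-envelope sketch using the per-node pd-ssd figures cited later in this section (15K IOPS and 240MB/s for a 2-vCPU N2); the node counts are simple ceiling divisions, not measured results:

```python
import math

# Per-node pd-ssd limits for a 2-vCPU N2, as cited below.
NODE_IOPS, NODE_MBPS = 15_000, 240

def nodes_to_match(target_iops: int, target_mbps: int) -> int:
    """Smallest cluster of 2-vCPU N2 nodes whose combined pd-ssd limits
    meet both targets (assumes even sharding, no coordination overhead)."""
    return max(math.ceil(target_iops / NODE_IOPS),
               math.ceil(target_mbps / NODE_MBPS))

print(nodes_to_match(120_000, 4_000))   # 17 nodes match a maxed-out Extreme PD
print(nodes_to_match(500_000, 10_000))  # 42 nodes match the dual-Hyperdisk C3
```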
Extreme Persistent Disks act like ordinary persistent disk storage, but are supported only on M2 (208 or 416 vCPUs), M3, and N2 (64 or 80 vCPUs on Cascade Lake, 64+ on Ice Lake) machine types; on all other VMs they fall back to standard SSD Persistent Disk (pd-ssd) performance (or the provisioned IOPS, if lower). They achieve their highest performance on N2 Ice Lake systems with 64+ vCPUs, maxing out at 120K IOPS and 4GB/s read or 3GB/s write throughput.
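The support matrix and its fallback rule reduce to a simple function; the sketch below is an illustrative model of the behavior just described, not a GCP API, and the parameter names are invented for the example:

```python
def effective_pd_extreme_iops(family: str, vcpus: int, cpu_platform: str,
                              provisioned_iops: int, pd_ssd_iops: int) -> int:
    """Illustrative model (not a GCP API) of the pd-extreme fallback rule:
    supported shapes deliver the provisioned IOPS; everything else gets
    pd-ssd performance, or the provisioned IOPS if that is lower."""
    supported = (
        (family == "m2" and vcpus in (208, 416))
        or family == "m3"
        or (family == "n2" and cpu_platform == "cascade-lake" and vcpus in (64, 80))
        or (family == "n2" and cpu_platform == "ice-lake" and vcpus >= 64)
    )
    if supported:
        return provisioned_iops
    return min(pd_ssd_iops, provisioned_iops)
```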
What about applications, such as search, in which files need not live on a single large system but can be sharded across many smaller ones? A standard SSD Persistent Disk (pd-ssd) achieves 15K IOPS and 240MB/s read or write throughput on a 2-vCPU N2 system. Recall that a single 64-vCPU N2 system attached to a pd-extreme disk achieves 120K IOPS and 4GB/s reads or 3GB/s writes. By contrast, 32 2-vCPU N2 systems together total 480,000 IOPS and 7.68GB/s read or write throughput. Critically, while pd-extreme maxes out at the numbers above, an N2 cluster can start much smaller and continue scaling upwards effectively linearly. An ElasticSearch cluster with a large, evenly-sharded dataset will therefore benefit far more from many small disks than from a single large disk.
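Spelling out the aggregate arithmetic, under the same assumption of an evenly sharded workload with no coordination overhead:

```python
# Aggregate limits for a cluster of 2-vCPU N2 nodes on pd-ssd,
# using the per-node figures above (15K IOPS, 240 MB/s).
NODE_IOPS, NODE_MBPS = 15_000, 240

def cluster_limits(nodes: int) -> tuple[int, float]:
    """Combined IOPS and GB/s for an evenly sharded workload."""
    return nodes * NODE_IOPS, nodes * NODE_MBPS / 1000

print(cluster_limits(32))  # (480000, 7.68): 4x a pd-extreme's 120K IOPS
print(cluster_limits(88))  # (1320000, 21.12): the Hyperdisk comparison below
```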
Hyperdisks come in Balanced, Throughput, and Extreme offerings, topping out at 500,000 IOPS and 10GB/s read/write throughput on a C3 with 176 vCPUs. However, since a single Extreme volume maxes out at 350,000 IOPS and 5GB/s, reaching that ceiling requires attaching two Extreme volumes to the same VM and sharding across both. The equivalent 88-node cluster of 2-vCPU N2 instances offers an astonishing 1.32 million IOPS and 21.12GB/s read/write throughput. As before, an ElasticSearch cluster with a large dataset evenly sharded across the 88 nodes would offer vastly higher performance (not taking into account the scaling overhead of so many nodes and their inter-instance communication).
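The two-volume requirement follows directly from the per-volume and per-VM ceilings quoted above; a small model of that interaction, assuming data is sharded evenly across the attached volumes:

```python
# Per-volume and per-VM ceilings for Hyperdisk Extreme on a 176-vCPU C3,
# from the figures above.
VOLUME_IOPS, VOLUME_GBPS = 350_000, 5
VM_IOPS, VM_GBPS = 500_000, 10

def attached_limits(n_volumes: int) -> tuple[int, int]:
    """Effective limits with data sharded evenly across n volumes,
    capped by the per-VM ceiling."""
    return (min(n_volumes * VOLUME_IOPS, VM_IOPS),
            min(n_volumes * VOLUME_GBPS, VM_GBPS))

print(attached_limits(1))  # (350000, 5):  one volume cannot saturate the VM
print(attached_limits(2))  # (500000, 10): two volumes hit the VM ceiling
```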