We've run Elasticsearch clusters on GCP for almost a decade, across many different iterations of hardware and cluster configurations. Yesterday we explored the hardware Elastic uses to deploy Elasticsearch in its own managed Elastic Cloud offering on GCP. How do those configurations compare with what we've run internally and with the lessons we've learned over the past decade?
Elastic's managed Elastic Cloud offering is essentially a multitenant virtualized environment layered on top of the cloud's own multitenant hardware environment. For hot data nodes, Elastic primarily relies on N2 VMs with 10, 16 or 32 vCPUs, 68GB of RAM (64GB for Elasticsearch, half of which goes to the JVM heap, plus 4GB of overhead for utilities) and either 8 or 16 locally attached Local SSD disks. User deployments are sized as multiples of 1GB of RAM, with a corresponding allotment of storage, and multiple deployments are colocated on a single VM. These larger shared VMs make the fleet easier for Elastic to manage and maintain than a vastly larger fleet of individual VMs sized precisely to each user would be.
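To make that memory split concrete, here's a small sketch of how a 68GB hot node's RAM could be apportioned across tenant deployments. The 68GB/64GB/4GB split and the half-to-heap rule come from the figures above; the packing logic and the specific deployment sizes are purely illustrative assumptions, not Elastic's actual scheduler.

```python
# Illustrative carve-up of a 68GB Elastic Cloud hot node (assumptions noted above).
NODE_RAM_GB = 68           # total RAM on the N2 host VM
UTILITY_OVERHEAD_GB = 4    # reserved for host utilities
USABLE_RAM_GB = NODE_RAM_GB - UTILITY_OVERHEAD_GB   # 64GB left for Elasticsearch tenants

def jvm_heap_gb(deployment_ram_gb: int) -> float:
    """Half of a deployment's RAM goes to the JVM heap; the rest is left to the OS page cache."""
    return deployment_ram_gb / 2

# Hypothetical tenant deployments, each sized as a multiple of 1GB of RAM.
deployments = [1, 2, 4, 8, 16, 32]
assert sum(deployments) <= USABLE_RAM_GB, "tenants must fit within the node's 64GB"

for ram in deployments:
    heap = jvm_heap_gb(ram)
    print(f"{ram:>2}GB deployment -> {heap:>4.1f}GB JVM heap, {ram - heap:>4.1f}GB page cache")
print(f"Unallocated capacity on this node: {USABLE_RAM_GB - sum(deployments)}GB")
```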
In contrast, we have historically deployed fleets of 2-vCPU N1 VMs as our base node configuration, first with 8GB of RAM in our earliest incarnations when larger memory sizes were less common, then 16GB, and in recent years 32GB of RAM per node. Zonal SSD Persistent Disk offers 30 IOPS and 0.48MB/s of throughput per GB provisioned, while N1 and N2 VMs with 2-7 vCPUs have a fixed cap of 15,000 IOPS and 240MB/s of throughput. Putting these together, peak IOPS is reached at 15,000 / 30 = 500GB of SSD, and peak throughput at 240 / 0.48 = 500GB as well. Thus an N1 or N2 VM achieves peak PD SSD performance with a 500GB disk: anything larger yields additional storage but no additional IO performance, meaning that per-GB IOPS and throughput decline past 500GB.
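Replaying that arithmetic in a quick sketch (the per-GB rates and per-VM caps are the GCP figures quoted above):

```python
# Where PD SSD performance tops out on an N1/N2 VM with 2-7 vCPUs.
PD_SSD_IOPS_PER_GB = 30       # zonal SSD Persistent Disk, per GB provisioned
PD_SSD_MBPS_PER_GB = 0.48
VM_IOPS_CAP = 15_000          # per-VM cap for 2-7 vCPUs
VM_MBPS_CAP = 240

iops_saturation_gb = VM_IOPS_CAP / PD_SSD_IOPS_PER_GB        # 500GB
throughput_saturation_gb = VM_MBPS_CAP / PD_SSD_MBPS_PER_GB  # 500GB

print(f"IOPS cap reached at {iops_saturation_gb:.0f}GB of PD SSD")
print(f"Throughput cap reached at {throughput_saturation_gb:.0f}GB of PD SSD")
# Both caps land at 500GB: a larger disk adds capacity but no additional IO.
```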
Thus, our recent-generation fleets have consisted of 2-vCPU N1 VMs, each with 32GB of RAM and a 500GB PD SSD, alongside a few storage-dense auxiliary fleets that use the same compute hardware but 2TB PD SSDs for greater storage capacity at reduced per-GB IO performance.
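To make the per-GB tradeoff of those 2TB auxiliary nodes concrete, here is the same arithmetic applied to both disk sizes:

```python
# Per-GB IO available at our two deployed disk sizes, given the PD SSD rates
# and the 15,000 IOPS / 240MB/s per-VM caps quoted above.
IOPS_PER_GB, MBPS_PER_GB = 30, 0.48
VM_IOPS_CAP, VM_MBPS_CAP = 15_000, 240

for size_gb in (500, 2000):
    iops = min(size_gb * IOPS_PER_GB, VM_IOPS_CAP)
    mbps = min(size_gb * MBPS_PER_GB, VM_MBPS_CAP)
    print(f"{size_gb:>4}GB PD SSD: {iops / size_gb:>4.1f} IOPS/GB, {mbps / size_gb:.3f} MB/s per GB")
# 500GB -> 30.0 IOPS/GB and 0.480 MB/s per GB; 2TB -> 7.5 IOPS/GB and 0.120 MB/s per GB:
# four times the capacity at one quarter of the per-GB IO.
```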
Looking at our own cluster performance, thanks to our highly optimized index structure and hardware allocation, we see instantaneous peaks of 240MB/s of read throughput, but our steady-state throughput is just a small fraction of that. Read IOPS peak at the 15,000 IOPS cap, with instantaneous bursts as high as 20,000 IOPS, but again our steady-state IOPS load is just a fraction of this. In fact, our peak IO load occurs only during fixed management windows each day, when a set of bespoke monitoring systems scans the complete cluster membership of each Elasticsearch fleet to perform a variety of health, statistical and corrective management tasks. These tasks include offline statistical calculations used to synchronize and verify a number of our front-line caches, which deflect an enormous percentage of our steady-state query load away from the underlying Elasticsearch fleets at the cost of just minutes of reduced recency. Our steady-state IO and CPU load are actually fairly low, offering the potential to increase storage density across our fleet rather than resort to more exotic block storage arrangements.
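As an illustration of the recency-for-load tradeoff those caches make, here is a minimal read-through cache sketch. The TTL, keying scheme and placeholder query function are assumptions for illustration only; our production caches are separate systems and considerably more involved.

```python
import time
from typing import Any, Callable, Dict, Tuple

class ReadThroughCache:
    """Serve repeated queries from memory for a few minutes, deflecting load
    away from the Elasticsearch fleet at the cost of slightly stale results."""

    def __init__(self, run_query: Callable[[str], Any], ttl_seconds: float = 300):
        self._run_query = run_query                      # hits the Elasticsearch fleet
        self._ttl = ttl_seconds                          # the "minutes of reduced recency"
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, query: str) -> Any:
        now = time.monotonic()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                                # deflected: no fleet IO
        result = self._run_query(query)                  # miss: query the fleet
        self._entries[query] = (now, result)
        return result

# Placeholder standing in for a real Elasticsearch query.
def expensive_search(query: str) -> dict:
    return {"query": query, "hits": []}

cache = ReadThroughCache(expensive_search, ttl_seconds=300)
cache.get("status:error")   # miss -> hits the fleet
cache.get("status:error")   # hit  -> served from cache, up to 5 minutes stale
```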