Most new users of Google's Cloud Storage (GCS) likely focus on its "Standard" offering: a high-performance, massively scalable global object storage fabric. At the same time, GCS offers multiple "storage class" options that provide substantial discounts for less-frequently-accessed data while still offering realtime access without rehydration. Just how cost-effective can these classes be for backup storage? At their cheapest, a petabyte can be backed up for just $1,118 a month, an exabyte for $1.1M a month, and the entire Internet Archive could be backed up for just $1.33M a year.
Let's say your enterprise has 1 exabyte (1,000PB) of data that you want to robustly back up but hope never to need. In the US, for maximal protection you would likely use US Multiregion storage. At the "Standard" class, this would cost $24,214,386.94 per month, plus a one-time replication fee of $18,626,451.49 covering the networking cost of GCS replicating your data across multiple regions for maximal disaster recovery. At the same time, it is unlikely that you will be accessing all 1 exabyte of this data regularly; more likely, the majority of it takes the form of various cold and colder backups. At Nearline class, the cost of our exabyte drops to $13,969,838.62 per month, at Coldline it becomes $6,519,258.02 per month and at Archival it is just $2,235,174.18 per month.
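Where do these numbers come from? GCS bills per GiB-month, so the figures above are simply 10^18 bytes expressed in GiB multiplied by each class's rate. Here is a minimal Python sketch, assuming the published US Multiregion rates at the time of writing (verify against the current GCS price list before relying on them):

```python
# A minimal sketch reproducing the exabyte figures above. GCS bills per
# GiB-month; the rates below are the US Multiregion prices assumed at the
# time of writing -- verify against https://cloud.google.com/storage/pricing.

EXABYTE_GIB = 10**18 / 2**30  # 1 EB (10^18 bytes) expressed in GiB

RATES_USD_PER_GIB_MONTH = {   # assumed US Multiregion rates
    "Standard": 0.026,
    "Nearline": 0.015,
    "Coldline": 0.007,
    "Archival": 0.0024,
}
REPLICATION_USD_PER_GIB = 0.02  # implied one-time multiregion replication fee

for storage_class, rate in RATES_USD_PER_GIB_MONTH.items():
    print(f"{storage_class:>8}: ${EXABYTE_GIB * rate:,.2f}/month")
print(f"One-time replication: ${EXABYTE_GIB * REPLICATION_USD_PER_GIB:,.2f}")
```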
Amazingly, an exabyte of data can be securely stored in the cloud, replicated across multiple physically disparate geographic regions, and kept ready for instantaneous access for just $2.2M a month. Accessing this data incurs a per-GB "retrieval cost," so it is only suitable for rarely-accessed data, and achieving the greatest discount requires a commitment to store each file for a minimum number of days. But at the end of the day, the data requires no rehydration process: it is available instantaneously.
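To make that tradeoff concrete, here is a small sketch of what a restore and an early deletion might cost, assuming the $0.05/GiB Archival retrieval rate and 365-day minimum storage duration published at the time of writing:

```python
# A sketch of retrieval and early-deletion costs for the Archival class,
# assuming the $0.05/GiB retrieval rate, $0.0024/GiB-month storage rate and
# 365-day minimum storage duration published at the time of writing.

RETRIEVAL_USD_PER_GIB = 0.05
STORAGE_USD_PER_GIB_MONTH = 0.0024
MIN_STORAGE_DAYS = 365

# Restoring 10 TiB from an Archival backup:
restore_gib = 10 * 1024
print(f"Retrieval charge: ${restore_gib * RETRIEVAL_USD_PER_GIB:,.2f}")  # $512.00

# Deleting 1 TiB after only 100 days still bills (approximately) the
# remainder of the 365-day minimum storage duration:
remaining_months = (MIN_STORAGE_DAYS - 100) / 30
early_fee = 1024 * STORAGE_USD_PER_GIB_MONTH * remaining_months
print(f"Early-deletion charge: ${early_fee:,.2f}")  # ~$21.71
```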
Even better, what if we already have one backup of our data and are merely looking for a second, redundant backup? In that case, we could use Regional Archival class storage, which stores our data in a single GCP region rather than replicating it across several. Our cost to store one full exabyte of data then becomes just $1,117,587.09 per month, with no replication fee, since the data isn't being copied to other regions!
For just $1.1M a month, we can store an entire exabyte in the cloud, ready for instantaneous access.
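The same per-GiB arithmetic, assuming the quoted $0.0012/GiB-month Regional Archival rate, reproduces that figure:

```python
# Single-region Archival for 1 EB, assuming the quoted $0.0012/GiB-month rate.
EXABYTE_GIB = 10**18 / 2**30
print(f"${EXABYTE_GIB * 0.0012:,.2f}/month")  # -> $1,117,587.09, no replication fee
```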
Not every company has an exabyte of data to store. What about the more common use case of 1 petabyte? Storing 1PB for realtime access at the Standard Multiregion class costs $24,214.39 a month, plus a one-time replication fee of $18,626.45. Notably, for just over $24,000 a month, that full petabyte is available at maximal performance for realtime compute and access across the company's entire GCP project. Most importantly, if the company wants to make that petabyte, or portions of it, available to others, it can simply set the permissions on each file to make it publicly available via a fully-managed HTTPS endpoint (no web servers required). What if the data is only for backup? Storing a petabyte at Nearline class costs just $13,969.84 per month, while Coldline drops that to $6,519.26 and Archival reduces it all the way to $2,235.17 per month. Incredibly, storing an entire petabyte at the Regional Archival level costs just $1,117.59 a month, with no one-time replication fee!
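As an illustration of that public-access point, here is a minimal sketch using the google-cloud-storage Python client (the bucket and object names are hypothetical); once marked public, the object is served directly from a storage.googleapis.com URL with no web servers involved:

```python
# A sketch of making one object publicly readable with the
# google-cloud-storage client. Bucket and object names are hypothetical.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-backup-bucket")   # hypothetical bucket
blob = bucket.blob("datasets/2021/summary.csv")   # hypothetical object

blob.make_public()  # grants allUsers read access to this one object
print(blob.public_url)
# -> https://storage.googleapis.com/example-backup-bucket/datasets/2021/summary.csv
```

(If the bucket uses uniform bucket-level access, per-object ACL calls like make_public() are disabled; public access is instead granted via an IAM binding on the bucket.)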
To put these costs in perspective, the entire Internet Archive's 99PB of data (as of December 2021) could be stored for realtime global access across multiple regions for $2,397,224.31 a month, or for $221,282.24 a month at the Archival Multiregion class. If the Archive wanted to store a single backup in a single region at Archival class, it would cost just $110,641.12 a month, or around $1.33M a year.
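Checking those figures with the same assumed rates:

```python
# Checking the Internet Archive figures with the same assumed rates.
ARCHIVE_GIB = 99 * 10**15 / 2**30  # 99 PB expressed in GiB

print(f"Standard Multiregion: ${ARCHIVE_GIB * 0.026:,.2f}/month")   # $2,397,224.31
print(f"Archival Multiregion: ${ARCHIVE_GIB * 0.0024:,.2f}/month")  # $221,282.24
yearly = ARCHIVE_GIB * 0.0012 * 12
print(f"Archival Regional:    ${ARCHIVE_GIB * 0.0012:,.2f}/month (${yearly:,.2f}/year)")
```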