Behind The Scenes: API Quotas & The Impact Of A Fraction Of A QPS

All hosted APIs enforce rate-limited quotas of some form, both to protect them from abuse and to share their underlying resources fairly among all users. While these quotas can typically be increased, every system has a hard upper bound set by its capacity. For the largest-scale projects, a key design goal is to run as close to this quota as possible without ever exceeding it. Sustaining stable throughput at almost exactly the quota level at archive scale requires exceptionally precise global coordination across fleets of machines and geographically distributed clusters, so that the total QPS summed across the entire fleet sits just a fraction below the absolute maximum.

At that margin, the difference between a surge of 429 "quota exceeded" errors and perfectly steady processing can be a fraction of a QPS in the third decimal place. For example, below is what happened when the fleet-wide QPS was increased by just 0.001. The change is effectively imperceptible in the top traffic graph, yet it causes the 429 error rate to surge to 5%: a stark reminder that at archive scale, a rate change measured to three decimal places can mean the difference between a stable system and one that exceeds its maximum quota.
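As a rough illustration of the coordination problem, here is a minimal sketch, assuming a fleet that splits a single global quota evenly across a fixed number of workers, reserves a small absolute headroom below the hard limit, and paces each worker with a local token bucket. The names `per_worker_rate`, `Pacer` and `headroom_qps` are hypothetical and do not come from any particular API or from the system described here; a real deployment also has to handle workers joining and leaving, clock skew and retries, all of which this sketch ignores.

```python
import time


def per_worker_rate(global_quota_qps: float, num_workers: int, headroom_qps: float) -> float:
    """Hypothetical helper: split a fleet-wide quota evenly after reserving headroom.

    The fleet-wide target is (quota - headroom); each worker gets an equal share.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    target_qps = global_quota_qps - headroom_qps
    if target_qps <= 0:
        raise ValueError("headroom must be smaller than the quota")
    return target_qps / num_workers


class Pacer:
    """Hypothetical per-worker token bucket: refills at rate_qps, holds at most `burst` tokens."""

    def __init__(self, rate_qps: float, burst: float = 1.0, clock=time.monotonic):
        self.rate = rate_qps
        self.capacity = burst
        self.tokens = burst  # start full so the first request can go immediately
        self.clock = clock   # injectable clock, e.g. a fake clock for testing
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the elapsed time, capped at the burst capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Return True and spend one token if a request may be sent now."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Because every worker's share is derived from the same fleet-wide target, nudging that target changes the aggregate rate by exactly the same amount, however many workers there are. A small burst capacity (here 1 token) keeps each worker's requests evenly spaced instead of letting idle workers save up and then fire together. For scale, an extra 0.001 QPS sustained over a day adds 86.4 requests.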