Most of the focus on AI today centers on model capabilities and scaling deployments, but even more important is securing those deployments and shielding them from wasteful and harmful traffic. Protecting AI deployments isn't just about armoring API endpoints; it starts with the individual systems powering those APIs. Let's look at a recent example: a just-launched GCE VM that received a sudden surge of more than 100,000 attempted connections in less than an hour just days after creation, on top of a steady stream of hundreds of requests every half hour since launch, totaling more than a quarter-million attempts, all of them blocked by GCP VPC firewall rules.
To test a potential memory leak in a library used by one of our AI workflows, we launched a fresh standalone 4-core GCE VM in the us-west4 region (Las Vegas) on May 9th to stress test that library by running it in a heavy loop. This VM has an external IP address (to test for a subtle low-level interaction of having that enabled), but makes no external network requests. Even so, it began receiving requests immediately upon instance start, a glimpse of the intense scanning that hyperscaler networks receive.
In all, more than 274,000 attempted requests were received from May 9 to 15, of which 98.3% were TCP and the remainder UDP. By origin, 84.6% of requests came from America, 12.9% from Europe and 2% from Asia. Unsurprisingly, the majority of attempts targeted the TELNET, HTTPS, HTTP, SSH, RDP, 8080, 9999, VNC, SMB, SIP, 8088 and 49800 ports, in that order. Fascinatingly, 62% of those attempts came from just 5 cloud-based IP addresses, while the rest were scattered across individual IPs each making a single request.
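A breakdown like this can be tallied from exported firewall log records with a handful of counters. The sketch below is only an illustration, not the exact process used here. It assumes each record is a dictionary with `disposition`, `connection` (`src_ip`, `dest_port`, numeric `protocol`) and `remote_location` (`continent`) fields, modeled on the VPC firewall rules logging format. The `summarize` helper and the sample records are hypothetical, and the sample uses documentation-reserved IP addresses.

```python
from collections import Counter

# IANA protocol numbers for the two protocols discussed above.
PROTOCOL_NAMES = {6: "TCP", 17: "UDP"}


def summarize(records, top_n=5):
    """Tally denied attempts by protocol, continent, destination port and source IP."""
    by_protocol, by_continent = Counter(), Counter()
    by_port, by_source = Counter(), Counter()
    for rec in records:
        if rec.get("disposition") != "DENIED":
            continue  # only count attempts the firewall blocked
        conn = rec.get("connection", {})
        by_protocol[PROTOCOL_NAMES.get(conn.get("protocol"), "other")] += 1
        by_continent[rec.get("remote_location", {}).get("continent", "unknown")] += 1
        by_port[conn.get("dest_port")] += 1
        by_source[conn.get("src_ip")] += 1

    total = sum(by_protocol.values())
    if total == 0:
        return None
    top_sources = by_source.most_common(top_n)
    return {
        "total": total,
        "protocol_pct": {k: 100.0 * v / total for k, v in by_protocol.items()},
        "continent_pct": {k: 100.0 * v / total for k, v in by_continent.items()},
        "top_ports": [port for port, _ in by_port.most_common(top_n)],
        # Share of all blocked attempts that came from the top_n busiest source IPs.
        "top_source_pct": 100.0 * sum(n for _, n in top_sources) / total,
    }


# Invented sample records (assumed field layout, not real traffic).
sample = [
    {"disposition": "DENIED", "remote_location": {"continent": "America"},
     "connection": {"src_ip": "203.0.113.7", "dest_port": 23, "protocol": 6}},
    {"disposition": "DENIED", "remote_location": {"continent": "America"},
     "connection": {"src_ip": "203.0.113.7", "dest_port": 23, "protocol": 6}},
    {"disposition": "DENIED", "remote_location": {"continent": "America"},
     "connection": {"src_ip": "203.0.113.7", "dest_port": 443, "protocol": 6}},
    {"disposition": "DENIED", "remote_location": {"continent": "Europe"},
     "connection": {"src_ip": "198.51.100.9", "dest_port": 5060, "protocol": 17}},
    {"disposition": "ALLOWED", "remote_location": {"continent": "Asia"},
     "connection": {"src_ip": "192.0.2.1", "dest_port": 22, "protocol": 6}},
]

summary = summarize(sample, top_n=1)
print(summary)
```

Passing `top_n=5` would give the share of traffic from the five busiest source addresses, the same kind of figure as the 62% reported above.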
Looking at the complete traffic graph from the VM's May 9th launch to the present, we can see a three-hour surge on May 13th that totaled 175,000 requests, culminating in 100K requests in a single hour:
Zooming in on this period, we can see that the three-hour surge consisted of three bursts of 4K-9K requests every two minutes (each bar is a two-minute period):
Other than this one-time burst, the level of unwanted background traffic is fairly consistent from day to day, averaging around 300-500 requests every 30 minutes, which captures the constant background probing received by a typical GCE VM:
Again, all of this traffic was blocked by the VPC firewall and thus was never seen by the GCE VM itself, demonstrating the critical importance of firewalls!
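For readers who want to produce similar per-interval counts from their own logs (two-minute bars for a burst, 30-minute windows for the background level), one possible approach is to bucket event timestamps into fixed-width windows and flag windows far above the median. This is a minimal sketch under stated assumptions: `bucket_counts` and `find_bursts` are hypothetical helpers, the `factor` threshold is arbitrary, timestamps are assumed to be timezone-aware, and the events below are synthetic.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from statistics import median


def bucket_counts(timestamps, width):
    """Count events per fixed-width window, keyed by the window's UTC start time."""
    width_s = int(width.total_seconds())
    counts = Counter()
    for ts in timestamps:
        epoch = int(ts.timestamp())  # ts must be timezone-aware
        start = epoch - (epoch % width_s)  # round down to the window boundary
        counts[datetime.fromtimestamp(start, tz=timezone.utc)] += 1
    return dict(sorted(counts.items()))


def find_bursts(counts, factor=10):
    """Return window starts whose count exceeds `factor` times the median count."""
    typical = median(counts.values())
    return [start for start, n in counts.items() if n > factor * typical]


# Synthetic events: one per two-minute window, plus 59 extra in the fifth window.
base = datetime(2000, 1, 1, tzinfo=timezone.utc)
two_min = timedelta(minutes=2)
events = [base + i * two_min for i in range(5)]
events += [base + 4 * two_min + timedelta(seconds=s) for s in range(1, 60)]

counts = bucket_counts(events, two_min)
bursts = find_bursts(counts, factor=10)
print(counts)
print(bursts)
```

Calling `bucket_counts` with `timedelta(minutes=30)` instead gives the coarser half-hour view used for the background traffic level.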
