Scaling GDELT For A New Era: Agentic Gemini Human-In-The-Loop Web Server & Kernel Tuning

Kalev Leetaru

2 months ago

While we continue the architectural design of our next-generation web server fleet to handle GDELT's exponentially growing web traffic, we've been struggling to wring enough performance out of our existing fleet to reduce the high volume of connection timeouts. To date we've made all of the usual kernel and server-level configuration adjustments, investing a huge amount of time for relatively modest, incremental improvements. Earlier today we decided to see what agentic Gemini could do. Because this is mission-critical infrastructure, we couldn't afford to simply hand Gemini the keys to the live servers, so we used a human-in-the-loop approach instead. We started with a simple prompt explaining that we have massively overloaded web server VMs and describing the tuning we had done to date, then asked Gemini to diagnose the situation and recommend configuration changes.

Gemini immediately provided a list of diagnostic commands it wanted run to assess the current state of networking, the process table, CPU and memory, the socket table, kernel parameters, etc. We manually vetted each one to ensure it was a read-only command with no risk, ran it, and provided the output back to Gemini. Given this initial set of inputs, Gemini then switched to an agentic workflow, developing theoretical models and workflows and stopping at key intervals to request that additional diagnostic commands be run and their outputs returned. It then began asking for specific configuration parameters to be changed. We manually vetted each change for correctness, system stability and interaction risk before applying it, after which Gemini asked for the output of various diagnostics to assess that change's impact.
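As a rough illustration of the vetting step described above, a first-pass read-only check with a human approval gate might be sketched in Python as follows. This is a hypothetical sketch, not the tooling used here, where the vetting was done by hand. The allowlist, the function names `is_read_only` and `review_and_run`, and the example commands are all assumptions made for illustration, and a filter like this would only complement human review, not replace it.

```python
import shlex
import subprocess
from typing import Callable, List, Optional

# Hypothetical allowlist of diagnostic binaries treated as read-only.
READ_ONLY_BINARIES = {"ss", "uptime", "free", "vmstat", "ps", "sysctl", "cat"}

# Characters that would let a command chain, redirect, or substitute.
SHELL_METACHARACTERS = set(";|&><`$")

# The only sysctl flags this sketch accepts (list all, values only).
SYSCTL_READ_FLAGS = {"-a", "-n"}


def is_read_only(command: str) -> bool:
    """Conservative first-pass check; anything uncertain is rejected."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if not argv or argv[0] not in READ_ONLY_BINARIES:
        return False
    if argv[0] == "sysctl":
        for arg in argv[1:]:
            # key=value assigns a parameter; unknown flags may write or load files.
            if "=" in arg:
                return False
            if arg.startswith("-") and arg not in SYSCTL_READ_FLAGS:
                return False
    return True


def review_and_run(
    command: str,
    approve: Callable[[str], bool],
    run: Optional[Callable[[List[str]], str]] = None,
) -> Optional[str]:
    """Run a command only if it passes the filter AND a human approves it.

    Returns the command output, or None if the command was rejected.
    """
    if not is_read_only(command):
        return None
    if not approve(command):  # the human stays in the loop for every command
        return None
    if run is None:
        def run(argv: List[str]) -> str:
            # No shell is involved, so the argument list is executed literally.
            result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
            return result.stdout
    return run(shlex.split(command))
```

In use, `approve` could be as simple as `lambda cmd: input(f"Run {cmd!r}? [y/N] ").strip().lower() == "y"`, with the returned output pasted back into the conversation. Configuration changes would deliberately fail this filter and go through a separate, fully manual review.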

The end result has been a dramatic improvement in the performance of the existing servers. Gemini identified a number of creative and counterintuitive tweaks and configuration changes that would not traditionally yield such performance gains, but whose outsized impact on our specific load characteristics it correctly assessed. The results from this agentic Gemini human-in-the-loop workflow have been so substantial that we are now planning to roll it out across all of our system tuning, creating bespoke performance tuning workflows for each of our fleets.